The Memory Problem in AI
Current AI systems have no persistent memory. Each conversation starts fresh. Each query recomputes attention over its entire context. This fundamental limitation means:
- No learning across sessions: Every conversation reinvents the wheel
- Computational waste: Identical attention patterns recomputed endlessly
- Context constraints: Hard limits on what the model can “remember”
The ARMS Insight
The breakthrough is recognizing that AI memory works like spatial partitioning in game engines:
- Traditional: State → Project → Index → Retrieve → Reconstruct (lossy at each step)
- ARMS: State → Store AT coordinates → Retrieve → Inject directly (native representation preserved)
The Game Engine Principle
Game engines don't test every pair of objects for collision; they partition space (octrees, BSP trees). ARMS doesn't search every memory; it partitions attention space hierarchically.
Same principle. Same efficiency gains. Different domain.
Core Philosophy
The hierarchy IS the access pattern.
Position IS relationship.
The space IS the index.
ARMS is not a database. It’s a computational attention manifold—a hippocampus-inspired hierarchical container system.
Architecture
Hexagonal Design
ARMS uses hexagonal (ports and adapters) architecture with clean domain separation:
```
                    EXTERNAL WORLD
                          │
      ┌───────────────────┼───────────────────┐
      │                   │                   │
      ▼                   ▼                   ▼
┌───────────┐       ┌───────────┐       ┌───────────┐
│  PyTorch  │       │  Python   │       │    CLI    │
│  Adapter  │       │  Client   │       │  Adapter  │
└─────┬─────┘       └─────┬─────┘       └─────┬─────┘
      │                   │                   │
      └───────────────────┼───────────────────┘
                          │
                          ▼
                ┌────────────────────┐
                │    CORE DOMAIN     │
                │    (Pure Logic)    │
                └─────────┬──────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
          ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐
     │ STORAGE  │    │  INDEX   │    │ LATENCY  │
     │ (Tiered) │    │ (4096D)  │    │ (Probes) │
     └──────────┘    └──────────┘    └──────────┘
```
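To make the ports-and-adapters split concrete, here is a minimal Python sketch. `StoragePort` is one of the port contracts listed later, but the method names and the in-memory adapter are illustrative assumptions, not the project's actual API:

```python
from typing import Protocol


class StoragePort(Protocol):
    """Outbound port the core depends on; concrete adapters implement it."""

    def put(self, key: int, blob: bytes) -> None: ...
    def get(self, key: int) -> bytes | None: ...


class RamStorageAdapter:
    """Simplest adapter: an in-process dict standing in for the RAM tier."""

    def __init__(self) -> None:
        self._data: dict[int, bytes] = {}

    def put(self, key: int, blob: bytes) -> None:
        self._data[key] = blob

    def get(self, key: int) -> bytes | None:
        return self._data.get(key)


class MemoryCore:
    """Core logic sees only the port, never a concrete tier."""

    def __init__(self, storage: StoragePort) -> None:
        self.storage = storage

    def remember(self, key: int, state: bytes) -> None:
        self.storage.put(key, state)

    def recall(self, key: int) -> bytes | None:
        return self.storage.get(key)


core = MemoryCore(storage=RamStorageAdapter())
core.remember(1, b"attention-state")
assert core.recall(1) == b"attention-state"
```

The core never imports a concrete tier; swapping RAM for NVMe or an archive backend means writing another adapter against the same port.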
Six Domains
- CORE DOMAIN - Pure business logic, no I/O
  - State representation
  - Coordinate mathematics
  - Hierarchy rules (chunk → doc → session → domain)
- STORAGE DOMAIN - Tier management
  - RAM / NVMe / Archive
  - Promotion and eviction
  - Memory mapping
- INDEX DOMAIN - Spatial lookup
  - 4096-dimensional indexing
  - Temporal indexing
  - Query execution
- LATENCY DOMAIN - Runtime performance
  - Tier latency probing
  - Budget allocation
  - Capacity tracking
- INJECTION DOMAIN - Model integration (a minimal hook sketch follows this list)
  - PyTorch hook integration
  - State format conversion
  - Injection point selection
- QUERY DOMAIN - External API
  - Clean interface design
  - Request validation
  - Response formatting
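As a rough illustration of what the injection domain's PyTorch hook integration could look like, the sketch below blends a retrieved state into one layer's output. The stand-in layer, the additive blending rule, and the `alpha` weight are assumptions for illustration, not the project's actual injection logic:

```python
import torch
import torch.nn as nn

# Stand-in for one block of the host model; in practice this would be an
# actual transformer layer of the model being augmented.
layer = nn.Linear(768, 768)

# A state previously retrieved from ARMS storage (random here for illustration).
retrieved_state = torch.randn(1, 768)


def inject_memory(module, inputs, output):
    """Forward hook: blend the retrieved state into this layer's output.

    Returning a tensor from a forward hook replaces the layer's output.
    The additive rule and alpha weight are illustrative choices.
    """
    alpha = 0.1
    return output + alpha * retrieved_state


handle = layer.register_forward_hook(inject_memory)

hidden = torch.randn(1, 768)
out = layer(hidden)   # the hook fires here and modifies the output
handle.remove()       # detach once the memory is no longer needed
```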
The Container Model
ARMS stores attention states like Docker images—complete snapshots that can be loaded and run:
Container Hierarchy:
```
Level 0: Global     (all memory)
Level 1: Domains    (AI Research, DevOps, Business, etc.)
Level 2: Sessions   (conversations within domain)
Level 3: Documents  (logical groupings)
Level 4: Chunks     (leaf nodes - actual attention states)
```
Each container stores:
- Centroid: 4096-dimensional mean of descendants
- Children: Pointers to child containers
- Timestamp: For temporal locality
- Metadata: Context, source, relationships
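A container of this kind could be sketched in Python as a small recursive node; the field names mirror the list above, but the class itself is illustrative rather than the crate's actual type:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Container:
    """One node in the hierarchy (Level 0 global ... Level 4 chunk)."""

    level: int                                    # 0 = global ... 4 = chunk
    centroid: np.ndarray                          # 4096-dim mean of descendants
    children: list["Container"] = field(default_factory=list)
    timestamp: float = 0.0                        # for temporal locality
    metadata: dict = field(default_factory=dict)  # context, source, relationships

    def update_centroid(self) -> None:
        """Recompute the centroid as the mean of the children's centroids."""
        if self.children:
            self.centroid = np.stack([c.centroid for c in self.children]).mean(axis=0)


# Usage: a chunk rolls up into a document-level container.
chunk = Container(level=4, centroid=np.random.rand(4096).astype("float32"))
doc = Container(level=3, centroid=np.zeros(4096, dtype="float32"), children=[chunk])
doc.update_centroid()
```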
Validated Results: ARM Prototype
The ARM (Attention Reasoning Manifold) prototype validates the core ARMS concepts:
Key Metrics
| Metric | Value |
|---|---|
| Retrieval Accuracy | 100% |
| Compression Ratio | 5,372× |
| Cross-topic Similarity | -0.33 (excellent discrimination) |
| Scale Invariance | Proven across 50-400 token contexts |
How It Works
- ContrastiveARMEncoder - Projects hidden states to 4096-dim coordinates using contrastive learning (InfoNCE loss)
- HierarchicalMemory - Multi-level memory with chunking
- MultiScaleTrainer - Trains encoder on varied context lengths
- CoordinateStore - FAISS-based vector index
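A rough sketch of the contrastive projection and InfoNCE objective, assuming a simple two-layer MLP projector and in-batch negatives (the actual ContrastiveARMEncoder may differ in architecture and training details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveEncoder(nn.Module):
    """Projects hidden states (assumed 768-dim) to 4096-dim unit-norm coordinates."""

    def __init__(self, hidden_dim: int = 768, coord_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, coord_dim),
            nn.GELU(),
            nn.Linear(coord_dim, coord_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(h), dim=-1)


def info_nce(anchors: torch.Tensor, positives: torch.Tensor, temperature: float = 0.07):
    """InfoNCE: each anchor must pick out its own positive against in-batch negatives."""
    logits = anchors @ positives.T / temperature            # (B, B) cosine similarities
    targets = torch.arange(anchors.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


encoder = ContrastiveEncoder()
h_a = torch.randn(32, 768)                 # one view of a batch of chunks
h_b = h_a + 0.01 * torch.randn_like(h_a)   # a lightly perturbed second view
loss = info_nce(encoder(h_a), encoder(h_b))
loss.backward()
```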
The Math
Attention states are projected to coordinates:
```python
# Hidden state (768-dim) → coordinate (4096-dim): a learned projection
coordinate = encoder(hidden_state)

# Store the attention state at its coordinate position
store.add(coordinate, attention_state)

# Retrieve by proximity in coordinate space
retrieved = store.nearest(query_coordinate, k=10)
```
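One plausible backing for `store` in the snippet above is a flat FAISS index, in the spirit of the CoordinateStore component; the functions below mirror the pseudocode but are a sketch, not the prototype's interface:

```python
import faiss
import numpy as np

dim = 4096
index = faiss.IndexFlatL2(dim)     # exact nearest-neighbour search over coordinates
payloads: list[bytes] = []         # attention states kept alongside the index


def add(coordinate: np.ndarray, attention_state: bytes) -> None:
    index.add(coordinate.reshape(1, dim).astype("float32"))
    payloads.append(attention_state)


def nearest(query_coordinate: np.ndarray, k: int = 10) -> list[bytes]:
    _, ids = index.search(query_coordinate.reshape(1, dim).astype("float32"), k)
    return [payloads[i] for i in ids[0] if i != -1]


# Usage: store a state at its coordinate and retrieve it by proximity.
coord = np.random.rand(dim)
add(coord, b"serialized-attention-state")
print(nearest(coord, k=1))
```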
Why This Works
Attention patterns have properties that make them ideal for coordinate storage:
- Sparse - roughly 90% of weights fall below a 1% magnitude threshold and can be pruned
- Redundant - Similar queries produce similar patterns
- Cacheable - Patterns stable across related queries
- Compressible - 5,000× reduction with minimal loss
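As a toy illustration of the sparsity claim (synthetic data, not a measured result), the snippet below prunes a random attention pattern at a 1% magnitude threshold and reports how little of it survives:

```python
import torch

# Synthetic, peaked attention pattern (not measured data).
attn = torch.softmax(4.0 * torch.randn(64, 64), dim=-1)

threshold = 0.01 * attn.max()            # "below 1% magnitude" rule of thumb
mask = attn >= threshold                 # weights worth keeping
pruned_fraction = 1.0 - mask.float().mean().item()
kept_per_row = mask.sum(dim=-1).float().mean().item()

print(f"pruned: {pruned_fraction:.1%} of weights")
print(f"kept per row (avg): {kept_per_row:.1f} of 64")
```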
Design Principles
The .kkrieger Principle
.kkrieger is the legendary demoscene shooter that packs a full 3D game into 96 KB. Its secret: store generators, not assets.
ARMS stores coordinate + metadata, not raw tensors. The coordinate IS the compressed representation.
Minimal Core: 5 Primitives
```rust
Point      // Position in 4096-dim space
Id         // Unique identifier (u128)
Blob       // Opaque payload (attention state)
Proximity  // Distance relationship
Merge      // Composition operation
```
Everything else builds on these five primitives.
Implementation Status
Scaffold Complete (January 2026)
- 64 unit tests passing
- All 5 primitives implemented
- Trait contracts defined
- Memory and flat index adapters complete
Build Order
```
Phase 1: CORE (complete)
└── Pure logic, fully testable

Phase 2: PORTS (complete)
└── StoragePort, IndexPort, LatencyPort contracts

Phase 3: ADAPTERS (in progress)
├── RAM Storage Adapter ✓
├── NVMe Storage Adapter
├── Spatial Index Adapter ✓
└── System Probe Adapter

Phase 4: INBOUND PORTS
├── Query Port (Python bindings)
├── Injection Port (PyTorch integration)
└── Admin Port (CLI)

Phase 5: INTEGRATION
└── Wire everything, integration tests
```
Related Research
CMS: Computational Memory Snapshots
Extended validation showing:
- 25× context extension (4K model → 100K effective)
- 84.8% compute reduction
- 18.4× compression ratio
CIT: Compute Image Tokenizer
Experimental approach encoding attention as images:
- Vision transformers can “read” attention patterns
- 100-1000× compression potential
- Human-inspectable memory visualizations
Conclusion
ARMS demonstrates that AI memory can be spatial rather than sequential. By storing attention states at coordinate positions and exploiting hierarchical structure, we achieve:
- O(log n) retrieval with 100% accuracy
- 5,000× compression ratios
- Exact state restoration (no reconstruction loss)
- Cross-session persistence (true long-term memory)
The model’s ARMS—reaching across time to grasp its thoughts.
Get Started
ARMS is available as the arms-hat crate, combining ARMS memory architecture with HAT indexing:
- Rust: `cargo add arms-hat` (crates.io | docs.rs)
- Python: `pip install arms-hat` (coming soon)
- Source: github.com/automate-capture/hat
Part of ongoing research into computational memory systems at Automate Capture.