
ARMS & HAT Memory Lab

Interactive demonstrations of the Hierarchical Attention Tree (HAT) and Attention Reasoning Memory Store (ARMS) systems. Explore how AI memory can be spatial, hierarchical, and persistent.

4 experiments · 3 active · 0 completed

Experiments

HAT Tree Visualization (active)

Interactive 3D visualization of the Hierarchical Attention Tree structure, showing how attention states are organized by session, document, and chunk.

Stack: Three.js · React Three Fiber · D3

4096D Coordinate Explorer (active)

Explore how attention states map to coordinates in high-dimensional space. Visualize clustering and discrimination between topics.

Stack: t-SNE · UMAP · D3

Memory Retrieval Demo (active)

Query the HAT index and see how beam search navigates the tree to find relevant attention states.

Stack: Python · Sentence Transformers

Compression Analyzer (planned)

Analyze attention pattern sparsity and compression ratios across different model architectures.

Stack: PyTorch · Transformers

About This Lab

The ARMS & HAT Memory Lab provides interactive tools to explore our breakthrough research in AI memory systems. These experiments demonstrate how:

  • HAT organizes attention states into navigable hierarchical trees
  • ARMS stores states at coordinate positions in 4096-dimensional space
  • Memory retrieval achieves 100% recall with O(log n) complexity

The Core Innovations

HAT: Structure Over Learning

Traditional vector databases (HNSW, Annoy, FAISS) learn topology from data. HAT exploits known structure:

Traditional:  Points → Learn topology → Navigate
HAT:          Points → Use known hierarchy → Navigate directly

Result: 100% recall vs 70% for HNSW
        70× faster construction
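
A minimal sketch of what "use known hierarchy" means in practice, in Python (the real core is Rust, and the Node type here is illustrative, not the library's internals): because session, document, and chunk boundaries are known at insert time, construction is a single walk down a known path with incremental centroid updates, so there is no topology to learn.

import numpy as np

class Node:
    """Illustrative HAT-style node: children plus a running centroid."""
    def __init__(self):
        self.children = {}    # label -> Node (sessions, documents, chunks)
        self.centroid = None  # running mean of every vector stored below
        self.count = 0

    def absorb(self, vec):
        # Incremental mean update: O(1) per node, O(depth) per insert.
        self.count += 1
        if self.centroid is None:
            self.centroid = vec.astype(float).copy()
        else:
            self.centroid += (vec - self.centroid) / self.count

def insert(root, session_id, doc_id, chunk_id, vec):
    # The session/doc/chunk path is given up front, so insertion just
    # walks that path and refreshes the centroids along the way.
    node = root
    node.absorb(vec)
    for label in (session_id, doc_id, chunk_id):
        node = node.children.setdefault(label, Node())
        node.absorb(vec)

root = Node()
insert(root, "s1", "d1", "c1", np.random.rand(16))
insert(root, "s1", "d1", "c2", np.random.rand(16))

Each insert touches only the nodes on one root-to-leaf path, which is where the construction-speed advantage over learned-topology indexes comes from.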

ARMS: Position Is Memory

Traditional memory systems project states to lower dimensions, losing information at each step. ARMS stores states at their actual coordinate positions:

Traditional:  State → Project → Index → Retrieve → Reconstruct (lossy)
ARMS:         State → Store AT coords → Retrieve → Inject (lossless)

Result: 5,372× compression with exact restoration
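
A toy sketch of "position is memory" (the CoordStore class is hypothetical, not the ARMS API): the coordinate vector itself is the address and the original state is the stored value, so retrieval is bit-exact rather than a reconstruction from a lossy projection.

import numpy as np

class CoordStore:
    """Toy store: the coordinate is the key, the exact state is the value."""
    def __init__(self):
        self._states = {}

    def _key(self, coords):
        # The exact bytes of the coordinate vector serve as the address.
        return coords.tobytes()

    def store(self, coords, state):
        self._states[self._key(coords)] = state  # no projection, no loss

    def retrieve(self, coords):
        return self._states[self._key(coords)]   # the original array back

store = CoordStore()
coords = np.random.rand(4096).astype(np.float32)   # position in 4096-D space
state = np.random.rand(32, 32).astype(np.float32)  # attention state to keep
store.store(coords, state)
assert np.array_equal(store.retrieve(coords), state)  # exact restoration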

Interactive Experiments

1. HAT Tree Visualization

Explore the hierarchical structure of a HAT index:

  • Global root containing all memory
  • Session nodes (conversation boundaries)
  • Document nodes (topic groupings)
  • Chunk leaves (individual attention states)

See how centroids propagate up the tree and how beam search navigates down.

2. 4096D Coordinate Explorer

Visualize how attention states cluster in high-dimensional space (a runnable projection sketch follows the list):

  • t-SNE projections showing topic separation
  • UMAP embeddings revealing structure
  • Cross-topic similarity: -0.33 (excellent discrimination)
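
A minimal offline version of what the explorer shows, using scikit-learn's t-SNE on synthetic stand-ins for attention-state coordinates (the two-topic data below is fabricated for illustration; real runs would use exported states):

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Two synthetic "topics" in 4096-D, standing in for attention states.
topic_a = rng.normal(loc=+1.0, scale=0.1, size=(50, 4096))
topic_b = rng.normal(loc=-1.0, scale=0.1, size=(50, 4096))
states = np.vstack([topic_a, topic_b])

# Project to 2-D for plotting, as the explorer does.
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(states)

# Cosine similarity between topic centroids; negative values (like the
# -0.33 reported above) indicate well-separated topics.
ca, cb = topic_a.mean(axis=0), topic_b.mean(axis=0)
print(ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb)))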

3. Memory Retrieval Demo

Watch HAT find memories in real time (a minimal beam-search sketch follows these steps):

  1. Enter a query
  2. See beam search expand candidates at each level
  3. Observe centroid similarity scores
  4. View retrieved attention states
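
What the demo animates, sketched in Python (the dict-based tree is illustrative, not the library's representation): at each level, score every child of the current beam against the query and keep only the top beam_width before descending further.

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def beam_search(root, query, beam_width=3):
    """Descend level by level, expanding only the beam_width nodes whose
    centroids score highest against the query."""
    beam = [root]
    while any(node["children"] for node in beam):
        candidates = [c for node in beam for c in node["children"]]
        candidates.sort(key=lambda c: cosine(c["centroid"], query), reverse=True)
        beam = candidates[:beam_width]
    # The beam now holds chunk-level leaves: return (id, score) pairs.
    return [(leaf["id"], cosine(leaf["centroid"], query)) for leaf in beam]

# Tiny two-level example: one document with five chunk leaves.
rng = np.random.default_rng(1)
leaves = [{"id": f"chunk{i}", "centroid": rng.normal(size=8), "children": []}
          for i in range(5)]
doc = {"id": "doc1", "children": leaves,
       "centroid": np.mean([c["centroid"] for c in leaves], axis=0)}
root = {"id": "root", "children": [doc], "centroid": doc["centroid"]}
print(beam_search(root, rng.normal(size=8)))

Because only beam_width subtrees are expanded per level, the number of centroid comparisons grows with tree depth rather than corpus size, which is what gives the O(log n) query behavior.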

4. Compression Analyzer (Coming Soon)

Analyze why attention patterns are highly compressible (a back-of-the-envelope sketch follows the list):

  • ~90% of attention weights fall below 1% of the peak magnitude
  • Similar queries → similar patterns (cacheable)
  • Compression potential: 5,000-18,000×
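
A back-of-the-envelope version of the planned analyzer, run on a synthetic attention matrix (real measurements would export attention maps from an actual model):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic attention: softmax rows dominated by a few entries, which is
# the sparsity pattern that makes attention so compressible.
logits = rng.normal(scale=4.0, size=(512, 512))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

threshold = 0.01 * attn.max()              # "below 1% of peak magnitude"
print(f"weights below threshold: {(attn < threshold).mean():.1%}")

# Rough sparse-storage estimate: one value plus one index per survivor.
kept = int((attn >= threshold).sum())
dense_bytes = attn.size * 4                # float32, dense
sparse_bytes = kept * (4 + 4)              # float32 value + int32 index
print(f"estimated compression: {dense_bytes / sparse_bytes:,.0f}x")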

Key Metrics

System   | Metric                 | Value
---------|------------------------|-----------------
HAT      | Recall@10              | 100%
HAT      | Build Time vs HNSW     | 70× faster
HAT      | Query Latency          | 3.1 ms
ARMS     | Compression Ratio      | 5,372×
ARMS     | Cross-topic Similarity | -0.33
Combined | Context Extension      | 6× (10K → 60K+)

The Hippocampus Model

Our architecture mirrors human memory:

Human Memory                | System Equivalent
----------------------------|--------------------------------
Working memory (7±2 items)  | Current context window
Short-term memory           | Recent session containers
Long-term episodic          | HAT hierarchical storage
Memory consolidation        | Consolidation phases (α/β/δ/θ)
Hippocampal indexing        | Centroid-based routing

Technical Stack

  • Core: Rust (performance-critical paths)
  • Bindings: PyO3 (Python integration)
  • Index: Custom HAT + FAISS baseline
  • Encoder: Sentence Transformers
  • Visualization: React Three Fiber + D3

Getting Started

from sentence_transformers import SentenceTransformer

from arms_hat import HatIndex

# Any sentence encoder works; the index dimension must match the
# encoder's output (e.g. HatIndex.cosine(1536) for a 1536-dim encoder).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Create index sized to the encoder's embedding dimension
index = HatIndex.cosine(encoder.get_sentence_embedding_dimension())

# Start a conversation (session)
index.new_session()

# Add messages
conversation = [
    "HAT organizes attention states into hierarchical trees.",
    "ARMS stores states at their coordinate positions.",
]
for message in conversation:
    embedding = encoder.encode(message)
    index.add(embedding, message)

# Query memory
query = encoder.encode("What did we discuss about X?")
results = index.near(query, k=10)

# Retrieve with 100% accuracy: each result is a (state_id, similarity) pair
for state_id, similarity in results:
    print(f"Found: {state_id} @ {similarity:.3f}")

Research Applications

  • Persistent conversation memory - Cross-session context
  • Knowledge graph construction - Structured fact extraction
  • Model debugging - Inspect attention patterns
  • Compute caching - Skip redundant attention computation
  • Multi-agent memory - Shared attention manifolds

This lab is part of ongoing research at Automate Capture. Experiments are interactive and run in your browser where possible.