
ARMS & HAT Memory Lab

Interactive demonstrations of the Hierarchical Attention Tree (HAT) and Attention Reasoning Memory Store (ARMS) systems. Explore how AI memory can be spatial, hierarchical, and persistent.

4 experiments · 3 active · 0 completed

Experiments

HAT Tree Visualization (active)

Interactive 3D visualization of the Hierarchical Attention Tree structure, showing how attention states are organized by session, document, and chunk.

Stack: Three.js · React Three Fiber · D3

4096D Coordinate Explorer (active)

Explore how attention states map to coordinates in high-dimensional space. Visualize clustering and discrimination between topics.

Stack: t-SNE · UMAP · D3

Memory Retrieval Demo (active)

Query the HAT index and see how beam search navigates the tree to find relevant attention states.

Stack: Python · Sentence Transformers

Compression Analyzer (planned)

Analyze attention pattern sparsity and compression ratios across different model architectures.

Stack: PyTorch · Transformers

About This Lab

The ARMS & HAT Memory Lab provides interactive tools to explore our breakthrough research in AI memory systems. These experiments demonstrate how:

  • HAT organizes attention states into navigable hierarchical trees
  • ARMS stores states at coordinate positions in 4096-dimensional space
  • Memory retrieval achieves 100% recall with O(log n) complexity

The Core Innovations

HAT: Structure Over Learning

Traditional vector databases (HNSW, Annoy, FAISS) learn topology from data. HAT exploits known structure:

Traditional:  Points → Learn topology → Navigate
HAT:          Points → Use known hierarchy → Navigate directly

Result: 100% recall vs 70% for HNSW
        70× faster construction
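
A minimal sketch of what "use known hierarchy" means in practice, in Python (the real core is Rust, and the Node type here is illustrative, not the library's internals): because session, document, and chunk boundaries are known at insert time, construction is a single walk down a known path with incremental centroid updates, so there is no topology to learn.

import numpy as np

class Node:
    """Illustrative HAT-style node: children plus a running centroid."""
    def __init__(self):
        self.children = {}    # label -> Node (sessions, documents, chunks)
        self.centroid = None  # running mean of every vector stored below
        self.count = 0

    def absorb(self, vec):
        # Incremental mean update: O(1) per node, O(depth) per insert.
        self.count += 1
        if self.centroid is None:
            self.centroid = vec.astype(float).copy()
        else:
            self.centroid += (vec - self.centroid) / self.count

def insert(root, session_id, doc_id, chunk_id, vec):
    # The session/doc/chunk path is given up front, so insertion just
    # walks that path and refreshes the centroids along the way.
    node = root
    node.absorb(vec)
    for label in (session_id, doc_id, chunk_id):
        node = node.children.setdefault(label, Node())
        node.absorb(vec)

root = Node()
insert(root, "s1", "d1", "c1", np.random.rand(16))
insert(root, "s1", "d1", "c2", np.random.rand(16))

Each insert touches only the nodes on one root-to-leaf path, which is where the construction-speed advantage over learned-topology indexes comes from.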

ARMS: Position Is Memory

Traditional memory systems project states to lower dimensions, losing information at each step. ARMS stores states at their actual coordinate positions:

Traditional:  State → Project → Index → Retrieve → Reconstruct (lossy)
ARMS:         State → Store AT coords → Retrieve → Inject (lossless)

Result: 5,372× compression with exact restoration
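
A toy sketch of "position is memory" (the CoordStore class is hypothetical, not the ARMS API): the coordinate vector itself is the address and the original state is the stored value, so retrieval is bit-exact rather than a reconstruction from a lossy projection.

import numpy as np

class CoordStore:
    """Toy store: the coordinate is the key, the exact state is the value."""
    def __init__(self):
        self._states = {}

    def _key(self, coords):
        # The exact bytes of the coordinate vector serve as the address.
        return coords.tobytes()

    def store(self, coords, state):
        self._states[self._key(coords)] = state  # no projection, no loss

    def retrieve(self, coords):
        return self._states[self._key(coords)]   # the original array back

store = CoordStore()
coords = np.random.rand(4096).astype(np.float32)   # position in 4096-D space
state = np.random.rand(32, 32).astype(np.float32)  # attention state to keep
store.store(coords, state)
assert np.array_equal(store.retrieve(coords), state)  # exact restoration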

Interactive Experiments

1. HAT Tree Visualization

Explore the hierarchical structure of a HAT index:

  • Global root containing all memory
  • Session nodes (conversation boundaries)
  • Document nodes (topic groupings)
  • Chunk leaves (individual attention states)

See how centroids propagate up the tree and how beam search navigates down.

2. 4096D Coordinate Explorer

Visualize how attention states cluster in high-dimensional space (a runnable projection sketch follows the list):

  • t-SNE projections showing topic separation
  • UMAP embeddings revealing structure
  • Cross-topic similarity: -0.33 (excellent discrimination)
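
A minimal offline version of what the explorer shows, using scikit-learn's t-SNE on synthetic stand-ins for attention-state coordinates (the two-topic data below is fabricated for illustration; real runs would use exported states):

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Two synthetic "topics" in 4096-D, standing in for attention states.
topic_a = rng.normal(loc=+1.0, scale=0.1, size=(50, 4096))
topic_b = rng.normal(loc=-1.0, scale=0.1, size=(50, 4096))
states = np.vstack([topic_a, topic_b])

# Project to 2-D for plotting, as the explorer does.
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(states)

# Cosine similarity between topic centroids; negative values (like the
# -0.33 reported above) indicate well-separated topics.
ca, cb = topic_a.mean(axis=0), topic_b.mean(axis=0)
print(ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb)))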

3. Memory Retrieval Demo

Watch HAT find memories in real time (a minimal beam-search sketch follows these steps):

  1. Enter a query
  2. See beam search expand candidates at each level
  3. Observe centroid similarity scores
  4. View retrieved attention states
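
What the demo animates, sketched in Python (the dict-based tree is illustrative, not the library's representation): at each level, score every child of the current beam against the query and keep only the top beam_width before descending further.

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def beam_search(root, query, beam_width=3):
    """Descend level by level, expanding only the beam_width nodes whose
    centroids score highest against the query."""
    beam = [root]
    while any(node["children"] for node in beam):
        candidates = [c for node in beam for c in node["children"]]
        candidates.sort(key=lambda c: cosine(c["centroid"], query), reverse=True)
        beam = candidates[:beam_width]
    # The beam now holds chunk-level leaves: return (id, score) pairs.
    return [(leaf["id"], cosine(leaf["centroid"], query)) for leaf in beam]

# Tiny two-level example: one document with five chunk leaves.
rng = np.random.default_rng(1)
leaves = [{"id": f"chunk{i}", "centroid": rng.normal(size=8), "children": []}
          for i in range(5)]
doc = {"id": "doc1", "children": leaves,
       "centroid": np.mean([c["centroid"] for c in leaves], axis=0)}
root = {"id": "root", "children": [doc], "centroid": doc["centroid"]}
print(beam_search(root, rng.normal(size=8)))

Because only beam_width subtrees are expanded per level, the number of centroid comparisons grows with tree depth rather than corpus size, which is what gives the O(log n) query behavior.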

4. Compression Analyzer (Coming Soon)

Analyze why attention patterns are highly compressible (a back-of-the-envelope sketch follows the list):

  • ~90% of attention weights fall below 1% of the peak magnitude
  • Similar queries → similar patterns (cacheable)
  • Compression potential: 5,000-18,000×
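
A back-of-the-envelope version of the planned analyzer, run on a synthetic attention matrix (real measurements would export attention maps from an actual model):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic attention: softmax rows dominated by a few entries, which is
# the sparsity pattern that makes attention so compressible.
logits = rng.normal(scale=4.0, size=(512, 512))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

threshold = 0.01 * attn.max()              # "below 1% of peak magnitude"
print(f"weights below threshold: {(attn < threshold).mean():.1%}")

# Rough sparse-storage estimate: one value plus one index per survivor.
kept = int((attn >= threshold).sum())
dense_bytes = attn.size * 4                # float32, dense
sparse_bytes = kept * (4 + 4)              # float32 value + int32 index
print(f"estimated compression: {dense_bytes / sparse_bytes:,.0f}x")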

Key Metrics

System   | Metric                 | Value
---------|------------------------|-----------------
HAT      | Recall@10              | 100%
HAT      | Build Time vs HNSW     | 70× faster
HAT      | Query Latency          | 3.1 ms
ARMS     | Compression Ratio      | 5,372×
ARMS     | Cross-topic Similarity | -0.33
Combined | Context Extension      | 6× (10K → 60K+)

The Hippocampus Model

Our architecture mirrors human memory:

Human Memory                | System Equivalent
----------------------------|--------------------------------
Working memory (7±2 items)  | Current context window
Short-term memory           | Recent session containers
Long-term episodic          | HAT hierarchical storage
Memory consolidation        | Consolidation phases (α/β/δ/θ)
Hippocampal indexing        | Centroid-based routing

Technical Stack

  • Core: Rust (performance-critical paths)
  • Bindings: PyO3 (Python integration)
  • Index: Custom HAT + FAISS baseline
  • Encoder: Sentence Transformers
  • Visualization: React Three Fiber + D3

Getting Started

from sentence_transformers import SentenceTransformer

from arms_hat import HatIndex

# Any sentence encoder works; the index dimension must match the
# encoder's output (e.g. HatIndex.cosine(1536) for a 1536-dim encoder).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Create index sized to the encoder's embedding dimension
index = HatIndex.cosine(encoder.get_sentence_embedding_dimension())

# Start a conversation (session)
index.new_session()

# Add messages
conversation = [
    "HAT organizes attention states into hierarchical trees.",
    "ARMS stores states at their coordinate positions.",
]
for message in conversation:
    embedding = encoder.encode(message)
    index.add(embedding, message)

# Query memory
query = encoder.encode("What did we discuss about X?")
results = index.near(query, k=10)

# Retrieve with 100% accuracy: each result is a (state_id, similarity) pair
for state_id, similarity in results:
    print(f"Found: {state_id} @ {similarity:.3f}")

Research Applications

  • Persistent conversation memory - Cross-session context
  • Knowledge graph construction - Structured fact extraction
  • Model debugging - Inspect attention patterns
  • Compute caching - Skip redundant attention computation
  • Multi-agent memory - Shared attention manifolds

This lab is part of ongoing research at Automate Capture. Experiments are interactive and run in your browser where possible.