Research Paper • January 2026

HAT: Hierarchical Attention Tree

Extending LLM Context Through Structural Memory

Lucas Young • Automate Capture Research

100% Recall@10 (vs 70% for HNSW)
70× faster index build than HNSW
3.1ms query latency at 60K tokens
60K+ token context from a 10K native window

Key results: HAT achieves 100% recall while being 70× faster to build than HNSW

Architecture

HAT exploits the known hierarchy in AI conversations

HAT Architecture - Hierarchical tree structure with sessions, documents, and chunks
Session: conversation boundary
Document: topic groupings within a session
Chunk: leaf nodes with embeddings

100% Recall vs 70% for HNSW

On hierarchically structured AI conversation data, HAT achieves perfect recall where HNSW struggles.

The key insight: HNSW learns topology from data, treating all points as unstructured. HAT exploits the known structure that AI workloads inherently have.

Recall@10: HAT 100%, HNSW 70%
Figures: HAT vs HNSW recall comparison; HAT vs HNSW build time comparison

70× Faster Index Construction

HAT builds indexes in milliseconds, not seconds. Critical for real-time applications.

Index build time: HAT 30ms vs HNSW 2.1s
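
For a rough reproduction of this comparison, the hedged sketch below times HAT construction through the Python API shown in Quick Start against an HNSW index built with the hnswlib package. The dataset size, HNSW parameters, and timing harness are illustrative assumptions, and absolute numbers depend on hardware.

# Rough, hedged build-time comparison. Dataset, parameters, and timings are
# illustrative; whether HatIndex.add accepts numpy rows directly is assumed.
import time
import numpy as np
import hnswlib
from arms_hat import HatIndex

dim, n = 1536, 2000
vectors = np.random.rand(n, dim).astype(np.float32)

# HAT: incremental inserts into the hierarchy
t0 = time.perf_counter()
hat = HatIndex.cosine(dim)
for v in vectors:
    hat.add(v)
print(f"HAT build:  {(time.perf_counter() - t0) * 1e3:.1f} ms")

# HNSW: graph construction via hnswlib
t0 = time.perf_counter()
hnsw = hnswlib.Index(space="cosine", dim=dim)
hnsw.init_index(max_elements=n, ef_construction=200, M=16)
hnsw.add_items(vectors)
print(f"HNSW build: {(time.perf_counter() - t0) * 1e3:.1f} ms")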

HAT vs Traditional RAG

Different problems, different solutions. HAT solves retrievable compute, not retrievable knowledge.

HAT vs RAG Comparison - Different approaches for different problems

An Artificial Hippocampus

HAT mirrors human memory architecture, functioning as an artificial hippocampus for AI systems.

Working memory = Current context window
Short-term memory = Recent session containers
Long-term episodic = HAT hierarchical storage
Memory consolidation = HAT consolidation phases
HAT Hippocampus Analogy - Memory architecture comparison

Beam Search Query Algorithm

O(log n) complexity through hierarchical beam search

HAT Beam Search Algorithm Visualization
# HAT Query Algorithm
1. Start at root
2. At each level, score children by cosine(query, centroid)
3. Keep top-b candidates (beam width)
4. Return top-k from leaf level
Complexity: O(b · d · c), where b is the beam width, d the tree depth, and c the fan-out per node; in a balanced tree d = O(log n), so query time is O(log n).
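
The sketch below walks through those four steps over the toy Session/Document/Chunk classes from the Architecture sketch above; the function name, beam_width parameter, and cosine helper are illustrative, not the arms-hat API.

# Hedged sketch of the beam search described above, reusing the
# Session/Document/Chunk dataclasses from the Architecture sketch.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def beam_search(sessions, query, beam_width=3, k=10):
    # Level 1: score sessions by cosine(query, centroid), keep top-b
    top_sessions = sorted(sessions, key=lambda s: cosine(query, s.centroid()),
                          reverse=True)[:beam_width]
    # Level 2: score documents inside surviving sessions, keep top-b
    docs = [d for s in top_sessions for d in s.documents]
    top_docs = sorted(docs, key=lambda d: cosine(query, d.centroid()),
                      reverse=True)[:beam_width]
    # Leaf level: rank chunks from surviving documents, return top-k
    chunks = [c for d in top_docs for c in d.chunks]
    chunks.sort(key=lambda c: cosine(query, c.embedding), reverse=True)
    return chunks[:k]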

Scales Without Degradation

HAT maintains 100% recall across all tested scales while HNSW degrades significantly.

Scale   HAT Recall@10   HNSW Recall@10
500     100%            55%
1000    100%            44.5%
2000    100%            67.5%
5000    100%            55%

Sleep-Inspired Consolidation

Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental phases.

HAT Consolidation Phases - Light, Medium, Deep, Full
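
What each phase actually does is not spelled out on this page, so the following is only a speculative Python illustration of incremental consolidation: lighter phases refresh the summaries of recently modified nodes, while deeper phases recompute progressively more of the tree. It reuses the toy classes from the Architecture sketch; the Phase names mirror the figure, everything else is an assumption.

# Speculative illustration only: the per-phase behavior is an assumption,
# not the documented arms-hat consolidation logic.
from enum import Enum, auto

class Phase(Enum):
    LIGHT = auto()   # refresh documents touched since the last pass
    MEDIUM = auto()  # also refresh the sessions containing them
    DEEP = auto()    # recompute every document and session summary
    FULL = auto()    # DEEP plus any structural cleanup (e.g. rebalancing)

def consolidate(sessions, dirty_docs, phase):
    # Choose which documents get their centroids recomputed in this pass
    docs = dirty_docs if phase in (Phase.LIGHT, Phase.MEDIUM) \
        else [d for s in sessions for d in s.documents]
    for d in docs:
        d.cached_centroid = d.centroid()      # refresh document-level summary
    if phase is Phase.LIGHT:
        return
    # Deeper phases also refresh session-level summaries
    dirty_ids = {id(d) for d in dirty_docs}
    parents = sessions if phase in (Phase.DEEP, Phase.FULL) \
        else [s for s in sessions if any(id(d) in dirty_ids for d in s.documents)]
    for s in parents:
        s.cached_centroid = s.centroid()      # refresh session-level summary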

End-to-End LLM Integration

A model with a 10K-token native context achieves 100% recall over a 60K+ token history, with 3.1ms query latency.

HAT LLM Integration Pipeline
60K total tokens
17% of tokens in the native context
100% HAT recall
3.1ms query latency
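
A hedged sketch of how that pipeline could be wired up with the Python API from Quick Start: conversation chunks are embedded into the HAT index as they arrive, and at question time the top-k chunks are retrieved and packed into the model's 10K-token window. The embed, generate, and count_tokens callables, the chunk_texts list, and the hit.id field on results are assumptions made for illustration.

# Sketch of the context-extension loop; helper callables and the result
# field `hit.id` are assumptions, not part of the arms-hat API.
from arms_hat import HatIndex

NATIVE_BUDGET = 10_000                 # tokens the base model can attend to

index = HatIndex.cosine(1536)
chunk_texts = []                       # chunk_texts[i] is the text of chunk i

def remember(text, embed):
    """Embed one chunk of the conversation and add it to the HAT hierarchy."""
    chunk_texts.append(text)
    index.add(embed(text))

def answer(question, embed, generate, count_tokens, k=10):
    # Retrieve the k chunks most relevant to the question from the
    # full 60K+ token history stored in HAT
    hits = index.near(embed(question), k=k)
    # Pack retrieved chunks into the model's 10K-token native window
    context, used = [], 0
    for hit in hits:
        text = chunk_texts[hit.id]     # assumed result field
        cost = count_tokens(text)
        if used + cost > NATIVE_BUDGET:
            break
        context.append(text)
        used += cost
    return generate("\n".join(context) + "\n\nQuestion: " + question)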

Quick Start

Get started with HAT in Rust or Python

Rust: cargo add arms-hat
use arms_hat::{HatIndex, DistanceMetric};

// Create index (1536 dims for OpenAI embeddings)
let mut index = HatIndex::new(1536, DistanceMetric::Cosine);

// Add embeddings with automatic hierarchy
index.add(&embedding);

// Session/document management
index.new_session();
index.new_document();

// Query - returns top 10 nearest neighbors
let results = index.query(&query_embedding, 10);

// Persistence
index.save("memory.hat")?;
let loaded = HatIndex::load("memory.hat")?;
Python: pip install arms-hat
from arms_hat import HatIndex

# Create index (1536 dims for OpenAI embeddings)
index = HatIndex.cosine(1536)

# Add messages with automatic hierarchy
index.add(embedding)

# Session/document management
index.new_session()
index.new_document()

# Query - returns top 10 nearest neighbors
results = index.near(query_embedding, k=10)

# Persistence
index.save("memory.hat")
loaded = HatIndex.load("memory.hat")

Citation

@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={Young, Lucas},
  year={2026},
  url={https://research.automate-capture.com/hat}
}

Ready to Extend Your Context?

HAT is open source and ready for production use.