Research Paper • January 2026

HAT: Hierarchical Attention Tree

Extending LLM Context Through Structural Memory

Lucas Young • Automate Capture Research

100% Recall@10 (vs 70% for HNSW)
70× faster index build than HNSW
3.1ms query latency at 60K tokens
60K+ token context from a 10K native window

Key results: HAT achieves 100% recall while being 70× faster to build than HNSW

Architecture

HAT exploits the known hierarchy in AI conversations

HAT Architecture - Hierarchical tree structure with sessions, documents, and chunks
Session: conversation boundary
Document: topic groupings within a session
Chunk: leaf nodes with embeddings

100% Recall vs 70% for HNSW

On hierarchically structured AI conversation data, HAT achieves perfect recall where HNSW struggles.

The key insight: HNSW learns topology from data, treating all points as unstructured. HAT exploits the known structure that AI workloads inherently have.

Recall@10: HAT 100%, HNSW 70%
Figures: HAT vs HNSW recall comparison; HAT vs HNSW build time comparison

70× Faster Index Construction

HAT builds indexes in milliseconds, not seconds. Critical for real-time applications.

Index build time: HAT 30ms vs HNSW 2.1s
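
For a rough reproduction of this comparison, the hedged sketch below times HAT construction through the Python API shown in Quick Start against an HNSW index built with the hnswlib package. The dataset size, HNSW parameters, and timing harness are illustrative assumptions, and absolute numbers depend on hardware.

# Rough, hedged build-time comparison. Dataset, parameters, and timings are
# illustrative; whether HatIndex.add accepts numpy rows directly is assumed.
import time
import numpy as np
import hnswlib
from arms_hat import HatIndex

dim, n = 1536, 2000
vectors = np.random.rand(n, dim).astype(np.float32)

# HAT: incremental inserts into the hierarchy
t0 = time.perf_counter()
hat = HatIndex.cosine(dim)
for v in vectors:
    hat.add(v)
print(f"HAT build:  {(time.perf_counter() - t0) * 1e3:.1f} ms")

# HNSW: graph construction via hnswlib
t0 = time.perf_counter()
hnsw = hnswlib.Index(space="cosine", dim=dim)
hnsw.init_index(max_elements=n, ef_construction=200, M=16)
hnsw.add_items(vectors)
print(f"HNSW build: {(time.perf_counter() - t0) * 1e3:.1f} ms")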

HAT vs Traditional RAG

Different problems, different solutions. HAT solves retrievable compute, not retrievable knowledge.

HAT vs RAG Comparison - Different approaches for different problems

An Artificial Hippocampus

HAT mirrors human memory architecture, functioning as an artificial hippocampus for AI systems.

Working memory = Current context window
Short-term memory = Recent session containers
Long-term episodic = HAT hierarchical storage
Memory consolidation = HAT consolidation phases
HAT Hippocampus Analogy - Memory architecture comparison

Beam Search Query Algorithm

O(log n) complexity through hierarchical beam search

HAT Beam Search Algorithm Visualization
# HAT Query Algorithm
1. Start at root
2. At each level, score children by cosine(query, centroid)
3. Keep top-b candidates (beam width)
4. Return top-k from leaf level
Complexity: O(b · d · c), where b is the beam width, d the tree depth, and c the fan-out per node; in a balanced tree d = O(log n), so query time is O(log n).
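
The sketch below walks through those four steps over the toy Session/Document/Chunk classes from the Architecture sketch above; the function name, beam_width parameter, and cosine helper are illustrative, not the arms-hat API.

# Hedged sketch of the beam search described above, reusing the
# Session/Document/Chunk dataclasses from the Architecture sketch.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def beam_search(sessions, query, beam_width=3, k=10):
    # Level 1: score sessions by cosine(query, centroid), keep top-b
    top_sessions = sorted(sessions, key=lambda s: cosine(query, s.centroid()),
                          reverse=True)[:beam_width]
    # Level 2: score documents inside surviving sessions, keep top-b
    docs = [d for s in top_sessions for d in s.documents]
    top_docs = sorted(docs, key=lambda d: cosine(query, d.centroid()),
                      reverse=True)[:beam_width]
    # Leaf level: rank chunks from surviving documents, return top-k
    chunks = [c for d in top_docs for c in d.chunks]
    chunks.sort(key=lambda c: cosine(query, c.embedding), reverse=True)
    return chunks[:k]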

Scales Without Degradation

HAT maintains 100% recall across all tested scales while HNSW degrades significantly.

Scale   HAT Recall@10   HNSW Recall@10
500     100%            55%
1000    100%            44.5%
2000    100%            67.5%
5000    100%            55%

Sleep-Inspired Consolidation

Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental phases.

HAT Consolidation Phases - Light, Medium, Deep, Full
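
What each phase actually does is not spelled out on this page, so the following is only a speculative Python illustration of incremental consolidation: lighter phases refresh the summaries of recently modified nodes, while deeper phases recompute progressively more of the tree. It reuses the toy classes from the Architecture sketch; the Phase names mirror the figure, everything else is an assumption.

# Speculative illustration only: the per-phase behavior is an assumption,
# not the documented arms-hat consolidation logic.
from enum import Enum, auto

class Phase(Enum):
    LIGHT = auto()   # refresh documents touched since the last pass
    MEDIUM = auto()  # also refresh the sessions containing them
    DEEP = auto()    # recompute every document and session summary
    FULL = auto()    # DEEP plus any structural cleanup (e.g. rebalancing)

def consolidate(sessions, dirty_docs, phase):
    # Choose which documents get their centroids recomputed in this pass
    docs = dirty_docs if phase in (Phase.LIGHT, Phase.MEDIUM) \
        else [d for s in sessions for d in s.documents]
    for d in docs:
        d.cached_centroid = d.centroid()      # refresh document-level summary
    if phase is Phase.LIGHT:
        return
    # Deeper phases also refresh session-level summaries
    dirty_ids = {id(d) for d in dirty_docs}
    parents = sessions if phase in (Phase.DEEP, Phase.FULL) \
        else [s for s in sessions if any(id(d) in dirty_ids for d in s.documents)]
    for s in parents:
        s.cached_centroid = s.centroid()      # refresh session-level summary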

End-to-End LLM Integration

A model with a 10K-token native context achieves 100% recall over a 60K+ token history, with 3.1ms query latency.

HAT LLM Integration Pipeline
60K total tokens
17% of tokens in the native context
100% HAT recall
3.1ms query latency
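
A hedged sketch of how that pipeline could be wired up with the Python API from Quick Start: conversation chunks are embedded into the HAT index as they arrive, and at question time the top-k chunks are retrieved and packed into the model's 10K-token window. The embed, generate, and count_tokens callables, the chunk_texts list, and the hit.id field on results are assumptions made for illustration.

# Sketch of the context-extension loop; helper callables and the result
# field `hit.id` are assumptions, not part of the arms-hat API.
from arms_hat import HatIndex

NATIVE_BUDGET = 10_000                 # tokens the base model can attend to

index = HatIndex.cosine(1536)
chunk_texts = []                       # chunk_texts[i] is the text of chunk i

def remember(text, embed):
    """Embed one chunk of the conversation and add it to the HAT hierarchy."""
    chunk_texts.append(text)
    index.add(embed(text))

def answer(question, embed, generate, count_tokens, k=10):
    # Retrieve the k chunks most relevant to the question from the
    # full 60K+ token history stored in HAT
    hits = index.near(embed(question), k=k)
    # Pack retrieved chunks into the model's 10K-token native window
    context, used = [], 0
    for hit in hits:
        text = chunk_texts[hit.id]     # assumed result field
        cost = count_tokens(text)
        if used + cost > NATIVE_BUDGET:
            break
        context.append(text)
        used += cost
    return generate("\n".join(context) + "\n\nQuestion: " + question)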

Quick Start

Get started with HAT in Rust or Python

Rust: cargo add arms-hat
use arms_hat::{HatIndex, DistanceMetric};

// Create index (1536 dims for OpenAI embeddings)
let mut index = HatIndex::new(1536, DistanceMetric::Cosine);

// Add embeddings with automatic hierarchy
index.add(&embedding);

// Session/document management
index.new_session();
index.new_document();

// Query - returns top 10 nearest neighbors
let results = index.query(&query_embedding, 10);

// Persistence
index.save("memory.hat")?;
let loaded = HatIndex::load("memory.hat")?;
Python: pip install arms-hat
from arms_hat import HatIndex

# Create index (1536 dims for OpenAI embeddings)
index = HatIndex.cosine(1536)

# Add messages with automatic hierarchy
index.add(embedding)

# Session/document management
index.new_session()
index.new_document()

# Query - returns top 10 nearest neighbors
results = index.near(query_embedding, k=10)

# Persistence
index.save("memory.hat")
loaded = HatIndex.load("memory.hat")

Citation

@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={Young, Lucas},
  year={2026},
  url={https://research.automate-capture.com/hat}
}

Ready to Extend Your Context?

HAT is open source and ready for production use.