Research Publications
Our peer-reviewed publications, working papers, and research findings spanning AI memory systems, neural network architectures, and computational efficiency.
Featured Research
Blades: Compositional Capability Enhancement Through Hidden State Injection
We introduce Blades, a framework for enhancing neural network capabilities through hidden state injection between specialized models. Unlike fine-tuning, model merging, or ensembling, Blades enables hot-swappable capability composition within a single forward pass, requiring no additional training. Through systematic experimentation across four model pairs, we identify the conditions for successful capability transfer: matched hidden dimensions, late-layer injection at 87.5% of network depth, gated feature selection, and domain-coherent blade stacking. Under these conditions, capability composition can achieve emergent performance exceeding either source model alone. Key finding: injecting reasoning capabilities from Phi-4-mini-reasoning into MediPhi achieves 69.6% accuracy on medical reasoning tasks, compared to 55.4% for MediPhi alone (+14.2% absolute improvement). We further establish seven validated principles for capability transfer, including the N-4 layer rule for optimal injection depth and domain coherence for multi-blade synergy (+27.8% for same-domain stacking, -27.8% for cross-domain interference).
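The core mechanism, gated late-layer injection, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`gated_inject`, `injection_layer`) and the fixed gate projection `w_gate` are assumptions for the example; the actual Blades gating may be computed differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_inject(h_recipient, h_donor, w_gate):
    """Blend a donor model's hidden state into the recipient's residual
    stream via a sigmoid gate. Both models must share the same hidden
    dimension for the injection to be well-defined."""
    gate = sigmoid(h_donor @ w_gate)     # per-feature gate in (0, 1)
    return h_recipient + gate * h_donor  # additive, hot-swappable injection

def injection_layer(num_layers):
    """The 'N-4 rule': inject four layers before the end, i.e. at
    87.5% depth for a 32-layer network (layer 28)."""
    return num_layers - 4
```

For a 32-layer model, `injection_layer(32)` returns 28, matching the 87.5%-depth figure quoted in the abstract (28/32 = 0.875).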
Learned Routers Don't Learn: Statistical Evidence for Expert Miscalibration in Mixture-of-Experts Models
We present empirical evidence that learned routers in Mixture-of-Experts (MoE) transformer models are miscalibrated with respect to expert quality. Using a per-layer expert isolation methodology with log-probability scoring and rigorous multiple comparison correction (Benjamini-Hochberg step-up FDR), we demonstrate that: (1) experts have statistically significant domain specialization (207/896 expert-layer-domain combinations survive BH-FDR at alpha = 0.05), (2) the learned router ignores this specialization (Fisher z-averaged Spearman rho = -0.017 between natural routing probability and expert quality), and (3) a single expert (E2) is moderately preferred across all domains (~20% above uniform) despite never being the best expert for any domain tested. We propose semantic routing replacement and cross-model expert grafting as zero-training alternatives, and discuss implications for MoE architecture design.
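Two statistical tools named in the abstract are standard and easy to sketch: the Benjamini-Hochberg step-up procedure used to control the false discovery rate across the 896 expert-layer-domain tests, and Fisher z-averaging of Spearman correlations. The sketch below implements both in the textbook form; it is an illustration of the methodology, not the paper's code.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up FDR control: sort p-values ascending,
    find the largest rank k with p_(k) <= (k/m) * alpha, and reject the
    k smallest hypotheses. Returns a boolean mask in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank passing its threshold
        reject[order[: k + 1]] = True     # step-up: reject everything below rank k
    return reject

def fisher_z_mean(rhos):
    """Average correlation coefficients in Fisher z-space (arctanh),
    then transform back, avoiding the bias of a raw mean of rhos."""
    return float(np.tanh(np.mean(np.arctanh(np.asarray(rhos, dtype=float)))))
```

Applying `benjamini_hochberg` to the per-combination p-values yields the surviving discoveries (207/896 in the paper); `fisher_z_mean` over per-layer Spearman rhos yields the aggregate routing-quality correlation (-0.017).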
Sparse Pathways: Domain-Aware Neuron Routing for Efficient Transformer Inference
We demonstrate that transformer FFN neurons exhibit strong domain-specific activation patterns that scale with model size. Analyzing 6 models across 2.7B to 1T parameters, we discover a near-perfect correlation (r = 0.999) between model scale and neuron specialization, with larger models dedicating increasingly more neurons to domain-specific computation. Phi-2 (2.7B) shows 30.9% specialized neurons and 1.45x potential speedup; K2-Think (32B) shows 68.2% specialization and 3.14x potential speedup. We characterize layer roles (syntax processing in early layers, semantic computation in late layers) and validate that neuron outputs are preserved under 5-15% sparsity (cosine similarity 0.999+). This work reveals fundamental scaling laws for efficient inference: sparse pathways become more effective at frontier scale, not less.
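The output-preservation check described above can be sketched simply: prune a fraction of the smallest-magnitude neuron activations and measure cosine similarity against the unpruned vector. This is a minimal stand-in for the paper's validation; the function names and the smallest-magnitude pruning criterion are assumptions for illustration (the paper's domain-aware routing selects neurons by domain specialization, not magnitude alone).

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def prune_smallest(activations, sparsity):
    """Zero out the `sparsity` fraction of neurons with the smallest
    absolute activation, approximating a sparse forward pathway."""
    k = int(len(activations) * sparsity)
    out = activations.copy()
    if k > 0:
        idx = np.argsort(np.abs(activations))[:k]
        out[idx] = 0.0
    return out
```

At 5-15% sparsity the pruned activation vector remains nearly parallel to the original, which is the sense in which the abstract reports "cosine similarity 0.999+".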
Claude Opus 4.6 on 1stProof: Evaluating AI on Research-Level Mathematics
We evaluate Claude Opus 4.6 on the 1stProof Benchmark—10 research-level mathematics problems with encrypted answers revealed February 13, 2026. Using 6 prompting strategies across 80+ API calls, we find the model produces complete, rigorous proofs for all 10 questions. We discover that extended thinking mode with maximum effort exhausts 100% of output tokens on reasoning alone, and develop a two-phase continuation approach to capture both deep thinking (128K tokens) and final responses. Answer verification pending official release.
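The two-phase continuation approach can be sketched generically. Everything here is schematic: `generate` is a hypothetical stand-in for a model API call (returning generated text plus a flag for whether the token limit was hit), not any real client library, and the prompt wording is invented for illustration.

```python
def two_phase_solve(generate, problem, max_tokens=128_000):
    """Two-phase continuation: phase 1 may spend its entire output budget
    on reasoning; if truncated, phase 2 feeds that reasoning back and asks
    only for the final answer. `generate(prompt, max_tokens)` is a
    hypothetical callable returning (text, hit_token_limit)."""
    thinking, truncated = generate(f"Think step by step:\n{problem}", max_tokens)
    if not truncated:
        return thinking                       # fit in one pass; done
    prompt = (f"{problem}\n\nPartial reasoning so far:\n{thinking}\n\n"
              "State the final answer concisely.")
    answer, _ = generate(prompt, max_tokens)  # phase 2: capture the response
    return thinking + "\n---\n" + answer
```

The point of the split is the finding quoted above: with extended thinking at maximum effort, a single call can exhaust 100% of output tokens on reasoning, so the final answer must be elicited in a second pass.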
ARMS: A Computational Attention Manifold for Persistent AI Memory
We introduce ARMS (Attention Reasoning Memory Store), a hexagonal architecture for storing and retrieving computed AI attention states in native high-dimensional space. Unlike traditional approaches that project, index, retrieve, and reconstruct states, losing information at each step, ARMS stores states at their actual coordinate positions, enabling exact restoration. Inspired by game engine spatial partitioning and the biological hippocampus, ARMS achieves O(log n) retrieval with 100% accuracy through hierarchical container trees. Our validated ARMS prototype demonstrates 5,372× compression ratios with perfect reconstruction, proving that attention manifolds are both sparse and cacheable.
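The claim that sparse attention states admit lossless compression can be sketched as coordinate-based storage: record only the nonzero entries and their positions, and reconstruction is exact by construction. This is a minimal illustration of why sparse manifolds are cacheable, assuming a simple coordinate-list encoding; ARMS's hierarchical container trees are more involved.

```python
import numpy as np

def store_sparse(state, tol=0.0):
    """Store a state at its native coordinates: keep only entries whose
    magnitude exceeds `tol`, plus the shape, for exact reconstruction."""
    idx = np.nonzero(np.abs(state) > tol)
    return {"shape": state.shape, "coords": idx, "values": state[idx]}

def restore(record):
    """Rebuild the dense state bit-for-bit from the stored coordinates."""
    state = np.zeros(record["shape"], dtype=record["values"].dtype)
    state[record["coords"]] = record["values"]
    return state

def compression_ratio(record):
    """Dense element count over stored element count."""
    dense = int(np.prod(record["shape"]))
    return dense / max(record["values"].size, 1)
```

For a state with a small fraction of nonzero entries, the ratio of dense size to stored size is large while `restore` reproduces the original exactly, the same tradeoff behind the 5,372× figure reported above.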
Hierarchical Attention Tree: Extending LLM Context Through Structural Memory
We present the Hierarchical Attention Tree (HAT), a novel index structure that extends the effective context of language models by an order of magnitude. A model with 10K native context achieves 100% recall on 60K+ token conversations through hierarchical attention state storage and retrieval, with 3.1ms average latency. Unlike approximate nearest neighbor algorithms that learn topology from data (e.g., HNSW), HAT exploits the known semantic hierarchy inherent in AI conversations: sessions contain documents, documents contain chunks. Our experiments demonstrate 100% recall vs 70% for HNSW on hierarchically-structured data, 70× faster index construction, and that simple centroid-based routing outperforms geometric sophistication.
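The centroid-based routing idea can be sketched as a small tree where internal nodes forward a query to the child with the most similar centroid and leaves hold chunks. The class name, fields, and exhaustive leaf payloads below are illustrative assumptions, not HAT's actual API; the point is only that descending a known session → document → chunk hierarchy takes one similarity comparison per level.

```python
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class HATNode:
    """A node in a session -> document -> chunk hierarchy. Internal nodes
    route queries to the child whose centroid is most similar to the
    query; leaves return their stored chunk payload."""
    def __init__(self, centroid, children=None, payload=None):
        self.centroid = np.asarray(centroid, dtype=float)
        self.children = children or []
        self.payload = payload

    def search(self, query):
        if not self.children:                 # leaf: the chunk itself
            return self.payload
        best = max(self.children, key=lambda c: _cos(query, c.centroid))
        return best.search(query)             # one hop per level: O(depth)
```

Because the hierarchy is known rather than learned from data, routing is just a handful of centroid comparisons per query, which is the structural advantage the abstract contrasts with topology-learning methods like HNSW.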