Research Paper • March 2026

Blades: Compositional Capability Enhancement

Threading Specialized Computation Through Universal Architecture

Andrew Young • Automate Capture Research

+14.2% — Accuracy Improvement (emergent enhancement)
0 — Training Required (zero-shot transfer)
87.5% — Optimal Injection Depth (the N-4 layer rule)
7 — Validated Transfer Principles
Experiment summary: Phase 1 demonstrates a +14.2% accuracy improvement with matched dimensions, Phase 2 identifies the N-4 layer as the optimal injection point, and Phase 3 shows synergistic effects in same-domain blade combinations.

Architecture: Hidden State Injection

Injecting specialized capabilities through hidden state at optimal network depths

Blades Architecture: capability transfer from source model, through injection layer, to target model
Source Model: specialized capability provider
Injection Layer: optimal depth point (N-4)
Target Model: universal model receiving the capability

The Injection Formula

Mathematical formulation of hidden state enhancement

# Hidden State Injection Formula
h_target ← h_target + α · g(w) ⊙ h_source

where:
  h_target = target model hidden state at the injection layer
  h_source = source model hidden state at the injection layer
  α        = blending coefficient (typically 1.0)
  g(w)     = learned gating vector (a learned projection is applied first only if dimensions differ)
  ⊙        = element-wise multiplication (Hadamard product)

The formula additively combines the target model's representations with a gated projection of the source model's specialized knowledge, enabling zero-training capability transfer.
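The update rule can be sketched in a few lines of numpy. This is a minimal illustration of the formula, not the paper's released code; the function name and the toy 4-dimensional states are assumptions for demonstration.

```python
import numpy as np

def inject(h_target, h_source, gate, alpha=1.0):
    """Blend a source model's hidden state into the target's.

    h_target, h_source : hidden states at the injection layer, shape (d,)
    gate               : element-wise gating vector g(w), shape (d,)
    alpha              : blending coefficient (the paper uses 1.0)
    """
    # Matched dimensions are required for direct transfer (Principle 1).
    assert h_target.shape == h_source.shape, "dimensions must match"
    return h_target + alpha * gate * h_source

# Toy example with matched 4-dimensional states and a fully open gate
h_t = np.array([1.0, 2.0, 3.0, 4.0])
h_s = np.array([0.5, 0.5, 0.5, 0.5])
out = inject(h_t, h_s, gate=np.ones(4))
```

With the gate fully open and α = 1.0 the update reduces to simple addition of the source state; a learned gate would instead pass through only the dimensions that carry the specialized capability.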

Layer Optimization: The N-4 Sweet Spot

Optimal injection depth is at 87.5% of network depth, corresponding to layer N-4 in standard transformer architectures.

Injecting earlier (layer 24, 75% depth) degrades performance significantly (-7.3%), while injecting later (layer 30, 93.75% depth) also fails to retain baseline performance (-4.9%). Layer 28 (N-4) achieves the best balance between capability integration and interference minimization.

Layer 24 (75.0% depth):   48.1% accuracy
Layer 28 (87.5% depth):   67.8% accuracy (best)
Layer 30 (93.75% depth):  60.5% accuracy

Layer Optimization Results: the N-4 position shows the optimal accuracy balance.
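Selecting the injection depth is straightforward to express in code. The sketch below assumes the two formulations of the rule (N-4 and 87.5% of depth) coincide, which holds for the 32-layer models reported here; the function name is illustrative.

```python
def injection_layer(num_layers, rule="n_minus_4"):
    """Pick the injection depth for a transformer with num_layers layers.

    For a 32-layer model both rules agree on layer 28; the 0.875
    fraction is an extrapolation (an assumption) for other depths.
    """
    if rule == "n_minus_4":
        return num_layers - 4
    # fallback: fractional-depth formulation of the same rule
    return round(0.875 * num_layers)

# 32-layer target model, as in the Phase 2 experiments
layer_a = injection_layer(32)              # N-4 rule
layer_b = injection_layer(32, "fraction")  # 87.5%-depth rule
```

Both formulations select layer 28 at this depth; for other model sizes the two rules diverge, and the paper's results only directly support the 32-layer case.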

Phase 1: Capability Transfer Experiments

Identifying which dimensional relationships enable successful capability transfer

Exp   Source → Target              Dimension Change      Result
T01   CLIP → GPT-2                 512 → 768 (+50%)      No effect
T02   CLIP → Gemma-270M            768 → 640 (-17%)      No effect
T03   MediPhi → Gemma-270M         3072 → 640 (-79%)     Degradation
T04   Phi-4-reasoning → MediPhi    3072 → 3072 (0%)      +14.2% ✓

T04 succeeds because both source (Phi-4-reasoning) and target (MediPhi) have matching 3072-dimension embeddings and compatible architectures. Dimensional mismatch and domain divergence prevent successful transfer.
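The Phase 1 pattern reduces to a simple necessary condition: only the dimensionally matched pair transfers. A sketch of that filter, with the table's dimensions hard-coded for illustration:

```python
# Phase 1 source/target embedding widths, keyed by experiment ID
PHASE1 = {
    "T01": (512, 768),    # CLIP -> GPT-2
    "T02": (768, 640),    # CLIP -> Gemma-270M
    "T03": (3072, 640),   # MediPhi -> Gemma-270M
    "T04": (3072, 3072),  # Phi-4-reasoning -> MediPhi
}

def dims_match(src_dim, tgt_dim):
    """Necessary condition from Phase 1: equal embedding widths
    allow direct state transfer without a learned projection."""
    return src_dim == tgt_dim

candidates = [exp for exp, (s, t) in PHASE1.items() if dims_match(s, t)]
```

Only T04 passes the filter, matching the table: dimensional matching is necessary but, per the paper, architecture and domain compatibility also contribute to the observed +14.2% gain.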

Phase 3: Multi-Blade Synergy

Combining multiple specialized capabilities through successive injections

Multi-Blade Synergy Matrix: same-domain combinations show synergistic effects

Blade Combination                    Target Model    Synergy    Domain Type
medical + medical_pubmed             MediPhi         +27.8%     Same
medical + medical_pubmed             Clinical        +22.2%     Same
medical_clinical + medical_pubmed    MediPhi         +16.7%     Same
reasoning + medical                  MediPhi         -27.8%     Cross

Related specialized capabilities show strong positive synergy (+27.8% for medical combinations). Cross-domain blade combinations (reasoning + medical) produce negative interference, indicating capability specialization requires domain coherence.
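Chaining blades can be sketched as successive applications of the same injection rule at one layer. Treating the injections as sequential additive updates is an assumption about how combinations are composed; the function and data below are illustrative only.

```python
import numpy as np

def multi_blade(h_target, blades, alpha=1.0):
    """Apply several blade injections in sequence at the same layer.

    blades : list of (h_source, gate) pairs, one per blade.
    Sequential additive composition is an assumed reading of the
    multi-blade setup, not the paper's verbatim procedure.
    """
    h = h_target
    for h_src, gate in blades:
        h = h + alpha * gate * h_src
    return h

# Two toy blades injected into a zeroed 3-dim target state
h_t = np.zeros(3)
blades = [
    (np.ones(3), np.full(3, 0.5)),   # first blade, half-open gate
    (np.full(3, 2.0), np.ones(3)),   # second blade, fully open gate
]
h_out = multi_blade(h_t, blades)
```

Because the updates are additive, blade order does not change the result in this sketch; whether order matters in practice depends on gating and any interactions the paper's full setup introduces.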

Seven Transfer Principles

Validated principles for successful hidden state injection

1. Dimensional Matching: equal source and target embedding dimensions enable direct state transfer without projection artifacts.

2. Domain Alignment: source and target models trained on similar domains show significantly higher transfer success.

3. Depth Criticality: injection at the N-4 position (87.5% depth) maximizes capability retention while minimizing interference.

4. Architecture Similarity: transformer architectures with comparable attention mechanisms transfer more effectively.

5. Synergy Over Composition: related specialized capabilities amplify each other; cross-domain combinations often degrade performance.

6. Zero-Shot Viability: capabilities transfer immediately without fine-tuning, enabling rapid experimentation.

7. Scaling Preservation: transfer success scales consistently across model sizes when dimensional matching is maintained.

Citation

@article{blades2026,
  title={Blades: Compositional Capability Enhancement Through Hidden State Injection},
  author={Young, Andrew},
  year={2026},
  month={March},
  url={https://research.automate-capture.com/blades}
}


Blades enables zero-training capability transfer through hidden state injection.