Distillarium
Distill any task into a pocket-sized Spirit. Pure model. No API.
Distill any teacher LLM into a 20–50M-param task-specific model that runs on CPU, edge, or browser with zero API dependency at inference.
Distillarium vs LoRA / fine-tuning
| | LoRA / fine-tuning | Distillarium |
|---|---|---|
| Final model size | 7B+ (same as base) | 5M–50M |
| Inference target | GPU | CPU, edge, browser |
| API dependency at inference | Sometimes (hosted base) | Zero |
| Cost at inference | $$$ per call | $0 |
| Cold start | Seconds | Milliseconds |
| Best for | Generalist capability extension | Single-task production |
Quickstart
# 1. Install
pip install distillarium[gemini]
# 2. Distill — uses the reference Needle recipe
distillery distill recipes/needle.tool-calling-v1.yaml
# 3. Bottle the Spirit for deployment
distillery bottle spirits/needle.pt --format onnx
Expected: ~30 minutes, ~$0.30 in teacher API, 70–80° proof on tool calling.
The argument: distillation, not fine-tuning
Fine-tuning ships a 7B-param model that still needs a GPU. Distillation ships a 20M-param model that runs on CPU, edge, or browser. For single-task production, the gap in model size, inference cost, and cold-start time is enormous.
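To make the size gap concrete, here is a back-of-the-envelope sketch of the memory needed for the weights alone at the two model sizes named above. The bytes-per-parameter figures (2 for fp16, 4 for fp32) are standard. The helper name `weight_bytes` is illustrative and not part of Distillarium, and the estimate ignores activations and runtime overhead.

```python
def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the model weights alone (no activations)."""
    return n_params * bytes_per_param


GB = 1000 ** 3
MB = 1000 ** 2

fine_tuned = weight_bytes(7_000_000_000, 2)  # 7B params stored as fp16
spirit = weight_bytes(20_000_000, 4)         # 20M params stored as fp32

print(f"7B fp16:  {fine_tuned / GB:.0f} GB")  # 14 GB
print(f"20M fp32: {spirit / MB:.0f} MB")      # 80 MB
print(f"ratio:    {fine_tuned // spirit}x")   # 175x
```

Even with the small model kept at full fp32 precision and the large one halved to fp16, the weights differ by roughly two orders of magnitude, which is what moves the inference target from GPU to CPU or browser.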
Vocabulary
Every concept maps to something real. The metaphor is load-bearing — it makes the package teachable.
The trained, bottled model — the output of a run.
Seed corpus the teacher generates training data from.
YAML config: teacher, mash, student arch, cuts, still, tasting, bottling.
The training run itself.
Train / eval / test data splits.
Discarded noise / kept core / borderline cases.
Held-out accuracy. The higher the proof, the more concentrated.
Auto-generated eval report — strengths, weaknesses, failure cases.
Continued training, fine-tuning, RLHF refresh.
Export to ONNX, GGUF, or browser-WASM.
Library of Spirits (public or private).
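As a rough illustration of how two of these ideas relate in code, the sketch below splits a toy dataset into train / eval / test cuts and scores a stand-in student on the held-out cut. Everything here is hypothetical: `make_cuts`, `proof`, the 80/10/10 ratio, and the assumption that proof is held-out accuracy on a 0–100 scale are illustrative choices, not Distillarium's API.

```python
import random
from typing import Callable, Sequence

Example = tuple[str, str]  # (input text, expected output)


def make_cuts(data: Sequence[Example], ratios=(0.8, 0.1, 0.1), seed: int = 0):
    """Shuffle deterministically, then slice into train / eval / test cuts."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_eval = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "eval": shuffled[n_train:n_train + n_eval],
        "test": shuffled[n_train + n_eval:],  # held out, never trained on
    }


def proof(predict: Callable[[str], str], held_out: Sequence[Example]) -> float:
    """Held-out accuracy as a percentage (assumed meaning of 'proof')."""
    if not held_out:
        raise ValueError("held-out cut is empty")
    correct = sum(predict(x) == y for x, y in held_out)
    return 100.0 * correct / len(held_out)


# Toy data: 100 prompts, each labeled with one of two hypothetical tools.
data = [(f"q{i}", f"tool_{i % 2}") for i in range(100)]
cuts = make_cuts(data)
print(len(cuts["train"]), len(cuts["eval"]), len(cuts["test"]))  # 80 10 10

# A trivial stand-in "student" that always predicts the same tool.
baseline = lambda _prompt: "tool_0"
print(f"baseline proof: {proof(baseline, cuts['test']):.0f}")
```

Fixing the shuffle seed keeps the cuts reproducible across runs, so a change in proof reflects a change in the student rather than a change in the held-out data.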
How this production was born.
Spawned from the tool-calling reproducibility cluster — multiple Radar items converged on the question of how small a model could be and still call tools reliably. The lab built it; this is what shipped.
Every production carries a permanent attestation receipt through the Research Radar Protocol — so the path from paper to ship is verifiable, on chain.