Open Source • Python • 2026 · v0.1 · Live

Distillarium

Distill any task into a pocket-sized Spirit. Pure model. No API.

Distill any teacher LLM into a 20–50M-param task-specific model that runs on CPU, edge, or browser with zero API dependency at inference.

Reference Spirit: Needle (20.7M)
Compression: ~72,000×
Distillation cost: $0.30
Distillation time: 27 min · 1× RTX 5090
Tool-name accuracy: 78%
Arg-key F1: 0.73
On-disk size: 249 MB
CPU latency: ~45 ms median
Inference target: CPU · Edge · Browser
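
For readers who want to check a median CPU latency figure like the one above on their own hardware, here is a minimal timing sketch. It assumes a hypothetical `predict` callable that wraps whatever runtime loads the bottled Spirit; the function name, warm-up count, and run count are illustrative and not part of Distillarium.

```python
import statistics
import time

def median_latency_ms(predict, example, warmup=5, runs=50):
    """Median wall-clock latency of predict(example), in milliseconds.

    `predict` is a hypothetical callable standing in for a loaded model.
    """
    # Warm-up calls are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        predict(example)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(example)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

The median is used rather than the mean so that a few slow outlier calls do not dominate the number.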

Distillarium vs LoRA / fine-tuning

Aspect | LoRA / fine-tuning | Distillarium
Final model size | 7B+ (same as base) | 5M–50M
Inference target | GPU | CPU, edge, browser
API dependency at inference | Sometimes (hosted base) | Zero
Cost at inference | $$$ per call | $0
Cold start | Seconds | Milliseconds
Best for | Generalist capability extension | Single-task production

Quickstart

# 1. Install
pip install "distillarium[gemini]"

# 2. Distill — uses the reference Needle recipe
distillery distill recipes/needle.tool-calling-v1.yaml

# 3. Bottle the Spirit for deployment
distillery bottle spirits/needle.pt --format onnx

Expected: about 30 minutes of training, about $0.30 in teacher API cost, and 70–80° proof (held-out accuracy) on tool calling.

The argument: distillation, not fine-tuning

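Fine-tuning ships a 7B-param model that still wants a GPU. Distillation ships a 20M-param model that runs on CPU, at the edge, or in the browser. For single-task production that is roughly a 350× difference in parameter count, and it removes both the GPU and the per-call API cost.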

Vocabulary

Every concept maps to something real. The metaphor is load-bearing — it makes the package teachable.

Spirit

The trained, bottled model — the output of a run.

Mash

Seed corpus the teacher generates training data from.

Recipe

YAML config: teacher, mash, student arch, cuts, still, tasting, bottling.
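
To make the shape of a Recipe concrete, here is a minimal sketch of its seven sections as a Python dict, with a check that none is missing. Only the section names come from the definition above ("student arch" is shortened to `student`); every nested key and value is a hypothetical placeholder, not Distillarium's actual schema.

```python
# Only the seven top-level section names come from the Recipe definition;
# every nested key and value below is an illustrative assumption.
REQUIRED_SECTIONS = ("teacher", "mash", "student", "cuts", "still", "tasting", "bottling")

example_recipe = {
    "teacher": {"provider": "gemini"},                  # assumed key
    "mash": {"seed_file": "seeds.jsonl"},               # assumed key and path
    "student": {"params": 20_700_000},                  # Needle size quoted on this page
    "cuts": {"train": 0.8, "eval": 0.1, "test": 0.1},   # assumed ratios
    "still": {"epochs": 3},                             # assumed key
    "tasting": {"metrics": ["tool_name_accuracy", "arg_key_f1"]},  # assumed names
    "bottling": {"format": "onnx"},                     # format used in the quickstart
}

def missing_sections(recipe: dict) -> list[str]:
    """Return the required section names that the recipe does not define."""
    return [name for name in REQUIRED_SECTIONS if name not in recipe]
```

A check like this would typically run before any teacher calls are made, so that a malformed recipe fails before it costs anything.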

The Still

The training run itself.

Cuts

Train / eval / test data splits.

Heads / Hearts / Tails

Discarded noise / kept core / borderline cases.
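
The two ideas above, grading samples into Heads / Hearts / Tails and then cutting the kept data into train / eval / test, can be sketched as follows. This assumes a hypothetical `score` function that returns a quality value in [0, 1]; the thresholds and split ratios are illustrative and not Distillarium defaults, and the page does not describe the criteria it actually uses.

```python
import random

def grade_samples(samples, score, low=0.3, high=0.7):
    """Sort samples into heads (discarded), tails (borderline), hearts (kept).

    `score` is a hypothetical quality function; thresholds are illustrative.
    """
    grades = {"heads": [], "hearts": [], "tails": []}
    for sample in samples:
        s = score(sample)
        if s < low:
            grades["heads"].append(sample)    # noise: discarded
        elif s < high:
            grades["tails"].append(sample)    # borderline: set aside
        else:
            grades["hearts"].append(sample)   # core: kept
    return grades

def make_cuts(hearts, train=0.8, eval_=0.1, seed=0):
    """Shuffle deterministically, then split into train / eval / test lists."""
    items = list(hearts)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_eval = int(len(items) * eval_)
    return items[:n_train], items[n_train:n_train + n_eval], items[n_train + n_eval:]
```

Fixing the shuffle seed keeps the test cut stable across runs, which matters if proof is compared between Spirits.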

Proof

Held-out accuracy. The higher the proof, the more concentrated.
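
The reference Spirit reports two held-out numbers, tool-name accuracy and arg-key F1. The page does not state how they are defined, so the sketch below assumes one common reading: exact match on the tool name, and micro-averaged F1 over argument key sets (values ignored). The `{"name": ..., "args": {...}}` call format is also an assumption.

```python
def tool_name_accuracy(predictions, references):
    """Fraction of examples whose predicted tool name equals the reference."""
    if not references:
        return 0.0
    hits = sum(p["name"] == r["name"] for p, r in zip(predictions, references))
    return hits / len(references)

def arg_key_f1(predictions, references):
    """Micro-averaged F1 over argument keys; compares key sets, not values."""
    tp = fp = fn = 0
    for p, r in zip(predictions, references):
        pred_keys, ref_keys = set(p["args"]), set(r["args"])
        tp += len(pred_keys & ref_keys)   # keys predicted and expected
        fp += len(pred_keys - ref_keys)   # keys predicted but not expected
        fn += len(ref_keys - pred_keys)   # keys expected but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Both functions expect predictions and references as equal-length lists in the same order, drawn from the test cut only.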

Tasting Notes

Auto-generated eval report — strengths, weaknesses, failure cases.

Aging in Casks

Continued training, fine-tuning, RLHF refresh.

Bottling

Export to ONNX, GGUF, or browser-WASM.

The Cellar

Library of Spirits (public or private).

Provenance

How this production was born.

Spawned from the tool-calling reproducibility cluster — multiple Radar items converged on the question of how small a model could be and still call tools reliably. The lab built it; this is what shipped.

Every production carries a permanent attestation receipt through the Research Radar Protocol — so the path from paper to ship is verifiable, on chain.