Open Source • Python • 2026 · v0.1 · Live

Distillarium

Distill any task into a pocket-sized Spirit. Pure model. No API.

Distill any teacher LLM into a 20–50M-param task-specific model that runs on CPU, edge, or browser with zero API dependency at inference.

Visit distillarium.app · GitHub · PyPI

Reference Spirit

Needle (20.7M)

Compression

~72,000×

Distillation cost

$0.30

Distillation time

27 min · 1× RTX 5090

Tool-name accuracy

78%

Arg-key F1

0.73

On-disk size

249 MB

CPU latency

~45 ms median

Inference target

CPU · Edge · Browser

Distillarium vs LoRA / fine-tuning

	LoRA / fine-tuning	Distillarium
Final model size	7B+ (same as base)	5M–50M
Inference target	GPU	CPU, edge, browser
API dependency at inference	Sometimes (hosted base)	Zero
Cost at inference	$$$ per call	$0
Cold start	Seconds	Milliseconds
Best for	Generalist capability extension	Single-task production

Quickstart

# 1. Install
pip install "distillarium[gemini]"

# 2. Distill — uses the reference Needle recipe
distillery distill recipes/needle.tool-calling-v1.yaml

# 3. Bottle the Spirit for deployment
distillery bottle spirits/needle.pt --format onnx

Expected: ~30 minutes, ~$0.30 in teacher API cost, and 70–80° proof (held-out accuracy) on tool calling.

The argument: distillation, not fine-tuning

Fine-tuning ships a 7B-param model that still wants a GPU. Distillation ships a 20M-param model that runs anywhere. For single-task production, that is roughly a 350× difference in parameter count, and it decides whether inference needs a GPU at all.
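The size gap can be made concrete with a back-of-the-envelope calculation. The sketch below is an estimate, not a measurement: it counts raw weights only, and the bytes-per-parameter values (fp16 for the 7B model, fp32 for the 20M model) are assumptions chosen for illustration. On-disk size also depends on the export format and on what else is stored alongside the weights.

```python
def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Bytes needed for the raw weights only (no optimizer state, no metadata)."""
    return n_params * bytes_per_param


FINE_TUNED_PARAMS = 7_000_000_000  # "7B-param" from the text
SPIRIT_PARAMS = 20_000_000         # "20M-param" from the text

# Assumed precisions: 2 bytes (fp16) for the large model, 4 bytes (fp32) for the small one.
fine_tuned_fp16 = weight_bytes(FINE_TUNED_PARAMS, 2)
spirit_fp32 = weight_bytes(SPIRIT_PARAMS, 4)
param_ratio = FINE_TUNED_PARAMS / SPIRIT_PARAMS

print(f"7B model, fp16 weights:  {fine_tuned_fp16 / 1e9:.1f} GB")
print(f"20M model, fp32 weights: {spirit_fp32 / 1e6:.1f} MB")
print(f"parameter ratio:         {param_ratio:.0f}x")
```

Under these assumptions the large model needs about 14 GB for its weights and the small one about 80 MB, which is why the small model fits comfortably in CPU memory.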

Vocabulary

Every concept maps to something real. The metaphor is load-bearing — it makes the package teachable.

Spirit

The trained, bottled model — the output of a run.

Mash

Seed corpus the teacher generates training data from.

Recipe

YAML config: teacher, mash, student arch, cuts, still, tasting, bottling.
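As a rough illustration of that shape, the sketch below mirrors a Recipe as a Python dict and checks that all seven sections are present. Only the section names come from the description above. The key `student` (for the student architecture), every nested field and value, and the helper `missing_sections` are hypothetical and are not taken from the Distillarium schema.

```python
# Section names follow the Recipe description; "student" stands in for "student arch".
REQUIRED_SECTIONS = ("teacher", "mash", "student", "cuts", "still", "tasting", "bottling")


def missing_sections(recipe: dict) -> list[str]:
    """Return the required section names that the recipe does not define."""
    return [name for name in REQUIRED_SECTIONS if name not in recipe]


# Hypothetical recipe contents, for illustration only.
example_recipe = {
    "teacher": {"provider": "gemini"},
    "mash": {"seed_file": "seeds.jsonl"},
    "student": {"params": "20M"},
    "cuts": {"train": 0.8, "eval": 0.1, "test": 0.1},
    "still": {"epochs": 3},
    "tasting": {"metrics": ["tool_name_accuracy", "arg_key_f1"]},
    "bottling": {"format": "onnx"},
}

print(missing_sections(example_recipe))    # complete recipe
print(missing_sections({"teacher": {}}))   # incomplete recipe
```

A check like this is useful before starting a paid teacher run, since a missing section is cheaper to catch up front than halfway through distillation.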

The Still

The training run itself.

Cuts

Train / eval / test data splits.
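A minimal sketch of how such a three-way split might be produced, assuming a seeded shuffle and an 80/10/10 ratio. The function name `make_cuts`, the default ratios, and the seed are illustrative assumptions, not Distillarium's actual behavior.

```python
import random


def make_cuts(samples, train=0.8, eval_=0.1, seed=0):
    """Shuffle deterministically, then slice into train / eval / test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    n_train = int(len(shuffled) * train)
    n_eval = int(len(shuffled) * eval_)
    return {
        "train": shuffled[:n_train],
        "eval": shuffled[n_train:n_train + n_eval],
        "test": shuffled[n_train + n_eval:],  # whatever remains is held out
    }


cuts = make_cuts(range(100))
print({name: len(items) for name, items in cuts.items()})
```

Fixing the seed keeps the test cut stable across runs, so accuracy figures from different runs stay comparable.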

Heads / Hearts / Tails

Discarded noise / kept core / borderline cases.

Proof

Held-out accuracy. The higher the proof, the more concentrated.
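For tool calling, held-out quality is reported above as tool-name accuracy and arg-key F1. The sketch below shows one plausible way to compute both on a held-out set. The record layout (`name`, `args`), the function names, and the choice of micro-averaged F1 are assumptions; the exact definitions Distillarium uses may differ.

```python
def tool_name_accuracy(predicted, reference):
    """Fraction of calls whose predicted tool name matches the reference."""
    hits = sum(p["name"] == r["name"] for p, r in zip(predicted, reference))
    return hits / len(reference)


def arg_key_f1(predicted, reference):
    """Micro-averaged F1 over argument keys (argument values are ignored)."""
    tp = fp = fn = 0
    for p, r in zip(predicted, reference):
        p_keys, r_keys = set(p["args"]), set(r["args"])
        tp += len(p_keys & r_keys)   # keys predicted and expected
        fp += len(p_keys - r_keys)   # keys predicted but not expected
        fn += len(r_keys - p_keys)   # keys expected but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical held-out examples.
reference = [
    {"name": "get_weather", "args": {"city": "Paris", "unit": "C"}},
    {"name": "send_email", "args": {"to": "a@example.com", "body": "hi"}},
]
predicted = [
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"name": "search", "args": {"to": "a@example.com", "subject": "hi"}},
]

accuracy = tool_name_accuracy(predicted, reference)
f1 = arg_key_f1(predicted, reference)
print(f"tool-name accuracy: {accuracy:.2f}, arg-key F1: {f1:.2f}")
```

In this toy set one of two tool names is correct, and two of the three predicted keys match two of the four expected keys, so accuracy is 0.50 and F1 is about 0.57.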

Tasting Notes

Auto-generated eval report — strengths, weaknesses, failure cases.

Aging in Casks

Continued training, fine-tuning, RLHF refresh.

Bottling

Export to ONNX, GGUF, or browser-WASM.

The Cellar

Library of Spirits (public or private).

Provenance

How this production was born.

Spawned from the tool-calling reproducibility cluster — multiple Radar items converged on the question of how small a model could be and still call tools reliably. The lab built it; this is what shipped.

Every production carries a permanent attestation receipt through the Research Radar Protocol — so the path from paper to ship is verifiable, on chain.
