
What this is

The cookbook’s training.recipes.distillation_loop trains one student on its own rollouts while one or more frozen teachers score those exact sampled tokens. The dense training signal is the per-token logprob gap between the selected teacher and the sampling student:
teacher_logprob - sampling_logprob
The recipe feeds that signal into the Training API’s built-in importance_sampling loss. This is useful when you want on-policy distillation with token-level feedback instead of offline SFT traces or final-answer-only rewards.
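To make the signal concrete, here is a minimal sketch of the per-token gap for one completion. The helper name `per_token_signal` and the logprob values are hypothetical and are not part of the recipe; the sketch only illustrates the subtraction described above, not how the built-in loss consumes it.

```python
from typing import List


def per_token_signal(
    teacher_logprobs: List[float], sampling_logprobs: List[float]
) -> List[float]:
    """Return teacher_logprob - sampling_logprob for each sampled token."""
    if len(teacher_logprobs) != len(sampling_logprobs):
        raise ValueError("teacher and student must score the same sampled tokens")
    return [t - s for t, s in zip(teacher_logprobs, sampling_logprobs)]


# Hypothetical logprobs for a three-token completion.
sampling = [-0.5, -2.0, -0.25]  # student logprobs recorded at sampling time
teacher = [-0.25, -0.5, -1.25]  # teacher logprobs for the same token IDs

signal = per_token_signal(teacher, sampling)
print(signal)  # [0.25, 1.5, -1.0]
```

A positive value marks a token the teacher finds more likely than the student did when sampling it; a negative value marks the opposite.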

Single-teacher distillation

Use teacher_model when every prompt should be scored by the same teacher:
from training.recipes.distillation_loop import Config, main
from training.utils import DeployConfig, TrainerConfig

cfg = Config(
    log_path="./distillation_logs",
    base_model="accounts/fireworks/models/qwen3-8b",
    teacher_model="accounts/fireworks/models/qwen3-32b",
    dataset="/path/to/prompts.jsonl",
    trainer=TrainerConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
    deployment=DeployConfig(tokenizer_model="Qwen/Qwen3-8B"),
    max_rows=100,
    epochs=1,
)

main(cfg)
If teacher_model is a base model, the recipe creates a frozen teacher deployment for scoring. If it is already an inference model or deployment resource, the recipe uses it directly.

Routed multi-teacher distillation

Use multi_teacher when different prompts should be scored by different teachers. This is routed MOPD: each prompt is scored by exactly one teacher, selected by a string value in the dataset row.
from training.recipes.distillation_loop import Config, main
from training.utils import DeployConfig, TrainerConfig
from training.utils.distillation import MultiTeacherConfig, TeacherConfig

cfg = Config(
    log_path="./mopd_logs",
    base_model="accounts/fireworks/models/qwen3p5-35b-a3b",
    teacher_model="",
    dataset="/path/to/routed_prompts.jsonl",
    multi_teacher=MultiTeacherConfig(
        route_key="teacher",
        teachers=[
            TeacherConfig(
                model="accounts/fireworks/models/qwen3p5-35b-a3b",
                route_value="math-teacher",
                tokenizer_model="Qwen/Qwen3.5-35B-A3B",
            ),
            TeacherConfig(
                model="accounts/fireworks/models/qwen3p5-35b-a3b",
                route_value="arithmetic-teacher",
                tokenizer_model="Qwen/Qwen3.5-35B-A3B",
            ),
        ],
    ),
    trainer=TrainerConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3p5-35b-a3b-256k-lora",
    ),
    deployment=DeployConfig(tokenizer_model="Qwen/Qwen3.5-35B-A3B"),
    lora_rank=8,
    prompt_groups_per_step=2,
    completions_per_prompt=1,
)

main(cfg)
Routed MOPD, as currently implemented, is not teacher blending. The recipe does not average teacher probabilities, average teacher logits, or run multiple teachers for the same prompt. It routes each row to exactly one configured teacher.
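The routing rule can be sketched in a few lines. This is an illustration under stated assumptions, not the recipe's implementation: the `Teacher` dataclass is a hypothetical stand-in for `TeacherConfig`, `build_route_table` and `select_teacher` are made-up helper names, and rejecting duplicate labels is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Teacher:
    """Hypothetical stand-in for TeacherConfig; only the fields used here."""
    model: str
    route_value: Optional[str] = None


def build_route_table(teachers: List[Teacher]) -> Dict[str, Teacher]:
    """Map each route label to exactly one teacher."""
    table: Dict[str, Teacher] = {}
    for teacher in teachers:
        # When route_value is unset, the row value must match the model name.
        label = teacher.route_value or teacher.model
        if label in table:
            # Assumption of this sketch: ambiguous labels are an error.
            raise ValueError(f"duplicate route label: {label!r}")
        table[label] = teacher
    return table


def select_teacher(
    row: dict, table: Dict[str, Teacher], route_key: str = "teacher"
) -> Teacher:
    """Pick the single teacher for a row; there is no blending across teachers."""
    label = row.get(route_key)
    if label not in table:
        raise KeyError(f"row has no matching teacher for {route_key}={label!r}")
    return table[label]


table = build_route_table([
    Teacher("accounts/fireworks/models/qwen3p5-35b-a3b", "math-teacher"),
    Teacher("accounts/fireworks/models/qwen3p5-35b-a3b", "arithmetic-teacher"),
])
rows = [
    {"messages": [{"role": "user", "content": "Solve 6 * 7."}], "teacher": "math-teacher"},
    {"messages": [{"role": "user", "content": "Solve 18 + 24."}], "teacher": "arithmetic-teacher"},
]
for row in rows:
    print(row["teacher"], "->", select_teacher(row, table).model)
```

Both teachers may point at the same model, as in the config above; the route label, not the model name, distinguishes them.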

Dataset format

The distillation recipe reads JSONL rows. For routed MOPD, the dataset must include the route key you configure on MultiTeacherConfig.route_key. The default route key is teacher. Required fields:
| Field | Type | Description |
| --- | --- | --- |
| messages | list[dict] | Student-visible OpenAI-style chat messages. The student samples from this prompt. |
| teacher | str | Default route key for routed MOPD. The value must exactly match one configured TeacherConfig.route_value. If route_value is unset, the value must match that teacher’s model. |
Optional fields:
| Field | Type | Description |
| --- | --- | --- |
| teacher_messages | list[dict] | Teacher-side prompt used for scoring. If omitted, the selected teacher scores the student rollout under messages. |
| expected_answer | str | Optional answer metadata for eval callbacks and smoke checks. |
| extra_info | dict | Optional user metadata. The recipe does not require a specific shape. |
Example single-teacher row:
{
  "messages": [
    {"role": "user", "content": "Solve 6 * 7. End with exactly one line: Final: <answer>."}
  ],
  "expected_answer": "42"
}
Example routed MOPD rows:
{"messages":[{"role":"user","content":"Solve 6 * 7. End with Final: <answer>."}],"teacher":"math-teacher","expected_answer":"42"}
{"messages":[{"role":"user","content":"Solve 18 + 24. End with Final: <answer>."}],"teacher":"arithmetic-teacher","expected_answer":"42"}
Example with a privileged teacher prompt:
{
  "messages": [
    {"role": "user", "content": "Solve 6 * 7. End with exactly one line: Final: <answer>."}
  ],
  "teacher": "math-teacher",
  "teacher_messages": [
    {"role": "user", "content": "Solve 6 * 7. The correct answer is 42. Explain briefly, then end with Final: 42."}
  ],
  "expected_answer": "42"
}
If a teacher uses a custom TeacherConfig.teacher_messages_key, rows routed to that teacher should provide that key instead of teacher_messages.
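Before launching a run, it can help to check a dataset against the row contract above. The following is a minimal pre-flight sketch, not part of the cookbook: `teacher_prompt` and `check_rows` are hypothetical helpers, and falling back to `messages` when a custom teacher-messages key is absent is an assumption of this sketch.

```python
import json
from typing import Iterable, List, Set


def teacher_prompt(row: dict, teacher_messages_key: str = "teacher_messages") -> List[dict]:
    """Prompt used for teacher scoring: the teacher-side prompt if present, else messages."""
    return row.get(teacher_messages_key) or row["messages"]


def check_rows(
    lines: Iterable[str], route_values: Set[str], route_key: str = "teacher"
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the rows look valid."""
    problems: List[str] = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {n}: row is not a JSON object")
            continue
        msgs = row.get("messages")
        if not isinstance(msgs, list) or not msgs or not all(isinstance(m, dict) for m in msgs):
            problems.append(f"line {n}: 'messages' must be a non-empty list of objects")
        if row.get(route_key) not in route_values:
            problems.append(f"line {n}: {route_key}={row.get(route_key)!r} matches no teacher")
    return problems


sample = [
    '{"messages":[{"role":"user","content":"Solve 6 * 7."}],"teacher":"math-teacher"}',
    '{"messages":[{"role":"user","content":"Solve 18 + 24."}],"teacher":"algebra-teacher"}',
]
for problem in check_rows(sample, {"math-teacher", "arithmetic-teacher"}):
    print(problem)
```

For a file on disk, pass the open file object as `lines`, since iterating a text file yields one JSONL row per line.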

Tokenizer compatibility

Sampled-token distillation scores the student’s sampled token IDs under the teacher. The student and teacher must therefore share a compatible tokenizer and vocabulary. Prefer teachers from the same model family as the student, and set TeacherConfig.tokenizer_model when you want the recipe to validate the teacher tokenizer against DeployConfig.tokenizer_model.
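One way to reason about compatibility is to confirm that every student token maps to the same ID in the teacher's vocabulary. The sketch below is an illustration with toy vocabularies, not the recipe's validation logic, and `vocab_mismatches` is a hypothetical helper. With Hugging Face `transformers`, a token-to-ID mapping of this shape is available from a loaded tokenizer's `get_vocab()` method.

```python
from typing import Dict, List


def vocab_mismatches(
    student_vocab: Dict[str, int], teacher_vocab: Dict[str, int], limit: int = 5
) -> List[str]:
    """List up to `limit` student tokens whose ID is missing or different in the teacher."""
    problems: List[str] = []
    for token, student_id in student_vocab.items():
        teacher_id = teacher_vocab.get(token)  # None when the teacher lacks the token
        if teacher_id != student_id:
            problems.append(f"{token!r}: student id {student_id}, teacher id {teacher_id}")
            if len(problems) >= limit:
                break
    return problems


# Toy vocabularies; real ones come from the tokenizers themselves.
student = {"<eos>": 0, "Final": 1, ":": 2, "42": 3}
same_family = {"<eos>": 0, "Final": 1, ":": 2, "42": 3, "<extra>": 4}
other_family = {"<eos>": 0, "Final": 7, ":": 2}

print(vocab_mismatches(student, same_family))   # []
print(vocab_mismatches(student, other_family))  # two mismatches: 'Final' and '42'
```

Extra tokens that exist only on the teacher side are ignored here, because only the student's sampled token IDs are scored.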

Example scripts

The cookbook includes two distillation examples:
| Example | Path | Description |
| --- | --- | --- |
| Privileged-context OPD | training/examples/distillation/gsm8k_privileged | Student sees the problem; teacher can see privileged solution context. |
| Routed MOPD smoke | training/examples/distillation/routed_mopd/train_two_teacher_lora.py | Tiny generated dataset with two route labels and a Qwen3.5 35B-A3B LoRA student. |
Run the routed smoke example from the cookbook repository:
cd training
FIREWORKS_API_KEY=... \
python examples/distillation/routed_mopd/train_two_teacher_lora.py
The smoke example writes a small JSONL dataset into the run log directory. It is intended to show the required row shape; production runs should provide their own JSONL dataset with the same route-key contract.

Operational notes

  • teacher_replica_count controls replicas for auto-created frozen teacher deployments.
  • teacher_deployment_shape sets the default teacher deployment shape. Individual TeacherConfig.deployment_shape values can override it.
  • Per-teacher metrics such as teacher_route/<slug>/scored and teacher_route/<slug>/inflight are logged so route skew and idle teachers are visible.
  • The adaptive concurrency controller watches the student deployment. If one route dominates the dataset, some teacher deployments may be underused.
  • DISTILLATION_TEACHERS and DISTILLATION_TEACHER_ROUTE_KEY can configure routed teachers for the recipe’s __main__ entrypoint. Legacy OPD_TEACHERS and OPD_TEACHER_ROUTE_KEY names are accepted as fallbacks.
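Route skew can also be estimated from the dataset before a run starts. The sketch below counts route labels and reports each label's share of the rows; `route_shares` is a hypothetical helper, not part of the recipe, and it is independent of the logged teacher_route metrics.

```python
from collections import Counter
from typing import Dict, Iterable


def route_shares(rows: Iterable[dict], route_key: str = "teacher") -> Dict[str, float]:
    """Fraction of rows per route label; rows without the key are counted as '<missing>'."""
    counts = Counter(row.get(route_key, "<missing>") for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical dataset where one route dominates.
rows = [
    {"teacher": "math-teacher"},
    {"teacher": "math-teacher"},
    {"teacher": "math-teacher"},
    {"teacher": "arithmetic-teacher"},
]
print(route_shares(rows))  # {'math-teacher': 0.75, 'arithmetic-teacher': 0.25}
```

A heavily lopsided result suggests that some teacher deployments will sit idle for much of the run.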

Next steps