# Cookbook: Distillation

> Single-teacher on-policy distillation (OPD) and routed multi-teacher policy distillation (MOPD) with cookbook recipes.

## What this is

The cookbook's `training.recipes.distillation_loop` trains one student on its own rollouts while one or more frozen teachers score those exact sampled tokens. The dense training signal is the per-token logprob gap between the selected teacher and the sampling student:

```text theme={null}
teacher_logprob - sampling_logprob
```

The recipe feeds that signal into the Training API's built-in `importance_sampling` loss. This is useful when you want on-policy distillation with token-level feedback instead of offline SFT traces or final-answer-only rewards.
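As a rough illustration of the signal described above, the sketch below computes the per-token gap from two aligned logprob lists. The function name `per_token_gap` and the sample numbers are hypothetical; the recipe computes this internally and its actual code may differ.

```python
from typing import Sequence


def per_token_gap(
    teacher_logprobs: Sequence[float],
    sampling_logprobs: Sequence[float],
) -> list[float]:
    """Return teacher_logprob - sampling_logprob for each sampled token."""
    if len(teacher_logprobs) != len(sampling_logprobs):
        raise ValueError("teacher and student must score the same sampled tokens")
    return [t - s for t, s in zip(teacher_logprobs, sampling_logprobs)]


# Illustrative values for a three-token completion (not real model output).
teacher = [-0.25, -1.5, -0.5]
student = [-0.5, -1.0, -0.5]
gaps = per_token_gap(teacher, student)
print(gaps)  # [0.25, -0.5, 0.0]
```

A positive gap means the teacher assigns a higher probability to the sampled token than the student did when sampling it; a negative gap means a lower one.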

## Single-teacher distillation

Use `teacher_model` when every prompt should be scored by the same teacher:

```python theme={null}
from training.recipes.distillation_loop import Config, main
from training.utils import DeployConfig, TrainerConfig

cfg = Config(
    log_path="./distillation_logs",
    # Student model that is trained on its own rollouts.
    base_model="accounts/fireworks/models/qwen3-8b",
    # Frozen teacher that scores the student's sampled tokens.
    teacher_model="accounts/fireworks/models/qwen3-32b",
    # JSONL file of prompt rows (see "Dataset format").
    dataset="/path/to/prompts.jsonl",
    trainer=TrainerConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
    deployment=DeployConfig(tokenizer_model="Qwen/Qwen3-8B"),
    max_rows=100,
    epochs=1,
)

main(cfg)
```

If `teacher_model` is a base model, the recipe creates a frozen teacher deployment for scoring. If it is already an inference model or deployment resource, the recipe uses it directly.

## Routed multi-teacher distillation

Use `multi_teacher` when different prompts should be scored by different teachers. This is routed MOPD: each prompt is scored by exactly one teacher, selected by a string value in the dataset row.

```python theme={null}
from training.recipes.distillation_loop import Config, main
from training.utils import DeployConfig, TrainerConfig
from training.utils.distillation import MultiTeacherConfig, TeacherConfig

cfg = Config(
    log_path="./mopd_logs",
    base_model="accounts/fireworks/models/qwen3p5-35b-a3b",
    teacher_model="",
    dataset="/path/to/routed_prompts.jsonl",
    multi_teacher=MultiTeacherConfig(
        route_key="teacher",
        teachers=[
            TeacherConfig(
                model="accounts/fireworks/models/qwen3p5-35b-a3b",
                route_value="math-teacher",
                tokenizer_model="Qwen/Qwen3.5-35B-A3B",
            ),
            TeacherConfig(
                model="accounts/fireworks/models/qwen3p5-35b-a3b",
                route_value="arithmetic-teacher",
                tokenizer_model="Qwen/Qwen3.5-35B-A3B",
            ),
        ],
    ),
    trainer=TrainerConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3p5-35b-a3b-256k-lora",
    ),
    deployment=DeployConfig(tokenizer_model="Qwen/Qwen3.5-35B-A3B"),
    lora_rank=8,
    prompt_groups_per_step=2,
    completions_per_prompt=1,
)

main(cfg)
```

<Note>
  Routed MOPD does not currently blend teachers. The recipe does not average teacher probabilities, average teacher logits, or run multiple teachers for the same prompt. It routes each row to exactly one configured teacher.
</Note>
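To make the one-teacher-per-row behavior concrete, here is a minimal sketch of exact-match routing. The `Teacher` dataclass and `select_teacher` function are hypothetical stand-ins, not the cookbook's classes, and raising on an unknown route value is an assumption about how such a row could be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Teacher:
    """Hypothetical stand-in for the cookbook's TeacherConfig."""

    model: str
    route_value: Optional[str] = None

    @property
    def label(self) -> str:
        # Rows match `route_value`, or `model` when `route_value` is unset.
        return self.route_value if self.route_value is not None else self.model


def select_teacher(row: dict, teachers: list[Teacher], route_key: str = "teacher") -> Teacher:
    """Pick exactly one teacher for a row by exact string match on the route key."""
    by_label = {t.label: t for t in teachers}
    value = row.get(route_key)
    if not isinstance(value, str) or value not in by_label:
        raise KeyError(f"row has {route_key}={value!r}; expected one of {sorted(by_label)}")
    return by_label[value]


teachers = [
    Teacher("accounts/fireworks/models/qwen3p5-35b-a3b", route_value="math-teacher"),
    Teacher("accounts/fireworks/models/qwen3p5-35b-a3b", route_value="arithmetic-teacher"),
]
row = {"messages": [{"role": "user", "content": "Solve 6 * 7."}], "teacher": "math-teacher"}
chosen = select_teacher(row, teachers)
print(chosen.route_value)  # math-teacher
```

Because each row resolves to a single teacher, no step in this sketch combines scores from several teachers.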

## Dataset format

The distillation recipe reads JSONL rows. For routed MOPD, the dataset must include the route key you configure on `MultiTeacherConfig.route_key`. The default route key is `teacher`.

Required fields (`teacher` is required only for routed MOPD):

| Field      | Type         | Description                                                                                                                                                                         |
| ---------- | ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `messages` | `list[dict]` | Student-visible OpenAI-style chat messages. The student samples from this prompt.                                                                                                   |
| `teacher`  | `str`        | Default route key for routed MOPD. The value must exactly match one configured `TeacherConfig.route_value`. If `route_value` is unset, the value must match that teacher's `model`. |

Optional fields:

| Field              | Type         | Description                                                                                                         |
| ------------------ | ------------ | ------------------------------------------------------------------------------------------------------------------- |
| `teacher_messages` | `list[dict]` | Teacher-side prompt used for scoring. If omitted, the selected teacher scores the student rollout under `messages`. |
| `expected_answer`  | `str`        | Optional answer metadata for eval callbacks and smoke checks.                                                       |
| `extra_info`       | `dict`       | Optional user metadata. The recipe does not require a specific shape.                                               |

Example single-teacher row:

```json theme={null}
{
  "messages": [
    {"role": "user", "content": "Solve 6 * 7. End with exactly one line: Final: <answer>."}
  ],
  "expected_answer": "42"
}
```

Example routed MOPD rows:

```json theme={null}
{"messages":[{"role":"user","content":"Solve 6 * 7. End with Final: <answer>."}],"teacher":"math-teacher","expected_answer":"42"}
{"messages":[{"role":"user","content":"Solve 18 + 24. End with Final: <answer>."}],"teacher":"arithmetic-teacher","expected_answer":"42"}
```
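Before launching a run, it can help to check a routed dataset against the row shape described above. The following is a sketch of such a preflight check, not part of the cookbook; `validate_rows` and its error messages are hypothetical.

```python
import json


def validate_rows(lines, route_values, route_key="teacher"):
    """Return (line_number, message) pairs for rows that break the routed row shape."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "row must be a JSON object"))
            continue
        messages = row.get("messages")
        if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
            problems.append((number, "`messages` must be a list of dicts"))
        value = row.get(route_key)
        if not isinstance(value, str) or value not in route_values:
            problems.append((number, f"`{route_key}` must be one of {sorted(route_values)}"))
    return problems


sample = [
    '{"messages":[{"role":"user","content":"Solve 6 * 7."}],"teacher":"math-teacher"}',
    '{"messages":[{"role":"user","content":"Solve 18 + 24."}],"teacher":"algebra-teacher"}',
]
problems = validate_rows(sample, {"math-teacher", "arithmetic-teacher"})
print(problems)
```

In this sample, only the second row is reported, because `algebra-teacher` does not match a configured route value.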

Example with a privileged teacher prompt:

```json theme={null}
{
  "messages": [
    {"role": "user", "content": "Solve 6 * 7. End with exactly one line: Final: <answer>."}
  ],
  "teacher": "math-teacher",
  "teacher_messages": [
    {"role": "user", "content": "Solve 6 * 7. The correct answer is 42. Explain briefly, then end with Final: 42."}
  ],
  "expected_answer": "42"
}
```

If a teacher uses a custom `TeacherConfig.teacher_messages_key`, rows routed to that teacher should provide that key instead of `teacher_messages`.
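The prompt selection described in this section can be sketched as follows. `teacher_prompt` is a hypothetical helper, and falling back to `messages` when a custom key is absent is an assumption that mirrors the documented behavior for an omitted `teacher_messages`.

```python
def teacher_prompt(row: dict, teacher_messages_key: str = "teacher_messages") -> list[dict]:
    """Return the teacher-side prompt, falling back to the student-visible `messages`."""
    privileged = row.get(teacher_messages_key)
    return privileged if privileged else row["messages"]


plain_row = {"messages": [{"role": "user", "content": "Solve 6 * 7."}]}
privileged_row = {
    "messages": [{"role": "user", "content": "Solve 6 * 7."}],
    "teacher_messages": [{"role": "user", "content": "Solve 6 * 7. The correct answer is 42."}],
}

print(teacher_prompt(plain_row) == plain_row["messages"])  # True
print(teacher_prompt(privileged_row) == privileged_row["teacher_messages"])  # True
```

The student always samples from `messages`; only the prompt under which the teacher scores the rollout changes.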

## Tokenizer compatibility

Sampled-token distillation scores the student's sampled token IDs under the teacher. The student and teacher must therefore share a compatible tokenizer and vocabulary. Prefer teachers from the same model family as the student, and set `TeacherConfig.tokenizer_model` when you want the recipe to validate the teacher tokenizer against `DeployConfig.tokenizer_model`.
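One way to see why a shared vocabulary matters is to compare token-to-ID mappings directly. The sketch below works on plain dictionaries with toy values; `vocab_mismatches` is a hypothetical helper and does not represent how the recipe performs its own validation. With Hugging Face tokenizers, a mapping of this shape can typically be obtained from `tokenizer.get_vocab()`.

```python
def vocab_mismatches(student_vocab: dict[str, int], teacher_vocab: dict[str, int]) -> list[str]:
    """List tokens whose IDs differ or that exist in only one vocabulary."""
    tokens = set(student_vocab) | set(teacher_vocab)
    return sorted(
        tok for tok in tokens
        if student_vocab.get(tok) != teacher_vocab.get(tok)
    )


# Toy vocabularies for illustration only.
student_vocab = {"<s>": 0, "hello": 1, "world": 2}
teacher_vocab = {"<s>": 0, "hello": 1, "world": 3, "!": 4}

mismatches = vocab_mismatches(student_vocab, teacher_vocab)
print(mismatches)  # ['!', 'world']
```

Any mismatch means a sampled token ID could refer to a different token under the teacher, so the teacher's logprob for that ID would not be comparable to the student's.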

## Example scripts

The cookbook includes two distillation examples:

| Example                | Path                                                                   | Description                                                                      |
| ---------------------- | ---------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| Privileged-context OPD | `training/examples/distillation/gsm8k_privileged`                      | Student sees the problem; teacher can see privileged solution context.           |
| Routed MOPD smoke      | `training/examples/distillation/routed_mopd/train_two_teacher_lora.py` | Tiny generated dataset with two route labels and a Qwen3.5 35B-A3B LoRA student. |

Run the routed smoke example from the cookbook repository:

```bash theme={null}
cd training
FIREWORKS_API_KEY=... \
python examples/distillation/routed_mopd/train_two_teacher_lora.py
```

The smoke example writes a small JSONL dataset into the run log directory. It is intended to show the required row shape; production runs should provide their own JSONL dataset with the same route-key contract.

## Operational notes

* `teacher_replica_count` controls replicas for auto-created frozen teacher deployments.
* `teacher_deployment_shape` sets the default teacher deployment shape. Individual `TeacherConfig.deployment_shape` values can override it.
* Per-teacher metrics such as `teacher_route/<slug>/scored` and `teacher_route/<slug>/inflight` are logged so route skew and idle teachers are visible.
* The adaptive concurrency controller watches the student deployment. If one route dominates the dataset, some teacher deployments may be underused.
* `DISTILLATION_TEACHERS` and `DISTILLATION_TEACHER_ROUTE_KEY` can configure routed teachers for the recipe's `__main__` entrypoint. Legacy `OPD_TEACHERS` and `OPD_TEACHER_ROUTE_KEY` names are accepted as fallbacks.
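
Route skew can also be estimated from the dataset before a run starts. The sketch below counts route values with the standard library; `route_shares` is a hypothetical helper and is independent of the `teacher_route/<slug>/...` metrics that the recipe logs.

```python
from collections import Counter


def route_shares(rows, route_key="teacher"):
    """Return the fraction of rows assigned to each route value."""
    counts = Counter(row[route_key] for row in rows)
    total = sum(counts.values())
    return {route: n / total for route, n in counts.items()}


rows = [
    {"teacher": "math-teacher"},
    {"teacher": "math-teacher"},
    {"teacher": "math-teacher"},
    {"teacher": "arithmetic-teacher"},
]
shares = route_shares(rows)
print(shares)  # {'math-teacher': 0.75, 'arithmetic-teacher': 0.25}
```

A heavily lopsided split like this one suggests that the teacher serving the smaller route may sit idle for much of the run.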

## Next steps

* [Cookbook Reference](/fine-tuning/training-api/cookbook/reference) - config classes and common recipe fields
* [Loss Functions](/fine-tuning/training-api/loss-functions) - built-in and custom Training API losses
* [Weight sync](/fine-tuning/training-api/cookbook/weight-sync) - how updated weights reach serving deployments
