Fireworks Agent can write a task-specific evaluator from your dataset alone. Two flavors:
- SFT evaluators: a Python evaluator (evaluator.py) plus a spec (eval_spec.md) that Agent uses to score candidates during a subsequent SFT sweep in the same session.
- RFT evaluators: an Eval Protocol @evaluation_test evaluator ready to drive a Reinforcement Fine-Tuning job.
SFT evaluators
What you get
Agent generates two artifacts in the session workspace:
- outputs/eval_spec.md: a human-readable spec describing what the evaluator checks (the contract: what counts as correct, how partial credit works, edge cases).
- outputs/evaluator.py: a Python evaluator that takes a model’s outputs and the dataset’s ground truth and returns scores.
Agent surfaces the eval_spec.md and evaluator.py contents in chat so you can review them before they’re used downstream.
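To make the shape of such an evaluator concrete, here is a minimal sketch of a scorer that takes model outputs and ground truth and returns scores. It is not the file Agent generates: the function names (normalize, score_example, evaluate), the exact-match rule, and the token-overlap partial credit are all assumptions chosen for illustration. The real contract is whatever eval_spec.md states for your task.

```python
from typing import Dict, List, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is not scored as an error."""
    return " ".join(text.lower().split())


def score_example(prediction: str, ground_truth: str) -> float:
    """Hypothetical rule: 1.0 for a normalized exact match, else capped token overlap."""
    pred, truth = normalize(prediction), normalize(ground_truth)
    if pred == truth:
        return 1.0
    pred_tokens, truth_tokens = set(pred.split()), set(truth.split())
    if not pred_tokens or not truth_tokens:
        return 0.0  # edge case: an empty answer earns no partial credit
    jaccard = len(pred_tokens & truth_tokens) / len(pred_tokens | truth_tokens)
    # Cap below 1.0 so only exact matches earn full credit.
    return min(jaccard, 0.99)


def evaluate(predictions: Sequence[str], ground_truths: Sequence[str]) -> Dict[str, object]:
    """Score every (prediction, ground truth) pair and report the mean."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    scores: List[float] = [
        score_example(p, t) for p, t in zip(predictions, ground_truths)
    ]
    mean = sum(scores) / len(scores) if scores else 0.0
    return {"n": len(scores), "mean_score": mean, "scores": scores}
```

When reviewing the artifacts Agent surfaces, a useful check is whether the spec’s statements about correctness, partial credit, and edge cases each map to an identifiable branch in the code, as the three branches of score_example do here.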
Example session instructions
Author an evaluator only:
Where evaluator authoring lives in the 7-phase pipeline: When evaluator authoring runs as a standalone session, phases 3–7 of the standard pipeline don’t apply; the session writes outputs/evaluator.py and outputs/eval_spec.md and stops. When you chain authoring into SFT in the same session, those artifacts feed phase 5 (Evaluation) of the follow-on training pipeline; they are used to score candidates during phase 3 and again for direct evaluation of the final model. (RFT evaluators are saved to your Fireworks account and then used by Managed Fine-Tuning’s RFT path, not by Agent.) See How Agent runs a training job.
When SFT follows in the same session, Agent reuses outputs/evaluator.py and outputs/eval_spec.md without re-authoring them, and reuses the staged dataset paths so the dataset is downloaded only once.
Multi-turn handoff
If you want fine-grained control of the handoff, structure your two instructions like this:
RFT evaluators
Agent authors RFT evaluators but does not run RFT training. This workflow produces and validates the Eval Protocol evaluator file, then registers it with your Fireworks account. The actual RFT training job runs through Managed Fine-Tuning’s RFT path — not from an Agent session.
What you get
An Eval Protocol @evaluation_test evaluator file, validated end-to-end, ready to drop into a Reinforcement Fine-Tuning job. The plan includes the concrete evaluator code, validation commands, and the command to save the evaluator to Fireworks.
This is purpose-built for tasks where you can score model outputs against reference data — math problems, code generation, structured-output extraction, agentic workflows with verifiable side effects.
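As an illustration of scoring against reference data, the sketch below shows the kind of verifiable scoring logic a math-problem evaluator might contain. It is plain Python and deliberately does not import Eval Protocol or use the @evaluation_test decorator, whose signature is not shown on this page. The names extract_final_number and math_reward, and the last-number-in-the-text heuristic, are assumptions for illustration only.

```python
import re
from typing import Optional

# Matches integers and simple decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number in the text, treating it as the final answer."""
    # Drop thousands separators so "1,234" parses as 1234.
    matches = _NUMBER.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def math_reward(completion: str, reference: str, tol: float = 1e-6) -> float:
    """Hypothetical binary reward: 1.0 if the final numbers agree within tol."""
    predicted = extract_final_number(completion)
    expected = extract_final_number(reference)
    if predicted is None or expected is None:
        return 0.0  # nothing verifiable to compare
    return 1.0 if abs(predicted - expected) <= tol else 0.0
```

A binary reward like this is easy to verify but coarse. Whether partial credit is appropriate depends on the scoring intent you give Agent.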
Example session instruction
Handing off to RFT training
Once the evaluator is saved, run the RFT job through Managed Fine-Tuning; see the Reinforcement Fine-Tuning Overview and Evaluators concepts. For example:
Workflow summary
Dataset inspection
Agent stages the dataset locally, samples records, and infers the evaluator contract from data plus your scoring intent. Agent will not finalize an evaluator without successfully staging readable data.
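The inspection step can be pictured roughly as follows. This is a sketch under stated assumptions, not Agent’s implementation: it assumes the staged dataset is JSONL with one JSON object per line, and the helper names sample_records and field_coverage are hypothetical. It samples a few records, reports how often each top-level field appears, and refuses to proceed when no readable records are found.

```python
import json
import random
from collections import Counter
from typing import Dict, Iterable, List


def sample_records(lines: Iterable[str], k: int = 5, seed: int = 0) -> List[dict]:
    """Parse JSONL lines and return up to k randomly sampled records."""
    records = [json.loads(line) for line in lines if line.strip()]
    records = [r for r in records if isinstance(r, dict)]
    if not records:
        # Mirrors the rule that no evaluator is finalized without readable data.
        raise ValueError("no readable records; refusing to infer a contract")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(records, min(k, len(records)))


def field_coverage(records: List[dict]) -> Dict[str, float]:
    """Fraction of sampled records in which each top-level field is present."""
    counts = Counter(key for record in records for key in record)
    return {key: n / len(records) for key, n in counts.items()}
```

A field that appears in only some records, such as an optional ground-truth answer, is the kind of edge case worth naming explicitly in your scoring intent.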
Spec and code generation
For SFT, Agent writes both eval_spec.md (the contract) and evaluator.py (the implementation) and self-checks that both are non-empty before finishing. For RFT, Agent writes a single Eval Protocol @evaluation_test file and self-checks that it’s non-empty and that validation succeeds.
Review and approval
Agent surfaces the artifacts inline in chat. For RFT, Agent also presents a plan with validation and save commands and asks for one approval.
Hand off (optional)
If your instruction asks for downstream SFT, Agent continues into the SFT workflow in the same session and reuses the just-authored evaluator — no re-downloading, no re-authoring. RFT training itself runs through Managed Fine-Tuning, not from an Agent session.
When to use which
| Use case | Workflow |
|---|---|
| You want an evaluator Agent can use to score candidates during an SFT sweep, with optional auto-continue into SFT | SFT evaluator authoring (run end-to-end by Agent) |
| You want an Eval Protocol evaluator to drive an RFT job | RFT evaluator authoring (Agent writes and saves the evaluator; RFT training runs through Managed Fine-Tuning) |
| You don’t have a clear notion of “correct” yet | Start with validation-loss-only SFT (see Agent SFT) and add an evaluator later |
Agent crib notes
- Required input: dataset ID. Agent also wants your scoring intent in plain English — “check whether the answer matches ground truth”, “verify the JSON has the right schema”, etc.
- For SFT evaluators, ask for both authoring and SFT in the same instruction to get same-session evaluator reuse for free.
- For RFT evaluators, expect a plan + cost approval before the evaluator is saved to your Fireworks account. The Agent session ends after the evaluator is saved. Hand off to Managed Fine-Tuning’s RFT path to run the actual RFT training job.
- Agent surfaces the generated eval_spec.md and evaluator.py inline in chat after authoring; relay them to the user.
- All evaluator artifacts live under outputs/ in the session workspace and can be inspected via firectl session get <id> if needed.
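If you keep a local copy of a session’s outputs/ directory, a small check like the one below can confirm that both SFT artifacts are present and non-empty, mirroring the non-empty self-check described in the workflow summary. This is a sketch only: the local-copy setup and the helper name missing_or_empty are assumptions, and it does not call firectl or any Fireworks API.

```python
from pathlib import Path
from typing import List, Union

# The two SFT artifact names used throughout this page.
REQUIRED_ARTIFACTS = ("eval_spec.md", "evaluator.py")


def missing_or_empty(outputs_dir: Union[str, Path]) -> List[str]:
    """Return a list of problems; an empty list means both artifacts look usable."""
    problems = []
    for name in REQUIRED_ARTIFACTS:
        path = Path(outputs_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif not path.read_text(encoding="utf-8").strip():
            problems.append(f"{name}: empty")
    return problems
```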