What this is

Use this as the default lifecycle: bootstrap a trainer, run iterative updates, export checkpoints, then sample through deployment endpoints for realistic evaluation.

How to use these APIs

  • tinker.ServiceClient: Connect your local loop to the trainer service.
  • TrainingClient.forward_backward / forward_backward_custom: Compute gradients with built-in or custom objectives.
  • TrainingClient.optim_step: Apply optimization updates.
  • TrainingClient.save_weights_for_sampler: Export serving-compatible checkpoints.
  • Fireworks.deployments.*: Hotload and evaluate checkpoints under serving conditions.

Workflow

  1. Create or reuse a service-mode trainer job.
  2. Instantiate a training client from the trainer endpoint.
  3. Run repeated train steps with objective + optimizer updates.
  4. Save sampler checkpoints on cadence.
  5. Hotload the serving deployment and sample evaluation prompts.
  6. Record metrics and decide whether to continue or branch experiments.

End-to-end examples

Bootstrap training client

import tinker

# "fw" is assumed to be an authenticated Fireworks SDK client constructed elsewhere.
# serviceMode=True starts the trainer as a long-lived service; lora_rank=0 selects
# full-parameter tuning, which is what service-mode trainer jobs currently support.
trainer_job = fw.reinforcement_fine_tuning_steps.create(
    training_config={
        "base_model": "accounts/fireworks/models/qwen3-8b",
        "lora_rank": 0,
        "max_context_length": 4096,
    },
    extra_body={"serviceMode": True, "keepAlive": False},
)

# Point the Tinker client at the trainer's direct route, then create the training client.
# rank=0 must match training_config.lora_rank above.
service = tinker.ServiceClient(base_url=trainer_job.direct_route_handle, api_key="<FIREWORKS_API_KEY>")
training_client = service.create_lora_training_client(base_model="accounts/fireworks/models/qwen3-8b", rank=0)

Run train steps with a custom objective

# Custom objective for forward_backward_custom: receives the batch and the per-datum
# log-probabilities computed by the trainer and returns (loss, metrics).
# compute_objective, build_batch, and total_steps are user-defined.
def objective(data, logprobs_list):
    loss = compute_objective(data=data, logprobs_list=logprobs_list)
    return loss, {"loss": float(loss.item())}

for step in range(total_steps):
    batch = build_batch(step)
    # One gradient computation followed by one optimizer update per step.
    training_client.forward_backward_custom(batch, objective).result()
    training_client.optim_step(
        tinker.AdamParams(learning_rate=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
    ).result()
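
One possible shape for compute_objective in a reinforcement-style loop is sketched below. It assumes each element of data carries per-token advantages in loss_fn_inputs["advantages"], aligned with the per-token log-probabilities in logprobs_list; those field names are assumptions for illustration, not a guaranteed Tinker schema.

import torch

def compute_objective(data, logprobs_list):
    # Advantage-weighted log-probability loss: increase the probability of tokens
    # with positive advantage and decrease it for tokens with negative advantage.
    losses = []
    for datum, logprobs in zip(data, logprobs_list):
        advantages = datum.loss_fn_inputs["advantages"]  # assumed per-token tensor
        losses.append(-(advantages * logprobs).sum())
    return torch.stack(losses).mean()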

Checkpoint, hotload, and evaluate

# Export a serving-compatible checkpoint named after the current step.
checkpoint = training_client.save_weights_for_sampler(f"step_{step:05d}").result()

# wait_for_hotload_ready, hotload_deployment, sample_with_deployment, and evaluate_responses
# are user-defined wrappers around the Fireworks deployments API. Wait until the serving
# deployment can accept a hotload, push the checkpoint, then sample evaluation prompts
# through the deployment endpoint rather than the trainer itself.
wait_for_hotload_ready(deployment_id="research-loop-serving")
hotload_deployment(checkpoint.path)
responses = sample_with_deployment(prompts=eval_prompts)
score = evaluate_responses(responses)
print({"step": step, "eval_score": score})
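
Putting the pieces together, a minimal sketch of running the checkpoint, hotload, and evaluation block on a fixed cadence inside the training loop, assuming the objective, build_batch, and deployment helpers defined above (checkpoint_every is an illustrative knob):

checkpoint_every = 100  # fixed cadence keeps evaluation comparisons stable

for step in range(total_steps):
    batch = build_batch(step)
    training_client.forward_backward_custom(batch, objective).result()
    training_client.optim_step(
        tinker.AdamParams(learning_rate=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
    ).result()

    if (step + 1) % checkpoint_every == 0:
        checkpoint = training_client.save_weights_for_sampler(f"step_{step:05d}").result()
        wait_for_hotload_ready(deployment_id="research-loop-serving")
        hotload_deployment(checkpoint.path)
        responses = sample_with_deployment(prompts=eval_prompts)
        print({"step": step, "eval_score": evaluate_responses(responses)})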

Operational guidance

  • Service-mode trainer jobs currently support full-parameter tuning only. When serviceMode=true, set training_config.lora_rank and the Tinker client rank to 0; a consistency check is sketched after this list.
  • Keep checkpoint intervals predictable so evaluation comparisons are stable.
  • Store the exact prompt set used for each evaluation sweep.
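
A minimal sketch of that rank-consistency check, assuming the training_config dict and rank argument from the bootstrap example above; the check_service_mode_config helper and use_service_mode flag are illustrative, not part of either SDK:

def check_service_mode_config(training_config, rank, use_service_mode):
    # Service-mode trainers are full-parameter only, so both rank settings must be 0.
    if use_service_mode:
        assert training_config["lora_rank"] == 0, "serviceMode requires training_config.lora_rank == 0"
        assert rank == 0, "serviceMode requires the Tinker client rank to be 0"

check_service_mode_config(
    training_config={"base_model": "accounts/fireworks/models/qwen3-8b", "lora_rank": 0},
    rank=0,
    use_service_mode=True,
)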

Common pitfalls

  • Sampling from trainer internals instead of deployment endpoints can skew results.
  • Missing checkpoint-to-deployment traceability makes rollback risky; record which checkpoint each deployment is serving, as in the sketch after this list.
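
A minimal sketch of such a record, assuming the step, checkpoint, deployment, and score values from the evaluation example above; the log_eval_record helper and eval_log.jsonl filename are illustrative:

import json
import time

def log_eval_record(path, step, checkpoint_path, deployment_id, eval_score):
    # Append one JSON line per evaluation so every deployment state maps back to a checkpoint.
    record = {
        "timestamp": time.time(),
        "step": step,
        "checkpoint_path": checkpoint_path,
        "deployment_id": deployment_id,
        "eval_score": eval_score,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_eval_record("eval_log.jsonl", step=step, checkpoint_path=checkpoint.path,
                deployment_id="research-loop-serving", eval_score=score)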