
What this is

The Fireworks Training SDK gives teams a flexibility ladder from managed jobs to fully custom training loops. Start with managed jobs for standard objectives, then move to Tinker-compatible loops when you need custom losses, full-parameter updates, and tighter experiment control. Full-parameter RFT is currently in private preview and is only supported through Training SDK workflows.

Why this approach

  • Differentiation: Fireworks combines managed training products and custom loop primitives on one platform, so teams can move from baseline to frontier research without rebuilding infrastructure.
  • When to use managed jobs: choose them for standard objectives and faster productionization with less custom training code.
  • When to use Training SDK loops: choose them when you need custom objectives, full-parameter tuning, or algorithm research beyond built-in managed objectives.
  • Production inference in the loop: evaluating checkpoints through the same serving stack you run in production keeps offline progress aligned with real serving behavior.

System architecture

A control-plane API provisions trainer and deployment resources. Your local loop connects to the trainer service, runs custom train steps, and periodically exports checkpoints to serving for realistic sampling and evaluation.

How to use these APIs

  • Fireworks.reinforcement_fine_tuning_steps.create: Create a trainer endpoint for custom loops.
  • tinker.ServiceClient.create_lora_training_client: Attach a trainable client to the trainer service.
  • TrainingClient.forward_backward_custom + TrainingClient.optim_step: Apply custom losses and update model weights.
  • TrainingClient.save_weights_for_sampler + Fireworks.deployments: Publish checkpoints and evaluate via serving.

Workflow

  1. Choose your mode: managed jobs for standard objectives, or service-mode SDK loops for custom objectives.
  2. If you choose a custom loop, create trainer and deployment resources from the Fireworks SDK.
  3. Connect a training client from your Python loop.
  4. Build batches and compute custom objectives in your code.
  5. Run iterative updates with optimizer and metrics logging.
  6. Checkpoint, hotload the deployment, and evaluate sampled behavior (steps 3-6 are combined in the loop sketch below).
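
Put together, a single custom loop looks roughly like the minimal sketch below. It assumes the trainer job from the bootstrap example in the next section already exists, and build_batch, compute_custom_objective, and hotload_deployment stand in for your own code rather than SDK functions.

import tinker

# Minimal loop sketch (steps 3-6). Assumes `trainer_job` was created in service
# mode as in the bootstrap example below; `build_batch`, `compute_custom_objective`,
# and `hotload_deployment` are placeholders for your own code, not SDK calls.
service = tinker.ServiceClient(base_url=trainer_job.direct_route_handle, api_key="<FIREWORKS_API_KEY>")
training_client = service.create_lora_training_client(base_model="accounts/fireworks/models/qwen3-8b", rank=0)

def objective(data, logprobs_list):
    loss = compute_custom_objective(logprobs_list)  # placeholder: your custom loss
    return loss, {"loss": float(loss.item())}

num_steps, checkpoint_every = 100, 10  # illustrative values
for step in range(num_steps):
    batch = build_batch(step)  # placeholder: your data pipeline
    training_client.forward_backward_custom(batch, objective).result()
    training_client.optim_step(tinker.AdamParams(learning_rate=1e-5)).result()
    if (step + 1) % checkpoint_every == 0:
        checkpoint = training_client.save_weights_for_sampler(f"step_{step + 1:04d}").result()
        hotload_deployment(checkpoint.path)  # placeholder: point serving at the new checkpoint, then evaluate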

End-to-end examples

Bootstrap trainer and deployment

from fireworks import Fireworks
import tinker

fw = Fireworks(api_key="<FIREWORKS_API_KEY>", account_id="<ACCOUNT_ID>")
trainer_job = fw.reinforcement_fine_tuning_steps.create(
    training_config={
        "base_model": "accounts/fireworks/models/qwen3-8b",
        "lora_rank": 0,  # 0 selects full-parameter training (required in service mode)
        "max_context_length": 4096,
        "learning_rate": 1e-5,
        "gradient_accumulation_steps": 4,
    },
    extra_body={"serviceMode": True, "keepAlive": False},  # service mode exposes a trainer endpoint for custom loops
)
# Connect the Tinker-compatible training client to the trainer's direct route.
service = tinker.ServiceClient(base_url=trainer_job.direct_route_handle, api_key="<FIREWORKS_API_KEY>")
training_client = service.create_lora_training_client(base_model="accounts/fireworks/models/qwen3-8b", rank=0)  # rank=0 matches lora_rank above

# Hot-loadable serving deployment used to sample from and evaluate exported checkpoints.
deployment = fw.deployments.create(
    deployment_id="research-loop-serving",
    base_model="accounts/fireworks/models/qwen3-8b",
    min_replica_count=0,
    max_replica_count=1,
    enable_hot_load=True,
)

One custom update iteration

def objective(data, logprobs_list):
    # Receives the batch and per-example token logprobs; must return (loss, metrics).
    loss = compute_custom_objective(logprobs_list)  # your custom loss (see the sketch below)
    metrics = {"loss": float(loss.item())}
    return loss, metrics

training_client.forward_backward_custom(batch, objective).result()  # `batch` is built by your data pipeline
training_client.optim_step(  # apply the accumulated gradients
    tinker.AdamParams(
        learning_rate=1e-5,
        beta1=0.9,
        beta2=0.999,
        eps=1e-8,
        weight_decay=0.01,
    )
).result()

checkpoint = training_client.save_weights_for_sampler("step_0010").result()
hotload_deployment(checkpoint.path)  # your helper: point the serving deployment at the new checkpoint
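
The snippet above leaves compute_custom_objective and hotload_deployment to you. As one illustration only, the sketch below assumes each entry of logprobs_list is a per-token logprob tensor for one example and minimizes the mean negative log-likelihood over the batch; any differentiable scalar objective can take its place.

import torch

def compute_custom_objective(logprobs_list):
    # Illustrative assumption: each element is a 1-D torch tensor of per-token
    # logprobs for one example. Computes mean negative log-likelihood across the
    # batch; swap in advantage-weighted, preference, or regularized objectives.
    per_example_nll = torch.stack([-lp.sum() for lp in logprobs_list])
    return per_example_nll.mean()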

Operational guidance

  • Service-mode trainer jobs currently support full-parameter tuning only. When serviceMode=true, set training_config.lora_rank and Tinker client rank to 0.
  • Managed objective coverage today includes SFT managed jobs, DPO managed jobs, and managed reinforcement fine-tuning jobs.
  • Local starter scripts for custom-loop implementations are available for GRPO, off-policy GRPO, and DPO in the Python examples repository.
  • SFT is currently documented as a managed-job path (sft-example) rather than as a custom-loss local-loop starter.
  • Record train-step metrics, checkpoint IDs, and deployment revisions together.
  • Use fixed eval sets to compare policies across checkpoints (a minimal bookkeeping sketch follows this list).
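
A minimal bookkeeping sketch for the last two points: score a fixed eval set against the current deployment and write one record per checkpoint. sample_from_deployment and score are placeholders for your own inference call and task metric.

import json
import time

def evaluate_checkpoint(checkpoint_id, deployment_id, eval_prompts):
    # Sample the same fixed prompts after every checkpoint so scores stay comparable.
    completions = [sample_from_deployment(deployment_id, p) for p in eval_prompts]  # placeholder
    scores = [score(p, c) for p, c in zip(eval_prompts, completions)]               # placeholder
    record = {
        "timestamp": time.time(),
        "checkpoint_id": checkpoint_id,   # e.g. "step_0010"
        "deployment_id": deployment_id,   # e.g. "research-loop-serving"
        "mean_score": sum(scores) / len(scores),
    }
    with open("eval_log.jsonl", "a") as f:  # metrics, checkpoint, and deployment recorded together
        f.write(json.dumps(record) + "\n")
    return record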

Common pitfalls

  • Evaluating against stale deployments can hide regressions.
  • Under-specified checkpoint metadata makes successful runs hard to reproduce.