

The Training API is currently in private preview. Request early access to get started.
Using a code agent? Clone fw-ai/cookbook. The cookbook includes the skills/dev/ skill, which gives agents repo-specific guidance for setup, debugging, hotload, RL recipe internals, and checkpoint promotion.

What is the Training API?

The Fireworks Training API lets you write training logic in plain Python on your local machine while model computation runs on remote GPUs managed by Fireworks. Most users should start from the cookbook recipes, the recommended entry point for standard SFT, DPO, and GRPO-style training, as well as async RL loops for agentic RL. Fork a recipe when you want to adapt an existing loop with your own loss, reward, rollout function, data loading, or checkpointing behavior. Use the Direct Training SDK when you need full control over training behavior.
| Mode | Best for | Infrastructure |
| --- | --- | --- |
| Cookbook recipes | Recommended entry point for adapting existing SFT/DPO/GRPO-style loops, including async RL for agentic RL | You configure and implement simple loss, reward, or rollout functions; platform runs GPUs |
| Direct Training SDK | Full control over training behavior | You drive the training flow; platform runs GPUs |

Who does what

| Fireworks handles | Cookbook recipes handle | Direct Training SDK users implement |
| --- | --- | --- |
| GPU provisioning and cluster management | Training loop structure for supported recipes | Training loop logic (forward_backward_custom + optim_step) |
| Service-mode trainer lifecycle (create, health-check, reconnect, delete) | Resource setup, health checks, reconnect, and cleanup | Manager/client wiring when working below recipe utilities |
| Distributed forward pass, backward pass, optimizer execution | Common losses and reward/evaluation plumbing | Loss function and batch construction |
| Checkpoint storage and export | Checkpoint save, resume, promotion, and weight sync helpers | Checkpoint calls (save_weights_for_sampler_ext, DCP snapshots) |
| Inference deployments and hotload | Deployment sampling and serving-integrated evaluation for RL recipes | Custom rollout, sampling, and evaluation logic |
| Preemption recovery and job resume | Resume logic for supported recipe checkpoints | Resume policy and state restoration calls |
| Distributed training (multi-node, sharding, FSDP) | Config surfaces for learning rate, grad accumulation, context length, W&B | Hyperparameter schedules, data pipeline, and experiment tracking |

System architecture

How service-mode training works

Datums

A Datum is the unit of training data sent to the remote GPU. It wraps tokenized input and per-token weights that your loss function needs. Token weights tell the loss function which tokens to train on:
  • 0.0 = prompt token (don’t train on this)
  • 1.0 = response token (train on this)
import tinker
import torch
from tinker_cookbook.supervised.common import datum_from_model_input_weights

tokens = tokenizer.encode("What is 2+2? The answer is 4.")
prompt_len = len(tokenizer.encode("What is 2+2? "))

weights = torch.zeros(len(tokens), dtype=torch.float32)
weights[prompt_len:] = 1.0  # Train on response tokens only

datum = datum_from_model_input_weights(
    tinker.ModelInput.from_ints(tokens),
    weights,
    max_length=4096,
)

Logprobs and forward_backward_custom

When you call forward_backward_custom, the GPU runs a forward pass and returns per-token log-probabilities as PyTorch tensors with requires_grad=True. Your loss function computes a scalar loss from those tensors, the API calls loss.backward(), and the resulting gradients with respect to the logprobs are sent back to the GPU to complete the model's backward pass.
def my_loss_fn(data, logprobs_list):
    # data: the datums passed to forward_backward_custom;
    # logprobs_list: one per-token logprob tensor per datum, with requires_grad=True.
    loss = compute_something(logprobs_list)  # placeholder for your scalar loss
    return loss, {"loss": loss.item()}

result = training_client.forward_backward_custom(datums, my_loss_fn).result()
After accumulating gradients, call optim_step to apply the optimizer update:
import tinker

training_client.optim_step(
    tinker.AdamParams(learning_rate=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
).result()

Futures

All training client API calls return futures. Call .result() to block until completion. Without .result(), errors are silently swallowed.
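For example, holding each future and waiting on it explicitly (reusing datums and my_loss_fn from the examples above):
fb_future = training_client.forward_backward_custom(datums, my_loss_fn)
fb_result = fb_future.result()  # blocks until the remote forward/backward finishes; failures surface here

opt_future = training_client.optim_step(
    tinker.AdamParams(learning_rate=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
)
opt_future.result()  # wait on every future, even when you don't need its return value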

Checkpointing and weight sync

After training, you export checkpoints for serving:
  • Base checkpoint: Full model weights. Use for the first checkpoint.
  • Delta checkpoint: Only the diff from the previous base (~10x smaller). Use for subsequent checkpoints.
Weight sync pushes a checkpoint onto a running inference deployment without restarting it, enabling evaluation under serving conditions during training.
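A rough sketch of the resulting cadence, assuming a hypothetical export_checkpoint / push_to_deployment pair — these are illustrative placeholders, not SDK methods (WeightSyncer in the Key APIs table below wraps this lifecycle):
# Hypothetical sketch: export_checkpoint and push_to_deployment are placeholders,
# not actual SDK calls; they stand in for the checkpoint + weight sync helpers.
for step in range(1, num_steps + 1):
    run_training_step(training_client)  # forward_backward_custom + optim_step, as above

    if step % eval_interval == 0:
        # The first export is a full base checkpoint; later exports are ~10x-smaller deltas.
        kind = "base" if step == eval_interval else "delta"
        checkpoint = export_checkpoint(training_client, kind=kind)
        # Weight sync: hotload the checkpoint onto the running deployment without a restart,
        # so evaluation during training reflects real serving conditions.
        push_to_deployment(deployment, checkpoint)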

Key APIs

| API | Purpose |
| --- | --- |
| TrainerJobManager | Create, resume, reconnect, and delete service-mode trainer jobs |
| FireworksClient | Standalone checkpoint operations such as listing checkpoints or promoting a model without a live training instance |
| FiretitanServiceClient | Connect to a live trainer endpoint and create a FiretitanTrainingClient |
| FiretitanTrainingClient | forward_backward_custom, optim_step, checkpointing methods |
| DeploymentManager | Create deployments, weight sync, and warmup |
| DeploymentSampler | Client-side tokenized sampling from deployments |
| WeightSyncer | Manages checkpoint + weight sync lifecycle with delta chaining |
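A hedged sketch of how these pieces fit together; the class names come from the table above, but the method names and arguments are assumptions for illustration, not exact SDK signatures:
# Illustrative wiring only: method names and arguments below are assumptions.
manager = TrainerJobManager()                       # create/resume/reconnect service-mode trainer jobs
job = manager.create(...)                           # provision a trainer (arguments elided)

service_client = FiretitanServiceClient(job)        # connect to the live trainer endpoint
training_client = service_client.create_training_client()  # -> FiretitanTrainingClient

# From here, training follows the pattern shown earlier:
training_client.forward_backward_custom(datums, my_loss_fn).result()
training_client.optim_step(
    tinker.AdamParams(learning_rate=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
).result()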

Renderers

Chat-template formatting, stop-token handling, and loss-weight masking for SFT/DPO datasets are handled by renderers: pluggable per-model classes that turn raw conversations into the trainer's Datum shape. Most users never touch a renderer directly; cookbook recipes pick the right one for the base_model you set. If you need to author a new renderer or debug parity against HuggingFace, the in-depth implementation guidance lives in the cookbook's skills/renderer/ skill.

Next steps