# Introduction

> Fireworks Training API — custom training loops with full Python control over objectives, while Fireworks handles distributed GPU infrastructure.

<Info>
  The Training API is currently in **private preview**. [Request early access](https://fireworks.ai/contact-training) to get started.
</Info>

<Tip>
  **Using a code agent?** Clone [fw-ai/cookbook](https://github.com/fw-ai/cookbook). The cookbook includes the [`skills/dev/`](https://github.com/fw-ai/cookbook/tree/main/skills/dev) skill, which gives agents repo-specific guidance for setup, debugging, hotload, RL recipe internals, and checkpoint promotion.
</Tip>

## What is the Training API?

Fireworks Training API lets you write training logic in plain Python on your local machine while model computation runs on remote GPUs managed by Fireworks.

Most users should start from [cookbook recipes](/fine-tuning/training-api/cookbook/overview), the recommended entry point for standard SFT, DPO, GRPO-style training, and async RL loops for agentic RL. Fork a recipe when you want to adapt an existing loop with your own loss, reward, rollout function, data loading, or checkpointing behavior.

Use the Direct Training SDK when you need full control over training behavior.

| Mode                    | Best for                                                                                                  | Infrastructure                                                                            |
| ----------------------- | --------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| **Cookbook recipes**    | Recommended entry point for adapting existing SFT/DPO/GRPO-style loops, including async RL for agentic RL | You configure and implement simple loss, reward, or rollout functions; platform runs GPUs |
| **Direct Training SDK** | Full control over training behavior                                                                       | You drive the training flow; platform runs GPUs                                           |

## Who does what

| Fireworks handles                                                        | Cookbook recipes handle                                                    | Direct Training SDK users implement                              |
| ------------------------------------------------------------------------ | -------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| GPU provisioning and cluster management                                  | Training loop structure for supported recipes                              | Training loop logic (`forward_backward_custom` + `optim_step`)   |
| Service-mode trainer lifecycle (create, health-check, reconnect, delete) | Resource setup, health checks, reconnect, and cleanup                      | Manager/client wiring when working below recipe utilities        |
| Distributed forward pass, backward pass, optimizer execution             | Common losses and reward/evaluation plumbing                               | Loss function and batch construction                             |
| Checkpoint storage and export                                            | Checkpoint save, resume, promotion, and weight sync helpers                | Checkpoint calls (`save_weights_for_sampler_ext`, DCP snapshots) |
| Inference deployments and hotload                                        | Deployment sampling and serving-integrated evaluation for RL recipes       | Custom rollout, sampling, and evaluation logic                   |
| Preemption recovery and job resume                                       | Resume logic for supported recipe checkpoints                              | Resume policy and state restoration calls                        |
| Distributed training (multi-node, sharding, FSDP)                        | Config surfaces for learning rate, grad accumulation, context length, W\&B | Hyperparameter schedules, data pipeline, and experiment tracking |

## System architecture

```mermaid
flowchart LR
  local["Your Python Code<br/>(loss function, data loading, metrics)"] <-->|HTTP API| gpu["Fireworks GPUs<br/>(forward pass, backward pass, optimizer)"]
```

## How service-mode training works

### Datums

A **Datum** is the unit of training data sent to the remote GPU. It wraps tokenized input and per-token weights that your loss function needs.

Token weights tell the loss function which tokens to train on:

* **`0.0`** = prompt token (don't train on this)
* **`1.0`** = response token (train on this)

```python
import tinker
import torch
from tinker_cookbook.supervised.common import datum_from_model_input_weights

# `tokenizer` is assumed to be the tokenizer for your base model, loaded elsewhere.
tokens = tokenizer.encode("What is 2+2? The answer is 4.")
# Approximate prompt length; verify that the prompt tokens are a prefix of `tokens`,
# since tokenizers can merge tokens across the prompt/response boundary.
prompt_len = len(tokenizer.encode("What is 2+2? "))

weights = torch.zeros(len(tokens), dtype=torch.float32)
weights[prompt_len:] = 1.0  # Train on response tokens only

datum = datum_from_model_input_weights(
    tinker.ModelInput.from_ints(tokens),
    weights,
    max_length=4096,
)
```
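For multi-turn conversations, the same masking idea extends segment by segment. The following is a minimal sketch, not part of the SDK: `build_tokens_and_weights` is a hypothetical helper, and the token IDs are made-up stand-ins for real tokenizer output. It marks only the assistant segments as trainable.

```python
from typing import List, Tuple


def build_tokens_and_weights(
    segments: List[Tuple[List[int], bool]],
) -> Tuple[List[int], List[float]]:
    """Concatenate tokenized segments and build a per-token weight mask.

    Each segment is (token_ids, train_on_segment). Hypothetical helper,
    not an SDK function.
    """
    tokens: List[int] = []
    weights: List[float] = []
    for token_ids, train in segments:
        tokens.extend(token_ids)
        # 1.0 for tokens to train on, 0.0 for tokens to ignore.
        weights.extend([1.0 if train else 0.0] * len(token_ids))
    return tokens, weights


# Made-up token IDs standing in for real tokenizer output.
segments = [
    ([101, 7, 8, 9], False),  # user turn
    ([21, 22], True),         # assistant turn
    ([7, 30], False),         # user turn
    ([41, 42, 43], True),     # assistant turn
]
tokens, weights = build_tokens_and_weights(segments)
```

The resulting lists line up one-to-one, so `weights` can be converted to a `torch.float32` tensor and passed alongside `tokens` in the same way as the single-turn example.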

### Logprobs and forward\_backward\_custom

When you call `forward_backward_custom`, the GPU runs a forward pass and returns **per-token log-probabilities** as PyTorch tensors with `requires_grad=True`. Your loss function computes a scalar loss from those tensors, the API calls `loss.backward()` to obtain gradients with respect to the log-probabilities, and those gradients are sent back to the GPU for the model backward pass.

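```python
def my_loss_fn(data, logprobs_list):
    # `compute_something` is a placeholder for your objective; it must return
    # a scalar tensor that depends on the log-probabilities.
    loss = compute_something(logprobs_list)
    # Return the loss plus a dict of metrics.
    return loss, {"loss": loss.item()}

# `training_client` and `datums` are assumed to be set up already.
result = training_client.forward_backward_custom(datums, my_loss_fn).result()
```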
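As one concrete stand-in for the placeholder objective, the sketch below computes a weighted negative log-likelihood over response tokens. It runs locally on hand-made tensors. Passing the weights through a closure (`make_weighted_nll`, a hypothetical helper) is an assumption made to keep the example self-contained; it does not show how weights are read from `data` in the real SDK. The final `loss.backward()` call is only there to show that gradients reach the log-probabilities, since the API makes that call for you in the real flow.

```python
import torch


def make_weighted_nll(weights_list):
    """Return a loss function closed over per-sequence token weights.

    weights_list[i] must have the same length as logprobs_list[i].
    Hypothetical helper, not an SDK function.
    """
    def loss_fn(data, logprobs_list):
        total = torch.zeros(())
        count = torch.zeros(())
        for logprobs, weights in zip(logprobs_list, weights_list):
            total = total - (logprobs * weights).sum()
            count = count + weights.sum()
        # Average over trained tokens; clamp avoids division by zero.
        loss = total / count.clamp(min=1.0)
        return loss, {"loss": loss.item()}

    return loss_fn


# Local stand-ins for the tensors the trainer would return.
logprobs_list = [
    torch.tensor([-0.5, -1.0, -2.0], requires_grad=True),
    torch.tensor([-0.1, -0.3], requires_grad=True),
]
weights_list = [
    torch.tensor([0.0, 1.0, 1.0]),
    torch.tensor([0.0, 1.0]),
]

loss_fn = make_weighted_nll(weights_list)
loss, metrics = loss_fn(None, logprobs_list)
loss.backward()  # demonstration only
```

Tokens with weight `0.0` receive a zero gradient, so prompt tokens do not contribute to the update.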

After accumulating gradients, call `optim_step` to apply the optimizer update:

```python
import tinker

training_client.optim_step(
    tinker.AdamParams(learning_rate=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
).result()
```
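Because the learning rate is passed on each `optim_step` call, a schedule can be computed on the client side. The following is a minimal sketch of linear warmup followed by cosine decay. `lr_at_step` and its default values are illustrative assumptions, not SDK functions or recommended settings.

```python
import math


def lr_at_step(step, total_steps, peak_lr=1e-5, warmup_steps=10, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, capped at 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate for each of 100 optimizer steps.
schedule = [lr_at_step(s, total_steps=100) for s in range(100)]
```

Each value would be supplied as `learning_rate` when constructing the optimizer parameters for that step.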

### Futures

All training client API calls return **futures**. Call `.result()` to block until the call completes. If you never call `.result()`, errors are silently swallowed, so a failed step can go unnoticed.
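The same behavior can be seen with futures from Python's standard library. This is an analogy using `concurrent.futures`, not the SDK's own future type, which may differ in detail: the exception raised inside the task stays stored in the future until `.result()` is called.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_step():
    raise RuntimeError("step failed")


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_step)  # no exception is raised here

# The task has finished, yet nothing has been reported so far.
caught = None
try:
    future.result()  # the stored exception is re-raised here
except RuntimeError as exc:
    caught = str(exc)
```

Calling `.result()` on every returned future is what turns a silent failure into a visible exception.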

### Checkpointing and weight sync

After training, you export checkpoints for serving:

* **Base checkpoint**: Full model weights. Use for the first checkpoint.
* **Delta checkpoint**: Only the diff from the previous base (\~10x smaller). Use for subsequent checkpoints.

**Weight sync** pushes a checkpoint onto a running inference deployment without restarting it, enabling evaluation under serving conditions during training.

```mermaid
flowchart LR
  train["Train step"] --> save["save_weights_for_sampler_ext"]
  save --> hotload["Hotload onto deployment"]
  hotload --> sample["Sample via deployment"]
  sample --> eval["Evaluate quality"]
  eval --> train
```
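The cycle in the diagram can be sketched as a plain loop over injected callables. Everything here is a hypothetical stand-in: `run_cycle`, `checkpoint_kind`, and the lambda callables only record the order of operations and do not call any Fireworks API. Sampling and evaluation are folded into a single `evaluate` callable for brevity.

```python
from typing import Callable, List


def checkpoint_kind(num_saved: int) -> str:
    """First export is a full base checkpoint; later ones are deltas."""
    return "base" if num_saved == 0 else "delta"


def run_cycle(
    num_rounds: int,
    train_step: Callable[[], None],
    save_checkpoint: Callable[[str], str],
    hotload: Callable[[str], None],
    evaluate: Callable[[], float],
) -> List[float]:
    """Run train -> save -> hotload -> evaluate for num_rounds rounds."""
    scores = []
    for round_idx in range(num_rounds):
        train_step()
        checkpoint_id = save_checkpoint(checkpoint_kind(round_idx))
        hotload(checkpoint_id)
        scores.append(evaluate())
    return scores


# Stand-in callables that only record what was called.
log: List[str] = []
scores = run_cycle(
    num_rounds=3,
    train_step=lambda: log.append("train"),
    save_checkpoint=lambda kind: (log.append(f"save:{kind}"), f"ckpt-{kind}")[1],
    hotload=lambda ckpt: log.append(f"hotload:{ckpt}"),
    evaluate=lambda: 0.0,
)
```

Injecting the callables keeps the control flow testable without a live trainer or deployment. In a real loop they would wrap the training client, checkpoint export, and deployment calls.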

## Key APIs

| API                                                                             | Purpose                                                                                                            |
| ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| [`TrainerJobManager`](/fine-tuning/training-api/reference/trainer-job-manager)  | Create, resume, reconnect, and delete service-mode trainer jobs                                                    |
| [`FireworksClient`](/fine-tuning/training-api/reference/fireworks-client)       | Standalone checkpoint operations such as listing checkpoints or promoting a model without a live training instance |
| [`FiretitanServiceClient`](/fine-tuning/training-api/reference/service-client)  | Connect to a live trainer endpoint and create a `FiretitanTrainingClient`                                          |
| [`FiretitanTrainingClient`](/fine-tuning/training-api/reference/service-client) | `forward_backward_custom`, `optim_step`, checkpointing methods                                                     |
| [`DeploymentManager`](/fine-tuning/training-api/reference/deployment-manager)   | Create deployments, weight sync, and warmup                                                                        |
| [`DeploymentSampler`](/fine-tuning/training-api/reference/deployment-sampler)   | Client-side tokenized sampling from deployments                                                                    |
| [`WeightSyncer`](/fine-tuning/training-api/reference/weight-syncer)             | Manages checkpoint + weight sync lifecycle with delta chaining                                                     |

## Renderers

Chat-template formatting, stop-token handling, and loss-weight masking for SFT/DPO datasets are handled by **renderers** — pluggable per-model classes that turn raw conversations into the trainer's `Datum` shape. Most users never touch a renderer directly; cookbook recipes pick the right one for the `base_model` you set. If you need to author a new one or debug parity against HuggingFace, the implementation depth lives in the cookbook's [`skills/renderer/`](https://github.com/fw-ai/cookbook/tree/main/skills/renderer) skill.

## Next steps

* [Quickstart](/fine-tuning/training-api/quickstart) — get a custom training loop running in minutes
* [Training and Sampling](/fine-tuning/training-api/training-and-sampling) — end-to-end API walkthrough
* [Loss Functions](/fine-tuning/training-api/loss-functions) — built-in and custom loss functions
* [Vision Inputs](/fine-tuning/training-api/vision-inputs) — fine-tune vision-language models with image and text data
* [The Cookbook](/fine-tuning/training-api/cookbook/overview) — ready-to-run recipes for SFT, DPO, ORPO, GRPO/IGPO, and async RL (experimental)
