Interactive cost calculator
Select your model and training configuration to get an instant cost estimate. The calculator uses the following formulas:
- Total tokens: Prompts × Epochs × Response candidates × (Max tokens × 0.6)
- GPU hours: (Total tokens ÷ 1M) × GPU hours per million tokens (a range that varies by model size)
- Cost: GPU hours × GPU rate per hour
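For reference, here is the same arithmetic as a small Python sketch. The GPU-hours-per-million-tokens range and the GPU rate below are placeholders, not published Fireworks numbers; the calculator itself uses model-specific values.

```python
def estimate_cost(prompts, epochs, n, max_tokens,
                  gpu_hours_per_m_tokens=(0.5, 1.0),  # placeholder range, varies by model size
                  gpu_rate_per_hour=10.0):            # placeholder rate, see the pricing page
    """Mirror the calculator's formulas; returns a (low, high) cost range."""
    # Total tokens: Prompts x Epochs x Response candidates x (Max tokens x 0.6)
    total_tokens = prompts * epochs * n * (max_tokens * 0.6)
    # GPU hours: (Total tokens / 1M) x GPU-hours-per-million-tokens range
    gpu_hours_low = (total_tokens / 1e6) * gpu_hours_per_m_tokens[0]
    gpu_hours_high = (total_tokens / 1e6) * gpu_hours_per_m_tokens[1]
    # Cost: GPU hours x GPU rate per hour
    return gpu_hours_low * gpu_rate_per_hour, gpu_hours_high * gpu_rate_per_hour

print(estimate_cost(prompts=500, epochs=1, n=4, max_tokens=2048))
```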
How RFT pricing works
Reinforcement fine-tuning jobs are billed based on GPU-seconds consumed during training. The total cost depends on three main factors:
- Model size: determines how many GPUs are needed and the per-GPU-hour rate
- Training dataset: how much data is processed (dataset size × epochs × rollouts)
- Rollout generation: tokens generated during training (max tokens × rollouts per prompt)
Cost formula
The approximate cost of an RFT job can be estimated as:

Cost ≈ GPU-hours × GPU-hour rate

Where GPU-hours depend on:

GPU-hours ≈ Num GPUs × (Prompts × Epochs × Response candidates × Avg tokens per rollout) ÷ (Throughput × 3600)

The key variables are:

| Variable | Description | How to control |
|---|---|---|
| Num GPUs | GPUs required for the model | Determined by model size |
| Prompts | Number of rows in your dataset | Your dataset size |
| Epochs | Passes through the dataset | --epochs flag (default: 1) |
| Response candidates (n) | Responses generated per prompt | --n flag (default: 4) |
| Avg tokens per rollout | Average response length | --max-tokens flag (default: 2048) |
| Throughput | Tokens generated per second | Determined by model + hardware |
Training time directly translates to cost: Cost = Training time × Num GPUs × GPU-hour rate. Check the pricing page for current GPU-hour rates.
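The variables above map directly onto a back-of-envelope estimator. A minimal sketch, assuming Throughput is the aggregate tokens per second across the GPUs serving rollouts (all numeric values in the example call are illustrative):

```python
def estimate_rft_cost(prompts, epochs, n, avg_tokens_per_rollout,
                      throughput_tok_per_s, num_gpus, gpu_hour_rate):
    # Tokens generated across all rollouts
    total_tokens = prompts * epochs * n * avg_tokens_per_rollout
    # Wall-clock training time in hours, dominated by rollout generation
    training_hours = total_tokens / throughput_tok_per_s / 3600
    # Cost = Training time x Num GPUs x GPU-hour rate
    return training_hours * num_gpus * gpu_hour_rate

# Illustrative only: 500 prompts, 1 epoch, n=4, ~1200 avg tokens per rollout,
# 2000 tok/s aggregate throughput, 8 GPUs, placeholder $10/GPU-hour rate
print(estimate_rft_cost(500, 1, 4, 1200, 2000, 8, 10.0))
```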
How parameters affect cost
See how each parameter change impacts your total cost relative to a baseline configuration (500 prompts, 1 epoch, n=4, 2048 max tokens):

| Change | Cost impact | Explanation |
|---|---|---|
| Double dataset size (1000 prompts) | ~2× | Linear scaling with dataset size |
| Double rollouts (n=8) | ~2× | Linear scaling with rollout count |
| Double max tokens (4096) | ~1.5–2× | More tokens per rollout; below 2× because many rollouts stop before the cap |
| Add epoch (epochs=2) | ~2× | Full additional pass through data |
| Double LoRA rank (16 → 32) | ~1.2–1.5× | More trainable parameters |
| Halve max tokens (1024) | ~0.5–0.7× | Fewer tokens generated |
| Halve rollouts (n=2) | ~0.5× | Fewer rollouts but less learning signal |
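The linear rows in the table fall out of the token product directly. A quick check (these multipliers come from token counts alone, so wall-clock effects such as LoRA rank are not captured):

```python
def tokens(prompts, epochs, n, max_tokens):
    return prompts * epochs * n * max_tokens

base = tokens(500, 1, 4, 2048)          # baseline configuration
print(tokens(1000, 1, 4, 2048) / base)  # 2.0: double dataset size
print(tokens(500, 2, 4, 2048) / base)   # 2.0: add an epoch
print(tokens(500, 1, 8, 2048) / base)   # 2.0: double rollouts
print(tokens(500, 1, 4, 1024) / base)   # 0.5: halve max tokens
```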
Cost optimization tips
Start with free models
Use models under 16B parameters for initial experimentation. Iterate on your evaluator and dataset with qwen3-0p6b or llama-v3p1-8b-instruct before moving to larger models. This lets you:
- Validate your evaluator logic at zero cost
- Test dataset quality and format
- Tune rollout parameters
- Establish baseline reward curves
Limit max tokens
Set --max-tokens to the minimum needed for your task:
- Short outputs (classification, short answers): 256–512 tokens
- Medium outputs (code generation, summaries): 1024–2048 tokens
- Long outputs (detailed analysis, multi-step reasoning): 4096+ tokens
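One way to pick a value empirically is to measure the length distribution of known-good outputs and add headroom. A sketch in which suggest_max_tokens and its defaults are hypothetical, and whitespace splitting stands in as a rough token proxy (substitute your model's tokenizer for accurate counts):

```python
def suggest_max_tokens(reference_outputs, percentile=0.95, headroom=1.2):
    """Pick --max-tokens from the 95th-percentile output length plus headroom."""
    lengths = sorted(len(text.split()) for text in reference_outputs)
    idx = min(int(len(lengths) * percentile), len(lengths) - 1)
    return int(lengths[idx] * headroom)
```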
Use 1 epoch first
Start with 1 epoch (default). Most RFT jobs converge well within a single pass through the data. Add more epochs only if the reward curve is still climbing at the end of training.
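If you are deciding whether a second epoch is worthwhile, one rough heuristic is to compare the mean reward over the last window of steps against the window before it. A sketch, where the function, window size, and threshold are assumptions and rewards is whatever per-step reward series you export from your job's metrics:

```python
def reward_still_climbing(rewards, window=20, min_gain=0.01):
    if len(rewards) < 2 * window:
        return True  # too little data to call it converged
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return recent - previous > min_gain
```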
Optimize evaluator speed
Slow evaluators increase wall-clock training time and therefore cost:
- Keep evaluations under 5 seconds per rollout
- Cache expensive computations
- For remote evaluators, ensure your server can handle concurrent requests
- Avoid unnecessary API calls in your evaluation logic
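A common caching pattern is to memoize any per-prompt computation so that only the first of the n rollouts for a prompt pays the cost. A toy sketch; the scoring rule and the slow step are placeholders for your own evaluation logic:

```python
import functools
import time

@functools.lru_cache(maxsize=4096)
def expensive_reference(prompt: str) -> str:
    time.sleep(1.0)  # stand-in for a slow solver, parser, or API call
    return prompt.strip().lower()

def evaluate_rollout(prompt: str, response: str) -> float:
    # With n=4, three of the four rollouts for each prompt hit the cache
    reference = expensive_reference(prompt)
    return 1.0 if reference in response.lower() else 0.0
```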
Curate your dataset
A smaller, high-quality dataset often outperforms a larger, noisy one:
- Remove duplicate or near-duplicate prompts
- Ensure prompts are diverse and representative
- Start with 200–500 well-chosen prompts
- Quality over quantity reduces cost while maintaining performance
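Near-duplicate removal is cheap to do before uploading. A sketch using difflib, where the 0.9 similarity threshold is an assumption to tune for your data and the quadratic scan is fine at the 200–500 prompt scale recommended above:

```python
import difflib

def dedupe_prompts(prompts, threshold=0.9):
    kept, kept_norm = [], []
    for prompt in prompts:
        norm = " ".join(prompt.lower().split())  # normalize case and whitespace
        if all(difflib.SequenceMatcher(None, norm, seen).ratio() < threshold
               for seen in kept_norm):
            kept.append(prompt)
            kept_norm.append(norm)
    return kept
```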
Example cost scenarios
Scenario 1: Quick prototype (Free)
Goal: Test an evaluator on a small model
Best for: Initial evaluator development and testing.
| Parameter | Value |
|---|---|
| Model | Qwen3 0.6B |
| Dataset | 100 prompts |
| Epochs | 1 |
| Rollouts (n) | 4 |
| Max tokens | 2048 |
| Estimated cost | Free |
| Estimated time | ~15–30 minutes |
Scenario 2: Production training (Free)
Goal: Train a capable model for production use
Best for: Production workloads that can use an 8B model.
| Parameter | Value |
|---|---|
| Model | Llama 3.1 8B Instruct |
| Dataset | 500 prompts |
| Epochs | 1 |
| Rollouts (n) | 4 |
| Max tokens | 2048 |
| Estimated cost | Free |
| Estimated time | ~1–2 hours |
Scenario 3: Large model training (Paid)
Goal: Train a large model for maximum quality
Check the Fireworks Pricing page for the current GPU-hour rate. For a 2-hour job on 8 GPUs, multiply: 2 × 8 × (rate per GPU-hour).
| Parameter | Value |
|---|---|
| Model | Llama 3.3 70B Instruct |
| Dataset | 500 prompts |
| Epochs | 1 |
| Rollouts (n) | 4 |
| Max tokens | 2048 |
| Estimated cost | Training hours × 8 GPUs × rate |
| Estimated time | ~1–2 hours |
Scenario 4: High-quality with more rollouts (Paid)
Goal: Maximum quality with large model and more rollouts
This is a larger job. The cost scales with training time: more prompts, epochs, rollouts, and tokens all increase total GPU-hours.
| Parameter | Value |
|---|---|
| Model | DeepSeek V3 |
| Dataset | 1000 prompts |
| Epochs | 2 |
| Rollouts (n) | 8 |
| Max tokens | 4096 |
| Estimated cost | Training hours × 8 GPUs × rate |
| Estimated time | ~8–16 hours |
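To make the multiplication in the paid scenarios concrete, here is the arithmetic with a placeholder rate; this is not an actual Fireworks price, so substitute the current rate from the pricing page:

```python
rate = 10.0  # placeholder GPU-hour rate, not an actual Fireworks price

# Scenario 3: ~1-2 hours on 8 GPUs
print(1 * 8 * rate, 2 * 8 * rate)    # 80.0 160.0

# Scenario 4: ~8-16 hours on 8 GPUs
print(8 * 8 * rate, 16 * 8 * rate)   # 640.0 1280.0
```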
Monitoring costs during training
Cost information is only available after your job completes:
- Dashboard: The Fireworks Dashboard displays the final cost on the RFT job page once training finishes
- Training progress: While the job is running, you can monitor elapsed time and estimated completion in the job overview
- Early stopping: You can cancel a job early if needed; the model checkpoint from the last completed step is still usable. The final cost is calculated from the GPU-seconds consumed up to the cancellation point.
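Until the final number appears, you can approximate the running cost yourself from elapsed time. A sketch, where num_gpus follows from your model size and gpu_hour_rate from the pricing page:

```python
def running_cost_estimate(elapsed_seconds, num_gpus, gpu_hour_rate):
    # Billing is in GPU-seconds; convert to GPU-hours and apply the rate
    return (elapsed_seconds / 3600) * num_gpus * gpu_hour_rate
```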