You can tune models for free on Fireworks. Models under 16B parameters are available for free tuning. When creating a fine-tuning job in the UI, filter for free-tuning models in the model selection area on the fine-tuning creation page; if kicking off jobs from the terminal, you can find the model ID in the Model Library.

Interactive cost calculator

Select your model and training configuration to get an instant cost estimate. The calculator uses the following formulas:
  1. Total tokens: Prompts × Epochs × Rollouts (n) × (Max tokens × 0.6), where the 0.6 factor approximates average response length as 60% of the maximum
  2. GPU hours: (Total tokens ÷ 1M) × (GPU hours per million tokens range, varies by model size)
  3. Cost: GPU hours × GPU rate per hour
You can derive wall-clock training time from the estimate as: Training time = GPU hours ÷ Number of GPUs. The GPU hours per million tokens range varies by model size and accounts for variability in model efficiency, system overhead, and actual response lengths. Ranges are based on actual RFT job data.
Order-of-magnitude estimates only. This calculator provides rough estimates and is not intended for precise forecasting or budgeting. Actual costs may vary significantly.
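To make the formulas concrete, here is a minimal Python sketch of the calculator logic above. The GPU-hours-per-million-tokens range and the GPU rate are hypothetical placeholders, not published numbers; substitute the values for your model size and the current rate from the pricing page.

```python
# Rough RFT cost estimate following the three calculator formulas above.
# gpu_hours_per_m_tokens and gpu_rate are assumed placeholder values.

def estimate_cost(prompts, epochs, n, max_tokens,
                  gpu_hours_per_m_tokens=(0.5, 1.5),  # assumed range; varies by model size
                  gpu_rate=3.0):                       # assumed $/GPU-hour
    # 1. Total tokens: average response length approximated as 60% of max
    total_tokens = prompts * epochs * n * (max_tokens * 0.6)
    # 2. GPU hours: scale token volume by the per-million-token range
    lo, hi = ((total_tokens / 1e6) * r for r in gpu_hours_per_m_tokens)
    # 3. Cost: GPU hours × GPU rate per hour
    return lo * gpu_rate, hi * gpu_rate

low, high = estimate_cost(prompts=500, epochs=1, n=4, max_tokens=2048)
print(f"Estimated cost: ${low:.2f}-${high:.2f}")
```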

How RFT pricing works

Reinforcement fine-tuning jobs are billed based on GPU-seconds consumed during training. The total cost depends on three main factors:
  1. Model size — Determines how many GPUs are needed and the per-GPU-hour rate
  2. Training dataset — How much data is processed (dataset size × epochs × rollouts)
  3. Rollout generation — Token generation during training (max tokens × rollouts per prompt)

Cost formula

The approximate cost of an RFT job can be estimated as:

Cost = GPU-hours × Price per GPU-hour

where GPU-hours depend on:

GPU-hours ≈ Num GPUs × (Prompts × Epochs × Rollouts (n) × Avg tokens per rollout ÷ Throughput (tokens/sec)) ÷ 3600

The key variables are:
| Variable | Description | How to control |
| --- | --- | --- |
| Num GPUs | GPUs required for the model | Determined by model size |
| Prompts | Number of rows in your dataset | Your dataset size |
| Epochs | Passes through the dataset | --epochs flag (default: 1) |
| Rollouts (n) | Responses generated per prompt | --n flag (default: 4) |
| Avg tokens per rollout | Average response length | --max-tokens flag (default: 2048) |
| Throughput | Tokens generated per second | Determined by model + hardware |
Training time directly translates to cost: Cost = Training time × Num GPUs × GPU-hour rate. Check the pricing page for current GPU-hour rates.
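A minimal sketch of the GPU-hours formula above, with throughput and rate values chosen purely for illustration (they are assumptions, not measured figures):

```python
def rft_gpu_hours(num_gpus, prompts, epochs, n, avg_tokens_per_rollout,
                  throughput_tokens_per_sec):
    # Wall-clock generation time in seconds, converted to GPU-hours
    seconds = (prompts * epochs * n * avg_tokens_per_rollout
               / throughput_tokens_per_sec)
    return num_gpus * seconds / 3600

# Assumed example values: 8 GPUs, 500 prompts, 1 epoch, n=4,
# ~1200 average tokens per rollout, 2000 tokens/sec aggregate throughput.
gpu_hours = rft_gpu_hours(8, 500, 1, 4, 1200, 2000)
price_per_gpu_hour = 3.0  # placeholder; check the pricing page
print(f"{gpu_hours:.1f} GPU-hours ≈ ${gpu_hours * price_per_gpu_hour:.2f}")
```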

How parameters affect cost

See how each parameter change impacts your total cost relative to a baseline configuration (500 prompts, 1 epoch, n=4, 2048 max tokens):
| Change | Cost impact | Explanation |
| --- | --- | --- |
| Double dataset size (1000 prompts) | ~2× | Linear scaling with dataset size |
| Double rollouts (n=8) | ~2× | Linear scaling with rollout count |
| Double max tokens (4096) | ~1.5–2× | More tokens per rollout |
| Add an epoch (epochs=2) | ~2× | Full additional pass through the data |
| Double LoRA rank (16 → 32) | ~1.2–1.5× | More trainable parameters |
| Halve max tokens (1024) | ~0.5–0.7× | Fewer tokens generated |
| Halve rollouts (n=2) | ~0.5× | Fewer tokens generated, but also less learning signal |
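To see where these multipliers come from, here is a quick token-count comparison against the baseline. Token volume is only a first-order proxy for GPU-hours; it ignores effects like LoRA rank and responses finishing short of the budget, which is why doubling max tokens lands at ~1.5–2× in the table rather than a clean 2×:

```python
def total_tokens(prompts, epochs, n, max_tokens):
    # Token volume as a first-order proxy for GPU-hours
    return prompts * epochs * n * max_tokens

baseline = total_tokens(500, 1, 4, 2048)
changes = {
    "double dataset size": total_tokens(1000, 1, 4, 2048),
    "double rollouts":     total_tokens(500, 1, 8, 2048),
    "add an epoch":        total_tokens(500, 2, 4, 2048),
    "halve max tokens":    total_tokens(500, 1, 4, 1024),
}
for name, tokens in changes.items():
    print(f"{name}: ~{tokens / baseline:.1f}x")
```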

Cost optimization tips

Use models under 16B parameters for initial experimentation. Iterate on your evaluator and dataset with qwen3-0p6b or llama-v3p1-8b-instruct before moving to larger models. This lets you:
  • Validate your evaluator logic at zero cost
  • Test dataset quality and format
  • Tune rollout parameters
  • Establish baseline reward curves
Set --max-tokens to the minimum needed for your task:
  • Short outputs (classification, short answers): 256–512 tokens
  • Medium outputs (code generation, summaries): 1024–2048 tokens
  • Long outputs (detailed analysis, multi-step reasoning): 4096+ tokens
Every token generated during rollouts costs compute. Don’t use 16384 max tokens if your task only needs 512.
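The savings compound across every prompt and rollout. As a quick illustration with assumed values (500 prompts, n=4, one epoch):

```python
# Upper bound on tokens generated during rollouts for one epoch.
# Every generated token costs compute, so an oversized budget multiplies cost.
prompts, n = 500, 4
for max_tokens in (512, 2048, 16384):
    volume = prompts * n * max_tokens
    print(f"max_tokens={max_tokens:>5}: up to {volume / 1e6:.1f}M tokens")
```

A 16384-token budget allows up to 32× the token volume of a 512-token budget on the same dataset.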
Start with 1 epoch (default). Most RFT jobs converge well within a single pass through the data. Add more epochs only if the reward curve is still climbing at the end of training.
Slow evaluators increase wall-clock training time and therefore cost:
  • Keep evaluations under 5 seconds per rollout
  • Cache expensive computations
  • For remote evaluators, ensure your server can handle concurrent requests
  • Avoid unnecessary API calls in your evaluation logic
Evaluator complexity impact: Simple, self-contained evaluators have minimal overhead. Evaluators that call external services, such as LLM-as-judge setups or company-specific endpoints, can add variable training time due to rate limits imposed by model providers or other services.
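As one way to keep evaluations cheap, here is a minimal sketch of a self-contained evaluator that caches expensive work. The evaluate signature and the numeric-matching reward are hypothetical examples for illustration, not the Fireworks evaluator API:

```python
import functools
import re

@functools.lru_cache(maxsize=10_000)
def parse_reference(reference: str) -> tuple:
    # Cache expensive, repeated work: the same reference answer is
    # evaluated once per rollout, i.e. n times per prompt per epoch.
    return tuple(re.findall(r"-?\d+(?:\.\d+)?", reference))

def evaluate(rollout: str, reference: str) -> float:
    # Hypothetical reward: fraction of reference numbers the rollout reproduces
    expected = parse_reference(reference)
    if not expected:
        return 0.0
    found = set(re.findall(r"-?\d+(?:\.\d+)?", rollout))
    return sum(x in found for x in expected) / len(expected)

print(evaluate("The answer is 42.", "42"))  # 1.0
```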
A smaller, high-quality dataset often outperforms a larger, noisy one:
  • Remove duplicate or near-duplicate prompts
  • Ensure prompts are diverse and representative
  • Start with 200–500 well-chosen prompts
  • Quality over quantity reduces cost while maintaining performance
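A minimal sketch of exact and trivial near-duplicate removal, using lowercase/whitespace normalization as the duplicate key (production pipelines often use embeddings or MinHash for fuzzier matching):

```python
def dedupe_prompts(prompts: list[str]) -> list[str]:
    # Normalize case and whitespace so trivially reworded duplicates collide
    def key(p: str) -> str:
        return " ".join(p.lower().split())

    seen: set[str] = set()
    kept = []
    for p in prompts:
        k = key(p)
        if k not in seen:
            seen.add(k)
            kept.append(p)
    return kept

prompts = ["What is 2+2?", "what is  2+2?", "Summarize this article."]
print(dedupe_prompts(prompts))  # the near-duplicate second prompt is dropped
```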

Example cost scenarios

Goal: Test an evaluator on a small model
| Parameter | Value |
| --- | --- |
| Model | Qwen3 0.6B |
| Dataset | 100 prompts |
| Epochs | 1 |
| Rollouts (n) | 4 |
| Max tokens | 2048 |
| Estimated cost | Free |
| Estimated time | ~15–30 minutes |
Best for: Initial evaluator development and testing.
Goal: Train a capable model for production use
| Parameter | Value |
| --- | --- |
| Model | Llama 3.1 8B Instruct |
| Dataset | 500 prompts |
| Epochs | 1 |
| Rollouts (n) | 4 |
| Max tokens | 2048 |
| Estimated cost | Free |
| Estimated time | ~1–2 hours |
Best for: Production workloads that can use an 8B model.
Goal: Train a large model for maximum quality
| Parameter | Value |
| --- | --- |
| Model | Llama 3.3 70B Instruct |
| Dataset | 500 prompts |
| Epochs | 1 |
| Rollouts (n) | 4 |
| Max tokens | 2048 |
| Estimated cost | Training hours × 8 GPUs × rate |
| Estimated time | ~1–2 hours |
Check the Fireworks Pricing page for the current GPU-hour rate. For a 2-hour job on 8 GPUs, that is 2 × 8 = 16 GPU-hours at the listed rate.
Goal: Maximum quality with large model and more rollouts
| Parameter | Value |
| --- | --- |
| Model | DeepSeek V3 |
| Dataset | 1000 prompts |
| Epochs | 2 |
| Rollouts (n) | 8 |
| Max tokens | 4096 |
| Estimated cost | Training hours × 8 GPUs × rate |
| Estimated time | ~8–16 hours |
This is a larger job. The cost scales with training time: more prompts, epochs, rollouts, and tokens all increase total GPU-hours.
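As a rough consistency check on that scaling, compare the token volume of this job to the 70B scenario above (token volume is only a first-order proxy for GPU-hours):

```python
def token_volume(prompts, epochs, n, max_tokens):
    return prompts * epochs * n * max_tokens

llama_70b = token_volume(500, 1, 4, 2048)     # Llama 3.3 70B scenario
deepseek_v3 = token_volume(1000, 2, 8, 4096)  # DeepSeek V3 scenario
print(f"~{deepseek_v3 / llama_70b:.0f}x the token volume")  # ~16x
```

The ~16× token volume lines up with the jump from ~1–2 hours to ~8–16 hours of training time.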

Monitoring costs during training

Final cost information is available only after your job completes:
  1. Dashboard: The Fireworks Dashboard displays the final cost on the RFT job page once training finishes
  2. Training progress: While the job is running, you can monitor elapsed time and estimated completion in the job overview
  3. Early stopping: You can cancel a job early if needed—the model checkpoint from the last completed step is still usable. The final cost will be calculated based on GPU-seconds consumed up to the cancellation point.
If a job is running longer than expected, check your evaluator performance. Slow evaluators are the most common cause of unexpectedly long (and expensive) training runs.

Next steps