Process large volumes of requests asynchronously at 50% lower cost. The Batch API is ideal for:
  • Production-scale inference workloads
  • Large-scale testing and benchmarking
  • Training smaller models with larger ones (see the distillation guide)
Batch jobs automatically use prompt caching for an additional 50% cost savings on cached tokens. Maximize cache hits by placing static content first in your prompts.
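For example, if many requests share the same long instructions, put them in an identical system message and keep the request-specific text last. A minimal Python sketch of this pattern (the prompt text and helper function are illustrative, not part of the API):

# Shared, static instructions first: every request gets an identical
# prefix, so cached tokens can be reused across the whole batch.
static_system = "You are a meticulous support assistant. Answer concisely and cite the relevant policy section."

def make_messages(user_content: str) -> list[dict]:
    return [
        {"role": "system", "content": static_system},  # identical across requests
        {"role": "user", "content": user_content},     # varies per request, placed last
    ]

print(make_messages("How do I reset my password?"))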

Getting Started

Datasets must be in JSONL format (one JSON object per line).
Requirements:
  • File format: JSONL (each line is a valid JSON object)
  • Size limit: Under 500MB
  • Required fields: custom_id (unique) and body (request parameters)
Example dataset:
{"custom_id": "request-1", "body": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}
{"custom_id": "request-2", "body": {"messages": [{"role": "user", "content": "Explain quantum computing"}], "temperature": 0.7}}
{"custom_id": "request-3", "body": {"messages": [{"role": "user", "content": "Tell me a joke"}]}}
Save as batch_input_data.jsonl locally.
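Before uploading, it can help to sanity-check the file against the requirements above. A minimal validation sketch in Python (the messages check assumes chat-style requests; adjust for your request shape):

import json
import os

MAX_BYTES = 500 * 1024 * 1024  # 500MB input dataset limit
path = "batch_input_data.jsonl"

assert os.path.getsize(path) < MAX_BYTES, "dataset exceeds 500MB"

seen_ids = set()
with open(path) as f:
    for n, line in enumerate(f, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)  # raises if the line is not valid JSON
        cid = record.get("custom_id")
        assert cid, f"line {n}: missing custom_id"
        assert cid not in seen_ids, f"line {n}: duplicate custom_id {cid!r}"
        seen_ids.add(cid)
        body = record.get("body")
        assert body, f"line {n}: missing body"
        assert body.get("messages"), f"line {n}: body has no messages"

print(f"OK: {len(seen_ids)} requests validated")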
Navigate to the Datasets tab, click Create Dataset, and follow the wizard to upload your dataset.
Navigate to the Batch Inference tab and click “Create Batch Inference Job”. Then:
  • Select your input dataset
  • Choose your model
  • Configure optional settings
View all your batch inference jobs in the dashboard.
Navigate to the output dataset and download the results.
The output dataset contains two files: a results file (successful responses in JSONL format) and an error file (failed requests with debugging info).
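Once downloaded, both files can be joined back to your inputs by custom_id. A sketch of that join (the file names and the exact result/error fields are assumptions; inspect one line of your actual output to confirm the schema):

import json

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# File names are illustrative; use the names from your downloaded dataset.
results = load_jsonl("results.jsonl")
errors = load_jsonl("errors.jsonl")

# Index successful responses by custom_id so they can be joined to inputs.
by_id = {r["custom_id"]: r for r in results}
print(f"{len(by_id)} succeeded, {len(errors)} failed")

for e in errors:
    # The exact error schema may vary; print whole records while debugging.
    print("failed:", e.get("custom_id"), e)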

Reference

Batch jobs progress through several states:
State        Description
VALIDATING   Dataset is being validated for format requirements
PENDING      Job is queued and waiting for resources
RUNNING      Actively processing requests
COMPLETED    All requests successfully processed
FAILED       An unrecoverable error occurred (check the status message)
EXPIRED      Exceeded the 24-hour limit (completed requests are saved)
Supported models:
  • Base Models – Any model in the Model Library
  • Custom Models – Your uploaded or fine-tuned models
Note: Newly added models may have a delay before being supported. See Quantization for precision info.
Limits:
  • Per-request limits: Same as Chat Completion API limits
  • Input dataset: Max 500MB (see the splitting sketch after this list)
  • Output dataset: Max 8GB (job may expire early if reached)
  • Job timeout: 24 hours maximum
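If an input file approaches the 500MB limit, one option is to split it into several smaller datasets and run one job per chunk. A sketch (the file name and chunk size are illustrative):

MAX_BYTES = 450 * 1024 * 1024  # stay safely under the 500MB input limit

def split_jsonl(path: str) -> list[str]:
    # Split a JSONL file into chunks below MAX_BYTES without breaking lines.
    chunks, out, size, idx = [], None, 0, 0
    with open(path, "rb") as f:
        for line in f:
            if out is None or size + len(line) > MAX_BYTES:
                if out:
                    out.close()
                idx += 1
                name = f"{path}.part{idx}"
                chunks.append(name)
                out = open(name, "wb")
                size = 0
            out.write(line)
            size += len(line)
    if out:
        out.close()
    return chunks

print(split_jsonl("batch_input_data.jsonl"))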
Jobs expire after 24 hours. Completed rows are billed and saved to the output dataset.
Resume processing:
firectl create batch-inference-job \
  --continue-from original-job-id \
  --model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-dataset-id new-output-dataset
This processes only unfinished/failed requests from the original job.
Download complete lineage:
firectl download dataset output-dataset-id --download-lineage
Downloads all datasets in the continuation chain.
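After downloading the lineage, the result files from the chain can be merged by custom_id so that a request retried in a continuation job replaces its earlier result. A sketch (file names are illustrative; list them oldest first):

import json

def merge_results(paths):
    # Merge result files oldest-first; later files win per custom_id.
    merged = {}
    for path in paths:
        with open(path) as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    merged[record["custom_id"]] = record
    return merged

merged = merge_results(["original_results.jsonl", "continued_results.jsonl"])
print(f"{len(merged)} unique requests across the chain")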
Best practices:
  • Validate thoroughly: Check dataset format before uploading
  • Descriptive IDs: Use meaningful custom_id values for tracking
  • Optimize tokens: Set reasonable max_tokens limits
  • Monitor progress: Track long-running jobs regularly
  • Cache optimization: Place static content first in prompts

Next Steps