Process large volumes of requests asynchronously at 50% lower cost. Batch API is ideal for:
  • Production-scale inference workloads
  • Large-scale testing and benchmarking
  • Training smaller models with larger ones (distillation guide)
Batch jobs automatically use prompt caching for additional 50% cost savings on cached tokens. Maximize cache hits by placing static content first in your prompts.

Getting Started

Datasets must be in JSONL format (one JSON object per line).

Requirements:
  • File format: JSONL (each line is a valid JSON object)
  • Size limit: Under 500MB
  • Required fields: custom_id (unique) and body (request parameters)
Example dataset:
{"custom_id": "request-1", "body": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}
{"custom_id": "request-2", "body": {"messages": [{"role": "user", "content": "Explain quantum computing"}], "temperature": 0.7}}
{"custom_id": "request-3", "body": {"messages": [{"role": "user", "content": "Tell me a joke"}]}}
Save as batch_input_data.jsonl locally.
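A dataset like the one above can also be generated programmatically. The sketch below writes the documented fields (`custom_id` and `body`) for a few hypothetical prompts; the prompt list and `max_tokens` value are illustrative, not required values:

```python
import json

# Hypothetical prompts; each request needs a unique custom_id and a body.
prompts = [
    "What is the capital of France?",
    "Explain quantum computing",
    "Tell me a joke",
]

with open("batch_input_data.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        record = {
            "custom_id": f"request-{i}",  # must be unique within the file
            "body": {
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 100,
            },
        }
        f.write(json.dumps(record) + "\n")
```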
  • UI
  • firectl
  • HTTP API
Navigate to the Datasets tab, click Create Dataset, and follow the wizard.
  • UI
  • firectl
  • HTTP API
Navigate to the Batch Inference tab and click “Create Batch Inference Job”. Select your input dataset, choose your model, and configure any optional settings.
  • UI
  • firectl
  • HTTP API
View all your batch inference jobs in the dashboard.
  • UI
  • firectl
  • HTTP API
Navigate to the output dataset and download the results.
The output dataset contains two files: a results file (successful responses in JSONL format) and an error file (failed requests with debugging info).
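Since the results file preserves the `custom_id` of each request, outputs can be matched back to their inputs after download. A minimal sketch, assuming each output line carries `custom_id` (the other field names in the sample payload below are made up for illustration and may differ from the actual output schema):

```python
import json

def index_results(jsonl_text):
    """Map each output line back to its request by custom_id."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        results[record["custom_id"]] = record
    return results

# Example with a made-up results payload:
sample = '{"custom_id": "request-1", "response": {"text": "Paris."}}'
by_id = index_results(sample)
print(by_id["request-1"]["response"]["text"])  # → Paris.
```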

Reference

Batch jobs progress through several states:
State        Description
VALIDATING   Dataset is being validated for format requirements
PENDING      Job is queued and waiting for resources
RUNNING      Requests are actively being processed
COMPLETED    All requests were successfully processed
FAILED       An unrecoverable error occurred (check the status message)
EXPIRED      The job exceeded the 24-hour limit (completed requests are saved)
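When monitoring jobs from a script, it helps to distinguish terminal states from in-flight ones. A small helper based on the state table above (how you fetch the state, e.g. via firectl or the HTTP API, is up to you):

```python
# Terminal vs. in-flight job states, per the state table above.
TERMINAL_STATES = {"COMPLETED", "FAILED", "EXPIRED"}
IN_FLIGHT_STATES = {"VALIDATING", "PENDING", "RUNNING"}

def is_finished(state: str) -> bool:
    """Return True when a batch job will make no further progress."""
    return state.upper() in TERMINAL_STATES
```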
  • Base Models – Any model in the Model Library
  • Custom Models – Your uploaded or fine-tuned models
Note: Newly added models may have a delay before being supported. See Default Precisions for precision info.
  • Per-request limits: Same as Chat Completion API limits
  • Input dataset: Max 500MB
  • Output dataset: Max 8GB (job may expire early if reached)
  • Job timeout: 24 hours maximum
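A quick pre-flight check against the 500MB input limit avoids a failed upload. This sketch only checks file size; the function name is illustrative:

```python
import os

MAX_INPUT_BYTES = 500 * 1024 * 1024  # 500MB input dataset limit

def fits_input_limit(path: str) -> bool:
    """Check a dataset file against the 500MB input limit before uploading."""
    return os.path.getsize(path) <= MAX_INPUT_BYTES
```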
Jobs expire after 24 hours. Completed rows are billed and saved to the output dataset.

Resume processing:
firectl create batch-inference-job \
  --continue-from original-job-id \
  --model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-dataset-id new-output-dataset
This processes only unfinished/failed requests from the original job.

Download complete lineage:
firectl download dataset output-dataset-id --download-lineage
Downloads all datasets in the continuation chain.
  • Validate thoroughly: Check dataset format before uploading
  • Descriptive IDs: Use meaningful custom_id values for tracking
  • Optimize tokens: Set reasonable max_tokens limits
  • Monitor progress: Track long-running jobs regularly
  • Cache optimization: Place static content first in prompts
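The “validate thoroughly” practice can be automated. A sketch that checks the documented requirements (every line is valid JSON with a unique `custom_id` and a `body`); the function name is illustrative:

```python
import json

def validate_batch_file(path: str) -> list:
    """Return a list of problems found; an empty list means the file looks valid."""
    problems, seen = [], set()
    with open(path) as f:
        for n, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {n}: not valid JSON")
                continue
            cid = record.get("custom_id")
            if not cid:
                problems.append(f"line {n}: missing custom_id")
            elif cid in seen:
                problems.append(f"line {n}: duplicate custom_id {cid!r}")
            else:
                seen.add(cid)
            if "body" not in record:
                problems.append(f"line {n}: missing body")
    return problems
```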

Next Steps
