- Production-scale inference workloads
- Large-scale testing and benchmarking
- Training smaller models with larger ones (distillation guide)
Getting Started
1. Prepare Your Dataset
Datasets must be in JSONL format (one JSON object per line).

Requirements:

- File format: JSONL (each line is a valid JSON object)
- Size limit: Under 500MB
- Required fields: custom_id (unique) and body (request parameters)

Save as batch_input_data.jsonl locally.
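As a concrete illustration, here is a minimal sketch that writes a two-line batch_input_data.jsonl. Only custom_id and body are required; the fields shown inside body (messages, max_tokens) are assumptions mirroring typical Chat Completions parameters, and the model itself is selected later when you create the batch job.

```python
# Sketch: build a small batch dataset as JSONL.
# Assumption: each "body" carries Chat Completions-style parameters.
import json

questions = ["What is JSONL?", "Why use batch inference?"]

with open("batch_input_data.jsonl", "w") as f:
    for i, question in enumerate(questions):
        line = {
            "custom_id": f"request-{i}",  # must be unique across the file
            "body": {                     # request parameters (assumed fields)
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": question},
                ],
                "max_tokens": 256,
            },
        }
        f.write(json.dumps(line) + "\n")  # one JSON object per line
```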
2. Upload Your Dataset
Uploading your dataset, creating the batch job, monitoring it, and downloading results can each be done through the UI, firectl, or the HTTP API; the steps below show the UI flow.
You can simply navigate to the dataset tab, click Create Dataset, and follow the wizard.
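Before uploading, it is worth checking the requirements from step 1 locally: every line parses as JSON, custom_id values are unique, and the file is under the 500MB limit. A minimal, dependency-free check might look like this:

```python
# Sketch: validate a batch dataset locally before uploading it.
import json
import os

MAX_BYTES = 500 * 1024 * 1024  # 500MB input dataset limit

def validate(path: str) -> None:
    assert os.path.getsize(path) < MAX_BYTES, "file exceeds the 500MB input limit"
    seen_ids = set()
    with open(path) as f:
        for n, raw in enumerate(f, start=1):
            obj = json.loads(raw)  # raises if a line is not valid JSON
            assert "custom_id" in obj and "body" in obj, f"line {n}: missing required field"
            assert obj["custom_id"] not in seen_ids, f"line {n}: duplicate custom_id"
            seen_ids.add(obj["custom_id"])

validate("batch_input_data.jsonl")
print("dataset looks valid")
```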
3. Create a Batch Job
Navigate to the Batch Inference tab and click “Create Batch Inference Job”, then:

- Select your input dataset
- Choose your model
- Configure any optional settings

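If you script this step instead of using the UI, the job is created through the HTTP API (or firectl). The endpoint URL and payload field names below are illustrative assumptions rather than the documented schema; the point is only that a batch job ties together the uploaded input dataset, a model, and any optional settings.

```python
# Illustrative sketch only: the URL and payload field names are assumptions,
# not the documented HTTP API schema. Check the HTTP API reference for the real ones.
import os
import requests

API_KEY = os.environ["FIREWORKS_API_KEY"]
CREATE_JOB_URL = "https://api.fireworks.ai/..."  # hypothetical endpoint

payload = {
    "inputDataset": "batch-input-data",        # dataset uploaded in step 2 (assumed field name)
    "model": "accounts/fireworks/models/...",  # placeholder model identifier
    # optional settings (e.g. an output dataset name) would go here
}

resp = requests.post(CREATE_JOB_URL, json=payload,
                     headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()
print(resp.json())
```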
4. Monitor Your Job
View all your batch inference jobs in the dashboard.

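For long-running jobs, a simple polling loop is usually enough. The get_job_state helper below is a hypothetical placeholder for however you fetch status (firectl or the HTTP API); the state names themselves come from the reference table later on this page.

```python
# Sketch: poll a batch job until it reaches a terminal state.
# get_job_state() is a hypothetical placeholder for your status lookup.
import time

TERMINAL_STATES = {"COMPLETED", "FAILED", "EXPIRED"}

def get_job_state(job_id: str) -> str:
    raise NotImplementedError("fetch the job state via firectl or the HTTP API")

def wait_for_job(job_id: str, poll_seconds: int = 60) -> str:
    while True:
        state = get_job_state(job_id)
        print(f"{job_id}: {state}")
        if state in TERMINAL_STATES:
            return state              # COMPLETED, FAILED, or EXPIRED
        time.sleep(poll_seconds)      # VALIDATING / PENDING / RUNNING: keep waiting
```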
5. Download Results
Navigate to the output dataset and download the results.

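Each line of the downloaded output dataset is a JSON object for one request. Assuming each result echoes the custom_id you supplied (which is what makes descriptive IDs useful for tracking), you can join results back to your inputs like this:

```python
# Sketch: index downloaded results by custom_id.
# Assumptions: each output line echoes the request's custom_id;
# the filename is whatever you saved the output dataset as.
import json

results = {}
with open("batch_output_data.jsonl") as f:
    for raw in f:
        obj = json.loads(raw)
        results[obj["custom_id"]] = obj  # join key back to your original requests

print(f"got {len(results)} results")
print(results.get("request-0"))
```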
Reference
Job states
Batch jobs progress through several states:
| State | Description |
|---|---|
| VALIDATING | Dataset is being validated for format requirements |
| PENDING | Job is queued and waiting for resources |
| RUNNING | Actively processing requests |
| COMPLETED | All requests successfully processed |
| FAILED | Unrecoverable error occurred (check status message) |
| EXPIRED | Exceeded 24-hour limit (completed requests are saved) |
Supported models
- Base Models – Any model in the Model Library
- Custom Models – Your uploaded or fine-tuned models
Limits and constraints
- Per-request limits: Same as Chat Completion API limits
- Input dataset: Max 500MB
- Output dataset: Max 8GB (job may expire early if reached)
- Job timeout: 24 hours maximum
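The 8GB output cap is easy to hit with large datasets and generous max_tokens values. As a rough back-of-the-envelope check (the bytes-per-token figure and per-row overhead below are assumptions, not measurements), you can estimate the worst-case output size before submitting:

```python
# Sketch: rough worst-case output size estimate (all constants are assumptions).
BYTES_PER_TOKEN = 4          # rough average for English text
ROW_OVERHEAD_BYTES = 2_000   # JSON wrapper, ids, metadata per output row

def estimated_output_bytes(num_rows: int, max_tokens: int) -> int:
    return num_rows * (max_tokens * BYTES_PER_TOKEN + ROW_OVERHEAD_BYTES)

size = estimated_output_bytes(num_rows=1_000_000, max_tokens=2048)
print(f"worst case ≈ {size / 1e9:.1f} GB (output limit is 8 GB)")
```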
Handling expired jobs
Jobs expire after 24 hours. Completed rows are billed and saved to the output dataset.

Resume processing: resume an expired job to process only the unfinished or failed requests from the original job.

Download complete lineage: download all datasets in the continuation chain.
Best practices
- Validate thoroughly: Check dataset format before uploading
- Descriptive IDs: Use meaningful custom_id values for tracking
- Optimize tokens: Set reasonable max_tokens limits
- Monitor progress: Track long-running jobs regularly
- Cache optimization: Place static content first in prompts (see the sketch below)
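To benefit from prompt caching, keep the content that is identical across requests (instructions, few-shot examples) at the front of every prompt and append the per-request part last. For the chat-style bodies sketched in step 1, that might look like:

```python
# Sketch: put the static, shared prefix first so it can be cached across requests;
# only the final user message varies per row.
STATIC_SYSTEM_PROMPT = (
    "You are a support assistant. Follow the policy below.\n"
    "<long, identical instructions and few-shot examples go here>"
)

def build_messages(user_query: str) -> list[dict]:
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # static content first
        {"role": "user", "content": user_query},              # variable content last
    ]
```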