Building reliable applications requires handling network conditions, transient errors, and long-running requests. This guide covers recommended patterns for production use.

## Documentation Index
Fetch the complete documentation index at: https://docs.fireworks.ai/llms.txt
Use this file to discover all available pages before exploring further.
## Timeout configuration
Set timeouts based on your workload type:

| Workload | Recommended client timeout |
|---|---|
| Interactive / chat | 30–60 seconds |
| Agentic (tool calls, multi-step) | 5–30 minutes |
| Large model inference (long context) | 10–30 minutes |
| Batch job submission | 60 seconds (results are async) |
### Python SDK
### Raw HTTP
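For raw HTTP, the timeout is set on the request itself. A minimal sketch using only the standard library; the payload shape follows the OpenAI-compatible chat completions API, and the helper names (`build_request`, `chat_completion`) are illustrative:

```python
# Sketch: raw HTTP call with an explicit timeout, standard library only.
import json
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat_completion(api_key: str, payload: dict, timeout_s: float = 60.0) -> dict:
    req = build_request(api_key, payload)
    # `timeout` bounds the connect and each socket read; a slow streaming
    # response can still take longer overall than this value.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```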
## Retry logic
### Which errors are retryable
| Status | Meaning | Retry? |
|---|---|---|
| 429 | Rate limit | ✅ Yes — with backoff |
| 500 | Internal server error | ✅ Yes — transient |
| 502 | Bad gateway | ✅ Yes — transient |
| 503 | Service unavailable | ✅ Yes — with backoff |
| 504 | Gateway timeout | ✅ Yes — transient |
| 400 | Bad request | ❌ No — fix the request |
| 401 | Unauthorized | ❌ No — check API key |
| 404 | Not found | ❌ No — check model/deployment ID |
| 422 | Unprocessable entity | ❌ No — fix the request body |
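The table above reduces to a small predicate you can reuse in any retry loop; a sketch:

```python
# Retryable statuses from the table above: rate limits and transient
# 5xx errors are retried; other 4xx client errors are not.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def is_retryable(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUSES
```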
### Exponential backoff with jitter
### OpenAI SDK built-in retry
## Handling 429 rate limits
**On serverless:** limits scale automatically with sustained usage. For immediate capacity, contact support or switch to a dedicated deployment.

**On dedicated deployments:** increase concurrency by raising replica counts (for example with `firectl deployment update` and autoscaling settings). See Autoscaling.
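While capacity catches up, a reasonable client-side default is to honor a `Retry-After` header when one is present and fall back to jittered backoff otherwise. A sketch; whether a given 429 response carries `Retry-After` is an assumption, so check the headers you actually receive:

```python
import random

def retry_delay_for_429(headers: dict, attempt: int,
                        base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Prefer the server's Retry-After hint (seconds form) when present;
    otherwise fall back to full-jitter exponential backoff."""
    retry_after = headers.get("Retry-After") or headers.get("retry-after")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # HTTP-date form; fall through to backoff
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))
```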
## Long-running training jobs
For RL / RFT trainer jobs, use `reconnect_and_wait` on the job manager to recover from preemption or transient failures. See Trainer job manager for parameters and examples.
To preserve optimizer state across interruptions, set `dcp_save_interval` in your training config. See RFT parameters reference.
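As an illustration only: `dcp_save_interval` is the only key here taken from this guide, while the surrounding structure and the interval value are hypothetical; consult the RFT parameters reference for the real schema.

```python
# Hypothetical training-config fragment. Only `dcp_save_interval` is
# named in this guide; its value and the dict shape are illustrative.
training_config = {
    "dcp_save_interval": 100,  # distributed-checkpoint save interval (illustrative value)
}
```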