- Load balancing: Yes, supported out of the box
- Continuous batching: Yes, supported
- Batch inference: Yes, supported via the Batch API
- Streaming: Yes, supported
Models & Inference
Does the API support batching and load balancing?
Current capabilities include:
There’s a model I would like to use that isn’t available on Fireworks. Can I request it?
Previous
What factors affect the number of simultaneous requests that can be handled?
Next