- Load balancing: Yes, supported out of the box
- Continuous batching: Yes, supported
- Batch inference: Not currently supported (on the roadmap)
  - Note: For batch use cases, we recommend sending multiple HTTP requests to the deployment in parallel while holding concurrency at a fixed level.
- Streaming: Yes, supported
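
The batch workaround in the note above (parallel HTTP requests at a fixed concurrency) could be sketched roughly as follows. This is only an illustration using the Python standard library: the endpoint URL, the JSON payload shape, and the concurrency limit of 8 are assumptions, not values taken from the deployment, and `post_json` and `run_batch` are hypothetical helper names.

```python
import asyncio
import json
import urllib.request

# Hypothetical values: replace with your deployment's URL and a limit that suits it.
ENDPOINT_URL = "http://localhost:8000/v1/completions"
MAX_CONCURRENCY = 8


def post_json(url, payload):
    """Blocking POST of a JSON payload; returns the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


async def run_batch(payloads, send, max_concurrency=MAX_CONCURRENCY):
    """Call send(payload) for every payload, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def send_one(payload):
        async with semaphore:
            # send is a blocking function, so run it in a worker thread.
            return await asyncio.to_thread(send, payload)

    # gather returns results in the same order as the input payloads.
    return await asyncio.gather(*(send_one(p) for p in payloads))


# Example usage (assumed payload shape; requires a reachable endpoint):
#   payloads = [{"prompt": f"Question {i}"} for i in range(100)]
#   results = asyncio.run(run_batch(payloads, lambda p: post_json(ENDPOINT_URL, p)))
```

The semaphore is what keeps concurrency fixed: each request waits for a free slot before it is sent, so the deployment sees a steady number of in-flight requests that its continuous batching can work on. Note that `asyncio.to_thread` requires Python 3.9 or later, and its default thread pool also caps how many calls can run at once.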