Does the API support batching and load balancing?

Current capabilities include:

Load balancing: Yes, supported out of the box
Continuous batching: Yes, supported
Batch inference: Not currently supported (on the roadmap)
- Note: For batch use cases, we recommend sending multiple HTTP requests to the deployment in parallel while capping the number of in-flight requests at a fixed concurrency level.
Streaming: Yes, supported
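
The workaround for batch use cases (parallel HTTP requests at a fixed concurrency) could look roughly like the sketch below. It uses only the Python standard library. The endpoint URL, the payload shape, and the names `send_request` and `run_batch` are hypothetical placeholders, not part of the API, and authentication headers are left out; adapt all of these to your deployment.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint: replace with your deployment's actual URL.
ENDPOINT_URL = "https://example.invalid/v1/completions"


def send_request(payload, url=ENDPOINT_URL, timeout=60):
    """POST one JSON payload and return the decoded JSON response.

    The request format here is an assumption; match it to your deployment.
    """
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_batch(payloads, worker=send_request, max_concurrency=8):
    """Apply `worker` to every payload with at most `max_concurrency` in flight.

    The thread pool size is the concurrency cap. Results are returned in the
    same order as `payloads`; an exception raised by any call propagates
    when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(worker, payloads))
```

Calling `run_batch(payloads, max_concurrency=8)` with a list of request bodies returns the responses in input order. A reasonable starting point is a small `max_concurrency` that you raise gradually while watching latency and error rates, since the right value depends on the deployment.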