Deployment & Infrastructure
What are the rate limits for on-demand deployments?
Request throughput scales with your GPU allocation. Base allocations include:
- Up to 8 A100 GPUs
- Up to 8 H100 GPUs
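As a rough illustration, creating an on-demand deployment generally means declaring the accelerator type and count up front, within your allocation. The endpoint, payload fields, and authentication below are hypothetical placeholders, not the platform's documented API; this is only a sketch of the general shape of such a request.

```python
import os
import requests

# Hypothetical example: requesting an on-demand deployment with a specific
# GPU type and count. The URL, payload fields, and auth scheme are
# illustrative placeholders, not the platform's actual API.
API_BASE = "https://api.example.com/v1"   # placeholder endpoint
API_KEY = os.environ["EXAMPLE_API_KEY"]   # placeholder credential

payload = {
    "model": "my-fine-tuned-model",       # placeholder model identifier
    "accelerator_type": "H100",           # A100 or H100, per the base allocation
    "accelerator_count": 4,               # must stay within the allocation (up to 8)
}

resp = requests.post(
    f"{API_BASE}/deployments",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # deployment metadata, e.g. an ID and current status
```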
On-demand deployments offer several advantages:
- Predictable pricing based on time units, not token I/O
- Consistent latency and performance, unaffected by other traffic on the serverless platform
- Choice of GPUs, including A100s and H100s
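To make the time-based pricing concrete: an estimate is simply GPU count × hours × an hourly rate, with no dependence on how many tokens you process. The hourly rates below are assumed placeholder values for illustration only, not published pricing.

```python
# Hypothetical cost estimate for time-based billing. The hourly rates are
# assumed placeholder values for illustration, not published pricing.
HOURLY_RATE_USD = {"A100": 3.00, "H100": 6.00}

def estimate_cost(gpu_type: str, gpu_count: int, hours: float) -> float:
    """Estimated cost of running `gpu_count` GPUs of `gpu_type` for `hours`."""
    return HOURLY_RATE_USD[gpu_type] * gpu_count * hours

# e.g. 4 H100s for an 8-hour batch job at the assumed rate:
print(f"${estimate_cost('H100', 4, 8):.2f}")  # 4 * 8 * 6.00 = $192.00
```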
Need more GPUs? Contact us to discuss higher allocations for your specific use case.