Request throughput scales with your GPU allocation. Base allocations include:

  • Up to 8 A100 GPUs
  • Up to 8 H100 GPUs

On-demand deployments offer several advantages:

  • Predictable pricing based on time used, not token input/output
  • Consistent latency and performance, unaffected by traffic on the serverless platform
  • Choice of GPUs, including A100s and H100s

Need more GPUs? Contact us to discuss a higher allocation for your use case.