- Scaling from 0: No minimum cost when scaled to zero
- Scaling up: Each new replica adds to your total cost proportionally. For example:
- Scaling from 1 to 2 replicas doubles your GPU costs
- If each replica uses multiple GPUs, costs scale accordingly (e.g., scaling from 1 to 2 replicas with 2 GPUs each means paying for 4 GPUs total)
Deployment & Infrastructure
How does autoscaling affect my costs?
What factors affect the number of simultaneous requests that can be handled?
Previous
What are the rate limits for on-demand deployments?
Next