Deployment & Infrastructure
How does billing work for on-demand deployments?
On-demand deployments are billed for GPU time and come with automatic cost optimization features:
- Default autoscaling: Automatically scales to 0 replicas when not in use
- Pay for what you use: Charged only for GPU time when replicas are active
- Flexible configuration: Customize autoscaling behavior to match your needs
Best practices for cost management:
- Leverage default autoscaling: By default, deployments scale down to 0 replicas when not in use, so idle time is not billed
- Customize carefully: You can modify autoscaling behavior using our configuration options, but preventing scale-to-zero results in continuous GPU charges, even while the deployment is idle
- Consider your use case: For intermittent or low-frequency usage, serverless deployments might be more cost-effective
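To make the trade-off concrete, the following is a rough sketch of the billing arithmetic described above, not an official pricing calculator. The function name `estimate_gpu_cost`, its parameters, and the hourly rate are hypothetical and chosen only for illustration. It assumes a single replica and ignores any idle period a replica may stay up before it scales down.

```python
def estimate_gpu_cost(
    active_hours: float,
    window_hours: float,
    hourly_rate_per_gpu: float,
    gpus_per_replica: int = 1,
    scale_to_zero: bool = True,
) -> float:
    """Estimate GPU charges for one replica over a billing window.

    Simplified model (an assumption, not the provider's exact billing rules):
    - with scale-to-zero, only the hours a replica is active are billed;
    - without it, the replica is billed for the whole window.
    """
    if not 0 <= active_hours <= window_hours:
        raise ValueError("active_hours must be between 0 and window_hours")
    billed_hours = active_hours if scale_to_zero else window_hours
    return billed_hours * hourly_rate_per_gpu * gpus_per_replica


if __name__ == "__main__":
    RATE = 3.00        # hypothetical price per GPU-hour, not a real quote
    WINDOW = 30 * 24   # a 30-day window, in hours
    ACTIVE = 40        # hours the deployment actually served traffic

    with_scale_to_zero = estimate_gpu_cost(ACTIVE, WINDOW, RATE)
    always_on = estimate_gpu_cost(ACTIVE, WINDOW, RATE, scale_to_zero=False)

    print(f"Scale-to-zero enabled:  ${with_scale_to_zero:,.2f}")
    print(f"Scale-to-zero disabled: ${always_on:,.2f}")
```

Under these assumed numbers, the always-on configuration is billed for all 720 hours in the window, while the default configuration is billed only for the 40 active hours. Substitute your own rate and usage pattern to see whether keeping a replica warm is worth the cost.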
For detailed configuration options, see our deployment guide.