On-demand deployments come with automatic cost optimization features:

  • Default autoscaling: Automatically scales to 0 replicas when not in use
  • Pay for what you use: Charged only for GPU time when replicas are active
  • Flexible configuration: Customize autoscaling behavior to match your needs

Best practices for cost management:

  1. Leverage default autoscaling: Keep the default scale-to-zero behavior so that idle deployments stop accruing GPU charges
  2. Customize carefully: You can modify autoscaling behavior using our configuration options, but preventing scale-to-zero results in continuous GPU charges, even while the deployment is idle
  3. Consider your use case: For intermittent or low-frequency usage, serverless deployments might be more cost-effective
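
To make the trade-off in these practices concrete, here is a minimal sketch that compares a deployment allowed to scale to zero with one kept running continuously. The function name `monthly_gpu_cost` and the hourly rate are hypothetical placeholders, not real pricing or part of any API. The model is also simplified: it assumes billing tracks active replica time exactly and ignores any idle window before scale-down.

```python
def monthly_gpu_cost(active_hours_per_day: float,
                     hourly_rate: float,
                     gpus: int = 1,
                     days: int = 30) -> float:
    """Estimate GPU cost when charges accrue only while replicas are active."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return active_hours_per_day * days * hourly_rate * gpus


# Placeholder rate in USD per GPU-hour; substitute your actual rate.
HYPOTHETICAL_RATE = 4.00

# Deployment that scales to zero and is active about 2 hours per day.
scale_to_zero = monthly_gpu_cost(2, HYPOTHETICAL_RATE)

# Same deployment with scale-to-zero disabled: billed around the clock.
always_on = monthly_gpu_cost(24, HYPOTHETICAL_RATE)

print(f"Scale-to-zero: ${scale_to_zero:.2f} per month")
print(f"Always-on:     ${always_on:.2f} per month")
```

With these placeholder numbers, two active hours per day comes to 240 per month, compared with 2880 when scale-to-zero is disabled. Running the same estimate with your own usage pattern can help you decide whether an on-demand or serverless deployment fits better.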

For detailed configuration options, see our deployment guide.