How does billing work for on-demand deployments? - Fireworks AI Docs

On-demand deployments come with automatic cost optimization features:

Default autoscaling: Automatically scales to 0 replicas when not in use
Pay for what you use: Charged only for GPU time when replicas are active
Flexible configuration: Customize autoscaling behavior to match your needs

Best practices for cost management:

Leverage default autoscaling: The system automatically scales down deployments when not in use
Customize carefully: You can modify autoscaling behavior using our configuration options, but preventing scale-to-zero results in continuous GPU charges, even while the deployment is idle
Consider your use case: For intermittent or low-frequency usage, serverless deployments might be more cost-effective
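
To make the scale-to-zero trade-off concrete, here is a minimal sketch of the arithmetic in Python. The hourly rate, the usage pattern, and the `gpu_cost` helper are all hypothetical and chosen only for illustration; they are not Fireworks prices or part of any Fireworks API. The model is also simplified: it assumes you are billed purely for hours in which replicas are active and ignores scale-up and scale-down delays.

```python
# Simplified cost model: GPU charges accrue only while replicas are active.
# All numbers below are hypothetical, not actual Fireworks pricing.

def gpu_cost(active_hours: float, replicas: int, rate_per_gpu_hour: float,
             gpus_per_replica: int = 1) -> float:
    """Return the GPU charge for the given number of active hours."""
    return active_hours * replicas * gpus_per_replica * rate_per_gpu_hour

HYPOTHETICAL_RATE = 3.00   # assumed USD per GPU-hour, for illustration only
DAYS_IN_MONTH = 30

# Case 1: default autoscaling, deployment is active about 2 hours per day
# and scales to 0 replicas the rest of the time.
scale_to_zero = gpu_cost(active_hours=2 * DAYS_IN_MONTH, replicas=1,
                         rate_per_gpu_hour=HYPOTHETICAL_RATE)

# Case 2: scale-to-zero disabled, one replica stays up around the clock.
always_on = gpu_cost(active_hours=24 * DAYS_IN_MONTH, replicas=1,
                     rate_per_gpu_hour=HYPOTHETICAL_RATE)

print(f"Scale-to-zero: ${scale_to_zero:.2f} per month")
print(f"Always-on:     ${always_on:.2f} per month")
```

Under these assumed numbers, the always-on deployment costs twelve times as much as the one that scales to zero, because it is billed for 720 hours instead of 60. Substitute your own rate and usage pattern to estimate whether on-demand or serverless better fits your workload.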

For detailed configuration options, see our deployment guide.
