Fireworks AI Docs home page
Search...
⌘K
Ask AI
Community
Status
Dashboard
Dashboard
Search...
Navigation
Billing & Pricing
Is prompt caching billed differently for serverless models?
Documentation
SDKs
CLI
API Reference
Model Library
FAQ
Changelog
Account & Access
Company account access
Close account
Multiple accounts login
GitHub authentication email
LinkedIn authentication email
Billing & Pricing
Pricing structure
Fine-tuned model fees
Bulk usage discounts
Serverless discounts
Credits & billing system
Account suspension reasons
$1 credit depleted
Missing credits issue
Invoice vs credits
Credit receipts
Models API billing
Serverless prompt caching billing
Input image pricing
Deployment & Infrastructure
Performance optimization
Performance benchmarking
Model latency ranges
Performance factors
Performance best practices
Serverless latency guarantees
Serverless SLAs
Serverless quotas
Fine-tuned serverless costs
Model removal notice
Serverless timeout issues
System scaling
Auto scaling support
Throughput capacity
Request handling factors
Autoscaling cost impact
On-demand rate limits
On-demand billing
GPU deployment billing
GPU selection guide
Custom model deployment issues
Deployment performance expectations
Performance consultation
Single replica optimization
Models & Inference
Custom base models
Serverless model availability
Model availability requests
Llama 3.1 405B quantization
API batching & load balancing
Request handling capacity
Safety filter controls
Token limit controls
Streaming performance metrics
FLUX multiple images
FLUX image-to-image
FLUX custom LoRA
SDXL ControlNet sizing
Fine-tuning
Fine-tuning service
Fine-tuning model support
Fine-tuned model access
firectl invalid ID errors
Llama 3.1 LoRA deployment
Security & Compliance
Data encryption at rest
Data encryption in transit
Client-side encryption options
Security policy documentation
LLM model guardrails
Private network connections
Security certifications
Support & General
General support
Performance support
Deployment regions
Support options
Support process
Enterprise support
Enterprise support Slack
Enterprise support tiers & SLAs
Enterprise tier quotas
Billing & Pricing
Is prompt caching billed differently for serverless models?
Copy page
No,
prompt caching does not affect billing for serverless models
. You are charged the same amount regardless of whether your request benefits from prompt caching or not.
Was this page helpful?
Yes
No
Are calls to the Models API billable?
Previous
How many tokens per image?
Next
Assistant
Responses are generated using AI and may contain mistakes.