Costs & management

Deployment costs

Q: Are there costs associated with deploying fine-tuned models?

Fine-tuned (LoRA) models require a dedicated deployment. Here’s what you need to know.

What you pay for:

Deployment costs on a per-GPU-second basis for hosting the model
Usage costs on a per-token basis when the model is used for inference
The fine-tuning process itself, if applicable
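
To make the three cost components concrete, here is a minimal sketch of how they might combine into a single estimate. It assumes the components simply add up. The `Rates` class, the `estimate_cost` function, and every price shown are hypothetical placeholders, not actual Fireworks pricing; check the current pricing page for real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices; replace with the real published rates."""
    gpu_second_usd: float  # deployment price per GPU-second
    token_usd: float       # usage price per token


def estimate_cost(rates: Rates, gpus: int, seconds_deployed: float,
                  tokens: int, fine_tuning_usd: float = 0.0) -> float:
    """Sum the deployment, usage, and optional fine-tuning costs."""
    if gpus < 0 or seconds_deployed < 0 or tokens < 0 or fine_tuning_usd < 0:
        raise ValueError("all inputs must be non-negative")
    deployment = rates.gpu_second_usd * gpus * seconds_deployed
    usage = rates.token_usd * tokens
    return deployment + usage + fine_tuning_usd


if __name__ == "__main__":
    # Placeholder rates chosen only to illustrate the arithmetic.
    example_rates = Rates(gpu_second_usd=0.001, token_usd=0.000001)
    total = estimate_cost(example_rates, gpus=2, seconds_deployed=3600,
                          tokens=1_000_000, fine_tuning_usd=5.0)
    print(f"Estimated cost: ${total:.2f}")
```

Because deployment cost scales with GPU-seconds rather than with traffic, under these assumptions an idle deployment still accrues the first term even when the token count is zero.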

Deployment options:

Live-merge deployment: Deploy your LoRA model with weights merged into the base model for optimal performance
Multi-LoRA deployment: Deploy up to 100 LoRA models as addons on a single base model deployment
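
As a rough planning aid for the multi-LoRA option, the sketch below computes how many base model deployments a given number of LoRA models would need. It assumes all adapters share the same base model and that the 100-addon limit stated above is the only constraint. The function name `base_deployments_needed` is hypothetical and not part of any Fireworks API.

```python
# Limit taken from the text: up to 100 LoRA addons per base model deployment.
MAX_LORAS_PER_DEPLOYMENT = 100


def base_deployments_needed(num_lora_models: int) -> int:
    """Return the minimum number of base deployments for the given adapters."""
    if num_lora_models < 0:
        raise ValueError("num_lora_models must be non-negative")
    # Integer ceiling division, avoiding floating-point rounding.
    return (num_lora_models + MAX_LORAS_PER_DEPLOYMENT - 1) // MAX_LORAS_PER_DEPLOYMENT


if __name__ == "__main__":
    for count in (1, 100, 101, 250):
        print(count, "LoRA models ->", base_deployments_needed(count), "deployment(s)")
```

In practice, traffic and latency requirements may call for more deployments than this lower bound suggests.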

For more details, see the Deploying Fine Tuned Models guide.

Model availability

Q: Do you provide notice before removing model availability?

Yes, we provide advance notice before removing models from the serverless infrastructure:

Minimum 2 weeks’ notice before model removal
Longer notice periods and extended deprecation timelines may be provided for popular, higher-usage models

Best practices:

Monitor announcements regularly.
Prepare a migration plan in advance.
Test alternative models to ensure continuity.
Keep your contact information updated for timely notifications.

Additional resources

Discord Community: discord.gg/fireworks-ai
Email Support: inquiries@fireworks.ai
Documentation: Fireworks.ai docs
