Performance improvement
Q: What are the techniques to improve performance?
To optimize model performance, consider the following techniques:
- Quantization
- Check model type: Determine whether the model is GQA (Grouped Query Attention) or MQA (Multi-Query Attention); a quick way to check is sketched after this list.
- Increase batch size to improve throughput.
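As a minimal sketch of the model-type check, the snippet below reads the attention-head counts from a Hugging Face-style model config (the `transformers` dependency and the example model ID are illustrative assumptions, not requirements). Fewer KV heads mean a smaller KV cache, which generally allows larger batches and better throughput.

```python
# A minimal sketch, assuming the model ships a Hugging Face-style config;
# the example model ID is illustrative.
from transformers import AutoConfig

def attention_type(model_id: str) -> str:
    """Classify the attention scheme: MQA uses a single KV head, GQA uses
    fewer KV heads than query heads, classic MHA uses one per query head."""
    cfg = AutoConfig.from_pretrained(model_id)
    n_heads = cfg.num_attention_heads
    n_kv = getattr(cfg, "num_key_value_heads", n_heads)  # absent => MHA
    if n_kv == 1:
        return "MQA"
    return "GQA" if n_kv < n_heads else "MHA"

print(attention_type("meta-llama/Llama-3.1-8B-Instruct"))  # -> GQA
```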
Benchmarking
Q: How can we benchmark?
There are multiple ways to benchmark your deployment’s performance:
- Use our open-source load-testing tool
- Develop custom performance testing scripts; a minimal example follows this list
- Integrate with monitoring tools to track metrics
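For example, a custom script can measure end-to-end request latency against an OpenAI-compatible chat completions endpoint. This is a minimal sketch: the endpoint URL and model name are illustrative placeholders to replace with your own deployment's values, and a `FIREWORKS_API_KEY` environment variable is assumed.

```python
import os
import statistics
import time

import requests

# Illustrative values -- substitute your own deployment's endpoint and model.
URL = "https://api.fireworks.ai/inference/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"}
PAYLOAD = {
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "messages": [{"role": "user", "content": "Summarize the plot of Hamlet."}],
    "max_tokens": 128,
}

latencies = []
for _ in range(20):  # increase the sample size for stable percentiles
    start = time.perf_counter()
    requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=60).raise_for_status()
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50 latency: {statistics.median(latencies):.3f}s")
print(f"p95 latency: {latencies[int(0.95 * len(latencies)) - 1]:.3f}s")
```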
Model latency
Q: What’s the latency for small, medium, and large LLMs?
Model latency and performance depend on various factors:
- Input/output prompt lengths
- Model quantization (its effect on KV-cache size is sketched after this list)
- Model sharding
- Disaggregated prefill processes
- Hardware configuration
- Multiple layers of caching
- Fireworks optimizations (e.g., FireAttention)
- LoRA adapters (Low-Rank Adaptation)
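To see how two of these factors interact, here is a back-of-the-envelope estimate of KV-cache size under fp16 versus fp8 quantization. The layer and head counts below are Llama-3.1-8B-like and purely illustrative; a GQA model with fewer KV heads shrinks this footprint proportionally.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_val: int) -> int:
    """Footprint of the KV cache: two tensors (K and V) per layer, each of
    shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_val

# Llama-3.1-8B-like shape: 32 layers, 8 KV heads (GQA), head_dim 128.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192, batch=8)
print(f"fp16: {kv_cache_bytes(**shape, bytes_per_val=2) / 2**30:.1f} GiB")  # 8.0 GiB
print(f"fp8:  {kv_cache_bytes(**shape, bytes_per_val=1) / 2**30:.1f} GiB")  # 4.0 GiB
```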
Performance factors
Q: What factors affect model latency and performance?
Key factors that impact latency and performance include:
- Model architecture and size
- Hardware configuration
- Network conditions
- Request patterns
- Batch size settings; the throughput effect of concurrency is sketched after this list
- Caching implementation
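As an illustration of how request patterns and batch size interact, the sketch below ramps up client-side concurrency (which in turn drives server-side batching) and reports throughput at each level. The endpoint and model values are the same illustrative placeholders as in the benchmarking sketch above.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Same illustrative endpoint/model as the benchmarking sketch above.
URL = "https://api.fireworks.ai/inference/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"}
PAYLOAD = {
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
}

def one_request(_: int) -> None:
    requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=120).raise_for_status()

for concurrency in (1, 4, 16):
    n_requests = concurrency * 4
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(one_request, range(n_requests)))
    elapsed = time.perf_counter() - start
    print(f"concurrency={concurrency:2d}: {n_requests / elapsed:.2f} req/s")
```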
Best practices
Q: What are the best practices for optimizing performance?
For optimal performance, follow these recommendations:
- Choose an appropriate model size for your specific use case.
- Implement batching strategies to improve efficiency.
- Use quantization where applicable to reduce computational load.
- Monitor and adjust scaling parameters to meet demand.
- Optimize prompt lengths to reduce processing time.
- Implement caching to minimize repeated calculations; a minimal response cache is sketched after this list.
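One way application-level caching might look, assuming an OpenAI-compatible Python client: `cached_completion` and `_cache` are hypothetical helpers, not part of any SDK, and caching only makes sense with deterministic sampling (temperature 0).

```python
import hashlib
import json

_cache: dict = {}  # prompt fingerprint -> completion text

def cached_completion(client, model: str, prompt: str, **params) -> str:
    """Return a cached answer for byte-identical (model, prompt, params)."""
    params.setdefault("temperature", 0)  # cache only deterministic outputs
    key = hashlib.sha256(
        json.dumps({"model": model, "prompt": prompt, **params},
                   sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            **params,
        )
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```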
Additional resources
- Discord Community: discord.gg/fireworks-ai
- Email Support: inquiries@fireworks.ai
- Documentation: Fireworks.ai docs