Performance benchmarking

Understanding how your deployment performs under different load conditions is essential for production readiness. Fireworks provides tools and best practices for benchmarking throughput and latency and for identifying bottlenecks.

Fireworks Benchmark Tool

Use our open-source benchmarking tool, the Fireworks Benchmark Tool, to measure and optimize your deployment’s performance. This tool allows you to:

Test throughput and latency under various load conditions
Simulate production traffic patterns
Identify performance bottlenecks
Compare different deployment configurations

Installation

git clone https://github.com/fw-ai/benchmark.git
cd benchmark
pip install -r requirements.txt

Basic usage

Run a basic benchmark test:

python benchmark.py \
  --model "accounts/fireworks/models/llama-v3p1-8b-instruct" \
  --deployment "your-deployment-id" \
  --num-requests 1000 \
  --concurrency 10

Key metrics to monitor

When benchmarking your deployment, focus on these key metrics:

Throughput: Requests per second (RPS) your deployment can handle
Latency: Time to first token (TTFT) and end-to-end response time
Token generation rate: Tokens per second during generation
Error rate: Failed requests under load
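
As a rough illustration of how these four metrics relate, the sketch below derives them from per-request timestamps. It is a minimal example under stated assumptions, not the benchmark tool's own implementation: the `RequestSample` record, its field names, and the `summarize` helper are hypothetical, and successful samples are assumed to carry a first-token timestamp.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSample:
    """One benchmarked request (hypothetical record, not a Fireworks type)."""
    start_s: float                  # time the request was sent
    first_token_s: Optional[float]  # time the first token arrived, None on failure
    end_s: float                    # time the response finished or failed
    output_tokens: int              # tokens generated (0 on failure)
    ok: bool                        # True if the request succeeded


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Compute throughput, latency, generation rate, and error rate."""
    succeeded = [s for s in samples if s.ok]
    # Wall-clock span of the whole run, from first send to last completion.
    wall_s = max(s.end_s for s in samples) - min(s.start_s for s in samples)
    ttft = [s.first_token_s - s.start_s for s in succeeded]
    e2e = [s.end_s - s.start_s for s in succeeded]
    # Per-request generation rate: tokens after the first, over decode time.
    gen_rates = [
        (s.output_tokens - 1) / (s.end_s - s.first_token_s)
        for s in succeeded
        if s.output_tokens > 1 and s.end_s > s.first_token_s
    ]
    return {
        "throughput_rps": len(succeeded) / wall_s if wall_s > 0 else 0.0,
        "ttft_p50_s": percentile(ttft, 50) if ttft else None,
        "ttft_p99_s": percentile(ttft, 99) if ttft else None,
        "e2e_p50_s": percentile(e2e, 50) if e2e else None,
        "e2e_p99_s": percentile(e2e, 99) if e2e else None,
        "mean_tokens_per_s": sum(gen_rates) / len(gen_rates) if gen_rates else None,
        "error_rate": 1 - len(succeeded) / len(samples),
    }
```

Reporting percentiles (p50, p99) rather than only averages tends to be more useful here, since a small number of slow requests can dominate user-perceived latency.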

Custom benchmarking

You can also develop custom performance testing scripts or integrate with monitoring tools to track metrics over time. Consider:

Using production-like request patterns and payloads
Testing with various concurrency levels
Monitoring resource utilization (GPU, memory, network)
Testing autoscaling behavior under load
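
A custom script along these lines could be structured as follows. This is a minimal sketch, assuming an async client: `run_load` and `fake_request` are hypothetical names, and `fake_request` only sleeps, so it would need to be replaced with a real call to your deployment using production-like payloads.

```python
import asyncio
import time


async def run_load(send_request, num_requests, concurrency):
    """Call send_request num_requests times with at most `concurrency` in flight.

    send_request is any zero-argument coroutine function; a raised exception
    counts as a failed request. Returns (latencies_s, failures, wall_s).
    """
    semaphore = asyncio.Semaphore(concurrency)
    latencies = []
    failures = 0

    async def one_call():
        nonlocal failures
        async with semaphore:
            started = time.perf_counter()
            try:
                await send_request()
            except Exception:
                failures += 1
                return
            latencies.append(time.perf_counter() - started)

    began = time.perf_counter()
    await asyncio.gather(*(one_call() for _ in range(num_requests)))
    return latencies, failures, time.perf_counter() - began


async def fake_request():
    """Stand-in for a real client call; sleeps instead of hitting an endpoint."""
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    # Sweep a few concurrency levels and report requests per second for each.
    for level in (1, 4, 16):
        lat, failed, wall = asyncio.run(run_load(fake_request, 32, level))
        print(f"concurrency={level} rps={len(lat) / wall:.1f} failed={failed}")
```

The semaphore bounds the number of in-flight requests, which keeps the concurrency level an explicit, repeatable parameter of each run.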

Best practices

Warm up your deployment: Run a few requests before benchmarking to ensure models are loaded
Test realistic scenarios: Use request patterns and payloads similar to your production workload
Gradually increase load: Start with low concurrency and raise it step by step to find your deployment’s limits
Monitor for errors: Track error rates and response codes to identify issues under load
Compare configurations: Test different deployment shapes, quantization levels, and hardware to optimize cost and performance
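
The warm-up, gradual ramp, and error-monitoring practices above can be combined in a small driver loop. The sketch below is illustrative only: `find_max_concurrency`, the `measure` and `warmup` callables, and the default thresholds (1% errors, 2 seconds p99) are assumptions chosen for the example, not recommended values.

```python
def find_max_concurrency(measure, levels, max_error_rate=0.01,
                         max_p99_s=2.0, warmup=None):
    """Step through ascending concurrency levels and return the last one
    that stayed within the error-rate and p99 latency limits.

    measure(level) must return (error_rate, p99_latency_s) for one run at
    that concurrency; it is supplied by the caller. Returns None if even
    the first level exceeds a limit.
    """
    if warmup is not None:
        # A few untimed requests so model loading does not skew the results.
        warmup()
    best = None
    for level in levels:
        error_rate, p99_s = measure(level)
        if error_rate > max_error_rate or p99_s > max_p99_s:
            break  # limit reached; higher levels are not attempted
        best = level
    return best
```

Running the same loop against each deployment configuration gives a like-for-like number to compare across shapes, quantization levels, and hardware.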

Next steps

Autoscaling

Client-side optimization
