Fireworks Benchmark Tool
Use our open-source benchmarking tool to measure and optimize your deployment’s performance: Fireworks Benchmark Tool
This tool allows you to:
- Test throughput and latency under various load conditions
- Simulate production traffic patterns
- Identify performance bottlenecks
- Compare different deployment configurations
Installation
Basic usage
Run a basic benchmark test:
Key metrics to monitor
When benchmarking your deployment, focus on these key metrics:
- Throughput: Requests per second (RPS) your deployment can handle
- Latency: Time to first token (TTFT) and end-to-end response time
- Token generation rate: Tokens per second during generation
- Error rate: Failed requests under load
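As a rough illustration of how these four metrics relate, the sketch below derives them from per-request timing records. The `RequestSample` record and the `summarize` function are hypothetical names introduced here for illustration; they are not the benchmark tool's output format or API.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class RequestSample:
    """One benchmarked request (hypothetical record shape, not the tool's format)."""
    start: float                  # seconds: when the request was sent
    first_token: Optional[float]  # seconds: when the first token arrived (None on failure)
    end: float                    # seconds: when the response finished or failed
    output_tokens: int            # number of tokens generated
    ok: bool                      # False if the request failed


def summarize(samples: list[RequestSample]) -> dict[str, float]:
    """Compute throughput, latency, token rate, and error rate for one run."""
    if not samples:
        raise ValueError("no samples to summarize")
    good = [s for s in samples if s.ok]
    streamed = [s for s in good if s.first_token is not None]
    # Wall-clock span of the whole run, from first send to last completion.
    wall = max(s.end for s in samples) - min(s.start for s in samples)
    ttfts = [s.first_token - s.start for s in streamed]
    e2e = [s.end - s.start for s in good]
    # Generation rate excludes the wait for the first token, pooled across requests.
    gen_time = sum(s.end - s.first_token for s in streamed)
    gen_tokens = sum(s.output_tokens for s in streamed)
    return {
        "rps": len(good) / wall if wall > 0 else 0.0,
        "ttft_p50_s": median(ttfts) if ttfts else float("nan"),
        "e2e_p50_s": median(e2e) if e2e else float("nan"),
        "tokens_per_s": gen_tokens / gen_time if gen_time > 0 else 0.0,
        "error_rate": 1 - len(good) / len(samples),
    }
```

Medians are used here for brevity; in practice you would likely also report tail percentiles such as p95 or p99, since averages hide slow requests.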
Custom benchmarking
You can also develop custom performance testing scripts or integrate with monitoring tools to track metrics over time. Consider:
- Using production-like request patterns and payloads
- Testing with various concurrency levels
- Monitoring resource utilization (GPU, memory, network)
- Testing autoscaling behavior under load
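A custom script that tests several concurrency levels might look like the sketch below. It assumes you supply your own async `send` function that issues one request and returns `True` on success; `fake_send`, `run_level`, and `sweep` are hypothetical names, and `fake_send` is only a stand-in that sleeps instead of calling a deployment.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Assumed shape: an async callable that sends one request and returns True on success.
SendFn = Callable[[], Awaitable[bool]]


async def run_level(send: SendFn, concurrency: int, total: int) -> dict[str, float]:
    """Send `total` requests with at most `concurrency` in flight at once."""
    if total <= 0 or concurrency <= 0:
        raise ValueError("total and concurrency must be positive")
    limit = asyncio.Semaphore(concurrency)
    latencies: list[float] = []
    failures = 0

    async def one() -> None:
        nonlocal failures
        async with limit:
            t0 = time.perf_counter()
            try:
                ok = await send()
            except Exception:
                ok = False  # count exceptions as failed requests
            if ok:
                latencies.append(time.perf_counter() - t0)
            else:
                failures += 1

    t_start = time.perf_counter()
    await asyncio.gather(*(one() for _ in range(total)))
    elapsed = time.perf_counter() - t_start
    return {
        "concurrency": concurrency,
        "rps": len(latencies) / elapsed if elapsed > 0 else 0.0,
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else float("nan"),
        "error_rate": failures / total,
    }


async def sweep(send: SendFn, levels: list[int], per_level: int) -> list[dict[str, float]]:
    """Run each concurrency level in turn and collect one summary per level."""
    return [await run_level(send, c, per_level) for c in levels]


async def fake_send() -> bool:
    """Stand-in for a real request; replace with a call to your deployment."""
    await asyncio.sleep(0.01)
    return True
```

Calling `asyncio.run(sweep(fake_send, [1, 2, 4, 8], per_level=50))` returns one summary dictionary per level. To make the results meaningful, replace `fake_send` with a function that sends production-like payloads to your deployment.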
Best practices
- Warm up your deployment: Run a few requests before benchmarking to ensure models are loaded
- Test realistic scenarios: Use request patterns and payloads similar to your production workload
- Gradually increase load: Start with low concurrency and gradually increase to find your deployment’s limits
- Monitor for errors: Track error rates and response codes to identify issues under load
- Compare configurations: Test different deployment shapes, quantization levels, and hardware to optimize cost and performance
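The warm-up and gradual-ramp practices above can be sketched as a small driver loop. This is a minimal sketch under stated assumptions: `measure` is a hypothetical callable you provide that runs one benchmark pass at a given concurrency and returns a `LevelResult`; the doubling schedule and the 1% error budget are illustrative defaults, not recommendations from the benchmark tool.

```python
from typing import Callable, NamedTuple, Optional


class LevelResult(NamedTuple):
    """Summary of one benchmark pass (hypothetical shape)."""
    concurrency: int
    rps: float
    error_rate: float


def ramp(
    measure: Callable[[int], LevelResult],
    start: int = 1,
    max_concurrency: int = 64,
    max_error_rate: float = 0.01,
    warmup_runs: int = 1,
) -> list[LevelResult]:
    """Warm up, then double concurrency until errors exceed the budget."""
    if start < 1:
        raise ValueError("start must be at least 1")
    # Warm-up passes are discarded so cold-start cost does not skew results.
    for _ in range(warmup_runs):
        measure(start)
    results: list[LevelResult] = []
    level = start
    while level <= max_concurrency:
        result = measure(level)
        results.append(result)
        if result.error_rate > max_error_rate:
            break  # past the deployment's limit; stop increasing load
        level *= 2
    return results


def best_level(
    results: list[LevelResult], max_error_rate: float = 0.01
) -> Optional[LevelResult]:
    """Return the highest-throughput level that stayed within the error budget."""
    healthy = [r for r in results if r.error_rate <= max_error_rate]
    return max(healthy, key=lambda r: r.rps) if healthy else None
```

Running the same ramp against each deployment configuration and comparing the `best_level` results gives a like-for-like view of where each one saturates.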