How to get performance metrics for streaming responses?
Performance Metrics Overview
The Inference API returns several per-request performance metrics with each response. They are useful for one-off debugging, or clients can log them to their preferred observability tool. For aggregate metrics, see the usage dashboard.
Non-streaming requests: Performance metrics are always included in the response headers (e.g., fireworks-prompt-tokens, fireworks-server-time-to-first-token); see the header-reading sketch after this list.
Streaming requests: Only selected performance metrics, such as fireworks-server-time-to-first-token, are available in the response headers, because HTTP headers must be sent before the first token can be streamed. Use the perf_metrics_in_response body parameter to include all metrics in the last SSE event of the response body.
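For non-streaming calls, the metrics can be read straight off the HTTP response headers. Below is a minimal sketch using the Python requests library; the model name and prompt are placeholders, and only the two headers named above are shown.

```python
import os
import requests

# Non-streaming request: performance metrics arrive as response headers.
resp = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",  # placeholder model
        "messages": [{"role": "user", "content": "Say hello"}],
    },
)
resp.raise_for_status()

# Headers named above; other fireworks-* headers may also be present.
print(resp.headers.get("fireworks-prompt-tokens"))
print(resp.headers.get("fireworks-server-time-to-first-token"))
```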
Using perf_metrics_in_response
To get performance metrics for streaming responses, set the perf_metrics_in_response parameter to true in your request. This will include performance data in the response body under the perf_metrics field.
Response Body Location
For streaming responses, performance metrics are included in the response body under the perf_metrics field of the final chunk (the one with finish_reason set). This is because headers may not be accessible to the client during streaming.
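As an illustration, the final event of the SSE stream then looks roughly like the sketch below; the chunk fields are abbreviated and the contents of perf_metrics are elided, since the exact metric names are listed in the API reference.

```text
data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"perf_metrics":{...}}

data: [DONE]
```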
Example with Fireworks Build SDK
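A minimal streaming sketch, assuming the Build SDK's LLM class with an OpenAI-compatible chat.completions.create method. The model name is a placeholder, and whether perf_metrics_in_response can be passed as a plain keyword argument may depend on the SDK version, so treat this as a sketch rather than definitive SDK usage.

```python
from fireworks import LLM

# Placeholder model and deployment settings; adjust to your account.
llm = LLM(model="llama-v3p1-8b-instruct", deployment_type="serverless")

stream = llm.chat.completions.create(
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
    # Assumption: the SDK forwards this request-body parameter as-is.
    perf_metrics_in_response=True,
)

perf_metrics = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # The final chunk (the one with finish_reason set) carries perf_metrics.
    # Depending on how the SDK models chunks, the field may be an attribute
    # or only visible via something like chunk.model_dump().
    perf_metrics = getattr(chunk, "perf_metrics", None) or perf_metrics

print()
print(perf_metrics)
```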
Example with cURL
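An equivalent cURL sketch against the OpenAI-compatible chat completions endpoint; the model name is a placeholder and FIREWORKS_API_KEY is assumed to be set. With stream set to true, the perf_metrics object arrives in the last data: event before data: [DONE].

```bash
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "messages": [{"role": "user", "content": "Say hello"}],
    "stream": true,
    "perf_metrics_in_response": true
  }'
```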
Available Metrics
For detailed information about all available performance metrics, see the API reference documentation.