Export metrics from your dedicated deployments to your observability stack
base_model
: The base model identifier (e.g., "accounts/fireworks/models/deepseek-v3")
deployment
: Full deployment path (e.g., "accounts/account-name/deployments/deployment-id")
deployment_account
: The account name
deployment_id
: The deployment identifier
request_counter_total:sum_by_deployment
: Request rate per deployment
tokens_cached_prompt_total:sum_by_deployment
: Rate of cached prompt tokens per deployment
tokens_prompt_total:sum_by_deployment
: Rate of total prompt tokens processed per deployment
latency_generation_per_token_ms_bucket:sum_by_deployment
: Per-token generation time distribution
latency_generation_queue_ms_bucket:sum_by_deployment
: Time spent waiting in generation queue
latency_overall_ms_bucket:sum_by_deployment
: End-to-end request latency distribution
latency_to_first_token_ms_bucket:sum_by_deployment
: Time to first token distribution
latency_prefill_ms_bucket:sum_by_deployment
: Prefill processing time distribution
latency_prefill_queue_ms_bucket:sum_by_deployment
: Time spent waiting in prefill queue
tokens_generated_per_request_bucket:sum_by_deployment
: Distribution of generated tokens per request
tokens_prompt_per_request_bucket:sum_by_deployment
: Distribution of prompt tokens per request
generator_kv_blocks_fraction:avg_by_deployment
: Average fraction of KV cache blocks in use
generator_kv_slots_fraction:avg_by_deployment
: Average fraction of KV cache slots in use
generator_model_forward_time:avg_by_deployment
: Average time spent in model forward pass
requests_coordinator_concurrent_count:avg_by_deployment
: Average number of concurrent requests
prefiller_prompt_cache_ttl:avg_by_deployment
: Average prompt cache time-to-live
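
As an illustration of how these labels and metrics might be consumed once they reach your observability stack, here is a minimal Python sketch. It rests on assumptions that are not confirmed by this page: that the `deployment` label always has the four-segment shape shown in the example above, and that the `_bucket` metrics follow the common Prometheus histogram convention of cumulative values keyed by an upper bound. The function names (`split_deployment_path`, `cached_prompt_ratio`, `estimate_quantile`) and the sample numbers are hypothetical and exist only for this example.

```python
import math


def split_deployment_path(path: str) -> tuple[str, str]:
    """Split 'accounts/<account>/deployments/<id>' into (account, id).

    Assumes the four-segment shape from the `deployment` label example.
    """
    parts = path.split("/")
    if len(parts) != 4 or parts[0] != "accounts" or parts[2] != "deployments":
        raise ValueError(f"unexpected deployment path: {path!r}")
    return parts[1], parts[3]


def cached_prompt_ratio(cached_rate: float, prompt_rate: float) -> float:
    """Fraction of prompt tokens served from cache.

    cached_rate: a sample of tokens_cached_prompt_total:sum_by_deployment
    prompt_rate: a sample of tokens_prompt_total:sum_by_deployment
    Returns 0.0 when there is no prompt traffic.
    """
    if prompt_rate <= 0:
        return 0.0
    return cached_rate / prompt_rate


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_value) pairs sorted by upper_bound;
    the last bound may be math.inf. Interpolates linearly inside the
    bucket that contains the target rank (an assumed convention, similar
    in spirit to Prometheus's histogram_quantile).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("buckets must not be empty")
    total = buckets[-1][1]
    if total <= 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_value = 0.0, 0.0
    for bound, value in buckets:
        if value >= rank:
            if math.isinf(bound) or value == prev_value:
                # Cannot interpolate; fall back to the last finite bound.
                return prev_bound
            fraction = (rank - prev_value) / (value - prev_value)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_value = bound, value
    return prev_bound


# Hypothetical sample data, for illustration only.
account, deployment_id = split_deployment_path(
    "accounts/account-name/deployments/deployment-id"
)
hit_ratio = cached_prompt_ratio(cached_rate=250.0, prompt_rate=1000.0)
median_latency_ms = estimate_quantile(
    0.5, [(100.0, 10.0), (200.0, 30.0), (400.0, 40.0), (math.inf, 40.0)]
)
```

In practice a query language such as PromQL would usually do the ratio and quantile calculations on the server side; the sketch only makes the arithmetic behind the counter and `_bucket` metrics explicit.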