
Overview

Serverless inference is priced per token. Every text or vision request is billed across three dimensions:
  • Input tokens — what you send to the model.
  • Cached input tokens — input tokens served from the prompt cache, billed at a lower rate.
  • Output tokens — what the model generates.
Embeddings are billed only on input tokens.

How pricing works

  • Prices below are per 1 million tokens in US dollars.
  • Batch inference is billed at 50% of serverless pricing on both input and output. See Batch inference.
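The arithmetic above can be sketched as a small helper. The function name and signature are illustrative, not part of any Fireworks API; it simply applies the three per-1M prices and the 50% batch discount:

```python
def request_cost(input_tokens, cached_tokens, output_tokens,
                 input_price, cached_price, output_price,
                 batch=False):
    """Cost in USD for one request; prices are per 1M tokens.

    batch=True applies the 50% batch-inference discount.
    Illustrative helper, not part of the Fireworks API.
    """
    cost = (input_tokens * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / 1_000_000
    return cost * 0.5 if batch else cost

# Example: Kimi K2.6 ($0.95 input, $0.16 cached input, $4.00 output),
# with 10,000 fresh input tokens, 5,000 cached, 2,000 output:
print(request_cost(10_000, 5_000, 2_000, 0.95, 0.16, 4.00))  # ≈ $0.0183
```

The same call with `batch=True` halves the total, matching the batch-inference discount.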

Text and vision models

Per-model pricing for headline models. Fast variants appear as adjacent rows.
| Model | Input | Cached input | Output |
| --- | --- | --- | --- |
| Kimi K2.6 | $0.95 | $0.16 | $4.00 |
| Kimi K2.6 Turbo (Fast, Preview) | $2.00 | $0.30 | $8.00 |
| Kimi K2.5 | $0.60 | $0.10 | $3.00 |
| DeepSeek V4 Pro | $1.74 | $0.145 | $3.48 |
| DeepSeek V3 family | $0.56 | $0.28 | $1.68 |
| GLM 5.1 | $1.40 | $0.26 | $4.40 |
| GLM 5.1 Fast (Preview) | $2.80 | $0.52 | $8.80 |
| GLM 5 | $1.00 | $0.20 | $3.20 |
| GLM 4.7 | $0.60 | $0.30 | $2.20 |
| MiniMax 2.7 | $0.30 | $0.06 | $1.20 |
| MiniMax 2.5 | $0.30 | $0.03 | $1.20 |
| Qwen3 VL 30B A3B | $0.15 | $0.075 | $0.60 |
| OpenAI GPT OSS 120B | $0.15 | $0.015 | $0.60 |
| OpenAI GPT OSS 20B | $0.07 | $0.035 | $0.30 |

Other base models — by size and architecture

For any text or vision model not listed individually, pricing is set by parameter count and architecture. These tier prices apply uniformly to input and output (no separate cached-input rate):
| Model | $ / 1M tokens |
| --- | --- |
| Less than 4B parameters | $0.10 |
| 4B – 16B parameters | $0.20 |
| More than 16B parameters | $0.90 |
| MoE up to 56B parameters (e.g. Mixtral 8x7B) | $0.50 |
| MoE 56.1B – 176B parameters (e.g. DBRX, Mixtral 8x22B) | $1.20 |
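The tier table can be expressed as a lookup. This is an illustrative sketch only (the function and its thresholds-as-code are not an official API); MoE models always use the dedicated MoE tiers regardless of parameter count:

```python
def base_tier_price(params_b, moe=False):
    """Per-1M-token price (applied to both input and output) for base
    models not listed individually, by parameter count in billions.
    Illustrative helper mirroring the tier table, not an official API.
    """
    if moe:
        if params_b <= 56:
            return 0.50   # MoE up to 56B (e.g. Mixtral 8x7B)
        if params_b <= 176:
            return 1.20   # MoE 56.1B - 176B (e.g. DBRX, Mixtral 8x22B)
        raise ValueError("MoE models above 176B are not covered by this table")
    if params_b < 4:
        return 0.10       # less than 4B parameters
    if params_b <= 16:
        return 0.20       # 4B - 16B parameters
    return 0.90           # more than 16B parameters

print(base_tier_price(7))               # dense 7B model -> $0.20 per 1M tokens
print(base_tier_price(46.7, moe=True))  # Mixtral 8x7B (46.7B MoE) -> $0.50
```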

Priority serverless

Priority routes traffic above the Standard tier for higher reliability during peak periods, at a premium per-token price. Opt in by setting service_tier: "priority" on OpenAI-compatible chat completions or the Anthropic-compatible messages API. See Serverless Priority and Fast for usage.
| Model | Input | Cached input | Output |
| --- | --- | --- | --- |
| Kimi K2.6 | $1.50 | $0.22 | $6.00 |
| GLM 5.1 | $2.10 | $0.39 | $6.60 |
| MiniMax 2.7 | $0.45 | $0.09 | $1.80 |
| OpenAI GPT OSS 120B | $0.18 | $0.018 | $0.72 |
Priority tier is in Preview. Available models and pricing may change as the program expands.
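Opting in is a one-field change to the request body. A minimal sketch using only Python's standard library; the model id here is a placeholder, not a confirmed serverless model name, and sending the request (endpoint URL, API key header) is left as a comment:

```python
import json

# Build an OpenAI-compatible chat completion payload that requests the
# Priority tier via `service_tier`. The model id is illustrative only;
# substitute the model you are actually targeting.
payload = {
    "model": "accounts/fireworks/models/example-model",  # placeholder id
    "service_tier": "priority",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
}

body = json.dumps(payload)
print(body)
# POST `body` to the chat completions endpoint with your API key in the
# Authorization header (e.g. via urllib.request or the requests library).
```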

Embeddings

Embeddings are billed per 1M input tokens.
| Base model parameter count | $ / 1M input tokens |
| --- | --- |
| up to 150M | $0.008 |
| 150M – 350M | $0.016 |
| Qwen3 8B | $0.10 |

Notes

  • For account-level controls (spend tiers, monthly budget, on-demand GPU quotas), see Account quotas.