
Overview

Serverless inference is priced per token. Every text or vision request is billed across three dimensions:
  • Input tokens — what you send to the model.
  • Cached input tokens — input tokens served from the prompt cache, billed at a lower rate.
  • Output tokens — what the model generates.
Embeddings are billed only on input tokens.

How pricing works

  • Prices below are per 1 million tokens in US dollars.
  • Batch inference is billed at 50% of serverless pricing on both input and output. See Batch inference.
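The arithmetic above can be sketched as a small helper. The function name and signature are illustrative, not part of any Fireworks API; it simply applies the three per-1M prices and the 50% batch discount:

```python
def request_cost(input_tokens, cached_tokens, output_tokens,
                 input_price, cached_price, output_price,
                 batch=False):
    """Cost in USD for one request; prices are per 1M tokens.

    batch=True applies the 50% batch-inference discount.
    Illustrative helper, not part of the Fireworks API.
    """
    cost = (input_tokens * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / 1_000_000
    return cost * 0.5 if batch else cost

# Example: Kimi K2.6 ($0.95 input, $0.16 cached input, $4.00 output),
# with 10,000 fresh input tokens, 5,000 cached, 2,000 output:
print(request_cost(10_000, 5_000, 2_000, 0.95, 0.16, 4.00))  # ≈ $0.0183
```

The same call with `batch=True` halves the total, matching the batch-inference discount.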

Text and vision models

Per-model pricing for headline models. Fast variants appear as adjacent rows.
| Model | Input | Cached input | Output |
| --- | --- | --- | --- |
| Kimi K2.6 | $0.95 | $0.16 | $4.00 |
| Kimi K2.6 Turbo (Fast, Preview) | $2.00 | $0.30 | $8.00 |
| Kimi K2.5 | $0.60 | $0.10 | $3.00 |
| DeepSeek V4 Pro | $1.74 | $0.145 | $3.48 |
| DeepSeek V3 family | $0.56 | $0.28 | $1.68 |
| GLM 5.1 | $1.40 | $0.26 | $4.40 |
| GLM 5.1 Fast (Preview) | $2.80 | $0.52 | $8.80 |
| GLM 5 | $1.00 | $0.20 | $3.20 |
| GLM 4.7 | $0.60 | $0.30 | $2.20 |
| MiniMax 2.7 | $0.30 | $0.06 | $1.20 |
| MiniMax 2.5 | $0.30 | $0.03 | $1.20 |
| Qwen3 VL 30B A3B | $0.15 | $0.075 | $0.60 |
| OpenAI GPT OSS 120B | $0.15 | $0.015 | $0.60 |
| OpenAI GPT OSS 20B | $0.07 | $0.035 | $0.30 |

Other base models — by size and architecture

For any text or vision model not listed individually, pricing is set by parameter count and architecture. These tier prices apply uniformly to input and output (no separate cached-input rate):
| Model | $ / 1M tokens |
| --- | --- |
| Less than 4B parameters | $0.10 |
| 4B – 16B parameters | $0.20 |
| More than 16B parameters | $0.90 |
| MoE up to 56B parameters (e.g. Mixtral 8x7B) | $0.50 |
| MoE 56.1B – 176B parameters (e.g. DBRX, Mixtral 8x22B) | $1.20 |
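The tier table can be expressed as a lookup. This is an illustrative sketch only (the function and its thresholds-as-code are not an official API); MoE models always use the dedicated MoE tiers regardless of parameter count:

```python
def base_tier_price(params_b, moe=False):
    """Per-1M-token price (applied to both input and output) for base
    models not listed individually, by parameter count in billions.
    Illustrative helper mirroring the tier table, not an official API.
    """
    if moe:
        if params_b <= 56:
            return 0.50   # MoE up to 56B (e.g. Mixtral 8x7B)
        if params_b <= 176:
            return 1.20   # MoE 56.1B - 176B (e.g. DBRX, Mixtral 8x22B)
        raise ValueError("MoE models above 176B are not covered by this table")
    if params_b < 4:
        return 0.10       # less than 4B parameters
    if params_b <= 16:
        return 0.20       # 4B - 16B parameters
    return 0.90           # more than 16B parameters

print(base_tier_price(7))               # dense 7B model -> $0.20 per 1M tokens
print(base_tier_price(46.7, moe=True))  # Mixtral 8x7B (46.7B MoE) -> $0.50
```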

Priority serverless

Priority routes traffic above the Standard tier for higher reliability during peak periods, at a premium per-token price. Opt in by setting service_tier: "priority" on OpenAI-compatible chat completions or the Anthropic-compatible messages API. See Serverless Priority and Fast for usage.
| Model | Input | Cached input | Output |
| --- | --- | --- | --- |
| Kimi K2.6 | $1.50 | $0.22 | $6.00 |
| GLM 5.1 | $2.10 | $0.39 | $6.60 |
| MiniMax 2.7 | $0.45 | $0.09 | $1.80 |
| OpenAI GPT OSS 120B | $0.18 | $0.018 | $0.72 |
Priority tier is in Preview. Available models and pricing may change as the program expands.
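Opting in is a one-field change to the request body. A minimal sketch using only Python's standard library; the model id here is a placeholder, not a confirmed serverless model name, and sending the request (endpoint URL, API key header) is left as a comment:

```python
import json

# Build an OpenAI-compatible chat completion payload that requests the
# Priority tier via `service_tier`. The model id is illustrative only;
# substitute the model you are actually targeting.
payload = {
    "model": "accounts/fireworks/models/example-model",  # placeholder id
    "service_tier": "priority",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
}

body = json.dumps(payload)
print(body)
# POST `body` to the chat completions endpoint with your API key in the
# Authorization header (e.g. via urllib.request or the requests library).
```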

Embeddings

Embeddings are billed per 1M input tokens.
| Base model parameter count | $ / 1M input tokens |
| --- | --- |
| up to 150M | $0.008 |
| 150M – 350M | $0.016 |
| Qwen3 8B | $0.10 |

Notes

  • For account-level controls (spend tiers, monthly budget, on-demand GPU quotas), see Account quotas.