You can use the Anthropic Python SDK or Anthropic TypeScript SDK to interact with Fireworks, making it easy to migrate applications that already use Anthropic’s Messages API. Fireworks exposes an Anthropic-compatible endpoint at POST /v1/messages.

Quickstart

Install the Anthropic SDK for your language:
# Python
pip install anthropic

# TypeScript
npm install @anthropic-ai/sdk
Then make your first request:
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/kimi-k2p5",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "Say hello in Spanish. Reply in one word."}
    ],
)

print(response.content[0].text)
The base URL for the Anthropic SDK is https://api.fireworks.ai/inference (without the /v1 suffix). The SDK appends /v1/messages automatically.
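The same endpoint can also be called directly over HTTP without the SDK. A minimal sketch using only the standard library (the request is built but not sent; a valid FIREWORKS_API_KEY would be needed to actually issue it):

```python
import json
import urllib.request

# POST https://api.fireworks.ai/inference/v1/messages
# This is the full path the Anthropic SDK constructs for you.
url = "https://api.fireworks.ai/inference/v1/messages"

req = urllib.request.Request(
    url,
    data=json.dumps({
        "model": "accounts/fireworks/models/kimi-k2p5",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Say hello in Spanish."}],
    }).encode(),
    headers={
        "Authorization": "Bearer <FIREWORKS_API_KEY>",  # replace with your key
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to send the request
```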

Usage

Use the Anthropic SDK as you normally would. Set model to a Fireworks model resource name, such as accounts/fireworks/models/kimi-k2p5. The Serverless Quickstart includes Anthropic SDK examples for common use cases.

API compatibility

Supported endpoint

Fireworks supports the Anthropic /v1/messages endpoint, including non-streaming and streaming (SSE) responses.

Deployment support

Anthropic compatibility is supported for serverless and on-demand deployments. Requests must go through api.fireworks.ai/inference (direct route endpoints are not supported for this surface).

Differences from Anthropic

The following parameters and fields are handled differently or are not supported:
  • model: Must be a Fireworks model identifier (for example, accounts/fireworks/models/deepseek-v3p2) instead of an Anthropic model name. See the Fireworks Model Library for available models.
  • max_tokens: Optional on Fireworks (required on Anthropic).
  • anthropic-version header: Not required. Fireworks ignores this header.
  • usage field: Included in both non-streaming and streaming responses. See Token usage for details.
  • service_tier: Not supported.
  • inference_geo: Not supported.
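For example, a request body that is valid on Fireworks but would be rejected by Anthropic's own API (no max_tokens, and no anthropic-version header accompanies it), shown as a plain payload sketch:

```python
import json

# Valid on Fireworks' Anthropic-compatible endpoint:
# max_tokens is optional, and no anthropic-version header is required.
payload = {
    # Fireworks model resource name, not an Anthropic model name
    "model": "accounts/fireworks/models/kimi-k2p5",
    "messages": [{"role": "user", "content": "Say hello"}],
}
body = json.dumps(payload)
```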

Reasoning effort mapping

When you use the thinking parameter with output_config.effort, Anthropic effort values map to Fireworks reasoning_effort:
Anthropic effort    Fireworks mapping
low                 low
medium              medium
high                high
max                 high
The adaptive thinking type is not supported yet.
For more details on reasoning, including interleaved thinking with tool use, see the Reasoning guide.
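The mapping above can be expressed as a small helper. This is illustrative only; Fireworks applies the translation server-side, so you never need to do it yourself:

```python
def to_fireworks_reasoning_effort(anthropic_effort: str) -> str:
    """Map an Anthropic output_config.effort value to the Fireworks
    reasoning_effort it is translated to (per the table above)."""
    mapping = {
        "low": "low",
        "medium": "medium",
        "high": "high",
        "max": "high",  # "max" is capped at "high" on Fireworks
    }
    try:
        return mapping[anthropic_effort]
    except KeyError:
        raise ValueError(f"Unsupported effort value: {anthropic_effort!r}")
```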

Unsupported features

The following Anthropic features are not available on Fireworks:
  • Server tools: Server-side tool families (for example, code execution, memory, web fetch, tool search, and web search) are not supported.
  • Server-tool metadata: Fields such as caller and container are not supported.
  • Tool schema fields: eager_input_streaming, cache_control, allowed_callers, defer_loading, and input_examples are not supported.
  • server_tool_use: Not included in usage tracking.
  • speed: The output_config.speed option is not supported yet.

Fireworks extensions

The following Fireworks-specific extension is available on the Anthropic-compatible endpoint:
  • raw_output: A request parameter (boolean) that returns low-level details of what the model sees, including formatted prompts and function call data.
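Because raw_output is not one of the Anthropic SDK's typed parameters, one way to send it is the SDK's extra_body escape hatch, which merges arbitrary fields into the JSON request body. A sketch of the resulting payload:

```python
# raw_output is a Fireworks extension, not an Anthropic SDK parameter,
# so it is passed via extra_body rather than as a named argument.
extra = {"raw_output": True}

# Equivalent raw request body after the SDK merges extra_body:
body = {
    "model": "accounts/fireworks/models/kimi-k2p5",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello"}],
    **extra,
}
# With the SDK: client.messages.create(..., extra_body=extra)
```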

Token usage

Token usage (input_tokens and output_tokens) is included in both non-streaming and streaming responses.

Non-streaming

For non-streaming requests, usage is returned on the response object:
response = client.messages.create(
    model="accounts/fireworks/models/kimi-k2p5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello"}],
)

print(f"Input tokens:  {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

Streaming

For streaming requests, token usage is included in the final message_delta event:
stream = client.messages.create(
    model="accounts/fireworks/models/kimi-k2p5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)

for event in stream:
    if event.type == "message_delta":
        print(f"Input tokens:  {event.usage.input_tokens}")
        print(f"Output tokens: {event.usage.output_tokens}")
There is only one message_delta event per stream (the last event before message_stop), and it always contains the actual token counts. The message_start event also includes a usage field, but its values are always 0 and should be ignored for metering purposes.
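The metering rule above (take usage from the single message_delta event, ignore the zeroed usage on message_start) can be sketched with stand-in event objects; the class names here are simplified placeholders, not SDK types:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

@dataclass
class Event:
    type: str
    usage: Optional[Usage] = None

def final_usage(events):
    """Return the authoritative token counts from a Messages stream:
    only the message_delta event carries the real usage numbers."""
    for event in events:
        if event.type == "message_delta":
            return event.usage
    return None

# Simulated stream: message_start reports zeros and must be ignored.
events = [
    Event("message_start", Usage(0, 0)),
    Event("content_block_delta"),
    Event("message_delta", Usage(12, 7)),
    Event("message_stop"),
]
usage = final_usage(events)
```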

Next steps