
Serverless is the fastest way to get started with open models. This quickstart will help you make your first API call in minutes.

Step 1: Create and export an API key

Before you begin, create an API key in the Fireworks dashboard. Click Create API key and store it in a safe location. Once you have your API key, export it as an environment variable in your terminal:
export FIREWORKS_API_KEY="your_api_key_here"

Step 2: Make your first Serverless API call

Install the Fireworks Python SDK:
The SDK is currently in alpha. Use the --pre flag when installing to get the latest version.
pip install --pre fireworks-ai
Then make your first Serverless API call:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[{
    "role": "user",
    "content": "Say hello in Spanish",
  }],
)

print(response.choices[0].message.content)
You should see a response like: "¡Hola!"
For Priority Tier (service_tier: "priority") and Fast mode, see Serverless Priority and Fast.

Common use cases

Streaming responses

Stream responses token-by-token for a better user experience:
from fireworks import Fireworks

client = Fireworks()

stream = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[{"role": "user", "content": "Tell me a short story"}],
  stream=True
)

for chunk in stream:
  if chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="")

Function calling

Connect your models to external tools and APIs:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
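Once the model returns `tool_calls`, your code runs the tool and sends the result back. A minimal sketch of the dispatch step (the `get_weather` implementation below is a hypothetical stub, and the loop in the comment assumes the OpenAI-style `tool_calls` shape shown above):

```python
import json

def get_weather(location):
    # Hypothetical stub; a real app would call an actual weather service.
    return {"location": location, "temperature_c": 18, "condition": "cloudy"}

def run_tool_call(name, arguments_json):
    # Dispatch a single tool call by name; arguments arrive as a JSON string.
    if name == "get_weather":
        return get_weather(**json.loads(arguments_json))
    raise ValueError(f"unknown tool: {name}")

# With a live response, you would loop over
# response.choices[0].message.tool_calls, call
# run_tool_call(call.function.name, call.function.arguments),
# append the result as a {"role": "tool", ...} message, and call the
# API again so the model can produce its final answer.
print(run_tool_call("get_weather", '{"location": "Paris"}'))
```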
Learn more about function calling →

Structured outputs (JSON mode)

Get reliable JSON responses that match your schema:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[
    {
      "role": "user",
      "content": "Extract the name and age from: John is 30 years old",
    }
  ],
  response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "number" }
        },
        "required": ["name", "age"],
      },
    },
  },
)

print(response.choices[0].message.content)
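Because the response is constrained to the schema, its content can be parsed directly. A minimal sketch, using a sample string where the live call returns `response.choices[0].message.content`:

```python
import json

# Stand-in for response.choices[0].message.content from the call above.
content = '{"name": "John", "age": 30}'

person = json.loads(content)
print(person["name"], person["age"])  # John 30
```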
Learn more about structured outputs →

Reasoning

Some models support reasoning, where the model shows its thought process before giving the final answer:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    reasoning_effort="medium",
)

msg = response.choices[0].message
if msg.reasoning_content:
    print("Reasoning:", msg.reasoning_content)
print("Answer:", msg.content)
Learn more about reasoning →

Vision models

Analyze images with vision-language models:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
          },
        },
      ],
    }
  ],
)

print(response.choices[0].message.content)
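Local images can also be sent inline. A sketch of building a base64 data URL for the `image_url` field (this assumes the endpoint accepts data URLs, as OpenAI-compatible vision APIs commonly do; the helper name is illustrative):

```python
import base64

def to_data_url(image_bytes, mime="image/png"):
    # Encode raw image bytes as a data URL usable in the image_url field.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# In practice: to_data_url(open("photo.png", "rb").read())
print(to_data_url(b"abc"))  # data:image/png;base64,YWJj
```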
Learn more about vision models →

Serverless model lifecycle

Serverless models are managed by the Fireworks team and may be updated or deprecated as new models are released. We provide at least 2 weeks advance notice before removing any model, with longer notice periods for popular models based on usage. For production workloads requiring long-term model stability, we recommend using on-demand deployments, which give you full control over model versions and updates.
Serverless request-rate limits and TPM limits depend on your account state. Adding a payment method unlocks higher serverless capacity over time. See Rate limits & quotas for the full policy.
Your effective limits may be temporarily lower while adaptive limits adjust to recent traffic, so short bursts of requests can be throttled. For higher starting limits, higher upper bounds, or dedicated capacity, contact sales or consider on-demand deployments.
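When a burst of requests is throttled, retrying with exponential backoff usually recovers. A minimal sketch of a backoff schedule (the function name and constants are illustrative, not part of the SDK):

```python
import random

def backoff_delays(retries=5, base=0.5, cap=30.0):
    # Yield one delay per retry: exponential growth with full jitter,
    # capped so repeated failures don't produce unbounded waits.
    for attempt in range(retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

# Usage sketch:
# for delay in backoff_delays():
#     try:
#         response = client.chat.completions.create(...); break
#     except Exception:  # e.g. a rate-limit (429) error from the SDK
#         time.sleep(delay)
print(list(backoff_delays(retries=3)))
```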

Next steps

Ready to scale to production, explore other modalities, or customize your models?

Deploy and autoscale on Dedicated GPUs

Deploy on dedicated GPUs with high performance, fast autoscaling, and minimal cold starts

Fine-tune Models

Improve model quality with supervised and reinforcement learning

Embeddings & Reranking

Use embeddings and reranking for search and context retrieval

Batch Inference

Run async inference jobs at scale, faster and cheaper

Browse 100+ Models

Explore all available models across modalities

API Reference

Complete API documentation