Serverless is the fastest way to get started using open models. This quickstart will help you make your first API call in minutes.

Step 1: Create and export an API key

Before you begin, create an API key in the Fireworks dashboard: click Create API key and store the key somewhere safe. Then export it as an environment variable in your terminal:
export FIREWORKS_API_KEY="your_api_key_here"

Step 2: Make your first Serverless API call

Install the Fireworks Python SDK:
The SDK is currently in alpha. Use the --pre flag when installing to get the latest version.
pip install --pre fireworks-ai
Then make your first Serverless API call:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[{
    "role": "user",
    "content": "Say hello in Spanish",
  }],
)

print(response.choices[0].message.content)
You should see a response like: "¡Hola!"

Common use cases

Streaming responses

Stream responses token-by-token for a better user experience:
from fireworks import Fireworks

client = Fireworks()

stream = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[{"role": "user", "content": "Tell me a short story"}],
  stream=True
)

for chunk in stream:
  if chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="")

Function calling

Connect your models to external tools and APIs:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
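Note that the model never executes the function itself: it returns the function name plus a JSON-encoded argument string, and your code is responsible for parsing and dispatching it. A minimal sketch of that step (the `get_weather` implementation and the example argument string are hypothetical):

```python
import json

def get_weather(location: str) -> str:
    # Hypothetical stand-in for a real weather API lookup
    return f"18°C and clear in {location}"

# Each tool call's .function.arguments is a JSON string like this:
arguments = '{"location": "Paris"}'
args = json.loads(arguments)

# Dispatch to the local implementation; keep the result so you can
# send it back to the model in a follow-up "tool" role message.
result = get_weather(**args)
print(result)
```

In a full loop, you would append the result as a message with `role: "tool"` and call the API again so the model can compose its final answer.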
Learn more about function calling →

Structured outputs (JSON mode)

Get reliable JSON responses that match your schema:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[
    {
      "role": "user",
      "content": "Extract the name and age from: John is 30 years old",
    }
  ],
  response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "number" }
        },
        "required": ["name", "age"],
      },
    },
  },
)

print(response.choices[0].message.content)
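The structured response still arrives as a JSON string in `message.content`, so parse it before using the fields. A quick sketch, assuming the model returned content matching the schema above:

```python
import json

# Example content matching the schema above (actual output may vary)
content = '{"name": "John", "age": 30}'
person = json.loads(content)
print(person["name"], person["age"])  # John 30
```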
Learn more about structured outputs →

Vision models

Analyze images with vision-language models:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
          },
        },
      ],
    }
  ],
)

print(response.choices[0].message.content)
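If your image lives on disk rather than at a public URL, OpenAI-compatible vision APIs commonly accept a base64-encoded data URL in the `image_url` field instead. A sketch of the encoding step (the file path and MIME type here are assumptions):

```python
import base64
from pathlib import Path

def to_data_url(path: str, mime: str = "image/png") -> str:
    """Encode a local image file as a data: URL for the image_url field."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:{mime};base64,{encoded}"

# Usage (hypothetical local file): pass the result as the "url" value above
# url = to_data_url("photo.png")
```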
Learn more about vision models →

Serverless model lifecycle

Serverless models are managed by the Fireworks team and may be updated or deprecated as new models are released. We provide at least two weeks' notice before removing any model, with longer notice periods for popular models based on usage. For production workloads that require long-term model stability, we recommend on-demand deployments, which give you full control over model versions and updates.
Make sure to add a payment method to access higher rate limits of up to 6,000 RPM. Without a payment method, you're limited to 10 RPM.

Next steps

Ready to scale to production, explore other modalities, or customize your models?