Structured outputs ensure model responses conform to your specified format, making them easy to parse and integrate into your application. Fireworks supports two methods: JSON mode (using JSON schemas) and Grammar mode (using custom BNF grammars).
New to structured outputs? Check out the Serverless Quickstart for a quick introduction.

Quick Start

Force model output to conform to a JSON schema:
import os
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1"
)

# Define your schema
class Result(BaseModel):
    winner: str

# Make the request
response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2p5",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Result",
            "schema": Result.model_json_schema()
        }
    },
    messages=[{
        "role": "user",
        "content": "Who won the US presidential election in 2012? Reply in JSON format."
    }]
)

print(response.choices[0].message.content)
# Output: {"winner": "Barack Obama"}
Include the schema in both your prompt and the response_format for best results. The model doesn’t automatically “see” the schema—it’s enforced during generation.
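Because the schema is enforced during generation, the content can be validated straight into the Pydantic model with no cleanup. A minimal sketch, with a literal JSON string standing in for response.choices[0].message.content:

```python
from pydantic import BaseModel

class Result(BaseModel):
    winner: str

# In a real call this comes from response.choices[0].message.content
raw = '{"winner": "Barack Obama"}'

result = Result.model_validate_json(raw)
print(result.winner)  # Barack Obama
```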

Response Format Options

Fireworks supports two JSON mode variants:
  • json_object – Force any valid JSON output (no specific schema)
  • json_schema – Enforce a specific JSON schema (recommended)
Always instruct the model to produce JSON in your prompt. Without this, the model may generate whitespace indefinitely until hitting token limits.
To force JSON output without a specific schema:
response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2p5",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "List the top 3 programming languages in JSON format."
    }]
)
This is similar to OpenAI’s JSON mode.
Token limits: If finish_reason="length", the response may be truncated and contain invalid JSON. Increase max_tokens if needed.
Completions API: JSON mode works with both the Chat Completions and Completions APIs.
Function calling: When using Tool Calling, JSON mode is enabled automatically, so these guidelines don't apply.
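To guard against truncated output, check finish_reason before parsing. A minimal sketch (the helper name and the stubbed response object are illustrative, not part of any SDK):

```python
from types import SimpleNamespace

def extract_json(response):
    """Return the message content, failing fast if the output was cut off."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        raise ValueError(
            "Response hit the token limit; the JSON is likely truncated. "
            "Retry with a larger max_tokens."
        )
    return choice.message.content

# Stub standing in for the result of client.chat.completions.create(...)
stub = SimpleNamespace(
    choices=[SimpleNamespace(
        finish_reason="stop",
        message=SimpleNamespace(content='{"winner": "Barack Obama"}'),
    )]
)
print(extract_json(stub))  # {"winner": "Barack Obama"}
```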

JSON Schema Support

Fireworks supports most JSON Schema specification constructs.
Supported:
  • Types: string, number, integer, boolean, object, array, null
  • Object constraints: properties, required
  • Array constraints: items
  • Nested schemas: anyOf, $ref
Not yet supported:
  • oneOf composition
  • Length/size constraints (minLength, maxLength, minItems, maxItems)
  • Regular expressions (pattern)
Fireworks automatically prevents hallucinated fields by treating schemas with properties as if "unevaluatedProperties": false is set.
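For example, nested Pydantic models and optional fields map onto the supported $ref and anyOf constructs in the generated schema (the model and field names here are illustrative):

```python
import json
from typing import Optional
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    country: str

class Person(BaseModel):
    name: str
    address: Address                 # emitted as a $ref to the Address schema
    nickname: Optional[str] = None   # emitted as anyOf: [string, null]

schema = Person.model_json_schema()
print(json.dumps(schema, indent=2))
```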
Some models support generating structured JSON outputs alongside their reasoning process. The Fireworks Python SDK exposes the model’s reasoning via the reasoning_content field, keeping it separate from the JSON output in the content field.
Using response_format with json_schema disables reasoning output. To get both reasoning and structured JSON, include the schema in your prompt instead and omit the response_format parameter.

Example Usage

import json
from fireworks import Fireworks
from pydantic import BaseModel

client = Fireworks()

# Define the output schema
class QAResult(BaseModel):
    question: str
    answer: str

# Include the schema in the prompt to preserve reasoning
schema = QAResult.model_json_schema()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2p5",
    messages=[{
        "role": "user",
        "content": (
            "Who wrote 'Pride and Prejudice'?\n\n"
            f"Reply in JSON matching this schema:\n{json.dumps(schema, indent=2)}"
        )
    }],
    max_tokens=1000
)

# The Fireworks SDK separates reasoning into its own field
reasoning = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

# Strip markdown code fences if the model wraps the JSON
json_str = content.strip()
if json_str.startswith("```"):
    json_str = json_str.split("\n", 1)[1].rsplit("```", 1)[0].strip()

# Parse into Pydantic model
qa_result = QAResult.model_validate_json(json_str)

if reasoning:
    print("Reasoning:", reasoning)
print("Result:", qa_result.model_dump_json(indent=2))
If you don’t need reasoning and just want guaranteed schema-conformant JSON, use the response_format parameter as shown in the Quick Start above. The response_format approach enforces the schema during generation, so there is no need to strip code fences or handle malformed JSON before validating.

Use Cases

Reasoning mode is useful for:
  • Debugging: Understanding why the model generated specific outputs
  • Auditing: Documenting the decision-making process
  • Complex tasks: Scenarios where the reasoning is as valuable as the final answer
See the Reasoning guide for more on working with reasoning models.

Grammar Mode

For advanced use cases where JSON isn’t sufficient, use Grammar mode to constrain outputs using custom BNF grammars. Grammar mode is ideal for:
  • Classification tasks – Limit output to a predefined list of options
  • Language-specific output – Force output in specific languages or character sets
  • Custom formats – Define arbitrary output structures beyond JSON
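For a classification task, a grammar can restrict output to a fixed label set. The sketch below uses GBNF syntax; the exact request shape is an assumption here, so consult the Grammar mode guide for specifics:

```python
# GBNF grammar restricting the model to one of three labels
sentiment_grammar = 'root ::= "positive" | "negative" | "neutral"'

# Assumed request shape: pass the grammar via response_format on a
# normal chat.completions.create call (see the Grammar mode guide).
request_kwargs = {
    "model": "accounts/fireworks/models/kimi-k2p5",
    "response_format": {"type": "grammar", "grammar": sentiment_grammar},
    "messages": [{
        "role": "user",
        "content": "Classify the sentiment of: 'I love this product!'",
    }],
}
# response = client.chat.completions.create(**request_kwargs)
```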
Learn more about Grammar mode →
Check out Tool Calling for multi-turn capabilities and routing across multiple schemas.