Structured outputs ensure model responses conform to your specified format, making them easy to parse and integrate into your application. Fireworks supports two methods: JSON mode (using JSON schemas) and Grammar mode (using custom BNF grammars).
New to structured outputs? Check out the Serverless Quickstart for a quick introduction.

Quick Start

Force model output to conform to a JSON schema:
import os
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1"
)

# Define your schema
class Result(BaseModel):
    winner: str

# Make the request
response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2p5",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Result",
            "schema": Result.model_json_schema()
        }
    },
    messages=[{
        "role": "user",
        "content": "Who won the US presidential election in 2012? Reply in JSON format."
    }]
)

print(response.choices[0].message.content)
# Output: {"winner": "Barack Obama"}
Include the schema in both your prompt and the response_format for best results. The model doesn’t automatically “see” the schema—it’s enforced during generation.
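Because the schema is enforced during generation, the content can be validated straight into the Pydantic model with no cleanup. A minimal sketch, with a literal JSON string standing in for response.choices[0].message.content:

```python
from pydantic import BaseModel

class Result(BaseModel):
    winner: str

# In a real call this comes from response.choices[0].message.content
raw = '{"winner": "Barack Obama"}'

result = Result.model_validate_json(raw)
print(result.winner)  # Barack Obama
```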

Response Format Options

Fireworks supports two JSON mode variants:
  • json_object – Force any valid JSON output (no specific schema)
  • json_schema – Enforce a specific JSON schema (recommended)
Always instruct the model to produce JSON in your prompt. Without this, the model may generate whitespace indefinitely until hitting token limits.
To force JSON output without a specific schema:
response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2p5",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "List the top 3 programming languages in JSON format."
    }]
)
This is similar to OpenAI’s JSON mode.
Token limits: If finish_reason="length", the response may be truncated and contain invalid JSON. Increase max_tokens if needed.
Completions API: JSON mode works with both the Chat Completions and Completions APIs.
Function calling: When using Tool Calling, JSON mode is enabled automatically, so these guidelines don't apply.
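To guard against truncated output, check finish_reason before parsing. A minimal sketch (the helper name and the stubbed response object are illustrative, not part of any SDK):

```python
from types import SimpleNamespace

def extract_json(response):
    """Return the message content, failing fast if the output was cut off."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        raise ValueError(
            "Response hit the token limit; the JSON is likely truncated. "
            "Retry with a larger max_tokens."
        )
    return choice.message.content

# Stub standing in for the result of client.chat.completions.create(...)
stub = SimpleNamespace(
    choices=[SimpleNamespace(
        finish_reason="stop",
        message=SimpleNamespace(content='{"winner": "Barack Obama"}'),
    )]
)
print(extract_json(stub))  # {"winner": "Barack Obama"}
```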

JSON Schema Support

Fireworks supports most JSON Schema specification constructs.
Supported:
  • Types: string, number, integer, boolean, object, array, null
  • Object constraints: properties, required
  • Array constraints: items
  • Nested schemas: anyOf, $ref
Not yet supported:
  • oneOf composition
  • Length/size constraints (minLength, maxLength, minItems, maxItems)
  • Regular expressions (pattern)
Fireworks automatically prevents hallucinated fields by treating schemas with properties as if "unevaluatedProperties": false is set.
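For example, nested Pydantic models and optional fields map onto the supported $ref and anyOf constructs in the generated schema (the model and field names here are illustrative):

```python
import json
from typing import Optional
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    country: str

class Person(BaseModel):
    name: str
    address: Address                 # emitted as a $ref to the Address schema
    nickname: Optional[str] = None   # emitted as anyOf: [string, null]

schema = Person.model_json_schema()
print(json.dumps(schema, indent=2))
```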
Some models support generating structured JSON outputs alongside their reasoning process. The Fireworks Python SDK exposes the model’s reasoning via the reasoning_content field, keeping it separate from the JSON output in the content field.
Using response_format with json_schema disables reasoning output. To get both reasoning and structured JSON, include the schema in your prompt instead and omit the response_format parameter.

Example Usage

import json
from fireworks import Fireworks
from pydantic import BaseModel

client = Fireworks()

# Define the output schema
class QAResult(BaseModel):
    question: str
    answer: str

# Include the schema in the prompt to preserve reasoning
schema = QAResult.model_json_schema()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2p5",
    messages=[{
        "role": "user",
        "content": (
            "Who wrote 'Pride and Prejudice'?\n\n"
            f"Reply in JSON matching this schema:\n{json.dumps(schema, indent=2)}"
        )
    }],
    max_tokens=1000
)

# The Fireworks SDK separates reasoning into its own field
reasoning = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

# Strip markdown code fences if the model wraps the JSON
json_str = content.strip()
if json_str.startswith("```"):
    json_str = json_str.split("\n", 1)[1].rsplit("```", 1)[0].strip()

# Parse into Pydantic model
qa_result = QAResult.model_validate_json(json_str)

if reasoning:
    print("Reasoning:", reasoning)
print("Result:", qa_result.model_dump_json(indent=2))
If you don’t need reasoning and just want guaranteed schema-conformant JSON, use the response_format parameter as shown in the Quick Start above. The response_format approach enforces the schema during generation, so there is no need to strip code fences or handle malformed JSON before validating.

Use Cases

Reasoning mode is useful for:
  • Debugging: Understanding why the model generated specific outputs
  • Auditing: Documenting the decision-making process
  • Complex tasks: Scenarios where the reasoning is as valuable as the final answer
See the Reasoning guide for more on working with reasoning models.

Grammar Mode

For advanced use cases where JSON isn’t sufficient, use Grammar mode to constrain outputs using custom BNF grammars. Grammar mode is ideal for:
  • Classification tasks – Limit output to a predefined list of options
  • Language-specific output – Force output in specific languages or character sets
  • Custom formats – Define arbitrary output structures beyond JSON
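For a classification task, a grammar can restrict output to a fixed label set. The sketch below uses GBNF syntax; the exact request shape is an assumption here, so consult the Grammar mode guide for specifics:

```python
# GBNF grammar restricting the model to one of three labels
sentiment_grammar = 'root ::= "positive" | "negative" | "neutral"'

# Assumed request shape: pass the grammar via response_format on a
# normal chat.completions.create call (see the Grammar mode guide).
request_kwargs = {
    "model": "accounts/fireworks/models/kimi-k2p5",
    "response_format": {"type": "grammar", "grammar": sentiment_grammar},
    "messages": [{
        "role": "user",
        "content": "Classify the sentiment of: 'I love this product!'",
    }],
}
# response = client.chat.completions.create(**request_kwargs)
```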
Learn more about Grammar mode →
Check out Tool Calling for multi-turn capabilities and routing across multiple schemas.