What is JSON mode?
JSON mode allows you to force the output of any Fireworks language model to conform to a provided JSON schema.Why JSON responses?
- Clarity and Precision: Responding in JSON ensures that the output from the LLM is clear, precise, and easy to parse. This is particularly beneficial in scenarios where the response needs to be further processed or analyzed by other systems.
- Ease of Integration: JSON, being a widely-used format, allows for easy integration with various platforms and applications. This interoperability is essential for developers looking to incorporate AI capabilities into their existing systems without extensive modifications.
End-to-end example
This guide provides a step-by-step example of how to create a structured output response using the Fireworks API. The example uses Python and thepydantic
library to define the schema for the output. You can find more information about Pydantic here.
Prerequisites
Before you begin, ensure you have the following:- Python installed on your system.
-
fireworks-ai
andpydantic
libraries installed. You can install them using pip:
mixtral-8x7b-instruct
, but all Fireworks models support this feature.
Step 1: Import libraries
Start by importing the required libraries:Step 2: Configure the Fireworks client
Initialize the Fireworks client with your model and deployment type:Step 3: Define the output schema
Define a Pydantic model to specify the schema of the output. For example, this model defines a simple schema with a single fieldwinner
.
Step 4: Specify your output schema in your chat completions request
Make a request to the Fireworks API to get a JSON response. In your request, specify the output schema you used in step 3. For example, to ask who won the US presidential election in 2012:Step 5: Display the result
Finally, print the result:Result
schema. We get a nice JSON response that can be parsed and integrated with the rest of your application.
Structured response modes
Fireworks support the following variants:- Arbitrary JSON. Similar to OpenAI, you can force the model to produce any valid json by providing
{"type": "json_object"}
asresponse_format
in the request. This forces the model to output JSON but does not specify what specific JSON schema to use. - JSON with the given schema. To specify a given JSON schema, you can provide the schema according to JSON schema spec to be imposed on the model generation. See supported constructs in the next section.
When using JSON mode, you MUST instruct the model to produce JSON and describe the desired schema via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request.
response_format
field. Adherence to the schema is forced upon the model during sampling. So for best results, you need to include the desired schema in the prompt in addition to specifying it as response_format
. You may need to experiment with the best way to describe the schema in the prompt depending on the model: besides JSON schema, describing it in plain English might work well too, e.g. “extract name and address of the person in JSON format”.
**Note: **that the message content may be partially cut off if finish_reason="length"
, which indicates the generation exceeded max_tokens
or the conversation exceeded the max context length. In this case, the return value might not be a valid JSON.
Structured response modes work for both Completions and Chat Completions APIs.
If you use function calling, JSON mode is enabled automatically and function schema is added to the prompt. So none of the comments above apply.
JSON schema constructs
Fireworks supports a subset of JSON schema specification. Supported:- Nested schemas composition, including
anyOf
and$ref
type
:string
,number
,integer
boolean
,object
,array
,null
properties
andrequired
for objectsitems
for arrays
- Sophisticated composition with
oneOf
- Length/size constraints for objects and arrays
- Regular expressions via
pattern
properties
constraint unless "additionalProperties": false
or "unevaluatedProperties": false
is provided. It’s a poor default for LLM constrained generation since any hallucination would be accepted. Thus Fireworks treats any schema with properties
constraint as if it had "unevaluatedProperties": false
.
An example of response_format
field with the schema accepting an object with two fields - a required string and an optional integer:
Reasoning Model JSON Mode
In addition to standard JSON responses, Fireworks JSON mode now supports generating an output that includes the model’s internal reasoning. In this mode, the response contains a “reasoning” section wrapped in<think>...</think>
tags followed by the JSON object that adheres to your specified schema.
How It Works
When using Reasoning Model JSON Mode, the model first outputs its reasoning process enclosed in<think>
tags. After the reasoning section, it outputs the JSON data. This allows you to capture both the rationale behind the model’s answer as well as the structured data for downstream processing.
Example Usage with Pydantic
Below is an example illustrating how to parse the response directly into a Pydantic model. In this example, the response contains both a reasoning part and a JSON part. The JSON part is then parsed into theQAResult
Pydantic model using Pydantic’s .parse_raw()
method.
Additional notebooks and exmaples
Explore how Reasoning JSON Mode is used in different contexts:🖥️ Reasoning JSON Mode for Computer Specs
Generate structured PC specifications while capturing the model’s thought process behind component choices.
🏥 Reasoning JSON Mode for Healthcare
Structure patient healthcare records with AI-generated reasoning, ensuring interpretability and compliance.
Key Points
- Dual Output: The model outputs both a reasoning explanation and a structured JSON object.
- Extraction: Use regular expressions to split the output—one capturing the reasoning within
<think>
tags and another capturing the JSON. - Direct Parsing: Parse the JSON part into your Pydantic model with
QAResult.parse_raw()
, leveraging Pydantic’s validation and serialization capabilities.
Similar features
Check out our function calling model if you’re interested in use cases like:- Multi-turn capabilities: For example, the ability for the model to ask for clarifying information about parameters
- Routing: The ability for the model to route across multiple different options or models. Instead of just having one possible JSON Schema, you have many different JSON schemas to work across.