Using JSON mode
What is JSON mode?
JSON mode enables you to provide a JSON schema to force any Fireworks language model to respond in valid JSON.
Why JSON responses?
- Clarity and Precision: Responding in JSON ensures that the output from the LLM is clear, precise, and easy to parse. This is particularly beneficial in scenarios where the response needs to be further processed or analyzed by other systems.
- Ease of Integration: JSON, being a widely used format, allows for easy integration with various platforms and applications. This interoperability is essential for developers looking to incorporate AI capabilities into their existing systems without extensive modifications.
End-to-end example
This guide provides a step-by-step example of how to create a structured output response using the Fireworks.ai API. The example uses Python and the pydantic library to define the schema for the output.
Prerequisites
Before you begin, ensure you have the following:
- Python installed on your system.
- The openai and pydantic libraries installed. You can install them using pip: pip install openai pydantic
Next, select the model you want to use. In this example, we use mixtral-8x7b-instruct, but all Fireworks models support this feature. You can find your favorite model and get a JSON response out of it!
Step 1: Import libraries
Start by importing the required libraries:
import openai
from pydantic import BaseModel, Field
Step 2: Configure the Fireworks.ai client
You can use either the Fireworks.ai SDK or the OpenAI SDK with this feature. Here we use the OpenAI SDK with your API key and the Fireworks base URL:
client = openai.OpenAI(
base_url="https://api.fireworks.ai/inference/v1",
api_key="Your_API_Key",
)
Replace "Your_API_Key" with your actual API key.
Step 3: Define the output schema
Define a Pydantic model to specify the schema of the output. For example:
class Result(BaseModel):
winner: str
This model defines a simple schema with a single field, winner. If you are not familiar with Pydantic, please check the documentation here. Pydantic emits JSON Schema, and you can find more information about it here.
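Field descriptions from the Pydantic model are carried into the generated JSON Schema, which can also help when you describe the schema in the prompt. A minimal sketch of a slightly richer model (the Election model and its fields are hypothetical, for illustration only):

```python
from typing import Optional

from pydantic import BaseModel, Field

# Hypothetical richer schema: descriptions end up in the JSON Schema,
# and fields with defaults are not listed as required.
class Election(BaseModel):
    winner: str = Field(description="Full name of the winning candidate")
    year: int = Field(description="Election year")
    margin: Optional[float] = Field(default=None, description="Popular-vote margin, if known")

schema = Election.model_json_schema()
print(schema["required"])  # only fields without defaults are required
```

Passing `Election.model_json_schema()` as the schema works exactly like the simpler `Result` model above.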
Step 4: Specify your output schema in your chat completions request
Make a request to the Fireworks.ai API to get a JSON response. In your request, specify the output schema you used in step 3. For example, to ask who won the US presidential election in 2012:
chat_completion = client.chat.completions.create(
model="accounts/fireworks/models/mixtral-8x7b-instruct",
response_format={"type": "json_object", "schema": Result.model_json_schema()},
messages=[
{
"role": "user",
"content": "Who won the US presidential election in 2012? Reply just in one JSON.",
},
],
)
Step 5: Display the result
Finally, print the result:
print(repr(chat_completion.choices[0].message.content))
This will display the response in the format defined by the Result schema. We get a single, clean JSON response:
'{\n "winner": "Barack Obama"\n}'
You can parse that as plain JSON and hook it up with the rest of your system. Currently, we enforce the structure with a grammar-based state machine to ensure the LLM always generates all the fields in the schema. If the provided output schema is not a valid JSON schema, the request will fail.
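A minimal sketch of parsing the returned content, assuming the `Result` model from step 3 (redefined here for self-containment). You can load it as a plain dict with the standard library, or validate it directly against the Pydantic model to get typed attributes:

```python
import json

from pydantic import BaseModel

class Result(BaseModel):
    winner: str

# The message content is a JSON string, e.g. chat_completion.choices[0].message.content
raw = '{\n    "winner": "Barack Obama"\n}'

data = json.loads(raw)                    # plain dict
result = Result.model_validate_json(raw)  # typed Pydantic object
print(data["winner"], result.winner)
```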
Structured response modes
Fireworks supports the following variants:
- Arbitrary JSON. Similar to OpenAI, you can force the model to produce any valid JSON by providing {"type": "json_object"} as response_format in the request. This forces the model to output JSON but does not specify what specific JSON schema to use.
- JSON with the given schema. To specify a given JSON schema, you can provide the schema according to the JSON Schema spec to be imposed on the model generation. See supported constructs in the next section.
Important: when using JSON mode, it’s also crucial to instruct the model to produce JSON and describe the desired schema via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request.
To get the best outcome, you need to include the schema in both the prompt and the response_format field.
Technically, this means that when using “JSON with the given schema” mode, the model doesn’t automatically “see” the schema passed in the response_format field. Adherence to the schema is forced upon the model during sampling. So for best results, include the desired schema in the prompt in addition to specifying it as response_format. You may need to experiment with the best way to describe the schema in the prompt depending on the model: besides JSON Schema, describing it in plain English might work well too, e.g. “extract the name and address of the person in JSON format”.
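One way to repeat the schema in the prompt is to serialize it into a system message. A sketch, assuming the `Result` model from step 3 (the exact prompt wording is illustrative, not prescribed):

```python
import json

from pydantic import BaseModel

class Result(BaseModel):
    winner: str

schema_text = json.dumps(Result.model_json_schema(), indent=2)

# Repeat the schema in the prompt so the model "sees" it; the
# response_format field only constrains sampling.
messages = [
    {
        "role": "system",
        "content": f"Reply with a JSON object matching this schema:\n{schema_text}",
    },
    {"role": "user", "content": "Who won the US presidential election in 2012?"},
]
```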
Note that the message content may be partially cut off if finish_reason="length", which indicates that the generation exceeded max_tokens or the conversation exceeded the max context length. In this case, the return value might not be valid JSON.
Structured response modes work for both Completions and Chat Completions APIs.
If you use function calling, JSON mode is enabled automatically and the function schema is added to the prompt, so none of the caveats above apply.
JSON schema constructs
Fireworks supports a subset of the JSON Schema specification.
Supported:
- Nested schema composition, including anyOf and $ref
- type: string, number, integer, boolean, object, array, null
- properties and required for objects
- items for arrays
The Fireworks API doesn’t error out on unsupported constructs; they just won’t be enforced. Not-yet-supported constraints include:
- Sophisticated composition with oneOf
- Length/size constraints for objects and arrays
- Regular expressions via pattern
Note: the JSON Schema specification allows arbitrary field names to appear in an object with a properties constraint unless "additionalProperties": false or "unevaluatedProperties": false is provided. This is a poor default for LLM-constrained generation, since any hallucinated field would be accepted. Thus, Fireworks treats any schema with a properties constraint as if it had "unevaluatedProperties": false.
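The effect of this policy can be sketched with a small checker (a hypothetical illustration, not the actual server-side grammar implementation): any key outside the declared properties is rejected.

```python
# Sketch of the effective policy: with a "properties" constraint,
# keys outside the declared properties are rejected, as if
# "unevaluatedProperties": false were set.
def allowed_under_policy(obj: dict, schema: dict) -> bool:
    declared = set(schema.get("properties", {}))
    return set(obj) <= declared

schema = {"type": "object", "properties": {"winner": {"type": "string"}}}
print(allowed_under_policy({"winner": "Barack Obama"}, schema))          # True
print(allowed_under_policy({"winner": "x", "hallucinated": 1}, schema))  # False
```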
An example of a response_format field with a schema accepting an object with two fields, a required string and an optional integer:
{
"type": "json_object",
"schema": {
"type": "object",
"properties": {
"foo": {"type": "string"},
"bar": {"type": "integer"}
},
"required": ["foo"]
}
}
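The same shape can be expressed with Pydantic rather than a hand-written schema; a sketch (the Example model name is arbitrary):

```python
from typing import Optional

from pydantic import BaseModel

# "foo" is required; "bar" has a default, so it is optional --
# matching the hand-written schema above.
class Example(BaseModel):
    foo: str
    bar: Optional[int] = None

schema = Example.model_json_schema()
print(schema["required"])  # ['foo']
```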
Similar features
Check out our function calling model if you’re interested in use cases like:
- Multi-turn capabilities: For example, the ability for the model to ask for clarifying information about parameters
- Routing: The ability for the model to route across multiple different options or models. Instead of just having one possible JSON Schema, you have many different JSON schemas to work across.
Check out grammar mode if you want structured output specified not through JSON but through an arbitrary grammar (limiting output to specific words, character limits, character types, etc.).