pydantic
library to define the schema for the output. You can find more information about Pydantic here.
fireworks-ai
and pydantic
libraries installed. You can install them using pip:
mixtral-8x7b-instruct
, but all Fireworks models support this feature.
winner
.
Result
schema. We get a nice JSON response that can be parsed and integrated with the rest of your application.
{"type": "json_object"}
as response_format
in the request. This forces the model to output JSON but does not specify what specific JSON schema to use.response_format
field. Adherence to the schema is forced upon the model during sampling. So for best results, you need to include the desired schema in the prompt in addition to specifying it as response_format
. You may need to experiment with the best way to describe the schema in the prompt depending on the model: besides JSON schema, describing it in plain English might work well too, e.g. “extract name and address of the person in JSON format”.
**Note: **that the message content may be partially cut off if finish_reason="length"
, which indicates the generation exceeded max_tokens
or the conversation exceeded the max context length. In this case, the return value might not be a valid JSON.
Structured response modes work for both Completions and Chat Completions APIs.
If you use function calling, JSON mode is enabled automatically and function schema is added to the prompt. So none of the comments above apply.
anyOf
and $ref
type
: string
, number
, integer
boolean
, object
, array
, null
properties
and required
for objectsitems
for arraysoneOf
pattern
properties
constraint unless "additionalProperties": false
or "unevaluatedProperties": false
is provided. It’s a poor default for LLM constrained generation since any hallucination would be accepted. Thus Fireworks treats any schema with properties
constraint as if it had "unevaluatedProperties": false
.
An example of response_format
field with the schema accepting an object with two fields - a required string and an optional integer:
<think>...</think>
tags followed by the JSON object that adheres to your specified schema.
<think>
tags. After the reasoning section, it outputs the JSON data. This allows you to capture both the rationale behind the model’s answer as well as the structured data for downstream processing.
QAResult
Pydantic model using Pydantic’s .parse_raw()
method.
<think>
tags and another capturing the JSON.QAResult.parse_raw()
, leveraging Pydantic’s validation and serialization capabilities.