Using grammar mode
What is grammar-based structured output?
Grammar mode is the ability to specify a forced output schema for any Fireworks model via an extended BNF formal grammar (GBNF format). This method is popularly used to constrain model outputs in llama.cpp. What is a formal grammar? It’s a way to define rules to declare strings to be valid or invalid. See the “Syntax for Describing Grammars” below for more info. Similar to our JSON mode format, you provide response_format
field in the request like {"type": "grammar", "grammar": <your BNF grammar string> }
.
For best results, we still recommend that you do some prompt engineering and describe the desired output to the model to guide decision-making.
Why grammar-based structured output?
- Relying solely on system prompt engineering is finicky and time-consuming. It can be difficult to coerce the model to do certain things, for example
- Behave like a classifier, only output from a predefined list
- Output only Japanese, Chinese, a specified programming language, or otherwise prevent the model from generating a large set of of tokens
- Sometimes JSON is not what you need (e.g. it may be finicky with string escaping) and you need some other structured output
- Small models may have difficulty following instructions
End-to-end examples
This guide provides a step-by-step example of creating a structured output response with grammar using the Fireworks.ai API. The example uses Python and the OpenAI library to define the schema for the output.
Prerequisites
Before you begin, ensure you have the following:
-
Python installed on your system.
-
openai
libraries installed. You can install them using pip:
Next, select the model you want to use. In this example, we use mixtral-8x7b-instruct
, but all fireworks models support this feature. You can find your favorite model and get structured responses out of it!
Step 1: Configure the Fireworks.ai client
You can use either Fireworks.ai or OpenAI SDK with this feature. Using OpenAI SDK with your API key and the base URL:
Replace "Your_API_Key"
with your actual API key.
Step 2: Define the output grammar
Define a grammar to restrict the specified output. Let’s say you have a model that is a classifier and classifies patient requests into a few predefined classes:
Then you can ask the model to only respond within these classes.
Step 3: Specify your output grammar in your chat completions request
and for the response, we will only get one of the 5 classes we specified, in this case, the model output is
Note, that we still have done some prompt engineering to instruct the model about possible diagnoses in free form. Alternatively, we may have used one of the fine-tuned models for the medical domain.
Advanced examples
Japanese and Chinese
Make a request to the Fireworks.ai API to get a structured response. In your request, specify the output schema you used in step 3. For example, we are pretending
The model will reply in Japanese
And since the grammar is actually more lenient than Japanese and covers Chinese as well, we can also just prompt the model to be a fluent Chinese speaker.
And you can see here that we are trying something a little difficult, asking a Japanese tour guide to speak Chinese. But with the help from the grammar, the model replied in Chinese, with the same grammar specified
Without the help from the grammar, here is the model reply in a mix of Chinese and English
C code generation
This is one of the community contribution on llama.cpp. You can hook that with our Mixtral model and try to come up with a good solution for a coding problem you have.
In this case, we get a cute little valid C program as the output:
Syntax
Background
Bakus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. Fireworks API uses an extension of BNF with a few modern regex-like features, inspired by Llama.cpp’s implementation.
Basics
In BNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence...
.
Consider an example of a small chess notation grammar:
Non-terminals and terminals
Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move
, castle
, or check-mate
.
Terminals are actual characters (code points). They can be specified as a sequence like "1"
or "O-O"
or as ranges like [1-9]
or [NBKQR]
.
Characters and character ranges
Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ]
, or with escapes: 8-bit (\xXX
), 16-bit (\uXXXX
) or 32-bit (\UXXXXXXXX
).
Character ranges can be negated with ^
:
Dot .
symbol matches any character:
Sequences and alternatives
The order of symbols in a sequence matter. For example, in "1. " move " " move "\n"
, the "1. "
must come before the first move
, etc.
Alternatives, denoted by |
, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle
, move
can be a pawn
move, a nonpawn
move, or a castle
.
Parentheses ()
can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.
Repetition and optional symbols
*
after a symbol or sequence means that it can be repeated zero or more times.+
denotes that the symbol or sequence should appear one or more times.?
makes the preceding symbol or sequence optional.
Comments and newlines
Comments can be specified with #
:
Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker |
will continue the current rule, even outside of parentheses.
The root rule
In a full grammar, the root
rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.
Was this page helpful?