Using grammar mode

What is grammar-based structured output?

Grammar mode lets you force the output of any Fireworks model to follow a schema defined by an extended BNF formal grammar (GBNF format). This method is popularly used to constrain model outputs in llama.cpp. A formal grammar is a set of rules that declares which strings are valid and which are invalid; see the “Syntax” section below for more info. As with our JSON mode, you provide a response_format field in the request, like {"type": "grammar", "grammar": <your BNF grammar string>}. For best results, we still recommend that you do some prompt engineering and describe the desired output to the model to guide decision-making.
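
As a minimal sketch of the request field described above, the helper below builds the response_format value from a grammar string. The helper name grammar_response_format and the yes/no grammar are illustrative assumptions, not part of the Fireworks SDK.

```python
import json


def grammar_response_format(grammar: str) -> dict:
    """Build a response_format value for grammar mode (hypothetical helper)."""
    grammar = grammar.strip()
    if not grammar:
        raise ValueError("grammar must not be empty")
    return {"type": "grammar", "grammar": grammar}


# A tiny illustrative grammar: the whole output must be "yes" or "no".
yes_no_grammar = 'root ::= "yes" | "no"'
response_format = grammar_response_format(yes_no_grammar)

# Show the JSON that would be sent as the response_format field.
print(json.dumps(response_format))
```

The resulting dictionary can be passed as the response_format argument in the chat completions calls shown later in this guide.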

Why grammar-based structured output?

Relying solely on system prompt engineering is finicky and time-consuming.
- It can be difficult to coerce the model to do certain things, for example:
  - Behave like a classifier and only output values from a predefined list
  - Output only Japanese, Chinese, or a specified programming language, or otherwise prevent the model from generating a large set of tokens
- Sometimes JSON is not what you need (e.g. it may be finicky with string escaping) and you need some other structured output
- Small models may have difficulty following instructions

End-to-end examples

This guide provides a step-by-step example of creating a structured output response with grammar using the Fireworks API. The example uses Python and the Fireworks Build SDK to define the schema for the output.

Prerequisites

Before you begin, ensure you have the following:

- Python installed on your system.
- The Fireworks Build SDK installed. You can install it using pip:
```
pip install fireworks-ai
```

Next, select the model you want to use. In this example, we use llama-v3p1-405b-instruct, but all Fireworks models support this feature.

Step 1: Configure the Fireworks Build SDK

from fireworks import LLM

client = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)

Step 2: Define the output grammar

Let’s say you have a classifier model that sorts patient requests into a few predefined classes. Then, you can ask the model to only respond within these classes.

root      ::= diagnosis
diagnosis ::= "arthritis" | "dengue" | "urinary tract infection" | "impetigo" | "cervical spondylosis"
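
When the list of classes lives in code, a grammar like the one above can be generated instead of written by hand. The following is a sketch only: choice_grammar is a hypothetical helper, and the escaping of backslashes and double quotes assumes the GBNF string-literal conventions used by llama.cpp.

```python
def choice_grammar(labels: list[str], rule_name: str = "diagnosis") -> str:
    """Return a grammar whose root matches exactly one of the given labels."""
    if not labels:
        raise ValueError("at least one label is required")

    def quote(label: str) -> str:
        # Assumption: \\ and \" are the escapes needed inside a GBNF string.
        return '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"'

    alternatives = " | ".join(quote(label) for label in labels)
    return f"root ::= {rule_name}\n{rule_name} ::= {alternatives}\n"


diagnoses = [
    "arthritis",
    "dengue",
    "urinary tract infection",
    "impetigo",
    "cervical spondylosis",
]
print(choice_grammar(diagnoses))
```

The generated text differs from the hand-written grammar only in whitespace.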

Step 3: Specify your output grammar in your chat completions request

from fireworks import LLM

client = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)

diagnosis_grammar = """
root      ::= diagnosis
diagnosis ::= "arthritis" | "dengue" | "urinary tract infection" | "impetigo" | "cervical spondylosis"
"""

chat_completion = client.chat.completions.create(
    response_format={"type": "grammar", "grammar": diagnosis_grammar},
    messages=[
        {
            "role": "system",
            "content": "Given the symptoms try to guess the possible diagnosis. Possible choices: arthritis, dengue, urinary tract infection, impetigo, cervical spondylosis. Answer with a single word",
        },
        {
            "role": "user",
            "content": "I have been having trouble with my muscles and joints. My neck is really tight and my muscles feel weak. I have swollen joints and it is hard to move around without becoming stiff. It is also really uncomfortable to walk.",
        },
    ],
)
print(chat_completion.choices[0].message.content)

The response will be one of the 5 classes we specified. In this case, the model output is:

'arthritis'

Note that we have done some prompt engineering to tell the model about the possible diagnoses in free form. Alternatively, we could have used one of the fine-tuned models for the medical domain.
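
Because the grammar restricts the reply to a fixed set of strings, downstream code can map it onto an enum. The sketch below is one possible way to do that; the Diagnosis enum and parse_diagnosis function are illustrative names, and the strip() call is a defensive assumption rather than something the grammar requires.

```python
from enum import Enum


class Diagnosis(Enum):
    ARTHRITIS = "arthritis"
    DENGUE = "dengue"
    URINARY_TRACT_INFECTION = "urinary tract infection"
    IMPETIGO = "impetigo"
    CERVICAL_SPONDYLOSIS = "cervical spondylosis"


def parse_diagnosis(content: str) -> Diagnosis:
    """Convert the model's reply into a Diagnosis member.

    Raises ValueError if the text is not one of the expected classes,
    for example if the reply was cut off before completion.
    """
    return Diagnosis(content.strip())


# In the example above, content would be chat_completion.choices[0].message.content.
print(parse_diagnosis("arthritis"))
```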

Advanced examples

Japanese and Chinese

Given the below configuration

from fireworks import LLM

llm = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)

cjk_grammar = """
root        ::= jp-char+ ([ \t\n] jp-char+)*
jp-char     ::= hiragana | katakana | punctuation | cjk
hiragana    ::= [ぁ-ゟ]
katakana    ::= [ァ-ヿ]
punctuation ::= [、-〾]
cjk         ::= [一-鿿]
"""
chat_completion = llm.chat.completions.create(
    response_format={"type": "grammar", "grammar": cjk_grammar},
    messages=[
        {
            "role": "user",
            "content": "You are a Japanese tour guide who speaks fluent Japanese. Please tell me what are some good places for me to visit in Kyoto",
        },
    ],
)

print(chat_completion.choices[0].message.content)

The model will reply in Japanese

こんにちは、私は日本語を母国語として話せるキョトの私が案内する旅行案内者です。京都を旅行にお付き合いいただきありがとうごさいます。京都にはたくさんの楽しいところがありますが、私はあなたの需要に基いて、いくつかのおすすめていきます。\n最初に、古都の一面を体感できる場所として、清水寺をおすすめします。清水寺は世界的に有名な寺院で、美しい金面山だまのホームページと、きれいな庭で知られています。\n次に、京

Because the grammar is more lenient than Japanese alone and also covers Chinese characters, we can try something more difficult: asking a Japanese tour guide to speak Chinese. We keep the same grammar and change the prompt to:

You are a Japanese tour guide who speaks fluent Chinese. Please tell me what are some good places for me to visit in Shanghai?

With help from the grammar, the model replies in Chinese:

当您访问上海时、我建议您参观以下几个地方。\\n上海外国语大学。这是一所著名的大学、校园美景优秀、还有各种餐馆可供您选择。\\n上海中山公园。这是一座位于城市中心的公园、风景优美、有许多古老的建设和展览馆。\\n南京路。这是一条繁华的商业大街、有许多品牌商店和餐馆可供您选择。\\n上海南京东路步行街。这是全球最大的步行街之一、有许多商店和餐馆可供您选择。\\n上海世博中心。这是一座展览馆复合体、经常举办各种展览和会议

Without the grammar, the model replies in a mixture of Chinese and English

你好！uming as a Japanese tour guide who speaks fluent Chinese, I would be happy to recommend some places for you to visit in Shanghai!\\n\\n1. The Bund: This is a famous waterfront area in Shanghai that offers stunning views of the city's skyline, including the iconic Oriental Pearl Tower. You can take a leisurely stroll along the promenade and enjoy the beautiful scenery.\\n\\n2. Yuyuan Garden: This is a beautiful classical Chinese garden that dates back to the Ming Dynasty. It features pavilions, halls, rockeries, ponds, and cloisters, and is a great place to experience traditional Chinese architecture and garden design.\\n\\n3. Shanghai Tower: This is the tallest building in China and the second-tallest building in the world. You can take the elevator up to the observation deck on the 128th floor for breatht
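
To explore locally which strings the grammar above admits, its rules can be approximated with a Python regular expression. This is a rough sketch for experimentation, not how Fireworks enforces the grammar; matches_cjk_grammar is a hypothetical helper.

```python
import re

# Same character ranges as the jp-char rule: hiragana, katakana,
# CJK punctuation, and CJK ideographs.
JP_CHAR = "[ぁ-ゟァ-ヿ、-〾一-鿿]"

# root ::= jp-char+ ([ \t\n] jp-char+)*
CJK_PATTERN = re.compile(rf"{JP_CHAR}+(?:[ \t\n]{JP_CHAR}+)*")


def matches_cjk_grammar(text: str) -> bool:
    """Return True if the whole text fits the root rule of the grammar."""
    return CJK_PATTERN.fullmatch(text) is not None


for sample in ["京都の清水寺", "上海 南京路。", "你好！The Bund"]:
    print(sample, matches_cjk_grammar(sample))
```

The Japanese and Chinese samples are accepted, while the mixed Chinese and English sample is rejected because Latin letters and the full-width exclamation mark fall outside the listed ranges.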

C code generation

Programming languages like C can also be expressed as a grammar.

from fireworks import LLM

llm = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)

c_grammar = """
root ::= (declaration)*

declaration ::= dataType identifier "(" parameter? ")" "{" statement* "}"

dataType  ::= "int" ws | "float" ws | "char" ws
identifier ::= [a-zA-Z_] [a-zA-Z_0-9]*

parameter ::= dataType identifier

statement ::=
    ( dataType identifier ws "=" ws expression ";" ) |
    ( identifier ws "=" ws expression ";" ) |
    ( identifier ws "(" argList? ")" ";" ) |
    ( "return" ws expression ";" ) |
    ( "while" "(" condition ")" "{" statement* "}" ) |
    ( "for" "(" forInit ";" ws condition ";" ws forUpdate ")" "{" statement* "}" ) |
    ( "if" "(" condition ")" "{" statement* "}" ("else" "{" statement* "}")? ) |
    ( singleLineComment ) |
    ( multiLineComment )

forInit ::= dataType identifier ws "=" ws expression | identifier ws "=" ws expression
forUpdate ::= identifier ws "=" ws expression

condition ::= expression relationOperator expression
relationOperator ::= ("<=" | "<" | "==" | "!=" | ">=" | ">")

expression ::= term (("+" | "-") term)*
term ::= factor(("*" | "/") factor)*

factor ::= identifier | number | unaryTerm | funcCall | parenExpression
unaryTerm ::= "-" factor
funcCall ::= identifier "(" argList? ")"
parenExpression ::= "(" ws expression ws ")"

argList ::= expression ("," ws expression)*

number ::= [0-9]+

singleLineComment ::= "//" [^\n]* "\n"
multiLineComment ::= "/*" ( [^*] | ("*" [^/]) )* "*/"

ws ::= ([ \t\n]+)
"""

chat_completion = llm.chat.completions.create(
    response_format={"type": "grammar", "grammar": c_grammar},
    messages=[
        {
            "role": "user",
            "content": "You are an expert in writing C code. Can you write a program that prints hello world?",
        },
    ],
)

print(chat_completion.choices[0].message.content)

In this case, we get a cute little C program as the output:

char\nc(int a){return 2*a;}

Syntax

Background

Backus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. The Fireworks API uses an extension of BNF with a few modern regex-like features, inspired by llama.cpp’s implementation.

Basics

In BNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence.... Consider an example of a small chess notation grammar:

# `root` specifies the pattern for the overall output
root ::= (
    # it must start with the characters "1. " followed by a sequence
    # of characters that match the `move` rule, followed by a space, followed
    # by another move, and then a newline
    "1. " move " " move "\n"

    # it's followed by one or more subsequent moves, numbered with one or two digits
    ([1-9] [0-9]? ". " move " " move "\n")+
)

# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?

pawn ::= ...
nonpawn ::= ...
castle ::= ...

Non-terminals and terminals

Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move, castle, or check-mate. Terminals are actual characters (code points). They can be specified as a sequence like "1" or "O-O" or as ranges like [1-9] or [NBKQR].

Characters and character ranges

Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ], or with escapes: 8-bit (\xXX), 16-bit (\uXXXX) or 32-bit (\UXXXXXXXX). Character ranges can be negated with ^:

single-line ::= [^\n]+ "\n"

The dot symbol . matches any character:

any-three-symbol-sequence ::= ...
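
If you prefer escapes over literal Unicode characters in a range, the code points can be computed in Python. The helper below is a sketch; escaped_range is a hypothetical name, and the use of uppercase hex digits is an assumption about what the grammar parser accepts.

```python
def escaped_range(first: str, last: str) -> str:
    """Format a character range using \\uXXXX or \\UXXXXXXXX escapes."""
    low, high = ord(first), ord(last)
    if low > high:
        raise ValueError("first must not come after last")

    def escape(code_point: int) -> str:
        # 16-bit escape when it fits, otherwise the 32-bit form.
        if code_point <= 0xFFFF:
            return f"\\u{code_point:04X}"
        return f"\\U{code_point:08X}"

    return f"[{escape(low)}-{escape(high)}]"


# Escaped equivalent of the hiragana range used earlier in this guide.
print("hiragana ::= " + escaped_range("ぁ", "ゟ"))
```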

Sequences and alternatives

The order of symbols in a sequence matters. For example, in "1. " move " " move "\n", the "1. " must come before the first move, etc. Alternatives, denoted by |, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle, move can be a pawn move, a nonpawn move, or a castle. Parentheses () can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.

Repetition and optional symbols

* after a symbol or sequence means that it can be repeated zero or more times.
+ denotes that the symbol or sequence should appear one or more times.
? makes the preceding symbol or sequence optional.
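
These operators behave much like their regular-expression counterparts. As an informal analogy, the sketch below pairs an illustrative grammar rule (made up for this example) with a Python regex that accepts the same strings.

```python
import re

# Illustrative grammar:  root ::= "ab"* [0-9]+ "!"?
# Regex analogue:        (?:ab)*[0-9]+!?
REPETITION_PATTERN = re.compile(r"(?:ab)*[0-9]+!?")


def accepted(text: str) -> bool:
    """Return True if the whole text matches the analogue pattern."""
    return REPETITION_PATTERN.fullmatch(text) is not None


# "ab" repeats zero or more times, at least one digit is required,
# and the trailing "!" is optional.
for sample in ["abab12!", "7", "ab", "12!!"]:
    print(repr(sample), accepted(sample))
```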

Comments and newlines

Comments can be specified with #:

# defines whitespace (one or more characters)
ws ::= [ \t\n]+

Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker | will continue the current rule, even outside of parentheses.

The root rule

In a full grammar, the root rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.

# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
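
To get a feel for what this root rule accepts, the list grammar can be mirrored with a Python regular expression. This is only a local approximation for experimenting with inputs; is_valid_list is a hypothetical helper.

```python
import re

# root ::= ("- " item)+
# item ::= [^\n]+ "\n"
LIST_PATTERN = re.compile(r"(?:- [^\n]+\n)+")


def is_valid_list(text: str) -> bool:
    """Return True if the entire text matches the list grammar."""
    return LIST_PATTERN.fullmatch(text) is not None


for sample in ["- milk\n- eggs\n", "- milk", "milk\n"]:
    print(repr(sample), is_valid_list(sample))
```

Only the first sample is accepted: the second lacks the trailing newline that item requires, and the third lacks the "- " prefix.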
