Using function-calling
Introduction
Function calling enables models to intelligently select and utilize tools based on user input. This powerful feature allows you to build dynamic agents that can access real-time information and generate structured outputs. The function calling API doesn’t execute functions directly. Instead, it generates OpenAI-compatible function call specifications that you can then implement.
How function calling works
- Tool specification: You provide a query along with the list of available tools for the model. The tools are specified using JSON Schema; each tool includes its name, description, and required parameters.
- Intent detection: The model analyzes user input and determines whether to provide a conversational response or generate function calling specifications.
- Function call generation: When appropriate, the model outputs structured function calls in OpenAI-compatible format, including all necessary parameters based on the context.
- Execution and response generation: You execute the specified function calls and feed results back to the model for continued conversation.
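As a quick illustration of these steps, the sketch below sends a query plus one tool definition through the OpenAI-compatible Python client and checks whether the model chose to call the tool. The base URL is Fireworks' OpenAI-compatible endpoint; the model path and the get_weather tool are assumptions made for illustration only.

```python
# Minimal sketch: send a query plus a tool definition and check whether the
# model responded with a function call. Model path and tool are illustrative.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-405b-instruct",  # assumed model path
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

choice = response.choices[0]
if choice.finish_reason == "tool_calls":      # the model wants a tool invoked
    call = choice.message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:                                         # the model answered conversationally
    print(choice.message.content)
```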
Supported models
A subset of models hosted on Fireworks supports function calling using the described syntax. These models are listed below. The supportsTools field in the model response also indicates whether the model supports function calling.
- Llama 3.1 405B Instruct
- Llama 3.1 70B Instruct
- Qwen 2.5 72B Instruct
- Mixtral MoE 8x22B Instruct
- Firefunction-v2: Latest and most performant model, optimized for complex function calling scenarios (on-demand only)
- Firefunction-v1: Previous generation, Mixtral-based function calling model optimized for fast routing and structured output (on-demand only)
These models can all utilize function calling with the same syntax, shown below.
Basic example: city population data retrieval (Llama 3.1 405B Instruct)
For this example, let’s consider a user looking for population data for a specific city. We will provide the model with a tool that it can invoke to retrieve city population data.
- To achieve this, we detail the purpose, arguments, and usage of the get_city_population function using JSON Schema. This information is provided through the tools argument, and the user query is sent as usual through the messages argument.
- In our case, the model decides to invoke the get_city_population tool with a specific argument. Note that the model itself does not invoke the tool; it only specifies the arguments. When the model issues a function call, the finish reason is set to tool_calls. The API caller is responsible for parsing the function name and arguments supplied by the model and invoking the appropriate tool.
- The API caller obtains the response from the tool invocation and passes it back to the model so the model can generate a final response.
This results in a final response from the model that incorporates the population returned by the tool.
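A minimal end-to-end sketch of this flow, using the OpenAI-compatible Python client, is shown below. The model path, parameter name, and hard-coded tool result are illustrative assumptions rather than the exact schema used in this example.

```python
# Sketch of the basic example: the model requests get_city_population, the caller
# executes the tool, and the result is fed back for a final answer.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="<FIREWORKS_API_KEY>")
MODEL = "accounts/fireworks/models/llama-v3p1-405b-instruct"  # assumed model path

tools = [{
    "type": "function",
    "function": {
        "name": "get_city_population",
        "description": "Retrieve the current population of a given city",
        "parameters": {
            "type": "object",
            "properties": {"city_name": {"type": "string", "description": "Name of the city"}},
            "required": ["city_name"],
        },
    },
}]
messages = [{"role": "user", "content": "How many people live in San Francisco?"}]

# First call: the model emits a function call instead of answering directly.
response = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
choice = response.choices[0]

if choice.finish_reason == "tool_calls":
    call = choice.message.tool_calls[0]
    args = json.loads(call.function.arguments)   # e.g. {"city_name": "San Francisco"}

    population = 815_000                         # replace with your real tool invocation

    # Second call: append the assistant's tool call and the tool result,
    # then ask the model for the final answer.
    messages.append(choice.message)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": json.dumps({"population": population})})
    final = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    print(final.choices[0].message.content)
```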
Advanced example: Financial data retrieval
TL;DR This example tutorial is available as a Python notebook [code | Colab].
For this example, let’s consider a user looking for Nike’s financial data. We will provide the model with a tool that it can invoke to access the financial information of any company.
- To achieve our goal, we provide the model with information about the get_financial_data function: its purpose, arguments, and so on, described in JSON Schema. We send this information through the tools argument and the user query as usual through the messages argument.
- In our case, the model decides to invoke the get_financial_data tool with a specific set of arguments. Note that the model itself does not invoke the tool; it only specifies the arguments. When the model issues a function call, the finish reason is set to tool_calls. The API caller is responsible for parsing the function name and arguments supplied by the model and invoking the appropriate tool.
- The API caller obtains the response from the tool invocation and passes it back to the model so the model can generate a final response.
This results in a final response from the model that incorporates the returned financial data.
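The sketch below shows one way the tools payload and the first request for this example might look, again via the OpenAI-compatible Python client. The parameter names (company, metric, year) and the model path are assumptions based on the description above; the exact schema lives in the linked notebook. Feeding the tool result back follows the same pattern as in the basic example.

```python
# Sketch of the advanced example: a get_financial_data tool with several parameters.
# Parameter names and model path are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="<FIREWORKS_API_KEY>")

tools = [{
    "type": "function",
    "function": {
        "name": "get_financial_data",
        "description": "Get financial data for a company given the metric and year",
        "parameters": {
            "type": "object",
            "properties": {
                "company": {"type": "string", "description": "Name of the company, e.g. Nike"},
                "metric": {"type": "string",
                           "enum": ["revenue", "net_income", "operating_margin"],
                           "description": "Which financial metric to retrieve"},
                "year": {"type": "integer", "description": "Fiscal year of interest"},
            },
            "required": ["company", "metric", "year"],
        },
    },
}]

response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",  # assumed model path
    messages=[{"role": "user", "content": "What was Nike's revenue in 2023?"}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name)                   # "get_financial_data"
print(json.loads(call.function.arguments))  # e.g. {"company": "Nike", "metric": "revenue", "year": 2023}
```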
Tools specification
The tools field is an array where each element includes the following fields:
- type (string): Specifies the type of the tool. Currently, only function is supported.
- function (object): Specifies the function to be called. It includes the following fields:
  - description (string): A description of what the function does, used by the model to choose when and how to call the function.
  - name (string): The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
  - parameters (object): The parameters the function accepts, described as a JSON Schema object. See the JSON Schema reference for documentation about the format.
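Putting these fields together, a tools array might look like the following sketch; the convert_currency function is a hypothetical example used only to illustrate the fields.

```python
# Example tools array illustrating type, function.name, function.description,
# and function.parameters. The convert_currency function is hypothetical.
tools = [
    {
        "type": "function",                 # currently the only supported type
        "function": {
            "name": "convert_currency",     # a-z, A-Z, 0-9, underscores, dashes; max 64 chars
            "description": "Convert an amount from one currency to another using current exchange rates",
            "parameters": {                 # JSON Schema describing the arguments
                "type": "object",
                "properties": {
                    "amount": {"type": "number", "description": "Amount to convert"},
                    "from_currency": {"type": "string", "description": "ISO 4217 code, e.g. USD"},
                    "to_currency": {"type": "string", "description": "ISO 4217 code, e.g. EUR"},
                },
                "required": ["amount", "from_currency", "to_currency"],
            },
        },
    }
]
```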
Tool choice
The tool_choice parameter controls whether the model is allowed to call functions. Currently, we support the values auto, none, any, or a specific function name.
- auto (default): The model can dynamically choose between generating a message or calling a function. This is the default tool choice when no value is specified for tool_choice.
- none: Disables the use of any tools, similar to not specifying the tool_choice field.
- any: Allows the model to call any function. This ensures that a function call will always be made, with no restriction on the function’s name (see the sketch after this list).
- Specific function name: To force the model to use a particular function, explicitly specify the function name in the tool_choice field (see the sketch after this list). This ensures that the model will only use the get_financial_data function.
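Continuing from the advanced example sketch (and reusing its client and tools), the two forcing options might be passed as follows. The payload shapes follow OpenAI's convention, and treating any as a plain string value is an assumption here.

```python
# Sketch of the two forcing options for tool_choice, reusing the client and tools
# defined in the advanced example sketch. Payload shapes follow the OpenAI convention.

# "any": always call some function, with no restriction on which one.
response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",   # assumed model path
    messages=[{"role": "user", "content": "What was Nike's revenue in 2023?"}],
    tools=tools,
    tool_choice="any",
)

# Specific function name: only get_financial_data may be called.
response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",
    messages=[{"role": "user", "content": "What was Nike's revenue in 2023?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_financial_data"}},
)
```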
OpenAI compatibility
Fireworks AI’s function calling API is fully compatible with OpenAI’s implementation, with a few differences:
- No support for parallel function calling
- No nested function calling
- Simplified tool choice options
Best practices
- Number of Functions: The length of the list of functions specified to the model directly impacts its performance. For best performance, keep the list of functions below 7. It’s possible to see some degradation in model quality as the tool list length exceeds 10.
- Function Description: The function specification follows JSON Schema. For best performance, describe in great detail what the function does under the “description” section. An example of a good function description is “Get financial data for a company given the metric and year”. A bad example would be “Get financial data for a company”.
- System Prompt: To ensure optimal performance, we recommend not adding any additional system prompt. User-specified system prompts can interfere with the model’s function detection and calling ability. The auto-injected prompt for our function calling model is designed to ensure optimal performance.
- Temperature: Set temperature to 0.0 or some other low value. This helps the model generate only confident predictions and avoid hallucinating parameter values.
- Function descriptions: Provide verbose descriptions for functions and their parameters. This is similar to prompt engineering: the more elaborate and accurate the function definition/documentation, the better the model is at deciphering the intent of the function and its parameters.
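To illustrate the description and temperature guidance, the sketch below contrasts a vague function specification with a detailed one and pins temperature to 0.0; both variants and the model path are illustrative assumptions.

```python
# Illustration of the best practices above: prefer detailed descriptions and a low
# temperature. Both tool variants and the model path are illustrative.

vague_tool = {
    "type": "function",
    "function": {
        "name": "get_financial_data",
        "description": "Get financial data for a company",   # too little context for the model
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
}

detailed_tool = {
    "type": "function",
    "function": {
        "name": "get_financial_data",
        "description": "Get financial data for a company given the metric and year",
        "parameters": {
            "type": "object",
            "properties": {
                "company": {"type": "string", "description": "Company name, e.g. Nike"},
                "metric": {"type": "string", "description": "Financial metric, e.g. revenue"},
                "year": {"type": "integer", "description": "Fiscal year, e.g. 2023"},
            },
            "required": ["company", "metric", "year"],
        },
    },
}

response = client.chat.completions.create(        # client as set up in the earlier sketches
    model="accounts/fireworks/models/firefunction-v2",
    messages=[{"role": "user", "content": "What was Nike's revenue in 2023?"}],
    tools=[detailed_tool],
    temperature=0.0,                               # low temperature avoids hallucinated arguments
)
```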
Function calling vs JSON mode
When to use function calling vs JSON mode?
Use function calling when:
- Building interactive agents
- Requiring structured API calls
- Implementing multi-step workflows
- Needing dynamic decision making
Use JSON mode when:
- Performing simple data extraction
- Working with static data
- Needing non-JSON output formats
- Processing batch data without interaction
Example apps
- Official demos
- Langchain integrations
Resources
- Fireworks Blog Post on FireFunction-v2
- OpenAI Docs on Function Calling
- OpenAI Cookbook on Function Calling
- Function Calling Best Practices
Data policy
Data from Firefunction is logged and automatically deleted after 30 days to ensure product quality and to prevent abuse (e.g., aggregate statistics such as the average number of functions used). This data will never be used to train models. Please contact raythai@fireworks.ai if you have questions, comments, or use cases where data cannot be logged.