You can use the OpenAI Python client library to interact with Fireworks. This makes migration of existing applications already using OpenAI particularly easy. For Anthropic SDK support, see Anthropic compatibility.
Specify endpoint and API key
Using the OpenAI client
You can use the OpenAI client by initializing it with your Fireworks configuration:
Using environment variables
Alternative approach
Usage
Use OpenAI’s SDK as you normally would. Just ensure that the model parameter refers to one of the Fireworks models.
Completion
Simple completion API that doesn’t modify the provided prompt in any way:
Chat Completion
Works best for models fine-tuned for conversation (e.g. llama*-chat variants):
Fine-tuning compatibility
Fireworks fine-tuning uses the same OpenAI-compatible chat completion format for training data. If you have datasets formatted for OpenAI SFT, they work on Fireworks with no conversion required — the same messages array with role, content, tool_calls, and weight fields.
Fireworks also supports additional features in the training schema:
- Thinking traces via reasoning_content on assistant messages (for models like DeepSeek R1 and Qwen3)
- Per-message weights to control which turns the model learns from
- Per-sample weights for weighted training
- Vision inputs using the same OpenAI-compatible multimodal content format
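Putting these together, a single training sample in the OpenAI-compatible JSONL format might look like this (field values are illustrative; reasoning_content and weight follow the extensions described above):

```python
import json

# One JSONL line of chat-format SFT data, with the Fireworks extensions:
# a per-message weight and a reasoning trace on the assistant turn.
sample = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?", "weight": 0},
        {
            "role": "assistant",
            "reasoning_content": "2 + 2 equals 4.",  # thinking trace (reasoning models)
            "content": "4",
            "weight": 1,  # learn from this turn
        },
    ],
}

line = json.dumps(sample)  # one line per sample in the .jsonl file
```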
API compatibility
Differences
The following options have minor differences:

max_tokens: behaves differently if the model context length is exceeded. If the length of prompt or messages plus max_tokens is higher than the model’s context window, max_tokens will be adjusted lower accordingly. OpenAI returns an invalid request error in this situation. Control this behavior with the context_length_exceeded_behavior parameter:
- truncate (default): automatically adjusts max_tokens to fit within the context window
- error: returns an error like OpenAI does
Token usage for streaming responses
OpenAI API returns usage stats (number of tokens in prompt and completion) for non-streaming responses but doesn’t for the streaming ones (see forum post). Fireworks API returns usage stats in both cases. For streaming responses, the usage field is returned in the very last chunk of the response (i.e. the one having finish_reason set). For example:
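The shape of that final chunk can be sketched as follows (field values are illustrative):

```python
import json

# Illustrative final chunk of a Fireworks streaming response: it carries
# both a finish_reason and the usage stats for the whole request.
final_chunk = json.loads("""
{
  "id": "cmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [
    {"index": 0, "delta": {}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 50, "total_tokens": 75}
}
""")

# Earlier chunks have finish_reason == null and no usage field;
# only this last chunk reports token counts.
usage = final_chunk["usage"]
```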
Note that if you’re using the OpenAI SDK, the usage field won’t be listed in the SDK’s structure definition, but it can be accessed directly. For example: