OpenAI compatibility
You can use the OpenAI Python client library to interact with Fireworks. This makes migrating existing applications that already use OpenAI particularly easy.
Specify endpoint and API key
You can override parameters for the entire application using environment variables or by setting these values in Python:
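For example, with the legacy `openai` Python package (the one exposing `openai.ChatCompletion`), a minimal sketch might look like this; the API key placeholder is yours to fill in:

```python
# Option 1: environment variables, read by the legacy OpenAI SDK at import time.
#   export OPENAI_API_BASE="https://api.fireworks.ai/inference/v1"
#   export OPENAI_API_KEY="<your Fireworks API key>"

# Option 2: set the same values in Python.
import openai

openai.api_base = "https://api.fireworks.ai/inference/v1"
openai.api_key = "<your Fireworks API key>"
```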
Alternatively, you may specify these parameters for a single request (useful if you mix calls to OpenAI and Fireworks in the same process):
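With the legacy SDK, `api_base` and `api_key` can also be passed per call; a minimal sketch (the model name is illustrative):

```python
import openai

response = openai.ChatCompletion.create(
    api_base="https://api.fireworks.ai/inference/v1",
    api_key="<your Fireworks API key>",
    model="accounts/fireworks/models/llama-v2-7b-chat",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
)
```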
Usage
Use the OpenAI SDK as you normally would. Just ensure that the `model` parameter refers to one of the Fireworks models.
Completion
A simple completion API that doesn't modify the provided prompt in any way.
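A minimal sketch with the legacy SDK; the model name is illustrative:

```python
import openai

response = openai.Completion.create(
    model="accounts/fireworks/models/llama-v2-7b",  # illustrative Fireworks model name
    prompt="The capital of France is",
    max_tokens=16,
)
print(response["choices"][0]["text"])
```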
Chat Completion
Works best for models fine-tuned for conversation (e.g. llama*-chat variants)
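A minimal sketch with the legacy SDK; the model name is illustrative:

```python
import openai

response = openai.ChatCompletion.create(
    model="accounts/fireworks/models/llama-v2-7b-chat",  # illustrative chat-tuned model
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response["choices"][0]["message"]["content"])
```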
API compatibility
Differences
The following options have minor differences:
- `stop`: the returned string includes the stop word for Fireworks, while it's omitted for OpenAI (it can be easily truncated on the client side).
- `max_tokens`: behaves differently if the model's context length is exceeded. If the length of `prompt` or `messages` plus `max_tokens` is higher than the model's context window, `max_tokens` will be adjusted lower accordingly. OpenAI returns an invalid request error in this situation. This behavior can be adjusted with the `context_length_exceeded_behavior` parameter (see the sketch after this list).
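Since the legacy OpenAI SDK forwards extra keyword arguments to the API, this Fireworks-specific parameter can be passed directly. A minimal sketch, assuming `"error"` is an accepted value for the parameter and using an illustrative model name:

```python
import openai

# Ask Fireworks to return an error instead of silently lowering max_tokens.
response = openai.ChatCompletion.create(
    model="accounts/fireworks/models/llama-v2-7b-chat",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100000,  # deliberately larger than the context window
    context_length_exceeded_behavior="error",  # assumed value, based on the parameter's description
)
```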
Token usage for streaming responses
The OpenAI API returns usage stats (the number of tokens in the prompt and completion) for non-streaming responses but doesn't for streaming ones (see forum post).
Fireworks.ai returns usage stats in both cases. For streaming responses, the `usage` field is returned in the very last chunk of the response (i.e. the one having `finish_reason` set). For example:
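For illustration, the final chunk might look like the following sketch (all field values are made up):

```json
{
  "id": "cmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "accounts/fireworks/models/llama-v2-7b-chat",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  }
}
```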
Note that if you're using the OpenAI SDK, the `usage` field won't be listed in the SDK's structure definition, but it can be accessed directly. For example:
- In the Python SDK, you can access the attribute directly, e.g. `for chunk in openai.ChatCompletion.create(...): print(chunk["usage"])`.
- In the TypeScript SDK, you need to cast away the typing, e.g. `for await (const chunk of await openai.chat.completions.create(...)) { console.log((chunk as any).usage); }`.
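Putting this together, a minimal runnable sketch in Python that streams a chat completion and reads usage from the final chunk (the model name is illustrative):

```python
import openai

openai.api_base = "https://api.fireworks.ai/inference/v1"
openai.api_key = "<your Fireworks API key>"

last_chunk = None
for chunk in openai.ChatCompletion.create(
    model="accounts/fireworks/models/llama-v2-7b-chat",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
):
    last_chunk = chunk

# Only the final chunk (the one with finish_reason set) carries usage.
print(last_chunk["usage"])
```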
Not supported options
The following options are not yet supported:
- `presence_penalty`
- `frequency_penalty`
- `best_of`: you can use `n` instead (see the sketch after this list)
- `logit_bias`
- `functions`: you can use our LangChain integration to achieve similar functionality client-side
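For example, instead of `best_of`, request multiple candidates with `n`; a minimal sketch (the model name is illustrative):

```python
import openai

response = openai.Completion.create(
    model="accounts/fireworks/models/llama-v2-7b",  # illustrative model name
    prompt="Write a tagline for a coffee shop:",
    n=4,  # generate four independent completions instead of using best_of
)
for choice in response["choices"]:
    print(choice["text"])
```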
Please reach out to us on Discord if you have a use case requiring one of these.