Tutorial
Foreword
This tutorial demonstrates how to use the Fireworks AI Python SDK with a few toy examples. First, we will use the LLM class to make a simple request to various models and compare the outputs. Then, we will try to fine-tune a model to learn information it has never seen before.
This tutorial will cost $10 to run due to the on-demand model deployments.
1. Setup
To get started with the Fireworks AI Python SDK, you need to install the firectl CLI tool and create an API key.
Install our CLI tool firectl to interact with the Fireworks AI platform.
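For example, on macOS you can install firectl with Homebrew; the formula name below is an assumption, so check the Fireworks installation docs for your platform:

```bash
# Assumed Homebrew formula; see the Fireworks installation docs for other platforms
brew install fw-ai/firectl/firectl
```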
Sign in to Fireworks by running the following command:
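A minimal example, assuming the signin subcommand:

```bash
# Opens a browser window so you can authenticate this machine with Fireworks
firectl signin
```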
A browser window will open to the Fireworks AI login page. Once you log in, your machine will be authenticated.
Create an API key by running the following command:
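For example, assuming the create api-key subcommand:

```bash
# Creates a new API key; the output includes a Key field
firectl create api-key
```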
Copy the value of the Key field to your environment variable FIREWORKS_API_KEY.
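For example, in a POSIX shell:

```bash
# Make the key available to the SDK in this shell session
export FIREWORKS_API_KEY="<your-key>"
```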
Install the Fireworks AI Python SDK.
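The SDK is published on PyPI as fireworks-ai:

```bash
pip install fireworks-ai
```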
Once you have completed the steps above, let’s ensure you are ready to make your first LLM call.
2. Call a language model using the LLM() class
Now that your machine is set up with credentials and the SDK, let’s make your first LLM call and walk through some of the nuances of this SDK.
Create a new file called main.py and import the Fireworks AI SDK.
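A minimal sketch, assuming the SDK’s top-level LLM import:

```python
# main.py
from fireworks import LLM
```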
Instantiate the LLM class. The LLM class accepts a model argument that you can use to specify the model you want to use. For this tutorial, we will use the Llama 4 Maverick model.
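Continuing in main.py, a sketch of the instantiation; the model identifier below is an assumption, so check the Fireworks model library for the exact ID of Llama 4 Maverick:

```python
# Model ID assumed for illustration; replace it with the ID from the model library
llm = LLM(model="llama4-maverick-instruct-basic", deployment_type="serverless")
```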
When creating an LLM instance, you can specify the deployment type as either "serverless", "on-demand", or "auto". If you pass "auto", the SDK will try to use serverless hosting if available; otherwise it will create an on-demand deployment. In the other cases, the SDK will try to create a deployment of the specified type and will throw an error if it’s not available for the model you selected.
The SDK will try to re-use existing deployments for the same model where possible; see Resource management for more details.
With great power comes great responsibility! Be careful with the deployment_type parameter, especially for "auto" and "on-demand". While the SDK will try to make the most cost-effective choice for you and put sensible autoscaling policies in place, it is possible to unintentionally create many deployments that lead to unwanted spend, especially when working with non-serverless models.
Make a request to the LLM. The LLM class is OpenAI compatible, so you can use the same chat completion interface to make a request to the LLM.
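A minimal request using the OpenAI-compatible chat completions interface:

```python
response = llm.chat.completions.create(
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```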
The great thing about the SDK is that you can use your favorite Python constructs to powerfully work with LLMs. For example, let’s try calling a few LLMs in a loop and see how they respond:
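A sketch of what that loop might look like; the model IDs below are placeholders and should be replaced with serverless models available to your account:

```python
# Placeholder model IDs for illustration
models = ["llama4-maverick-instruct-basic", "llama-v3p1-8b-instruct"]

question = "Explain overfitting in one sentence."
for name in models:
    llm = LLM(model=name, deployment_type="serverless")
    response = llm.chat.completions.create(
        messages=[{"role": "user", "content": question}],
    )
    print(f"{name}: {response.choices[0].message.content}")
```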
Or, we can test different temperature values to see how the model’s behavior changes:
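For example, reusing the llm instance from above and sweeping a few temperature values:

```python
for temperature in [0.0, 0.5, 1.0]:
    response = llm.chat.completions.create(
        messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
        temperature=temperature,
    )
    print(f"temperature={temperature}: {response.choices[0].message.content}")
```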
3. Fine-tune a model
The Build SDK makes fine-tuning a model a breeze! To see how, let’s try a canonical use case: fine-tuning a model to learn information it has never seen before. To do this, we will use the TOFU (Task of Fictitious Unlearning) dataset. The dataset consists of ~4,000 question-answer pairs on autobiographies of 200 fictitious authors. Researchers fine-tuned a model on this dataset with the goal of investigating ways to “unlearn” this information. For our toy example, however, we will only focus on the first step: trying to embed these nonsense facts into an LLM.
Install the required dependencies. You will need the datasets library from Hugging Face to load the dataset.
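For example:

```bash
pip install datasets
```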
Load and prepare the dataset. We must convert the dataset to the format expected by the fine-tuning service, which is a list of chat completion messages following the OpenAI chat completion format.
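A sketch of the conversion; the repository name, config, and field names below are assumptions, so check the TOFU dataset card on Hugging Face for the exact identifiers:

```python
from datasets import load_dataset

# Repository, config, and field names are assumptions based on the TOFU dataset card
tofu = load_dataset("locuslab/TOFU", "full", split="train")

# Convert each question-answer pair into OpenAI-style chat messages
rows = [
    {
        "messages": [
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answer"]},
        ]
    }
    for example in tofu
]
```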
We can then create a Dataset object and upload it to Fireworks using the Dataset.from_list() method.
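A minimal sketch; Dataset.from_list() comes from the step above, while the import path is an assumption:

```python
from fireworks import Dataset  # import path assumed; see the SDK reference

# Create the dataset from the converted rows and upload it to Fireworks
dataset = Dataset.from_list(rows)
```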
Now we can create a base model and fine-tune it on the dataset. Let’s try fine-tuning Qwen2.5 7B Instruct. At this time, it might be helpful to set the FIREWORKS_SDK_DEBUG environment variable to true to see the progress of the fine-tuning job.
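A rough sketch of this step. The create_supervised_fine_tuning_job method, its arguments, and the way the resulting model is retrieved are assumptions here; epochs and learning_rate are the hyperparameters mentioned later in this section. Check the SDK reference for the exact fine-tuning API:

```python
# "qwen2p5-7b-instruct" is an assumed model ID for Qwen2.5 7B Instruct
base_model = LLM(model="qwen2p5-7b-instruct", deployment_type="on-demand")

# Assumed fine-tuning entry point and signature; consult the SDK reference
job = base_model.create_supervised_fine_tuning_job(
    "tofu-tutorial",      # hypothetical job name
    dataset,
    epochs=1,
    learning_rate=1e-4,
)
job.wait_for_completion()          # assumed blocking helper
fine_tuned_model = job.output_llm  # assumed accessor for the resulting fine-tuned LLM
```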
Qwen2.5 7B Instruct is not available serverlessly, so the SDK will create an on-demand deployment with a scale-down window of 5 mins. This will incur some costs.
Now we can test the fine-tuned model.
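For example, using the fine_tuned_model handle from the sketch above to ask the first question from the training data:

```python
# Ask the first training question and compare with the reference answer in the dataset
question = rows[0]["messages"][0]["content"]
response = fine_tuned_model.chat.completions.create(
    messages=[{"role": "user", "content": question}],
)
print(response.choices[0].message.content)
```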
If everything worked out correctly, you should see something like:
Just like we did in the previous section, you can try iterating over different models and fine-tuning hyperparameters like epochs and learning_rate to experiment with different fine-tuning jobs!
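For instance, reusing the (assumed) fine-tuning call from the sketch above in a small sweep:

```python
for learning_rate in [1e-5, 1e-4]:
    base_model.create_supervised_fine_tuning_job(   # assumed method, as above
        f"tofu-lr-{learning_rate}",                 # hypothetical job name
        dataset,
        epochs=1,
        learning_rate=learning_rate,
    )
```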
You’ll notice that despite us using two models in this tutorial, the SDK only actually created a single deployment. This is the power of the Build SDK’s smart resource management in action! Rather than creating a separate deployment for the LoRA addon, we simply updated the base model deployment we created to support LoRA addons and then deployed our fine-tuned model on top.
Feel free to send more requests to either model. By default, the SDK sets a scale-to-zero window of 5 minutes, which stops billing after that period of inactivity. However, it’s good practice to delete deployments you’re not using as a precaution against unexpected bills. You can call base_model.delete_deployment(ignore_checks=True) to delete the deployment, bypassing the check that triggers if you’ve used the deployment recently.
Conclusion
This tutorial walked you through the basic use cases for the SDK: trying out different models/configurations and fine-tuning on a dataset. From here, you should check out the Reference for more details on the objects and methods available in the SDK.