Concepts

Resources

Account

Your account is the top-level resource that contains all other resources. Quotas and billing are enforced at the account level, so usage by every user in an account counts toward the same quotas and the same bill.

For developer accounts, the account ID is auto-generated from the email address used to sign up.
Enterprise accounts can optionally choose a custom, unique account ID.

User

A user is an email address associated with an account. Users added to an account have full access to delete, edit, and create resources within the account, such as deployments and models.

Models and model types

A model is a set of model weights and metadata associated with the model. Each model has a globally unique name of the form accounts/<ACCOUNT_ID>/models/<MODEL_ID>. There are two types of models:

Base models: A base model consists of the full set of model weights, including models pre-trained from scratch and full fine-tunes.

Fireworks has a library of common base models that can be used for serverless inference as well as dedicated deployments. Model IDs for these models are pre-populated. For example, llama-v3p1-70b-instruct is the model ID for the Llama 3.1 70B model that Fireworks provides. The ID for each model can be found on its model page.
Users can also upload their own custom base models and specify model IDs.

LoRA (low-rank adaptation) addons: A LoRA addon is a small, fine-tuned model that requires significantly less memory to deploy than a fully fine-tuned model. Fireworks supports training, uploading, and serving LoRA addons. A LoRA addon must be deployed on a serverless or dedicated deployment of its corresponding base model. Model IDs for LoRAs can be either auto-generated or user-specified.
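
The model name format described above can be assembled and split with a few lines of Python. This is a minimal sketch, not part of any Fireworks SDK: the helper names `model_name` and `parse_model_name` and the account ID `my-account` are hypothetical and used only for illustration.

```python
import re

# Matches names of the form accounts/<ACCOUNT_ID>/models/<MODEL_ID>.
_MODEL_NAME = re.compile(r"accounts/(?P<account_id>[^/]+)/models/(?P<model_id>[^/]+)")


def model_name(account_id: str, model_id: str) -> str:
    """Build a globally unique model name from an account ID and a model ID."""
    return f"accounts/{account_id}/models/{model_id}"


def parse_model_name(name: str) -> tuple[str, str]:
    """Split a model name into (account_id, model_id); raise ValueError if malformed."""
    match = _MODEL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a model name: {name!r}")
    return match["account_id"], match["model_id"]


# "my-account" is a hypothetical account ID.
name = model_name("my-account", "llama-v3p1-70b-instruct")
print(name)
print(parse_model_name(name))
```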

Deployments and deployment types

A model must be deployed before it can be used for inference. A deployment is a collection of one or more model servers that host one base model and, optionally, one or more LoRA addons. Fireworks supports two types of deployments:

Serverless deployments: Fireworks hosts popular base models on shared “serverless” deployments. Users pay per token to query these models and do not need to configure GPUs. The most popular serverless deployments also support serverless LoRA addons. See the Deploying to serverless guide for details.
Dedicated deployments: Dedicated deployments let users configure private deployments on a wide array of hardware (see the on-demand deployments guide). They give users performance guarantees and the most flexibility and control over which models can be deployed. Both LoRA addons and base models can be deployed to dedicated deployments. Dedicated deployments are billed on a GPU-second basis (see the pricing page).
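
To make GPU-second billing concrete, here is a small arithmetic sketch. It assumes the cost scales linearly with the number of GPUs and the number of seconds they are running; the function name and the rate used below are hypothetical placeholders, so refer to the pricing page for actual rates and billing rules.

```python
def dedicated_cost(gpu_count: int, seconds: float, rate_per_gpu_second: float) -> float:
    """Estimate cost as GPUs x seconds x rate (assumed linear billing model)."""
    gpu_seconds = gpu_count * seconds
    return gpu_seconds * rate_per_gpu_second


# Hypothetical example: 2 GPUs running for 30 minutes at a made-up rate.
HYPOTHETICAL_RATE = 0.001  # currency units per GPU-second, NOT a real price
estimate = dedicated_cost(gpu_count=2, seconds=30 * 60, rate_per_gpu_second=HYPOTHETICAL_RATE)
print(f"{estimate:.2f}")
```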

See the Querying text models guide for a comprehensive overview of running LLM inference.

Deployed model

Users can specify a model to query for inference using the model name and deployment name. Alternatively, users can refer to a “deployed model” name, which identifies a unique instance of a base model or LoRA addon that is loaded into a deployment. See the deploying models guide for more.
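
The relationship between models, deployments, and deployed models can be sketched as a small data structure. This is an illustrative model only, not the Fireworks API: the class names, field names, and example names below are hypothetical. It encodes two rules from the text: a deployment hosts one base model plus optional LoRA addons, and an addon can only be loaded onto a deployment of its corresponding base model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LoraAddon:
    name: str        # model name of the addon
    base_model: str  # model name of the base model the addon corresponds to


@dataclass
class Deployment:
    name: str
    base_model: str
    addons: list[LoraAddon] = field(default_factory=list)

    def load_addon(self, addon: LoraAddon) -> None:
        """Load an addon, rejecting it if it belongs to a different base model."""
        if addon.base_model != self.base_model:
            raise ValueError(f"{addon.name} does not match base model {self.base_model}")
        self.addons.append(addon)

    def deployed_models(self) -> list[tuple[str, str]]:
        """Return one (model name, deployment name) pair per loaded model instance."""
        models = [self.base_model] + [addon.name for addon in self.addons]
        return [(model, self.name) for model in models]


# Hypothetical names for illustration.
base = "accounts/my-account/models/my-base"
deployment = Deployment(name="my-deployment", base_model=base)
deployment.load_addon(LoraAddon(name="accounts/my-account/models/my-lora", base_model=base))
print(deployment.deployed_models())
```

Each pair returned by `deployed_models` corresponds to one deployed model in the sense described above: a single model instance loaded into a single deployment.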

Dataset

A dataset is an immutable set of training examples that can be used to fine-tune a model.

Fine-tuning job

A fine-tuning job is an offline training job that uses a dataset to train a LoRA addon model.

Resource names and IDs

A resource name is a globally unique identifier of a resource. The format of a name also identifies the type and hierarchy of the resource, for example accounts/<ACCOUNT_ID>/models/<MODEL_ID> for a model.

Resource IDs must satisfy the following constraints:

Between 1 and 63 characters (inclusive)
Consists of a-z, 0-9, and hyphen (-)
Does not begin or end with a hyphen (-)
Does not begin with a digit
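
The constraints above can be checked locally before creating a resource. The following is a minimal sketch using a regular expression; the function name `is_valid_resource_id` is hypothetical, and the server-side validation remains authoritative.

```python
import re

# First character: a lowercase letter (rules out a leading digit or hyphen).
# Last character: a lowercase letter or digit (rules out a trailing hyphen).
# Up to 61 characters in between, for a total length of 1 to 63.
_RESOURCE_ID = re.compile(r"[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_resource_id(resource_id: str) -> bool:
    """Return True if resource_id satisfies the resource ID constraints listed above."""
    return _RESOURCE_ID.fullmatch(resource_id) is not None


for candidate in ["llama-v3p1-70b-instruct", "1model", "model-", "My-Model"]:
    print(candidate, is_valid_resource_id(candidate))
```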

Control plane and data plane

The Fireworks API can be split into a control plane and a data plane.

The control plane consists of APIs used for managing the lifecycle of resources. This includes your account, models, and deployments.
The data plane consists of the APIs used for inference and the backend services that power them.

Interfaces

Users can interact with Fireworks through any of the following interfaces:

The web console at https://fireworks.ai
The command-line interface firectl
The Python SDK
