Resources

Account

Your account is the top-level resource under which other resources are located. Quotas and billing are enforced at the account level, so usage by all users in an account contributes to the same quotas and bill.

  • For developer accounts, the account ID is auto-generated from the email address used to sign up.
  • Enterprise accounts can optionally choose a custom, unique account ID.

User

A user is an email address associated with an account. Users added to an account have full access to delete, edit, and create resources within the account, such as deployments and models.

Models and model types

A model is a set of model weights and its associated metadata. Each model has a globally unique name of the form accounts/<ACCOUNT_ID>/models/<MODEL_ID>. There are two types of models:

**Base models** - A base model consists of the full set of model weights, including models pre-trained from scratch and full fine-tunes.

  • Fireworks has a library of common base models that can be used for serverless inference as well as dedicated deployments. Model IDs for these models are pre-populated. For example, “llama-v3p1-70b-instruct” is the model ID for the Llama 3.1 70B model that Fireworks provides. The model ID is shown on each model’s page.
  • Users can upload their own custom base models and specify model IDs.

LoRA (low-rank adaptation) addons - A LoRA addon is a small, fine-tuned model that requires significantly less memory to deploy than a fully fine-tuned model. Fireworks supports training, uploading, and serving LoRA addons. A LoRA addon must be deployed on a serverless or dedicated deployment of its corresponding base model. Model IDs for LoRAs can be either auto-generated or user-specified.

Deployments and deployment types

A model must be deployed before it can be used for inference. A deployment is a collection of one or more model servers that host one base model and, optionally, one or more LoRA addons.

Fireworks supports both:

  • Serverless deployments - Fireworks hosts popular base models on “serverless” deployments, where Fireworks creates a deployment that all Fireworks users can share. Users do not need to configure GPUs and pay per token to query. The most popular serverless deployments also support serverless LoRA addons. See the Deploying to serverless guide for details.
  • **Dedicated deployments** - Dedicated deployments enable users to configure private deployments with a wide array of hardware (see the on-demand deployments guide). Dedicated deployments give users the most flexibility and control over which models can be deployed and over performance guarantees. Both LoRA addons and base models can be deployed to dedicated deployments. Dedicated deployments are billed on a GPU-second basis (see the pricing page).

See the Querying text models guide for a comprehensive overview of running LLM inference.

Deployed model

Users can specify a model to query for inference using both the model name and the deployment name. Alternatively, users can refer to a “deployed model” name, which identifies a unique instance of a base model or LoRA addon that is loaded into a deployment. See the Deploying models guide for more.

Dataset

A dataset is an immutable set of training examples that can be used to fine-tune a model.

Fine-tuning job

A fine-tuning job is an offline training job that uses a dataset to train a LoRA addon model.

Resource names and IDs

A full resource name looks like:

accounts/my-account/models/my-model

The individual segments my-account and my-model are account and model IDs, respectively.
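
As a minimal sketch of how the name and its IDs relate, the snippet below builds and splits a model resource name of the form accounts/<ACCOUNT_ID>/models/<MODEL_ID>. The helper names build_model_name and parse_model_name are hypothetical and are not part of any Fireworks SDK.

```python
from typing import Tuple


def build_model_name(account_id: str, model_id: str) -> str:
    """Join an account ID and a model ID into a full model resource name."""
    return f"accounts/{account_id}/models/{model_id}"


def parse_model_name(name: str) -> Tuple[str, str]:
    """Split a full model resource name into (account_id, model_id).

    Hypothetical helper: it only checks the segment layout described above.
    """
    parts = name.split("/")
    if len(parts) != 4 or parts[0] != "accounts" or parts[2] != "models":
        raise ValueError(f"not a model resource name: {name!r}")
    return parts[1], parts[3]


account_id, model_id = parse_model_name("accounts/my-account/models/my-model")
print(account_id, model_id)
```

A helper like this can be handy when one API expects the full resource name and another expects only the ID.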

Resource IDs must satisfy the following constraints:

  • be between 1 and 63 characters long (inclusive)
  • consist only of a-z, 0-9, and hyphen (-)
  • not begin or end with a hyphen (-)
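
The constraints above can be checked client-side before calling an API. The following is a sketch only: is_valid_resource_id is a hypothetical helper, and the regular expression is one possible encoding of the three rules listed, not an official validator.

```python
import re

# First and last characters must be a-z or 0-9; up to 61 characters of
# a-z, 0-9, or hyphen may sit in between, for a total length of 1 to 63.
_RESOURCE_ID_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_resource_id(resource_id: str) -> bool:
    """Return True if resource_id satisfies the three constraints above."""
    return _RESOURCE_ID_RE.fullmatch(resource_id) is not None


for candidate in ["my-model", "-my-model", "My-Model", "a" * 64]:
    print(repr(candidate[:12]), is_valid_resource_id(candidate))
```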

Some APIs take the full resource name, while others may take a resource ID if the context is clear.

Control plane and data plane

The Fireworks API can be split into a control plane and a data plane.

  • The control plane consists of APIs used for managing the lifecycle of resources. This includes your account, models, and deployments.
  • The data plane consists of the APIs used for inference and the backend services that power them.

Interfaces

Users can interact with Fireworks through one of many interfaces: