A model must be deployed before it can be used for inference. Fireworks deploys the most popular base models to serverless deployments that can be used out of the box (including PEFT addons). See Querying text models.

Less popular base models or custom base models must be used with an on-demand deployment.

Deploying a model

PEFT addons

Deploying to serverless

Fireworks also supports deploying serverless addons for supported base models. To deploy a PEFT addon to serverless, run firectl deploy without passing a deployment ID:

firectl deploy <MODEL_ID>

Serverless addons are charged by input and output tokens for inference. There is no additional charge for deploying serverless addons.

PEFT addons on serverless have higher latency compared with base model inference. This includes LoRA fine-tunes, which are one type of PEFT addon. For faster inference speeds with PEFT addons, we recommend deploying to on-demand.

Unused addons may be automatically undeployed after a week.

Deploying to on-demand

Addons may also be deployed to an on-demand deployment of a supported base model. To create an on-demand deployment with addons enabled, run:

firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons

On-demand deployments are charged by GPU-hour. See Pricing for details.

Once the deployment is ready, deploy the addon to the deployment:

firectl deploy <MODEL_ID> --deployment <DEPLOYMENT_ID>
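
In scripts, you typically want to wait for the deployment to become ready before deploying the addon. The sketch below stubs out the status check with a shell function so it is self-contained; replace the stub with a real firectl status query. The state names and polling cadence here are assumptions, not documented firectl output:

```shell
# Stub standing in for a real deployment status query; replace the function
# body with the actual firectl call. The CREATING/READY values are assumptions.
deployment_state() {
  if [ "$1" -lt 3 ]; then
    echo "CREATING"
  else
    echo "READY"
  fi
}

# Poll until the deployment reports READY, then deploy the addon.
ATTEMPT=1
while [ "$(deployment_state "$ATTEMPT")" != "READY" ]; do
  echo "waiting for deployment (attempt $ATTEMPT)..."
  sleep 1                      # use a longer interval in practice
  ATTEMPT=$((ATTEMPT + 1))
done
echo "deployment ready"
# firectl deploy <MODEL_ID> --deployment <DEPLOYMENT_ID>
```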

Base models

Custom base models may only be used with on-demand deployments. To create one, run:

firectl create deployment <MODEL_ID>

On-demand deployments are charged by GPU-hour. See Pricing for details.

Use the <MODEL_ID> specified during model upload. Creating the deployment will automatically deploy the base model to the deployment.

Checking whether a model is deployed

You can check the status of a model deployment by looking at the “Deployed Model Refs” section in the output of:

firectl get model <MODEL_ID>

If successful, there will be an entry with State: DEPLOYED.

Alternatively, you can list all deployed models within your account by running:

firectl list deployed-models
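
To check this from a script, you can grep the command output for the deployed state. The sample output below is illustrative, mirroring the “Deployed Model Refs” format shown on this page; the exact formatting printed by firectl may differ:

```shell
# Illustrative output; in practice you would capture it with:
#   OUTPUT=$(firectl get model <MODEL_ID>)
OUTPUT='Deployed Model Refs:
  [{
    Name: accounts/alice/deployedModels/custom-model-abcdef
    Deployment: accounts/alice/deployments/12345678
    State: DEPLOYED
  }]'

# Exit status of grep tells you whether any ref is in the DEPLOYED state.
if printf '%s\n' "$OUTPUT" | grep -q 'State: DEPLOYED'; then
  echo "model is deployed"
else
  echo "model is not deployed"
fi
```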

Inference

Model identifier

After your model is successfully deployed, it will be ready for inference. A model can be queried using one of the following model identifiers:

  • The model and deployment names - accounts/<ACCOUNT_ID of model>/models/<MODEL_ID>#accounts/<ACCOUNT_ID of deployment>/deployments/<DEPLOYMENT_ID>, e.g.

    • accounts/fireworks/models/mixtral-8x7b#accounts/alice/deployments/12345678
    • accounts/alice/models/custom-model#accounts/alice/deployments/12345678
  • The model and deployment short-names - <ACCOUNT_ID of model>/<MODEL_ID>#<ACCOUNT_ID of deployment>/<DEPLOYMENT_ID>, e.g.

    • fireworks/mixtral-8x7b#alice/12345678
    • alice/custom-model#alice/12345678
  • Deployed model name - Instead of using both the model and deployment names to refer to a deployed model, you can use its unique deployed model name, which is created upon deployment. The deployed model ID takes the form <MODEL_ID>-<AUTOGENERATED_SUFFIX> and can be viewed with firectl list deployed-models.

    • accounts/alice/deployedModels/mixtral-8x7b-abcdef
  • If you are deploying a custom model, you can also query it using the model name or model short-name, e.g.:

    • accounts/alice/models/custom-model
    • alice/custom-model

You can also use short names in place of the model and deployment names. For example:

  • <ACCOUNT_ID>/<MODEL_ID>
  • <ACCOUNT_ID>/<MODEL_ID>#<ACCOUNT_ID>/<DEPLOYMENT_ID>
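
The identifier formats above are plain string compositions, so they are easy to assemble in a script. A quick sketch using the example account and resource IDs from this page:

```shell
# Example IDs taken from the identifier examples above.
MODEL_ACCOUNT="fireworks"
MODEL_ID="mixtral-8x7b"
DEPLOY_ACCOUNT="alice"
DEPLOYMENT_ID="12345678"

# Full model and deployment names:
#   accounts/<ACCOUNT_ID>/models/<MODEL_ID>#accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>
FULL_NAME="accounts/${MODEL_ACCOUNT}/models/${MODEL_ID}#accounts/${DEPLOY_ACCOUNT}/deployments/${DEPLOYMENT_ID}"

# Short-name form: <ACCOUNT_ID>/<MODEL_ID>#<ACCOUNT_ID>/<DEPLOYMENT_ID>
SHORT_NAME="${MODEL_ACCOUNT}/${MODEL_ID}#${DEPLOY_ACCOUNT}/${DEPLOYMENT_ID}"

echo "$FULL_NAME"
echo "$SHORT_NAME"
```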

Multiple deployments

Since a model may be deployed to multiple deployments, querying by model name will route to the “default” deployed model. You can see which deployed model entry is marked with Default: true by describing the model:

firectl get model <MODEL_ID>
...
Deployed Model Refs:
  [{
    Name: accounts/<ACCOUNT_ID>/deployedModels/<DEPLOYED_MODEL_ID_1>
    Deployment: accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID_1>
    State: DEPLOYED
    Default: true
  },
  {
    Name: accounts/<ACCOUNT_ID>/deployedModels/<DEPLOYED_MODEL_ID_2>
    Deployment: accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID_2>
    State: DEPLOYED
  },
]

To update the default deployed model, note the deployed model ID, i.e. the final segment of the “Name” field in the deployed model reference above. Then run:

firectl update deployed-model <DEPLOYED_MODEL_ID_2> --default
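
Since the ID is the final path segment of the “Name” field, you can extract it with standard shell parameter expansion. A small sketch (the reference name below is hypothetical):

```shell
# Hypothetical deployed model reference name, as shown by `firectl get model`.
NAME="accounts/alice/deployedModels/mixtral-8x7b-abcdef"

# Strip everything up to and including the last "/" to get the ID.
DEPLOYED_MODEL_ID="${NAME##*/}"
echo "$DEPLOYED_MODEL_ID"

# You would then run:
#   firectl update deployed-model "$DEPLOYED_MODEL_ID" --default
```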

Deleting a default deployment

To delete a default deployment you must delete all other deployments for the same model first, or designate a different deployed model as the default as described above. This is to ensure that querying by model name will always route to an unambiguous default deployment as long as deployments for the model exist.

Querying the model

To test the model using the completions API, run:

curl \
  --header 'Authorization: Bearer <FIREWORKS_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "<MODEL_IDENTIFIER>",
    "prompt": "Say this is a test"
}' \
  --url https://api.fireworks.ai/inference/v1/completions

See Querying text models for a more comprehensive guide.
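
When scripting this request, it helps to build the JSON payload in its own variable so the quoting stays manageable. A minimal sketch, where the API key and model identifier are placeholders you must substitute:

```shell
API_KEY="<FIREWORKS_API_KEY>"    # placeholder; substitute your real key
MODEL="accounts/fireworks/models/mixtral-8x7b"

# Assemble the request body separately from the curl invocation.
PAYLOAD=$(cat <<EOF
{
  "model": "${MODEL}",
  "prompt": "Say this is a test"
}
EOF
)

echo "$PAYLOAD"

# Then send it (network call, not run here):
#   curl --header "Authorization: Bearer ${API_KEY}" \
#        --header 'Content-Type: application/json' \
#        --data "$PAYLOAD" \
#        --url https://api.fireworks.ai/inference/v1/completions
```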

Publishing a model

By default, models can only be queried by the account that owns them. To make a model public, pass the --public flag when creating or updating it.

firectl update model <MODEL_ID> --public

To unpublish it, run:

firectl update model <MODEL_ID> --public=false