A model must be deployed before it can be used for inference.

Deploying a model

PEFT addons

Deploying to serverless

Fireworks also supports deploying serverless addons for certain base models. To deploy a model to serverless, run firectl deploy without passing a deployment ID:

firectl deploy <MODEL_ID>

Serverless addons are charged by input and output tokens for inference. There is no additional charge for deploying serverless addons.
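For example, assuming a fine-tuned addon with the hypothetical model ID my-adapter, the serverless flow is simply:

# Deploy the addon serverlessly (no deployment ID needed)
firectl deploy my-adapter

# Confirm it appears among the account's deployed models
firectl list deployed-models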

Deploying to on-demand

Addons may also be deployed to an on-demand deployment of the base model. To create an on-demand deployment, run:

firectl create deployment "accounts/fireworks/models/<BASE_MODEL_ID>" --enable-addons
On-demand deployments are charged by GPU-hour. See Pricing for details.

Once the deployment is ready, deploy the addon to the deployment:

firectl deploy <MODEL_ID> --deployment <DEPLOYMENT_ID>
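Putting the two steps together, a sketch of the full flow might look like the following, where mixtral-8x7b is the example base model used elsewhere on this page and my-adapter and 12345678 are placeholder addon and deployment IDs:

# Create an on-demand deployment of the base model with addon support
firectl create deployment "accounts/fireworks/models/mixtral-8x7b" --enable-addons

# Once the deployment is ready, attach the addon to it
firectl deploy my-adapter --deployment 12345678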

Base models

Custom base models can only be used with on-demand deployments. To create an on-demand deployment for a custom base model, run:

firectl create deployment <MODEL_ID>
On-demand deployments are charged by GPU-hour. See Pricing for details.

Creating the deployment automatically deploys the base model to it.
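For instance, assuming a custom base model with the placeholder ID custom-model:

# Creates the deployment and deploys the base model to it in one step
firectl create deployment custom-model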

Checking whether a model is deployed

You can check the status of a model deployment by looking at the “Deployed Model Refs” section in the output of:

firectl get model <MODEL_ID>

If successful, there will be an entry with State: DEPLOYED.

Alternatively, you can list all deployed models within your account by running:

firectl list deployed-models
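If the output of firectl get model is long, one informal way to jump to the relevant section is to filter it with grep, e.g. for the placeholder model ID custom-model:

firectl get model custom-model | grep -A 5 "Deployed Model Refs"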

Inference

Model identifier

After your model is successfully deployed, it is ready for inference. A model can be queried using one of the following model identifiers:

  • The deployed model name - accounts/<ACCOUNT_ID>/deployedModels/<DEPLOYED_MODEL_ID>, e.g.
    • accounts/alice/deployedModels/mixtral-8x7b-abcdef
  • The model and deployment names - accounts/<MODEL_ACCOUNT_ID>/models/<MODEL_ID>#accounts/<DEPLOYMENT_ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>, e.g.
    • accounts/fireworks/models/mixtral-8x7b#accounts/alice/deployments/12345678
    • accounts/alice/models/custom-model#accounts/alice/deployments/12345678
  • The model and deployment short-names - <MODEL_ACCOUNT_ID>/<MODEL_ID>#<DEPLOYMENT_ACCOUNT_ID>/<DEPLOYMENT_ID>, e.g.
    • fireworks/mixtral-8x7b#alice/12345678
    • alice/custom-model#alice/12345678
  • If you are deploying a custom model, you can also query it using the model name or model short-name, e.g.
    • accounts/alice/models/custom-model
    • alice/custom-model

Since a model may be deployed to multiple deployments, querying by model name will route to the “default” deployed model. You can see which one is the default by running

firectl get model <MODEL_ID>

and checking the entry with Default: true.

You can also use short names in place of the model and deployment names. For example

  • <ACCOUNT_ID>/<MODEL_ID>
  • <ACCOUNT_ID>/<MODEL_ID>#<ACCOUNT_ID>/<DEPLOYMENT_ID>
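For example, using the placeholder account, model, and deployment IDs from the list above, you might capture the identifier in a shell variable and choose between the default deployment and an explicitly pinned one:

# Route to the model's default deployment
MODEL="alice/custom-model"

# Or pin a specific deployment explicitly
MODEL="alice/custom-model#alice/12345678"

Either value can then be substituted for <MODEL_IDENTIFIER> in the request shown in the next section.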

Querying the model

To test the model using the completions API, run:

curl \
  --header 'Authorization: Bearer <FIREWORKS_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "<MODEL_IDENTIFIER>",
    "prompt": "Say this is a test"
}' \
  --url https://api.fireworks.ai/inference/v1/completions

See Querying text models for a more comprehensive guide.
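For instance, substituting the model-and-deployment short-name from the earlier examples (all IDs are placeholders), the request becomes:

curl \
  --header 'Authorization: Bearer <FIREWORKS_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "alice/custom-model#alice/12345678",
    "prompt": "Say this is a test"
}' \
  --url https://api.fireworks.ai/inference/v1/completions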

Publishing a model

By default, a model can only be queried by the account that owns it. To make a model public, pass the --public flag when creating or updating it.

firectl update model <MODEL_ID> --public

To unpublish it, run

firectl update model <MODEL_ID> --public=false