Deploying models
A model must be deployed before it can be used for inference. Fireworks deploys the most popular base models to serverless deployments that can be used out of the box (including PEFT addons). See Querying text models.
Less popular base models or custom base models must be used with an on-demand deployment.
Deploying a model
PEFT addons
Deploying to serverless
Fireworks also supports deploying serverless addons for supported base models.
To deploy a PEFT addon to serverless, run firectl deploy without passing a deployment ID:
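For example, a minimal sketch, with <MODEL_ID> standing in for your addon's model ID (passed as a positional argument, as I understand firectl's syntax):
firectl deploy <MODEL_ID>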
Serverless addons are charged by input and output tokens for inference. There is no additional charge for deploying serverless addons.
PEFT addons on serverless have higher latency compared with base model inference. This includes LoRA fine-tunes, which are one type of PEFT addon. For faster inference speeds with PEFT addons, we recommend deploying to on-demand.
Deploying to on-demand
Addons may also be deployed to an on-demand deployment of a supported base model. To create an on-demand deployment, run:
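As a sketch, assuming the addon was fine-tuned from a Fireworks-hosted base model (the mixtral-8x7b name below is only an example):
# Create an on-demand deployment of the addon's base model
firectl create deployment accounts/fireworks/models/mixtral-8x7b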
Once the deployment is ready, deploy the addon to the deployment:
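For example, with placeholder IDs; the --deployment-id flag name is an assumption here, so check firectl's built-in help for the exact spelling:
# Deploy the addon to the on-demand deployment (IDs are placeholders)
firectl deploy <MODEL_ID> --deployment-id <DEPLOYMENT_ID>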
Base models
Custom base models may only be used with on-demand deployments. To create one, run:
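A minimal sketch, assuming firectl create deployment accepts the model ID directly:
firectl create deployment <MODEL_ID>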
Use the <MODEL_ID> specified during model upload. Creating the deployment will automatically deploy the base model to the deployment.
Checking whether a model is deployed
You can check the status of a model deployment by looking at the “Deployed Model Refs” section in the output of:
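For example, assuming the standard firectl get syntax for describing a model (treat this as a sketch):
firectl get model <MODEL_ID>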
If successful, there will be an entry with State: DEPLOYED.
Alternatively, you can list all deployed models within your account by running:
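firectl list deployed-models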
Inference
Model identifier
After your model is successfully deployed, it will be ready for inference. A model can be queried using one of the following model identifiers:
- The model and deployment names - accounts/<ACCOUNT_ID of model>/models/<MODEL_ID>#accounts/<ACCOUNT_ID of deployment>/deployments/<DEPLOYMENT_ID>, e.g. accounts/fireworks/models/mixtral-8x7b#accounts/alice/deployments/12345678 or accounts/alice/models/custom-model#accounts/alice/deployments/12345678
- The model and deployment short-names - <ACCOUNT_ID of model>/<MODEL_ID>#<ACCOUNT_ID of deployment>/<DEPLOYMENT_ID>, e.g. fireworks/mixtral-8x7b#alice/12345678 or alice/custom-model#alice/12345678
- The deployed model name - Instead of needing to use both the model and deployment names to refer to a deployed model, you can optionally use the unique deployed model name. This name uses a deployed model ID that is created upon deployment; it takes the form <MODEL_ID>-<AUTOGENERATED_SUFFIX> and can be viewed with “firectl list deployed-models”, e.g. accounts/alice/deployedModels/mixtral-8x7b-abcdef
- The model name or short-name - If you are deploying a custom model, you can also query it using just the model name or model short-name, e.g. accounts/alice/models/custom-model or alice/custom-model

You can also use short names in place of the full model and deployment names, for example <ACCOUNT_ID>/<MODEL_ID> or <ACCOUNT_ID>/<MODEL_ID>#<ACCOUNT_ID>/<DEPLOYMENT_ID>.
Multiple deployments
Since a model may be deployed to multiple deployments, querying by model name will route to the “default” deployed model. You can see which deployed model entry is marked with Default: true by describing the model:
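Again, a sketch using the same describe command as above (the model ID is a placeholder):
firectl get model <MODEL_ID>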
To update the default deployed model, note the “Name” of the deployed model reference above. Then run:
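The sketch below assumes an update deployed-model subcommand with a --default flag; treat both as assumptions and confirm against firectl's built-in help:
# Hypothetical syntax for promoting a deployed model reference to the default
firectl update deployed-model accounts/alice/deployedModels/mixtral-8x7b-abcdef --default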
Deleting a default deployment:
To delete a default deployment, you must first delete all other deployments for the same model, or designate a different deployed model as the default as described above. This ensures that querying by model name always routes to an unambiguous default deployment as long as any deployments for the model exist.
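For reference, removing a non-default deployment would typically look like the following (the delete deployment subcommand is my assumption; verify with firectl's help):
firectl delete deployment <DEPLOYMENT_ID>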
Querying the model
To test the model using the completions API, run:
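A minimal sketch using curl against the OpenAI-compatible completions endpoint; the model identifier and prompt are placeholders, and FIREWORKS_API_KEY is assumed to hold your API key:
curl https://api.fireworks.ai/inference/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/alice/models/custom-model#accounts/alice/deployments/12345678",
    "prompt": "Say this is a test",
    "max_tokens": 16
  }'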
See Querying text models for a more comprehensive guide.
Publishing a model
By default, models can only be queried by the account that owns them. To make a model public, pass the --public flag when creating or updating it.
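For example, to publish an existing model (a sketch; the update model subcommand is assumed from the usual firectl pattern):
firectl update model <MODEL_ID> --public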
To unpublish it, run:
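Assuming --public is a boolean flag that can be explicitly negated (verify the exact syntax with firectl's help):
firectl update model <MODEL_ID> --public=false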