Deploying models
A model must be deployed before it can be used for inference.
Deploying a model
PEFT addons
Deploying to serverless
Fireworks also supports deploying serverless addons for certain base models. To deploy a model to serverless, run
firectl deploy
without passing a deployment ID:
firectl deploy <MODEL_ID>
Serverless addons are charged by input and output tokens for inference. There is no additional charge for deploying serverless addons.
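For example, assuming you have a fine-tuned PEFT addon uploaded under the (hypothetical) model ID my-lora-addon, the serverless deployment is a single command:
# The model ID below is illustrative; substitute your own addon's ID.
firectl deploy my-lora-addon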
Deploying to on-demand
Addons may also be deployed to an on-demand deployment of the base model. To create an on-demand deployment, run:
firectl create deployment "accounts/fireworks/models/<BASE_MODEL_ID>" --enable-addons
Once the deployment is ready, deploy the addon to the deployment:
firectl deploy <MODEL_ID> --deployment <DEPLOYMENT_ID>
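Putting these two steps together, a minimal sketch of the on-demand flow looks like the following, where llama-v3p1-8b-instruct and my-lora-addon are illustrative base-model and addon IDs, and <DEPLOYMENT_ID> is the ID assigned when the deployment is created:
# 1. Create an addon-enabled, on-demand deployment of the base model (IDs are illustrative).
firectl create deployment "accounts/fireworks/models/llama-v3p1-8b-instruct" --enable-addons
# 2. After the deployment is ready, attach the addon to it.
firectl deploy my-lora-addon --deployment <DEPLOYMENT_ID>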
Base models
Custom base models may only be used with on-demand deployments. To create one, run:
firectl create deployment <MODEL_ID>
Creating the deployment will automatically deploy the base model to the deployment.
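For example, with a custom base model uploaded under the (hypothetical) ID my-custom-base:
# Creates the deployment and automatically deploys the base model to it.
firectl create deployment my-custom-base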
Checking whether a model is deployed
You can check the status of a model deployment by looking at the “Deployed Model Refs” section in the output of:
firectl get model <MODEL_ID>
If successful, there will be an entry with State: DEPLOYED.
Alternatively, you can list all deployed models within your account by running:
firectl list deployed-models
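As a quick sketch, the two commands above can be combined with ordinary shell filtering to spot-check a single model; the model ID below is hypothetical, and the grep patterns simply match the fields described above:
# Show the state recorded for the model's deployed refs (ID is illustrative).
firectl get model my-lora-addon | grep "State"
# List all deployed models in the account and filter by name.
firectl list deployed-models | grep my-lora-addon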
Inference
Model identifier
After your model is successfully deployed, it is ready for inference. A model can be queried using one of the following model identifiers:
- The deployed model name: accounts/<ACCOUNT_ID>/deployedModels/<DEPLOYED_MODEL_ID>
- The model and deployment names: accounts/<ACCOUNT_ID>/models/<MODEL_ID>#accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>
- The model name: accounts/<ACCOUNT_ID>/models/<MODEL_ID>
Since a model may be deployed to multiple deployments, querying by model name will route to the “default” deployed model. You can see which one is the default by running
firectl get model <MODEL_ID>
and checking the entry with Default: true.
You can also use short names in place of the model and deployment names. For example:
<ACCOUNT_ID>/<MODEL_ID>
<ACCOUNT_ID>/<MODEL_ID>#<ACCOUNT_ID>/<DEPLOYMENT_ID>
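For a concrete example, with a hypothetical account my-account, model my-lora-addon, and deployment abc123, the full and short forms of the last two identifiers would be:
accounts/my-account/models/my-lora-addon#accounts/my-account/deployments/abc123
accounts/my-account/models/my-lora-addon
my-account/my-lora-addon#my-account/abc123
my-account/my-lora-addon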
Querying the model
To test the model using the completions API, run:
curl \
  --header 'Authorization: Bearer <FIREWORKS_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "<MODEL_IDENTIFIER>",
    "prompt": "Say this is a test"
  }' \
  --url https://api.fireworks.ai/inference/v1/completions
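Since a model may live in several deployments, it can be useful to pin a request to a particular one by using the model#deployment identifier form in the same request; the account, model, and deployment IDs below are hypothetical:
curl \
  --header 'Authorization: Bearer <FIREWORKS_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "accounts/my-account/models/my-lora-addon#accounts/my-account/deployments/abc123",
    "prompt": "Say this is a test"
  }' \
  --url https://api.fireworks.ai/inference/v1/completions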
See Querying text models for a more comprehensive guide.
Publishing a model
By default, a model can only be queried by the account that owns it. To make a model public, pass the --public flag when creating or updating it:
firectl update model <MODEL_ID> --public
To unpublish it, run:
firectl update model <MODEL_ID> --public=false