Uploading a custom base model
In addition to the predefined set of models already available on Fireworks, you can also upload your own custom models. To upload a custom LoRA addon, see importing fine-tuned models.
Requirements
Fireworks currently supports the following model architectures:
- Gemma
- Phi, Phi-3
- Llama 1, 2, 3, 3.1
- LLaVa
- Mistral & Mixtral
- Qwen2, Qwen2.5, Qwen2.5-VL
- StableLM
- Starcoder (GPTBigCode) & Starcoder2
- DeepSeek V1 & V2
- GPT NeoX
The model files you will need to provide depend on the model architecture. In general, you will need the following files:
- Model configuration: `config.json`. Fireworks does not support the `quantization_config` option in `config.json`.
- Model weights, in one of the following formats:
  - `*.safetensors`
  - `*.bin`
- Weights index: `*.index.json`
- Tokenizer file(s), e.g.:
  - `tokenizer.model`
  - `tokenizer.json`
  - `tokenizer_config.json`
If the requisite files are not present, model deployment may fail.
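For example, a safetensors checkpoint directory ready for upload might look like the following. This is only an illustrative sketch; the directory name and shard file names are placeholders and will vary by model.

```bash
# Illustrative contents of a model directory; shard names vary by model size.
ls my-custom-model/
# config.json
# model-00001-of-00002.safetensors
# model-00002-of-00002.safetensors
# model.safetensors.index.json
# tokenizer.json
# tokenizer_config.json
```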
Enabling chat completions
To enable the chat completions API for your custom base model, ensure your `tokenizer_config.json` contains a `chat_template` field. See the Hugging Face guide on Templates for Chat Models for details.
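As a quick local check before uploading, you can confirm the field is present. This sketch assumes `jq` is installed; it is not required by Fireworks.

```bash
# Prints "true" if tokenizer_config.json defines a chat template, "false" otherwise.
jq 'has("chat_template")' tokenizer_config.json
```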
Uploading the model
To upload a custom base model, run the following command.
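A minimal sketch using the `firectl` CLI; `my-custom-model` and the local path are placeholders, and you can confirm the current arguments with `firectl create model --help`.

```bash
# Upload the model files in /path/to/model/ and register them under the ID "my-custom-model".
firectl create model my-custom-model /path/to/model/
```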
Uploading models from S3 buckets
For larger models, you can upload directly from an Amazon S3 bucket, which is typically faster than uploading from local files.
To upload a model directly from an S3 bucket, run the following command.
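A sketch of the same `firectl create model` call pointed at an S3 URI. The credential flag names below are assumptions, so check `firectl create model --help` for the exact spelling; the bucket path and environment variables are placeholders.

```bash
# Pull the model files directly from S3 instead of the local filesystem.
firectl create model my-custom-model s3://my-bucket/path/to/model/ \
  --aws-access-key-id "$AWS_ACCESS_KEY_ID" \
  --aws-secret-access-key "$AWS_SECRET_ACCESS_KEY"
```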
See the AWS documentation for how to generate an access key ID and secret access key pair.
Ensure the IAM user has read access to the S3 bucket containing the model.
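You can sanity-check that the credentials can read the bucket with the AWS CLI before starting the upload; the bucket and path are placeholders.

```bash
# Lists the model objects; an AccessDenied error here means the IAM user lacks read access.
aws s3 ls s3://my-bucket/path/to/model/
```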
Deploying
A model cannot be used for inference until it is deployed. See the Deploying models guide to deploy the model.
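As a sketch, a deployment is typically created with `firectl`; the exact arguments and deployment options are covered in the Deploying models guide.

```bash
# Create a deployment for the uploaded model.
firectl create deployment my-custom-model
```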
Publishing
By default, all models you create are only visible to and deployable by users within your account. To publish a model so anyone with a Fireworks account can deploy it, you can create it with the `--public` flag. This will allow it to show up in public model lists.
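For example, using the same placeholder model ID and path as above:

```bash
# Create the model and make it publicly deployable.
firectl create model my-custom-model /path/to/model/ --public
```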
To unpublish the model, just run the following command.
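This is a hedged sketch assuming the `firectl` CLI; the exact flag form may differ, so check `firectl update model --help`.

```bash
# Set the model back to private so it no longer appears in public model lists.
firectl update model my-custom-model --public=false
```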