Upload your own models from Hugging Face or elsewhere to deploy fine-tuned or custom-trained models optimized for your use case.
  • Multiple upload options – Upload from local files or directly from S3 buckets or Azure Blob Storage
  • Secure uploads – All uploads are encrypted and models remain private to your account by default

Requirements

Supported architectures

Fireworks supports most popular model architectures, including the Llama, Qwen, Mistral, and DeepSeek families.

Required files

You’ll need the standard Hugging Face model files: config.json, model weights (.safetensors or .bin), and tokenizer files. The exact set depends on the model architecture, but in general you will need:
  • Model configuration: config.json
    Fireworks does not support the quantization_config option in config.json.
  • Model weights in one of the following formats:
    • *.safetensors
    • *.bin
  • Weights index (for sharded checkpoints): *.index.json
  • Tokenizer file(s), e.g.:
    • tokenizer.model
    • tokenizer.json
    • tokenizer_config.json
If the requisite files are not present, model deployment may fail.
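For example, an upload directory for a sharded safetensors checkpoint might look like this (file names are illustrative):
my-model/
  config.json
  model-00001-of-00002.safetensors
  model-00002-of-00002.safetensors
  model.safetensors.index.json
  tokenizer.json
  tokenizer_config.json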

Customizing base model configuration

For base models (not LoRA adapters), you can customize the chat template and generation defaults by modifying the standard Hugging Face configuration files:
  • Chat template: Add or modify the chat_template field in tokenizer_config.json. See the Hugging Face guide on Templates for Chat Models for details.
  • Generation defaults: Modify generation_config.json to set default generation parameters like max_new_tokens, temperature, top_p, etc.
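For example, to apply a ChatML-style chat template and set sampling defaults, you could edit the two files as below (a minimal sketch; the template and values are illustrative, not specific to any model):
tokenizer_config.json
{
  "chat_template": "{% for message in messages %}<|im_start|>{{ message.role }}\n{{ message.content }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
}
generation_config.json
{
  "max_new_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}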
You can also use a fireworks.json file with base models. If present, fireworks.json takes priority over both tokenizer_config.json and generation_config.json. See Customizing chat template and generation defaults for the full fireworks.json schema.
For LoRA adapters, you must use fireworks.json to customize configuration. Modifying tokenizer_config.json or generation_config.json in the adapter folder won’t work because adapters inherit these settings from their base model.

Uploading your model

For larger models, you can upload directly from cloud storage (S3 or Azure Blob Storage) for faster transfer instead of uploading from your local machine.
Upload from your local machine:
firectl model create <MODEL_ID> /path/to/files/
If you’re uploading an embedding model, add the --embedding flag.
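For example, to upload the checkpoint directory shown earlier under the ID my-custom-model (both names are illustrative):
firectl model create my-custom-model /path/to/my-model/
For an embedding model, the same command takes the extra flag:
firectl model create my-embedding-model /path/to/my-embedder/ --embedding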

Verifying your upload

After uploading, verify your model is ready to deploy:
firectl model get accounts/<ACCOUNT_ID>/models/<MODEL_NAME>
Look for State: READY in the output. Once ready, you can create a deployment.
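For example, with an illustrative account ID my-account and the model created above, you can check just the state line:
firectl model get accounts/my-account/models/my-custom-model | grep State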

Deploying your model

Once your model shows State: READY, create a deployment:
firectl deployment create accounts/<ACCOUNT_ID>/models/<MODEL_NAME> --wait
See the On-demand deployments guide for configuration options like GPU types, autoscaling, and quantization.
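Continuing with the illustrative names above:
firectl deployment create accounts/my-account/models/my-custom-model --wait
The --wait flag makes the command block until the deployment is ready rather than returning immediately.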

Publishing your model

By default, models are private to your account. Publish a model to make it available to other Fireworks users. When published, a model is:
  • Listed in the public model catalog
  • Deployable by anyone with a Fireworks account
  • Still hosted and controlled by your account
Publish a model:
firectl model update <MODEL_ID> --public
Unpublish a model:
firectl model update <MODEL_ID> --public=false

Importing fine-tuned models

In addition to models you fine-tune on the Fireworks platform, you can upload custom models fine-tuned elsewhere as LoRA adapters.

Requirements

Your custom LoRA addon must contain the following files:
  • adapter_config.json - The Hugging Face adapter configuration file
  • adapter_model.bin or adapter_model.safetensors - The saved addon file
The adapter_config.json must contain the following fields:
  • r - The LoRA rank. Must be an integer between 4 and 64, inclusive
  • target_modules - A list of target modules. Currently the following target modules are supported:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • up_proj or w1
    • down_proj or w2
    • gate_proj or w3
    • block_sparse_moe.gate
Additional fields may be specified but are ignored.
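A minimal adapter_config.json that satisfies these requirements might look like this (values are illustrative; lora_alpha is an example of an extra field that is accepted but ignored):
adapter_config.json
{
  "r": 16,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
  "lora_alpha": 32
}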

Customizing chat template and generation defaults

For LoRA adapters, use a fireworks.json file to customize the chat template and generation defaults. Modifying generation_config.json or tokenizer_config.json in the adapter folder won’t work, because adapters inherit those settings from their base model. Add a fireworks.json file to the directory containing your adapter files:
fireworks.json
{
  "conversation_config": {
    "style": "jinja",
    "args": {
      "template": "{% for message in messages %}...",
      "system": "optional system prompt",
      "special_tokens_map": {
        "bos_token": "<s>",
        "eos_token": "</s>",
        "unk_token": "<unk>"
      },
      "keep_leading_spaces": true,
      "function_call_prefix": "...",
      "function_call_suffix": "...",
      "disable_grammar": false
    }
  },
  "defaults": {
    "stop": ["<|im_end|>", "</s>"],
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "min_p": 0.0,
    "typical_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0
  },
  "model_arch": null,
  "model_config_name": null,
  "has_lora": true,
  "has_teft": false
}
conversation_config fields:
| Field | Required | Description |
| --- | --- | --- |
| style | Yes | Template style (see supported styles below) |
| args.template | Yes (for jinja) | Jinja2 template string for formatting messages |
| args.system | No | Default system prompt |
| args.special_tokens_map | No | Token mappings for bos_token, eos_token, unk_token |
| args.keep_leading_spaces | No | Preserve leading whitespace in the template output |
| args.function_call_prefix | No | Prefix for tool/function calls |
| args.function_call_suffix | No | Suffix for tool/function calls |
| args.disable_grammar | No | Disable grammar constraints |
Supported conversation styles:
| Style | Description |
| --- | --- |
| jinja | Custom Jinja2 template (requires args.template) |
| huggingface | Uses the model’s HuggingFace chat template |
| alpaca | Alpaca instruction format |
| chatml | ChatML format |
| codellama-70b-instruct | CodeLlama 70B instruction format |
| deepseek | DeepSeek format |
| deepseek-v3p1 | DeepSeek V3.1 format |
| deepseek-v3p2 | DeepSeek V3.2 format |
| glm | GLM format |
| glm_47 | GLM 4.7 format |
| harmony | Harmony format |
| kimi | Kimi format |
| kimi-k2-instruct | Kimi K2 instruction format |
| llama-chat | Llama chat format |
| llama-infill | Llama infilling format |
| llama4 | Llama 4 format |
| minimax | MiniMax format |
| minimax_m2 | MiniMax M2 format |
| mistral-chat | Mistral chat format |
| passthrough | No formatting applied |
| qwen2 | Qwen2 format |
| qwen3 | Qwen3 format |
| qwen3-coder | Qwen3 Coder format |
| qwen3-vl | Qwen3 Vision-Language format |
| qwen3-vl-moe | Qwen3 Vision-Language MoE format |
| stablelm-zephyr | StableLM Zephyr format |
| vicuna | Vicuna chat format |
These defaults are applied when the user doesn’t specify values in their API request:
| Field | Type | Example | Description |
| --- | --- | --- | --- |
| stop | array | ["<\|im_end\|>", "</s>"] | Default stop sequences |
| max_tokens | integer | 1024 | Default maximum tokens to generate |
| temperature | float | 0.7 | Default sampling temperature |
| top_k | integer | 50 | Default top-k sampling |
| top_p | float | 0.9 | Default nucleus sampling probability |
| min_p | float | 0.0 | Default minimum probability threshold |
| typical_p | float | 1.0 | Default typical sampling probability |
| frequency_penalty | float | 0.0 | Default frequency penalty |
| presence_penalty | float | 0.0 | Default presence penalty |
| repetition_penalty | float | 1.0 | Default repetition penalty |
Top-level fields:
| Field | Default | Description |
| --- | --- | --- |
| model_arch | null | Model architecture (e.g., "qwen2", "llama"). Usually auto-detected from the base model |
| model_config_name | null | Model configuration name (e.g., "4B"). Usually auto-detected from the base model |
| has_lora | true | Set to true for LoRA adapters |
| has_teft | false | Set to true if using TEFT (Token-Efficient Fine-Tuning) |
All fields in fireworks.json are optional except for conversation_config.style when customizing the chat template. Include only the fields you need to override.
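For example, a minimal fireworks.json that overrides only the default stop sequence and temperature, inheriting everything else from the base model:
fireworks.json
{
  "defaults": {
    "stop": ["<|im_end|>"],
    "temperature": 0.2
  }
}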

Uploading the LoRA adapter

To upload a LoRA addon, run the following command. The MODEL_ID is an arbitrary resource ID to refer to the model within Fireworks.
Only some base models support LoRA addons.
firectl model create <MODEL_ID> /path/to/files/ --base-model "accounts/fireworks/models/<BASE_MODEL_ID>"
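For example, to upload an adapter trained against Llama 3.1 8B Instruct (the adapter ID is arbitrary; substitute the base model your adapter was actually trained on):
firectl model create my-lora-adapter /path/to/adapter/ --base-model "accounts/fireworks/models/llama-v3p1-8b-instruct"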

Next steps