Upload your own models from Hugging Face or elsewhere to deploy fine-tuned or custom-trained models optimized for your use case.
  • Multiple upload options – Upload from local files or directly from S3 buckets or Azure Blob Storage
  • Secure uploads – All uploads are encrypted and models remain private to your account by default

Requirements

Supported architectures

Fireworks supports most popular model architectures.

Required files

The files required depend on the model architecture, but in general you will need the standard Hugging Face model files:
  • Model configuration: config.json
    Fireworks does not support the quantization_config option in config.json.
  • Model weights in one of the following formats:
    • *.safetensors
    • *.bin
  • Weights index (for sharded weights): *.index.json
  • Tokenizer file(s), e.g.:
    • tokenizer.model
    • tokenizer.json
    • tokenizer_config.json
If the requisite files are not present, model deployment may fail.
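Before running firectl, you can sanity-check the directory locally. The helper below is a hypothetical pre-upload check (not part of firectl) that verifies the files listed above are present:

```python
from pathlib import Path

def check_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems found in a model directory before upload.

    A hypothetical pre-upload sanity check, not part of firectl.
    """
    d = Path(model_dir)
    problems = []
    if not (d / "config.json").is_file():
        problems.append("missing config.json")
    weights = list(d.glob("*.safetensors")) + list(d.glob("*.bin"))
    if not weights:
        problems.append("no *.safetensors or *.bin weight files")
    # Sharded checkpoints ship an index file mapping tensors to shards
    if len(weights) > 1 and not list(d.glob("*.index.json")):
        problems.append("sharded weights but no *.index.json")
    tokenizers = ("tokenizer.model", "tokenizer.json", "tokenizer_config.json")
    if not any((d / t).is_file() for t in tokenizers):
        problems.append("no tokenizer files")
    return problems
```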

Customizing base model configuration

For base models (not LoRA adapters), you can customize the chat template and generation defaults by modifying the standard Hugging Face configuration files:
  • Chat template: Add or modify the chat_template field in tokenizer_config.json. See the Hugging Face guide on Templates for Chat Models for details.
  • Generation defaults: Modify generation_config.json to set default generation parameters like max_new_tokens, temperature, top_p, etc.
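For example, a minimal generation_config.json might look like this (the values are illustrative, not recommendations):

```json
{
  "max_new_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}
```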
You can also use a fireworks.json file with base models. If present, fireworks.json takes priority over generation_config.json. See Customizing generation defaults with fireworks.json for the full fireworks.json schema.
For LoRA adapters, you must use fireworks.json to customize generation defaults. Modifying generation_config.json in the adapter folder won’t work because adapters inherit these settings from their base model.

Uploading your model

For larger models, you can upload directly from cloud storage (S3 or Azure Blob Storage) for faster transfer instead of uploading from your local machine.
Upload from your local machine:
firectl model create <MODEL_ID> /path/to/files/
If you’re uploading an embedding model, add the --embedding flag.

Verifying your upload

After uploading, verify your model is ready to deploy:
firectl model get accounts/<ACCOUNT_ID>/models/<MODEL_ID>
Look for State: READY in the output. Once ready, you can create a deployment.
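If you script this check, a small helper can parse firectl's text output. The "State: READY" line format here is an assumption based on this guide; adjust the parsing if your firectl version prints something different:

```python
def is_ready(firectl_output: str) -> bool:
    """Return True if `firectl model get` text output reports State: READY.

    Assumes the output contains a "State: <VALUE>" line, as shown in this
    guide; adapt if your firectl version formats the output differently.
    """
    for line in firectl_output.splitlines():
        if line.strip().startswith("State:"):
            return line.split(":", 1)[1].strip() == "READY"
    return False
```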

Deploying your model

Once your model shows State: READY, create a deployment:
firectl deployment create accounts/<ACCOUNT_ID>/models/<MODEL_ID> --wait
See the On-demand deployments guide for configuration options like GPU types, autoscaling, and quantization.
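Once deployed, you can query the model over HTTP. The sketch below assumes Fireworks' OpenAI-compatible chat completions endpoint and uses only the standard library; the URL and payload shape are assumptions to verify against the API reference:

```python
import json
import urllib.request

# Assumption: deployments are reachable via Fireworks' OpenAI-compatible
# chat completions endpoint; URL and field names below reflect that API.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for a deployed model."""
    body = json.dumps({
        "model": model,  # e.g. "accounts/<ACCOUNT_ID>/models/<MODEL_ID>"
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it:
# with urllib.request.urlopen(build_request(model, "Hello!", api_key)) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```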

Publishing your model

By default, models are private to your account. Publish a model to make it available to other Fireworks users. A published model is:
  • Listed in the public model catalog
  • Deployable by anyone with a Fireworks account
  • Still hosted and controlled by your account
Publish a model:
firectl model update <MODEL_ID> --public
Unpublish a model:
firectl model update <MODEL_ID> --public=false

Importing fine-tuned models

In addition to models you fine-tune on the Fireworks platform, you can also upload your own custom fine-tuned models as LoRA adapters.
Uploaded LoRA adapters can only be deployed to on-demand (dedicated) deployments. Serverless deployment is not supported.

Requirements

Your custom LoRA addon must contain the following files:
  • adapter_config.json - The Hugging Face adapter configuration file
  • adapter_model.bin or adapter_model.safetensors - The saved addon file
The adapter_config.json must contain the following fields:
  • r - The LoRA rank. Must be an integer between 4 and 64, inclusive
  • target_modules - A list of target modules. Currently the following target modules are supported:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • up_proj or w1
    • down_proj or w2
    • gate_proj or w3
    • block_sparse_moe.gate
Additional fields may be specified but are ignored.
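A minimal adapter_config.json meeting these requirements might look like the following (lora_alpha and peft_type are standard PEFT fields shown for illustration; as noted above, extra fields are ignored):

```json
{
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
  "peft_type": "LORA"
}
```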

Customizing generation defaults with fireworks.json

For LoRA adapters, use a fireworks.json file to customize generation defaults. Modifying generation_config.json in the adapter folder won't work, because adapters inherit those settings from their base model. Add a fireworks.json file to the directory containing your adapter files:
fireworks.json
{
  "defaults": {
    "stop": ["<|im_end|>", "</s>"],
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "min_p": 0.0,
    "typical_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0
  },
  "model_arch": null,
  "model_config_name": null,
  "has_lora": true,
  "has_teft": false
}
These defaults are applied when the user doesn’t specify values in their API request:
  • stop (array, e.g. ["<|im_end|>", "</s>"]) – Default stop sequences
  • max_tokens (integer, e.g. 1024) – Default maximum number of tokens to generate
  • temperature (float, e.g. 0.7) – Default sampling temperature
  • top_k (integer, e.g. 50) – Default top-k sampling
  • top_p (float, e.g. 0.9) – Default nucleus sampling probability
  • min_p (float, e.g. 0.0) – Default minimum probability threshold
  • typical_p (float, e.g. 1.0) – Default typical sampling probability
  • frequency_penalty (float, e.g. 0.0) – Default frequency penalty
  • presence_penalty (float, e.g. 0.0) – Default presence penalty
  • repetition_penalty (float, e.g. 1.0) – Default repetition penalty
The remaining top-level fields:
  • model_arch (default: null) – Model architecture (e.g., "qwen2", "llama"). Usually auto-detected from the base model
  • model_config_name (default: null) – Model configuration name (e.g., "4B"). Usually auto-detected from the base model
  • has_lora (default: true) – Set to true for LoRA adapters
  • has_teft (default: false) – Set to true if using TEFT (Token-Efficient Fine-Tuning)
All fields in fireworks.json are optional. Include only the fields you need to override.
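Since every field is optional, a fireworks.json that overrides only a couple of defaults is valid. For example (values are illustrative):

```json
{
  "defaults": {
    "temperature": 0.2,
    "stop": ["</s>"]
  }
}
```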

Uploading the LoRA adapter

To upload a LoRA addon, run the following command. MODEL_ID is an arbitrary resource ID used to refer to the model within Fireworks.
Only some base models support LoRA addons.
firectl model create <MODEL_ID> /path/to/files/ --base-model "accounts/fireworks/models/<BASE_MODEL_ID>"

Next steps

  • Deploy your model – Configure GPU types, autoscaling, and optimization
  • Quantization – Reduce serving costs with model quantization
  • Fine-tuning – Fine-tune models before deploying them