Upload your own models from Hugging Face or elsewhere to deploy fine-tuned or custom-trained models optimized for your use case.
  • Multiple upload options – Upload from local files or directly from S3 buckets or Azure Blob Storage
  • Secure uploads – All uploads are encrypted and models remain private to your account by default

Requirements

Supported architectures

Fireworks supports most popular model architectures, including the Llama, Qwen, Mistral, and DeepSeek families.

Required files

You’ll need the standard Hugging Face model files: config.json, model weights (.safetensors or .bin), and tokenizer files. The exact set depends on the model architecture, but in general you will need:
  • Model configuration: config.json
    Fireworks does not support the quantization_config option in config.json.
  • Model weights in one of the following formats:
    • *.safetensors
    • *.bin
  • Weights index (for sharded checkpoints): *.index.json
  • Tokenizer file(s), e.g.:
    • tokenizer.model
    • tokenizer.json
    • tokenizer_config.json
If the requisite files are not present, model deployment may fail.
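For example, an upload directory for a sharded safetensors checkpoint might look like this (file names are illustrative):
my-model/
  config.json
  model-00001-of-00002.safetensors
  model-00002-of-00002.safetensors
  model.safetensors.index.json
  tokenizer.json
  tokenizer_config.json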

Customizing base model configuration

For base models (not LoRA adapters), you can customize the chat template and generation defaults by modifying the standard Hugging Face configuration files:
  • Chat template: Add or modify the chat_template field in tokenizer_config.json. See the Hugging Face guide on Templates for Chat Models for details.
  • Generation defaults: Modify generation_config.json to set default generation parameters like max_new_tokens, temperature, top_p, etc.
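For example, to apply a ChatML-style chat template and set sampling defaults, you could edit the two files as below (a minimal sketch; the template and values are illustrative, not specific to any model):
tokenizer_config.json
{
  "chat_template": "{% for message in messages %}<|im_start|>{{ message.role }}\n{{ message.content }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
}
generation_config.json
{
  "max_new_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}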
You can also use a fireworks.json file with base models. If present, fireworks.json takes priority over both tokenizer_config.json and generation_config.json. See Customizing chat template and generation defaults for the full fireworks.json schema.
For LoRA adapters, you must use fireworks.json to customize configuration. Modifying tokenizer_config.json or generation_config.json in the adapter folder won’t work because adapters inherit these settings from their base model.

Uploading your model

For larger models, you can upload directly from cloud storage (S3 or Azure Blob Storage) for faster transfer instead of uploading from your local machine.
Upload from your local machine:
firectl model create <MODEL_ID> /path/to/files/
If you’re uploading an embedding model, add the --embedding flag.
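For example, to upload the checkpoint directory shown earlier under the ID my-custom-model (both names are illustrative):
firectl model create my-custom-model /path/to/my-model/
For an embedding model, the same command takes the extra flag:
firectl model create my-embedding-model /path/to/my-embedder/ --embedding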

Verifying your upload

After uploading, verify your model is ready to deploy:
firectl model get accounts/<ACCOUNT_ID>/models/<MODEL_NAME>
Look for State: READY in the output. Once ready, you can create a deployment.
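For example, with an illustrative account ID my-account and the model created above, you can check just the state line:
firectl model get accounts/my-account/models/my-custom-model | grep State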

Deploying your model

Once your model shows State: READY, create a deployment:
firectl deployment create accounts/<ACCOUNT_ID>/models/<MODEL_NAME> --wait
See the On-demand deployments guide for configuration options like GPU types, autoscaling, and quantization.
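Continuing with the illustrative names above:
firectl deployment create accounts/my-account/models/my-custom-model --wait
The --wait flag makes the command block until the deployment is ready rather than returning immediately.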

Publishing your model

By default, models are private to your account. Publish a model to make it available to other Fireworks users. When published, a model is:
  • Listed in the public model catalog
  • Deployable by anyone with a Fireworks account
  • Still hosted and controlled by your account
Publish a model:
firectl model update <MODEL_ID> --public
Unpublish a model:
firectl model update <MODEL_ID> --public=false

Importing fine-tuned models

In addition to models you fine-tune on the Fireworks platform, you can upload custom models fine-tuned elsewhere as LoRA adapters.

Requirements

Your custom LoRA addon must contain the following files:
  • adapter_config.json - The Hugging Face adapter configuration file
  • adapter_model.bin or adapter_model.safetensors - The saved addon file
The adapter_config.json must contain the following fields:
  • r - The LoRA rank. Must be an integer between 4 and 64, inclusive
  • target_modules - A list of target modules. Currently the following target modules are supported:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • up_proj or w1
    • down_proj or w2
    • gate_proj or w3
    • block_sparse_moe.gate
Additional fields may be specified but are ignored.
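A minimal adapter_config.json that satisfies these requirements might look like this (values are illustrative; lora_alpha is an example of an extra field that is accepted but ignored):
adapter_config.json
{
  "r": 16,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
  "lora_alpha": 32
}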

Customizing chat template and generation defaults

For LoRA adapters, use a fireworks.json file to customize the chat template and generation defaults. Modifying generation_config.json or tokenizer_config.json in the adapter folder won’t work, because adapters inherit those settings from their base model. Add a fireworks.json file to the directory containing your adapter files:
fireworks.json
{
  "conversation_config": {
    "style": "jinja",
    "args": {
      "template": "{% for message in messages %}...",
      "system": "optional system prompt",
      "special_tokens_map": {
        "bos_token": "<s>",
        "eos_token": "</s>",
        "unk_token": "<unk>"
      },
      "keep_leading_spaces": true,
      "function_call_prefix": "...",
      "function_call_suffix": "...",
      "disable_grammar": false
    }
  },
  "defaults": {
    "stop": ["<|im_end|>", "</s>"],
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "min_p": 0.0,
    "typical_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0
  },
  "model_arch": null,
  "model_config_name": null,
  "has_lora": true,
  "has_teft": false
}
conversation_config fields:
| Field | Required | Description |
| --- | --- | --- |
| style | Yes | Template style (see supported styles below) |
| args.template | Yes (for jinja) | Jinja2 template string for formatting messages |
| args.system | No | Default system prompt |
| args.special_tokens_map | No | Token mappings for bos_token, eos_token, unk_token |
| args.keep_leading_spaces | No | Preserve leading whitespace in the template output |
| args.function_call_prefix | No | Prefix for tool/function calls |
| args.function_call_suffix | No | Suffix for tool/function calls |
| args.disable_grammar | No | Disable grammar constraints |
Supported conversation styles:
| Style | Description |
| --- | --- |
| jinja | Custom Jinja2 template (requires args.template) |
| huggingface | Uses the model’s HuggingFace chat template |
| alpaca | Alpaca instruction format |
| chatml | ChatML format |
| codellama-70b-instruct | CodeLlama 70B instruction format |
| deepseek | DeepSeek format |
| deepseek-v3p1 | DeepSeek V3.1 format |
| deepseek-v3p2 | DeepSeek V3.2 format |
| glm | GLM format |
| glm_47 | GLM 4.7 format |
| harmony | Harmony format |
| kimi | Kimi format |
| kimi-k2-instruct | Kimi K2 instruction format |
| llama-chat | Llama chat format |
| llama-infill | Llama infilling format |
| llama4 | Llama 4 format |
| minimax | MiniMax format |
| minimax_m2 | MiniMax M2 format |
| mistral-chat | Mistral chat format |
| passthrough | No formatting applied |
| qwen2 | Qwen2 format |
| qwen3 | Qwen3 format |
| qwen3-coder | Qwen3 Coder format |
| qwen3-vl | Qwen3 Vision-Language format |
| qwen3-vl-moe | Qwen3 Vision-Language MoE format |
| stablelm-zephyr | StableLM Zephyr format |
| vicuna | Vicuna chat format |
These defaults are applied when the user doesn’t specify values in their API request:
| Field | Type | Example | Description |
| --- | --- | --- | --- |
| stop | array | ["<\|im_end\|>", "</s>"] | Default stop sequences |
| max_tokens | integer | 1024 | Default maximum tokens to generate |
| temperature | float | 0.7 | Default sampling temperature |
| top_k | integer | 50 | Default top-k sampling |
| top_p | float | 0.9 | Default nucleus sampling probability |
| min_p | float | 0.0 | Default minimum probability threshold |
| typical_p | float | 1.0 | Default typical sampling probability |
| frequency_penalty | float | 0.0 | Default frequency penalty |
| presence_penalty | float | 0.0 | Default presence penalty |
| repetition_penalty | float | 1.0 | Default repetition penalty |
Top-level fields:
| Field | Default | Description |
| --- | --- | --- |
| model_arch | null | Model architecture (e.g., "qwen2", "llama"). Usually auto-detected from the base model |
| model_config_name | null | Model configuration name (e.g., "4B"). Usually auto-detected from the base model |
| has_lora | true | Set to true for LoRA adapters |
| has_teft | false | Set to true if using TEFT (Token-Efficient Fine-Tuning) |
All fields in fireworks.json are optional except for conversation_config.style when customizing the chat template. Include only the fields you need to override.
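For example, a minimal fireworks.json that overrides only the default stop sequence and temperature, inheriting everything else from the base model:
fireworks.json
{
  "defaults": {
    "stop": ["<|im_end|>"],
    "temperature": 0.2
  }
}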

Uploading the LoRA adapter

To upload a LoRA addon, run the following command. The MODEL_ID is an arbitrary resource ID to refer to the model within Fireworks.
Only some base models support LoRA addons.
firectl model create <MODEL_ID> /path/to/files/ --base-model "accounts/fireworks/models/<BASE_MODEL_ID>"
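For example, to upload an adapter trained against Llama 3.1 8B Instruct (the adapter ID is arbitrary; substitute the base model your adapter was actually trained on):
firectl model create my-lora-adapter /path/to/adapter/ --base-model "accounts/fireworks/models/llama-v3p1-8b-instruct"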

Next steps