- Multiple upload options – Upload from local files or directly from S3 buckets or Azure Blob Storage
- Secure uploads – All uploads are encrypted and models remain private to your account by default
## Requirements
### Supported architectures
Fireworks supports most popular model architectures, including:
- DeepSeek V1, V2 & V3
- Qwen, Qwen2, Qwen2.5, Qwen2.5-VL, Qwen3
- Kimi K2 family
- GLM 4.X family
- Llama 1, 2, 3, 3.1, 4
- Mistral & Mixtral
- Gemma
- GPT-OSS 120B and 20B
View all supported architectures
### Required files
You’ll need standard Hugging Face model files: `config.json`, model weights (`.safetensors` or `.bin`), and tokenizer files.
View detailed file requirements
The model files you will need to provide depend on the model architecture. In general, you will need:

- Model configuration: `config.json`. Note that Fireworks does not support the `quantization_config` option in `config.json`.
- Model weights, in one of the following formats: `*.safetensors` or `*.bin`
- Weights index: `*.index.json`
- Tokenizer file(s), e.g. `tokenizer.model`, `tokenizer.json`, `tokenizer_config.json`
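For example, a typical upload directory for a sharded safetensors model might look like the following (an illustrative layout; file names and shard counts vary by model):

```
my-model/
├── config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── tokenizer.json
└── tokenizer_config.json
```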
### Customizing base model configuration
For base models (not LoRA adapters), you can customize the chat template and generation defaults by modifying the standard Hugging Face configuration files:

- Chat template: Add or modify the `chat_template` field in `tokenizer_config.json`. See the Hugging Face guide on Templates for Chat Models for details.
- Generation defaults: Modify `generation_config.json` to set default generation parameters like `max_new_tokens`, `temperature`, `top_p`, etc.

You can also use a `fireworks.json` file with base models. If present, `fireworks.json` takes priority over both `tokenizer_config.json` and `generation_config.json`. See Customizing chat template and generation defaults for the full `fireworks.json` schema.
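As a minimal sketch, a `generation_config.json` that overrides the parameters mentioned above could look like this (the values are illustrative, not recommendations):

```json
{
  "max_new_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}
```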
## Uploading your model
For larger models, you can upload directly from cloud storage (S3 or Azure Blob Storage) for faster transfer instead of uploading from your local machine. The following upload methods are available:

- Local files (CLI)
- S3 bucket (CLI)
- Azure Blob Storage (CLI)
- REST API
Upload from your local machine:
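A sketch of the local upload, assuming the `firectl` CLI is installed and authenticated (`MODEL_ID` and the path are placeholders):

```bash
# MODEL_ID is the resource ID the model will have within Fireworks;
# the path points at the directory containing the files listed above.
firectl create model MODEL_ID /path/to/model/files/
```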
If you’re uploading an embedding model, add the
--embedding flag.Verifying your upload
After uploading, verify that your model is ready to deploy by checking its state.
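A sketch of the check, assuming `firectl`'s `get model` command:

```bash
firectl get model MODEL_ID
```

Look for `State: READY` in the output. Once the model is ready, you can create a deployment.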
## Deploying your model
Once your model shows `State: READY`, create a deployment:
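A sketch assuming `firectl`'s `create deployment` command (the exact model path format may differ; see `firectl create deployment --help`):

```bash
firectl create deployment MODEL_ID
```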
## Publishing your model
By default, models are private to your account. Publish a model to make it available to other Fireworks users. When published, a model is:

- Listed in the public model catalog
- Deployable by anyone with a Fireworks account
- Still hosted and controlled by your account
## Importing fine-tuned models
In addition to models you fine-tune on the Fireworks platform, you can also upload your own custom fine-tuned models as LoRA adapters.

### Requirements
Your custom LoRA addon must contain the following files:

- `adapter_config.json` – The Hugging Face adapter configuration file
- `adapter_model.bin` or `adapter_model.safetensors` – The saved addon file
`adapter_config.json` must contain the following fields:

- `r` – The number of LoRA ranks. Must be an integer between 4 and 64, inclusive
- `target_modules` – A list of target modules. Currently the following target modules are supported:
  - `q_proj`
  - `k_proj`
  - `v_proj`
  - `o_proj`
  - `up_proj` or `w1`
  - `down_proj` or `w2`
  - `gate_proj` or `w3`
  - `block_sparse_moe.gate`
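A minimal `adapter_config.json` sketch showing only the two fields documented above (configs exported by tools like PEFT typically contain additional fields, which can be left in place):

```json
{
  "r": 16,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}
```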
### Customizing chat template and generation defaults
For LoRA adapters, use a `fireworks.json` file to customize the chat template and generation defaults. This is the recommended approach because adapters inherit configuration from their base model; modifying `generation_config.json` or `tokenizer_config.json` in the adapter folder won’t work.
Add a `fireworks.json` file to the directory containing your adapter files.
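The example below is an illustrative sketch of `fireworks.json`, using only fields documented in the tables that follow; the Jinja template shown is a ChatML-style example, not a required format:

```json
{
  "conversation_config": {
    "style": "jinja",
    "args": {
      "template": "{% for message in messages %}<|im_start|>{{ message.role }}\n{{ message.content }}<|im_end|>\n{% endfor %}<|im_start|>assistant\n",
      "system": "You are a helpful assistant.",
      "special_tokens_map": {
        "bos_token": "<s>",
        "eos_token": "<|im_end|>"
      }
    }
  },
  "defaults": {
    "stop": ["<|im_end|>"],
    "max_tokens": 1024,
    "temperature": 0.7
  }
}
```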
#### `conversation_config` options
| Field | Required | Description |
|---|---|---|
| `style` | Yes | Template style (see supported styles below) |
| `args.template` | Yes (for `jinja`) | Jinja2 template string for formatting messages |
| `args.system` | No | Default system prompt |
| `args.special_tokens_map` | No | Token mappings for `bos_token`, `eos_token`, `unk_token` |
| `args.keep_leading_spaces` | No | Preserve leading whitespace in the template output |
| `args.function_call_prefix` | No | Prefix for tool/function calls |
| `args.function_call_suffix` | No | Suffix for tool/function calls |
| `args.disable_grammar` | No | Disable grammar constraints |
Supported styles:

| Style | Description |
|---|---|
| `jinja` | Custom Jinja2 template (requires `args.template`) |
| `huggingface` | Uses the model’s HuggingFace chat template |
| `alpaca` | Alpaca instruction format |
| `chatml` | ChatML format |
| `codellama-70b-instruct` | CodeLlama 70B instruction format |
| `deepseek` | DeepSeek format |
| `deepseek-v3p1` | DeepSeek V3.1 format |
| `deepseek-v3p2` | DeepSeek V3.2 format |
| `glm` | GLM format |
| `glm_47` | GLM 4.7 format |
| `harmony` | Harmony format |
| `kimi` | Kimi format |
| `kimi-k2-instruct` | Kimi K2 instruction format |
| `llama-chat` | Llama chat format |
| `llama-infill` | Llama infilling format |
| `llama4` | Llama 4 format |
| `minimax` | MiniMax format |
| `minimax_m2` | MiniMax M2 format |
| `mistral-chat` | Mistral chat format |
| `passthrough` | No formatting applied |
| `qwen2` | Qwen2 format |
| `qwen3` | Qwen3 format |
| `qwen3-coder` | Qwen3 Coder format |
| `qwen3-vl` | Qwen3 Vision-Language format |
| `qwen3-vl-moe` | Qwen3 Vision-Language MoE format |
| `stablelm-zephyr` | StableLM Zephyr format |
| `vicuna` | Vicuna chat format |
#### `defaults` options
These defaults are applied when the user doesn’t specify values in their API request:
| Field | Type | Example | Description |
|---|---|---|---|
| `stop` | array | `["<\|im_end\|>", "</s>"]` | Default stop sequences |
| `max_tokens` | integer | `1024` | Default maximum tokens to generate |
| `temperature` | float | `0.7` | Default sampling temperature |
| `top_k` | integer | `50` | Default top-k sampling |
| `top_p` | float | `0.9` | Default nucleus sampling probability |
| `min_p` | float | `0.0` | Default minimum probability threshold |
| `typical_p` | float | `1.0` | Default typical sampling probability |
| `frequency_penalty` | float | `0.0` | Default frequency penalty |
| `presence_penalty` | float | `0.0` | Default presence penalty |
| `repetition_penalty` | float | `1.0` | Default repetition penalty |
#### Additional options
| Field | Default | Description |
|---|---|---|
| `model_arch` | `null` | Model architecture (e.g., `"qwen2"`, `"llama"`). Usually auto-detected from base model |
| `model_config_name` | `null` | Model configuration name (e.g., `"4B"`). Usually auto-detected from base model |
| `has_lora` | `true` | Set to `true` for LoRA adapters |
| `has_teft` | `false` | Set to `true` if using TEFT (Token-Efficient Fine-Tuning) |
All fields in `fireworks.json` are optional except for `conversation_config.style` when customizing the chat template. Include only the fields you need to override.

### Uploading the LoRA adapter
To upload a LoRA addon, run the following command. The `MODEL_ID` is an arbitrary resource ID to refer to the model within Fireworks. Note that only some base models support LoRA addons.
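A sketch of the upload, again assuming the `firectl` CLI; depending on your firectl version you may also need to reference the base model, so check `firectl create model --help` for the exact flags:

```bash
# MODEL_ID is a placeholder; the path points at the directory containing
# adapter_config.json and adapter_model.safetensors (or adapter_model.bin).
firectl create model MODEL_ID /path/to/adapter/files/
```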