Reference
Resource types
The SDK currently supports three types of resources: `LLM`, `Dataset`, and `SupervisedFineTuningJob`.
LLM
Properties:
- `deployment_name` (str) - The full name of the deployment (e.g., `accounts/my-account/deployments/my-custom-deployment`)
- `deployment_display_name` (str) - The display name of the deployment; defaults to the filename where the LLM was instantiated unless otherwise specified
- `temperature` (float) - The temperature for generation
- `model` (str) - The model associated with this LLM (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `base_deployment_name` (str) - If this is a LoRA addon, the deployment name of the base model deployment
- `peft_base_model` (str) - If this is a LoRA addon, the base model identifier (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `addons_enabled` (bool) - Whether LoRA addons are enabled for this LLM
- `model_id` (str) - The identifier used under the hood to query this model (e.g., `accounts/my-account/deployedModels/my-deployed-model-abcdefg`)
Instantiation
The `LLM(*args, **kwargs)` class constructor initializes a new LLM instance.
Required Arguments
- `model` (str) - The model identifier to use (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `deployment_type` (str) - The type of deployment to use. Must be one of:
  - `"serverless"`: Uses Fireworks' shared serverless infrastructure
  - `"on-demand"`: Uses dedicated resources for your deployment
  - `"auto"`: Automatically selects the most cost-effective option (recommended for experimentation)
Optional Arguments
Deployment Configuration
- `deployment_name` (str, optional) - Name to identify the deployment. If not provided, Fireworks will auto-generate one. If a deployment with the same name already exists, the SDK will try to re-use it.
- `deployment_display_name` (str, optional) - Display name for the deployment. Defaults to the filename where the LLM was instantiated. If a deployment with the same display name and model already exists, the SDK will try to re-use it.
- `base_deployment_name` (str, optional) - Base deployment name for LoRA addons. If not provided, the SDK will try to find a base model deployment that can be reused.
Authentication & API
- `api_key` (str, optional) - Your Fireworks API key
- `base_url` (str, optional) - Base URL for API calls. Defaults to `https://api.fireworks.ai/inference/v1`
- `max_retries` (int, optional) - Maximum number of retry attempts. Defaults to 3
Scaling Configuration
- `scale_up_window` (timedelta, optional) - Time to wait before scaling up after increased load. Defaults to 1 second
- `scale_down_window` (timedelta, optional) - Time to wait before scaling down after decreased load. Defaults to 1 minute
- `scale_to_zero_window` (timedelta, optional) - Time of inactivity before scaling to zero. Defaults to 5 minutes
Hardware & Performance
- `accelerator_type` (str, optional) - Type of GPU accelerator to use
- `region` (str, optional) - Region for deployment
- `min_replica_count` (int, optional) - Minimum number of replicas
- `max_replica_count` (int, optional) - Maximum number of replicas
- `replica_count` (int, optional) - Fixed number of replicas
- `accelerator_count` (int, optional) - Number of accelerators per replica
- `precision` (str, optional) - Model precision (e.g., `"FP16"`, `"FP8"`)
- `max_batch_size` (int, optional) - Maximum batch size for inference
Advanced Features
- `enable_addons` (bool, optional) - Enable LoRA addons support
- `draft_token_count` (int, optional) - Number of tokens to generate per step for speculative decoding
- `draft_model` (str, optional) - Model to use for speculative decoding
- `ngram_speculation_length` (int, optional) - Length of previous input sequence for N-gram speculation
- `long_prompt_optimized` (bool, optional) - Optimize for long prompts
- `temperature` (float, optional) - Sampling temperature for generation
Monitoring & Metrics
- `enable_metrics` (bool, optional) - Enable metrics collection. Currently supports time to last token for non-streaming requests.
Additional Configuration
- `description` (str, optional) - Description of the deployment
- `cluster` (str, optional) - Cluster identifier
- `enable_session_affinity` (bool, optional) - Enable session affinity
- `direct_route_api_keys` (list[str], optional) - List of API keys for direct routing
- `direct_route_type` (str, optional) - Type of direct routing
create_supervised_fine_tuning_job()
Creates a new supervised fine-tuning job and blocks until it is ready. See the SupervisedFineTuningJob section for details on the parameters.
Returns: an instance of `SupervisedFineTuningJob`.
delete_deployment()
Deletes the deployment associated with this LLM instance if one exists.
Arguments:
- `ignore_checks` (bool, optional) - Whether to ignore safety checks. Defaults to False.
get_time_to_last_token_mean()
Returns the mean time to last token for non-streaming requests.
Returns: a float representing the mean time to last token, or None if no metrics are available.
Dataset
The `Dataset` class provides a convenient way to manage datasets for fine-tuning on Fireworks. It offers smart features like automatic naming and uploading of datasets. You do not instantiate a `Dataset` object directly. Instead, you create a `Dataset` object by using one of the class methods below.
Properties:
- `name` (str) - The name of the dataset
from_list()
Creates a Dataset from a list of training examples. Each example should be compatible with OpenAI’s chat completion format.
from_file()
Creates a Dataset from a local JSONL file. The file should contain training examples in OpenAI’s chat completion format.
from_string()
Creates a Dataset from a string containing JSONL-formatted training examples.
delete()
Deletes the dataset from Fireworks.
Data Format
The Dataset class expects data in OpenAI's chat completion format. Each training example should be a JSON object with a `messages` array containing message objects. Each message object should have:
- `role`: One of `"system"`, `"user"`, or `"assistant"`
- `content`: The message content as a string
Example format:
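One training example per line, in JSONL (an illustrative sketch):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "Blue."}]}
{"messages": [{"role": "user", "content": "What is 2 + 2?"}, {"role": "assistant", "content": "4"}]}
```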
SupervisedFineTuningJob
The `SupervisedFineTuningJob` class manages fine-tuning jobs on Fireworks. It provides a convenient interface for creating, monitoring, and managing fine-tuning jobs.
Properties:
- `output_model` (str) - The identifier of the output model (e.g., `accounts/my-account/models/my-finetuned-model`)
- `output_llm` (LLM) - An LLM instance associated with the output model
Instantiation
You do not need to directly instantiate a `SupervisedFineTuningJob` object. Instead, you should use the `.create_supervised_fine_tuning_job()` method on the `LLM` object and pass in the following required and optional arguments.
Required Arguments
- `name` (str) - A unique name for the fine-tuning job
- `llm` (LLM) - The LLM instance to fine-tune
- `dataset_or_id` (Union[Dataset, str]) - The dataset to use for fine-tuning, either as a Dataset object or dataset ID
Optional Arguments
Training Configuration
- `epochs` (int, optional) - Number of training epochs
- `learning_rate` (float, optional) - Learning rate for training
- `lora_rank` (int, optional) - Rank for LoRA fine-tuning
- `jinja_template` (str, optional) - Template for formatting training examples
- `early_stop` (bool, optional) - Whether to enable early stopping
- `max_context_length` (int, optional) - Maximum context length for the model
- `base_model_weight_precision` (str, optional) - Precision for base model weights
- `batch_size` (int, optional) - Batch size for training
Hardware Configuration
- `accelerator_type` (str, optional) - Type of GPU accelerator to use
- `accelerator_count` (int, optional) - Number of accelerators to use
- `is_turbo` (bool, optional) - Whether to use turbo mode for faster training
- `region` (str, optional) - Region for deployment
- `nodes` (int, optional) - Number of nodes to use
Evaluation & Monitoring
- `evaluation_dataset` (str, optional) - Dataset ID to use for evaluation
- `eval_auto_carveout` (bool, optional) - Whether to automatically carve out evaluation data
- `wandb_config` (WandbConfig, optional) - Configuration for Weights & Biases integration
Job Management
- `id` (str, optional) - Job ID (auto-generated if not provided)
- `api_key` (str, optional) - API key for authentication
- `state` (JobState, optional) - Current state of the job
- `create_time` (datetime, optional) - Time when the job was created
- `update_time` (datetime, optional) - Time when the job was last updated
- `created_by` (str, optional) - User who created the job
- `output_model` (str, optional) - ID of the output model
wait_for_completion()
Polls the job status until it is complete and returns the job object.
delete()
Deletes the job.