Resource types

The SDK currently supports three types of resources: LLM, Dataset, and SupervisedFineTuningJob.

LLM

class LLM()

Properties:

  • deployment_name str - The full name of the deployment (e.g., accounts/my-account/deployments/my-custom-deployment)
  • deployment_display_name str - The display name of the deployment, defaults to the filename where the LLM was instantiated unless otherwise specified
  • temperature float - The temperature for generation
  • model str - The model associated with this LLM (e.g., accounts/fireworks/models/llama-v3p2-3b-instruct)
  • base_deployment_name str - If this is a LoRA addon, the name of the base model deployment
  • peft_base_model str - If this is a LoRA addon, the base model identifier (e.g., accounts/fireworks/models/llama-v3p2-3b-instruct)
  • addons_enabled bool - Whether LoRA addons are enabled for this LLM
  • model_id str - The identifier used under the hood to query this model (e.g., accounts/my-account/deployedModels/my-deployed-model-abcdefg)
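
For example, these properties can be read directly off an instance. A minimal sketch; the values in the comments are illustrative and depend on your account:

from fireworks.client import LLM

llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto"
)

print(llm.model)            # accounts/fireworks/models/llama-v3p2-3b-instruct
print(llm.deployment_name)  # e.g., accounts/my-account/deployments/...
print(llm.addons_enabled)   # e.g., False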

Instantiation

The LLM(*args, **kwargs) class constructor initializes a new LLM instance.

from fireworks.client import LLM
from datetime import timedelta

# Basic usage with required parameters
llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto"
)

# Advanced usage with optional parameters
llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="on-demand",
    deployment_name="my-custom-deployment",
    accelerator_type="NVIDIA_H100_80GB",
    min_replica_count=1,
    max_replica_count=3,
    scale_up_window=timedelta(seconds=30),
    scale_down_window=timedelta(minutes=10),
    enable_metrics=True
)
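
Once constructed, the LLM serves inference requests against the selected deployment. A minimal sketch, assuming the SDK's OpenAI-compatible chat completions interface (the exact call shape may differ by SDK version):

from fireworks.client import LLM

llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto"
)

# Assumed OpenAI-compatible chat interface
response = llm.chat.completions.create(
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response.choices[0].message.content)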

Required Arguments

  • model str - The model identifier to use (e.g., accounts/fireworks/models/llama-v3p2-3b-instruct)
  • deployment_type str - The type of deployment to use. Must be one of:
    • "serverless": Uses Fireworks’ shared serverless infrastructure
    • "on-demand": Uses dedicated resources for your deployment
    • "auto": Automatically selects the most cost-effective option (recommended for experimentation)

Optional Arguments

Deployment Configuration

  • deployment_name str, optional - Name to identify the deployment. If not provided, Fireworks will auto-generate one. If a deployment with the same name already exists, the SDK will try to reuse it.
  • deployment_display_name str, optional - Display name for the deployment. Defaults to the filename where the LLM was instantiated. If a deployment with the same display name and model already exists, the SDK will try to reuse it.
  • base_deployment_name str, optional - Base deployment name for LoRA addons. If not provided, the SDK will try to find a base model deployment that can be reused.

Authentication & API

  • api_key str, optional - Your Fireworks API key
  • base_url str, optional - Base URL for API calls. Defaults to "https://api.fireworks.ai/inference/v1"
  • max_retries int, optional - Maximum number of retry attempts. Defaults to 3

Scaling Configuration

  • scale_up_window timedelta, optional - Time to wait before scaling up after increased load. Defaults to 1 second
  • scale_down_window timedelta, optional - Time to wait before scaling down after decreased load. Defaults to 1 minute
  • scale_to_zero_window timedelta, optional - Time of inactivity before scaling to zero. Defaults to 5 minutes
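
For example, a deployment can be configured to react quickly to load spikes while releasing replicas during idle periods. A sketch; the chosen windows are illustrative, not recommendations:

from fireworks.client import LLM
from datetime import timedelta

llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="on-demand",
    scale_up_window=timedelta(seconds=5),        # scale up quickly under load
    scale_down_window=timedelta(minutes=2),      # avoid thrashing on brief lulls
    scale_to_zero_window=timedelta(minutes=10)   # release all replicas when idle
)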

Hardware & Performance

  • accelerator_type str, optional - Type of GPU accelerator to use
  • region str, optional - Region for deployment
  • min_replica_count int, optional - Minimum number of replicas
  • max_replica_count int, optional - Maximum number of replicas
  • replica_count int, optional - Fixed number of replicas
  • accelerator_count int, optional - Number of accelerators per replica
  • precision str, optional - Model precision (e.g., "FP16", "FP8")
  • max_batch_size int, optional - Maximum batch size for inference

Advanced Features

  • enable_addons bool, optional - Enable LoRA addons support
  • draft_token_count int, optional - Number of tokens to generate per step for speculative decoding
  • draft_model str, optional - Model to use for speculative decoding
  • ngram_speculation_length int, optional - Length of previous input sequence for N-gram speculation
  • long_prompt_optimized bool, optional - Optimize for long prompts
  • temperature float, optional - Sampling temperature for generation
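
As a sketch, speculative decoding pairs the served model with a smaller draft model that proposes tokens for the larger model to verify. The draft model and token count shown here are hypothetical choices:

from fireworks.client import LLM

llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="on-demand",
    draft_model="accounts/fireworks/models/llama-v3p2-1b-instruct",  # hypothetical draft model
    draft_token_count=4  # tokens proposed per speculation step
)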

Monitoring & Metrics

  • enable_metrics bool, optional - Enable metrics collection. Currently supports time to last token for non-streaming requests.

Additional Configuration

  • description str, optional - Description of the deployment
  • cluster str, optional - Cluster identifier
  • enable_session_affinity bool, optional - Enable session affinity
  • direct_route_api_keys list[str], optional - List of API keys for direct routing
  • direct_route_type str, optional - Type of direct routing

create_supervised_fine_tuning_job()

Creates a new supervised fine-tuning job and blocks until it is ready. See the SupervisedFineTuningJob section for details on the parameters.

Returns:

  • An instance of SupervisedFineTuningJob.

job = llm.create_supervised_fine_tuning_job(
    name="my-fine-tuning-job",
    dataset_or_id=dataset,
    epochs=3,
    learning_rate=1e-5
)
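
To block until training itself finishes, chain the returned job with wait_for_completion(), documented in the SupervisedFineTuningJob section below.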

delete_deployment()

Deletes the deployment associated with this LLM instance if one exists.

Arguments:

  • ignore_checks bool, optional - Whether to ignore safety checks. Defaults to False.

llm.delete_deployment(ignore_checks=True)

get_time_to_last_token_mean()

Returns the mean time to last token for non-streaming requests. If no metrics are available, returns None.

Returns:

  • A float representing the mean time to last token, or None if no metrics are available.

time_to_last_token_mean = llm.get_time_to_last_token_mean()
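
Metrics are only collected when the LLM was constructed with enable_metrics=True. A sketch of the full flow, again assuming the OpenAI-compatible chat interface:

from fireworks.client import LLM

llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto",
    enable_metrics=True
)

# Assumed OpenAI-compatible chat interface; non-streaming requests feed the metric
llm.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}]
)

mean_ttlt = llm.get_time_to_last_token_mean()
if mean_ttlt is not None:
    print(f"Mean time to last token: {mean_ttlt}")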

Dataset

The Dataset class provides a convenient way to manage datasets for fine-tuning on Fireworks, offering smart features like automatic naming and uploading. You do not instantiate a Dataset directly; instead, create one using one of the class methods below.

Properties:

  • name str - The name of the dataset

from_list()

@classmethod
from_list(data: list)

Creates a Dataset from a list of training examples. Each example should be compatible with OpenAI’s chat completion format.

from fireworks.client import Dataset

# Create dataset from a list of examples
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "Paris."}
        ]
    }
]
dataset = Dataset.from_list(examples)

from_file()

@classmethod
from_file(path: str)

Creates a Dataset from a local JSONL file. The file should contain training examples in OpenAI’s chat completion format.

from fireworks.client import Dataset

# Create dataset from a JSONL file
dataset = Dataset.from_file("path/to/training_data.jsonl")

from_string()

@classmethod
from_string(data: str)

Creates a Dataset from a string containing JSONL-formatted training examples.

from fireworks.client import Dataset

# Create dataset from a JSONL string
jsonl_data = """
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "Hi there!"}]}
{"messages": [{"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2"}]}
"""
dataset = Dataset.from_string(jsonl_data)

delete()

Deletes the dataset from Fireworks.

dataset = Dataset.from_file("path/to/training_data.jsonl")
dataset.delete()

Data Format

The Dataset class expects data in OpenAI’s chat completion format. Each training example should be a JSON object with a messages array containing message objects. Each message object should have:

  • role: One of "system", "user", or "assistant"
  • content: The message content as a string

Example format:

{
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."}
    ]
}
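
For example, records in this format can be serialized to a JSONL file for use with from_file(). A minimal sketch using only the standard library:

import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "Paris."}
        ]
    }
]

# JSONL: one JSON object per line
with open("training_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")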

SupervisedFineTuningJob

The SupervisedFineTuningJob class manages fine-tuning jobs on Fireworks. It provides a convenient interface for creating, monitoring, and managing fine-tuning jobs.

class SupervisedFineTuningJob()

Properties:

  • output_model str - The identifier of the output model (e.g., accounts/my-account/models/my-finetuned-model)
  • output_llm LLM - An LLM instance associated with the output model

Instantiation

You do not need to directly instantiate a SupervisedFineTuningJob object. Instead, you should use the .create_supervised_fine_tuning_job() method on the LLM object and pass in the following required and optional arguments.

Required Arguments

  • name str - A unique name for the fine-tuning job
  • llm LLM - The LLM instance to fine-tune (supplied implicitly when you call create_supervised_fine_tuning_job() on an LLM)
  • dataset_or_id Union[Dataset, str] - The dataset to use for fine-tuning, either as a Dataset object or dataset ID

Optional Arguments

Training Configuration

  • epochs int, optional - Number of training epochs
  • learning_rate float, optional - Learning rate for training
  • lora_rank int, optional - Rank for LoRA fine-tuning
  • jinja_template str, optional - Template for formatting training examples
  • early_stop bool, optional - Whether to enable early stopping
  • max_context_length int, optional - Maximum context length for the model
  • base_model_weight_precision str, optional - Precision for base model weights
  • batch_size int, optional - Batch size for training

Hardware Configuration

  • accelerator_type str, optional - Type of GPU accelerator to use
  • accelerator_count int, optional - Number of accelerators to use
  • is_turbo bool, optional - Whether to use turbo mode for faster training
  • region str, optional - Region for deployment
  • nodes int, optional - Number of nodes to use

Evaluation & Monitoring

  • evaluation_dataset str, optional - Dataset ID to use for evaluation
  • eval_auto_carveout bool, optional - Whether to automatically carve out evaluation data
  • wandb_config WandbConfig, optional - Configuration for Weights & Biases integration

Job Management

  • id str, optional - Job ID (auto-generated if not provided)
  • api_key str, optional - API key for authentication
  • state JobState, optional - Current state of the job
  • create_time datetime, optional - Time when the job was created
  • update_time datetime, optional - Time when the job was last updated
  • created_by str, optional - User who created the job
  • output_model str, optional - ID of the output model

wait_for_completion()

Polls the job status until it is complete and returns the job object.

job = job.wait_for_completion()

delete()

Deletes the job.

job.delete()
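
Putting the pieces together, a typical fine-tuning workflow looks like the following sketch (names and hyperparameters are illustrative):

from fireworks.client import LLM, Dataset

# 1. Upload training data
dataset = Dataset.from_file("path/to/training_data.jsonl")

# 2. Choose a base model to fine-tune
llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto"
)

# 3. Create the job, then poll until training completes
job = llm.create_supervised_fine_tuning_job(
    name="my-fine-tuning-job",
    dataset_or_id=dataset,
    epochs=3,
    learning_rate=1e-5
)
job = job.wait_for_completion()

# 4. The fine-tuned model is available as an LLM instance
print(job.output_model)  # e.g., accounts/my-account/models/my-finetuned-model
fine_tuned_llm = job.output_llm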