Resource types
The SDK currently supports four types of resources: `LLM`, `Dataset`, `SupervisedFineTuningJob`, and `BatchInferenceJob`.
LLM
Properties:
- `deployment_name` str - The full name of the deployment (e.g., `accounts/my-account/deployments/my-custom-deployment`)
- `deployment_display_name` str - The display name of the deployment; defaults to the filename where the LLM was instantiated unless otherwise specified
- `deployment_url` str - The URL to view the deployment in the Fireworks dashboard
- `temperature` float - The temperature for generation
- `model` str - The model associated with this LLM (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `base_deployment_name` str - If a LoRA addon, the deployment name of the base model deployment
- `peft_base_model` str - If this is a LoRA addon, the base model identifier (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `addons_enabled` bool - Whether LoRA addons are enabled for this LLM
- `model_id` str - The identifier used under the hood to query this model (e.g., `accounts/my-account/deployedModels/my-deployed-model-abcdefg`)
- `deployment_id` str - The deployment ID (e.g., `my-custom-deployment`)
- `base_deployment_id` str - The base deployment ID for LoRA addons
- `perf_metrics_in_response` bool - Whether performance metrics are included in responses
Instantiation
The `LLM(*args, **kwargs)` class constructor initializes a new LLM instance.
Required Arguments
- `model` str - The model identifier to use (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `deployment_type` str - The type of deployment to use. Must be one of:
  - `"serverless"`: Uses Fireworks' shared serverless infrastructure
  - `"on-demand"`: Uses dedicated resources for your deployment
  - `"auto"`: Automatically selects the most cost-effective option (recommended for experimentation)
  - `"on-demand-lora"`: For LoRA addons that require dedicated resources
Optional Arguments
Deployment Configuration
- `id` str, optional - Deployment ID to identify the deployment. Required when `deployment_type` is `"on-demand"`. Can be any simple string (e.g., `"my-deployment"`) and does not need to follow the format `accounts/account_id/deployments/model_id`.
- `deployment_display_name` str, optional - Display name for the deployment. Defaults to the filename where the LLM was instantiated. If a deployment with the same display name and model already exists, the SDK will try to re-use it.
- `base_id` str, optional - Base deployment ID for LoRA addons. Required when `deployment_type` is `"on-demand-lora"`.
Client Configuration
- `api_key` str, optional - Your Fireworks API key
- `base_url` str, optional - Base URL for API calls. Defaults to `https://api.fireworks.ai/inference/v1`
- `max_retries` int, optional - Maximum number of retry attempts. Defaults to 10
Autoscaling
- `scale_up_window` timedelta, optional - Time to wait before scaling up after increased load. Defaults to 1 second
- `scale_down_window` timedelta, optional - Time to wait before scaling down after decreased load. Defaults to 1 minute
- `scale_to_zero_window` timedelta, optional - Time of inactivity before scaling to zero. Defaults to 5 minutes
Hardware and Performance
- `accelerator_type` str, optional - Type of GPU accelerator to use
- `region` str, optional - Region for deployment
- `multi_region` str, optional - Multi-region configuration
- `min_replica_count` int, optional - Minimum number of replicas
- `max_replica_count` int, optional - Maximum number of replicas
- `replica_count` int, optional - Fixed number of replicas
- `accelerator_count` int, optional - Number of accelerators per replica
- `precision` str, optional - Model precision (e.g., `"FP16"`, `"FP8"`)
- `world_size` int, optional - World size for distributed training
- `generator_count` int, optional - Number of generators
- `disaggregated_prefill_count` int, optional - Number of disaggregated prefill instances
- `disaggregated_prefill_world_size` int, optional - World size for disaggregated prefill
- `max_batch_size` int, optional - Maximum batch size for inference
- `max_peft_batch_size` int, optional - Maximum batch size for PEFT operations
- `kv_cache_memory_pct` int, optional - Percentage of memory for KV cache
Generation and Addons
- `enable_addons` bool, optional - Enable LoRA addons support
- `live_merge` bool, optional - Enable live merging
- `draft_token_count` int, optional - Number of tokens to generate per step for speculative decoding
- `draft_model` str, optional - Model to use for speculative decoding
- `ngram_speculation_length` int, optional - Length of previous input sequence for N-gram speculation
- `long_prompt_optimized` bool, optional - Optimize for long prompts
- `temperature` float, optional - Sampling temperature for generation
- `num_peft_device_cached` int, optional - Number of PEFT devices to cache
Metrics
- `enable_metrics` bool, optional - Enable metrics collection. Currently supports time to last token for non-streaming requests.
- `perf_metrics_in_response` bool, optional - Include performance metrics in API responses
Advanced
- `description` str, optional - Description of the deployment
- `annotations` dict[str, str], optional - Annotations for the deployment
- `cluster` str, optional - Cluster identifier
- `enable_session_affinity` bool, optional - Enable session affinity
- `direct_route_api_keys` list[str], optional - List of API keys for direct routing
- `direct_route_type` str, optional - Type of direct routing
- `direct_route_handle` str, optional - Direct route handle
apply(wait: bool = True)
Ensures the deployment is ready and returns it, much like a Terraform apply.
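As a minimal sketch of the flow above (assuming the top-level `fireworks` package exports `LLM`; import paths may vary by SDK version):

```python
from fireworks import LLM  # assumed import path

# "auto" picks the most cost-effective deployment type; for
# deployment_type="on-demand" you must also pass an `id`.
llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto",
    temperature=0.7,
)

# Block until the deployment is ready before sending traffic.
llm.apply()
```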
create_supervised_fine_tuning_job()
Creates a new supervised fine-tuning job and blocks until it is ready. See the SupervisedFineTuningJob section for details on the parameters.
Returns:
- An instance of `SupervisedFineTuningJob`
reinforcement_step()
Performs a reinforcement learning step for training. This method creates a new model checkpoint by fine-tuning the current model on the provided dataset with reinforcement learning.
Arguments:
- `dataset` Dataset - The dataset containing training examples with rewards
- `output_model` str - The name of the output model to create
- `lora_rank` int, optional - Rank for LoRA fine-tuning. Defaults to 16
- `learning_rate` float, optional - Learning rate for training. Defaults to 0.0001
- `max_context_length` int, optional - Maximum context length for the model. Defaults to 8192
- `epochs` int, optional - Number of training epochs. Defaults to 1
- `batch_size` int, optional - Batch size for training. Defaults to 32768
- `accelerator_count` int, optional - Number of accelerators to use for training. Defaults to 1
- `accelerator_type` str, optional - Type of GPU accelerator to use for training. Supported values: `"NVIDIA_A100_80GB"`, `"NVIDIA_H100_80GB"`, `"NVIDIA_H200_141GB"`. Defaults to `"NVIDIA_A100_80GB"`
Note: When continuing training from a trained LoRA (i.e., when the model is already a LoRA fine-tuned checkpoint), the training parameters (`lora_rank`, `learning_rate`, `max_context_length`, `epochs`, `batch_size`) must match those used in the original LoRA training. Changing these parameters is not supported; if they differ, a `ValueError` will be raised.

Returns:
- An instance of `ReinforcementStep` representing the training job
delete_deployment(ignore_checks: bool = False, wait: bool = True)
Deletes the deployment associated with this LLM instance if one exists.
Arguments:
- `ignore_checks` bool, optional - Whether to ignore safety checks. Defaults to False.
- `wait` bool, optional - Whether to wait for deletion to complete. Defaults to True.
get_time_to_last_token_mean()
Returns the mean time to last token for non-streaming requests. If no metrics are available, returns None.
Returns:
- A float representing the mean time to last token, or None if no metrics are available.
with_deployment_type()
Returns a new LLM instance with the specified deployment type.
Arguments:
- `deployment_type` str - The deployment type to use (`"serverless"`, `"on-demand"`, `"auto"`, or `"on-demand-lora"`)

Returns:
- A new `LLM` instance with the specified deployment type
with_temperature()
Returns a new LLM instance with the specified temperature.
Arguments:
- `temperature` float - The temperature for generation

Returns:
- A new `LLM` instance with the specified temperature
with_perf_metrics_in_response()
Returns a new LLM instance with the specified performance metrics setting.
Arguments:
- `perf_metrics_in_response` bool - Whether to include performance metrics in responses

Returns:
- A new `LLM` instance with the specified performance metrics setting
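Because each `with_*` method returns a new instance, configuration tweaks can be chained without mutating the original; a small illustrative sketch, reusing the `llm` from the instantiation example:

```python
# Derive a modified copy of an existing LLM instance.
serverless_llm = (
    llm.with_deployment_type("serverless")
       .with_temperature(1.2)
       .with_perf_metrics_in_response(True)
)
```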
scale_to_zero()
Sends a request to scale the deployment to 0 replicas but does not wait for it to complete.
Returns:
- The deployment object, or None if no deployment exists
scale_to_1_replica()
Scales the deployment to at least 1 replica.
get_deployment()
Returns the deployment associated with this LLM instance, or None if no deployment exists.
Returns:
- The deployment object, or None if no deployment exists
is_peft_addon()
Checks if this LLM is a PEFT (Parameter-Efficient Fine-Tuning) addon.
Returns:
- True if this LLM is a PEFT addon, False otherwise
list_models()
Lists all models available to your account.
Returns:
- A list of model objects
get_model()
Gets the model object for this LLM’s model.
Returns:
- The model object, or None if the model doesn’t exist
is_available_on_serverless()
Checks if the model is available on serverless infrastructure.
Returns:
- True if the model is available on serverless, False otherwise
model_id()
Returns the model ID, which is the model name plus the deployment name if it exists. This is used for the `model` argument when calling the model.
Returns:
- The model ID string
supports_serverless_lora()
Checks if the model supports serverless LoRA deployment.
Returns:
- True if the model supports serverless LoRA, False otherwise
list_fireworks_models()
Lists all models available on the Fireworks account.
Returns:
- A list of model objects from the Fireworks account
is_model_on_fireworks_account()
Checks if the model is on the Fireworks account.
Arguments:
- `model` str - The model identifier to check

Returns:
- The model object if it exists on the Fireworks account, `None` otherwise
is_model_available_on_serverless()
Checks if a specific model is available on serverless infrastructure.
Arguments:
- `model` str - The model identifier to check

Returns:
- True if the model is available on serverless, False otherwise
is_model_deployed_on_serverless_account()
Checks if a model is deployed on a serverless-enabled account.
Arguments:
- `model` SyncModel - The model object to check

Returns:
- True if the model is deployed on a supported serverless account, False otherwise
completions.create() and completions.acreate()
Creates a text completion using the LLM. These methods are OpenAI compatible and follow the same interface as described in the OpenAI Completions API. Use `create()` for synchronous calls and `acreate()` for asynchronous calls.
Arguments:
- `prompt` str - The prompt to complete
- `stream` bool, optional - Whether to stream the response. Defaults to False
- `images` list[str], optional - List of image URLs for multimodal models
- `max_tokens` int, optional - The maximum number of tokens to generate
- `logprobs` int, optional - Number of log probabilities to return
- `echo` bool, optional - Whether to echo the prompt in the response
- `temperature` float, optional - Sampling temperature between 0 and 2. If not provided, uses the LLM's default temperature
- `top_p` float, optional - Nucleus sampling parameter
- `top_k` int, optional - Top-k sampling parameter (must be between 0 and 100)
- `frequency_penalty` float, optional - Frequency penalty for repetition
- `presence_penalty` float, optional - Presence penalty for repetition
- `repetition_penalty` float, optional - Repetition penalty
- `reasoning_effort` str, optional - How much effort the model should put into reasoning
- `mirostat_lr` float, optional - Mirostat learning rate
- `mirostat_target` float, optional - Mirostat target entropy
- `n` int, optional - Number of completions to generate
- `ignore_eos` bool, optional - Whether to ignore end-of-sequence tokens
- `stop` str or list[str], optional - Stop sequences
- `response_format` dict, optional - An object specifying the format that the model must output
- `context_length_exceeded_behavior` str, optional - How to handle context length exceeded
- `user` str, optional - User identifier
- `extra_headers` dict, optional - Additional headers to include in the request
- `**kwargs` - Additional parameters supported by the OpenAI API
Returns:
- `Completion` when `stream=False` (default)
- `Generator[Completion, None, None]` when `stream=True` (sync version)
- `AsyncGenerator[Completion, None]` when `stream=True` (async version)
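A short sketch of the synchronous and streaming call styles, reusing the `llm` instance from the instantiation example:

```python
# Non-streaming: returns a single Completion object.
completion = llm.completions.create(
    prompt="Say this is a test.",
    max_tokens=32,
)
print(completion.choices[0].text)

# Streaming: returns a generator of partial Completion objects.
for chunk in llm.completions.create(prompt="Count to five: ", stream=True):
    print(chunk.choices[0].text, end="")
```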
chat.completions.create() and chat.completions.acreate()
Creates a chat completion using the LLM. These methods are OpenAI compatible and follow the same interface as described in the OpenAI Chat Completions API. Use `create()` for synchronous calls and `acreate()` for asynchronous calls.
Note: The Fireworks chat completions API includes additional request and response fields beyond the standard OpenAI API. See the Fireworks Chat Completions API reference for the complete set of available parameters and response fields.
Arguments:
- `messages` list - A list of messages comprising the conversation so far
- `stream` bool, optional - Whether to stream the response. Defaults to False
- `response_format` dict, optional - An object specifying the format that the model must output
- `reasoning_effort` str, optional - How much effort the model should put into reasoning
- `max_tokens` int, optional - The maximum number of tokens to generate
- `temperature` float, optional - Sampling temperature between 0 and 2. If not provided, uses the LLM's default temperature. Note that temperature can also be set once during LLM instantiation if preferred
- `tools` list, optional - A list of tools the model may call
- `extra_headers` dict, optional - Additional headers to include in the request
- `**kwargs` - Additional parameters supported by the OpenAI API
Returns:
- `ChatCompletion` when `stream=False` (default)
- `Generator[ChatCompletionChunk, None, None]` when `stream=True` (sync version)
- `AsyncGenerator[ChatCompletionChunk, None]` when `stream=True` (async version)
For the `ChatCompletion` object structure, see the OpenAI Chat Completion Object documentation. For the `ChatCompletionChunk` object structure used in streaming, see the OpenAI Chat Streaming documentation.
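A sketch of both call styles, again reusing the `llm` instance from earlier:

```python
# Non-streaming chat completion.
response = llm.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=64,
)
print(response.choices[0].message.content)

# Streaming: iterate over ChatCompletionChunk objects.
stream = llm.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```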
Dataset
The `Dataset` class provides a convenient way to manage datasets for fine-tuning on Fireworks. It offers smart features like automatic naming and uploading of datasets. You do not instantiate a `Dataset` object directly. Instead, you create one using one of the class methods below.
Properties:
- `name` str - The full name of the dataset (e.g., `accounts/my-account/datasets/dataset-12345-my-data`)
- `id` str - The dataset identifier (e.g., `dataset-12345-my-data`)
- `url` str - The URL to view the dataset in the Fireworks dashboard
from_list()
Creates a Dataset from an in-memory list of training examples (see Data Format below).
from_file()
Creates a Dataset from a local file.
from_string()
Creates a Dataset from a string of JSONL data.
from_id()
Creates a Dataset object that references an existing dataset on Fireworks by its ID.
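A hedged sketch of these creation paths (the exact signatures aren't listed in this reference, so the arguments shown are assumptions):

```python
from fireworks import Dataset  # assumed import path

# From a local JSONL file (path argument assumed).
ds = Dataset.from_file("train.jsonl")

# From an existing dataset already on Fireworks (ID argument assumed).
existing = Dataset.from_id("dataset-12345-my-data")

# Upload; skipped automatically if identical content already exists.
ds.sync()
print(ds.url)  # view the dataset in the Fireworks dashboard
```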
sync()
Uploads the dataset to Fireworks if it doesn’t already exist. This method automatically:
- Checks if a dataset with the same content hash already exists
- If it exists, skips the upload to avoid duplicates
- If it doesn’t exist, creates and uploads the dataset to Fireworks
- Validates the dataset after upload
delete()
Deletes the dataset from Fireworks.
head(n: int = 5, as_dataset: bool = False)
Returns the first n rows of the dataset.
Arguments:
- `n` int, optional - Number of rows to return. Defaults to 5.
- `as_dataset` bool, optional - If True, return a Dataset object; if False, return a list. Defaults to False.

Returns:
- list or Dataset - A list of dictionaries if `as_dataset=False`, a Dataset object if `as_dataset=True`
create_evaluation_job(reward_function: Callable, samples: Optional[int] = None)
Creates an evaluation job using a reward function for this dataset.
Arguments:
- `reward_function` Callable - A callable decorated with `@reward_function`
- `samples` int, optional - Optional number of samples to evaluate (creates a subset dataset)

Returns:
- EvaluationJob - The created evaluation job
preview_evaluator(reward_function: Callable, samples: Optional[int] = None)
Previews the evaluator for the dataset.
Arguments:
- `reward_function` Callable - A callable decorated with `@reward_function`
- `samples` int, optional - Optional number of samples to preview

Returns:
- SyncPreviewEvaluatorResponse - Preview response from the evaluator
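A hedged sketch of wiring a reward function into these methods, reusing `ds` from the Dataset example above. The decorator's import path, the callback signature, and the return shape are not specified in this reference, so all three are assumptions here:

```python
from fireworks import reward_function  # assumed import path

@reward_function
def exact_match(messages, ground_truth=None, **kwargs):
    # Assumed callback shape: score the assistant's final message.
    answer = messages[-1]["content"].strip()
    # Assumed return shape; adapt to the SDK's expected type.
    return {"score": 1.0 if answer == ground_truth else 0.0}

# Dry-run on a handful of samples before launching the full job.
preview = ds.preview_evaluator(exact_match, samples=5)

# Launch the evaluation job over a 100-sample subset of the dataset.
job = ds.create_evaluation_job(exact_match, samples=100)
```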
Data Format
The `Dataset` class expects data in OpenAI's chat completion format. Each training example should be a JSON object with a `messages` array containing message objects. Each message object should have:
- `role`: One of `"system"`, `"user"`, or `"assistant"`
- `content`: The message content as a string
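For example, two training examples in this format, passed to `from_list()`:

```python
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "2 + 2 = 4."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Name a primary color."},
            {"role": "assistant", "content": "Red."},
        ]
    },
]

dataset = Dataset.from_list(examples)
```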
SupervisedFineTuningJob
The `SupervisedFineTuningJob` class manages fine-tuning jobs on Fireworks. It provides a convenient interface for creating, monitoring, and managing fine-tuning jobs.
Properties:
- `output_model` str - The identifier of the output model (e.g., `accounts/my-account/models/my-finetuned-model`)
- `output_llm` LLM - An LLM instance associated with the output model
- `id` str - The job ID
- `display_name` str - The display name of the job
- `name` str - The full name of the job
- `url` str - The URL to view the job in the Fireworks dashboard
Instantiation
You do not need to directly instantiate a `SupervisedFineTuningJob` object. Instead, use the `.create_supervised_fine_tuning_job()` method on the `LLM` object and pass in the following required and optional arguments.
Required Arguments
- `display_name` str - A unique name for the fine-tuning job. Must only contain lowercase a-z, 0-9, and hyphen (-).
- `dataset_or_id` Union[Dataset, str] - The dataset to use for fine-tuning, either as a Dataset object or a dataset ID
Optional Arguments
Training Configuration
- `epochs` int, optional - Number of training epochs
- `learning_rate` float, optional - Learning rate for training
- `lora_rank` int, optional - Rank for LoRA fine-tuning
- `jinja_template` str, optional - Template for formatting training examples
- `early_stop` bool, optional - Whether to enable early stopping
- `max_context_length` int, optional - Maximum context length for the model
- `base_model_weight_precision` str, optional - Precision for base model weights
- `batch_size` int, optional - Batch size for training
Hardware Configuration
- `accelerator_type` str, optional - Type of GPU accelerator to use
- `accelerator_count` int, optional - Number of accelerators to use
- `is_turbo` bool, optional - Whether to use turbo mode for faster training
- `region` str, optional - Region for deployment
- `nodes` int, optional - Number of nodes to use
Evaluation and Monitoring
- `evaluation_dataset` str, optional - Dataset ID to use for evaluation
- `eval_auto_carveout` bool, optional - Whether to automatically carve out evaluation data
- `wandb_config` WandbConfig, optional - Configuration for Weights & Biases integration
- `output_model` str, optional - The name of the output model to create. If not provided, it will be the same as the `display_name` argument.
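A sketch of launching a job, reusing the `llm` instance and the `dataset` from the Data Format example above (argument values illustrative):

```python
# Kick off fine-tuning on the base LLM.
job = llm.create_supervised_fine_tuning_job(
    display_name="my-finetune-1",
    dataset_or_id=dataset,
    epochs=1,
    learning_rate=1e-4,
    lora_rank=16,
)

# Poll until training finishes, then get an LLM bound to the new model.
job = job.wait_for_completion()
fine_tuned_llm = job.output_llm
```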
sync()
Creates the job if it doesn't exist, otherwise returns the existing job. If the previous job failed, it is deleted and a new one is created.
Returns:
- SupervisedFineTuningJob - The synced job object
wait_for_completion()
Polls the job status until it is complete and returns the job object.
Returns:
- SupervisedFineTuningJob - The completed job object
await_for_completion()
Asynchronously polls the job status until it is complete and returns the job object.
Returns:
- SupervisedFineTuningJob - The completed job object
delete()
Deletes the job.
adelete()
Asynchronously deletes the job.
ReinforcementStep
The `ReinforcementStep` class represents a reinforcement learning training step.
It provides methods to monitor and manage the training process.
Properties:
- `state` str - The current state of the training job (e.g., `"JOB_STATE_RUNNING"`, `"JOB_STATE_COMPLETED"`)
- `output_model` str - The identifier of the output model (e.g., `accounts/my-account/models/my-improved-model`)
- `is_completed` bool - Whether the training job has completed successfully
get()
Retrieves the current state of the training job from the server.
Returns:
- A `ReinforcementStep` object with updated state, or `None` if the job no longer exists
raise_if_bad_state()
Raises a RuntimeError if the job is in a failed, cancelled, or otherwise bad state. This is useful for error handling during training.
Raises:
- `RuntimeError` - If the job is in a bad state (failed, cancelled, expired, etc.)
Usage Example
Iterative Reinforcement Learning Workflow
The `reinforcement_step` method is designed to support iterative reinforcement learning workflows. A typical loop, sketched after the note below, does the following on each iteration:
- Uses the current model snapshot to generate rollouts
- For each prompt, generates multiple responses (required for Policy Optimization)
- Evaluates each response and computes rewards
- Creates a dataset with the rollouts and rewards (each sample contains multiple generations)
- Performs a reinforcement learning step to create an improved model
Each sample in the dataset must contain multiple trajectories for the same prompt. This is required for policy optimization to work.
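The original code listing is not reproduced in this reference, so the following is a hedged sketch of that loop. The method names come from this page, but `prompts`, `my_reward`, and the exact dataset schema for rollouts and rewards are placeholders:

```python
import time
from fireworks import LLM, Dataset  # assumed import path

prompts = ["Write a haiku about the sea."]  # placeholder prompts

def my_reward(prompt: str, response: str) -> float:
    # Placeholder reward: your own evaluation logic goes here.
    return float(len(response) < 200)

llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto",
)

for i in range(3):  # number of reinforcement steps
    samples = []
    for prompt in prompts:
        # Generate several responses per prompt
        # (multiple trajectories are required for policy optimization).
        response = llm.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            n=4,
        )
        rollouts = [choice.message.content for choice in response.choices]
        # Placeholder schema: each sample holds all rollouts and rewards.
        samples.append({
            "prompt": prompt,
            "rollouts": rollouts,
            "rewards": [my_reward(prompt, r) for r in rollouts],
        })

    dataset = Dataset.from_list(samples)
    dataset.sync()

    # One reinforcement learning step produces an improved checkpoint.
    step = llm.reinforcement_step(
        dataset=dataset,
        output_model=f"my-improved-model-v{i + 1}",
    )

    # Poll until the step completes, failing fast on bad job states.
    while not step.is_completed:
        time.sleep(30)
        refreshed = step.get()
        if refreshed is None:
            raise RuntimeError("training job no longer exists")
        step = refreshed
        step.raise_if_bad_state()

    # Continue the next iteration from the improved model.
    llm = LLM(model=step.output_model, deployment_type="auto")
```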
BatchInferenceJob
The `BatchInferenceJob` class provides a convenient way to manage batch inference jobs on Fireworks. It allows you to perform bulk asynchronous inference on large datasets, reducing costs by up to 50%.
Properties:
- `name` str - The full name of the batch inference job (e.g., `accounts/my-account/batchInferenceJobs/test-job-123`)
- `id` str - The job identifier (e.g., `test-job-123`)
- `model` str - The model used for inference
- `input_dataset_id` str - The input dataset identifier
- `output_dataset_id` str - The output dataset identifier
- `state` str - The current state of the job
- `created_by` str - Email of the user who created the job
- `create_time` str - Creation timestamp
- `update_time` str - Last update timestamp
create()
Arguments:
- `model` str - The model to use for inference (e.g., `llama-v3p1-8b-instruct` or `accounts/fireworks/models/llama-v3p1-8b-instruct`)
- `input_dataset_id` str - The input dataset ID containing JSONL formatted requests
- `output_dataset_id` str, optional - The output dataset ID. If not provided, one will be auto-generated
- `job_id` str, optional - The job ID. If not provided, one will be auto-generated
- `display_name` str, optional - Display name for the job
- `inference_parameters` dict, optional - Dict of inference parameters:
  - `max_tokens` int - Maximum number of tokens to generate
  - `temperature` float - Sampling temperature (0-2)
  - `top_p` float - Top-p sampling parameter
  - `top_k` int - Top-k sampling parameter
  - `n` int - Number of completions per request
  - `extra_body` str - Additional parameters as JSON string
- `api_key` str, optional - The API key to use
Returns:
- A `BatchInferenceJob` object
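A sketch, assuming `BatchInferenceJob` is exported from the top-level `fireworks` package and that `create()` is a class method (neither is stated in this reference):

```python
from fireworks import BatchInferenceJob  # assumed import path

job = BatchInferenceJob.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    input_dataset_id="my-input-dataset",
    display_name="nightly-batch",
    inference_parameters={
        "max_tokens": 256,
        "temperature": 0.7,
    },
)
print(job.id, job.state)
```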
get()
Arguments:
- `job_id` str - The job ID or full resource name
- `account` str - Account ID
- `api_key` str, optional - The API key to use

Returns:
- A `BatchInferenceJob` object if found, `None` otherwise
list()
Arguments:
- `account` str - Account ID
- `api_key` str, optional - The API key to use
- `page_size` int, optional - Number of jobs to return per page. Defaults to 50

Returns:
- A list of `BatchInferenceJob` objects
delete()
Arguments:
- `job_id` str - The job ID or full resource name
- `account` str - Account ID
- `api_key` str, optional - The API key to use
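A sketch of the lookup and cleanup methods under the same class-method assumption:

```python
# Look up a single job by ID; returns None if it doesn't exist.
job = BatchInferenceJob.get(job_id="test-job-123", account="my-account")
if job is not None:
    print(job.state, job.output_dataset_id)

# Page through jobs on the account.
for j in BatchInferenceJob.list(account="my-account", page_size=10):
    print(j.id, j.state)

# Delete a finished job.
BatchInferenceJob.delete(job_id="test-job-123", account="my-account")
```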
to_dict()
Arguments:
- `proto` BatchInferenceJob - The batch inference job proto object

Returns:
- A dictionary with human-readable field values