This reference covers the SDK's core classes: `LLM`, `Dataset`, `SupervisedFineTuningJob`, and `BatchInferenceJob`.
An `LLM` instance exposes the following properties.

Properties:

- `deployment_name` str - The full name of the deployment (e.g., `accounts/my-account/deployments/my-custom-deployment`)
- `deployment_display_name` str - The display name of the deployment; defaults to the filename where the LLM was instantiated unless otherwise specified
- `deployment_url` str - The URL to view the deployment in the Fireworks dashboard
- `temperature` float - The temperature for generation
- `model` str - The model associated with this LLM (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `base_deployment_name` str - If a LoRA addon, the deployment name of the base model deployment
- `peft_base_model` str - If this is a LoRA addon, the base model identifier (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `addons_enabled` bool - Whether LoRA addons are enabled for this LLM
- `model_id` str - The identifier used under the hood to query this model (e.g., `accounts/my-account/deployedModels/my-deployed-model-abcdefg`)
- `deployment_id` str - The deployment ID (e.g., `my-custom-deployment`)
- `base_deployment_id` str - The base deployment ID for LoRA addons
- `perf_metrics_in_response` bool - Whether performance metrics are included in responses

`LLM(*args, **kwargs)`

The class constructor initializes a new LLM instance.

Arguments:
- `model` str - The model identifier to use (e.g., `accounts/fireworks/models/llama-v3p2-3b-instruct`)
- `deployment_type` str - The type of deployment to use. Must be one of:
  - `"serverless"`: Uses Fireworks' shared serverless infrastructure
  - `"on-demand"`: Uses dedicated resources for your deployment
  - `"auto"`: Automatically selects the most cost-effective option (recommended for experimentation)
  - `"on-demand-lora"`: For LoRA addons that require dedicated resources
- `id` str, optional - Deployment ID to identify the deployment. Required when `deployment_type` is `"on-demand"`. Can be any simple string (e.g., `"my-deployment"`); it does not need to follow the format `"accounts/account_id/deployments/model_id"`.
- `deployment_display_name` str, optional - Display name for the deployment. Defaults to the filename where the LLM was instantiated. If a deployment with the same display name and model already exists, the SDK will try to reuse it.
- `base_id` str, optional - Base deployment ID for LoRA addons. Required when `deployment_type` is `"on-demand-lora"`.
- `api_key` str, optional - Your Fireworks API key
- `base_url` str, optional - Base URL for API calls. Defaults to `https://api.fireworks.ai/inference/v1`
- `max_retries` int, optional - Maximum number of retry attempts. Defaults to 10
- `scale_up_window` timedelta, optional - Time to wait before scaling up after increased load. Defaults to 1 second
- `scale_down_window` timedelta, optional - Time to wait before scaling down after decreased load. Defaults to 1 minute
- `scale_to_zero_window` timedelta, optional - Time of inactivity before scaling to zero. Defaults to 5 minutes
- `accelerator_type` str, optional - Type of GPU accelerator to use
- `region` str, optional - Region for deployment
- `multi_region` str, optional - Multi-region configuration
- `min_replica_count` int, optional - Minimum number of replicas
- `max_replica_count` int, optional - Maximum number of replicas
- `replica_count` int, optional - Fixed number of replicas
- `accelerator_count` int, optional - Number of accelerators per replica
- `precision` str, optional - Model precision (e.g., `"FP16"`, `"FP8"`)
- `world_size` int, optional - World size for distributed training
- `generator_count` int, optional - Number of generators
- `disaggregated_prefill_count` int, optional - Number of disaggregated prefill instances
- `disaggregated_prefill_world_size` int, optional - World size for disaggregated prefill
- `max_batch_size` int, optional - Maximum batch size for inference
- `max_peft_batch_size` int, optional - Maximum batch size for PEFT operations
- `kv_cache_memory_pct` int, optional - Percentage of memory for the KV cache
- `enable_addons` bool, optional - Enable LoRA addons support
- `live_merge` bool, optional - Enable live merging
- `draft_token_count` int, optional - Number of tokens to generate per step for speculative decoding
- `draft_model` str, optional - Model to use for speculative decoding
- `ngram_speculation_length` int, optional - Length of previous input sequence for N-gram speculation
- `long_prompt_optimized` bool, optional - Optimize for long prompts
- `temperature` float, optional - Sampling temperature for generation
- `num_peft_device_cached` int, optional - Number of PEFT devices to cache
- `enable_metrics` bool, optional - Enable metrics collection. Currently supports time to last token for non-streaming requests.
- `perf_metrics_in_response` bool, optional - Include performance metrics in API responses
- `description` str, optional - Description of the deployment
- `annotations` dict[str, str], optional - Annotations for the deployment
- `cluster` str, optional - Cluster identifier
- `enable_session_affinity` bool, optional - Enable session affinity
- `direct_route_api_keys` list[str], optional - List of API keys for direct routing
- `direct_route_type` str, optional - Type of direct routing
- `direct_route_handle` str, optional - Direct route handle
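A minimal construction sketch using the argument names above; the top-level `fireworks` import is an assumption about the package layout:

```python
from fireworks import LLM  # assumed top-level export

# "auto" picks the most cost-effective deployment type
# (recommended for experimentation).
llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto",
)

# "on-demand" requires an `id`; any simple string works.
dedicated = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="on-demand",
    id="my-deployment",
)
```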
`apply(wait: bool = True)`

`create_supervised_fine_tuning_job()`

See `SupervisedFineTuningJob` below.

`create_reinforcement_fine_tuning_job()`

See `ReinforcementFineTuningJob` below.

`reinforcement_step()`

Arguments:
- `dataset` Dataset - The dataset containing training examples with rewards
- `output_model` str - The name of the output model to create
- `lora_rank` int, optional - Rank for LoRA fine-tuning. Defaults to 16
- `learning_rate` float, optional - Learning rate for training. Defaults to 0.0001
- `max_context_length` int, optional - Maximum context length for the model. Defaults to 8192
- `epochs` int, optional - Number of training epochs. Defaults to 1
- `batch_size` int, optional - Batch size for training. Defaults to 32768

Note: when continuing training from a LoRA checkpoint, these parameters (`lora_rank`, `learning_rate`, `max_context_length`, `epochs`, `batch_size`) must always be the same as those used in the original LoRA training. Changing them is not supported and will result in an error.

Returns: a `ReinforcementStep` representing the training job. If these constraints are violated, a `ValueError` will be raised.
`delete_deployment(ignore_checks: bool = False, wait: bool = True)`

Arguments:

- `ignore_checks` bool, optional - Whether to ignore safety checks. Defaults to False.
- `wait` bool, optional - Whether to wait for deletion to complete. Defaults to True.

`get_time_to_last_token_mean()`

`with_deployment_type()`

Arguments:

- `deployment_type` str - The deployment type to use (`"serverless"`, `"on-demand"`, `"auto"`, or `"on-demand-lora"`)

Returns: an `LLM` instance with the specified deployment type.

`with_temperature()`

Arguments:

- `temperature` float - The temperature for generation

Returns: an `LLM` instance with the specified temperature.

`with_perf_metrics_in_response()`

Arguments:

- `perf_metrics_in_response` bool - Whether to include performance metrics in responses

Returns: an `LLM` instance with the specified performance metrics setting.
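Because each `with_*` method returns an `LLM` instance, configuration calls can be chained; a brief sketch:

```python
# Each call returns a new LLM configured with the given setting.
tuned_llm = (
    llm.with_deployment_type("serverless")
       .with_temperature(0.2)
       .with_perf_metrics_in_response(True)
)
```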
`scale_to_zero()`

`scale_to_1_replica()`

`get_deployment()`

`is_peft_addon()`

`list_models()`

`get_model()`

`is_available_on_serverless()`

`model_id()`

`supports_serverless_lora()`

`list_fireworks_models()`

`is_model_on_fireworks_account()`

Arguments:

- `model` str - The model identifier to check

`is_model_available_on_serverless()`

Arguments:

- `model` str - The model identifier to check

`is_model_deployed_on_serverless_account()`

Arguments:

- `model` SyncModel - The model object to check

`completions.create()` and `completions.acreate()`

Use `create()` for synchronous calls and `acreate()` for asynchronous calls.
Arguments:
- `prompt` str - The prompt to complete
- `stream` bool, optional - Whether to stream the response. Defaults to False
- `images` list[str], optional - List of image URLs for multimodal models
- `max_tokens` int, optional - The maximum number of tokens to generate
- `logprobs` int, optional - Number of log probabilities to return
- `echo` bool, optional - Whether to echo the prompt in the response
- `temperature` float, optional - Sampling temperature between 0 and 2. If not provided, uses the LLM's default temperature
- `top_p` float, optional - Nucleus sampling parameter
- `top_k` int, optional - Top-k sampling parameter (must be between 0 and 100)
- `frequency_penalty` float, optional - Frequency penalty for repetition
- `presence_penalty` float, optional - Presence penalty for repetition
- `repetition_penalty` float, optional - Repetition penalty
- `reasoning_effort` str, optional - How much effort the model should put into reasoning
- `mirostat_lr` float, optional - Mirostat learning rate
- `mirostat_target` float, optional - Mirostat target entropy
- `n` int, optional - Number of completions to generate
- `ignore_eos` bool, optional - Whether to ignore end-of-sequence tokens
- `stop` str or list[str], optional - Stop sequences
- `response_format` dict, optional - An object specifying the format that the model must output
- `context_length_exceeded_behavior` str, optional - How to handle requests that exceed the context length
- `user` str, optional - User identifier
- `extra_headers` dict, optional - Additional headers to include in the request
- `**kwargs` - Additional parameters supported by the OpenAI API

Returns:

- `Completion` when `stream=False` (default)
- `Generator[Completion, None, None]` when `stream=True` (sync version)
- `AsyncGenerator[Completion, None]` when `stream=True` (async version)
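A usage sketch covering both return shapes, assuming the OpenAI-compatible response structure (`choices[0].text`):

```python
# Non-streaming: returns a single Completion object.
completion = llm.completions.create(
    prompt="Once upon a time",
    max_tokens=64,
    temperature=0.7,
)
print(completion.choices[0].text)

# Streaming: returns a generator of Completion chunks.
for chunk in llm.completions.create(prompt="Once upon a time", stream=True):
    print(chunk.choices[0].text, end="")
```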
`chat.completions.create()` and `chat.completions.acreate()`

Use `create()` for synchronous calls and `acreate()` for asynchronous calls.
Note: The Fireworks chat completions API includes additional request and response fields beyond the standard OpenAI API. See the Fireworks Chat Completions API reference for the complete set of available parameters and response fields.
Arguments:
- `messages` list - A list of messages comprising the conversation so far
- `stream` bool, optional - Whether to stream the response. Defaults to False
- `response_format` dict, optional - An object specifying the format that the model must output
- `reasoning_effort` str, optional - How much effort the model should put into reasoning
- `max_tokens` int, optional - The maximum number of tokens to generate
- `temperature` float, optional - Sampling temperature between 0 and 2. If not provided, uses the LLM's default temperature. Note that temperature can also be set once during LLM instantiation if preferred
- `tools` list, optional - A list of tools the model may call
- `extra_headers` dict, optional - Additional headers to include in the request
- `**kwargs` - Additional parameters supported by the OpenAI API

Returns:

- `ChatCompletion` when `stream=False` (default)
- `Generator[ChatCompletionChunk, None, None]` when `stream=True` (sync version)
- `AsyncGenerator[ChatCompletionChunk, None]` when `stream=True` (async version)

For the `ChatCompletion` object structure, see the OpenAI Chat Completion Object documentation. For the `ChatCompletionChunk` object structure used in streaming, see the OpenAI Chat Streaming documentation.
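A sketch of the sync and async variants, assuming the OpenAI-compatible response structure:

```python
import asyncio

# Synchronous call.
response = llm.chat.completions.create(
    messages=[{"role": "user", "content": "Briefly explain KV caching."}],
    max_tokens=128,
)
print(response.choices[0].message.content)

# Asynchronous call.
async def main() -> None:
    response = await llm.chat.completions.acreate(
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```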
The `Dataset` class provides a convenient way to manage datasets for fine-tuning on Fireworks. It offers smart features like automatic naming and uploading of datasets. You do not instantiate a `Dataset` object directly. Instead, you create a `Dataset` object by using one of the class methods below.

Properties:

- `name` str - The full name of the dataset (e.g., `accounts/my-account/datasets/dataset-12345-my-data`)
- `id` str - The dataset identifier (e.g., `dataset-12345-my-data`)
- `url` str - The URL to view the dataset in the Fireworks dashboard

Class methods for creating datasets:

`from_list()`

`from_file()`

`from_string()`

`from_id()`

Other methods:

`sync()`

`delete()`

`head(n: int = 5, as_dataset: bool = False)`

Arguments:

- `n` int, optional - Number of rows to return. Defaults to 5.
- `as_dataset` bool, optional - If True, return a Dataset object; if False, return a list. Defaults to False.

`create_evaluation_job(reward_function: Callable, samples: Optional[int] = None)`

Arguments:

- `reward_function` Callable - A callable decorated with `@reward_function`
- `samples` int, optional - Optional number of samples to evaluate (creates a subset dataset)

`preview_evaluator(reward_function: Callable, samples: Optional[int] = None)`

Arguments:

- `reward_function` Callable - A callable decorated with `@reward_function`
- `samples` int, optional - Optional number of samples to preview

Each dataset row should contain a `messages` array of message objects. Each message object should have:

- `role`: One of `"system"`, `"user"`, or `"assistant"`
- `content`: The message content as a string
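A sketch of creating a dataset in the format above, assuming `from_list()` accepts a list of row dicts:

```python
from fireworks import Dataset  # assumed top-level export

dataset = Dataset.from_list([
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "4"},
        ]
    },
])

print(dataset.head())  # first 5 rows as a list (default)
print(dataset.url)     # link to the dataset in the dashboard
```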
The `SupervisedFineTuningJob` class manages fine-tuning jobs on Fireworks. It provides a convenient interface for creating, monitoring, and managing fine-tuning jobs.

Properties:

- `output_model` str - The identifier of the output model (e.g., `accounts/my-account/models/my-finetuned-model`)
- `output_llm` LLM - An LLM instance associated with the output model
- `id` str - The job ID
- `display_name` str - The display name of the job
- `name` str - The full name of the job
- `url` str - The URL to view the job in the Fireworks dashboard

You do not directly instantiate a `SupervisedFineTuningJob` object. Instead, you should use the `.create_supervised_fine_tuning_job()` method on the `LLM` object and pass in the following required and optional arguments.
Arguments:

- `display_name` str - A unique name for the fine-tuning job. Must only contain lowercase a-z, 0-9, and hyphen (-).
- `dataset_or_id` Union[Dataset, str] - The dataset to use for fine-tuning, either as a Dataset object or dataset ID
- `epochs` int, optional - Number of training epochs
- `learning_rate` float, optional - Learning rate for training
- `lora_rank` int, optional - Rank for LoRA fine-tuning
- `jinja_template` str, optional - Template for formatting training examples
- `early_stop` bool, optional - Whether to enable early stopping
- `max_context_length` int, optional - Maximum context length for the model
- `base_model_weight_precision` str, optional - Precision for base model weights
- `batch_size` int, optional - Batch size for training
- `accelerator_type` str, optional - Type of GPU accelerator to use
- `accelerator_count` int, optional - Number of accelerators to use
- `is_turbo` bool, optional - Whether to use turbo mode for faster training
- `region` str, optional - Region for deployment
- `nodes` int, optional - Number of nodes to use
- `evaluation_dataset` str, optional - Dataset ID to use for evaluation
- `eval_auto_carveout` bool, optional - Whether to automatically carve out evaluation data
- `wandb_config` WandbConfig, optional - Configuration for Weights & Biases integration
- `output_model` str, optional - The name of the output model to create. If not provided, it will be the same as the `display_name` argument.

Methods:

`sync()`

`wait_for_completion()`

`await_for_completion()`

`delete()`

`adelete()`
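A sketch tying the arguments and methods together; the job and model names are illustrative:

```python
job = llm.create_supervised_fine_tuning_job(
    display_name="my-sft-job",   # lowercase a-z, 0-9, hyphens only
    dataset_or_id=dataset,
    epochs=2,
    learning_rate=1e-4,
    lora_rank=16,
)

job.wait_for_completion()        # block until training finishes
print(job.output_model)          # e.g. accounts/my-account/models/...
fine_tuned_llm = job.output_llm  # LLM bound to the fine-tuned model
```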
The `ReinforcementFineTuningJob` class manages reinforcement learning fine-tuning jobs on Fireworks.

Properties:

- `output_llm` LLM - An LLM instance associated with the output model
- `id` str - The job ID
- `display_name` str - The display name of the job
- `name` str - The full name of the job
- `url` str - The URL to view the job in the Fireworks dashboard

You do not directly instantiate a `ReinforcementFineTuningJob` object. Instead, you should use the `.create_reinforcement_fine_tuning_job()` method on the `LLM` object.
Arguments:

- `id` str - A unique ID for the fine-tuning job. Must only contain lowercase a-z, 0-9, and hyphen (-).
- `dataset_or_id` Union[Dataset, str] - The dataset to use for fine-tuning, either as a Dataset object or dataset ID
- `reward_function` Callable - A callable decorated with `@reward_function`
- `output_model` str, optional - The name of the output model to create
- `warm_start_from` str, optional - Model to warm start from
- `jinja_template` str, optional - Template for formatting training examples
- `learning_rate` float, optional - Learning rate for training. Defaults to 0.0001
- `max_context_length` int, optional - Maximum context length for the model. Defaults to 8192
- `lora_rank` int, optional - Rank for LoRA fine-tuning. Defaults to 8
- `base_model_weight_precision` str, optional - Precision for base model weights
- `epochs` int, optional - Number of training epochs. Defaults to 5
- `batch_size` int, optional - Batch size for training. Defaults to 32768
- `is_intermediate` bool, optional - Whether this is an intermediate model. Defaults to False
- `accelerator_type` str, optional - Type of GPU accelerator to use
- `accelerator_count` int, optional - Number of accelerators to use
- `region` str, optional - Region for deployment
- `max_tokens` int, optional - Maximum tokens to generate
- `temperature` float, optional - Sampling temperature. Defaults to 1.0
- `top_p` float, optional - Top-p sampling. Defaults to 1.0
- `n` int, optional - Number of completions. Defaults to 8
- `extra_body` str, optional - Extra body parameters
- `top_k` int, optional - Top-k sampling (must be between 0 and 100)
- `wandb_enabled` bool, optional - Whether to enable W&B integration. Defaults to False
- `wandb_api_key` str, optional - W&B API key
- `wandb_project` str, optional - W&B project name
- `wandb_entity` str, optional - W&B entity
- `wandb_run_id` str, optional - W&B run ID
- `wandb_url` str, optional - W&B URL

Methods:

`sync()`

`wait_for_completion()`

`await_for_completion()`

`delete()`

`adelete()`
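A sketch of a reinforcement fine-tuning job; the `@reward_function` import path and the reward function's signature and return type are assumptions here:

```python
from fireworks import reward_function  # assumed import path

@reward_function
def my_reward(messages, **kwargs):
    # Illustrative scoring logic (assumed float score): reward short answers.
    answer = messages[-1]["content"]
    return 1.0 if len(answer) < 200 else 0.0

job = llm.create_reinforcement_fine_tuning_job(
    id="my-rft-job",          # lowercase a-z, 0-9, hyphens only
    dataset_or_id=dataset,
    reward_function=my_reward,
    epochs=5,
    n=8,                      # completions sampled per prompt
)
job.wait_for_completion()
improved_llm = job.output_llm
```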
The `ReinforcementStep` class represents a reinforcement learning training step. It provides methods to monitor and manage the training process.

Properties:

- `state` str - The current state of the training job (e.g., `"JOB_STATE_RUNNING"`, `"JOB_STATE_COMPLETED"`)
- `output_model` str - The identifier of the output model (e.g., `accounts/my-account/models/my-improved-model`)
- `is_completed` bool - Whether the training job has completed successfully

`get()`

Returns: a `ReinforcementStep` object with updated state, or `None` if the job no longer exists.

`raise_if_bad_state()`

Raises a `RuntimeError` if the job is in a failed, cancelled, or otherwise bad state. This is useful for error handling during training.

Raises:

- `RuntimeError` - If the job is in a bad state (failed, cancelled, expired, etc.)

The `reinforcement_step` method is designed to support iterative reinforcement learning workflows. The example below shows how to perform multiple reinforcement learning steps:
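A minimal sketch, assuming the training dataset is already prepared and that a new `LLM` can be pointed at each step's `output_model`:

```python
import time

from fireworks import LLM, Dataset  # assumed top-level exports

# Assumption: a JSONL file of training rows with rewards already exists.
training_dataset = Dataset.from_file("training_with_rewards.jsonl")

llm = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="auto",
)

for i in range(3):
    # Each step trains on the dataset and produces a new output model.
    step = llm.reinforcement_step(
        dataset=training_dataset,
        output_model=f"my-improved-model-v{i}",
    )

    # Poll until the step completes. get() returns a refreshed
    # ReinforcementStep, or None if the job no longer exists.
    while not step.is_completed:
        step.raise_if_bad_state()  # RuntimeError on failed/cancelled jobs
        time.sleep(30)
        refreshed = step.get()
        if refreshed is None:
            raise RuntimeError("training job no longer exists")
        step = refreshed

    # Assumption: continue the loop from the new model by constructing
    # a fresh LLM around step.output_model; the exact flow may differ.
    llm = LLM(model=step.output_model, deployment_type="auto")
```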
The `Evaluator` class manages evaluators on Fireworks. Evaluators are created from reward functions and can be reused across multiple evaluation jobs.

Properties:

- `name` str - The full name of the evaluator
- `id` str - The evaluator ID
- `url` str - The URL to view the evaluator in the Fireworks dashboard
- `gateway` Gateway - The Fireworks Gateway instance
- `reward_function` Callable - A callable decorated with `@reward_function`

`sync()`

`preview(dataset)`

Arguments:

- `dataset` Dataset - The dataset to preview with
The `EvaluationJob` class manages evaluation jobs on Fireworks.

Properties:

- `id` str - The job ID
- `name` str - The full name of the job
- `url` str - The URL to view the job in the Fireworks dashboard
- `output_dataset` Dataset - The output dataset from the evaluation
- `gateway` Gateway - The Fireworks Gateway instance
- `evaluation_job` SyncEvaluationJob - The evaluation job proto object

`sync()`

`wait_for_completion()`
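A sketch of an evaluation round-trip using the `Dataset` and `EvaluationJob` APIs above, reusing the illustrative `my_reward` from the reinforcement fine-tuning sketch:

```python
# Evaluate a reward function over a 100-sample subset of the dataset.
eval_job = dataset.create_evaluation_job(reward_function=my_reward, samples=100)
eval_job.wait_for_completion()

results = eval_job.output_dataset  # Dataset with evaluation outputs
print(results.url)                 # inspect results in the dashboard
```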
The `BatchInferenceJob` class provides a convenient way to manage batch inference jobs on Fireworks. It allows you to perform bulk asynchronous inference on large datasets, reducing costs by up to 50%.

Properties:

- `name` str - The full name of the batch inference job (e.g., `accounts/my-account/batchInferenceJobs/test-job-123`)
- `id` str - The job identifier (e.g., `test-job-123`)
- `model` str - The model used for inference
- `input_dataset_id` str - The input dataset identifier
- `output_dataset_id` str - The output dataset identifier
- `state` str - The current state of the job
- `created_by` str - Email of the user who created the job
- `create_time` str - Creation timestamp
- `update_time` str - Last update timestamp

`create()`

Arguments:

- `model` str - The model to use for inference (e.g., `llama-v3p1-8b-instruct` or `accounts/fireworks/models/llama-v3p1-8b-instruct`)
- `input_dataset_id` str - The input dataset ID containing JSONL formatted requests
- `output_dataset_id` str, optional - The output dataset ID. If not provided, one will be auto-generated
- `job_id` str, optional - The job ID. If not provided, one will be auto-generated
- `display_name` str, optional - Display name for the job
- `inference_parameters` dict, optional - Dict of inference parameters:
  - `max_tokens` int - Maximum number of tokens to generate
  - `temperature` float - Sampling temperature (0-2)
  - `top_p` float - Top-p sampling parameter
  - `top_k` int - Top-k sampling parameter
  - `n` int - Number of completions per request
  - `extra_body` str - Additional parameters as JSON string
- `api_key` str, optional - The API key to use

Returns: a `BatchInferenceJob` object.

`get()`

Arguments:

- `job_id` str - The job ID or full resource name
- `account` str - Account ID
- `api_key` str, optional - The API key to use

Returns: a `BatchInferenceJob` object if found, `None` otherwise.

`list()`

Arguments:

- `account` str - Account ID
- `api_key` str, optional - The API key to use
- `page_size` int, optional - Number of jobs to return per page. Defaults to 50

Returns: a list of `BatchInferenceJob` objects.

`delete()`

Arguments:

- `job_id` str - The job ID or full resource name
- `account` str - Account ID
- `api_key` str, optional - The API key to use

`to_dict()`

Arguments:

- `proto` BatchInferenceJob - The batch inference job proto object
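A sketch of the batch workflow, assuming `BatchInferenceJob` is exported from the top-level `fireworks` package; the dataset and account IDs are illustrative:

```python
from fireworks import BatchInferenceJob  # assumed top-level export

# Submit a batch job over a JSONL dataset of requests.
job = BatchInferenceJob.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    input_dataset_id="my-input-dataset",
    inference_parameters={"max_tokens": 256, "temperature": 0.7},
)
print(job.id, job.state)

# Fetch it again later; get() returns None if the job is gone.
fetched = BatchInferenceJob.get(job_id=job.id, account="my-account")
if fetched is not None:
    print(fetched.state, fetched.output_dataset_id)
```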