firectl create reinforcement-fine-tuning-job [flags]
Examples
firectl create reinforcement-fine-tuning-job \
--base-model llama-v3-8b-instruct \
--dataset sample-dataset \
--epochs 5 \
--output-model name-of-the-trained-model \
--evaluator accounts/my-account/evaluators/abc123
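To warm-start from a previously fine-tuned model instead of a base model, pass --warm-start-from in place of --base-model (only one of the two may be set). The model name below is a placeholder:

firectl create reinforcement-fine-tuning-job \
--warm-start-from accounts/my-account/models/my-previous-model \
--dataset sample-dataset \
--evaluator accounts/my-account/evaluators/abc123 \
--output-model name-of-the-trained-model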
Flags
--base-model string The base model for the reinforcement fine-tuning job. Only one of base-model or warm-start-from should be specified.
--dataset string The dataset for the reinforcement fine-tuning job. (Required)
--output-model string The output model for the reinforcement fine-tuning job.
--job-id string The ID of the reinforcement fine-tuning job. If not set, it will be autogenerated.
--warm-start-from string The model to warm start from. If set, base-model must not be set.
--evaluator string The evaluator resource name to use for the reinforcement fine-tuning job. (Required)
--mcp-server string The MCP server resource name to use for the reinforcement fine-tuning job. (Optional)
--epochs int32 The number of epochs for the reinforcement fine-tuning job. (default 5)
--learning-rate float32 The learning rate for the reinforcement fine-tuning job. (default 0.0001)
--max-context-length int32 Maximum token length for sequences within each training batch. Shorter sequences are concatenated; longer sequences are truncated. (default 8192)
--batch-size int32 The maximum number of tokens packed into each training batch for the reinforcement fine-tuning job. (default 32768)
--gradient-accumulation-steps int32 The number of gradient accumulation steps for the reinforcement fine-tuning job. (default 1)
--learning-rate-warmup-steps int32 The number of learning rate warmup steps for the reinforcement fine-tuning job.
--lora-rank int32 The rank of the LoRA layers for the reinforcement fine-tuning job. (default 8)
--wandb-api-key string [WANDB_API_KEY] WandB API Key. (Required if any WandB flag is set)
--wandb-project string [WANDB_PROJECT] WandB Project. (Required if any WandB flag is set)
--wandb-entity string [WANDB_ENTITY] WandB Entity. (Required if any WandB flag is set)
--wandb Enable WandB logging (see the combined example after this list).
--temperature float32 Controls the randomness of the model's token selection during text generation. (default 1)
--top-p float32 Top-p (nucleus) sampling: selects the smallest set of candidate tokens whose cumulative probability exceeds top-p. (default 1)
--response-candidates-count int32 The number of response candidates to generate per input. (default 4)
--max-output-tokens int32 The maximum number of tokens to generate in the response. If 0, the model's default will be used.
--top-k int32 Top-k sampling parameter; limits token selection to the k most likely tokens.
--extra-body string Additional parameters for the inference request as a JSON string. For example: '{"stop": ["\n"]}'
--quiet If set, only errors will be printed.
--eval-auto-carveout If set, an evaluation dataset will be automatically carved out from the training dataset.
-h, --help help for reinforcement-fine-tuning-job
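For illustration, the invocation below combines several of the optional training, sampling, and WandB flags documented above. The values are placeholders rather than recommended settings, and the WandB API key is assumed to be supplied through the WANDB_API_KEY environment variable, as noted in the flag listing:

firectl create reinforcement-fine-tuning-job \
--base-model llama-v3-8b-instruct \
--dataset sample-dataset \
--evaluator accounts/my-account/evaluators/abc123 \
--output-model tuned-model-with-custom-settings \
--epochs 3 \
--learning-rate 0.00005 \
--lora-rank 16 \
--temperature 0.7 \
--top-p 0.95 \
--response-candidates-count 8 \
--extra-body '{"stop": ["\n"]}' \
--wandb \
--wandb-project my-rft-project \
--wandb-entity my-team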
Global flags
-a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini.
--api-key string An API key used to authenticate with Fireworks.
--dry-run Print the request proto without running it.
-o, --output Output Set the output format to "text", "json", or "flag". (default text)
-p, --profile string The Fireworks auth and settings profile to use.
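Global flags can be combined with the command like any other flag. For example, the following sketch previews the request proto without submitting the job (resource names are placeholders):

firectl create reinforcement-fine-tuning-job \
--base-model llama-v3-8b-instruct \
--dataset sample-dataset \
--evaluator accounts/my-account/evaluators/abc123 \
--dry-run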