firectl create reinforcement-fine-tuning-job [flags]

Examples

firectl create reinforcement-fine-tuning-job \
	--base-model llama-v3-8b-instruct \
	--dataset sample-dataset \
	--epochs 5 \
	--output-model name-of-the-trained-model \
	--evaluator accounts/my-account/evaluators/abc123
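
To warm start from a previously trained model instead of a base model, pass --warm-start-from in place of --base-model; the two flags are mutually exclusive. A minimal sketch, where the model and evaluator names are placeholders:

firectl create reinforcement-fine-tuning-job \
	--warm-start-from accounts/my-account/models/my-previous-model \
	--dataset sample-dataset \
	--epochs 5 \
	--output-model name-of-the-trained-model \
	--evaluator accounts/my-account/evaluators/abc123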

Flags

      --base-model string                   The base model for the reinforcement fine-tuning job. Only one of base-model or warm-start-from should be specified.
      --dataset string                      The dataset for the reinforcement fine-tuning job. (Required)
      --output-model string                 The output model for the reinforcement fine-tuning job.
      --job-id string                       The ID of the reinforcement fine-tuning job. If not set, it will be autogenerated.
      --warm-start-from string              The model to warm start from. If set, base-model must not be set.
      --evaluator string                    The evaluator resource name to use for the reinforcement fine-tuning job. (Required)
      --mcp-server string                   The MCP server resource name to use for the reinforcement fine-tuning job. (Optional)
      --epochs int32                        The number of epochs for the reinforcement fine-tuning job. (default 5)
      --learning-rate float32               The learning rate for the reinforcement fine-tuning job. (default 0.0001)
      --max-context-length int32            Maximum token length for sequences within each training batch. Shorter sequences are concatenated; longer sequences are truncated. (default 8192)
      --batch-size int32                    The maximum number of tokens packed into each training batch in the reinforcement fine-tuning job. (default 32768)
      --gradient-accumulation-steps int32   The number of gradient accumulation steps for the reinforcement fine-tuning job. (default 1)
      --learning-rate-warmup-steps int32    The number of learning rate warmup steps for the reinforcement fine-tuning job.
      --lora-rank int32                     The rank of the LoRA layers for the reinforcement fine-tuning job. (default 8)
      --wandb-api-key string                [WANDB_API_KEY] WandB API Key. (Required if any WandB flag is set)
      --wandb-project string                [WANDB_PROJECT] WandB Project. (Required if any WandB flag is set)
      --wandb-entity string                 [WANDB_ENTITY] WandB Entity. (Required if any WandB flag is set)
      --wandb                               Enable WandB logging.
                                            
      --temperature float32                 Controls the randomness of the model's token selection during text generation. (default 1)
      --top-p float32                       Top-p (nucleus) sampling: samples from the smallest set of candidate tokens whose cumulative probability exceeds top-p. (default 1)
      --response-candidates-count int32     The number of response candidates to generate per input. (default 4)
      --max-output-tokens int32             The maximum number of tokens to generate in the response. If 0, the model's default will be used.
      --top-k int32                         Top-k sampling: limits token selection to the k most probable tokens.
      --extra-body string                   Additional parameters for the inference request as a JSON string. For example: '{"stop": ["\n"]}'
      --quiet                               If set, only errors will be printed.
      --eval-auto-carveout                  If set, an evaluation split is automatically carved out from the training dataset.
  -h, --help                                help for reinforcement-fine-tuning-job
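
The WandB logging flags and the generation-time sampling flags can be combined in a single invocation. A hedged sketch, using placeholder project, entity, and evaluator names; per the flag annotations above, the API key may also be supplied via the WANDB_API_KEY environment variable instead of --wandb-api-key:

firectl create reinforcement-fine-tuning-job \
	--base-model llama-v3-8b-instruct \
	--dataset sample-dataset \
	--evaluator accounts/my-account/evaluators/abc123 \
	--output-model name-of-the-trained-model \
	--wandb \
	--wandb-project my-rft-project \
	--wandb-entity my-team \
	--temperature 0.7 \
	--top-p 0.95 \
	--response-candidates-count 8 \
	--extra-body '{"stop": ["\n"]}'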

Global flags

  -a, --account-id string   The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini.
      --api-key string      An API key used to authenticate with Fireworks.
      --dry-run             Print the request proto without running it.
  -o, --output Output       Set the output format to "text", "json", or "flag". (default text)
  -p, --profile string      The Fireworks auth and settings profile to use.
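
To verify a request before submitting it, add the global --dry-run flag to any of the examples above; the request proto is printed without the job being run:

firectl create reinforcement-fine-tuning-job \
	--base-model llama-v3-8b-instruct \
	--dataset sample-dataset \
	--evaluator accounts/my-account/evaluators/abc123 \
	--dry-run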