Creates a batch inference job on the Fireworks AI platform with the provided settings. Prerequisites:
  • A dataset ID containing the input data. The data must be in the OpenAI batch format (you may directly upload an OpenAI batch-format file).
  • A model ID to use for batch inference (or a LoRA ID instead). This model will override any model specified per row.
firectl create bij [flags]

Example

firectl create bij \
--model accounts/fireworks/models/deepseek-r1-0528 \
--input-dataset-id mathproblems.jsonl
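Each line of the input dataset is a self-contained request following the OpenAI batch format (custom_id, method, url, body). A minimal illustrative line (the model and message content here are examples, not requirements):

```json
{"custom_id": "problem-1", "method": "POST", "url": "/v1/chat/completions", "body": {"messages": [{"role": "user", "content": "What is 17 * 24?"}], "max_tokens": 256}}
```

As noted above, a model specified with --model overrides any model named inside individual rows.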

Flags

      --job-id string              The ID of the batch inference job. If not set, it will be autogenerated.
      --display-name string        The display name of the batch inference job.
  -m, --model string               The model to use for inference. (Required)
  -d, --input-dataset-id string    The input dataset ID. (Required)
  -x, --output-dataset-id string   The output dataset ID. If not provided, a default one will be generated.
      --max-tokens int32           Maximum number of tokens to generate per response.
      --temperature float32        Sampling temperature (typically between 0 and 2).
      --top-p float32              Top-p sampling parameter (typically between 0 and 1).
      --top-k int32                Top-k sampling parameter, limits the token selection to the top k tokens.
      --n int32                    Number of response candidates to generate per input.
      --extra-body string          Additional inference parameters as a JSON string (e.g., '{"stop": ["\n"]}').
      --precision string           The precision with which the model should be served. If not specified, a suitable default will be chosen based on the model.
      --quiet                      If set, only errors will be printed.
  -h, --help                       help for batch-inference-job

Global Flags:
  -a, --account-id string   The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini.
      --api-key string      An API key used to authenticate with Fireworks.
      --dry-run             Print the request proto without running it.
  -o, --output Output       Set the output format to "text" or "json". (default text)
  -p, --profile string      The Fireworks auth and settings profile to use.
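The --extra-body flag takes a JSON string, so shell quoting matters: single-quoting the JSON passes it through verbatim, while double quotes or no quotes would let the shell mangle it. A minimal sketch (the echo prints the invocation for inspection instead of running it; the stop sequence is the same example used in the flag description above):

```shell
# Single-quote the JSON so the shell passes it to firectl verbatim;
# the \n inside is interpreted by the API, not by the shell.
extra_body='{"stop": ["\n"]}'

# Print the assembled command rather than executing it.
echo firectl create bij \
  --model accounts/fireworks/models/deepseek-r1-0528 \
  --input-dataset-id mathproblems.jsonl \
  --extra-body "$extra_body"
```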