External GCS Bucket Integration

Use external Google Cloud Storage (GCS) buckets for fine-tuning while keeping your data private. Fireworks creates proxy datasets that reference your external buckets. The data is accessed only during fine-tuning, within a secure, isolated cluster.

Outside of fine-tuning jobs, your data never leaves your GCS bucket.

Required Permissions

You need to grant access to three service accounts:

Fireworks Control Plane

  • Account: fireworks-control-plane@fw-ai-cp-prod.iam.gserviceaccount.com
  • Required role: A custom role that includes the storage.buckets.getIamPolicy permission

Grant the custom role on your bucket:

gcloud storage buckets add-iam-policy-binding gs://<YOUR_BUCKET> \
  --member=serviceAccount:fireworks-control-plane@fw-ai-cp-prod.iam.gserviceaccount.com \
  --role=projects/<YOUR_PROJECT>/roles/<YOUR_CUSTOM_ROLE>
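
The control plane account needs a custom role, which has to exist before it can be bound to the bucket. A minimal sketch of creating one with gcloud follows. The project ID, role ID, and title are hypothetical placeholders, and the function name is only for illustration; the one value taken from this page is the storage.buckets.getIamPolicy permission.

```shell
# Hypothetical values: replace with your own project and preferred role ID.
PROJECT_ID="my-project"
ROLE_ID="fireworksBucketPolicyReader"

# Create a project-level custom role that carries only the permission
# the Fireworks control plane account requires.
create_policy_reader_role() {
  gcloud iam roles create "$ROLE_ID" \
    --project="$PROJECT_ID" \
    --title="Fireworks bucket policy reader" \
    --permissions=storage.buckets.getIamPolicy \
    --stage=GA
}

# To create the role, call:
#   create_policy_reader_role
```

With these values, the role would then be referenced as projects/my-project/roles/fireworksBucketPolicyReader in the --role flag of the binding command.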

Inference Service Account

  • Account: inference@fw-ai-cp-prod.iam.gserviceaccount.com
  • Required role: Storage Object Viewer or Storage Object Admin

Your Company’s Fireworks Service Account

  • Account: Your company’s Fireworks service account email
  • Required role: Storage Object Viewer or Storage Object Admin
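
The inference account and your company's account both need an object-level role on the bucket. The sketch below shows one way to grant Storage Object Viewer (roles/storage.objectViewer) with gcloud and then inspect the resulting policy. The bucket name and the function names are hypothetical, and the company account email is left for you to supply.

```shell
# Hypothetical bucket name: replace with your own.
BUCKET="your-bucket-name"

# Grant Storage Object Viewer on the bucket to one service account.
# $1: service account email
grant_object_viewer() {
  gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
    --member="serviceAccount:$1" \
    --role=roles/storage.objectViewer
}

# Print the bucket's IAM policy so the bindings can be reviewed.
show_bucket_policy() {
  gcloud storage buckets get-iam-policy "gs://${BUCKET}"
}

# Example calls:
#   grant_object_viewer inference@fw-ai-cp-prod.iam.gserviceaccount.com
#   grant_object_viewer "$COMPANY_FIREWORKS_SA"   # your company's account email
#   show_bucket_policy
```

Storage Object Viewer is the narrower of the two accepted roles, so it is used here; swap in roles/storage.objectAdmin if you prefer the broader role.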

Usage Example

1. Create a Proxy Dataset

Create a dataset that references your external GCS bucket:

firectl create dataset {DATASET_NAME} --external-url gs://bucket-name/object-name

Ensure the gs:// URL points directly to the JSONL file. If the file is in a folder, make sure the folder contains only the intended file.
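
One way to check the folder contents before creating the dataset is to list the object's parent prefix. This is a sketch only: the URL and the function name are hypothetical examples, and it assumes the gcloud CLI is installed and authenticated.

```shell
# Hypothetical dataset URL: replace with your own object path.
DATASET_URL="gs://bucket-name/folder/train.jsonl"

# Strip the object name to get the parent prefix, then list that prefix.
# The listing should show only the intended JSONL file.
list_dataset_folder() {
  gcloud storage ls "${DATASET_URL%/*}/"
}

# To run the check, call:
#   list_dataset_folder
```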

2. Start Fine-tuning

Use the proxy dataset to create a fine-tuning job:

firectl create sftj \
  --dataset "accounts/{ACCOUNT}/datasets/{DATASET_NAME}" \
  --base-model "accounts/fireworks/models/{MODEL}" \
  --output-model {TRAINED_MODEL_NAME}

For additional options, run: firectl create sftj -h
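
The two steps can be chained in a small script so that the placeholders are filled in once. The sketch below uses only the firectl commands and flags shown above. Every variable value is a hypothetical placeholder, and the function name is just for illustration.

```shell
# Hypothetical values: replace each with your own.
ACCOUNT="my-account"
DATASET_NAME="my-dataset"
MODEL="your-base-model"
TRAINED_MODEL_NAME="my-tuned-model"
DATASET_URL="gs://bucket-name/object-name"

# Create the proxy dataset, then start the fine-tuning job only if
# dataset creation succeeded.
run_finetune() {
  firectl create dataset "$DATASET_NAME" --external-url "$DATASET_URL" &&
  firectl create sftj \
    --dataset "accounts/${ACCOUNT}/datasets/${DATASET_NAME}" \
    --base-model "accounts/fireworks/models/${MODEL}" \
    --output-model "$TRAINED_MODEL_NAME"
}

# To run both steps, call:
#   run_finetune
```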

Key Benefits

  • Data Privacy: Your data never leaves your GCS bucket except during fine-tuning.
  • Security: Access is limited to isolated fine-tuning clusters.
  • Simplicity: Reference external data without copying or moving files.