External GCS Bucket Integration
Use external Google Cloud Storage buckets for fine-tuning while keeping your data private with secure, isolated access
Use external Google Cloud Storage (GCS) buckets for fine-tuning while keeping your data private. Fireworks creates proxy datasets that reference your external buckets. The data is accessed only during fine-tuning, within a secure, isolated cluster.
Your data never leaves your GCS bucket except during fine-tuning, so it stays private and under your control.
Required Permissions
Grant the following three service accounts access to your bucket:
Fireworks Control Plane
- Account: fireworks-control-plane@fw-ai-cp-prod.iam.gserviceaccount.com
- Required role: Custom role with the storage.buckets.getIamPolicy permission
Inference Service Account
- Account: inference@fw-ai-cp-prod.iam.gserviceaccount.com
- Required role: Storage Object Viewer or Storage Object Admin
Your Company’s Fireworks Service Account
- Account: Your company’s Fireworks service account email
- Required role: Storage Object Viewer or Storage Object Admin
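The grants above can be scripted. The following Python sketch only builds the gcloud command lines for review and does not run them. The bucket name, project ID, company service account email, and the custom role ID (fireworksBucketPolicyViewer) are hypothetical placeholders. It also assumes that the predefined Storage Object Viewer role (roles/storage.objectViewer) is the one you want for the two read accounts; substitute Storage Object Admin if your setup requires it.

```python
"""Build gcloud commands that grant the three service accounts bucket access.

Sketch only: commands are returned as strings for review, not executed.
"""
import shlex

# Account emails taken from the permissions list above.
CONTROL_PLANE_SA = "fireworks-control-plane@fw-ai-cp-prod.iam.gserviceaccount.com"
INFERENCE_SA = "inference@fw-ai-cp-prod.iam.gserviceaccount.com"

# Predefined GCS role ID for "Storage Object Viewer".
OBJECT_VIEWER_ROLE = "roles/storage.objectViewer"


def build_grant_commands(bucket, project_id, company_sa,
                         custom_role_id="fireworksBucketPolicyViewer"):
    """Return shell-quoted gcloud commands, in the order they should run.

    custom_role_id is a hypothetical name; any valid role ID works.
    """
    custom_role = f"projects/{project_id}/roles/{custom_role_id}"

    # 1. Create the custom role that carries the single required permission.
    commands = [[
        "gcloud", "iam", "roles", "create", custom_role_id,
        f"--project={project_id}",
        "--title=Fireworks bucket policy viewer",
        "--permissions=storage.buckets.getIamPolicy",
    ]]

    # 2. Bind one role per service account on the bucket.
    bindings = [
        (CONTROL_PLANE_SA, custom_role),
        (INFERENCE_SA, OBJECT_VIEWER_ROLE),
        (company_sa, OBJECT_VIEWER_ROLE),
    ]
    for member, role in bindings:
        commands.append([
            "gcloud", "storage", "buckets", "add-iam-policy-binding",
            f"gs://{bucket}",
            f"--member=serviceAccount:{member}",
            f"--role={role}",
        ])

    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Placeholder values; replace with your own before running the output.
    for line in build_grant_commands(
        "my-bucket", "my-project", "acme@example.iam.gserviceaccount.com"
    ):
        print(line)
```

Printing the commands instead of executing them leaves room to review each binding before applying it to a production bucket.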
Usage Example
Create a Proxy Dataset
Create a dataset that references your external GCS bucket:
Ensure your gsutil path (the gs:// URI) points directly to the JSONL file, not to a folder. If the file is in a folder, make sure the folder contains only the intended file.
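A quick pre-flight check can catch path mistakes before the dataset is created. The Python sketch below is an illustration and not part of firectl. The helper names and the example bucket are hypothetical, the .jsonl extension check is an assumption about how the file is named, and the object listing is passed in as a plain list (for example, copied from a bucket listing) so the check runs without network access.

```python
def parse_gs_uri(uri):
    """Split 'gs://bucket/path/to/file.jsonl' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object: {uri!r}")
    return bucket, object_name


def check_dataset_uri(uri, object_names):
    """Return a list of problems; an empty list means the path looks fine.

    object_names is the list of object names that exist in the bucket.
    """
    problems = []
    _, object_name = parse_gs_uri(uri)

    # The path should point at the JSONL file itself (assumed .jsonl suffix).
    if not object_name.endswith(".jsonl"):
        problems.append("path does not point to a .jsonl file")
    if object_name not in object_names:
        problems.append("object not found in the listing")

    # If the file sits in a folder, nothing else should share that folder.
    folder, sep, _ = object_name.rpartition("/")
    if sep:
        folder_prefix = folder + "/"
        extras = [
            name for name in object_names
            if name.startswith(folder_prefix)
            and name not in (object_name, folder_prefix)  # skip folder marker
        ]
        if extras:
            problems.append(f"folder contains other objects: {extras}")
    return problems


if __name__ == "__main__":
    listing = ["train/data.jsonl", "train/old.jsonl"]
    for problem in check_dataset_uri("gs://my-bucket/train/data.jsonl", listing):
        print(problem)
```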
Start Fine-tuning
Use the proxy dataset to create a fine-tuning job with the firectl create sftj command.
For additional options, run: firectl create sftj -h
Key Benefits
- Data Privacy: Your data never leaves your GCS bucket except during fine-tuning
- Security: Access is limited to isolated fine-tuning clusters
- Simplicity: Reference external data without copying or moving files