Step 1: Create and export an API key
Before you begin, create an API key in the Fireworks dashboard: click Create API key and store the key in a safe location. Once you have your API key, export it as an environment variable in your terminal:
- macOS / Linux
- Windows
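For example, on macOS or Linux (replace the placeholder with the key you created; the `FIREWORKS_API_KEY` variable name is what the rest of this guide assumes):

```shell
# Replace the placeholder with the key you created in the dashboard.
export FIREWORKS_API_KEY="<YOUR_API_KEY>"

# Confirm the variable is set in the current shell session:
echo "$FIREWORKS_API_KEY"
```

On Windows, the PowerShell equivalent is `$env:FIREWORKS_API_KEY = "<YOUR_API_KEY>"`.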
Step 2: Install the CLI
To create and manage on-demand deployments, you’ll need the firectl CLI tool. Install it using one of the following methods, depending on your platform:
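Whichever install method you use, a quick sanity check is to confirm the binary is reachable on your PATH (the `version` subcommand name is an assumption; `firectl --help` lists the available commands):

```shell
# Verify the CLI is installed and on your PATH; prints a hint otherwise.
firectl version 2>/dev/null || echo "firectl not found; install it before continuing"
```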
Step 3: Create a deployment
This command creates a deployment of GPT OSS 120B optimized for speed. It takes a few minutes to complete, and the resulting deployment will scale up to 1 replica. Note the Name: field in the response; you will use it in the next step to query your deployment.
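The creation command this step refers to might look like the following sketch. The flag names and the model ID are assumptions; run `firectl create deployment --help` for the authoritative options:

```shell
# Sketch: create an on-demand deployment of GPT OSS 120B.
# Flag names are assumptions -- verify with `firectl create deployment --help`.
MODEL="accounts/fireworks/models/gpt-oss-120b"
firectl create deployment "$MODEL" \
  --min-replica-count 0 \
  --max-replica-count 1 \
  || echo "firectl not on PATH; see Step 2"
```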
Learn more about deployment options→
Learn more about autoscaling options→
Step 4: Query your deployment
Now you can query your on-demand deployment using the same API as serverless models, but routed to your dedicated deployment. Replace <DEPLOYMENT_NAME> in the snippets below with the value from the Name: field in the previous step:
- Python (Fireworks SDK)
- Python (OpenAI SDK)
- JavaScript
- curl
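As a dependency-free sketch of what these snippets do, here is the underlying HTTP call built with only the Python standard library. The endpoint follows Fireworks’ OpenAI-compatible API; the `model#deployment` addressing format and the model ID are assumptions, so substitute the Name: value from Step 3:

```python
import json
import os
import urllib.request

# Build a chat-completions request against your dedicated deployment.
# "<DEPLOYMENT_NAME>" is the Name: value from Step 3; the "model#deployment"
# addressing format is an assumption -- check the deployment docs.
api_key = os.environ.get("FIREWORKS_API_KEY", "<YOUR_API_KEY>")
payload = {
    "model": "accounts/fireworks/models/gpt-oss-120b#<DEPLOYMENT_NAME>",
    "messages": [{"role": "user", "content": "Say hello!"}],
}
req = urllib.request.Request(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
)

# Uncomment to actually send the request once the deployment is ready:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The SDK tabs above wrap this same HTTP API behind a client object.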
Install the Fireworks Python SDK, then make your first on-demand API call:
The SDK is currently in alpha. Use the --pre flag when installing to get the latest version.
Common use cases
Autoscale based on requests per second
Autoscale based on concurrent requests
Next steps
Ready to scale to production, explore other modalities, or customize your models?
Upload a custom model
Bring your own model and deploy it on Fireworks
Fine-tune Models
Improve model quality with supervised and reinforcement learning
Speech to Text
Real-time or batch audio transcription
Embeddings & Reranking
Use embeddings & reranking in search & context retrieval
Batch Inference
Run async inference jobs at scale, faster and cheaper
Browse 100+ Models
Explore all available models across modalities
API Reference
Complete API documentation