1. How to Access DeepSeek v3 & R1
DeepSeek models are available on Fireworks AI with flexible deployment options:
- Fireworks Model Playground
- API Access
- Deployment Options

Try DeepSeek R1 on the Fireworks Playground.
2. General FAQs
Below are common questions about DeepSeek models on Fireworks, organized by category.

Model Integrity & Modifications
Has Fireworks changed the DeepSeek model in any way? Is it quantized, distilled, censored at the API level, or modified with a system prompt?
- ✅ No quantization: full-precision versions are hosted.
- ✅ No additional censorship: Fireworks does not apply content moderation beyond DeepSeek's built-in policies.
- ✅ No forced system prompt: users have full control over prompts.
Data Privacy & Hosting Locations
Does Fireworks have zero data retention?
Where are Fireworks' servers located? Can I host DeepSeek models in the EU?
Do you send data to China or DeepSeek?
Fireworks has zero data retention by default and does not log or store prompt or generation data. See the Fireworks Data Handling Policy for details. The company DeepSeek does not have access to user API requests or outputs.
Pricing & Cost Considerations
Why is Fireworks more expensive than DeepSeekโs own API?
Can I deploy DeepSeek models on a dedicated instance? What speeds and costs per token can I expect?
Contact us at inquiries@fireworks.ai if you need a dedicated deployment.
- Lower latency: dedicated instances have better response times than shared serverless.
- Higher throughput: more consistent performance for large-scale applications.
- Pricing: depends on workload; contact us at inquiries@fireworks.ai.
Output Control & Limits
Is JSON mode / structured outputs supported?
- JSON Mode: enforce JSON responses for structured applications.
- Grammar Mode: define syntactic constraints for predictable outputs.
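A minimal sketch of requesting JSON-mode output through Fireworks' OpenAI-compatible chat completions endpoint. The exact model identifier (`accounts/fireworks/models/deepseek-v3` here) is an assumption; check the model page for the name your account should use.

```python
# Sketch: build a chat-completions payload that enables JSON mode for a
# DeepSeek model on Fireworks. POST it to
# https://api.fireworks.ai/inference/v1/chat/completions with your API key.

def build_json_mode_request(user_prompt: str) -> dict:
    """Build a payload that asks the server to enforce a JSON-only answer."""
    return {
        # Assumed model id; verify against the Fireworks model catalog.
        "model": "accounts/fireworks/models/deepseek-v3",
        "messages": [
            {"role": "system", "content": "Answer only with a JSON object."},
            {"role": "user", "content": user_prompt},
        ],
        # JSON mode: the server constrains generation to valid JSON.
        "response_format": {"type": "json_object"},
        "max_tokens": 512,
    }

payload = build_json_mode_request('List three EU capitals as {"capitals": [...]}.')
```

Grammar mode works the same way at the request level, with a grammar definition in place of the plain JSON-mode setting.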
Is function calling supported?
Native function calling is not currently supported for DeepSeek models on Fireworks. However:
- Users can implement function calling logic via prompt engineering or structured output parsing.
- Fireworks is evaluating future support for function calling in DeepSeek models.
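The prompt-engineering route above can be sketched as follows: instruct the model to answer only with a JSON object naming a function and its arguments, then parse and dispatch locally. The tool name, prompt wording, and sample reply are all illustrative.

```python
import json

# Sketch: function calling via prompt engineering. The model is told to emit
# a JSON "call"; the client parses it and runs the matching local function.

TOOLS_PROMPT = (
    "You can call one tool. Reply ONLY with JSON of the form "
    '{"function": "<name>", "arguments": {...}}. '
    "Available: get_weather(city: str)."
)

def dispatch(raw_reply: str) -> str:
    call = json.loads(raw_reply)          # model reply must be pure JSON
    if call["function"] == "get_weather":
        city = call["arguments"]["city"]
        return f"weather({city})"         # stand-in for a real lookup
    raise ValueError(f"unknown function: {call['function']}")

# Illustrative model reply given TOOLS_PROMPT:
reply = '{"function": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(reply)  # -> "weather(Paris)"
```

Pairing this pattern with JSON mode makes the reply reliably parseable.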
What is the max output generation limit? Why are my responses getting cut off?
If responses are getting cut off, increase `max_tokens` in your API call. See the Fireworks Max Tokens Documentation.
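One way to tell whether a response was cut off by the token limit (rather than ending naturally) is to check `finish_reason`, which OpenAI-compatible APIs set to `"length"` on truncation. A minimal sketch over a sample response dict:

```python
# Sketch: detecting max_tokens truncation in an OpenAI-compatible completion.
# finish_reason is "length" when the limit was hit, "stop" on a natural end.

def is_truncated(completion: dict) -> bool:
    return completion["choices"][0]["finish_reason"] == "length"

sample = {"choices": [{"finish_reason": "length",
                       "message": {"content": "partial answer..."}}]}

if is_truncated(sample):
    # Retry with a higher max_tokens, or ask the model to continue.
    print("response was cut off; raise max_tokens")
```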
What is reasoning_effort and how can I control model reasoning?
- Key Benefits:
  - Faster responses for time-sensitive applications
  - Cost optimization for budget-conscious deployments
  - Predictable latency for production systems
- Control Options:
  - `reasoning_effort = "low"`: limits Chain-of-Thought (CoT) reasoning to 40% of full length; achieves 63% accuracy on AIME 2024 math problems (better than o1-mini_low at 60%)
  - `reasoning_effort = [integer < 20,000]`: custom effort limit in computational units
- Technical Notes:
  - Works with the `fireworks/deepseek-r1` and `fireworks/deepseek-r1-basic` models
  - Server-side logic handles truncation (no prompt tweaking needed)
  - Forces a `</think>` token at the defined effort limit, preventing excessive deliberation in responses
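A sketch of passing `reasoning_effort` in a request body. The field name and accepted values follow the notes above; whether it is sent as a top-level body field may vary by API version, so treat this shape as an assumption and confirm against the Fireworks API reference.

```python
# Sketch: capping R1's chain-of-thought with reasoning_effort.

def build_r1_request(prompt: str, effort="low") -> dict:
    return {
        "model": "accounts/fireworks/models/deepseek-r1",
        "messages": [{"role": "user", "content": prompt}],
        # "low" caps CoT at ~40% of full length; an integer under 20,000
        # sets a custom effort limit (the server injects </think> at the cap).
        "reasoning_effort": effort,
    }

payload = build_r1_request("What is 17 * 24?", effort=4000)
```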
Parsing & API Response Handling
How can I separate `<think>` tokens and output tokens? Can this be done in the API response?
DeepSeek R1 uses `<think>` tags to denote reasoning before the final structured output. Fireworks defaults to the simplest approach of returning `<think>...</think>` in the response and letting the user parse it, for example with regex parsing.
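The regex approach can be sketched as below: capture everything inside the first `<think>...</think>` pair as reasoning, and treat the remainder as the answer. The sample reply is illustrative.

```python
import re

# Sketch: splitting a DeepSeek R1 reply into <think> reasoning and the
# final answer. DOTALL lets the reasoning span multiple lines.

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    match = THINK_RE.search(text)
    if not match:
        return "", text.strip()            # no reasoning block present
    reasoning = match.group(1).strip()
    answer = THINK_RE.sub("", text, count=1).strip()
    return reasoning, answer

reply = "<think>2+2 is basic arithmetic.</think>The answer is 4."
reasoning, answer = split_reasoning(reply)
# reasoning -> "2+2 is basic arithmetic.", answer -> "The answer is 4."
```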
Roadmap & Feature Requests

How often is DeepSeek R1 or v3 updated on Fireworks?
General Troubleshooting
Iโm getting an unexpected error when using DeepSeek v3 or R1 on Fireworks. What should I do?
First, check the basics:
- No missing/invalid API keys
- Proper request format
- No exceeded rate limits or context window

If outputs are the problem, adjust sampling parameters:
- Lower temperature for more deterministic responses
- Adjust top_p to control randomness
- Increase max_tokens to avoid truncation
- Fireworks Support
- Fireworks Discord for real-time help
Why do my responses sometimes get abruptly cut off due to context limitations?
If responses are getting cut off, possible causes and solutions:

1. Exceeded `max_tokens` setting → increase `max_tokens`
2. Requesting too much text in a single prompt → break input into smaller chunks
3. Model context window limit reached → summarize prior messages before appending new ones
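One simple way to keep a conversation under the context window is to drop the oldest turns until the history fits a token budget. This sketch uses a rough 4-characters-per-token estimate, which is an assumption; use a real tokenizer in production.

```python
# Sketch: trim chat history to a token budget by keeping only the most
# recent messages. The 4 chars/token heuristic is approximate.

def trim_history(messages: list[dict], budget_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within budget_tokens."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = max(1, len(msg["content"]) // 4)   # crude token estimate
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [{"role": "user", "content": "x" * 400},
           {"role": "assistant", "content": "y" * 400},
           {"role": "user", "content": "latest question"}]
trimmed = trim_history(history, budget_tokens=120)  # oldest turn is dropped
```

Summarizing the dropped turns into a single system message preserves more context than discarding them outright.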
Why am I experiencing intermittent issues with Fireworks not responding?
Common causes and fixes:

1. High server load: Fireworks may be experiencing peak traffic.
   Fix: retry the request after a few seconds or try during non-peak hours.
2. Rate limits or spend limits reached: if you have exceeded the API rate limits, requests may temporarily fail.
   Fix: check your rate limits and spend limits in the API dashboard and adjust your usage accordingly.
   To increase spend limits, add credits: Fireworks Spend Limits.
3. Network connectivity issues: the Fireworks API may be unreachable due to network issues.
   Fix: restart your internet connection or use a different network/VPN.

If problems persist, check Fireworks' status page or reach out via our Discord.
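The retry advice above can be automated with exponential backoff. A minimal sketch, where `send` stands in for your actual API call and `ConnectionError` stands in for whatever transient error your client raises:

```python
import time

# Sketch: retry a flaky request with exponential backoff (1s, 2s, 4s, ...).

def with_retries(send, attempts: int = 4, base_delay: float = 1.0):
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                      # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)

# Demo with a stand-in call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)  # -> "ok" after two retries
```

For rate-limit errors, honor any `Retry-After` header the API returns instead of a fixed schedule.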