DeepSeek Resources
Access information, blog posts, FAQs, and detailed documentation for DeepSeek v3 and R1.
1. How to Access DeepSeek v3 & R1
DeepSeek models are available on Fireworks AI with flexible deployment options.
You can test DeepSeek v3 and R1 in an interactive environment without any coding.
🔗 Try DeepSeek v3 on Fireworks Playground
🔗 Try DeepSeek R1 on Fireworks Playground
Developers can integrate DeepSeek models into applications using Fireworks’ API.
- Serverless API – Instantly access DeepSeek models with pay-as-you-go pricing.
- Dedicated Deployments – Higher throughput and lower latency for enterprise use. Contact us at inquiries@fireworks.ai
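As a starting point, a serverless request can be assembled as a plain JSON body for Fireworks' OpenAI-compatible chat-completions endpoint. This is a minimal sketch; the endpoint URL and model identifier below are illustrative, so confirm the exact values in the Fireworks API docs for your account.

```python
import json

# Illustrative endpoint and model name for a serverless chat-completions call;
# verify both against the Fireworks API documentation.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(prompt, model="accounts/fireworks/models/deepseek-v3"):
    """Assemble the JSON body for a chat-completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

body = build_request("Summarize the DeepSeek v3 architecture in two sentences.")
print(json.dumps(body, indent=2))
# To actually send it, POST `body` to API_URL with an
# `Authorization: Bearer <FIREWORKS_API_KEY>` header, e.g. via the requests library.
```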
2. General FAQs
Below are common questions about DeepSeek models on Fireworks, organized by category.
Model Integrity & Modifications
Has Fireworks changed the DeepSeek model in any way? Is it quantized, distilled, censored at the API level, or modified with a system prompt?
No, Fireworks hosts the unaltered versions of DeepSeek models.
- ❌ No quantization – Full-precision versions are hosted.
- ❌ No additional censorship – Fireworks does not apply additional content moderation beyond DeepSeek’s built-in policies.
- ❌ No forced system prompt – Users have full control over prompts.
🔹 Fireworks hosts DeepSeek R1 and V3 models on Serverless. Contact us at inquiries@fireworks.ai if you need a dedicated deployment.
🔹 Fireworks also offers the six R1-distill models released by DeepSeek via on-demand deployments.
Data Privacy & Hosting Locations
Does Fireworks have zero data retention?
Fireworks has zero data retention by default and does not log or store prompt or generation data.
See Fireworks Data Handling Policy for details.
Where are Fireworks' servers located? Can I host DeepSeek models in the EU?
Fireworks hosts DeepSeek models on servers in North America and the EU.
Do you send data to China or DeepSeek?
No. Fireworks hosts DeepSeek models on servers in North America and the EU, has zero data retention by default, and does not log or store prompt or generation data. The company DeepSeek does not have access to user API requests or outputs.
See Fireworks Data Handling Policy for details.
Pricing & Cost Considerations
Why is Fireworks more expensive than DeepSeek’s own API?
Fireworks hosts DeepSeek models on its own infrastructure and does not proxy requests to the DeepSeek API.
We continuously optimize the models for speed and throughput, and offer developer features such as JSON mode, structured outputs, and dedicated deployment options.
Can I deploy DeepSeek models on a dedicated instance? What speeds and costs per token can I expect?
Yes, Fireworks offers dedicated deployments for DeepSeek models.
Contact us at inquiries@fireworks.ai if you need a dedicated deployment.
- 🚀 Lower latency – Dedicated instances have better response times than shared serverless.
- 📈 Higher throughput – More consistent performance for large-scale applications.
- 💰 Pricing: Depends on workload, contact us at inquiries@fireworks.ai.
Output Control & Limits
Is JSON mode / structured outputs supported?
Yes! Fireworks supports structured outputs through:
- ✔️ JSON Mode – Enforce JSON responses for structured applications.
- ✔️ Grammar Mode – Define syntactic constraints for predictable outputs.
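JSON mode is requested through the `response_format` field of the OpenAI-compatible request body. The sketch below is hedged: the field shape follows Fireworks' structured-output conventions, but verify the exact options supported for your model in the docs.

```python
import json

# Hedged sketch: request JSON-only output via the `response_format` field.
# The model identifier is illustrative.
def build_json_mode_request(prompt, model="accounts/fireworks/models/deepseek-v3"):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},  # enforce JSON responses
    }

req = build_json_mode_request(
    "List three EU capitals as a JSON array under the key 'capitals'."
)
print(json.dumps(req["response_format"]))
```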
Is function calling supported?
Currently, DeepSeek R1 does not support native function calling the way OpenAI models do.
However:
- Users can implement function calling logic via prompt engineering or structured output parsing.
- Fireworks is evaluating future support for function calling in DeepSeek models.
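Prompt-engineered function calling can be sketched as follows: instruct the model to emit a JSON object naming a tool and its arguments, then parse and dispatch locally. The tool registry, system prompt wording, and the sample model reply below are all illustrative.

```python
import json

# Illustrative tool registry; in a real app these would be your own functions.
TOOLS = {
    "get_weather": lambda city: f"Weather for {city}: sunny",
}

SYSTEM_PROMPT = (
    "When you need a tool, reply ONLY with JSON of the form "
    '{"tool": "<name>", "arguments": {...}}.'
)

def dispatch(model_reply):
    """Parse the model's JSON reply and invoke the named tool."""
    call = json.loads(model_reply)
    return TOOLS[call["tool"]](**call["arguments"])

# Simulated model reply (in practice this comes from the API response):
reply = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(reply)
print(result)  # Weather for Paris: sunny
```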
What is the max output generation limit? Why are my responses getting cut off?
The maximum output length for DeepSeek models is limited only by the model's context window, which is 128K tokens.
If responses are cut off, try increasing `max_tokens` in your API call:
🔗 Fireworks Max Tokens Documentation
What is reasoning_effort and how can I control model reasoning?
Reasoning Effort allows you to control how much computation DeepSeek R1 spends on reasoning:
- ✨ Key Benefits:
  - 🚀 Faster responses for time-sensitive applications
  - 💰 Cost optimization for budget-conscious deployments
  - ⚙️ Predictable latency for production systems
- 🎛️ Control Options:
  - `reasoning_effort = "low"` – Limits Chain-of-Thought (CoT) reasoning to 40% of full length; achieves 63% accuracy on AIME 2024 math problems (better than o1-mini_low at 60%)
  - `reasoning_effort = [integer < 20,000]` – Custom effort limit in computational units
- 📝 Technical Notes:
  - Works with the `fireworks/deepseek-r1` and `fireworks/deepseek-r1-basic` models
  - Server-side logic handles truncation (no prompt tweaking needed)
  - Forces a `</think>` token at the defined effort limit
  - Prevents excessive deliberation in responses
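Setting the parameter on a request body can be sketched as below. The parameter name comes from the notes above; the model identifier is illustrative, so confirm both against the Fireworks API docs.

```python
# Hedged sketch of setting `reasoning_effort` on a DeepSeek R1 request body.
def build_r1_request(prompt, effort="low"):
    return {
        "model": "accounts/fireworks/models/deepseek-r1",  # illustrative name
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "low" or an integer < 20,000
    }

fast = build_r1_request("What is 17 * 24?", effort="low")
custom = build_r1_request("Prove the sum of two odd numbers is even.", effort=4000)
print(fast["reasoning_effort"], custom["reasoning_effort"])
```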
Parsing & API Response Handling
How can I separate `<think>` tokens and output tokens? Can this be done in the API response?
DeepSeek R1 uses `<think>` tags to denote reasoning before the final structured output.
Fireworks defaults to the simplest approach: returning `<think>...</think>` in the response and letting the user parse it, for example with a regular expression.
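The regex-parsing approach mentioned above could look like this (it assumes the reply contains at most one `<think>...</think>` block):

```python
import re

def split_reasoning(text):
    """Split an R1 reply into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text  # no reasoning block present
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reply = "<think>2 + 2 is basic arithmetic.</think>The answer is 4."
thoughts, answer = split_reasoning(reply)
print(answer)  # The answer is 4.
```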
Roadmap & Feature Requests
How often is DeepSeek R1 or v3 updated on Fireworks?
Fireworks updates DeepSeek R1 and v3 in alignment with DeepSeek AI’s official releases and Fireworks’ own performance optimizations.
Updates include bug fixes, efficiency improvements, and potential model refinements.
Users can track updates through Fireworks documentation and announcements.
🔗 For the latest version information, refer to the Fireworks API documentation or join the Fireworks community Discord.
General Troubleshooting
I’m getting an unexpected error when using DeepSeek v3 or R1 on Fireworks. What should I do?
If you’re encountering an error while using DeepSeek v3 or R1 on Fireworks, follow these steps:
✅ Step 1: Check Fireworks’ Status Page for any ongoing outages.
✅ Step 2: Verify API request formatting. Ensure:
- No missing/invalid API keys
- Proper request format
- No exceeded rate limits or context window
✅ Step 3: Reduce request complexity if your request is too long.
✅ Step 4: Adjust parameters if experiencing instability:
- Lower temperature for more deterministic responses
- Adjust top_p to control randomness
- Increase max_tokens to avoid truncation
✅ Step 5: Contact Fireworks support via:
- 🔗 Fireworks Support
- 🔗 Fireworks Discord for real-time help.
Why do my responses sometimes get abruptly cut off due to context limitations?
DeepSeek v3 and R1, like other LLMs, have a fixed maximum context length of 128K tokens.
If responses are getting cut off:
🔹 Possible Causes & Solutions:
1️⃣ Exceeded `max_tokens` setting → 🔧 Increase `max_tokens`
2️⃣ Requesting too much text in a single prompt → 🔧 Break input into smaller chunks
3️⃣ Model context window limit reached → 🔧 Summarize prior messages before appending new ones
📌 Alternative Fix: If you need longer responses, re-prompt the model with the last part of the output and ask it to continue.
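Fix 3️⃣ above (summarizing prior messages) can be sketched as follows. Token counts here are crudely approximated by whitespace words, and the summarizer is a stand-in; a real implementation would use the model's tokenizer and a summarization call.

```python
# Keep the conversation inside the context window by compacting older turns.
MAX_HISTORY_TOKENS = 1000  # illustrative budget, far below the 128K window

def rough_tokens(message):
    """Crude token estimate: whitespace-separated words."""
    return len(message["content"].split())

def compact_history(messages, summarize):
    """Replace the oldest turns with one summary message when over budget."""
    total = sum(rough_tokens(m) for m in messages)
    if total <= MAX_HISTORY_TOKENS:
        return messages
    old, recent = messages[:-2], messages[-2:]
    summary = summarize(old)
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent

# Stand-in summarizer (in practice, another model call):
fake_summarize = lambda msgs: f"{len(msgs)} earlier messages about deployment."

history = [
    {"role": "user", "content": "word " * 600},
    {"role": "assistant", "content": "word " * 600},
    {"role": "user", "content": "Next question?"},
]
trimmed = compact_history(history, fake_summarize)
print(len(trimmed))  # 3
```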
Why am I experiencing intermittent issues with Fireworks not responding?
Intermittent API response issues could be due to:
🔹 Common Causes & Fixes:
1️⃣ High Server Load – Fireworks may be experiencing peak traffic.
Fix: Retry the request after a few seconds or try during non-peak hours.
2️⃣ Rate Limits or Spend Limits Reached – If you’ve exceeded the API rate limits, requests may temporarily fail.
Fix: Check your rate limits and spend limits in the API dashboard and adjust your usage accordingly.
🔗 To increase spend limits, add credits: Fireworks Spend Limits
3️⃣ Network Connectivity Issues – Fireworks API may be unreachable due to network issues.
Fix: Check your internet connection or try a different network/VPN.
📌 If problems persist, check Fireworks’ status page or reach out via our Discord. 🚀
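Fix 1️⃣ above (retrying after a transient failure) is commonly implemented with exponential backoff. This is a generic sketch: `call` stands in for the actual API request, and the simulated flaky endpoint below exists only to demonstrate the loop.

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Retry `call` on transient connection errors with exponential backoff."""
    for i in range(attempts):
        try:
            return call()
        except ConnectionError:
            if i == attempts - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** i))  # 1s, 2s, 4s, ...

# Simulated flaky endpoint that fails twice, then succeeds:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)  # ok
```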
3. Learn about R1 & V3
Stay up to date with the latest advancements and insights into DeepSeek models.
Check out our blog, where experts from Fireworks break down everything you need to know about R1 and V3.
DeepSeek R1: All You Need to Know
A deep dive into DeepSeek R1’s capabilities, architecture, and use cases.
Beyond Supervised Fine-Tuning: Reinforcement Learning with Verifiable Reward
Learn how reinforcement learning with verifiable rewards is shaping AI training.
DeepSeek R1 Distillation & Reasoning
Learn about the distillation process for DeepSeek R1 and how it impacts reasoning capabilities.
Constrained Generation with Reasoning
Discover how structured output techniques like reasoning mode improve AI responses.
We’ve also published videos on our YouTube channel.