Access information, blog posts, FAQs, and detailed documentation for DeepSeek v3 and R1.
Has Fireworks changed the DeepSeek model in any way? Is it quantized, distilled, censored at the API level, or modified with a system prompt?
Does Fireworks have zero data retention?
Where are Fireworks' servers located? Can I host DeepSeek models in the EU?
Do you send data to China or DeepSeek?
Why is Fireworks more expensive than DeepSeek’s own API?
Can I deploy DeepSeek models on a dedicated instance? What speeds and costs per token can I expect?
Is JSON mode / structured outputs supported?
Is function calling supported?
What is the max output generation limit? Why are my responses getting cut off?
If your responses are getting cut off, increase the max_tokens setting in your API call.
What is reasoning_effort and how can I control model reasoning?
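Both max_tokens and reasoning_effort are plain request-body fields. A minimal sketch of assembling such a request for an OpenAI-compatible chat completions endpoint (the model path and the reasoning_effort pass-through here are assumptions, not verified API details):

```python
def build_request(prompt: str, max_tokens: int = 2048,
                  reasoning_effort=None) -> dict:
    """Assemble a chat-completion request body (sketch).

    Raising max_tokens is the first fix when responses are cut off.
    reasoning_effort may be "low" or an integer effort limit; it is
    assumed here to be accepted as a top-level request field.
    """
    body = {
        "model": "accounts/fireworks/models/deepseek-r1",  # assumed path
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # raise if finish_reason == "length"
    }
    if reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort
    return body
```

The returned dict would be POSTed as JSON to the chat completions endpoint; checking `finish_reason` in the response tells you whether max_tokens was the reason a generation stopped.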
reasoning_effort = "low": Limits Chain-of-Thought (CoT) reasoning to 40% of full length
reasoning_effort = [integer < 20,000]: Custom effort limit in computational units
These options apply to the fireworks/deepseek-r1 and fireworks/deepseek-r1-basic models. The model emits the closing </think> token at the defined effort limit.
How can I separate <think> tokens and output tokens? Can this be done in the API response?
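One way to do this client-side is a short regular expression over the completion text. A sketch, assuming the reasoning appears in a single leading <think>...</think> block:

```python
import re

def split_reasoning(text: str) -> tuple:
    """Split a completion into (reasoning, answer).

    Assumes one leading <think>...</think> block; if no block is
    found, the whole text is treated as the answer.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

reasoning, answer = split_reasoning("<think>2+2=4</think>The answer is 4.")
# reasoning == "2+2=4", answer == "The answer is 4."
```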
DeepSeek R1 uses <think> tags to denote reasoning before the final structured output. Fireworks defaults to the simplest approach: returning <think>...</think> in the response and letting the user parse it out, for example with regex parsing.
How often is DeepSeek R1 or v3 updated on Fireworks?
I’m getting an unexpected error when using DeepSeek v3 or R1 on Fireworks. What should I do?
Why do my responses sometimes get abruptly cut off due to context limitations?
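A common mitigation when the context window fills up is trimming or summarizing older turns before appending new messages. A minimal trimming sketch (the 4-characters-per-token estimate is a rough heuristic, not a real tokenizer):

```python
def trim_history(messages: list, budget_tokens: int = 100_000) -> list:
    """Keep a chat history inside a context budget by dropping the
    oldest non-system turns. System messages are always preserved."""
    def est_tokens(msg: dict) -> int:
        return max(1, len(msg["content"]) // 4)  # rough heuristic

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = []
    total = sum(est_tokens(m) for m in system)
    for msg in reversed(rest):  # walk newest-first
        total += est_tokens(msg)
        if total > budget_tokens:
            break
        kept.append(msg)
    return system + list(reversed(kept))
```

Replacing the dropped turns with a single summary message (instead of discarding them) preserves more conversational state at the same token cost.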
1️⃣ Low max_tokens setting → 🔧 Increase max_tokens
2️⃣ Requesting too much text in a single prompt → 🔧 Break input into smaller chunks
3️⃣ Model context window limit reached → 🔧 Summarize prior messages before appending new ones
Why am I experiencing intermittent issues with Fireworks not responding?