
1. How to Access DeepSeek v3 & R1

DeepSeek models are available on Fireworks AI with flexible deployment options.
  • 🛝 Fireworks Model Playground
  • 💻 API Access
  • 🚢 Deployment Options
You can test DeepSeek v3 and R1 in an interactive environment without any coding.
🔗 Try DeepSeek v3 on Fireworks Playground
🔗 Try DeepSeek R1 on Fireworks Playground

2. General FAQs

Below are common questions about DeepSeek models on Fireworks, organized by category.

Model Integrity & Modifications

No, Fireworks hosts the unaltered versions of DeepSeek models.
  • โŒ No quantization โ€“ Full-precision versions are hosted.
  • โŒ No additional censorship โ€“ Fireworks does not apply additional content moderation beyond DeepSeekโ€™s built-in policies.
  • โŒ No forced system prompt โ€“ Users have full control over prompts.
🔹 Fireworks hosts DeepSeek R1 and V3 models on Serverless. Contact us at inquiries@fireworks.ai if you need a dedicated deployment.
🔹 Fireworks also offers the six R1-distill models released by DeepSeek on on-demand deployments.

Data Privacy & Hosting Locations

Fireworks has zero-data retention by default and does not log or store prompt or generation data. See the Fireworks Data Handling Policy for details.
Fireworks hosts DeepSeek models on servers in North America and the EU.
The company DeepSeek does not have access to user API requests or outputs.

Pricing & Cost Considerations

Fireworks hosts DeepSeek models on our own infrastructure; we do not proxy requests to the DeepSeek API. We are continuously optimizing the models for speed and throughput, and we offer useful developer features such as JSON mode, structured outputs, and dedicated deployment options.
Yes, Fireworks offers dedicated deployments for DeepSeek models.
Contact us at inquiries@fireworks.ai if you need a dedicated deployment.
  • 🚀 Lower latency – Dedicated instances have better response times than shared serverless.
  • 📈 Higher throughput – More consistent performance for large-scale applications.
  • 💰 Pricing – Depends on workload; contact us at inquiries@fireworks.ai.

Output Control & Limits

Yes! Fireworks supports structured outputs through:
  • โœ”๏ธ JSON Mode โ€“ Enforce JSON responses for structured applications.
  • โœ”๏ธ Grammar Mode โ€“ Define syntactic constraints for predictable outputs.
Currently, DeepSeek R1 does not support native function calling like OpenAI models.
However:
  • Users can implement function calling logic via prompt engineering or structured output parsing.
  • Fireworks is evaluating future support for function calling in DeepSeek models.
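The prompt-engineering workaround above can be sketched as follows. This is an illustrative pattern, not a Fireworks feature: the system prompt, the tool name `get_weather`, and the helper `parse_tool_call` are all made-up examples.

```python
import json
import re

# Sketch of prompt-engineered "function calling": instruct the model to
# reply with a JSON tool call, then parse it client-side.
SYSTEM_PROMPT = (
    "When a tool is needed, reply ONLY with JSON of the form "
    '{"function": "<name>", "arguments": {...}}.'
)

def parse_tool_call(model_output: str):
    """Extract the first JSON object from the model's reply, if any."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if match is None:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return call if "function" in call else None

# Simulated model reply (a real one would come from the chat API):
reply = '{"function": "get_weather", "arguments": {"city": "Paris"}}'
call = parse_tool_call(reply)
print(call["function"])  # get_weather
```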
The max token length for DeepSeek models is limited only by the model's context window, which is 128K tokens. If responses are cut off, try increasing max_tokens in your API call:
๐Ÿ”— Fireworks Max Tokens Documentation
Reasoning Effort allows you to control how much computation DeepSeek R1 spends on reasoning:
  • ✨ Key Benefits:
    • 🚀 Faster responses for time-sensitive applications
    • 💰 Cost optimization for budget-conscious deployments
    • ⚙️ Predictable latency for production systems
  • 🎛️ Control Options:
    • reasoning_effort = "low": Limits Chain-of-Thought (CoT) reasoning to 40% of full length
      • Achieves 63% accuracy on AIME 2024 math problems (better than o1-mini_low at 60%)
    • reasoning_effort = [integer < 20,000]: Custom effort limit in computational units
  • 💻 Example Usage:
from fireworks.client import Fireworks

client = Fireworks(api_key="<FIREWORKS_API_KEY>")
response = client.chat.completions.create(
  model="fireworks/deepseek-r1",
  messages=[{
    "role": "user",
    "content": "Solve this math problem: What is 2 + 2?",
  }],
  reasoning_effort="low"  # or an integer like 5000
)
print(response.choices[0].message.content)
  • 📝 Technical Notes:
    • Works with the fireworks/deepseek-r1 and fireworks/deepseek-r1-basic models
    • Server-side logic handles truncation (no prompt tweaking needed)
    • Forces a </think> token at the defined effort limit
    • Prevents excessive deliberation in responses

Parsing & API Response Handling

DeepSeek R1 uses <think> tags to denote reasoning before the final structured output. Fireworks defaults to the simplest approach: returning <think>...</think> in the response and letting the user parse it, for example with regex.
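That parsing approach can be sketched in a few lines; the helper name below is ours, but the tag format is as described above:

```python
import re

def split_reasoning(text: str):
    """Separate the <think>...</think> reasoning block from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return None, text.strip()  # no reasoning block present
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 is basic arithmetic.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```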

Roadmap & Feature Requests

Fireworks updates DeepSeek R1 and v3 in alignment with DeepSeek AI's official releases and Fireworks' own performance optimizations. Updates include bug fixes, efficiency improvements, and potential model refinements. Users can track updates through Fireworks documentation and announcements.
🔗 For the latest version information, refer to the Fireworks API documentation or join the Fireworks community Discord.

General Troubleshooting

If you're encountering an error while using DeepSeek v3 on Fireworks, follow these steps:
✅ Step 1: Check Fireworks' Status Page for any ongoing outages.
✅ Step 2: Verify API request formatting. Ensure:
  • No missing/invalid API keys
  • Proper request format
  • No exceeded rate limits or context window
✅ Step 3: Reduce request complexity if your request is too long.
✅ Step 4: Adjust parameters if experiencing instability:
  • Lower temperature for more deterministic responses
  • Adjust top_p to control randomness
  • Increase max_tokens to avoid truncation
✅ Step 5: Contact Fireworks support via the Fireworks community Discord or inquiries@fireworks.ai.
DeepSeek v3 and R1, like other LLMs, have a fixed maximum context length of 128K tokens.
If responses are getting cut off:
🔹 Possible Causes & Solutions:
1️⃣ Exceeded max_tokens setting → 🔧 Increase max_tokens
2️⃣ Requesting too much text in a single prompt → 🔧 Break input into smaller chunks
3️⃣ Model context window limit reached → 🔧 Summarize prior messages before appending new ones
💡 Fix:
response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3",
    messages=[{"role": "user", "content": "Generate a long article summary"}],
    max_tokens=4096,  # Adjust as needed
)
📌 Alternative Fix: If you need longer responses, re-prompt the model with the last part of the output and ask it to continue.
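The re-prompting pattern above can be sketched by rebuilding the message history; the helper name and the "continue" wording are illustrative, not part of the Fireworks API:

```python
# Sketch of the "re-prompt to continue" pattern: append the truncated
# assistant output to the history, then ask the model to pick up where
# it stopped. Roles follow the standard chat-completions format.
def continuation_messages(original_prompt: str, truncated_output: str):
    return [
        {"role": "user", "content": original_prompt},
        {"role": "assistant", "content": truncated_output},
        {"role": "user", "content": "Continue exactly where you left off."},
    ]

messages = continuation_messages(
    "Generate a long article summary",
    "...and the third key finding was",  # tail of the cut-off response
)
# With a configured client:
# response = client.chat.completions.create(
#     model="accounts/fireworks/models/deepseek-v3",
#     messages=messages,
#     max_tokens=4096,
# )
```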
Intermittent API response issues could be due to:
🔹 Common Causes & Fixes:
1️⃣ High Server Load – Fireworks may be experiencing peak traffic.
Fix: Retry the request after a few seconds or try during non-peak hours.
2๏ธโƒฃ Rate Limits or Spend Limits Reached โ€“ If youโ€™ve exceeded the API rate limits, requests may temporarily fail.
Fix: Check your rate limits and spend limits in the API dashboard and adjust your usage accordingly.
🔗 To increase spend limits, add credits: Fireworks Spend Limits
3️⃣ Network Connectivity Issues – Fireworks API may be unreachable due to network issues.
Fix: Restart your internet connection or use a different network/VPN.
📌 If problems persist, check Fireworks' status page or reach out via our Discord. 🚀
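The "retry after a few seconds" advice above can be sketched as a generic backoff loop. This is a client-side pattern, not a Fireworks feature; `make_request` stands in for any call to the API:

```python
import time

# Generic retry-with-exponential-backoff for transient API failures
# (high load, brief network blips).
def retry_with_backoff(make_request, max_attempts=4, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo with a stand-in that fails twice, then succeeds:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # ok
```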

3. Learn about R1 & V3

Stay up to date with the latest advancements and insights into DeepSeek models. Check out our blog, where experts from Fireworks break down everything you need to know about R1 and V3. We've also published videos on our YouTube channel.