
1. How to Access DeepSeek v3 & R1

DeepSeek models are available on Fireworks AI with flexible deployment options.
  • 🛝 Fireworks Model Playground
  • 💻 API Access
  • 🚢 Deployment Options
You can test DeepSeek v3 and R1 in an interactive environment without any coding.
🔗 Try DeepSeek v3 on Fireworks Playground
🔗 Try DeepSeek R1 on Fireworks Playground

2. General FAQs

Below are common questions about DeepSeek models on Fireworks, organized by category.

Model Integrity & Modifications

No, Fireworks hosts the unaltered versions of DeepSeek models.
  • โŒ No quantization โ€“ Full-precision versions are hosted.
  • โŒ No additional censorship โ€“ Fireworks does not apply additional content moderation beyond DeepSeekโ€™s built-in policies.
  • โŒ No forced system prompt โ€“ Users have full control over prompts.
🔹 Fireworks hosts DeepSeek R1 and V3 models on Serverless. Contact us at inquiries@fireworks.ai if you need a dedicated deployment.
🔹 Fireworks also offers the six R1-distill models released by DeepSeek on on-demand deployments.

Data Privacy & Hosting Locations

Fireworks has zero-data retention by default and does not log or store prompt or generation data. See the Fireworks Data Handling Policy for details.
Fireworks hosts DeepSeek models on servers in North America and the EU.
The company DeepSeek does not have access to user API requests or outputs.

Pricing & Cost Considerations

Fireworks hosts DeepSeek models on our own infrastructure; we do not proxy requests to the DeepSeek API. We are continuously optimizing the models for speed and throughput, and we offer useful developer features such as JSON mode, structured outputs, and dedicated deployment options.
Yes, Fireworks offers dedicated deployments for DeepSeek models.
Contact us at inquiries@fireworks.ai if you need a dedicated deployment.
  • 🚀 Lower latency – Dedicated instances have better response times than shared serverless.
  • 📈 Higher throughput – More consistent performance for large-scale applications.
  • 💰 Pricing – Depends on workload; contact us at inquiries@fireworks.ai.

Output Control & Limits

Yes! Fireworks supports structured outputs through:
  • โœ”๏ธ JSON Mode โ€“ Enforce JSON responses for structured applications.
  • โœ”๏ธ Grammar Mode โ€“ Define syntactic constraints for predictable outputs.
Currently, DeepSeek R1 does not support native function calling like OpenAI models.
However:
  • Users can implement function calling logic via prompt engineering or structured output parsing.
  • Fireworks is evaluating future support for function calling in DeepSeek models.
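The prompt-engineering workaround above can be sketched as follows. This is an illustrative pattern, not a Fireworks feature: the system prompt, the tool name `get_weather`, and the helper `parse_tool_call` are all made-up examples.

```python
import json
import re

# Sketch of prompt-engineered "function calling": instruct the model to
# reply with a JSON tool call, then parse it client-side.
SYSTEM_PROMPT = (
    "When a tool is needed, reply ONLY with JSON of the form "
    '{"function": "<name>", "arguments": {...}}.'
)

def parse_tool_call(model_output: str):
    """Extract the first JSON object from the model's reply, if any."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if match is None:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return call if "function" in call else None

# Simulated model reply (a real one would come from the chat API):
reply = '{"function": "get_weather", "arguments": {"city": "Paris"}}'
call = parse_tool_call(reply)
print(call["function"])  # get_weather
```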
The max token length for DeepSeek models is limited only by the model's context window, which is 128K tokens. If responses are cut off, try increasing max_tokens in your API call:
๐Ÿ”— Fireworks Max Tokens Documentation
Reasoning Effort allows you to control how much computation DeepSeek R1 spends on reasoning:
  • ✨ Key Benefits:
    • 🚀 Faster responses for time-sensitive applications
    • 💰 Cost optimization for budget-conscious deployments
    • ⚙️ Predictable latency for production systems
  • 🎛️ Control Options:
    • reasoning_effort = "low": Limits Chain-of-Thought (CoT) reasoning to 40% of full length
      • Achieves 63% accuracy on AIME 2024 math problems (better than o1-mini_low at 60%)
    • reasoning_effort = [integer < 20,000]: Custom effort limit in computational units
  • 💻 Example Usage:
from fireworks.client import Fireworks

client = Fireworks(api_key="<FIREWORKS_API_KEY>")
response = client.chat.completions.create(
  model="fireworks/deepseek-r1",
  messages=[{
    "role": "user",
    "content": "Solve this math problem: What is 2 + 2?",
  }],
  reasoning_effort="low"  # or an integer like 5000
)
print(response.choices[0].message.content)
  • 📝 Technical Notes:
    • Works with the fireworks/deepseek-r1 and fireworks/deepseek-r1-basic models
    • Server-side logic handles truncation (no prompt tweaking needed)
    • Forces a </think> token at the defined effort limit
    • Prevents excessive deliberation in responses

Parsing & API Response Handling

DeepSeek R1 uses <think> tags to denote reasoning before the final structured output. Fireworks defaults to the simplest approach: returning <think>...</think> in the response and letting the user parse it, for example with regex.
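That parsing approach can be sketched in a few lines; the helper name below is ours, but the tag format is as described above:

```python
import re

def split_reasoning(text: str):
    """Separate the <think>...</think> reasoning block from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return None, text.strip()  # no reasoning block present
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 is basic arithmetic.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```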

Roadmap & Feature Requests

Fireworks updates DeepSeek R1 and v3 in alignment with DeepSeek AI's official releases and Fireworks' own performance optimizations. Updates include bug fixes, efficiency improvements, and potential model refinements. Users can track updates through Fireworks documentation and announcements.
🔗 For the latest version information, refer to the Fireworks API documentation or join the Fireworks community Discord.

General Troubleshooting

If you're encountering an error while using DeepSeek v3 on Fireworks, follow these steps:
✅ Step 1: Check Fireworks' Status Page for any ongoing outages.
✅ Step 2: Verify API request formatting. Ensure:
  • No missing/invalid API keys
  • Proper request format
  • No exceeded rate limits or context window
✅ Step 3: Reduce request complexity if your request is too long.
✅ Step 4: Adjust parameters if experiencing instability:
  • Lower temperature for more deterministic responses
  • Adjust top_p to control randomness
  • Increase max_tokens to avoid truncation
✅ Step 5: Contact Fireworks support via the Fireworks community Discord or inquiries@fireworks.ai.
DeepSeek v3 and R1, like other LLMs, have a fixed maximum context length of 128K tokens.
If responses are getting cut off:
🔹 Possible Causes & Solutions:
1️⃣ Exceeded max_tokens setting → 🔧 Increase max_tokens
2️⃣ Requesting too much text in a single prompt → 🔧 Break input into smaller chunks
3️⃣ Model context window limit reached → 🔧 Summarize prior messages before appending new ones
💡 Fix:
response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3",
    messages=[{"role": "user", "content": "Generate a long article summary"}],
    max_tokens=4096,  # Adjust as needed
)
📌 Alternative Fix: If you need longer responses, re-prompt the model with the last part of the output and ask it to continue.
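The re-prompting pattern above can be sketched by rebuilding the message history; the helper name and the "continue" wording are illustrative, not part of the Fireworks API:

```python
# Sketch of the "re-prompt to continue" pattern: append the truncated
# assistant output to the history, then ask the model to pick up where
# it stopped. Roles follow the standard chat-completions format.
def continuation_messages(original_prompt: str, truncated_output: str):
    return [
        {"role": "user", "content": original_prompt},
        {"role": "assistant", "content": truncated_output},
        {"role": "user", "content": "Continue exactly where you left off."},
    ]

messages = continuation_messages(
    "Generate a long article summary",
    "...and the third key finding was",  # tail of the cut-off response
)
# With a configured client:
# response = client.chat.completions.create(
#     model="accounts/fireworks/models/deepseek-v3",
#     messages=messages,
#     max_tokens=4096,
# )
```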
Intermittent API response issues could be due to:
🔹 Common Causes & Fixes:
1️⃣ High Server Load – Fireworks may be experiencing peak traffic.
Fix: Retry the request after a few seconds or try during non-peak hours.
2๏ธโƒฃ Rate Limits or Spend Limits Reached โ€“ If youโ€™ve exceeded the API rate limits, requests may temporarily fail.
Fix: Check your rate limits and spend limits in the API dashboard and adjust your usage accordingly.
🔗 To increase spend limits, add credits: Fireworks Spend Limits
3️⃣ Network Connectivity Issues – Fireworks API may be unreachable due to network issues.
Fix: Restart your internet connection or use a different network/VPN.
📌 If problems persist, check Fireworks' status page or reach out via our Discord. 🚀
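The "retry after a few seconds" advice above can be sketched as a generic backoff loop. This is a client-side pattern, not a Fireworks feature; `make_request` stands in for any call to the API:

```python
import time

# Generic retry-with-exponential-backoff for transient API failures
# (high load, brief network blips).
def retry_with_backoff(make_request, max_attempts=4, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo with a stand-in that fails twice, then succeeds:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # ok
```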

3. Learn about R1 & V3

Stay up to date with the latest advancements and insights into DeepSeek models. Check out our blog, where experts from Fireworks break down everything you need to know about R1 and V3. We've also published videos on our YouTube channel.