Limitations & controls
Understanding model limitations, safety features, and token limits.
Safety Features
Q: Can safety filters or content restrictions be disabled on text generation models?
No, safety features and content restrictions for text generation models (such as Llama, Mistral, etc.) are embedded by the original model creators during training:
- Safety measures are integrated directly into the models by the teams that trained and released them.
- These are core behaviors of the model, not external filters.
- Different models may have varying levels of built-in safety.
- Fireworks.ai does not add additional censorship layers beyond what is inherent in the models.
- Original model behaviors cannot be modified via API parameters or configuration.
Note: For specific content handling needs, review the documentation of each model to understand its inherent safety features.
Token Limits
Q: What are the maximum completion token limits for models, and can they be increased?
- For most models, the maximum completion token limit is the full context window of the model, e.g. 128K for DeepSeek R1. However, `max_tokens` is set to 2K by default, so set it to a higher value if you plan to have long generations.
- Llama 3.1 405B has a 4096-token completion limit. Setting a higher `max_tokens` in API calls will not override this limit.
- You will see `"finish_reason": "length"` in responses when hitting a max token limit.
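For long generations, raise `max_tokens` in the request body. A minimal sketch of such a request payload, where the model id and token count are illustrative assumptions rather than recommendations:

```python
# Minimal sketch: raising max_tokens above the 2K default in a
# chat-completion request body. The model id and the value 8192
# are illustrative assumptions.
payload = {
    "model": "accounts/fireworks/models/deepseek-r1",  # assumed model id
    "max_tokens": 8192,  # raise the 2K default for long generations
    "messages": [{"role": "user", "content": "Summarize this document."}],
}
print(payload["max_tokens"])
```

Remember that per-model caps (such as the 4096-token limit for Llama 3.1 405B) still apply regardless of the value you pass.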
Example API Response at Limit:
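The sketch below shows a hypothetical, abridged response at the limit and a small helper for detecting it; real responses contain additional fields (`id`, `usage`, etc.), and the message content here is a placeholder:

```python
# Hypothetical, abridged example of a response that hit the max token
# limit; real responses include more fields (id, usage, etc.).
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "...truncated output..."},
            "finish_reason": "length",
        }
    ]
}

def hit_token_limit(resp: dict) -> bool:
    """True if any choice stopped because the max token limit was reached."""
    return any(c.get("finish_reason") == "length" for c in resp["choices"])

print(hit_token_limit(response))  # True
```

Checking `finish_reason` this way lets you detect truncation and, where the model allows it, retry with a larger `max_tokens`.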
Additional information
If you have questions about model limitations or token limits, you can:
- Contact support through Discord at discord.gg/fireworks-ai
- Reach out to your account representative (Enterprise customers)
- Email inquiries@fireworks.ai