Safety Features

Q: Can safety filters or content restrictions be disabled on text generation models?

No. Safety features and content restrictions for text generation models (such as Llama and Mistral) are embedded by the original model creators during training:

  • Safety measures are integrated directly into the models by the teams that trained and released them.
  • These are core behaviors of the model, not external filters.
  • Different models may have varying levels of built-in safety.
  • Fireworks.ai does not add censorship layers beyond what is inherent in the models.
  • Original model behaviors cannot be modified via API parameters or configuration.

Note: For specific content handling needs, review each model's documentation to understand its inherent safety features.

Token Limits

Q: What are the maximum completion token limits for models, and can they be increased?

Token limits are model-specific and have technical constraints:

Current Limitations:

  • Many models, such as Llama 3.1 405B, have a 4096-token completion limit.
  • Setting a higher max_tokens in API calls will not override this limit.
  • You will see "finish_reason": "length" in responses when hitting this limit.

Why Limits Exist:

  • Resource management for shared infrastructure
  • Prevents single requests from monopolizing resources
  • Helps maintain service availability for all users

Working with Token Limits:

  • Break longer generations into multiple requests.
    • Note: this typically requires resending the prompt or prior context.
  • Be mindful that repeated context increases total token usage.
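The steps above can be sketched as a small loop. This is illustrative, not an official Fireworks client: `generate` stands in for whatever function sends one completion request, and the `"text"` / `"finish_reason"` dict keys are a hypothetical response shape modeled on the fields shown in this FAQ.

```python
def generate_long(generate, prompt: str, max_rounds: int = 4) -> str:
    """Accumulate output across multiple requests until the model stops
    on its own or max_rounds is exhausted."""
    parts = []
    context = prompt
    for _ in range(max_rounds):
        resp = generate(context)
        parts.append(resp["text"])
        if resp["finish_reason"] != "length":
            break  # natural stop; the output is complete
        # Truncated at the completion limit: resend the original prompt
        # plus everything generated so far. This repeated context counts
        # toward prompt_tokens on every round, increasing total usage.
        context = prompt + "".join(parts)
    return "".join(parts)
```

The `max_rounds` cap is a safety valve so a model that always stops with `"length"` cannot loop indefinitely.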

Example API Response at Limit:

{
    "finish_reason": "length",
    "usage": {
        "completion_tokens": 4096,
        "prompt_tokens": 4206,
        "total_tokens": 8302
    }
}
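A client can detect this case by inspecting the parsed response. A minimal sketch, assuming the response body is the JSON shown above (field names match the example; the surrounding request code is omitted):

```python
import json

# Parse a response body shaped like the example above.
response = json.loads("""
{
    "finish_reason": "length",
    "usage": {
        "completion_tokens": 4096,
        "prompt_tokens": 4206,
        "total_tokens": 8302
    }
}
""")

# "length" means the model was cut off at the completion limit,
# not that it finished naturally ("stop").
hit_limit = response["finish_reason"] == "length"
if hit_limit:
    used = response["usage"]["completion_tokens"]
    print(f"Completion truncated after {used} tokens; consider a follow-up request.")
```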

Additional Information

If you experience any issues during these processes, you can: