Models & Inference
What are the maximum completion token limits for models, and can they be increased?
- For most models, the maximum completion token limit is the model's full context window, e.g. 128K for DeepSeek R1. Note that `max_tokens` is set at 2K by default, so set it to a higher value if you plan to have long generations (see the sketch after this list).
- Llama 3.1 405B has a 4096-token completion limit. Setting a higher `max_tokens` in API calls will not override this limit.
- You will see `"finish_reason": "length"` in responses when hitting a max token limit.
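A minimal sketch of raising `max_tokens` through an OpenAI-compatible client. The `base_url`, API key, and model identifier below are placeholders, not values from this documentation; substitute your provider's actual endpoint and model name:

```python
from openai import OpenAI

# Placeholder endpoint and key; use your provider's values.
client = OpenAI(
    base_url="https://api.example.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1",  # placeholder model identifier
    messages=[{"role": "user", "content": "Write a detailed survey of compiler optimizations."}],
    max_tokens=32000,  # raise from the 2K default for long generations
)

# "length" here means the generation was cut off at the max token limit.
print(response.choices[0].finish_reason)
```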
Example API Response at Limit:
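The exact fields vary by provider; the shape below follows the common OpenAI-compatible schema, and all values are illustrative:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "...generation truncated here"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 2048,
    "total_tokens": 2060
  }
}
```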