Understanding error codes helps you quickly diagnose and resolve issues when making inference requests to the Fireworks API.

Common error codes

Code | Error Name | Possible Issue(s) | How to Resolve
400 | Bad Request | Invalid input or malformed request. | Review the request parameters and ensure they match the expected format.
401 | Unauthorized | Invalid API key or insufficient permissions. | Verify your API key and ensure it has the correct permissions.
402 | Payment Required | Account is not on a paid plan or has exceeded usage limits. | Check your billing status and ensure your payment method is up to date. Upgrade your plan if necessary.
403 | Forbidden | Authentication issues. | Verify you have the correct API key.
404 | Not Found | The API endpoint path doesn’t exist, the model doesn’t exist, the model is not deployed, or you don’t have permission to access it. | Verify the URL path in your request and ensure you are using the correct API endpoint. Check if the model exists and is available. Ensure you have the necessary permissions.
405 | Method Not Allowed | Using an unsupported HTTP method (e.g., using GET instead of POST). | Check the API documentation for the correct HTTP method.
408 | Request Timeout | The request took too long to complete, possibly due to server overload or network issues. | Retry the request after a brief wait. Consider increasing the timeout value if applicable.
412 | Precondition Failed | Account is suspended or there’s an issue with account status. This error also occurs when attempting to invoke a LoRA model that failed to load. | Check your account status and billing information. For LoRA models, ensure the model was uploaded correctly and is compatible. Contact support if the issue persists.
413 | Payload Too Large | Input data exceeds the allowed size limit. | Reduce the size of the input payload (e.g., by trimming large text or image data).
429 | Too Many Requests | Rate limited (serverless) or deployment capacity exceeded (dedicated/on-demand). | See “Understanding 429 errors” below.
500 | Internal Server Error | Server-side code bug that is unlikely to resolve on its own. | Contact Fireworks support immediately, as this error typically requires intervention from the engineering team.
502 | Bad Gateway | The server received an invalid response from an upstream server. | Wait and retry the request. If the error persists, it may indicate a server outage.
503 | Service Unavailable | The service is down for maintenance or experiencing issues. | Retry the request after some time. Check the status page for maintenance announcements.
504 | Gateway Timeout | The server did not receive a response in time from an upstream server. | Wait briefly and retry the request. Consider using a shorter input prompt if applicable.
520 | Unknown Error | An unexpected error occurred with no clear explanation. | Retry the request. If the issue persists, contact support for further assistance.
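
One way to act on these codes in client code is to group them by the kind of response they call for. The sketch below is illustrative only: the bucket names and the `classify_status` helper are hypothetical, and the grouping is simply one reading of the resolution guidance above, not an official classification.

```python
# Hypothetical grouping derived from the resolution guidance above.
# Codes where the guidance is "wait and retry":
RETRYABLE = {408, 429, 502, 503, 504, 520}
# Codes where the request, API key, or account needs fixing first:
FIX_CLIENT_SIDE = {400, 401, 402, 403, 404, 405, 412, 413}
# Codes where the guidance is to contact support:
CONTACT_SUPPORT = {500}


def classify_status(status: int) -> str:
    """Return a coarse action label for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in FIX_CLIENT_SIDE:
        return "fix_client_side"
    if status in CONTACT_SUPPORT:
        return "contact_support"
    # Anything not covered here falls through to manual investigation.
    return "unknown"
```

A caller might retry automatically only when the label is `"retry"` and surface everything else to the developer or operator.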

Understanding 429 errors

HTTP 429 (Too Many Requests) can be returned on both serverless and dedicated/on-demand deployments, but the cause and recommended action differ.

Serverless deployments

On serverless, a 429 means your account has exceeded the current rate limit. Serverless rate limits are dynamic and grow with sustained usage. To resolve:
  • Wait briefly and retry with exponential backoff
  • Monitor the x-ratelimit-remaining-requests response header to stay within your limits
  • For higher throughput, upgrade to an on-demand deployment
See Rate Limits & Quotas for full details on serverless rate limiting.
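
The retry advice above can be sketched as a small wrapper. This is a minimal illustration, not a prescribed client: `call_with_backoff`, `backoff_delay`, and `remaining_requests` are hypothetical helper names, the delay values are arbitrary defaults, and `send` stands in for whatever function performs your HTTP request and returns `(status, headers, body)`. It also assumes header names are exposed in lowercase.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound in seconds for the wait before retry number `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def remaining_requests(headers: Mapping[str, str]) -> Optional[int]:
    """Read x-ratelimit-remaining-requests, or None if absent or not a number."""
    value = headers.get("x-ratelimit-remaining-requests")
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on HTTP 429 with exponential backoff and jitter."""
    attempt = 0
    while True:
        response = send()
        if response[0] != 429 or attempt >= max_retries:
            return response
        # Full jitter: wait a random time up to the exponential bound.
        sleep(random.uniform(0, backoff_delay(attempt)))
        attempt += 1
```

Passing `sleep` as a parameter keeps the wrapper easy to test. The randomized wait is a common choice to stop many clients from retrying in lockstep.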

Dedicated and on-demand deployments

On dedicated and on-demand deployments, there are no account-level rate limits. A 429 instead indicates that your deployment’s processing capacity is saturated. The inference server returns 429 when the number of queued and active requests exceeds what the deployment’s GPUs can handle at that moment. This is a capacity signal, not quota enforcement. To resolve:
  • Reduce burst concurrency — lower the number of parallel requests or add client-side rate limiting with backoff
  • Scale up the deployment — add more replicas or GPUs to increase throughput
  • Optimize request patterns — use shorter prompts, reduce max output tokens, or batch requests to lower per-request resource consumption
If you consistently see 429 errors on a dedicated or on-demand deployment, it’s an indicator that your current GPU allocation is undersized for your traffic. Contact us to discuss increasing your deployment capacity.
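
Reducing burst concurrency on the client can be as simple as capping the number of requests in flight. The following is a sketch under assumptions: `run_bounded` is a hypothetical helper, `worker` stands in for your own coroutine that sends one request, and the default limit of 8 is arbitrary and should be tuned to your deployment.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_in_flight: int = 8,
) -> List[R]:
    """Run `worker` over `items` with at most `max_in_flight` running at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(item: T) -> R:
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    results = await asyncio.gather(*(guarded(item) for item in items))
    return list(results)
```

For example, `asyncio.run(run_bounded(prompts, send_one, max_in_flight=4))` would process all prompts while never having more than four requests outstanding, where `prompts` and `send_one` are your own data and request function.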

Troubleshooting tips

If you encounter an error not listed here:
  • Enable detailed error logging in your application to capture the full error response, including error messages and request IDs, which helps with debugging.
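
A minimal sketch of such logging is shown here. It rests on assumptions: the `x-request-id` header name and the logger name are placeholders, not confirmed Fireworks behavior, so check an actual error response for where the request ID appears. `describe_error` and `log_error` are hypothetical helpers.

```python
import json
import logging
from typing import Any, Dict, Mapping

logger = logging.getLogger("inference_client")  # placeholder logger name


def describe_error(
    status: int,
    headers: Mapping[str, str],
    body: str,
    request_id_header: str = "x-request-id",  # assumed header name
) -> Dict[str, Any]:
    """Collect the status, request ID (if present), and full body of a failed call."""
    try:
        parsed: Any = json.loads(body)
    except ValueError:
        # Not JSON (for example an HTML error page): keep the raw text.
        parsed = body
    return {
        "status": status,
        "request_id": headers.get(request_id_header),
        "body": parsed,
    }


def log_error(status: int, headers: Mapping[str, str], body: str) -> Dict[str, Any]:
    """Log the full error details as one JSON line and return them."""
    details = describe_error(status, headers, body)
    logger.error("inference request failed: %s", json.dumps(details))
    return details
```

Keeping the whole response body, rather than only the status code, preserves the error message and any identifiers that support may ask for.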