Custom model issues

Q: What are the common issues when deploying custom models?

Here are key areas to troubleshoot for custom model deployments:

1. Deployment hanging or crashing

Common causes:

  • Missing model files, especially when using Hugging Face models
  • Symlinked files not uploaded correctly
  • Outdated firectl version

Solutions:

  • Download models without symlinks using:
    huggingface-cli download model_name --local-dir=/path --local-dir-use-symlinks=False
    
  • Update firectl to the latest version
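The steps above can be sketched end to end. The model path, model ID, and account name below are placeholders, and the firectl subcommands should be verified against `firectl --help` for your installed version:

```shell
# Download the model without symlinks so all weight files upload correctly
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
  --local-dir=/tmp/my-model --local-dir-use-symlinks=False

# Confirm firectl is current before re-deploying
firectl version

# Upload the model and create a deployment (names are placeholders)
firectl create model my-model /tmp/my-model
firectl create deployment accounts/my-account/models/my-model
```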

2. LoRA adapters vs full models

  • Compatibility: LoRA adapters only work with the base model they were trained on; verify the base model matches before deploying.
  • Performance: Inference may be slightly slower with LoRA adapters, but output quality should match the original model.
  • Troubleshooting quality drops:
    • Check the model configuration
    • Review the conversation template
    • Set echo: true on requests to verify the exact prompt the model receives
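To inspect exactly what a LoRA deployment receives after templating, echo can be set on a raw completions request. This is a hedged sketch assuming Fireworks' OpenAI-compatible completions endpoint; the model name and API key are placeholders:

```shell
# echo: true returns the prompt along with the completion, which helps
# spot conversation-template mismatches that degrade quality
curl https://api.fireworks.ai/inference/v1/completions \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/my-account/models/my-lora-adapter",
    "prompt": "Hello, how are you?",
    "max_tokens": 16,
    "echo": true
  }'
```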

3. Performance optimization factors

Consider adjusting the following for improved performance:

  • Accelerator count and accelerator type
  • Long prompt settings to handle complex inputs
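As a sketch, these knobs are typically set when creating or updating a deployment. The flag names below are assumptions based on common firectl conventions, not confirmed syntax; check them with `firectl create deployment --help`:

```shell
# Placeholders throughout; verify flag names against your firectl version
firectl create deployment accounts/my-account/models/my-model \
  --accelerator-type=NVIDIA_A100_80GB \
  --accelerator-count=2 \
  --long-prompt
```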

Autoscaling

Q: What should I expect for deployment and scaling performance?

  • Initial deployment: Should complete within minutes
  • Scaling from zero: You may experience brief availability delays while the system scales up
  • Troubleshooting: If deployment takes over 1 hour, this typically indicates a crash and should be investigated
  • Best practice: Monitor deployment status and contact support if deployment times are unusually long
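A minimal way to monitor a fresh deployment, assuming `firectl get deployment` reports its current state (the deployment ID and output format are placeholders):

```shell
# Poll every 60 seconds; a deployment still not ready after ~1 hour
# likely crashed and is worth escalating to support
for i in $(seq 1 60); do
  firectl get deployment my-deployment-id
  sleep 60
done
```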

Performance questions

Q: I have more specific questions about performance improvements

For detailed discussions on performance and optimization options:

  • Schedule a consultation directly with our PM, Ray Thai (calendly)
  • Discuss your specific use cases
  • Get personalized recommendations
  • Review advanced configuration options

Note: Monitor costs carefully during the deployment and testing phase, as repeated deployments and tests can quickly consume credits.


Additional resources