Deploy Custom LLM to Production

Having a model that can generate text from a prompt is great, but what good is it if you can't use it in production? In this tutorial, you'll learn how to:

  • Merge your adapter with the base model
  • Push your model to the Hugging Face Model Hub
  • Test your model using the Hugging Face Inference API
  • Create a FastAPI app to serve your model
  • Deploy your FastAPI app to production with Docker

Merge Your Adapter with the Base Model

