# Deploy Custom LLM to Production
Having a model that can generate text from a prompt is great, but what good is it if you can't use it in production? In this tutorial, you'll learn how to do the following (each step is sketched briefly after the list):
- Merge your adapter with the base model
- Push your model to the Hugging Face Model Hub
- Test your model using the Hugging Face Inference API
- Create a FastAPI app to serve your model
- Deploy your FastAPI app to production with Docker
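As a preview of the first two steps, here is a minimal sketch of merging a LoRA adapter into its base model with PEFT and pushing the result to the Hub. The base model name, adapter path, and repo id below are placeholders, assuming a LoRA fine-tune; substitute your own.

```python
# Minimal sketch: merge a LoRA adapter into its base model, save the result,
# and push it to the Hugging Face Hub. Names and paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"    # assumed base model
ADAPTER = "./adapter"                # assumed path to your trained adapter
REPO = "your-username/your-model"    # assumed Hub repo id

base = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, ADAPTER)
merged = model.merge_and_unload()    # fold the LoRA weights into the base

tokenizer = AutoTokenizer.from_pretrained(BASE)
merged.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")

# Requires `huggingface-cli login` (or an HF_TOKEN env var) beforehand.
merged.push_to_hub(REPO)
tokenizer.push_to_hub(REPO)
```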
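Once the model is on the Hub, you can smoke-test it without running any infrastructure of your own. Here's a sketch using `InferenceClient` from `huggingface_hub`; the repo id is the same placeholder as above, and the token is assumed to live in an `HF_TOKEN` environment variable.

```python
# Minimal sketch: query the hosted Inference API for the pushed model.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="your-username/your-model",  # assumed repo id
    token=os.environ["HF_TOKEN"],      # your Hub access token
)

# Returns the generated string directly.
print(client.text_generation("Write a haiku about deployment:", max_new_tokens=64))
```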
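Serving the model yourself comes down to wrapping a generation call in an HTTP endpoint. Here's a minimal FastAPI sketch, assuming the merged model was saved to `./merged-model` as in the first sketch; a production app would add things like request batching, authentication, and streaming.

```python
# Minimal sketch: serve the merged model behind a FastAPI endpoint.
# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="./merged-model")  # assumed path

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    outputs = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": outputs[0]["generated_text"]}
```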
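Finally, containerizing the app makes it portable to any host that runs Docker. A minimal Dockerfile sketch, assuming the app above lives in `main.py` with its dependencies listed in `requirements.txt`; GPU serving would need a CUDA base image instead of the slim Python one.

```dockerfile
# Minimal sketch: containerize the FastAPI app (CPU-only base image).
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

You'd then build and run it with something like `docker build -t llm-api .` followed by `docker run -p 8000:8000 llm-api`.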