# Deploy Custom LLM to Production
Having a model that can generate text from a prompt is great, but what good is it if you can't use it in production? In this tutorial, you'll learn how to do the following (each step is sketched briefly after the list):
- Merge your adapter with the base model
- Push your model to the Hugging Face Model Hub
- Test your model using the Hugging Face Inference API
- Create a FastAPI app to serve your model
- Deploy your FastAPI app to production with Docker
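As a preview of the first two steps, here is a minimal sketch of merging a LoRA adapter into its base model with PEFT and pushing the result to the Hub. The base model name, adapter path, and repo id below are placeholders, assuming a LoRA fine-tune; substitute your own.

```python
# Minimal sketch: merge a LoRA adapter into its base model, save the result,
# and push it to the Hugging Face Hub. Names and paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"    # assumed base model
ADAPTER = "./adapter"                # assumed path to your trained adapter
REPO = "your-username/your-model"    # assumed Hub repo id

base = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, ADAPTER)
merged = model.merge_and_unload()    # fold the LoRA weights into the base

tokenizer = AutoTokenizer.from_pretrained(BASE)
merged.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")

# Requires `huggingface-cli login` (or an HF_TOKEN env var) beforehand.
merged.push_to_hub(REPO)
tokenizer.push_to_hub(REPO)
```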
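Once the model is on the Hub, you can smoke-test it without running any infrastructure of your own. Here's a sketch using `InferenceClient` from `huggingface_hub`; the repo id is the same placeholder as above, and the token is assumed to live in an `HF_TOKEN` environment variable.

```python
# Minimal sketch: query the hosted Inference API for the pushed model.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="your-username/your-model",  # assumed repo id
    token=os.environ["HF_TOKEN"],      # your Hub access token
)

# Returns the generated string directly.
print(client.text_generation("Write a haiku about deployment:", max_new_tokens=64))
```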
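Serving the model yourself comes down to wrapping a generation call in an HTTP endpoint. Here's a minimal FastAPI sketch, assuming the merged model was saved to `./merged-model` as in the first sketch; a production app would add things like request batching, authentication, and streaming.

```python
# Minimal sketch: serve the merged model behind a FastAPI endpoint.
# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="./merged-model")  # assumed path

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    outputs = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": outputs[0]["generated_text"]}
```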
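Finally, containerizing the app makes it portable to any host that runs Docker. A minimal Dockerfile sketch, assuming the app above lives in `main.py` with its dependencies listed in `requirements.txt`; GPU serving would need a CUDA base image instead of the slim Python one.

```dockerfile
# Minimal sketch: containerize the FastAPI app (CPU-only base image).
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

You'd then build and run it with something like `docker build -t llm-api .` followed by `docker run -p 8000:8000 llm-api`.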