Why I self-host the models · Muhammad Sohail

When I build a RAG system for an institution, the first real question is almost never about the model. It’s about where the data goes.

A university’s HR documents shouldn’t take a round trip through someone else’s API to answer a question about parental leave. So I default to open-source models I can run on our own hardware. It’s slower to set up, less flashy than calling a hosted endpoint, and it means I own the GPU problems too. But it’s the version that’s actually allowed to exist in production — and “allowed to exist” beats “slightly higher benchmark score” every time.

There’s a second reason, less about compliance and more about temperament. A self-hosted, open-source stack doesn’t change under me. No deprecated endpoints, no surprise price changes, no model quietly behaving differently one Tuesday. For a system people depend on daily, that boring predictability is a feature.

This is the thread that runs through most of my decisions: the dependable choice usually wins. Postgres over the database of the month. Docker over a bespoke deploy. Open weights I control over a black box I rent. None of it is exciting. All of it is still running.