Hybrid search beat a bigger model

When the university assistant gave a wrong answer, my first instinct was the wrong one: reach for a bigger model. It rarely helped. The model wasn’t the problem — it was answering faithfully from passages that didn’t contain the answer in the first place. Garbage in, confident garbage out.

The fix was retrieval, not generation.

Semantic search alone has a blind spot

Pure vector search is great at matching meaning. Ask “how do I request time off” and it finds the leave-policy page even if those exact words never appear. But it’s oddly bad at the things institutions care about most: a form number, an acronym, an exact policy code. “Form 3401-B” embeds into roughly the same place as “Form 3402-B”, and now you’re quoting the wrong form.

Keyword search has the opposite strengths. So I run both and merge them.
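To illustrate the keyword side, here is a minimal scorer. It assumes Okapi BM25, a standard keyword-ranking function, with the usual defaults k1 = 1.5 and b = 0.75. The tokenizer regex is a hypothetical choice that keeps codes such as 3401-B as single tokens. Neither is necessarily what the assistant actually runs.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: keeps codes like "3401-B" as one lowercase token
# instead of splitting them on the hyphen.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every doc against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant: rare terms (like a form code) weigh more.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "Form 3401-B: request for parking permit renewal.",
    "Form 3402-B: request for building access card.",
    "Leave policy: how staff request time off.",
]
scores = bm25_scores("Where do I find Form 3401-B?", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # Form 3401-B: request for parking permit renewal.
```

Because “3401-b” and “3402-b” are distinct tokens, only the first document gets credit for the code. The second one matches only the shared word “form”. One trade-off of this tokenizer is that it also keeps ordinary hyphenated words like “part-time” whole, so a query for “time” won’t match them.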

```mermaid
flowchart LR
    Q[Question] --> S[Semantic<br/>search]
    Q --> K[Keyword<br/>search]
    S --> M[Merge]
    K --> M
    M --> R[Rerank]
    R --> T[Top passages<br/>to the model]
```

Two retrievers, one merged-and-reranked result set.
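The merge and rerank boxes in the diagram can be sketched roughly like this. The merge step assumes Reciprocal Rank Fusion (RRF) with the commonly used constant k = 60, which is one standard way to combine two ranked lists. The defaults of 20 candidates per retriever and 5 final passages match the numbers given later in this post. `rerank_score` is a hypothetical stand-in for whatever reranker model scores each candidate against the question.

```python
from collections.abc import Callable

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) to a doc's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_retrieve(
    semantic_ids: list[str],
    keyword_ids: list[str],
    rerank_score: Callable[[str], float],
    per_retriever: int = 20,
    final_k: int = 5,
) -> list[str]:
    """Fuse the top candidates from both retrievers, then keep the reranker's best."""
    fused = rrf_merge([semantic_ids[:per_retriever], keyword_ids[:per_retriever]])
    # sorted() is stable, so equal rerank scores keep their fused (RRF) order.
    return sorted(fused, key=rerank_score, reverse=True)[:final_k]

# Toy example: doc ids ranked by each retriever, best first.
semantic = ["leave-policy", "form-3402-b", "form-3401-b"]
keyword = ["form-3401-b", "form-3402-b"]
# Hypothetical reranker scores; a real reranker would score (question, passage) pairs.
toy_scores = {"form-3401-b": 0.9, "form-3402-b": 0.4, "leave-policy": 0.1}
print(hybrid_retrieve(semantic, keyword, toy_scores.__getitem__, final_k=2))
# ['form-3401-b', 'form-3402-b']
```

RRF uses only ranks, so it doesn’t matter that vector similarities and keyword scores live on different scales. The fused order then serves as a tiebreaker whenever the reranker can’t separate two passages.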

What actually moved the needle

Reranking the merged set. Pulling 20 candidates from each retriever and letting a reranker pick the best 5 mattered more than any model swap.
Keeping exact terms exact. The keyword half rescued every question that hinged on a code, a name, or a number.
Measuring it. I built a small set of real questions with known-good passages, so “did retrieval get better” became a number instead of a vibe.
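A measurement harness for a question set like that can be very small. This sketch assumes each entry pairs a question with the IDs of its known-good passages. It reports hit rate at k: the share of questions where at least one known-good passage reaches the top k. The metric and the names `hit_rate_at_k` and `toy_retrieve` are illustrative, not necessarily what was used here.

```python
from collections.abc import Callable

def hit_rate_at_k(
    eval_set: list[tuple[str, set[str]]],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Fraction of questions whose top-k results contain a known-good passage."""
    hits = 0
    for question, good_ids in eval_set:
        top_k = retrieve(question)[:k]
        if good_ids.intersection(top_k):
            hits += 1
    return hits / len(eval_set)

eval_set = [
    ("How do I request time off?", {"leave-policy"}),
    ("Where is Form 3401-B?", {"form-3401-b"}),
]

def toy_retrieve(question: str) -> list[str]:
    # Stand-in for the real pipeline: always returns the same ranking.
    return ["form-3402-b", "leave-policy", "form-3401-b"]

print(hit_rate_at_k(eval_set, toy_retrieve, k=2))  # 0.5
```

Running the same set before and after a retrieval change, such as adding the keyword half or the reranker, turns “did retrieval get better” into a comparison of two numbers.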

The bigger model is still there if I want it. But most days, the boring combination of two retrievers and a reranker is what made people stop saying “that’s wrong.”
