Institutional RAG Platform · Muhammad Sohail

This is the system I spend most of my time on — the internal assistant for the HR and IT departments at Université d’Évry Paris-Saclay. Staff ask it questions in plain French or English, and it answers from the university’s own documents instead of guessing.

The problem

University staff were losing time hunting through PDFs, intranet pages, and old email threads to answer routine questions: leave policies, IT procedures, form numbers. The information existed; it was just scattered and unsearchable. One constraint shaped everything: this is HR and IT data, so none of it can be sent off the premises for processing by an outside API.

How it fits together

flowchart LR
subgraph Ingest[Ingestion -- offline]
  D[Documents<br/>PDF, DOCX] --> P[Parse &<br/>chunk]
  P --> E[Embed]
  E --> V[(Vector store<br/>FAISS + pgvector)]
end
subgraph Query[Query -- real time]
  U[Staff question] --> H[Hybrid search<br/>semantic + keyword]
  V --> H
  H --> R[Rerank]
  R --> L[LLM<br/>self-hosted]
  L --> A[Answer + sources]
end

Ingestion runs offline; the query path is what staff hit in real time.
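
To make the "Parse & chunk" step concrete, here is a minimal sketch of word-window chunking with overlap. It is an illustration under stated assumptions, not the platform's actual code: the `Chunk` type, the `chunk_words` function, and the window sizes are all hypothetical, and the real pipeline may well split on sentences, headings, or tokens instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # hypothetical identifier for the source document
    index: int   # position of the chunk within that document
    text: str


def chunk_words(doc_id: str, text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for i, start in enumerate(range(0, len(words), step)):
        chunks.append(Chunk(doc_id, i, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words("demo-doc", sample, size=4, overlap=1):
        print(chunk.index, chunk.text)
```

The overlap is there so that a sentence falling on a chunk boundary still appears whole in at least one chunk; each chunk would then go on to the embedding step.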

What’s in it

Multimodal ingestion — handles tables and scanned PDFs, not just clean text.
Hybrid retrieval — semantic search catches meaning; keyword search catches exact terms like form numbers and acronyms. Neither alone was good enough.
Self-hosted serving — open-source models in the university data center, so no document ever leaves the premises.
Role-based access — HR content stays with HR.
Observability — every query is logged, so I can see what people ask and where retrieval comes back thin.
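
The hybrid retrieval and role-based access items above can be sketched together in a few lines. This is a hedged illustration only: it assumes the two searches each return a best-first list of document IDs and merges them with reciprocal rank fusion (RRF), one common way to combine rankings. The write-up does not say which fusion method the platform uses, and the function names, document IDs, and role labels here are hypothetical.

```python
from collections import defaultdict


def allowed(doc_roles: dict[str, set[str]], doc_id: str, user_roles: set[str]) -> bool:
    """A document is visible if it shares at least one role with the user.

    Documents with no role entry are hidden (deny by default).
    """
    return bool(doc_roles.get(doc_id, set()) & user_roles)


def hybrid_rank(semantic: list[str], keyword: list[str],
                doc_roles: dict[str, set[str]], user_roles: set[str],
                k: int = 60) -> list[str]:
    """Fuse two best-first rankings with reciprocal rank fusion (RRF)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in (semantic, keyword):
        # Drop documents the user may not see before they earn any score.
        visible = [d for d in ranking if allowed(doc_roles, d, user_roles)]
        for rank, doc_id in enumerate(visible, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical documents and roles, for illustration only.
    doc_roles = {
        "leave-policy": {"hr"},
        "vpn-setup": {"it"},
        "form-a12": {"hr", "it"},
        "payroll-faq": {"hr"},
    }
    semantic = ["leave-policy", "form-a12", "vpn-setup"]
    keyword = ["form-a12", "payroll-faq", "vpn-setup"]
    print(hybrid_rank(semantic, keyword, doc_roles, {"hr"}))
```

A document that appears in both rankings accumulates score from each, which is why an exact form-number match from keyword search can outrank a result that is only semantically close. Filtering by role before scoring means restricted documents never reach the later stages.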

Stack

Python, LangChain, FastAPI, PostgreSQL + pgvector, FAISS, vLLM, Docker, Grafana.

Result

It serves the HR and IT teams day to day. The biggest lesson: retrieval quality, not model size, decided whether people trusted it. Once I added reranking and the keyword half of the search, the “that’s wrong” complaints mostly disappeared.
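
For readers unfamiliar with reranking, the idea can be sketched as follows: take the passages that retrieval returned and re-order them with a second, more careful relevance score. This is an illustration under assumptions, not the platform's implementation. The `rerank` and `overlap_score` names are hypothetical, and the term-overlap scorer is a toy stand-in for whatever model-based scorer (for example a cross-encoder) a real system would plug in.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Re-order retrieved passages by a (query, passage) relevance score."""
    scored = [(score(query, passage), i, passage)
              for i, passage in enumerate(candidates)]
    # Higher score first; ties keep the original retrieval order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [passage for _, _, passage in scored[:top_n]]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in scorer: fraction of query terms found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


if __name__ == "__main__":
    # Hypothetical passages, for illustration only.
    passages = [
        "Parking permits are renewed each September.",
        "Annual leave requests use form A12.",
        "Form A12 must be signed by your manager.",
    ]
    for passage in rerank("annual leave form", passages, overlap_score, top_n=2):
        print(passage)
```

Because the scorer is passed in as a function, swapping the toy scorer for a real model changes one argument rather than the pipeline around it.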