Muhammad Sohail

Data & AI Engineer · Paris, France

Data & AI Engineer building production retrieval systems and self-hosted LLM infrastructure. I care about software that ships and keeps running.

About

I'm a Data & AI Engineer based near Paris. Most of my work lives in one corner of the field: retrieval-augmented generation (RAG), which means getting real documents into a language model and trustworthy answers back out, and building the infrastructure that keeps it running every day.

I run the internal AI platform at Université d'Évry Paris-Saclay, where it serves the HR and IT teams. I built the first version of it during my master's thesis, then stayed on to turn the prototype into something people actually depend on.

I lean toward self-hosted, open-source models when the data shouldn't leave the building, and toward boring, reliable infrastructure over whatever framework launched last week. I'd rather ship a system that handles auth, logging, and failure than demo one that only works on the happy path.

Experience

Data & AI Engineer · Université d'Évry Paris-Saclay · Sep 2025 – Present
  • Design and run the university's internal AI platform for the HR and IT departments, from the retrieval pipeline down to monitoring.
  • Built a multimodal document pipeline that ingests the messy PDFs people actually have, and a hybrid search that mixes semantic and keyword matching.
  • Operate self-hosted LLM serving with French and English support, plus custom integrations for directory lookups, ticketing, and workflow automation.
  • Set up logging and observability so I catch problems before users report them.
Python · LangChain · FastAPI · PostgreSQL · FAISS · Docker · vLLM
Data Science Intern (Master's Thesis) · Université d'Évry · Mar – Jun 2025
  • Built the first production HR & IT chatbot using retrieval-augmented generation, deliberately choosing open-source models for data sovereignty.
  • Engineered the document-parsing pipeline with Unstructured.io, Docling, and LlamaParse.
  • Deployed it inside the university data center — no data leaving the premises.
Data Analyst Intern · CodeClause (Remote) · Oct 2023 – Apr 2024
  • Wrote ETL routines to clean and reshape messy real-world datasets.
  • Built interactive dashboards and statistical reports for non-technical stakeholders.

Education

M.Sc. Data Science & Network Intelligence · 2024 – 2025

Télécom SudParis — Institut Polytechnique de Paris

  • Graduated with honors (mention), GPA 15.81/20.
  • Thesis on multimodal RAG for university information management.
B.Sc. Computer Science · 2019 – 2023

University of Malakand

  • First-class honors, GPA 3.75/4.0.

Skills

AI & LLMs

RAG · LangChain · LangGraph · vLLM · Ollama · Hugging Face · OpenAI & Anthropic APIs

Machine Learning

PyTorch · scikit-learn · XGBoost · Pandas · NumPy

Backend & Data

Python · SQL · FastAPI · PostgreSQL · pgvector · FAISS · Redis

Infrastructure

Docker · Linux · Nginx · Git · Grafana · OpenTelemetry

Contact

Available for freelance and consulting work. The fastest way to reach me is email — I usually reply within a day.

Based in Paris, France. Languages: English, Urdu, Pashto, French.