Parkinson's Progression Prediction
My undergraduate research project. It uses gradient-boosted ensembles to identify Parkinson’s subtypes and predict how the disease is likely to progress, from clinical and genetic features.
The problem
Disease progression isn’t uniform — patients fall into subtypes that progress differently. A model that just outputs a risk number isn’t enough; a clinician needs to know why, in terms they can question. So interpretation had to be a first-class part of the system, not an afterthought.
From data to a usable prediction
flowchart LR
    D[Clinical &<br/>genetic data] --> F[Feature<br/>engineering]
    F --> M[Gradient-boosted<br/>ensemble]
    M --> P[Subtype +<br/>progression]
    P --> I[SHAP<br/>interpretation]
    I --> UI[Streamlit<br/>dashboard]
A classical ML pipeline, with interpretation built in.
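For a rough feel of the modelling and interpretation stages, here is a minimal sketch rather than the project's actual code. It makes several assumptions: the data is synthetic (`make_classification` stands in for the clinical and genetic table), the feature names are placeholders, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost / LightGBM, and permutation importance stands in for SHAP as a simpler global importance measure. It covers only the subtype classification step, not progression prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical and genetic feature table (assumption:
# three subtypes, eight features, four of which carry signal).
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Gradient-boosted ensemble (scikit-learn's implementation, used here in
# place of XGBoost / LightGBM).
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Per-patient subtype probabilities and held-out accuracy.
subtype_proba = model.predict_proba(X_test)
accuracy = model.score(X_test, y_test)

# Global importance: how much held-out accuracy drops when each feature is
# shuffled. This is a stand-in for SHAP, which attributes per prediction.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"held-out accuracy: {accuracy:.2f}")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A ranked list like this is the kind of output a dashboard can surface next to the prediction, so the number comes with something to question.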
What stuck with me
Accuracy alone wasn’t enough. Adding feature-importance explanations and a readable dashboard turned a number into something a human could interrogate and trust. That instinct — make the system inspectable — is the same one I carry into my RAG work now.
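To make "feature-importance explanations" concrete, the sketch below computes exact Shapley values for a single prediction, which is the quantity SHAP estimates efficiently for tree models. Everything in it is illustrative and assumed: `toy_risk`, its three inputs, and the baseline patient are made up for the example and are not features or values from the project. The brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only for small illustrative cases.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical risk score with an interaction term (not a real model)."""
    age, motor_score, variant = features
    return 0.02 * age + 0.5 * motor_score + 1.5 * variant * motor_score


patient = [70.0, 2.0, 1.0]   # made-up patient
baseline = [60.0, 1.0, 0.0]  # made-up reference patient

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(["age", "motor_score", "variant"], phi):
    print(f"{name}: {contribution:+.2f}")

# The attributions add up to the gap between this patient's score and the
# baseline score, which is what makes them readable as "why this number".
print(sum(phi), toy_risk(patient) - toy_risk(baseline))
```

Because the contributions sum to the difference from the baseline, each one can be read as that feature's share of the prediction, which is the property that lets a clinician push back on a specific driver instead of on the score as a whole.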
Stack
Python, XGBoost / LightGBM, SHAP, scikit-learn, Streamlit.