Parkinson's Progression Prediction

My undergraduate research project. It uses gradient-boosted ensembles, trained on clinical and genetic features, to identify Parkinson’s subtypes and predict how the disease is likely to progress.

The problem

Disease progression isn’t uniform: patients fall into subtypes that progress differently. A model that only outputs a risk number isn’t enough; a clinician needs to know why the model produced it, in terms they can question. So interpretation had to be a first-class part of the system, not an afterthought.

From data to a usable prediction

flowchart LR
D[Clinical &<br/>genetic data] --> F[Feature<br/>engineering]
F --> M[Gradient-boosted<br/>ensemble]
M --> P[Subtype +<br/>progression]
P --> I[SHAP<br/>interpretation]
I --> UI[Streamlit<br/>dashboard]

A classical ML pipeline, with interpretation built in.
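
As a rough illustration of the middle of that flow (features, then a boosted model, then an importance readout), here is a minimal sketch. It is not the project's code. It assumes synthetic data, made-up feature names, and a binary "fast vs. slow progression" label. To stay self-contained it swaps in scikit-learn's `GradientBoostingClassifier` for XGBoost / LightGBM and permutation importance for SHAP.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the project's real feature set is not listed here.
FEATURES = ["motor_score", "age_at_onset", "tremor_severity", "risk_variant"]


def make_synthetic_cohort(n=600, seed=0):
    """Build stand-in data where only the first two features drive the label."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    signal = 2.0 * X[:, 0] - 1.5 * X[:, 1]
    # Assumed binary target: 1 = faster progression, 0 = slower progression.
    y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y


def run_pipeline(seed=0):
    """Train a boosted model, score it, and rank features by importance."""
    X, y = make_synthetic_cohort(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)

    # Permutation importance: how much held-out accuracy drops when one
    # feature's values are shuffled. A stand-in for SHAP in this sketch.
    result = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=seed
    )
    ranking = sorted(
        zip(FEATURES, result.importances_mean),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return accuracy, ranking


if __name__ == "__main__":
    accuracy, ranking = run_pipeline()
    print(f"held-out accuracy: {accuracy:.2f}")
    for name, score in ranking:
        print(f"{name:16s} {score:+.3f}")
```

Because the synthetic label depends only on `motor_score` and `age_at_onset`, those two should come out on top of the ranking. That gives a quick sanity check that the importance step reflects what the model actually learned.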

What stuck with me

Accuracy alone wasn’t enough. Adding feature-importance explanations and a readable dashboard turned a number into something a human could interrogate and trust. That instinct, to make the system inspectable, is the same one I carry into my RAG work now.
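
To make "interrogate a number" concrete: SHAP is built on Shapley values, which split one prediction into per-feature contributions that add up to the gap between that prediction and a baseline. The sketch below computes them by brute force for a toy risk function. The function, the patient values, and the all-zeros baseline are assumptions for illustration. The SHAP library uses faster, model-specific algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    only suits tiny examples.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_risk(features):
    """Hypothetical risk score with an interaction term (not the real model)."""
    motor, age, variant = features
    return 0.2 * motor + 0.1 * age + 0.3 * motor * variant


patient = [2.0, 1.0, 1.0]   # made-up patient
baseline = [0.0, 0.0, 0.0]  # made-up reference point
phi = shapley_values(toy_risk, patient, baseline)

for name, contribution in zip(["motor", "age", "variant"], phi):
    print(f"{name:8s} {contribution:+.2f}")
print(f"sum      {sum(phi):+.2f}")
print(f"gap      {toy_risk(patient) - toy_risk(baseline):+.2f}")
```

The contributions sum to the prediction minus the baseline, so a reader can see which features pushed the score up and by how much, rather than taking the total on faith.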

Stack

Python, XGBoost / LightGBM, SHAP, scikit-learn, Streamlit.