Comparing ML Models for Cancer Prediction
The Challenge
Cancer prediction requires models that are not just accurate, but also interpretable. Medical professionals need to understand why a model makes a prediction before trusting it with patient diagnoses.
Models Evaluated
I evaluated eight algorithms:
- Logistic Regression
- Random Forest
- XGBoost
- LightGBM
- Support Vector Machines
- k-Nearest Neighbors
- Neural Networks
- Naive Bayes
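A minimal sketch of how this model lineup could be set up for a shared evaluation loop. The scikit-learn estimators and the hyperparameter choices here are assumptions for illustration, not the study's actual configuration; XGBoost and LightGBM ship in separate packages, so they are added only if installed.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB

# One dict of name -> estimator so every model runs through the same pipeline
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True),  # probability=True enables AUC-ROC later
    "k-NN": KNeighborsClassifier(),
    "Neural Network": MLPClassifier(max_iter=500, random_state=0),
    "Naive Bayes": GaussianNB(),
}

# XGBoost and LightGBM are optional third-party packages
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(eval_metric="logloss")
except ImportError:
    pass
try:
    from lightgbm import LGBMClassifier
    models["LightGBM"] = LGBMClassifier()
except ImportError:
    pass
```

Keeping everything behind one dict means each model can be fit and scored in an identical loop, which makes the metric comparison below apples-to-apples.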
Evaluation Metrics
For medical applications, accuracy alone is insufficient. I focused on:
| Metric | Purpose |
|---|---|
| Sensitivity (Recall) | Minimizing missed diagnoses |
| Specificity | Reducing false positives |
| F1-Score | Balanced measure |
| AUC-ROC | Overall discriminative ability |
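The first three metrics fall straight out of the confusion matrix. A small sketch of the arithmetic, using illustrative counts (not numbers from the study):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: fraction of true cancers caught
    specificity = tn / (tn + fp)   # fraction of healthy cases correctly cleared
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Hypothetical counts: 96 cancers caught, 4 missed, 94 healthy cleared, 6 false alarms
sens, spec, f1 = diagnostic_metrics(tp=96, fp=6, tn=94, fn=4)
# sens = 0.96, spec = 0.94
```

In a screening setting the false-negative cell (fn) is the costly one, which is why sensitivity gets first billing in the table above.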
Key Findings
XGBoost consistently outperformed other models with:
| Metric | Score |
|---|---|
| F1-Score | 0.94 |
| AUC-ROC | 0.97 |
| Sensitivity | 0.96 |
However, Logistic Regression remained competitive and offered better interpretability.
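Logistic Regression's interpretability comes from its coefficients: each weight is a log-odds contribution, so exponentiating it gives an odds ratio a clinician can read directly. The coefficient values below are made up for illustration, not fitted values from the study:

```python
import math

# Hypothetical fitted coefficients (illustrative only)
coefs = {
    "cell_size_uniformity": 0.85,
    "clump_thickness": 0.62,
    "marginal_adhesion": 0.41,
}

# exp(weight) = multiplicative change in malignancy odds per unit increase
odds_ratios = {name: math.exp(w) for name, w in coefs.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: one-unit increase multiplies odds by {ratio:.2f}")
```

No surrogate explainer is needed here; the model's own parameters are the explanation, which is the trade-off against XGBoost's higher scores.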
SHAP Analysis
Using SHAP values, I identified the most important features:
- Cell size uniformity
- Clump thickness
- Marginal adhesion
These align with known clinical indicators, increasing model trustworthiness.
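A global SHAP ranking like the one above is typically the mean absolute SHAP value per feature across the dataset. A minimal sketch of that aggregation over a hypothetical per-sample SHAP matrix; in practice the matrix would come from something like `shap.TreeExplainer` on the fitted model:

```python
# Hypothetical SHAP values: rows = samples, columns = features (illustrative only)
features = ["cell_size_uniformity", "clump_thickness", "marginal_adhesion"]
shap_values = [
    [0.32, -0.18, 0.05],
    [-0.41, 0.22, -0.09],
    [0.28, -0.15, 0.11],
]

# Global importance = mean absolute SHAP value per feature
importance = {
    name: sum(abs(row[j]) for row in shap_values) / len(shap_values)
    for j, name in enumerate(features)
}
ranked = sorted(importance, key=importance.get, reverse=True)
```

Taking absolute values matters: a feature that pushes some predictions up and others down would otherwise cancel to near zero despite being highly influential.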