Comparing ML Models for Cancer Prediction

The Challenge

Cancer prediction requires models that are not just accurate, but also interpretable. Medical professionals need to understand why a model makes a prediction before trusting it with patient diagnoses.

Models Evaluated

I tested eight algorithms:

Logistic Regression
Random Forest
XGBoost
LightGBM
Support Vector Machines
k-Nearest Neighbors
Neural Networks
Naive Bayes
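
A comparison like this can be set up as a single cross-validation loop. The sketch below is an illustration and not the original experiment code. It assumes scikit-learn is installed, uses synthetic stand-in data from `make_classification` instead of the real dataset, and covers only the five algorithms that ship with scikit-learn. XGBoost, LightGBM, and a neural network could be added to the same dictionary if those libraries are available. The fold count and hyperparameters are placeholder choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 400 samples, 9 features, binary target.
X, y = make_classification(
    n_samples=400, n_features=9, n_informative=5, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline so scaling is fit per fold.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machines": make_pipeline(StandardScaler(), SVC()),
    "k-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
}

# Stratified folds keep the class balance the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric_names = ("recall", "f1", "roc_auc")

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=list(metric_names))
    results[name] = {m: scores[f"test_{m}"].mean() for m in metric_names}

for name, r in results.items():
    print(
        f"{name:<24} recall={r['recall']:.2f} "
        f"f1={r['f1']:.2f} auc={r['roc_auc']:.2f}"
    )
```

Using the same folds for every model keeps the comparison fair, since each algorithm is scored on identical train/test splits.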

Evaluation Metrics

For medical applications, accuracy alone is insufficient. I focused on:

Metric	Purpose
Sensitivity (Recall)	Minimizing missed diagnoses
Specificity	Reducing false positives
F1-Score	Balance between precision and recall
AUC-ROC	Overall discriminative ability
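
To make the four metrics concrete, here is a minimal pure-Python sketch that computes each one from scratch. The labels and scores are made-up toy values, not results from this project, and the 0.5 decision threshold is an assumption. In practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = malignant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve, but it is O(n^2), so it only suits small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 malignant cases, 6 benign cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)       # share of cancers that were caught
specificity = tn / (tn + fp)       # share of healthy cases cleared
f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
auc = auc_roc(y_true, scores)      # threshold-independent ranking quality

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"f1={f1:.3f} auc={auc:.3f}")
```

Sensitivity, specificity, and F1 all depend on the chosen threshold, while AUC-ROC summarizes ranking quality across every threshold. That is why it is reported alongside the others instead of replacing them.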

Key Findings

XGBoost consistently outperformed the other models, scoring:

Metric	Score
F1-Score	0.94
AUC-ROC	0.97
Sensitivity	0.96

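However, Logistic Regression remained competitive and offered better interpretability, since each coefficient maps directly to a feature's effect on the log-odds of a positive prediction.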

SHAP Analysis

Using SHAP values, I identified the most important features:

Cell size uniformity
Clump thickness
Marginal adhesion

These features align with known clinical indicators, which increases confidence in the model's predictions.
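
For readers unfamiliar with what a SHAP value measures, the sketch below computes exact Shapley values by brute force for a tiny hypothetical scoring function. It is not the `shap` library and not the analysis code used here. The weights, the patient values, and the single baseline are invented for illustration, and "missing" features are simply replaced by their baseline value, which is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    Enumerates every coalition, so it is only feasible for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical linear risk score over the three features named above.
FEATURES = ["Cell size uniformity", "Clump thickness", "Marginal adhesion"]
WEIGHTS = [0.5, 0.3, 0.2]  # made-up weights, not fitted values


def toy_score(z):
    return sum(w * v for w, v in zip(WEIGHTS, z))


patient = [8, 5, 2]    # made-up feature values for one case
baseline = [3, 3, 1]   # made-up reference case

phi = shapley_values(toy_score, patient, baseline)
for name, contribution in zip(FEATURES, phi):
    print(f"{name:<22} {contribution:+.2f}")
```

The contributions always sum to the difference between the prediction for the patient and the prediction for the baseline. This additivity is what makes per-feature attributions easy to present to clinicians.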