Comparing ML Models for Cancer Prediction

The Challenge

Cancer prediction requires models that are not just accurate, but also interpretable. Medical professionals need to understand why a model makes a prediction before trusting it with patient diagnoses.

Models Evaluated

I tested eight algorithms:

  • Logistic Regression
  • Random Forest
  • XGBoost
  • LightGBM
  • Support Vector Machines
  • k-Nearest Neighbors
  • Neural Networks
  • Naive Bayes

Evaluation Metrics

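For medical applications, accuracy alone is insufficient, because a missed cancer (false negative) and a false alarm (false positive) carry very different costs. I focused on: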

| Metric | Purpose |
|---|---|
| Sensitivity (Recall) | Minimizing missed diagnoses |
| Specificity | Reducing false positives |
| F1-Score | Balanced measure |
| AUC-ROC | Overall discriminative ability |
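
To make these four metrics concrete, here is a minimal sketch that computes each one from scratch in plain Python. The labels and scores are made-up toy values, not data from this comparison, and I am assuming that 1 means malignant and that a score of 0.5 or higher counts as a positive prediction.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, assuming 1 = malignant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that were caught (recall)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives that were correctly cleared."""
    return tn / (tn + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is
    fine for a small illustration but slow on large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 malignant cases, 6 benign cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("f1:", f1_score(tp, fp, fn))
print("auc_roc:", auc_roc(y_true, scores))
```

Note that sensitivity, specificity, and F1 depend on the chosen threshold, while AUC-ROC is computed from the raw scores and summarizes performance across all thresholds.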

Key Findings

XGBoost consistently outperformed the other models, with these scores:

| Metric | Score |
|---|---|
| F1-Score | 0.94 |
| AUC-ROC | 0.97 |
| Sensitivity | 0.96 |

However, Logistic Regression remained competitive and offered better interpretability.
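
One reason logistic regression is easy to interpret is that each coefficient translates directly into an odds ratio: raising a feature by one unit multiplies the predicted odds by `exp(coefficient)`. The sketch below illustrates this. The coefficients and intercept are hypothetical values chosen for the example, not the fitted values from my model, and the feature names are used only as labels.

```python
import math
from typing import Dict

# Hypothetical coefficients (NOT fitted values from this study).
coefficients: Dict[str, float] = {
    "cell_size_uniformity": 1.2,
    "clump_thickness": 0.8,
    "marginal_adhesion": 0.5,
}
intercept = -1.0  # hypothetical


def predict_proba(x: Dict[str, float]) -> float:
    """Logistic regression: sigmoid of the linear combination of features."""
    z = intercept + sum(coefficients[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratios(coefs: Dict[str, float]) -> Dict[str, float]:
    """Multiplicative change in odds for a one-unit increase in each feature."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


# Compare a patient before and after a one-unit rise in a single feature.
patient = {"cell_size_uniformity": 0.0, "clump_thickness": 0.0, "marginal_adhesion": 0.0}
shifted = dict(patient, clump_thickness=1.0)

ratio = odds(predict_proba(shifted)) / odds(predict_proba(patient))
print("observed odds ratio:", ratio)
print("exp(coefficient):   ", odds_ratios(coefficients)["clump_thickness"])
```

The two printed numbers match, which is exactly the property a clinician can reason about: the effect of each feature is a fixed multiplier on the odds, regardless of the other feature values.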

SHAP Analysis

Using SHAP values, I identified the most important features:

  1. Cell size uniformity
  2. Clump thickness
  3. Marginal adhesion

These features align with known clinical indicators, which makes the model's predictions easier to trust.
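
For readers curious about what a SHAP value actually is, here is a minimal sketch of the underlying idea: exact Shapley values computed by brute force over every feature subset. This is not the code I ran and not how the SHAP library works internally (it relies on much faster algorithms); the toy model, the input, and the all-zeros baseline are hypothetical and exist only to show the attribution logic.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only usable for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            # Standard Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(f: Sequence[float]) -> float:
    """Hypothetical risk score with one interaction between features 0 and 1."""
    return 2.0 * f[0] + 1.0 * f[1] + 0.5 * f[2] + 0.5 * f[0] * f[1]


x = [1.0, 1.0, 1.0]         # hypothetical patient
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point

phi = shapley_values(toy_model, x, baseline)
print("attributions:", phi)
print("sum of attributions:", sum(phi))
print("model(x) - model(baseline):", toy_model(x) - toy_model(baseline))
```

The attributions always add up to the difference between the prediction and the baseline prediction, and the interaction term is split evenly between the two features involved. Ranking features by the average magnitude of these values across many patients is what produces an importance list like the one above.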