This project focuses on training and fine-tuning a Decision Tree Classifier to predict breast cancer outcomes as either positive or negative based on a diverse range of significant attributes.
The dataset used for this project is the Breast Cancer Wisconsin (Diagnostic) Data Set, which contains features for predicting the class: Malignant (+ve) or Benign (-ve).
An overview of the breast cancer dataset is covered in 📒notebook-1, with simple plots showing the distribution of the features.
A baseline DecisionTree with default parameters is built and trained on the training dataset in 📒notebook-2.
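The baseline step can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes scikit-learn's bundled copy of the dataset, an 80/20 stratified split, and a fixed `random_state`, none of which are confirmed by the notebooks. The cut-off of 10 "least important" features is likewise a hypothetical choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# scikit-learn's copy of the Wisconsin (Diagnostic) data: 569 rows, 30 features.
# In this copy the target is encoded as 0 = malignant, 1 = benign.
data = load_breast_cancer()
X, y = data.data, data.target

# Assumed split: 80% train / 20% test, stratified on the class label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Decision tree with default hyperparameters (random_state only fixes tie-breaking).
baseline = DecisionTreeClassifier(random_state=42)
baseline.fit(X_train, y_train)

baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"baseline test accuracy: {baseline_accuracy:.3f}")

# Rank features from least to most important according to the fitted tree.
ranked = sorted(zip(data.feature_names, baseline.feature_importances_), key=lambda p: p[1])
least_important = [name for name, _ in ranked[:10]]  # hypothetical cut-off of 10
print("least important features:", least_important)
```

The importance ranking is what a later dimensionality-reduction step could consume; an unpruned default tree usually assigns zero importance to many of the 30 features.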
The least important features found in the previous notebook are then reduced to n optimal dimensions using Principal Component Analysis (PCA) in 📒notebook-3. The top n principal components, those with the highest eigenvalues, are chosen for model training. The sample below shows the data variance explained by the top 3 eigenvectors.
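One way the PCA step might look is sketched below. Several details are assumptions made for illustration only: the number of low-importance features sent through PCA (`N_LEAST = 20`), standardizing them first, and concatenating the resulting components with the untouched high-importance features. The notebook may do any of these differently.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=42
)

# Rank features with a default tree, least important first.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
order = np.argsort(tree.feature_importances_)

N_LEAST = 20        # hypothetical: how many low-importance features go through PCA
N_COMPONENTS = 3    # matches the "top 3 eigenvectors" sample mentioned above
low_idx, high_idx = order[:N_LEAST], order[N_LEAST:]

# Fit the scaler and PCA on the training split only, to avoid leaking test data.
scaler = StandardScaler().fit(X_train[:, low_idx])
pca = PCA(n_components=N_COMPONENTS).fit(scaler.transform(X_train[:, low_idx]))

def reduce(X):
    """Replace the low-importance columns with their top principal components."""
    components = pca.transform(scaler.transform(X[:, low_idx]))
    return np.hstack([X[:, high_idx], components])

X_train_reduced = reduce(X_train)
X_test_reduced = reduce(X_test)

# Share of variance (within the low-importance block) explained by each component.
explained = pca.explained_variance_ratio_
print("explained variance ratio:", np.round(explained, 3))
print("reduced shape:", X_train_reduced.shape)
```

`explained_variance_ratio_` is ordered from the largest eigenvalue down, so its first entries correspond to the components kept for training.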
Further along, in 📒notebook-4, the hyperparameters are tuned and the optimal parameters are then used for prediction. RESULT: Tuning hyperparameters individually shows better results than GridSearchCV.
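The two tuning strategies compared here can be illustrated with a short sketch. The hyperparameters, value ranges, 5-fold cross-validation, and accuracy scoring below are assumptions chosen for the example, not the settings used in the notebook, and the sketch makes no claim about which strategy wins.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Strategy 1: tune a single hyperparameter in isolation (here: max_depth).
depths = [2, 3, 4, 5, 6, 8, 10]  # hypothetical candidate values
depth_scores = {
    d: cross_val_score(
        DecisionTreeClassifier(max_depth=d, random_state=42), X_train, y_train, cv=5
    ).mean()
    for d in depths
}
best_depth = max(depth_scores, key=depth_scores.get)
single_param_model = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
single_param_model.fit(X_train, y_train)

# Strategy 2: search several hyperparameters jointly with GridSearchCV.
param_grid = {
    "max_depth": depths,
    "min_samples_leaf": [1, 2, 5, 10],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Compare both tuned models on the held-out test split.
print("best max_depth (individual):", best_depth)
print("best params (grid search):  ", search.best_params_)
print("test accuracy (individual): ", single_param_model.score(X_test, y_test))
print("test accuracy (grid search):", search.score(X_test, y_test))
```

A joint grid optimizes the cross-validated score, which does not guarantee the best score on one particular test split; that gap is one plausible reason individually tuned parameters can come out ahead on a small dataset.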
Prediction performance of the ✨tuned model:
To understand how the tuned model works and how it makes predictions, the SHAP library is used in 📒notebook-5 for Model Interpretability, enabling both global and local analyses of the optimized model's behavior. The Decision Plot below illustrates how individual features contribute to the prediction process, providing a clear understanding of the model's decision-making logic.
- Clone the repository
git clone https://github.com/PragyanTiwari/Breast-Cancer-Prediction-with-DecisionTree-Classifier.git
- Using the Makefile:
# install uv if it is not already installed
pip install --upgrade uv
# to create virtual env
make create_environment
# install python dependencies
make requirements
# build predictions
make breast_cancer_prediction
- Using uv (if not using the Makefile):
# to create virtual env
uv venv
# install python dependencies
uv add --requirements 'requirements.txt' --dev
# build predictions
uv run make_predictions
❕The output will be saved as predictions.csv in the data\result dir.