
Project: Breast Cancer Prediction
Project Objective
This project fully replicates the work originally done in a breast cancer prediction notebook, restructured into modular and executable .py scripts. The complete pipeline performs:
- Environment and dependency setup.
- Image loading and preprocessing (positive/negative samples).
- Training of a simple CNN model.
- Training of an advanced Vision Transformer (ViT) model.
- Evaluation and comparison of both models using confusion matrices and performance reports.
Data Source
The images are sourced from the Kaggle dataset:
➡️ Breast Cancer Dataset – Kaggle
Script Structure (Execution Order)
1. 1_install_requirements.py
Purpose: Automatically install required dependencies and create the necessary folders.
- Checks for packages: tensorflow, numpy, opencv-python, pillow, scikit-learn, matplotlib, joblib.
- Creates directories: Data, Data/Cancer, Data/Negative, modeles, tmp.
- Ensures compatibility with TensorFlow ≥ 2.20.
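As a rough illustration of this step, the sketch below checks which packages are importable and creates the folder tree. It is not the actual content of 1_install_requirements.py: the function names (missing_packages, install, create_folders) and the pip-name to module-name mapping are assumptions introduced here.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path

# pip distribution name -> importable module name (they differ for some packages).
REQUIRED = {
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "joblib": "joblib",
}
FOLDERS = ["Data", "Data/Cancer", "Data/Negative", "modeles", "tmp"]


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found by the import system."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install(packages):
    """Install packages with the pip that belongs to the running interpreter."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


def create_folders(base=".", folders=FOLDERS):
    """Create the project folders; folders that already exist are left untouched."""
    for folder in folders:
        Path(base, folder).mkdir(parents=True, exist_ok=True)
```

A typical call sequence would be install(missing_packages()) followed by create_folders(). Using sys.executable keeps the installation inside the active virtual environment.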
2. 2_prepare_data.py
Purpose: Load all images, resize, label, and save the dataset.
- Positive images: ./Data/Cancer (label 1).
- Negative images: ./Data/Negative (label 0).
- Each image is resized to 120×120×3.
- The dataset is serialized via joblib into ./tmp/dataset.pkl.
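The steps above could look roughly like the following sketch. The helper names, the accepted file extensions, and the (X, y) tuple layout of the pickle are assumptions made for illustration; the original script may organize the data differently.

```python
from pathlib import Path

IMAGE_SIZE = (120, 120)                   # width, height expected by cv2.resize
EXTENSIONS = {".png", ".jpg", ".jpeg"}    # assumption: accepted image types


def list_labeled_files(data_dir="./Data"):
    """Return (path, label) pairs: files in Cancer get 1, files in Negative get 0."""
    pairs = []
    for folder, label in (("Cancer", 1), ("Negative", 0)):
        for path in sorted(Path(data_dir, folder).glob("*")):
            if path.suffix.lower() in EXTENSIONS:
                pairs.append((path, label))
    return pairs


def build_dataset(data_dir="./Data", out_path="./tmp/dataset.pkl"):
    """Load, resize, and label every image, then serialize the arrays with joblib."""
    # Third-party imports are local so the file listing above has no heavy dependency.
    import cv2
    import joblib
    import numpy as np

    images, labels = [], []
    for path, label in list_labeled_files(data_dir):
        image = cv2.imread(str(path))     # 3-channel BGR array, or None if unreadable
        if image is None:
            continue                      # skip corrupt or unsupported files
        images.append(cv2.resize(image, IMAGE_SIZE))
        labels.append(label)

    X = np.asarray(images, dtype=np.uint8)    # shape (n, 120, 120, 3)
    y = np.asarray(labels, dtype=np.int64)    # shape (n,)
    joblib.dump((X, y), out_path)             # assumed layout: one (X, y) tuple
    return X.shape, y.shape
```

Note that cv2.imread returns channels in BGR order; whether the original pipeline converts to RGB is not stated in this document.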
3. 3_train_simple_cnn.py
Purpose: Train a simple CNN model on the dataset.
- Architecture: 2 convolution layers, 2 pooling layers, 1 dense hidden layer, 1 sigmoid output.
- Optimizer: Adam.
- Loss function: binary_crossentropy.
- Trains for 15 epochs.
- Saves to ./modeles/breast_cancer_simple_cnn.h5.
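A model matching this description might be declared as in the sketch below. The filter counts (32 and 64), the 3×3 kernels, the 2×2 pooling, and the 64-unit dense layer are assumptions, since the document only gives the layer types. The small feature_map_side helper shows how the spatial size shrinks under those assumed settings.

```python
def feature_map_side(side, conv_kernel=3, pool=2, blocks=2):
    """Spatial size after `blocks` rounds of valid convolution plus max pooling."""
    for _ in range(blocks):
        side = side - conv_kernel + 1   # "valid" convolution with stride 1
        side = side // pool             # non-overlapping pooling window
    return side


def build_simple_cnn(input_shape=(120, 120, 3)):
    """2 conv layers, 2 pooling layers, 1 dense hidden layer, 1 sigmoid output."""
    # Local import: TensorFlow is only needed when the model is actually built.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),   # assumed filter count
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),   # assumed filter count
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),       # assumed hidden size
        layers.Dense(1, activation="sigmoid"),     # probability of the positive class
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

With these assumed settings a 120×120 input reaches the Flatten layer as a 28×28 feature map. Training would then be a call to model.fit with epochs=15, followed by model.save on the path listed above.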
Typical Result:
- Validation accuracy ≈ 87%.
- Good generalization, minimal overfitting.
4. 4_train_vit.py
Purpose: Train a compact Vision Transformer (ViT).
- Uses custom layers: Patches and PatchEncoder.
- Optimizer: AdamW (native in tensorflow.keras.optimizers).
- Trains for 20 epochs, batch size 32.
- Saves to ./modeles/breast_cancer_vit.keras (native Keras format).
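To make the role of a patch-extraction layer concrete, here is a dependency-free sketch of the splitting it performs conceptually. It is not the Patches layer from 4_train_vit.py, which presumably works on batched tensors. The patch size of 12 is hypothetical, because the document does not state the value used.

```python
def patch_grid(image_size=120, patch_size=12, channels=3):
    """Return (number of patches, values per flattened patch) for a square image."""
    if image_size % patch_size:
        raise ValueError("patch_size must divide image_size")
    per_side = image_size // patch_size
    return per_side * per_side, patch_size * patch_size * channels


def extract_patches(image, patch_size):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order: left to right, then top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide both image dimensions")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ])
    return patches


# Tiny 4x4 single-channel example with 2x2 patches.
demo = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
demo_patches = extract_patches(demo, 2)
```

Under the hypothetical patch size of 12, a 120×120×3 image yields 100 patches of 432 values each. A patch encoder would then project each flattened patch to the model dimension and add a position embedding so the transformer can tell patches apart.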
Typical Result:
- Test accuracy ≈ 97%.
- Very stable performance and excellent generalization.
5. 5_evaluate_and_predict.py
Purpose: Evaluate and compare the CNN and ViT models.
- Loads dataset.pkl.
- Evaluates models on the remaining 20% of data.
- Computes precision, recall, f1-score, and confusion matrix.
- Displays results graphically.
- Also provides random sample predictions for verification.
Example CNN Result:
Overall accuracy: 0.94
Confusion Matrix:
[[78  3]
 [ 7 76]]
Example ViT Result:
Overall accuracy: 0.97
Confusion Matrix:
[[80  1]
 [ 3 80]]
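The reported metrics can be derived directly from a confusion matrix. The sketch below does so for the example CNN matrix, assuming the scikit-learn layout (rows are true classes, columns are predicted classes, class 0 first). The document does not state that layout explicitly, and binary_metrics is a helper name introduced here.

```python
def binary_metrics(matrix):
    """Accuracy plus precision, recall, and F1 of the positive class.

    `matrix` is assumed to be [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


cnn = binary_metrics([[78, 3], [7, 76]])
print({name: round(value, 2) for name, value in cnn.items()})
```

For the CNN matrix, 154 of 164 samples lie on the diagonal, which reproduces the overall accuracy of 0.94 shown above.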
Full Pipeline Execution
python 1_install_requirements.py
python 2_prepare_data.py
python 3_train_simple_cnn.py
python 4_train_vit.py
python 5_evaluate_and_predict.py
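For convenience, the five commands could also be chained from a single launcher. The sketch below is a hypothetical helper, not a file shipped with the project; it runs each script with the current interpreter and stops at the first failure.

```python
import subprocess
import sys

PIPELINE = [
    "1_install_requirements.py",
    "2_prepare_data.py",
    "3_train_simple_cnn.py",
    "4_train_vit.py",
    "5_evaluate_and_predict.py",
]


def run_pipeline(scripts=PIPELINE, workdir="."):
    """Run the scripts in order; raise CalledProcessError if one exits non-zero."""
    completed = []
    for script in scripts:
        print(f"==> {script}")
        # check=True aborts the pipeline as soon as a step fails.
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
        completed.append(script)
    return completed
```

Calling run_pipeline() from the project root is equivalent to typing the five commands above, with the added guarantee that a later step never runs on the output of a failed earlier step.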
Key Strengths
- Full and faithful conversion into modular Python scripts.
- Compatible with TensorFlow ≥ 2.20 (no deprecated dependencies like tensorflow_addons).
- Uses the modern .keras model format.
- Provides clear, automated graphical evaluations.
Author and Project Origin
This project was designed and coded by Frédéric DELATTE, as part of his research work conducted on DFAIResearch.com, an independent platform dedicated to artificial intelligence, simulation, and scientific research.
Conclusion
The Breast Cancer Prediction and Pneumonia Prediction projects each deliver a complete, clean, and reproducible implementation of training and evaluation pipelines for two deep learning architectures (CNN and ViT) applied to medical imagery. They provide a practical comparison between classical convolutional models and transformer-based approaches, within an educational and fully automated framework.
Breast Cancer Prediction: https://doi.org/10.5281/zenodo.17547151
Pneumonia Prediction: https://doi.org/10.5281/zenodo.17550974
