Breast Cancer Prediction by DF AI Research

Project: Breast Cancer Prediction

Project Objective

This project replicates the work originally done in a breast cancer prediction notebook as a set of modular, executable .py scripts. The complete pipeline covers:

Environment and dependency setup.
Image loading and preprocessing (positive/negative samples).
Training of a simple CNN model.
Training of an advanced Vision Transformer (ViT) model.
Evaluation and comparison of both models using confusion matrices and performance reports.

Data Source

The images are sourced from the Kaggle dataset:
➡️ Breast Cancer Dataset – Kaggle

Script Structure (Execution Order)

1. 1_install_requirements.py

Purpose: Automatically install required dependencies and create the necessary folders.

Checks for packages: tensorflow, numpy, opencv-python, pillow, scikit-learn, matplotlib, joblib.
Creates directories: Data, Data/Cancer, Data/Negative, modeles, tmp.
Ensures compatibility with TensorFlow ≥ 2.20.
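
The setup step can be sketched as follows. This is a minimal illustration, not the actual content of 1_install_requirements.py: the package and folder names come from the list above, while the function names (missing_packages, create_folders, install) are hypothetical. Note that some pip names differ from their import names (opencv-python is imported as cv2, pillow as PIL, scikit-learn as sklearn).

```python
import importlib.util
import subprocess
import sys
from pathlib import Path

# pip package name -> importable module name (package list taken from the README).
REQUIRED = {
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "joblib": "joblib",
}
FOLDERS = ["Data", "Data/Cancer", "Data/Negative", "modeles", "tmp"]


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


def create_folders(root=".", folders=FOLDERS):
    """Create the project folders under root; existing folders are left alone."""
    created = []
    for name in folders:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def install(packages):
    """Install the given packages with pip for the current interpreter."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
```

Calling create_folders() followed by install(missing_packages()) would reproduce the behavior described above, under these assumptions.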

2. 2_prepare_data.py

Purpose: Load all images, resize, label, and save the dataset.

Positive images: ./Data/Cancer (label 1).
Negative images: ./Data/Negative (label 0).
Each image is resized to 120×120×3.
The dataset is serialized via joblib into ./tmp/dataset.pkl.
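
A possible shape for this step is sketched below. It assumes OpenCV is used for reading and resizing, that the dataset is stored as an (X, y) tuple, and that only .png/.jpg/.jpeg files are considered; none of these details are confirmed by this README, and the function names are hypothetical.

```python
from pathlib import Path

IMAGE_SIZE = 120                           # images are resized to 120x120x3
CLASS_DIRS = {"Cancer": 1, "Negative": 0}  # folder name -> label
EXTENSIONS = {".png", ".jpg", ".jpeg"}     # assumption: accepted file types


def list_labeled_paths(data_dir="./Data"):
    """Return (path, label) pairs for every image file, sorted per folder."""
    pairs = []
    for folder, label in CLASS_DIRS.items():
        for path in sorted((Path(data_dir) / folder).glob("*")):
            if path.suffix.lower() in EXTENSIONS:
                pairs.append((path, label))
    return pairs


def build_dataset(data_dir="./Data", out_path="./tmp/dataset.pkl"):
    """Load, resize and label all images, then serialize them with joblib."""
    # Imported here so the path-listing helper works without these packages.
    import cv2
    import joblib
    import numpy as np

    images, labels = [], []
    for path, label in list_labeled_paths(data_dir):
        img = cv2.imread(str(path), cv2.IMREAD_COLOR)  # 3-channel BGR array
        if img is None:  # unreadable file: skip it
            continue
        images.append(cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE)))
        labels.append(label)

    X = np.asarray(images, dtype=np.uint8)  # shape (N, 120, 120, 3)
    y = np.asarray(labels, dtype=np.int64)  # shape (N,)
    joblib.dump((X, y), out_path)           # assumption: stored as a tuple
    return X, y
```

The saved file can then be read back with joblib.load("./tmp/dataset.pkl") by the training and evaluation scripts.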

3. 3_train_simple_cnn.py

Purpose: Train a simple CNN model on the dataset.

Architecture: 2 convolution layers, 2 pooling layers, 1 dense hidden layer, 1 sigmoid output.
Optimizer: Adam.
Loss function: binary_crossentropy.
Trains for 15 epochs.
Saves to ./modeles/breast_cancer_simple_cnn.h5.
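
The architecture described above could look like the following Keras sketch. The filter counts (32 and 64), the 3x3 kernels, the 2x2 pooling and the 64-unit dense layer are assumptions chosen for illustration; only the layer types, optimizer and loss come from this README. The small helper shows how the 120-pixel input shrinks through the two convolution and pooling stages.

```python
INPUT_SHAPE = (120, 120, 3)
EPOCHS = 15


def feature_map_size(size, stages=2, kernel=3, pool=2):
    """Spatial size after `stages` blocks of valid convolution + max pooling."""
    for _ in range(stages):
        size = size - kernel + 1  # valid (unpadded) convolution
        size = size // pool       # non-overlapping max pooling
    return size


def build_simple_cnn(input_shape=INPUT_SHAPE):
    """Build and compile a small binary-classification CNN."""
    # Imported lazily so the helper above works without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),  # assumption: 32 filters
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),  # assumption: 64 filters
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),      # assumption: 64 units
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

With these assumed sizes, each side goes 120 → 118 → 59 → 57 → 28 before flattening. Training would then be a call such as model.fit(X_train, y_train, epochs=EPOCHS, validation_data=(X_val, y_val)), followed by model.save("./modeles/breast_cancer_simple_cnn.h5").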

Typical Result:

Validation accuracy ≈ 87%.
Good generalization, minimal overfitting.

4. 4_train_vit.py

Purpose: Train a compact Vision Transformer (ViT).

Uses custom layers: Patches and PatchEncoder.
Optimizer: AdamW (native in tensorflow.keras.optimizers).
Trains for 20 epochs, batch size 32.
Saves to ./modeles/breast_cancer_vit.keras (native Keras format).
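
The Patches and PatchEncoder layers are not reproduced in this README. The NumPy sketch below only illustrates the idea usually behind such layers in a ViT: the image is cut into non-overlapping square patches, each patch is flattened, then linearly projected and given a position embedding. The patch size (12) and projection dimension (64) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def extract_patches(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    ordered row by row across the patch grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    gh, gw = h // patch_size, w // patch_size
    grid = image.reshape(gh, patch_size, gw, patch_size, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, C)
    return grid.reshape(gh * gw, patch_size * patch_size * c)


def encode_patches(patches, projection, position_embedding):
    """Project each flattened patch and add its position embedding."""
    return patches @ projection + position_embedding


image = np.zeros((120, 120, 3), dtype=np.float32)
patches = extract_patches(image, patch_size=12)  # assumption: 12x12 patches
print(patches.shape)
```

With 12×12 patches, a 120×120×3 image yields 100 patches of 432 values each. In the trained model the projection matrix and position embeddings are learned parameters rather than fixed arrays.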

Typical Result:

Test accuracy ≈ 97%.
Very stable performance and excellent generalization.

5. 5_evaluate_and_predict.py

Purpose: Evaluate and compare the CNN and ViT models.

Loads dataset.pkl.
Evaluates both models on the 20% of the data held out from training.
Computes precision, recall, f1-score, and confusion matrix.
Displays results graphically.
Also provides random sample predictions for verification.
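
The metric computation can be sketched with scikit-learn as below. It assumes each model returns one sigmoid probability per image and that 0.5 is used as the decision threshold; neither detail is confirmed for the ViT in this README, and the function names are hypothetical.

```python
def to_labels(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def evaluate(y_true, probabilities, threshold=0.5):
    """Return the confusion matrix and a text report for binary predictions."""
    # Imported lazily so to_labels works without scikit-learn installed.
    from sklearn.metrics import classification_report, confusion_matrix

    y_pred = to_labels(probabilities, threshold)
    # Rows are true classes, columns are predicted classes, in label order 0, 1.
    matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
    report = classification_report(
        y_true, y_pred, labels=[0, 1],
        target_names=["Negative", "Cancer"], zero_division=0,
    )
    return matrix, report
```

A typical call would be evaluate(y_test, model.predict(X_test).ravel()), where (X_test, y_test) is the held-out 20%. For the comparison to be meaningful, that split has to be the same one that was excluded from training, for example by reusing the same random_state in train_test_split.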

Example CNN Result:

Overall accuracy: 0.94
Confusion Matrix:
[[78  3]
 [ 7 76]]

Example ViT Result:

Overall accuracy: 0.97
Confusion Matrix:
[[80  1]
 [ 3 80]]
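
The figures above can be checked by hand. The sketch below assumes the scikit-learn layout for confusion matrices (rows are true classes, columns are predicted classes, class 0 first), which this README does not state explicitly; under that assumption the test set holds 164 images, 81 negative and 83 positive.

```python
def metrics_from_confusion(matrix):
    """Compute accuracy, precision, recall and F1 for the positive class.

    Expects [[TN, FP], [FN, TP]] (rows = true class, columns = predicted).
    """
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


cnn = metrics_from_confusion([[78, 3], [7, 76]])
vit = metrics_from_confusion([[80, 1], [3, 80]])
for name, m in (("CNN", cnn), ("ViT", vit)):
    print(name, {key: round(value, 3) for key, value in m.items()})
```

This gives an accuracy of 154/164 ≈ 0.939 for the CNN and 160/164 ≈ 0.976 for the ViT. Recall on the cancer class is 76/83 ≈ 0.916 for the CNN and 80/83 ≈ 0.964 for the ViT, which is the more relevant figure when missed positives are costly.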

Full Pipeline Execution

python 1_install_requirements.py
python 2_prepare_data.py
python 3_train_simple_cnn.py
python 4_train_vit.py
python 5_evaluate_and_predict.py
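
If running the five commands by hand is inconvenient, a small runner along these lines could chain them. This helper is hypothetical and not part of the project; it simply runs each script with the current Python interpreter and stops at the first failure.

```python
import subprocess
import sys

SCRIPTS = [
    "1_install_requirements.py",
    "2_prepare_data.py",
    "3_train_simple_cnn.py",
    "4_train_vit.py",
    "5_evaluate_and_predict.py",
]


def run_pipeline(scripts=SCRIPTS, cwd="."):
    """Run the scripts in order from cwd; raise if one exits with an error."""
    completed = []
    for name in scripts:
        print(f"Running {name} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later scripts never run on top of a failed step.
        subprocess.run([sys.executable, name], cwd=cwd, check=True)
        completed.append(name)
    return completed
```

Calling run_pipeline() from the project root would be equivalent to the five commands above.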

Key Strengths

Full and faithful conversion into modular Python scripts.
Compatible with TensorFlow ≥ 2.20 (no deprecated dependencies such as tensorflow_addons).
Uses the modern .keras model format for the ViT model (the simple CNN is saved in the older .h5 format).
Provides clear, automated graphical evaluations.

Author and Project Origin

This project was designed and coded by Frédéric DELATTE, as part of his research work conducted on DFAIResearch.com, an independent platform dedicated to artificial intelligence, simulation, and scientific research.

Conclusion

The Breast Cancer Prediction project, together with the related Pneumonia Prediction project, delivers a complete, clean, and reproducible implementation of training and evaluation pipelines for two deep learning architectures (CNN and ViT) applied to medical imagery. It provides a practical comparison between classical convolutional models and transformer-based approaches within an educational, fully automated framework.

Breast Cancer Prediction: https://doi.org/10.5281/zenodo.17547151

Pneumonia Prediction: https://doi.org/10.5281/zenodo.17550974