Machine Learning Project - Disease Detection with Genetic Algorithm Optimization

This project focuses on applying a genetic algorithm to optimize machine learning models for detecting three diseases:

Breast Cancer
Parkinson's Disease
PCOS (Polycystic Ovary Syndrome)

Datasets

The datasets used in this project are sourced from Kaggle:

Breast Cancer Wisconsin Data: Features extracted from breast mass aspirates.
Parkinson's Disease Detection: Vocal measurements from healthy individuals and Parkinson's patients.
PCOS Dataset: Clinical measurements for PCOS diagnosis.

Models and Genetic Algorithm Optimization

1. Breast Cancer Detection

Models Evaluated:

Logistic Regression
Random Forest
AdaBoost
Decision Tree
K-Nearest Neighbors
Support Vector Machine (Linear)
Support Vector Machine (RBF)

The genetic algorithm is applied to logistic regression, improving accuracy from 96.5% to 98.6%.
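
As a rough illustration of how a feature subset can be scored with logistic regression, the sketch below trains on only the columns that a binary mask selects and returns test accuracy. It is not the notebook's code: it assumes scikit-learn's bundled copy of the Wisconsin breast cancer data in place of the Kaggle CSV, and the name subset_accuracy, the 75/25 split, and the scaling step are illustrative choices, so its numbers will not necessarily match the figures quoted above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

def subset_accuracy(mask):
    """Accuracy of logistic regression trained on the features where mask == 1.

    Hypothetical helper, not taken from the project notebook.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature subset cannot be trained on
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train[:, mask], y_train)
    return model.score(X_test[:, mask], y_test)

# Baseline: every feature selected.
baseline = subset_accuracy(np.ones(X.shape[1], dtype=int))
print(f"Accuracy with all {X.shape[1]} features: {baseline:.3f}")
```

A function of this shape can serve as the fitness score for a chromosome. Scoring candidates on the same split that is later used to report results can overstate the gain, so a separate validation split is a safer choice when searching over subsets.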

2. Parkinson's Disease Detection

Models Evaluated:

Random Forest
AdaBoost
Gradient Boosting
Decision Tree
K-Nearest Neighbors (KNN)
Support Vector Machine (Linear)
Support Vector Machine (RBF)

Applying the genetic algorithm to gradient boosting improves accuracy from 89.8% to 93.9%.

3. PCOS Detection

Models Evaluated:

Random Forest
AdaBoost
Gradient Boosting
Logistic Regression
Decision Tree
Support Vector Machine (Linear)
Support Vector Machine (RBF)
K-Nearest Neighbors (KNN)

Applying the genetic algorithm to KNN improves accuracy from 84.6% to 88.2%.

Genetic Algorithm Workflow

1. Initialization: Create a random population of binary chromosomes, where each bit marks a feature as selected or not selected.
2. Fitness Calculation: Score each chromosome by the accuracy of the model trained on its selected features.
3. Selection: Keep the best-scoring chromosomes as parents.
4. Crossover: Create a new population by crossing over parent chromosomes.
5. Mutation: Introduce random mutations in the new chromosomes.
6. Repeat: Repeat steps 2-5 for multiple generations.
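
The workflow above can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the notebook's implementation: the function name run_ga, the population size, parent count, mutation rate, and the use of single-point crossover are all illustrative choices. The toy fitness function simply counts bits that match a made-up target mask; in the project, fitness would be the accuracy of a model trained on the selected features.

```python
import random

def run_ga(n_features, fitness, pop_size=20, generations=30,
           n_parents=10, mutation_rate=0.05, seed=0):
    """Search for a high-fitness binary feature mask (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialization: random binary chromosomes (1 = feature selected).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Fitness calculation and selection: keep the top-scoring chromosomes.
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) > fitness(best):
            best = ranked[0]
        parents = ranked[:n_parents]
        children = []
        while len(children) < pop_size:
            # Crossover: single-point crossover between two distinct parents.
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    # Also consider the final generation before returning the best mask seen.
    return max([best] + population, key=fitness)

# Toy fitness (assumption for illustration): bits matching a made-up target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_fitness(chromosome):
    return sum(int(g == t) for g, t in zip(chromosome, TARGET))

best_mask = run_ga(len(TARGET), toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Tracking the best chromosome across generations means a good mask found early is not lost if later crossover or mutation degrades the population.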

Results

Genetic algorithm feature selection improves the accuracy of the chosen model on each dataset:

Breast Cancer: Logistic Regression accuracy improves from 96.5% to 98.6%
Parkinson's Disease: Gradient Boosting accuracy improves from 89.8% to 93.9%
PCOS: KNN accuracy improves from 84.6% to 88.2%

The optimal feature subsets selected by the genetic algorithm are detailed in the notebook.

Usage

The IPython notebook contains the full code implementation. To use:

Install requirements: numpy, pandas, scikit-learn, matplotlib
Run Jupyter notebook
Run cells in order

The outputs include accuracy scores, confusion matrices, and feature importance graphs.
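
For readers unfamiliar with these outputs, the snippet below shows how an accuracy score and a confusion matrix are typically computed with scikit-learn. The labels are made up for illustration and do not come from the project's datasets.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only (1 = disease present, 0 = absent).
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(f"accuracy = {acc:.2f}")
```

In the resulting matrix, the off-diagonal entries count misclassified samples, which matters in disease detection because a missed positive and a false alarm have different costs.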

Conclusion

Using a genetic algorithm to optimize feature selection improved accuracy on all three datasets in this project: by 2.1 percentage points for breast cancer, 4.1 for Parkinson's disease, and 3.6 for PCOS. These results suggest the approach is worth considering for disease detection tasks.
