Machine Learning for Binary Classification

Author: Tai Luan Nguyen
This project performs various machine learning tasks on a dataset with 19,020 samples and 11 attributes. It demonstrates the application of several machine learning algorithms to the dataset and compares their performance using classification reports.
Additionally, it conducts a hyperparameter search for a neural network model.
The project includes:
- Importing Libraries
- Dataset Loading and Preprocessing
- Split dataset
- Data Scaling and Oversampling
- k-Nearest Neighbors (kNN) Classifier
- Naive Bayes Classifier
- Logistic Regression Classifier
- Support Vector Machine (SVM) Classifier
- Neural Network (Deep Learning) Classifier
Project link: https://github.com/luan30092000/binaryClassification
The data are Monte Carlo (MC) generated to simulate registration of high-energy gamma particles in an atmospheric Cherenkov telescope.
Reference: Bock, R. (2007). MAGIC Gamma Telescope. UCI Machine Learning Repository. https://doi.org/10.24432/C52C8B.
Necessary libraries and their purposes for this project:
- numpy: data manipulation and array operations.
- pandas: loading and manipulating the dataset, as well as data preprocessing and analysis.
- matplotlib: matplotlib.pyplot is used to create histograms for data visualization.
- sklearn.preprocessing: StandardScaler is used for feature scaling, which standardizes the data to have a mean of 0 and a standard deviation of 1.
- imblearn.over_sampling: RandomOverSampler is used to oversample the minority class to balance the class distribution.
- sklearn.neighbors: KNeighborsClassifier is used to create and evaluate a kNN model.
- sklearn.naive_bayes: GaussianNB is used to create and evaluate a Naive Bayes model.
- sklearn.linear_model: LogisticRegression is used to create and evaluate a logistic regression model.
- sklearn.svm: SVC is used to create and evaluate an SVM model.
- tensorflow: used to create and train a neural network for classification.
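A minimal import block covering the libraries above might look like the following (the exact imports in the repository may differ slightly):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report
```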
- Loads a dataset from a file named "magic04.data" using pandas and defines column names for the dataset.
- Converts the "class" column to binary values (0 or 1) by mapping "g" to 1 and "h" to 0.
- Plots histograms for each feature, comparing the distributions for class 0 (hadron) and class 1 (gamma).
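A sketch of this loading and preprocessing step; the column names follow the UCI dataset documentation, while the plotting details are assumptions and may differ from the repository:

```python
cols = ["fLength", "fWidth", "fSize", "fConc", "fConc1",
        "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]
df = pd.read_csv("magic04.data", names=cols)

# Map the class labels to binary values: gamma ("g") -> 1, hadron ("h") -> 0
df["class"] = (df["class"] == "g").astype(int)

# Histogram of each feature, split by class, to compare the two distributions
for label in cols[:-1]:
    plt.hist(df[df["class"] == 1][label], color="blue", label="gamma", alpha=0.7, density=True)
    plt.hist(df[df["class"] == 0][label], color="red", label="hadron", alpha=0.7, density=True)
    plt.title(label)
    plt.xlabel(label)
    plt.ylabel("Probability")
    plt.legend()
    plt.show()
```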
- The dataset is split into train, validation, and test sets using the np.split function, with a 60-20-20 split ratio.
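A hedged sketch of the 60-20-20 split using np.split; shuffling via `sample` and the random seed are assumptions:

```python
# Shuffle the rows, then cut at the 60% and 80% marks to get train/validation/test
train, valid, test = np.split(df.sample(frac=1, random_state=0),
                              [int(0.6 * len(df)), int(0.8 * len(df))])
```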
- Scales the features and, optionally, oversamples the dataset.
- StandardScaler is used to scale the features; RandomOverSampler is used for the optional oversampling.
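A sketch of a scaling/oversampling helper in this spirit; the function name scale_dataset and its return signature are assumptions, not taken from the repository:

```python
def scale_dataset(dataframe, oversample=False):
    X = dataframe[dataframe.columns[:-1]].values
    y = dataframe[dataframe.columns[-1]].values

    # Standardize features to mean 0 and standard deviation 1
    X = StandardScaler().fit_transform(X)

    if oversample:
        # Duplicate minority-class samples until both classes are equally represented
        X, y = RandomOverSampler().fit_resample(X, y)

    data = np.hstack((X, np.reshape(y, (-1, 1))))
    return data, X, y

train_data, X_train, y_train = scale_dataset(train, oversample=True)
valid_data, X_valid, y_valid = scale_dataset(valid, oversample=False)
test_data, X_test, y_test = scale_dataset(test, oversample=False)
```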
- Uses the scikit-learn library to create a k-Nearest Neighbors (kNN) model with 5 neighbors.
- Fits the model to the training data and makes predictions on the test data.
- Accuracy: 81%
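A minimal sketch of the kNN step, assuming the X_train/X_test arrays produced by the preprocessing sketch above:

```python
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
print(classification_report(y_test, knn_model.predict(X_test)))
```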
- Uses scikit-learn's Gaussian Naive Bayes classifier to create a Naive Bayes model.
- Fits the model to the training data and makes predictions on the test data.
- Accuracy: 73%
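The Naive Bayes step follows the same fit/predict/report pattern; a sketch:

```python
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
print(classification_report(y_test, nb_model.predict(X_test)))
```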
- Logistic regression model is created using scikit-learn.
- The model is trained on the training data and used to predict on the test data.
- Accuracy: 78%
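A sketch of the logistic regression step; using scikit-learn's defaults here is an assumption:

```python
lg_model = LogisticRegression()  # defaults: L2 regularization, lbfgs solver
lg_model.fit(X_train, y_train)
print(classification_report(y_test, lg_model.predict(X_test)))
```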
- Support Vector Machine (SVM) classifier is created using scikit-learn.
- Model is trained on the training data and used to predict on the test data.
- Accuracy: 85%
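A sketch of the SVM step; keeping SVC's default RBF kernel is an assumption about the configuration used:

```python
svm_model = SVC()  # RBF kernel by default
svm_model.fit(X_train, y_train)
print(classification_report(y_test, svm_model.predict(X_test)))
```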
- Neural network model is defined using TensorFlow/Keras.
- Performs a hyperparameter grid search, iterating over various combinations of the number of nodes, dropout probability, learning rate, and batch size.
- Trains multiple neural network models with different hyperparameters on the training data and selects the model with the lowest validation loss.
- The chosen neural network model achieves an accuracy of 87%.
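A condensed sketch of the model definition and grid search described above; the layer sizes, grid values, and epoch count are illustrative assumptions rather than the repository's exact settings:

```python
def build_model(num_nodes, dropout_prob, lr):
    # Two hidden layers with dropout, sigmoid output for binary classification
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),  # 10 input features
        tf.keras.layers.Dense(num_nodes, activation="relu"),
        tf.keras.layers.Dropout(dropout_prob),
        tf.keras.layers.Dense(num_nodes, activation="relu"),
        tf.keras.layers.Dropout(dropout_prob),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

best_model, least_val_loss = None, float("inf")
for num_nodes in (16, 32, 64):
    for dropout_prob in (0.0, 0.2):
        for lr in (0.01, 0.005, 0.001):
            for batch_size in (32, 64, 128):
                model = build_model(num_nodes, dropout_prob, lr)
                history = model.fit(X_train, y_train, epochs=100,
                                    batch_size=batch_size,
                                    validation_data=(X_valid, y_valid),
                                    verbose=0)
                # Keep the model with the lowest validation loss seen so far
                val_loss = min(history.history["val_loss"])
                if val_loss < least_val_loss:
                    least_val_loss, best_model = val_loss, model
```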