Multi-label wine classification ML project trained using Kaggle wine quality dataset 📊

yixin0829/multi-label-wine-quality-classification

multi_label_wine_quality_classification 📊

This is a project where I practiced training several multi-label wine quality classifiers using the one-vs.-all method.

The workflow includes EDA (exploratory data analysis and data visualization), data preprocessing (feature selection with a chi-square test, oversampling minority classes with synthetic data, and feature scaling), and training several classification models (logistic regression, linear support vector machine (SVM), kernel SVM, and K-NN).
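
To illustrate the one-vs.-all idea, here is a minimal sketch in plain Python. It is not the notebook's code: the binary scorer (a toy nearest-centroid rule), the function names, and the data are all assumptions made up for illustration, whereas the project itself uses logistic regression, SVM, and K-NN.

```python
import math

def binarize(labels, positive_class):
    """Turn multi-class labels into a binary 'this class vs. the rest' target."""
    return [1 if y == positive_class else 0 for y in labels]

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_binary(X, target):
    """Toy binary model: one centroid for the positives, one for the rest."""
    pos = centroid([x for x, t in zip(X, target) if t == 1])
    neg = centroid([x for x, t in zip(X, target) if t == 0])
    return pos, neg

def score(model, x):
    """Higher means 'more likely positive': closer to pos than to the rest."""
    pos, neg = model
    return math.dist(x, neg) - math.dist(x, pos)

def fit_one_vs_all(X, y):
    """Train one binary model per class."""
    return {cls: fit_binary(X, binarize(y, cls)) for cls in sorted(set(y))}

def predict_one_vs_all(models, x):
    """Predict the class whose binary model gives the highest score."""
    return max(models, key=lambda cls: score(models[cls], x))

# Toy, made-up data: two features, three quality classes.
X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0], [8.8, 1.2]]
y = [5, 5, 6, 6, 7, 7]

models = fit_one_vs_all(X, y)
prediction = predict_one_vs_all(models, [5.1, 5.1])
print(prediction)
```

The key point is that each class gets its own binary problem, and the final label is whichever binary model is most confident.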

Feel free to click into the .ipynb notebook for detailed analysis.

EDA

The dataset is extremely skewed: minority classes (i.e. wine quality scores) such as '3' and '8' make up less than 1% of the total population. We can see this by plotting a histogram of the 'quality' column. (figure: quality_count)

A heatmap gives a clearer view of the correlations between features. (figure: corr_heat)

We can further visualize the relationship between each feature and wine quality. Notice that features like "pH", "chlorides", and "residual sugar" have almost no impact on classifying the quality of the wine. (figure: feature_bar)
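
As a rough sketch of how the class imbalance can be checked numerically before plotting, the snippet below computes each class's share of the population. The `class_shares` helper and the label list are hypothetical; the toy labels are made up and do not reproduce the real dataset's counts.

```python
from collections import Counter

def class_shares(labels):
    """Return each class's fraction of the total, keyed by class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts[cls] / total for cls in sorted(counts)}

# Toy, made-up labels standing in for the 'quality' column.
labels = [5] * 40 + [6] * 40 + [7] * 15 + [4] * 3 + [3] * 1 + [8] * 1

shares = class_shares(labels)
for cls, share in shares.items():
    print(f"quality {cls}: {share:.1%}")
```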

Preprocessing

  • Feature selection using chi-square test
  • Drop irrelevant features
  • Split dataset
  • Apply SMOTE to oversample minority-class data by generating synthetic training samples using K-NN. Note that we do not oversample the testing data.
  • Feature scaling
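
The split and oversampling steps above can be sketched as follows. This is a simplified, SMOTE-style interpolation written in plain Python to show the idea, not the library implementation used in the notebook; the `smote_like` and `split` helpers, the value of `k`, and the toy data are all assumptions. Feature selection and scaling are left out to keep the example short.

```python
import math
import random

def smote_like(samples, n_new, k=2, seed=0):
    """Create n_new synthetic points, each placed on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def split(rows, test_fraction=0.25):
    """Hold out the last test_fraction of rows as test data."""
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]

# Toy, made-up feature vectors for one majority and one minority class.
majority = [[float(i), float(i % 3)] for i in range(12)]
minority = [[20.0, 20.0], [21.0, 20.5], [20.5, 21.0], [22.0, 22.0]]

# 1) Split first, so the test set contains only real samples.
maj_train, maj_test = split(majority)
min_train, min_test = split(minority)

# 2) Oversample only the training minority until the classes are balanced.
n_new = len(maj_train) - len(min_train)
new_points = smote_like(min_train, n_new)
min_train_balanced = min_train + new_points

print(len(maj_train), len(min_train_balanced), len(min_test))
```

Splitting before oversampling matters because synthetic points derived from a sample that later lands in the test set would leak information and inflate the scores.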

Result

Because of the skewed nature of the dataset, F1-score is used as the performance metric. After applying the synthetic minority oversampling technique (SMOTE), the K-NN model's weighted-average F1-score rose notably from 0.52 to 0.67, and its accuracy went from 51% to 65%. The other models (logistic regression, linear SVM, and kernel SVM) did not improve as had been expected.
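
For readers unfamiliar with the metric, here is a minimal sketch of how a support-weighted F1-score can be computed by hand. The helper names and the toy labels are assumptions for illustration and are unrelated to the scores reported above.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, cls):
    """F1 for a single class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no correct positive predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, c) * n / total
               for c, n in support.items())

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy, made-up example: a model that always predicts the majority class.
y_true = [5] * 8 + [8] * 2
y_pred = [5] * 10

acc = accuracy(y_true, y_pred)
f1_minority = per_class_f1(y_true, y_pred, 8)
f1_weighted = weighted_f1(y_true, y_pred)
print(acc, f1_minority, round(f1_weighted, 3))
```

In this toy case the per-class F1 for the minority class is zero even though accuracy looks reasonable, which is the kind of failure that accuracy alone hides on a skewed dataset.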

Logistic Regression

(figure: log)

Linear SVM & Kernel SVM

(figure: svm)

K-NN (Rapid Prototype)

(figure: knn)

K-NN (Final)

(figure: knn2)
