Data exploration of the wine quality dataset
The dataset can be found at the UCI Machine Learning Repository. Wines are divided into two categories: white wines and red wines. The analysis covers both types of wine and is based on the 13 variables/characteristics presented in the dataset:
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- chlorides
- free sulfur dioxide
- total sulfur dioxide
- density
- pH
- sulphates
- alcohol
- Output variable (based on sensory data): quality (score between 0 and 10)
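As a minimal sketch of how the variables above might be loaded, the snippet below reads the red and white files into a single pandas DataFrame. It assumes the two CSV files have been downloaded locally and use `;` as the field separator, as the UCI files do. The function name `load_wines`, the constant `EXPECTED_COLUMNS`, and the added `type` column are illustrative choices, not part of the original project.

```python
import pandas as pd

# The 12 columns listed above, spelled as in the UCI CSV headers.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def load_wines(red_path, white_path):
    """Read both CSV files and stack them into one DataFrame."""
    # Assumption: fields are separated by ';' rather than ','.
    red = pd.read_csv(red_path, sep=";")
    white = pd.read_csv(white_path, sep=";")

    # Hypothetical helper column recording which file each row came from.
    red["type"] = "red"
    white["type"] = "white"

    wines = pd.concat([red, white], ignore_index=True)

    # Fail early if a listed variable is absent from the files.
    missing = set(EXPECTED_COLUMNS) - set(wines.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return wines
```

A call such as `load_wines("winequality-red.csv", "winequality-white.csv")` would then return one table covering both wine types; the paths are placeholders for wherever the files were saved.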
The goal is to explore the Wine Quality dataset, extract the main features and characteristics from the data, and predict the wine quality. We treat the prediction problem as a regression task. The analysis is organized in three steps:
- Exploratory Data Analysis
- Data preprocessing
- Quality prediction
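The preprocessing and prediction steps can be sketched as a small scikit-learn pipeline. This is only an illustration under stated assumptions: the scaler and the linear model are stand-ins rather than the models used in this project, the function name `fit_quality_regressor` is hypothetical, and the data is synthetic so that the snippet runs without the CSV files.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_quality_regressor(X, y, seed=0):
    """Split the data, fit a scaled linear model, return (model, test MAE)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Preprocessing (standardization) and regression chained in one object,
    # so the scaler is fitted on the training split only.
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, mae


# Synthetic stand-in for the 11 physicochemical features and the quality score.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 11))
y_demo = 5 + X_demo[:, 0] + 0.1 * rng.normal(size=200)

model, mae = fit_quality_regressor(X_demo, y_demo)
print(f"test MAE on synthetic data: {mae:.2f}")
```

With the real data, `X` would hold the feature columns and `y` the `quality` column. Mean absolute error is used here because it reads directly in quality points; other regression metrics would work as well.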
Before you continue, make sure the following Python packages are installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
pip install pandas numpy matplotlib seaborn scikit-learn
Cheers! 🍷