# Wine Quality Analysis

In this project, I explore the characteristics associated with the quality of red wine. 
The dataset records physicochemical properties such as acidity, alcohol, and pH for each sample, 
alongside a quality score for that sample.

This analysis involves basic data exploration and visualization techniques 
to uncover patterns and correlations that might explain what makes a good wine.

## 1. Dataset Overview

We will start by loading the dataset and getting an overview of its structure.

In [None]:
# Importing the basic libraries for analysis
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use("ggplot")
%matplotlib inline
import plotly.graph_objects as go
import plotly.express as px

In [None]:
# Importing the dataset
df = pd.read_csv("WineQT.csv")

In [None]:
# Displaying the first few rows of the dataset
df.head()

In [None]:
# Print the shape of the dataset
print("Shape of the dataset:", df.shape)

In [None]:
# Check column names and data types
df.info()

## 2. Data Cleaning & Preparation

We check for missing values and duplicate rows, and remove any duplicates before the analysis.

In [None]:
# Checking for missing values
df.isnull().sum()

In [None]:
# Checking for duplicate rows
print("Number of duplicate rows:", df.duplicated().sum())

In [None]:
# Removing duplicates (if any)
df.drop_duplicates(inplace=True)
print("New shape after removing duplicates:", df.shape)
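One caveat on the duplicate check: if the CSV carries a unique row identifier, `df.duplicated()` can never flag two otherwise identical samples. The sketch below assumes such a column is called `Id`, which is my assumption and should be checked against `df.columns`. The helper `count_duplicates` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

def count_duplicates(frame: pd.DataFrame, ignore=("Id",)) -> int:
    """Count rows that repeat an earlier row, ignoring identifier columns."""
    # Compare rows only on the columns that describe the sample itself.
    feature_cols = [c for c in frame.columns if c not in ignore]
    return int(frame.duplicated(subset=feature_cols).sum())

# Toy rows made up for illustration (not taken from WineQT.csv).
toy = pd.DataFrame({
    "Id": [0, 1, 2],          # hypothetical identifier column
    "alcohol": [9.4, 9.4, 10.5],
    "quality": [5, 5, 6],
})

# With the identifier included, every row looks unique.
print("Duplicates (all columns):", count_duplicates(toy, ignore=()))
# With the identifier ignored, the repeated sample is found.
print("Duplicates (features only):", count_duplicates(toy))
```

If the feature-only count is non-zero on the real data, `df.drop_duplicates(subset=feature_cols)` would be one way to remove those rows.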

## 3. Exploratory Data Analysis (EDA)

We analyze distributions of individual features and relationships between them 
to understand what influences wine quality.

### a. Distribution of Wine Quality Scores

In [None]:
sns.countplot(x='quality', data=df, palette='Set2')
plt.title("Distribution of Wine Quality Scores")
plt.xlabel("Quality Score")
plt.ylabel("Count")
plt.show()
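The count plot shows the shape of the distribution, but exact counts and shares are easier to read from a table. This is a minimal sketch using a hypothetical helper, `quality_shares`, on made-up toy scores; on the real data it would be called as `quality_shares(df)`.

```python
import pandas as pd

def quality_shares(frame: pd.DataFrame, col: str = "quality") -> pd.DataFrame:
    """Return the count and relative share of each quality score."""
    # Sort by score so the table reads from lowest to highest quality.
    counts = frame[col].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})

# Toy scores made up for illustration (not taken from WineQT.csv).
toy = pd.DataFrame({"quality": [5, 5, 6, 6, 6, 7, 5, 6]})
table = quality_shares(toy)
print(table)
```

A table like this makes class imbalance easy to spot, which matters if the scores are later used as a prediction target.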

### b. Relationship Between Features and Quality

We explore how features like alcohol, acidity, and sulphates relate to wine quality.

In [None]:
# Boxplot: Alcohol vs Quality
sns.boxplot(x='quality', y='alcohol', data=df, palette='coolwarm')
plt.title("Alcohol Content by Wine Quality")
plt.show()

In [None]:
# Boxplot: Volatile Acidity vs Quality
sns.boxplot(x='quality', y='volatile acidity', data=df, palette='coolwarm')
plt.title("Volatile Acidity by Wine Quality")
plt.show()

In [None]:
# Boxplot: Sulphates vs Quality
sns.boxplot(x='quality', y='sulphates', data=df, palette='coolwarm')
plt.title("Sulphates by Wine Quality")
plt.show()
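The boxplots can be backed up with a numeric summary. The sketch below computes the median of each feature per quality score, together with the number of samples behind each median. The helper `summarize_by_quality` and the toy values are hypothetical; the column names match those used in the plots above.

```python
import pandas as pd

def summarize_by_quality(frame, features, target="quality"):
    """Median of each feature per quality score, plus the group size."""
    grouped = frame.groupby(target)
    summary = grouped[list(features)].median()
    # Group sizes show how much data supports each median.
    summary["n"] = grouped.size()
    return summary

# Toy values made up for illustration (not taken from WineQT.csv).
toy = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7],
    "alcohol": [9.0, 10.0, 10.0, 11.0, 12.0],
    "volatile acidity": [0.70, 0.60, 0.50, 0.40, 0.30],
})
summary = summarize_by_quality(toy, ["alcohol", "volatile acidity"])
print(summary)
```

Medians based on very few samples (small `n`) should be read with caution, since the rarest quality scores usually have the least data.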

### c. Correlation Matrix

This heatmap helps identify which features are strongly correlated with wine quality and with each other.

In [None]:
# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Feature Correlation Matrix")
plt.show()
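A full heatmap can be hard to scan for a single target, so it may help to list each feature's correlation with quality, ordered by strength. This is a minimal sketch with a hypothetical helper, `correlations_with_target`. It drops an identifier column if one exists (the name `Id` is my assumption), because a row index carries no chemical meaning. The toy rows are made up.

```python
import pandas as pd

def correlations_with_target(frame, target="quality", drop=("Id",)):
    """Pearson correlation of each numeric feature with the target."""
    # Keep numeric columns and remove identifier columns if present.
    numeric = frame.select_dtypes("number").drop(columns=list(drop), errors="ignore")
    corr = numeric.corr()[target].drop(target)
    # Order by absolute value so strong negative links are not buried.
    return corr.sort_values(key=abs, ascending=False)

# Toy rows made up for illustration (not taken from WineQT.csv).
toy = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5],  # hypothetical identifier column
    "alcohol": [9.0, 10.0, 10.0, 11.0, 12.0, 12.0],
    "volatile acidity": [0.7, 0.6, 0.6, 0.5, 0.3, 0.4],
    "quality": [5, 5, 6, 6, 7, 7],
})
result = correlations_with_target(toy)
print(result)
```

On the real data the call would be `correlations_with_target(df)`. Since quality is an ordinal score, a rank-based correlation could be a reasonable alternative to Pearson.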

## 4. Key Insights

- **Alcohol** has a strong positive correlation with wine quality.
- **Volatile acidity** is negatively correlated with quality: wines with higher volatile acidity tend to receive lower scores.
- **Sulphates** and **citric acid** show mild positive correlation.
- Other features show weaker or less consistent relationships with quality.

## 5. Conclusion

This beginner-level project helped me practice basic data analysis and visualization. 
The main takeaway is that alcohol content and certain acidity measures, volatile acidity in particular, are associated with wine quality and may be useful predictors. 
In the future, this EDA could be extended with machine learning models 
to predict quality based on chemical features.

**Dataset Source**: [UCI Machine Learning Repository - Wine Quality Dataset](https://archive.ics.uci.edu/ml/datasets/wine+quality)