# Notebook B: Clustering and Dimension Reduction
In this notebook, we explore dimension reduction techniques on the [Wine dataset](https://archive.ics.uci.edu/dataset/109/wine), a classic dataset in machine learning. We will standardize the dataset, apply KMeans clustering, and visualize the results with principal component analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). We will also introduce Linear Discriminant Analysis (LDA) as another dimension reduction technique and compare its class separation against PCA and t-SNE. The goal is to gain insight into the dataset's structure and the distinct groups within the wine samples.

### Setup imports

### Load data set
Use the `sklearn.datasets.load_wine()` function to load the Wine dataset. Convert it into a pandas DataFrame for easier manipulation and visualization. Add a column to the data frame for the target.
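One possible way to do this is sketched below. The DataFrame name `wine_df` follows the later sections of this notebook; the column name `target` is an assumption based on how the target column is referred to further down.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Load the dataset as a Bunch object (features, labels, and metadata).
wine = load_wine()

# Build a DataFrame with one column per feature.
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)

# Add the class labels (0, 1, 2) as a "target" column (assumed name).
wine_df["target"] = wine.target

wine_df.head()
```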

### Data Preprocessing
Standardize the features of the Wine dataset using `StandardScaler` from `sklearn.preprocessing`, so that each feature has zero mean and unit variance.
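A minimal sketch of the scaling step follows. It reloads the data so the cell runs on its own; the names `feature_cols` and `X_scaled` are illustrative choices, not prescribed by the exercise. Only the feature columns are scaled, not the target.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df["target"] = wine.target

# Scale only the 13 feature columns; the target is a label, not a feature.
feature_cols = wine.feature_names  # hypothetical helper name
scaler = StandardScaler()
X_scaled = scaler.fit_transform(wine_df[feature_cols])

# Each column should now have mean close to 0 and standard deviation close to 1.
print(X_scaled.mean(axis=0).round(3))
print(X_scaled.std(axis=0).round(3))
```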

### Exploratory Data Analysis
Perform exploratory data analysis on the Wine dataset. Visualize the distributions of a couple of features, such as alcohol content and malic acid content, using histograms.
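The histograms could be drawn along these lines. This sketch plots the raw (unscaled) values so the axes stay in the original units; the bin count of 20 is an arbitrary choice, and `alcohol` and `malic_acid` are the column names that `load_wine()` provides.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)

# One histogram per feature, side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["alcohol", "malic_acid"]):
    ax.hist(wine_df[col], bins=20, edgecolor="black")  # 20 bins is an arbitrary choice
    ax.set_title(f"Distribution of {col}")
    ax.set_xlabel(col)
    ax.set_ylabel("Count")
fig.tight_layout()
```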

### Clustering with KMeans
Apply KMeans clustering to the standardized features of the Wine dataset. Use three clusters and `random_state=42`. Add a column to `wine_df` holding the KMeans cluster that each point is assigned to.
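A sketch of the clustering step, under a few assumptions: the new column is called `cluster` (the exercise does not fix a name), and `n_init=10` is set explicitly because the default for that parameter differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df["target"] = wine.target
X_scaled = StandardScaler().fit_transform(wine_df[wine.feature_names])

# Three clusters, fixed seed for reproducibility.
# n_init=10 is an assumption made to keep behavior stable across versions.
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)

# "cluster" is a hypothetical column name for the assigned cluster index.
wine_df["cluster"] = kmeans.fit_predict(X_scaled)

# How many samples landed in each cluster.
print(wine_df["cluster"].value_counts().sort_index())
```

Note that the cluster indices are arbitrary: cluster 0 does not necessarily correspond to class 0 in the target column.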

### Dimension Reduction with PCA
Apply PCA on the standardized Wine dataset to reduce its dimensions. Visualize the data in the first two principal components and color the points by their KMeans cluster. How well do the PCA components represent the clusters?
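One way to approach this is sketched below. The variable names (`X_pca`, `clusters`) and the colormap are illustrative assumptions. Printing the explained variance ratio gives a rough sense of how much of the total variance the two-dimensional view retains, which helps when judging how faithfully the plot represents the clusters.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)

# Recompute the KMeans labels so this cell runs on its own.
clusters = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(X_scaled)

# Project the 13 standardized features onto the first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Scatter plot of PC1 vs PC2, colored by KMeans cluster.
fig, ax = plt.subplots(figsize=(7, 5))
scatter = ax.scatter(X_pca[:, 0], X_pca[:, 1], c=clusters, cmap="viridis")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("PCA of Wine dataset, colored by KMeans cluster")
ax.legend(*scatter.legend_elements(), title="KMeans cluster")
fig.tight_layout()
```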

### t-SNE Visualization
Now apply t-SNE with `random_state=42` to the standardized Wine dataset and visualize the result. Color the points based on their labels from the target column. How does the t-SNE visualization compare to the PCA visualization in terms of cluster separation?
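A possible sketch of the t-SNE step is shown here. All t-SNE settings other than `random_state=42` are left at scikit-learn's defaults, which is an assumption; other perplexity values can change the picture noticeably. The name `X_tsne` is illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)
y = wine.target  # true class labels, same values as the target column

# Embed the standardized data in two dimensions.
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)

# Scatter plot of the embedding, colored by the true class label.
fig, ax = plt.subplots(figsize=(7, 5))
scatter = ax.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap="viridis")
ax.set_xlabel("t-SNE 1")
ax.set_ylabel("t-SNE 2")
ax.set_title("t-SNE of Wine dataset, colored by target")
ax.legend(*scatter.legend_elements(), title="Target")
fig.tight_layout()
```

When comparing with the PCA plot, keep in mind that t-SNE axes have no direct interpretation and that distances between well-separated groups are not meaningful in the same way as in a linear projection.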

### Dimension Reduction with LDA

Linear Discriminant Analysis (LDA) is a technique used to reduce dimensions of the dataset while preserving as much class discriminatory information as possible. Unlike PCA, which does not consider the class labels when finding the principal components, LDA aims to provide the best class separability. Let's apply LDA to the Wine dataset and observe how it compares with PCA in terms of class separation.

Then create a plot of the LDA components in which the points are colored based on the `wine_df` target column.
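The LDA step might look like the sketch below. Because LDA is supervised, `fit_transform` takes the class labels as well as the features. With three classes, LDA can produce at most two discriminant components, so `n_components=2` is the largest valid choice here. The names `X_lda`, `LD1`, and `LD2` are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df["target"] = wine.target
X_scaled = StandardScaler().fit_transform(wine_df[wine.feature_names])

# LDA uses the class labels to find directions that best separate the classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_scaled, wine_df["target"])

# Scatter plot of the two discriminant components, colored by target.
fig, ax = plt.subplots(figsize=(7, 5))
scatter = ax.scatter(X_lda[:, 0], X_lda[:, 1], c=wine_df["target"], cmap="viridis")
ax.set_xlabel("LD1")
ax.set_ylabel("LD2")
ax.set_title("LDA of Wine dataset, colored by target")
ax.legend(*scatter.legend_elements(), title="Target")
fig.tight_layout()
```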

### End of Notebook B