# Lecture 1c: Unsupervised Learning and Clustering Approaches
This lecture introduces the first unsupervised learning approaches we will explore: k-means clustering and self-organizing maps. We will use these algorithms to identify hidden patterns and structures in data without explicit guidance.

The key concepts covered in this lecture include:
* __Unsupervised learning__ is a type of machine learning that trains algorithms on unlabeled data, aiming to identify patterns and structures without explicit guidance. 
It is particularly useful when dealing with large volumes of unstructured data or when the desired outcomes are unknown.
* __Clustering__ is a common unsupervised learning technique that divides a dataset into distinct groups, or clusters, based on the similarity of data points. 
Clustering algorithms aim to group data points that are more similar to each other than to those in different clusters.
* __K-means clustering__ is a popular and straightforward clustering algorithm that partitions a dataset into $k$ clusters. 
The algorithm iteratively assigns data points to the nearest cluster center and updates the cluster centers based on the mean of the assigned points.
* __Self-organizing maps (SOMs)__ are another unsupervised learning algorithm that uses a neural network to map high-dimensional data onto a lower-dimensional grid.
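
The assign-then-update loop described in the k-means bullet can be sketched in a few lines of Julia. This is a minimal illustration under stated assumptions, not the implementation loaded by `Include.jl`: the function name `kmeans_sketch`, its keyword arguments, the squared Euclidean distance, the random initialization from data points, and the toy matrix `X` are all hypothetical choices made for this example.

```julia
using Statistics  # mean
using Random      # MersenneTwister, randperm

# Hypothetical helper (not part of Include.jl): a bare-bones k-means iteration.
# X is a d × n matrix with one data point per column.
function kmeans_sketch(X::AbstractMatrix{<:Real}, k::Int; maxiter::Int = 100,
                       rng::AbstractRNG = MersenneTwister(1234))
    n = size(X, 2)
    # Initialize the centers with k distinct, randomly chosen data points
    centers = float.(X[:, randperm(rng, n)[1:k]])
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: each point joins the cluster with the nearest center
        # (squared Euclidean distance is an assumption of this sketch)
        new_assignments = [
            argmin([sum(abs2, X[:, i] .- centers[:, j]) for j in 1:k])
            for i in 1:n
        ]
        new_assignments == assignments && break  # stop when no label changes
        assignments = new_assignments
        # Update step: move each center to the mean of its assigned points
        for j in 1:k
            members = findall(==(j), assignments)
            isempty(members) && continue  # an empty cluster keeps its old center
            centers[:, j] = vec(mean(X[:, members], dims = 2))
        end
    end
    return centers, assignments
end

# Toy data, made up for illustration: two well-separated groups in 2D
X = [1.0 1.2 0.8 9.0 9.2 8.8;
     1.0 0.9 1.1 9.0 9.1 8.9]
centers, assignments = kmeans_sketch(X, 2)
```

On this toy data, the first three columns should end up sharing one label and the last three the other. Which group receives label `1` depends on the random initialization, so only the grouping is meaningful, not the label values themselves.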

Lecture notes can be found [here](docs/Notes.pdf).

## Prerequisites
We set up the computational environment by including the `Include.jl` file, which loads external packages, the functions we will use in the exercise, and custom types that model the components of our problem. We then load any needed resources and set up any required constants.

In [10]:
include("Include.jl")

In this lecture, we'll work with a [customer spending preferences dataset from Kaggle](https://www.kaggle.com/code/heeraldedhia/kmeans-clustering-for-customer-data?select=Mall_Customers.csv). This dataset was created for learning customer segmentation concepts, also known as [market basket analysis](https://en.wikipedia.org/wiki/Market_basket). We will use it to demonstrate k-means clustering, an unsupervised machine learning technique.

In [65]:
originaldataset = CSV.read(joinpath(_PATH_TO_DATA, "mall-customers-dataset.csv"), DataFrame)

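| Row | id | gender | age | income | spendingscore |
|---|---|---|---|---|---|
| | Int64 | String7 | Int64 | Int64 | Int64 |
| 1 | 1 | Male | 19 | 15 | 39 |
| 2 | 2 | Male | 21 | 15 | 81 |
| 3 | 3 | Female | 20 | 16 | 6 |
| 4 | 4 | Female | 23 | 16 | 77 |
| 5 | 5 | Female | 31 | 17 | 40 |
| 6 | 6 | Female | 22 | 17 | 76 |
| 7 | 7 | Female | 35 | 18 | 6 |
| 8 | 8 | Female | 23 | 18 | 94 |
| 9 | 9 | Male | 64 | 19 | 3 |
| 10 | 10 | Female | 30 | 19 | 72 |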


Let's remap the `gender` column to a numeric encoding, `Male = -1` and `Female = 1`, so that we can include it in the analysis later.

In [83]:
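dataset = let
    # work on a copy so that originaldataset stays unchanged
    treated_dataset = copy(originaldataset);
    # encode gender numerically: "Male" -> -1, any other value -> 1
    transform!(treated_dataset, :gender => ByRow(x -> (x == "Male" ? -1 : 1)) => :gender);
    treated_dataset
end;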

In [85]:
dataset

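| Row | id | gender | age | income | spendingscore |
|---|---|---|---|---|---|
| | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | -1 | 19 | 15 | 39 |
| 2 | 2 | -1 | 21 | 15 | 81 |
| 3 | 3 | 1 | 20 | 16 | 6 |
| 4 | 4 | 1 | 23 | 16 | 77 |
| 5 | 5 | 1 | 31 | 17 | 40 |
| 6 | 6 | 1 | 22 | 17 | 76 |
| 7 | 7 | 1 | 35 | 18 | 6 |
| 8 | 8 | 1 | 23 | 18 | 94 |
| 9 | 9 | -1 | 64 | 19 | 3 |
| 10 | 10 | 1 | 30 | 19 | 72 |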


## What is unsupervised learning?
Fill me in