# Clustering Tutorial (CLU101) - Level Beginner

This notebook introduces clustering with PyCaret. It is designed for beginners looking to get started with PyCaret's `pycaret.clustering` module.

## Objectives:
1. Import data from the PyCaret repository.
2. Set up the PyCaret environment for clustering.
3. Build and analyze clustering models.
4. Assign cluster labels to datasets.
5. Save and load clustering models for reuse.

Read Time: **Approx. 25 minutes**

## 1.0 Installing PyCaret

To get started, install PyCaret by running the following command:

In [None]:
!pip install pycaret

### For Google Colab Users:
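If you are running PyCaret 2.x in Colab, run the following cell to enable interactive visuals. The `enable_colab` helper was removed in PyCaret 3.x, so skip this step there.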

In [None]:
from pycaret.utils import enable_colab
enable_colab()

## 2.0 What is Clustering?
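Clustering is the task of grouping observations so that those in the same group (cluster) are more similar to each other, according to some distance or similarity measure, than to those in other groups. Because it needs no labels, clustering is an unsupervised learning technique. Common applications include customer segmentation, document categorization, and outcome analysis in experiments.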
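To make the idea concrete, here is a minimal plain-Python sketch of K-Means (Lloyd's algorithm), the algorithm used later in this tutorial. It is an illustration, not PyCaret's implementation. It assumes a deterministic farthest-point initialization as a simple stand-in for the k-means++ initialization that scikit-learn uses by default, and the six points are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_farthest(points, k):
    """Deterministic farthest-point initialization (a simple stand-in for k-means++)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's-algorithm K-Means; returns (labels, centroids)."""
    centroids = init_farthest(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each iteration alternates an assignment step and an update step, and the loop stops once the centroids stop moving.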

## 3.0 Overview of PyCaret's Clustering Module
PyCaret's `pycaret.clustering` module provides tools for unsupervised learning, including preprocessing, multiple clustering algorithms, and visualization capabilities.

## 4.0 Dataset for the Tutorial
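We will use the **Mice Protein Expression** dataset from UCI. It contains 1,080 samples with the expression levels of 77 proteins, plus an identifier column (`MouseID`) and several categorical columns (such as genotype and treatment).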

To load the dataset directly from PyCaret:

In [None]:
from pycaret.datasets import get_data
dataset = get_data('mice')

# Check the dataset's shape
print(dataset.shape)

### Splitting Data for Modeling and Predictions
To simulate new data, we hold back 5% of the rows. They are not used for modeling; we use them at the end to demonstrate `predict_model()`.

In [None]:
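# Sample 95% of the rows for modeling, keeping the original index for now
data = dataset.sample(frac=0.95, random_state=786)
# The remaining 5% (rows not in the sample) become the unseen set
data_unseen = dataset.drop(data.index)

# Reset the indices only after the split, so drop() used the original row labels
data = data.reset_index(drop=True)
data_unseen = data_unseen.reset_index(drop=True)

print('Data for Modeling:', data.shape)
print('Unseen Data for Predictions:', data_unseen.shape)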
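The split relies on one invariant: the unseen rows must be exactly the rows that were *not* sampled, identified by their original labels, so any renumbering has to wait until both subsets exist. The plain-Python sketch below uses a hypothetical helper (not part of PyCaret or pandas) that samples row positions and builds both subsets from the same set of original positions. It will not reproduce the exact rows that `DataFrame.sample(random_state=786)` picks.

```python
import random


def holdout_split(rows, frac=0.95, seed=786):
    """Split rows into (modeling, unseen); unseen is the exact complement of the sample."""
    rng = random.Random(seed)
    n_keep = round(len(rows) * frac)
    # Sample ORIGINAL row positions, and build both subsets from those same positions
    keep = set(rng.sample(range(len(rows)), n_keep))
    modeling = [r for i, r in enumerate(rows) if i in keep]
    unseen = [r for i, r in enumerate(rows) if i not in keep]
    return modeling, unseen


# Hypothetical stand-in for the 1,080 rows of the mice dataset
rows = [f"row_{i}" for i in range(1080)]
modeling, unseen = holdout_split(rows)
print(len(modeling), len(unseen))  # 1026 54
```

In pandas terms, this corresponds to computing `dataset.drop(data.index)` before calling `reset_index()` on either part.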

## 5.0 Setting Up the PyCaret Environment
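Initialize the PyCaret environment with `setup()`. Here, `normalize=True` rescales the numeric features, `ignore_features=['MouseID']` excludes the identifier column from clustering, and `session_id=123` fixes the random seed so that results are reproducible.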

In [None]:
from pycaret.clustering import *

exp_clu101 = setup(
    data, 
    normalize=True, 
    ignore_features=['MouseID'], 
    session_id=123
)
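Normalization matters because K-Means compares observations by Euclidean distance, so a feature measured on a larger scale would otherwise dominate the distances. PyCaret's default `normalize_method` is `'zscore'`, which rescales each numeric column as `z = (x - mean) / std`. A plain-Python sketch for a single column, assuming the population standard deviation (which is what scikit-learn's `StandardScaler` uses):

```python
from statistics import mean, pstdev


def zscore(column):
    """Rescale a numeric column to mean 0 and (population) standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - mu) / sigma for x in column]


scaled = zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The mean and standard deviation are learned from the modeling data only; new data is later rescaled with those same values.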

## 6.0 Creating and Assigning a Model
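We create a K-Means model with `create_model()` (PyCaret's default is `num_clusters=4`) and then use `assign_model()` to return the training data with an added `Cluster` column holding each row's label.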

In [None]:
kmeans = create_model('kmeans')
kmean_results = assign_model(kmeans)

# Display the first few rows of the dataset with cluster labels
kmean_results.head()
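`assign_model()` returns the training data with an extra `Cluster` column whose values are strings such as `Cluster 0`. The plain-Python sketch below mimics that on a few hypothetical rows (the row dicts and integer labels are made up, not taken from the mice dataset) and then counts members per cluster, a quick check that no cluster is nearly empty.

```python
from collections import Counter


def assign_labels(rows, labels):
    """Return copies of the rows with an added 'Cluster' field such as 'Cluster 0'."""
    return [{**row, "Cluster": f"Cluster {lab}"} for row, lab in zip(rows, labels)]


# Hypothetical rows and integer labels, e.g. from a fitted K-Means model
rows = [{"MouseID": "m1"}, {"MouseID": "m2"}, {"MouseID": "m3"}, {"MouseID": "m4"}]
labels = [0, 1, 0, 2]

assigned = assign_labels(rows, labels)
sizes = Counter(r["Cluster"] for r in assigned)
print(assigned[0])  # {'MouseID': 'm1', 'Cluster': 'Cluster 0'}
print(sizes)
```

On the real results, the pandas equivalent of the count is `kmean_results['Cluster'].value_counts()`.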

## 7.0 Visualizing Clusters
`plot_model()` provides several plots for analyzing a trained clustering model.

In [None]:
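# Cluster plot: observations projected onto their first two principal components (PCA), colored by cluster
plot_model(kmeans, plot='cluster')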

In [None]:
# Elbow Plot
plot_model(kmeans, plot='elbow')
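The elbow plot fits K-Means for a range of cluster counts `k` and plots a distortion score, the within-cluster sum of squares (WCSS); a good `k` is where adding clusters stops reducing the score much. (To our understanding, PyCaret draws this plot with Yellowbrick's `KElbowVisualizer`.) The sketch below computes the same quantity on made-up 1-D data with three obvious groups, using a tiny hypothetical K-Means helper with farthest-point initialization.

```python
def kmeans_1d(xs, k, max_iter=100):
    """Tiny 1-D K-Means with farthest-point initialization; returns (centroids, labels)."""
    centroids = [xs[0]]
    while len(centroids) < k:
        # Next centroid: the value farthest from its nearest existing centroid
        centroids.append(max(xs, key=lambda x: min((x - c) ** 2 for c in centroids)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: (x - centroids[j]) ** 2) for x in xs]
        new = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break  # converged
        centroids = new
    return centroids, labels


def wcss(xs, centroids, labels):
    """Within-cluster sum of squares (the K-Means 'distortion' or 'inertia')."""
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(xs, labels))


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
for k in range(1, 5):
    centroids, labels = kmeans_1d(xs, k)
    print(k, round(wcss(xs, centroids, labels), 2))
```

The score falls steeply up to `k = 3` and only slightly after that, which is the "elbow" the plot helps you spot.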

In [None]:
# Silhouette Plot
plot_model(kmeans, plot='silhouette')
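The silhouette plot shows, for each observation, how much closer it is to its own cluster than to the nearest other cluster: `s = (b - a) / max(a, b)`, where `a` is the mean distance to the other members of its cluster and `b` is the mean distance to the members of the nearest other cluster. Values near 1 indicate well-separated clusters; negative values suggest misassigned points. A plain-Python sketch using Euclidean distance on made-up 1-D points, assuming the scikit-learn convention that points in singleton clusters score 0:

```python
from math import dist


def silhouette_scores(points, labels):
    """Per-point silhouette values (Euclidean distance; singleton clusters score 0)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_scores(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_scores(points, [0, 1, 0, 1])   # deliberately poor grouping
print(round(sum(good) / len(good), 3), round(sum(bad) / len(bad), 3))  # 0.9 -0.45
```

The sketch needs at least two clusters, since `b` is undefined otherwise.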

## 8.0 Predicting on Unseen Data
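`predict_model()` applies the preprocessing pipeline from `setup()` (for example, the normalization) to the new data and assigns each row to one of the learned clusters.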

In [None]:
unseen_predictions = predict_model(kmeans, data=data_unseen)
unseen_predictions.head()
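For K-Means, assigning a new row amounts to transforming it with the statistics learned during `setup()` and then picking the nearest centroid. A plain-Python sketch under those assumptions; the per-feature statistics, centroids, and helper functions are hypothetical illustrations, not PyCaret functions. The key design point is that new rows are normalized with the *training* mean and standard deviation, never with statistics recomputed on the new data.

```python
from math import dist


def normalize_with(train_stats, point):
    """Z-score a point with per-feature (mean, std) pairs learned on the TRAINING data."""
    return tuple((x - mu) / sd for x, (mu, sd) in zip(point, train_stats))


def predict_nearest_centroid(centroids, point):
    """Return the index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


# Hypothetical values: training (mean, std) per feature and centroids in normalized space
train_stats = [(10.0, 2.0), (100.0, 20.0)]
centroids = [(-1.0, -1.0), (1.0, 1.0)]

new_rows = [(8.0, 80.0), (12.5, 115.0)]
labels = [predict_nearest_centroid(centroids, normalize_with(train_stats, r)) for r in new_rows]
print(labels)  # [0, 1]
```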

## 9.0 Saving and Loading the Model
Save the trained model for future use and reload it when needed.

In [None]:
# Save the model together with its preprocessing pipeline (written to 'Final Kmeans Model.pkl')
save_model(kmeans, 'Final Kmeans Model')

# Load the model
saved_kmeans = load_model('Final Kmeans Model')
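`save_model()` serializes the trained pipeline (preprocessing steps plus the model) to a `.pkl` file, and `load_model()` reads it back. The stdlib sketch below shows the same round trip with `pickle` on a hypothetical dictionary standing in for a model; it only illustrates the idea and is not how PyCaret builds or stores its pipeline object.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: preprocessing stats plus centroids
model = {"means": [10.0, 100.0], "stds": [2.0, 20.0], "centroids": [[-1.0, -1.0], [1.0, 1.0]]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Final Kmeans Model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)      # serialize the object to disk
    with open(path, "rb") as f:
        restored = pickle.load(f)  # deserialize it back into a new object

print(restored == model)  # True
```

As with any pickle file, only load models from sources you trust, because unpickling can execute arbitrary code.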

### Predict Using the Loaded Model

In [None]:
new_prediction = predict_model(saved_kmeans, data=data_unseen)
new_prediction.head()