An unsupervised machine learning model (K-Means clustering) built with Python
- Takes a company's dataset of all purchases made, along with customer details
- Based on this data, we create clusters to understand it better
- Here we have made 5 clusters and given them 5 different colours, namely:
- Red
- Green
- Blue
- Black
- Violet
- All the coordinates are labelled accordingly. Here I used 3 labels:
- Customer groups
- Spending scores (1-100)
- Annual income
- Here we can get a clear picture of the customer-sales data.
- It can now be used to analyse the data and make better decisions to increase profit and meet user needs.
- NumPy
- pandas
- seaborn
- Matplotlib
- scikit-learn
The dataset that I used for this project is from Kaggle.
A little peek into the dataset
Checked for any missing data in the CSV file; missing values would feed false data into our model and we would lose accuracy.
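The peek and the missing-data check above can be sketched roughly as follows. This is only an illustration: the small DataFrame stands in for the Kaggle CSV, and its column names and values are assumptions, not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; column names and values are assumptions.
customer_data = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, np.nan, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

# A quick peek at the first rows
print(customer_data.head())

# Count the missing values in each column
missing_per_column = customer_data.isnull().sum()
print(missing_per_column)

# One simple option: drop incomplete rows before clustering
clean_data = customer_data.dropna()
```

With a real file, `customer_data` would come from `pd.read_csv(...)` instead of being built by hand.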
- Slicing of multiple columns
x=customer_data.iloc[:,[3,4]].values
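To make the slicing above concrete, here is a small sketch on a made-up table. The column names are assumptions; the point is only that positions 3 and 4 (zero-based) pick the fourth and fifth columns, here taken to be annual income and spending score.

```python
import pandas as pd

# Hypothetical data with an assumed column layout.
customer_data = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, 16, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

# All rows, columns at positions 3 and 4, converted to a NumPy array
x = customer_data.iloc[:, [3, 4]].values
print(x.shape)  # (4, 2)
```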
- Finding the WCSS value for each number of clusters and storing it in a list
WCSS -> Within-Cluster Sum of Squares: the sum of squared distances between each point and the centroid of its cluster
we get,
Observe the sharp bends in the curve: they mark where WCSS drops significantly.
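The WCSS search described above might look something like this sketch. The data is synthetic (five made-up blobs, not the Kaggle file), and the range of 1 to 10 clusters and `n_init=10` are assumptions chosen for illustration. scikit-learn exposes the WCSS of a fitted model as `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (income, spending score) array: 5 separated blobs.
rng = np.random.default_rng(0)
centres = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]])
x = np.vstack([c + rng.normal(scale=3.0, size=(40, 2)) for c in centres])

# Fit one model per candidate cluster count and store its WCSS
wcss = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    model.fit(x)
    wcss.append(model.inertia_)

for k, value in zip(range(1, 11), wcss):
    print(k, round(value, 1))

# Cross-check the definition for k=5: squared distance of every point
# to the centroid of the cluster it was assigned to, summed up.
model5 = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0).fit(x)
manual_wcss = ((x - model5.cluster_centers_[model5.labels_]) ** 2).sum()
print(np.isclose(manual_wcss, model5.inertia_))
```

Plotting `wcss` against the cluster count gives the curve whose bends are read off to pick the number of clusters.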
- Training the KMeans model
kmeans = KMeans(n_clusters=5,init='k-means++',random_state=0)
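As a rough, self-contained sketch of the training step, the model above can be fitted like this. The data is synthetic, and `n_init=10` is an assumption added so the example behaves the same across scikit-learn versions; it is not in the original line.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (income, spending score) array.
rng = np.random.default_rng(0)
centres = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]])
x = np.vstack([c + rng.normal(scale=3.0, size=(40, 2)) for c in centres])

# Same settings as above, plus an explicit n_init (an assumption)
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
kmeans.fit(x)

# One (x, y) centroid per cluster
print(kmeans.cluster_centers_.shape)  # (5, 2)
```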
- Predicting with the trained model returns a list of numbers (one cluster label per customer), which is hard to interpret on its own
- So we draw a scatter plot of all the clusters and their centroids
- Each cluster's points (x, y coordinates) are given a different colour so the clusters are easy to distinguish
- Then, using Matplotlib, we plot the graph like this
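The prediction and plotting steps above can be sketched as below. Again the data is synthetic, the centroid colour and marker are assumptions, and which colour lands on which cluster is arbitrary because K-Means numbers its clusters in no particular order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (income, spending score) array.
rng = np.random.default_rng(0)
centres = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]])
x = np.vstack([c + rng.normal(scale=3.0, size=(40, 2)) for c in centres])

kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
y = kmeans.fit_predict(x)  # one cluster index (0 to 4) per row of x

colours = ["red", "green", "blue", "black", "violet"]
fig, ax = plt.subplots(figsize=(8, 8))
for cluster, colour in enumerate(colours):
    points = x[y == cluster]  # rows assigned to this cluster
    ax.scatter(points[:, 0], points[:, 1], s=50, c=colour,
               label=f"Cluster {cluster + 1}")

# Centroids drawn on top in a colour not used by any cluster
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           s=100, c="cyan", marker="X", label="Centroids")
ax.set_title("Customer groups")
ax.set_xlabel("Annual income")
ax.set_ylabel("Spending score (1-100)")
ax.legend()
```

Calling `plt.show()` (with an interactive backend) or `fig.savefig(...)` afterwards displays or stores the figure.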
- By visualising the data we can interpret the clusters, for example:
- Blue = low income and low spending
- Purple = low income and high spending
- Green = high income and low spending
- Black = high income and high spending
- The market can attract the Blue group by offering some discounts
- The market can target the Green group, who have money but are not buying much
- Netflix suggesting content to groups of people who watch a particular genre more
- Google ads personalisation