# Customer Personality Analysis

<img src="images/customer.jpg">

Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business better understand its customers and makes it easier to tailor products to the specific needs, behaviours and concerns of different types of customers. This notebook walks through a data science project on customer personality analysis with Python.

In [1]:
import numpy as np
import pandas as pd
import datetime
from datetime import date
import matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from sklearn.preprocessing import StandardScaler, normalize
from sklearn import metrics
from sklearn.mixture import GaussianMixture

import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', 100)

In [2]:
df=pd.read_csv('https://raw.githubusercontent.com/amankharwal/Website-data/master/marketing_campaign.csv',header=0,sep=';')

In [7]:
# pip install dataprep

In [1]:
# from dataprep.eda import plot, plot_correlation, create_report, plot_missing
# plot(df)

<img src="images/2.png">

**Note:** The actual output contains more visualizations than the one shown above.

In [3]:
# Age variable creation: age in 2014, the reference year used below
df['Age']=2014-df['Year_Birth']

# Spending variable creation: total amount spent across all product categories
df['Spending']=df['MntWines']+df['MntFruits']+df['MntMeatProducts']+df['MntFishProducts']+df['MntSweetProducts']+df['MntGoldProds']
# Seniority variable creation: 30-day months between enrolment and last_date
last_date = date(2014,10, 4)
df['Seniority']=pd.to_datetime(df['Dt_Customer'], format = '%Y-%m-%d')
df['Seniority'] = (pd.Timestamp(last_date) - df['Seniority']).dt.days/30
df=df.rename(columns={'NumWebPurchases': "Web",'NumCatalogPurchases':'Catalog','NumStorePurchases':'Store'})
df['Marital_Status']=df['Marital_Status'].replace({'Divorced':'Alone','Single':'Alone','Married':'In couple','Together':'In couple','Absurd':'Alone','Widow':'Alone','YOLO':'Alone'})
df['Education']=df['Education'].replace({'Basic':'Undergraduate','2n Cycle':'Undergraduate','Graduation':'Postgraduate','Master':'Postgraduate','PhD':'Postgraduate'})

df['Children']=df['Kidhome']+df['Teenhome']
df['Has_child'] = np.where(df.Children> 0, 'Has child', 'No child')
df['Children']=df['Children'].replace({3: "3 children",2:'2 children',1:'1 child',0:"No child"})
df=df.rename(columns={'MntWines': "Wines",'MntFruits':'Fruits','MntMeatProducts':'Meat','MntFishProducts':'Fish','MntSweetProducts':'Sweets','MntGoldProds':'Gold'})


df=df[['Age','Education','Marital_Status','Income','Spending','Seniority','Has_child','Children','Wines','Fruits','Meat','Fish','Sweets','Gold']]
df.head()

| | Age | Education | Marital_Status | Income | Spending | Seniority | Has_child | Children | Wines | Fruits | Meat | Fish | Sweets | Gold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 57 | Postgraduate | Alone | 58138.0 | 1617 | 25.333333 | No child | No child | 635 | 88 | 546 | 172 | 88 | 88 |
| 1 | 60 | Postgraduate | Alone | 46344.0 | 27 | 7.0 | Has child | 2 children | 11 | 1 | 6 | 2 | 1 | 6 |
| 2 | 49 | Postgraduate | In couple | 71613.0 | 776 | 13.633333 | No child | No child | 426 | 49 | 127 | 111 | 21 | 42 |
| 3 | 30 | Postgraduate | In couple | 26646.0 | 53 | 7.866667 | Has child | 1 child | 11 | 4 | 20 | 10 | 3 | 5 |
| 4 | 33 | Postgraduate | In couple | 58293.0 | 422 | 8.6 | Has child | 1 child | 173 | 43 | 118 | 46 | 27 | 15 |
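The `Seniority` column is simply the number of days between enrolment and the reference date, divided by 30. The sketch below reproduces that arithmetic with the standard library only; the helper name `seniority_in_months` and the enrolment date are hypothetical, picked purely for illustration.

```python
from datetime import date

# Reference date used in the notebook cell above.
LAST_DATE = date(2014, 10, 4)

def seniority_in_months(enrolled: date, last_date: date = LAST_DATE) -> float:
    """Approximate tenure in months, treating every month as 30 days."""
    return (last_date - enrolled).days / 30

# Hypothetical enrolment date: 760 days before the reference date.
example = seniority_in_months(date(2012, 9, 4))
print(round(example, 6))
```

Because every month is counted as 30 days, the value is an approximation of calendar months, which is good enough for comparing customers with each other.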


In [4]:
# Drop customers with a missing income, then remove extreme income outliers
df=df.dropna(subset=['Income'])
df=df[df['Income']<600000]

## Clustering

Before clustering the clients in the dataset, I’ll define the segments we are looking for. Here we will use 4 equally weighted customer segments:

1. `Stars:` Old customers with high income and high spending nature.
2. `Need Attention:` New customers with below-average income and low spending nature.
3. `High Potential:` New customers with high income and high spending nature.
4. `Leaky Bucket:` Old customers with below-average income and a low spending nature.
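The four definitions above boil down to a two-by-two rule on income (high or below average) and tenure (old or new). As a rough sketch, and not the method used in this notebook, segment names could be assigned by comparing each cluster's mean with the overall mean. The function `name_segments`, the toy data, and the choice to ignore spending are all assumptions made for illustration.

```python
import pandas as pd

def name_segments(df: pd.DataFrame, cluster_col: str = "Cluster") -> dict:
    """Map each cluster id to a segment name by comparing the cluster's
    mean Income and Seniority with the overall means (hypothetical helper)."""
    overall = df[["Income", "Seniority"]].mean()
    means = df.groupby(cluster_col)[["Income", "Seniority"]].mean()
    names = {}
    for cluster_id, row in means.iterrows():
        high_income = row["Income"] >= overall["Income"]
        old = row["Seniority"] >= overall["Seniority"]
        if high_income and old:
            names[cluster_id] = "Stars"
        elif high_income:
            names[cluster_id] = "High potential"
        elif old:
            names[cluster_id] = "Leaky bucket"
        else:
            names[cluster_id] = "Need attention"
    return names

# Toy data, made up for illustration: two customers per cluster.
toy = pd.DataFrame({
    "Income":    [80000, 82000, 78000, 81000, 30000, 32000, 29000, 31000],
    "Seniority": [24, 26, 6, 8, 25, 23, 7, 5],
    "Cluster":   [0, 0, 1, 1, 2, 2, 3, 3],
})
segment_names = name_segments(toy)
print(segment_names)
```

With real clusters, two of them can land in the same quadrant and receive the same name, so a rule like this is best treated as a sanity check on hand-assigned labels.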

In [5]:
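# Standardize the three features, then scale each row to unit L2 norm
scaler=StandardScaler()
dataset_temp=df[['Income','Seniority','Spending']].copy()
X_std=scaler.fit_transform(dataset_temp)
X = normalize(X_std,norm='l2')

# Fit a 4-component Gaussian mixture and assign each customer to a component
gmm=GaussianMixture(n_components=4, covariance_type='spherical',max_iter=2000, random_state=5).fit(X)
labels = gmm.predict(X)
dataset_temp['Cluster'] = labels
# Name the clusters; component ids are arbitrary, so this mapping is specific to this fit
dataset_temp['Cluster']=dataset_temp['Cluster'].replace({0:'Stars',1:'Need attention',2:'High potential',3:'Leaky bucket'})
df = df.merge(dataset_temp.Cluster, left_index=True, right_index=True)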

pd.options.display.float_format = "{:.0f}".format
summary=df[['Income','Spending','Seniority','Cluster']]
summary=summary.groupby('Cluster').describe().transpose()
summary.head()

| | Cluster | High potential | Leaky bucket | Need attention | Stars |
|---|---|---|---|---|---|
| Income | count | 584 | 641 | 528 | 462 |
| Income | mean | 34757 | 37705 | 69542 | 73438 |
| Income | std | 12075 | 12397 | 12006 | 13753 |
| Income | min | 2447 | 1730 | 44802 | 49090 |
| Income | 25% | 26489 | 28839 | 60880 | 65298 |
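The clustering cell prepares its input in two steps: `StandardScaler` standardizes each feature, and `normalize(..., norm='l2')` then rescales each customer's row to unit length. The NumPy-only sketch below mirrors those two steps on a few made-up rows, to show what the mixture model actually receives; the numbers are illustrative and not taken from the dataset.

```python
import numpy as np

# Made-up rows of (Income, Seniority, Spending), for illustration only.
X_raw = np.array([
    [58000.0, 25.0, 1600.0],
    [46000.0,  7.0,   30.0],
    [71000.0, 14.0,  780.0],
    [26000.0,  8.0,   50.0],
])

# Step 1: standardize each column to zero mean and unit variance
# (population standard deviation, which is what StandardScaler uses).
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Step 2: rescale each row to unit Euclidean (L2) length,
# which is what normalize(..., norm='l2') does.
row_norms = np.linalg.norm(X_std, axis=1, keepdims=True)
X_unit = X_std / row_norms

print(np.round(X_unit, 3))
```

After the second step every customer lies on the unit sphere, so the model compares the relative profile of income, seniority and spending rather than their overall magnitude.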


In [6]:
PLOT = go.Figure()
# One 3D scatter trace per cluster so that each segment gets its own legend entry
for C in list(df.Cluster.unique()):
    PLOT.add_trace(go.Scatter3d(x = df[df.Cluster == C]['Income'],
                                y = df[df.Cluster == C]['Seniority'],
                                z = df[df.Cluster == C]['Spending'],
                                mode = 'markers',marker_size = 6, marker_line_width = 1,
                                name = str(C)))
PLOT.update_traces(hovertemplate='Income: %{x} <br>Seniority: %{y} <br>Spending: %{z}')

# Axis titles use title.font, since the older titlefont shorthand is deprecated in plotly
PLOT.update_layout(width = 800, height = 800, autosize = True, showlegend = True,
                   scene = dict(xaxis=dict(title = dict(text = 'Income', font = dict(color = 'black'))),
                                yaxis=dict(title = dict(text = 'Seniority', font = dict(color = 'black'))),
                                zaxis=dict(title = dict(text = 'Spending', font = dict(color = 'black')))),
                   font = dict(family = "Gilroy", color  = 'black', size = 12))

<img src="images/1.png">