
CUSTOMER BEHAVIOUR ANALYSIS
Customer Behavior Analysis is the process of examining how customers interact with a product, service, or platform to understand their actions, preferences, and decision-making processes.

We have a dataset that captures the behavior of e-commerce customers. The dataset contains the following columns:

User_ID: Unique identifier for each customer.

Gender: Gender of the customer (e.g., Male, Female).

Age: Age of the customer.

Location: Location of the customer.

Device_Type: Type of device used for browsing (e.g., Mobile, Tablet, Desktop).

Product_Browsing_Time: Amount of time spent browsing products (in minutes).

Total_Pages_Viewed: Total number of pages viewed during the browsing session.

Items_Added_to_Cart: Number of items added to the shopping cart.

Total_Purchases: Total number of purchases made.


Your task involves:

Understanding the distribution and characteristics of customer demographics (e.g., age, gender, location).

Exploring how different types of devices are used by customers and their impact on behavior.

Investigating the relationship between browsing time, pages viewed, items added to the cart, and actual purchases.

Segmenting customers based on their behavior and identifying distinct customer groups.

Analyzing the customer journey and identifying potential areas for improvement in the conversion funnel.

Assessing the impact of customer behavior on revenue generation and identifying opportunities for increasing sales and customer engagement.

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [6]:
#Reading the data
data = pd.read_csv("/content/drive/MyDrive/CUSTOMER BEHAVIOUR ANALYSIS/ecommerce_customer_data.csv")
print(data.head())

   User_ID  Gender  Age   Location Device_Type  Product_Browsing_Time  \
0        1  Female   23  Ahmedabad      Mobile                     60   
1        2    Male   25    Kolkata      Tablet                     30   
2        3    Male   32  Bangalore     Desktop                     37   
3        4    Male   35      Delhi      Mobile                      7   
4        5    Male   27  Bangalore      Tablet                     35   

   Total_Pages_Viewed  Items_Added_to_Cart  Total_Purchases  
0                  30                    1                0  
1                  38                    9                4  
2                  13                    5                0  
3                  20                   10                3  
4                  20                    8                2  


In [5]:
# Summary statistics for numeric columns
numeric_summary = data.describe()
print(numeric_summary)

          User_ID         Age  Product_Browsing_Time  Total_Pages_Viewed  \
count  500.000000  500.000000             500.000000          500.000000   
mean   250.500000   26.276000              30.740000           27.182000   
std    144.481833    5.114699              15.934246           13.071596   
min      1.000000   18.000000               5.000000            5.000000   
25%    125.750000   22.000000              16.000000           16.000000   
50%    250.500000   26.000000              31.000000           27.000000   
75%    375.250000   31.000000              44.000000           38.000000   
max    500.000000   35.000000              60.000000           50.000000   

       Items_Added_to_Cart  Total_Purchases  
count           500.000000       500.000000  
mean              5.150000         2.464000  
std               3.203127         1.740909  
min               0.000000         0.000000  
25%               2.000000         1.000000  
50%               5.000000         2.00

In [7]:
# Summary statistics for non-numeric (categorical) columns
categorical_summary = data.describe(include="object")
print(categorical_summary)

       Gender Location Device_Type
count     500      500         500
unique      2        8           3
top      Male  Kolkata      Mobile
freq      261       71         178


In [11]:
# Let's look at the distribution of age in the dataset
# Histogram for age
fig  = px.histogram(data, x="Age", title="Distribution Of Age")
fig.show()

In [17]:
# Now let's have a look at the gender distribution
# Bar chart for gender
gender_counts = data['Gender'].value_counts().reset_index()
gender_counts.columns=['Gender','Count']
fig = px.bar(gender_counts,x="Gender",y='Count', title = "Gender Distribution")
fig.show()

In [20]:
# Now, let's have a look at the relationship between the product browsing time and the total pages viewed:
# 'Product_Browsing_Time' vs 'Total_Pages_Viewed'
# Note: trendline='ols' requires the statsmodels package to be installed
fig = px.scatter(data, x="Product_Browsing_Time", y="Total_Pages_Viewed", title='Product_Browsing_Time vs Total_Pages_Viewed', trendline='ols')
fig.show()

In [None]:
'''
The scatter plot above shows no consistent pattern or strong association between the time spent
browsing products and the total number of pages viewed. This suggests that customers who spend
more time on the website do not necessarily view more pages, which may be due to factors such as
website design, content relevance, or individual user preferences.
'''
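The visual reading above can be backed by a number. The following is a minimal sketch, assuming a Pearson correlation coefficient is an acceptable summary of the association; the helper name `browsing_pages_correlation` is hypothetical, and the `sample` frame only repeats the five rows printed by `data.head()` so that the snippet runs without the CSV file. Five rows are not representative of the full dataset.

```python
import pandas as pd

# The five rows shown by data.head() earlier in this notebook,
# repeated here only so the sketch runs without the CSV file.
sample = pd.DataFrame({
    "Product_Browsing_Time": [60, 30, 37, 7, 35],
    "Total_Pages_Viewed": [30, 38, 13, 20, 20],
})


def browsing_pages_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between browsing time and pages viewed (hypothetical helper)."""
    return float(df["Product_Browsing_Time"].corr(df["Total_Pages_Viewed"]))


r = browsing_pages_correlation(sample)
print(f"Pearson r on the five sample rows: {r:.3f}")
```

On the full dataset, `browsing_pages_correlation(data)` would give the corresponding figure; a value close to 0 would be consistent with the interpretation of the scatter plot.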

In [None]:
# Let's have a look at the average total pages viewed by gender:
# Grouped analysis
gender_grouped = data.groupby('Gender')['Total_Pages_Viewed'].mean().reset_index()
gender_grouped.columns = ['Gender','Average_Total_Pages_Viewed']
fig = px.bar(gender_grouped,x='Gender', y='Average_Total_Pages_Viewed', title = 'Average Total Pages Viewed by Gender')
fig.show()

In [25]:
# Let's have a look at the average total pages viewed by device type
devices_grouped = data.groupby('Device_Type')['Total_Pages_Viewed'].mean().reset_index()
devices_grouped.columns = ['Device_Type', 'Average_Total_Pages_Viewed']
fig = px.bar(devices_grouped, x='Device_Type', y='Average_Total_Pages_Viewed', title = 'Average Total Pages Viewed by Devices')
fig.show()
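The chart above compares devices on a single metric. One possible extension, sketched here as an assumption rather than as part of the original analysis, is to average several behavior columns per device in one pass. The helper name `device_profile` is hypothetical, and `sample` again repeats only the five rows printed by `data.head()`.

```python
import pandas as pd

# Five rows from data.head(), repeated so the sketch is self-contained.
sample = pd.DataFrame({
    "Device_Type": ["Mobile", "Tablet", "Desktop", "Mobile", "Tablet"],
    "Product_Browsing_Time": [60, 30, 37, 7, 35],
    "Total_Pages_Viewed": [30, 38, 13, 20, 20],
    "Items_Added_to_Cart": [1, 9, 5, 10, 8],
    "Total_Purchases": [0, 4, 0, 3, 2],
})


def device_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of each behavior metric per device type (hypothetical helper)."""
    metrics = [
        "Product_Browsing_Time",
        "Total_Pages_Viewed",
        "Items_Added_to_Cart",
        "Total_Purchases",
    ]
    return df.groupby("Device_Type")[metrics].mean()


profile = device_profile(sample)
print(profile)
```

Calling `device_profile(data)` on the full dataset would return one row per device type, which can be inspected directly or plotted column by column.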

In [30]:
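# Let's compute a simple customer lifetime value (CLV) score and segment customers by it.
# Note: this score is a heuristic defined for this notebook, not a standard CLV formula.
data['CLV'] = (data['Total_Purchases'] * data['Total_Pages_Viewed']) / data['Age']
# pd.cut uses right-closed bins here: (1, 2.5], (2.5, 5], (5, inf]. Customers with CLV <= 1 get no segment (NaN).
data['Segment'] = pd.cut(data['CLV'], bins=[1, 2.5, 5, float('inf')], labels=['Low Value', 'Medium Value', 'High Value'])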
segment_counts = data['Segment'].value_counts().reset_index()
segment_counts.columns=['Segment', 'Counts']

# Bar chart of the number of customers in each segment
fig = px.bar(segment_counts, x= 'Segment', y = 'Counts', title = 'Customer Segmentation By CLV')
fig.update_xaxes(title='Segment')
fig.update_yaxes(title='Number of Customers')
fig.show()
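How `pd.cut` treats values outside the bin edges affects the segment counts plotted above. The sketch below applies the same CLV formula and bins to the five rows printed by `data.head()` to show which rows end up without a segment; the `sample` frame and the `unsegmented` variable are introduced here for illustration only.

```python
import pandas as pd

# Five rows from data.head(), repeated so the sketch is self-contained.
sample = pd.DataFrame({
    "Age": [23, 25, 32, 35, 27],
    "Total_Pages_Viewed": [30, 38, 13, 20, 20],
    "Total_Purchases": [0, 4, 0, 3, 2],
})

# Same CLV score and bins as in the cell above.
sample["CLV"] = sample["Total_Purchases"] * sample["Total_Pages_Viewed"] / sample["Age"]
sample["Segment"] = pd.cut(
    sample["CLV"],
    bins=[1, 2.5, 5, float("inf")],
    labels=["Low Value", "Medium Value", "High Value"],
)

# Rows whose CLV is <= 1 fall outside every bin and are left as NaN.
unsegmented = int(sample["Segment"].isna().sum())
print(sample[["CLV", "Segment"]])
print("Rows without a segment:", unsegmented)

# value_counts() drops NaN by default; dropna=False keeps them visible.
print(sample["Segment"].value_counts(dropna=False))
```

Customers with no purchases have a CLV of 0, so they are absent from the bar chart. If they should be counted, one option (an assumption, not the notebook's choice) is to lower the first bin edge to 0 and pass `include_lowest=True`.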


In [32]:
# Let's have a look at the conversion funnel of the customers
# Funnel analysis
funnel_data = data[['Product_Browsing_Time', 'Items_Added_to_Cart', 'Total_Purchases']]
# Group the selected numeric columns (not the full DataFrame) so that sum() only sees numeric data
funnel_data = funnel_data.groupby(['Product_Browsing_Time', 'Items_Added_to_Cart']).sum().reset_index()
fig = px.funnel(funnel_data, x='Product_Browsing_Time', y='Items_Added_to_Cart', title='Conversion Funnel')
fig.show()
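A funnel is often read stage by stage, so an alternative view is to total each stage across customers and compare it with the previous one. The sketch below is one such variant, not the notebook's method: the stage names, the helper `funnel_stages`, and the `Share_of_Previous` column are hypothetical, and treating pages viewed, cart items, and purchases as successive stages is itself an assumption, since they are counted in different units. The `sample` frame repeats the five rows printed by `data.head()`.

```python
import pandas as pd

# Five rows from data.head(), repeated so the sketch is self-contained.
sample = pd.DataFrame({
    "Total_Pages_Viewed": [30, 38, 13, 20, 20],
    "Items_Added_to_Cart": [1, 9, 5, 10, 8],
    "Total_Purchases": [0, 4, 0, 3, 2],
})


def funnel_stages(df: pd.DataFrame) -> pd.DataFrame:
    """Total count per funnel stage plus the ratio to the previous stage (hypothetical helper)."""
    totals = {
        "Pages viewed": df["Total_Pages_Viewed"].sum(),
        "Items added to cart": df["Items_Added_to_Cart"].sum(),
        "Purchases": df["Total_Purchases"].sum(),
    }
    stages = pd.DataFrame({"Stage": list(totals), "Count": list(totals.values())})
    # Ratio of each stage to the one before it; the first stage has no predecessor (NaN).
    stages["Share_of_Previous"] = stages["Count"] / stages["Count"].shift(1)
    return stages


stages = funnel_stages(sample)
print(stages)
```

The resulting table can be passed to Plotly as `px.funnel(stages, x="Count", y="Stage")` to draw one bar per stage.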



In [33]:
# Let's have a look at the churn rate of the customers
# Here a customer counts as churned if they made no purchases at all
data['Churned'] = data['Total_Purchases'] == 0
churn_rate = data['Churned'].mean()
print(churn_rate)

0.198


In [None]:
'''
A churn rate of 0.198 means that 19.8% of customers (99 of 500) made no purchases.
Reducing this share is important for maintaining business growth and profitability.
'''
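To see where churn concentrates, the same zero-purchase definition can be broken down by a categorical column. The sketch below assumes `Device_Type` as the grouping column; the helper name `churn_by_group` is hypothetical, and `sample` repeats only the five rows printed by `data.head()`, so its rates say nothing about the full dataset.

```python
import pandas as pd

# Five rows from data.head(), repeated so the sketch is self-contained.
sample = pd.DataFrame({
    "Device_Type": ["Mobile", "Tablet", "Desktop", "Mobile", "Tablet"],
    "Total_Purchases": [0, 4, 0, 3, 2],
})


def churn_by_group(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Share of customers with zero purchases within each group (hypothetical helper)."""
    churned = df["Total_Purchases"] == 0
    return churned.groupby(df[group_col]).mean()


device_churn = churn_by_group(sample, "Device_Type")
print(device_churn)
```

On the full dataset, `churn_by_group(data, "Device_Type")` or `churn_by_group(data, "Location")` would show whether the overall rate is spread evenly or driven by particular groups.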

In [None]:
'''
Summary
Customer Behavior Analysis is a process that involves examining and understanding
how customers interact with a business, product, or service. This analysis helps
organizations make informed decisions, tailor their strategies, and enhance customer experiences.
'''