# Marketing Strategy Analysis Project

## Author
- Author: Thapelo Maloka

## Project Information
- Project Title: Marketing Strategy Analysis Project
- Data Source: Customer Personality Analysis, Kaggle (https://www.kaggle.com/datasets/whenamancodes/customer-personality-analysis)
- Date: June 30, 2023

## Introduction
Provide a brief introduction to the analysis project, outlining the objectives and purpose.

## Methodology
Explain the methodology used in your analysis, including any tools, techniques, or algorithms employed. If applicable, mention any specific statistical or data analysis methods used.

## Data Description
Describe the data used for the analysis, including its source, format, and any preprocessing steps performed. If necessary, mention the variables or fields used in the analysis.

## Results
Present the results of your analysis in a clear and concise manner. This section may include tables, charts, graphs, or any other visualizations that help convey the findings. Interpret and discuss the results, highlighting any key insights or observations.

## Discussion
Provide a discussion of the results, placing them in the context of the project objectives. Address any limitations or challenges encountered during the analysis and suggest areas for further investigation or improvement.

## Conclusion
Summarize the main findings of the analysis and their implications. Reiterate the project objectives and discuss the potential impact or applications of the results.

## References
List any references or resources used in the analysis project, including academic papers, online articles, or documentation.

---

Thank you for your interest in this analysis project. If you have any questions or feedback, please feel free to reach out to the author.


# Data Attributes

### People
- **ID**: Customer's unique identifier
- **Year_Birth**: Customer's birth year
- **Education**: Customer's education level
- **Marital_Status**: Customer's marital status
- **Income**: Customer's yearly household income
- **Kidhome**: Number of children in the customer's household
- **Teenhome**: Number of teenagers in the customer's household
- **Dt_Customer**: Date of customer's enrollment with the company
- **Recency**: Number of days since the customer's last purchase
- **Complain**: 1 if the customer complained in the last 2 years, 0 otherwise

### Products
- **MntWines**: Amount spent on wine in the last 2 years
- **MntFruits**: Amount spent on fruits in the last 2 years
- **MntMeatProducts**: Amount spent on meat in the last 2 years
- **MntFishProducts**: Amount spent on fish in the last 2 years
- **MntSweetProducts**: Amount spent on sweets in the last 2 years
- **MntGoldProds**: Amount spent on gold in the last 2 years

### Promotion
- **NumDealsPurchases**: Number of purchases made with a discount
- **AcceptedCmp1**: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
- **AcceptedCmp2**: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
- **AcceptedCmp3**: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
- **AcceptedCmp4**: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
- **AcceptedCmp5**: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
- **Response**: 1 if customer accepted the offer in the last campaign, 0 otherwise

### Place
- **NumWebPurchases**: Number of purchases made through the company's website
- **NumCatalogPurchases**: Number of purchases made using a catalogue
- **NumStorePurchases**: Number of purchases made directly in stores
- **NumWebVisitsMonth**: Number of visits to the company's website in the last month

# Questions to Answer
- **Which channel has the highest average number of purchases?** 
- **What is the distribution of purchases across different channels (website, catalog, and store)?**
- **What is the overall response rate for the last campaign?**
- **What are the top product categories that customers spend on?**
- **What is the overall expenditure on different product categories?**
- **What is the average age of the customer base?**
- **What is the distribution of educational backgrounds among the customers?**
- **What is the average duration of customer relationships with the company?**

In [186]:
import pandas as pd

In [187]:
# The file is tab-delimited ('\t'), not comma-delimited
raw_data = pd.read_csv("marketing_campaign.csv", sep='\t')
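The comment above says the tab delimiter was found by inspection. A minimal sketch of how that check could be automated with the standard library's `csv.Sniffer` follows; the `sample` string is a hypothetical stand-in for the first lines of `marketing_campaign.csv`, not the real file.

```python
import csv
import io

import pandas as pd

# Hypothetical sample in the same tab-separated layout as the raw file.
sample = (
    "ID\tYear_Birth\tEducation\n"
    "5524\t1957\tGraduation\n"
    "2174\t1954\tGraduation\n"
)

# csv.Sniffer guesses the delimiter from a text sample; restricting the
# candidate delimiters keeps the guess stable on a small sample.
dialect = csv.Sniffer().sniff(sample, delimiters="\t,;")
detected_sep = dialect.delimiter

# Read the sample with whatever separator was detected.
sniffed = pd.read_csv(io.StringIO(sample), sep=detected_sep)
print(repr(detected_sep), list(sniffed.columns))
```

For a file on disk, the same idea would apply to the text returned by reading the first few kilobytes of the file.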

In [188]:
# Show every column when a DataFrame is displayed
pd.set_option('display.max_columns', None)

#Show the 1st 5 rows
raw_data.head(5)

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
0,5524,1957,Graduation,Single,58138.0,0,0,04-09-2012,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,3,11,1
1,2174,1954,Graduation,Single,46344.0,1,1,08-03-2014,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,3,11,0
2,4141,1965,Graduation,Together,71613.0,0,0,21-08-2013,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,3,11,0
3,6182,1984,Graduation,Together,26646.0,1,0,10-02-2014,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,3,11,0
4,5324,1981,PhD,Married,58293.0,1,0,19-01-2014,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,3,11,0


In [189]:
#Understand our data set
raw_data.describe()

Unnamed: 0,ID,Year_Birth,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
count,2240.0,2240.0,2216.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0
mean,5592.159821,1968.805804,52247.251354,0.444196,0.50625,49.109375,303.935714,26.302232,166.95,37.525446,27.062946,44.021875,2.325,4.084821,2.662054,5.790179,5.316518,0.072768,0.074554,0.072768,0.064286,0.013393,0.009375,3.0,11.0,0.149107
std,3246.662198,11.984069,25173.076661,0.538398,0.544538,28.962453,336.597393,39.773434,225.715373,54.628979,41.280498,52.167439,1.932238,2.778714,2.923101,3.250958,2.426645,0.259813,0.262728,0.259813,0.245316,0.114976,0.096391,0.0,0.0,0.356274
min,0.0,1893.0,1730.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
25%,2828.25,1959.0,35303.0,0.0,0.0,24.0,23.75,1.0,16.0,3.0,1.0,9.0,1.0,2.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
50%,5458.5,1970.0,51381.5,0.0,0.0,49.0,173.5,8.0,67.0,12.0,8.0,24.0,2.0,4.0,2.0,5.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
75%,8427.75,1977.0,68522.0,1.0,1.0,74.0,504.25,33.0,232.0,50.0,33.0,56.0,3.0,6.0,4.0,8.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
max,11191.0,1996.0,666666.0,2.0,2.0,99.0,1493.0,199.0,1725.0,259.0,263.0,362.0,15.0,27.0,28.0,13.0,20.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,11.0,1.0


# Data Cleaning
## Based on the questions we want to answer we do not need the following columns:
- **Z_CostContact**
- **Marital_Status**
- **NumDealsPurchases**
- **AcceptedCmp1**
- **AcceptedCmp2**
- **AcceptedCmp3**
- **AcceptedCmp4**
- **Complain**
- **Income**
- **Z_Revenue**

In [190]:
#Drop columns
columns_to_drop = ['Z_CostContact', 'Marital_Status', 'NumDealsPurchases','AcceptedCmp1','AcceptedCmp2','AcceptedCmp3','AcceptedCmp4','Complain','Income', 'Z_Revenue']
raw_data = raw_data.drop(columns_to_drop, axis=1)

raw_data.head()

Unnamed: 0,ID,Year_Birth,Education,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
0,5524,1957,Graduation,0,0,04-09-2012,58,635,88,546,172,88,88,8,10,4,7,0,1
1,2174,1954,Graduation,1,1,08-03-2014,38,11,1,6,2,1,6,1,1,2,5,0,0
2,4141,1965,Graduation,0,0,21-08-2013,26,426,49,127,111,21,42,8,2,10,4,0,0
3,6182,1984,Graduation,1,0,10-02-2014,26,11,4,20,10,3,5,2,0,4,6,0,0
4,5324,1981,PhD,1,0,19-01-2014,94,173,43,118,46,27,15,5,3,6,5,0,0
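The `describe()` output earlier reports a standard deviation of 0 for `Z_CostContact` and `Z_Revenue`, so those two columns hold a single value for every customer. A small sketch of how such constant columns could be found programmatically rather than listed by hand is shown below. The `toy` frame is an illustration built from the first rows shown above, and `constant_cols` and `trimmed` are hypothetical names.

```python
import pandas as pd

# Toy frame: a few values copied from the first rows shown above.
toy = pd.DataFrame({
    "ID": [5524, 2174, 4141],
    "Z_CostContact": [3, 3, 3],
    "Z_Revenue": [11, 11, 11],
    "Response": [1, 0, 0],
})

# A column with a single distinct value cannot distinguish customers.
constant_cols = [col for col in toy.columns if toy[col].nunique(dropna=False) == 1]

# errors="ignore" lets the cell be re-run after the columns are already gone.
trimmed = toy.drop(columns=constant_cols, errors="ignore")
print(constant_cols)
print(list(trimmed.columns))
```

The other dropped columns vary across customers, so they still have to be chosen by hand based on the questions being asked.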


In [191]:
# Create the 'Age' column
# Age as of 2023, computed for the whole column at once.
# (Series.iteritems() is deprecated and was removed in pandas 2.0.)
raw_data['Age'] = 2023 - raw_data['Year_Birth']




In [192]:
raw_data.head()

Unnamed: 0,ID,Year_Birth,Education,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response,Age
0,5524,1957,Graduation,0,0,04-09-2012,58,635,88,546,172,88,88,8,10,4,7,0,1,66
1,2174,1954,Graduation,1,1,08-03-2014,38,11,1,6,2,1,6,1,1,2,5,0,0,69
2,4141,1965,Graduation,0,0,21-08-2013,26,426,49,127,111,21,42,8,2,10,4,0,0,58
3,6182,1984,Graduation,1,0,10-02-2014,26,11,4,20,10,3,5,2,0,4,6,0,0,39
4,5324,1981,PhD,1,0,19-01-2014,94,173,43,118,46,27,15,5,3,6,5,0,0,42


In [193]:
# Reorder the columns by creating a new DataFrame with the desired column order
new_columns = ['ID', 'Year_Birth', 'Age', 'Education', 'Kidhome', 'Teenhome', 'Dt_Customer', 'Recency', 'MntWines', 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts', 'MntGoldProds', 'NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth', 'AcceptedCmp5', 'Response']
raw_data = raw_data[new_columns]

In [194]:
raw_data.head()

Unnamed: 0,ID,Year_Birth,Age,Education,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
0,5524,1957,66,Graduation,0,0,04-09-2012,58,635,88,546,172,88,88,8,10,4,7,0,1
1,2174,1954,69,Graduation,1,1,08-03-2014,38,11,1,6,2,1,6,1,1,2,5,0,0
2,4141,1965,58,Graduation,0,0,21-08-2013,26,426,49,127,111,21,42,8,2,10,4,0,0
3,6182,1984,39,Graduation,1,0,10-02-2014,26,11,4,20,10,3,5,2,0,4,6,0,0
4,5324,1981,42,PhD,1,0,19-01-2014,94,173,43,118,46,27,15,5,3,6,5,0,0


In [195]:
# Create the 'Num_Years_Customer' column
# Dt_Customer is a DD-MM-YYYY string, so characters 6-9 hold the enrollment year.
# (Vectorized; Series.iteritems() is deprecated and was removed in pandas 2.0.)
raw_data['Num_Years_Customer'] = 2023 - raw_data['Dt_Customer'].str[6:10].astype(int)

# Reorder the columns by creating a new DataFrame with the desired column order
new_columns = ['ID', 'Year_Birth', 'Age', 'Education', 'Kidhome', 'Teenhome', 'Dt_Customer','Num_Years_Customer', 'Recency', 'MntWines', 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts', 'MntGoldProds', 'NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth', 'AcceptedCmp5','Response']
raw_data = raw_data[new_columns]



In [196]:
raw_data.head()

Unnamed: 0,ID,Year_Birth,Age,Education,Kidhome,Teenhome,Dt_Customer,Num_Years_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
0,5524,1957,66,Graduation,0,0,04-09-2012,11,58,635,88,546,172,88,88,8,10,4,7,0,1
1,2174,1954,69,Graduation,1,1,08-03-2014,9,38,11,1,6,2,1,6,1,1,2,5,0,0
2,4141,1965,58,Graduation,0,0,21-08-2013,10,26,426,49,127,111,21,42,8,2,10,4,0,0
3,6182,1984,39,Graduation,1,0,10-02-2014,9,26,11,4,20,10,3,5,2,0,4,6,0,0
4,5324,1981,42,PhD,1,0,19-01-2014,9,94,173,43,118,46,27,15,5,3,6,5,0,0


In [197]:
#Data types
raw_data.dtypes

ID                      int64
Year_Birth              int64
Age                     int64
Education              object
Kidhome                 int64
Teenhome                int64
Dt_Customer            object
Num_Years_Customer      int64
Recency                 int64
MntWines                int64
MntFruits               int64
MntMeatProducts         int64
MntFishProducts         int64
MntSweetProducts        int64
MntGoldProds            int64
NumWebPurchases         int64
NumCatalogPurchases     int64
NumStorePurchases       int64
NumWebVisitsMonth       int64
AcceptedCmp5            int64
Response                int64
dtype: object

In [198]:
# Check for any missing data
has_missing_data = raw_data.isnull().any().any()

# Print the result
if has_missing_data:
    print("The DataFrame contains missing data.")
else:
    print("The DataFrame does not contain missing data.")

The DataFrame does not contain missing data.
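The check above collapses everything into one boolean. The earlier `describe()` output shows `Income` with a count of 2216 against 2240 rows, and `Income` was dropped during cleaning, which is consistent with the result here. A per-column count makes that kind of gap visible; the sketch below uses a toy frame in which the missing `Income` value is a hypothetical placement for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame; the NaN position is hypothetical and only illustrates the check.
toy = pd.DataFrame({
    "ID": [5524, 2174, 4141, 6182],
    "Income": [58138.0, np.nan, 71613.0, 26646.0],
    "Response": [1, 0, 0, 0],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Keep only the columns that actually have gaps.
columns_with_gaps = missing_per_column[missing_per_column > 0]
print(columns_with_gaps)
```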


In [199]:
#Save the Cleaned data
raw_data.to_csv("CleanData.csv", sep=',', index=False)

# Read In the Clean Data

In [200]:
data = pd.read_csv("CleanData.csv")
data.head()

Unnamed: 0,ID,Year_Birth,Age,Education,Kidhome,Teenhome,Dt_Customer,Num_Years_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
0,5524,1957,66,Graduation,0,0,04-09-2012,11,58,635,88,546,172,88,88,8,10,4,7,0,1
1,2174,1954,69,Graduation,1,1,08-03-2014,9,38,11,1,6,2,1,6,1,1,2,5,0,0
2,4141,1965,58,Graduation,0,0,21-08-2013,10,26,426,49,127,111,21,42,8,2,10,4,0,0
3,6182,1984,39,Graduation,1,0,10-02-2014,9,26,11,4,20,10,3,5,2,0,4,6,0,0
4,5324,1981,42,PhD,1,0,19-01-2014,9,94,173,43,118,46,27,15,5,3,6,5,0,0


In [201]:
#Describe my data

data.describe()

Unnamed: 0,ID,Year_Birth,Age,Kidhome,Teenhome,Num_Years_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
count,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0
mean,5592.159821,1968.805804,54.194196,0.444196,0.50625,9.971875,49.109375,303.935714,26.302232,166.95,37.525446,27.062946,44.021875,4.084821,2.662054,5.790179,5.316518,0.072768,0.149107
std,3246.662198,11.984069,11.984069,0.538398,0.544538,0.684554,28.962453,336.597393,39.773434,225.715373,54.628979,41.280498,52.167439,2.778714,2.923101,3.250958,2.426645,0.259813,0.356274
min,0.0,1893.0,27.0,0.0,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2828.25,1959.0,46.0,0.0,0.0,10.0,24.0,23.75,1.0,16.0,3.0,1.0,9.0,2.0,0.0,3.0,3.0,0.0,0.0
50%,5458.5,1970.0,53.0,0.0,0.0,10.0,49.0,173.5,8.0,67.0,12.0,8.0,24.0,4.0,2.0,5.0,6.0,0.0,0.0
75%,8427.75,1977.0,64.0,1.0,1.0,10.0,74.0,504.25,33.0,232.0,50.0,33.0,56.0,6.0,4.0,8.0,7.0,0.0,0.0
max,11191.0,1996.0,130.0,2.0,2.0,11.0,99.0,1493.0,199.0,1725.0,259.0,263.0,362.0,27.0,28.0,13.0,20.0,1.0,1.0
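The summary above shows a maximum `Age` of 130 (minimum `Year_Birth` of 1893), which looks like a data-entry problem rather than a real customer. A minimal sketch of how such rows could be flagged follows. The cut-off `MAX_PLAUSIBLE_AGE` is an assumption chosen for illustration, and the IDs in `toy` are hypothetical.

```python
import pandas as pd

# Assumed cut-off; pick a value that suits the business context.
MAX_PLAUSIBLE_AGE = 100

# Toy frame with hypothetical IDs and birth years spanning the reported range.
toy = pd.DataFrame({"ID": [1, 2, 3], "Year_Birth": [1957, 1893, 1996]})
toy["Age"] = 2023 - toy["Year_Birth"]

# Rows whose age exceeds the cut-off are candidates for review.
suspect = toy[toy["Age"] > MAX_PLAUSIBLE_AGE]

# Compare the mean with and without the suspect rows.
mean_all = toy["Age"].mean()
mean_plausible = toy.loc[toy["Age"] <= MAX_PLAUSIBLE_AGE, "Age"].mean()
print(suspect)
print(mean_all, mean_plausible)
```

Whether to drop, correct, or keep such rows is a judgment call; the notebook as written keeps them.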


In [202]:
data.head()

Unnamed: 0,ID,Year_Birth,Age,Education,Kidhome,Teenhome,Dt_Customer,Num_Years_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
0,5524,1957,66,Graduation,0,0,04-09-2012,11,58,635,88,546,172,88,88,8,10,4,7,0,1
1,2174,1954,69,Graduation,1,1,08-03-2014,9,38,11,1,6,2,1,6,1,1,2,5,0,0
2,4141,1965,58,Graduation,0,0,21-08-2013,10,26,426,49,127,111,21,42,8,2,10,4,0,0
3,6182,1984,39,Graduation,1,0,10-02-2014,9,26,11,4,20,10,3,5,2,0,4,6,0,0
4,5324,1981,42,PhD,1,0,19-01-2014,9,94,173,43,118,46,27,15,5,3,6,5,0,0


# Answer Questions For the Marketing Department

### Which channel has the highest average number of purchases?

In [203]:
# Select three columns and calculate the average

NumWebPurchases = data['NumWebPurchases'].mean()
NumCatalogPurchases = data['NumCatalogPurchases'].mean()
NumStorePurchases = data['NumStorePurchases'].mean()

# Create a new DataFrame
AverageNumberOfPurchasesChannel = pd.DataFrame({
    'Channel': ['NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases'],
    'Average': [NumWebPurchases, NumCatalogPurchases, NumStorePurchases]
})


AverageNumberOfPurchasesChannel.to_csv("AverageNumberOfPurchasesChannel.csv",index=False)
AverageNumberOfPurchasesChannel

Unnamed: 0,Channel,Average
0,NumWebPurchases,4.084821
1,NumCatalogPurchases,2.662054
2,NumStorePurchases,5.790179
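The three averages can also be computed in one call, and `idxmax()` then names the leading channel directly. The sketch below is an illustration on the five rows shown by `data.head()`, not the full dataset, so its averages differ from the table above.

```python
import pandas as pd

channel_cols = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]

# The five rows shown by data.head() above.
toy = pd.DataFrame({
    "NumWebPurchases": [8, 1, 8, 2, 5],
    "NumCatalogPurchases": [10, 1, 2, 0, 3],
    "NumStorePurchases": [4, 2, 10, 4, 6],
})

# Column-wise mean for all channels at once.
channel_means = toy[channel_cols].mean()

# Label of the channel with the highest average.
top_channel = channel_means.idxmax()
print(channel_means)
print(top_channel)
```

On the full data the table above gives the same answer: store purchases have the highest average (about 5.79 per customer).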


### What is the distribution of purchases across different channels (website, catalog, and store)?

In [204]:
# Select three columns and calculate the total number of purchases

NumWebPurchases = data['NumWebPurchases'].sum()
NumCatalogPurchases = data['NumCatalogPurchases'].sum()
NumStorePurchases = data['NumStorePurchases'].sum()

# Create a new DataFrame
DistributionOfPurchasesChannel = pd.DataFrame({
    'Channel': ['NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases'],
    'TotalSales': [NumWebPurchases, NumCatalogPurchases, NumStorePurchases]
})


DistributionOfPurchasesChannel.to_csv("DistributionOfPurchasesChannel.csv",index=False)
DistributionOfPurchasesChannel

Unnamed: 0,Channel,TotalSales
0,NumWebPurchases,9150
1,NumCatalogPurchases,5963
2,NumStorePurchases,12970
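A distribution is often easier to read as shares of the total. The sketch below converts the totals reported above into percentages; the short labels and the names `totals` and `shares` are illustrative choices.

```python
import pandas as pd

# Totals copied from the table above.
totals = pd.Series(
    {"Web": 9150, "Catalog": 5963, "Store": 12970},
    name="TotalPurchases",
)

# Each channel's share of all purchases, in percent.
shares = (totals / totals.sum() * 100).round(2)
print(shares)
```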


### What is the overall response rate for the last campaign?

In [205]:
# Count the number of positive responses (using the cleaned data, like the other answers)
positive_responses = data['Response'].sum()

# Count the total number of customers
total_customers = len(data)

# Calculate the response rate
response_rate = (positive_responses / total_customers) * 100

# Create a new DataFrame
ResponseRate = pd.DataFrame({
    'Response Rate': ['Response Rate For Last Campaign'],
    'Percentage': [response_rate] })

ResponseRate.to_csv("ResponseRateForLastCampaign.csv",index=False)

ResponseRate

Unnamed: 0,Response Rate,Percentage
0,Response Rate For Last Campaign,14.910714
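Because `Response` is a 0/1 flag, its mean is the response rate, which is why the `describe()` output (mean 0.149107) already matches the 14.91% computed here. The sketch below shows the equivalence on a toy column; the values are made up for illustration.

```python
import pandas as pd

# Toy 0/1 column; the values are illustrative only.
toy = pd.DataFrame({"Response": [1, 0, 0, 0, 0]})

# Rate as positives divided by customers, as in the cell above.
rate_via_counts = toy["Response"].sum() / len(toy) * 100

# Same rate as the mean of the binary flag.
rate_via_mean = toy["Response"].mean() * 100
print(rate_via_counts, rate_via_mean)
```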


In [206]:
data.head()

Unnamed: 0,ID,Year_Birth,Age,Education,Kidhome,Teenhome,Dt_Customer,Num_Years_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
0,5524,1957,66,Graduation,0,0,04-09-2012,11,58,635,88,546,172,88,88,8,10,4,7,0,1
1,2174,1954,69,Graduation,1,1,08-03-2014,9,38,11,1,6,2,1,6,1,1,2,5,0,0
2,4141,1965,58,Graduation,0,0,21-08-2013,10,26,426,49,127,111,21,42,8,2,10,4,0,0
3,6182,1984,39,Graduation,1,0,10-02-2014,9,26,11,4,20,10,3,5,2,0,4,6,0,0
4,5324,1981,42,PhD,1,0,19-01-2014,9,94,173,43,118,46,27,15,5,3,6,5,0,0


### What are the top product categories that customers spend on?

In [207]:
# Select the six product columns and calculate the sum of each

Wines = data['MntWines'].sum()
Fruits = data['MntFruits'].sum()
Meat = data['MntMeatProducts'].sum()
Fish = data['MntFishProducts'].sum()
Sweets = data['MntSweetProducts'].sum()
Gold = data['MntGoldProds'].sum()

# Create a new DataFrame
TopProductSpentOn = pd.DataFrame({
    'Product Category': ['Wines', 'Fruits', 'Meat','Fish', 'Sweet', 'Gold'],
    'TotalSales': [Wines, Fruits, Meat,Fish, Sweets, Gold]
})


TopProductSpentOn.to_csv("TopProductSpentOn.csv",index=False)
TopProductSpentOn

Unnamed: 0,Product Category,TotalSales
0,Wines,680816
1,Fruits,58917
2,Meat,373968
3,Fish,84057
4,Sweet,60621
5,Gold,98609
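The table answers the "top categories" question only after it is ranked. A short sketch that sorts the totals reported above follows; the names `ranked` and `top_three` are illustrative.

```python
import pandas as pd

# Totals copied from the table above.
totals = pd.DataFrame({
    "Product Category": ["Wines", "Fruits", "Meat", "Fish", "Sweet", "Gold"],
    "TotalSales": [680816, 58917, 373968, 84057, 60621, 98609],
})

# Highest spend first, with a fresh 0..n-1 index.
ranked = totals.sort_values("TotalSales", ascending=False).reset_index(drop=True)

# The three categories customers spend the most on.
top_three = ranked.head(3)
print(ranked)
```

Ranked this way, wines lead, followed by meat and then gold products.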


### What is the overall expenditure on different product categories?

In [209]:
# Calculate the expenditure for each product category
expenditure_wines = data['MntWines'].sum()
expenditure_fruits = data['MntFruits'].sum()
expenditure_meat = data['MntMeatProducts'].sum()
expenditure_fish = data['MntFishProducts'].sum()
expenditure_sweets = data['MntSweetProducts'].sum()
expenditure_gold = data['MntGoldProds'].sum()

# Create a DataFrame to store the overall expenditure
expenditure_data = {
    'Product Category': ['Wines', 'Fruits', 'Meat Products', 'Fish Products', 'Sweet Products', 'Gold Products'],
    'Overall Expenditure': [expenditure_wines, expenditure_fruits, expenditure_meat, expenditure_fish, expenditure_sweets, expenditure_gold]
}

expenditure_df = pd.DataFrame(expenditure_data)
expenditure_df.to_csv('ExpenditureRateProduct.csv',index=False)
expenditure_df

Unnamed: 0,Product Category,Overall Expenditure
0,Wines,680816
1,Fruits,58917
2,Meat Products,373968
3,Fish Products,84057
4,Sweet Products,60621
5,Gold Products,98609
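The six separate sums can be replaced by one column-wise sum driven by a column-to-label mapping. The sketch below is an illustration on the five rows shown by `data.head()`, so its totals are much smaller than those in the table above; `category_labels` is a hypothetical helper name.

```python
import pandas as pd

# Mapping from raw column names to display labels.
category_labels = {
    "MntWines": "Wines",
    "MntFruits": "Fruits",
    "MntMeatProducts": "Meat Products",
    "MntFishProducts": "Fish Products",
    "MntSweetProducts": "Sweet Products",
    "MntGoldProds": "Gold Products",
}

# The five rows shown by data.head() above.
toy = pd.DataFrame({
    "MntWines": [635, 11, 426, 11, 173],
    "MntFruits": [88, 1, 49, 4, 43],
    "MntMeatProducts": [546, 6, 127, 20, 118],
    "MntFishProducts": [172, 2, 111, 10, 46],
    "MntSweetProducts": [88, 1, 21, 3, 27],
    "MntGoldProds": [88, 6, 42, 5, 15],
})

# Sum every product column at once, then relabel and reshape into a table.
expenditure = (
    toy[list(category_labels)]
    .sum()
    .rename(index=category_labels)
    .rename_axis("Product Category")
    .reset_index(name="Overall Expenditure")
)
print(expenditure)
```

Adding a category then means adding one entry to the mapping rather than a new variable and two list items.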


### What is the average age of the customer base?

In [218]:
# Create the AverageCustomerAge
AverageAgeOfCustomerBase = data['Age'].mean()

# Create a DataFrame to store the AverageCustomerAge
AverageCustomerAgeData = {
    'Average Customer Age': [AverageAgeOfCustomerBase]
    
}

AverageCustomerAge =pd.DataFrame(AverageCustomerAgeData)

AverageCustomerAge.to_csv('AverageCustomerAge.csv', index=False)

AverageCustomerAge

Unnamed: 0,Average Customer Age
0,54.194196


In [217]:
data.head()

Unnamed: 0,ID,Year_Birth,Age,Education,Kidhome,Teenhome,Dt_Customer,Num_Years_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp5,Response
0,5524,1957,66,Graduation,0,0,04-09-2012,11,58,635,88,546,172,88,88,8,10,4,7,0,1
1,2174,1954,69,Graduation,1,1,08-03-2014,9,38,11,1,6,2,1,6,1,1,2,5,0,0
2,4141,1965,58,Graduation,0,0,21-08-2013,10,26,426,49,127,111,21,42,8,2,10,4,0,0
3,6182,1984,39,Graduation,1,0,10-02-2014,9,26,11,4,20,10,3,5,2,0,4,6,0,0
4,5324,1981,42,PhD,1,0,19-01-2014,9,94,173,43,118,46,27,15,5,3,6,5,0,0


### What is the distribution of educational backgrounds among the customers?

In [216]:
#Count distinct values
education_distribution = data['Education'].value_counts()
EducationDistribution = pd.DataFrame(education_distribution).reset_index()
EducationDistribution.columns = ['Education Type', 'Count']

# Save the dataframe
EducationDistribution.to_csv("EducationDistribution.csv", index=False)


EducationDistribution



Unnamed: 0,Education Type,Count
0,Graduation,1127
1,PhD,486
2,Master,370
3,2n Cycle,203
4,Basic,54
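Counts become easier to compare as percentages. The sketch below shows two equivalent routes: `value_counts(normalize=True)` on a raw column (here a toy column holding the five `Education` values from `data.head()`), and dividing the counts reported above by their sum.

```python
import pandas as pd

# Toy column: the Education values of the five rows shown by data.head().
education = pd.Series(
    ["Graduation", "Graduation", "Graduation", "Graduation", "PhD"],
    name="Education",
)

# Share of each education level in the toy column, in percent.
share_pct = education.value_counts(normalize=True).mul(100)

# Counts copied from the table above, converted to percentages.
reported_counts = pd.Series(
    {"Graduation": 1127, "PhD": 486, "Master": 370, "2n Cycle": 203, "Basic": 54}
)
reported_share_pct = reported_counts.div(reported_counts.sum()).mul(100).round(1)
print(share_pct)
print(reported_share_pct)
```

The reported counts add up to 2240, the number of rows in the dataset, so every customer has an education level recorded.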


### What is the average duration of customer relationships with the company?

In [219]:
#Calculate the Average Duration as Customer in Years
AverageDurationAsCustomer = data["Num_Years_Customer"].mean()

#Store it in a dataframe

AverageDurationAsCustomerInYears = {
    'Average Customer Duration In Years': [AverageDurationAsCustomer]
    
}
AverageDurationAsCustomerInYears = pd.DataFrame(AverageDurationAsCustomerInYears)
#Save the dataframe
AverageDurationAsCustomerInYears.to_csv("AverageDurationAsCustomerInYears.csv", index=False)
AverageDurationAsCustomerInYears




Unnamed: 0,Average Customer Duration In Years
0,9.971875
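`Num_Years_Customer` is a difference between calendar years, so it ignores the month and day of enrollment. A sketch of a finer-grained alternative follows, parsing `Dt_Customer` as a day-first date and measuring tenure in fractional years. The cut-off `REFERENCE_DATE` is an assumption (the project date from the header), and 365.25 days per year is an approximation.

```python
import pandas as pd

# Assumed cut-off: the project date given in the header.
REFERENCE_DATE = pd.Timestamp("2023-06-30")

# Enrollment dates of the five rows shown by data.head().
dates = pd.Series(
    ["04-09-2012", "08-03-2014", "21-08-2013", "10-02-2014", "19-01-2014"],
    name="Dt_Customer",
)

# The strings are day-first (21-08-2013 cannot be month 21), so state the format.
enrolled = pd.to_datetime(dates, format="%d-%m-%Y")

# Tenure in fractional years, using 365.25 days per year as an approximation.
tenure_years = (REFERENCE_DATE - enrolled).dt.days / 365.25

# The calendar-year difference used in the notebook, for comparison.
year_difference = 2023 - enrolled.dt.year
print(tenure_years.round(2).tolist())
print(year_difference.tolist())
```

The two measures can differ by up to a year per customer, in either direction, depending on where the enrollment date falls relative to the cut-off.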
