# Statistical analysis of marketing campaign response

This mini-project performs statistical hypothesis testing to determine whether
differences in campaign response between customer groups are statistically significant.


## Question

Are high-spending customers significantly more likely to respond to the marketing campaign
than low-spending customers?

This will be tested using a two-proportion z-test.


In [8]:
import pandas as pd

df = pd.read_csv("marketing_campaign.csv", sep=";")
df.head()


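| Unnamed: 0 | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 4/9/2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 8/3/2014 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21/8/2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 3 | 6182 | 1984 | Graduation | Together | 26646.0 | 1 | 0 | 10/2/2014 | 26 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4 | 5324 | 1981 | PhD | Married | 58293.0 | 1 | 0 | 19/1/2014 | 94 | 173 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |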


In [9]:
# Spending segmentation from previous project (3)
df["TotalSpending"] = (
    df["MntWines"] +
    df["MntFruits"] +
    df["MntMeatProducts"] +
    df["MntFishProducts"] +
    df["MntSweetProducts"] +
    df["MntGoldProds"]
)

df["SpendingSegment"] = pd.qcut(df["TotalSpending"], 3, labels=["Low", "Medium", "High"])


In [10]:
low_group = df[df["SpendingSegment"] == "Low"]
high_group = df[df["SpendingSegment"] == "High"]

success_counts = [
    low_group["Response"].sum(),
    high_group["Response"].sum()
]

total_counts = [
    len(low_group),
    len(high_group)
]

success_counts, total_counts


([np.int64(56), np.int64(186)], [748, 747])
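
The raw counts are easier to interpret as response rates. The following is a minimal sketch that hard-codes the counts from the output above instead of re-reading the data; the dictionary layout and variable names are illustrative and not part of the original notebook.

```python
# Counts copied from the cell output above: (responders, group size).
groups = {"Low": (56, 748), "High": (186, 747)}

# Response rate = responders / group size.
rates = {name: hits / size for name, (hits, size) in groups.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")

print(f"Difference (High - Low): {rates['High'] - rates['Low']:.1%}")
```

This gives roughly 7.5% for the low-spending group and 24.9% for the high-spending group, a gap of about 17 percentage points.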

## Hypothesis testing

To determine whether the difference in campaign response rates between
high-spending and low-spending customers is statistically significant,
a two-proportion z-test is performed.

- Null hypothesis (H₀): The response rates of high-spending and low-spending customers are equal.
- Alternative hypothesis (H₁): High-spending customers have a higher response rate than low-spending customers.
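
For reference, the test statistic can be written out as follows. This assumes the pooled-variance form of the two-proportion z-test and takes the low-spending group first; the symbols $x$, $n$, and $\hat{p}$ are notation introduced here, not taken from the notebook.

$$
\hat{p}_L = \frac{x_L}{n_L}, \qquad
\hat{p}_H = \frac{x_H}{n_H}, \qquad
\hat{p} = \frac{x_L + x_H}{n_L + n_H}
$$

$$
z = \frac{\hat{p}_L - \hat{p}_H}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_L} + \dfrac{1}{n_H}\right)}}
$$

Here $x_L, x_H$ are the numbers of responders and $n_L, n_H$ the group sizes. With this ordering, H₁ corresponds to $\hat{p}_L < \hat{p}_H$, that is, to a negative $z$.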


In [11]:
from statsmodels.stats.proportion import proportions_ztest

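# Groups are ordered [low, high], so z is negative when the low group responds less.
# The default alternative is two-sided.
z_stat, p_value = proportions_ztest(success_counts, total_counts)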
z_stat, p_value


(np.float64(-9.139475312578059), np.float64(6.275606096572023e-20))

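The two-proportion z-test shows a statistically significant difference in campaign
response rates between high-spending and low-spending customers (z = −9.14, p < 0.001).
The statistic is negative because the low-spending group is passed first, so the
difference is computed as low minus high.

`proportions_ztest` was called with its default two-sided alternative, whereas H₁ is one-sided.
Because the observed difference lies in the direction H₁ predicts, the one-sided p-value is
half the reported value, so the conclusion is unchanged: the null hypothesis is rejected,
providing strong evidence that high-spending customers are more likely to respond to the
marketing campaign than low-spending customers.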
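
To make the arithmetic behind the reported statistic explicit, here is a minimal sketch of a pooled two-proportion z-test using only the standard library. The counts are copied from the earlier output and the variable names are illustrative. It also derives the one-sided p-value that corresponds to H₁ (low-spending rate below high-spending rate).

```python
import math

# Counts copied from the notebook output: responders and group sizes.
x_low, n_low = 56, 748
x_high, n_high = 186, 747

p_low = x_low / n_low
p_high = x_high / n_high

# Pooled response rate under H0 (both groups share one rate).
p_pool = (x_low + x_high) / (n_low + n_high)

# Standard error of the difference under H0.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_low + 1 / n_high))

# Low minus high, matching the order used in the notebook.
z = (p_low - p_high) / se

# Standard normal tail probabilities via the complementary error function.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # H1: rates differ
p_one_sided = 0.5 * math.erfc(-z / math.sqrt(2))  # H1: p_low < p_high

print(f"z = {z:.4f}")
print(f"two-sided p = {p_two_sided:.3e}")
print(f"one-sided p = {p_one_sided:.3e}")
```

The z value and two-sided p-value should agree with the `proportions_ztest` output above. In statsmodels itself, the one-sided version for this group ordering would be requested with `alternative="smaller"`.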



## Conclusions

This mini-project used statistical hypothesis testing to determine whether differences in campaign response between customer groups are statistically significant.

The results indicate that high-spending customers respond far more often than low-spending customers (about 25 percent vs 7.5 percent). The two-proportion z-test shows that this difference is statistically significant (p < 0.001): a gap this large would be extremely unlikely if the two groups shared the same underlying response rate.

These results support the conclusion that customer spending level is strongly associated with responsiveness to marketing efforts. Because the data are observational, the test establishes an association rather than a causal effect.
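
A significance test says little about how large the difference is. As a complement, the sketch below estimates a 95% confidence interval for the difference in response rates. It assumes a simple Wald (normal-approximation) interval with unpooled variance, which is one common choice and not something the notebook itself computed; the counts are copied from the earlier output.

```python
import math

# Counts copied from the notebook output: responders and group sizes.
x_low, n_low = 56, 748
x_high, n_high = 186, 747

p_low = x_low / n_low
p_high = x_high / n_high
diff = p_high - p_low

# Unpooled standard error of the difference (Wald interval, an assumption here).
se = math.sqrt(p_low * (1 - p_low) / n_low + p_high * (1 - p_high) / n_high)

# 1.96 is the approximate 97.5th percentile of the standard normal distribution.
z_crit = 1.96
ci_low = diff - z_crit * se
ci_high = diff + z_crit * se

print(f"difference = {diff:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Under these assumptions the interval runs from roughly 14 to 21 percentage points, well away from zero.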