# 📈 Logistic Regression Analysis

This notebook fits a logistic regression model to estimate the treatment effect of ad exposure on user conversion, while controlling for total ad volume and exposure time.

## Goals:
- Model `converted` as a binary outcome
- Include treatment group, total ads, day/hour exposure
- Interpret coefficients via odds ratios

In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load data
df = pd.read_csv('marketing_AB.csv')
df['converted'] = df['converted'].astype(int)

In [5]:
import statsmodels.api as sm

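# One-hot encode `test group` and `most ads day`; drop_first=True makes the first level of each the baseline.
# dtype=int keeps the dummies numeric (recent pandas versions default to bool, which statsmodels may reject).
df = pd.get_dummies(df, columns=['test group', 'most ads day'], drop_first=True, dtype=int)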

# Define features and target
X = df.drop(columns=['user id', 'converted'])  # drop ID and target
X = sm.add_constant(X)  # add intercept
y = df['converted']

# Fit logistic regression
model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

# Optional: exponentiate the coefficients to read them as odds ratios
print("\nOdds Ratios:")
print(np.exp(result.params).sort_values(ascending=False))

Optimization terminated successfully.
         Current function value: 0.106392
         Iterations 8
                           Logit Regression Results                           
Dep. Variable:              converted   No. Observations:               588101
Model:                          Logit   Df Residuals:                   588090
Method:                           MLE   Df Model:                           10
Date:                Sat, 10 May 2025   Pseudo R-squ.:                 0.09670
Time:                        16:35:35   Log-Likelihood:                -62569.
converged:                       True   LL-Null:                       -69267.
Covariance Type:            nonrobust   LLR p-value:                     0.000
                             coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                     -4.0206      0.040   -100.567      0.000      -4.099      

## Logistic Regression Interpretation

We fit a logistic regression model to estimate the impact of ad exposure (vs. PSA) on user conversion, while controlling for total ads seen and time-related covariates.
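As a quick sanity check, the fit statistics in the summary output can be reproduced from the two log-likelihood values it reports. The sketch below assumes the pseudo R-squared is McFadden's (1 minus the ratio of the model log-likelihood to the null log-likelihood) and that "Current function value" is the average negative log-likelihood per observation. The variable names are illustrative, and the inputs are the rounded figures printed in the summary, so the results agree only up to rounding.

```python
# Values copied from the summary output (rounded as printed there).
ll_model = -62569.0   # Log-Likelihood
ll_null = -69267.0    # LL-Null
n_obs = 588101        # No. Observations

# McFadden's pseudo R-squared: 1 - LL_model / LL_null
pseudo_r2 = 1.0 - ll_model / ll_null

# Likelihood-ratio statistic comparing the fitted model to the intercept-only model
lr_stat = 2.0 * (ll_model - ll_null)

# Average negative log-likelihood per observation
mean_nll = -ll_model / n_obs

print(f"Pseudo R-squared: {pseudo_r2:.5f}")
print(f"LR statistic:     {lr_stat:.0f}")
print(f"Mean neg. LL:     {mean_nll:.6f}")
```

The pseudo R-squared of roughly 0.097 is modest, which is common for rare-outcome conversion data and does not by itself undermine the coefficient estimates.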

### Key Results:

- **Ad vs. PSA (`test group_psa`)**: The coefficient for the PSA dummy, `test group_psa`, is -0.4126 (p < 0.001), so being in the PSA group is associated with significantly lower odds of conversion. The corresponding odds ratio is **0.662**, meaning the odds of conversion for PSA users are 33.8% lower than for ad users, holding other variables constant.

- **Total Ads**: Each additional ad seen is associated with about a **0.8%** increase in the odds of conversion (odds ratio ≈ 1.008). This suggests a mild but consistent association between ad volume and conversion.

- **Time Factors (Hour, Day)**:
    - Users who saw the most ads on **Monday** or **Tuesday** had significantly higher odds of converting than those exposed most on the base day (Friday), with odds ratios of **1.64** and **1.56** respectively.
    - **Hour of max exposure** also has a small but significant positive association with conversion (odds ratio ≈ 1.034 per one-hour increase).
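The odds ratios quoted above follow directly from exponentiating the logit coefficients. The minimal sketch below reproduces that arithmetic for the PSA coefficient and shows how a per-unit odds ratio compounds over several units. The helper names are hypothetical, the 10-ad example is purely illustrative, and compounding assumes the effect is linear on the log-odds scale, as the model specifies.

```python
import math

def odds_ratio(coef: float) -> float:
    """Exponentiate a logit coefficient to obtain an odds ratio."""
    return math.exp(coef)

def pct_change_in_odds(coef: float, units: float = 1.0) -> float:
    """Percent change in odds for a `units`-sized increase in the predictor."""
    return (math.exp(coef * units) - 1.0) * 100.0

# Coefficient reported above for the PSA indicator.
psa_coef = -0.4126
psa_or = odds_ratio(psa_coef)
psa_pct = pct_change_in_odds(psa_coef)
print(f"PSA odds ratio: {psa_or:.3f}")
print(f"Change in odds for PSA vs. ad: {psa_pct:.1f}%")

# Only the rounded odds ratio (about 1.008) is reported for total ads,
# so the coefficient is recovered here as its natural log.
ads_coef = math.log(1.008)
ads_pct_10 = pct_change_in_odds(ads_coef, units=10)
print(f"Change in odds for 10 additional ads: {ads_pct_10:.1f}%")
```

Note that these are changes in odds, not in probability. The two are close only when the baseline conversion rate is low.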

### Conclusion:

The ad group retains a statistically significant advantage in conversion odds after controlling for exposure quantity and timing. Conversion odds are associated both with **being in the ad group** and with **when/how many ads** a user sees. These results justify the use of logistic regression as a follow-up to the z-test, since it allows adjustment for key covariates.