# 🤝 Customer Churn Analysis (Customer Sentinel)

---

## 🎯 Business Problem

Olist has a serious customer loyalty problem:

| Metric | Value | Benchmark |
|--------|-------|-----------|
| One-time customers | 97% | ~50-60% |
| Repeat purchase rate | 3% | ~40-50% |
| Average orders per customer | 1.03 | 2-3 |
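
The three metrics in the table can all be derived from a list of orders keyed by customer. The following is a minimal sketch on hypothetical toy data (the function name `retention_metrics` and the IDs are illustrative, not part of the notebook):

```python
from collections import Counter

def retention_metrics(order_customer_ids):
    """Summarize repeat-purchase behavior from one customer ID per order."""
    orders_per_customer = Counter(order_customer_ids)
    n_customers = len(orders_per_customer)
    one_time = sum(1 for n in orders_per_customer.values() if n == 1)
    return {
        "one_time_share": one_time / n_customers,
        "repeat_share": (n_customers - one_time) / n_customers,
        "avg_orders_per_customer": len(order_customer_ids) / n_customers,
    }

# Hypothetical toy data: four customers, five orders ("a" bought twice).
toy_orders = ["a", "b", "c", "a", "d"]
metrics = retention_metrics(toy_orders)
print(metrics)
```

On the real data, the ID passed in would be `customer_unique_id`, since `customer_id` changes per order in the Olist schema used by the query in Step 1.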

### Why Does It Matter?

- **Acquiring a new customer** costs roughly 5-7x more than retaining an existing one
- **Low CLV (Customer Lifetime Value)** reduces profitability
- **Churn prediction** makes proactive action possible

### What This Notebook Covers

1. **RFM Analysis** - Customer behavior metrics
2. **Churn Definition** - Setting the business rule
3. **Exploratory Analysis** - Comparing churned vs. retained customers
4. **Model Training** - Classification with CatBoost
5. **Risk Scoring** - A churn probability for every customer

---

### 🎯 Rationale for the Churn Definition

**Why the 90-Day Rule?**

1. **E-commerce benchmark:** 60-90 days of inactivity = churn risk
2. **Olist characteristic:** The average purchase cycle is longer than 30 days
3. **Action window:** There is still a chance to intervene within 90 days

**Alternative Definitions:**
- 60 days: More aggressive, labels more customers as "churn"
- 120 days: More conservative, at the cost of missed opportunities

> 💡 This definition should be validated with the business team
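
To see how sensitive the label is to the threshold, the alternatives above can be compared side by side. This is a small sketch on hypothetical recency values, not on the Olist data; `churn_rate` is an illustrative helper:

```python
def churn_rate(recency_days, threshold):
    """Share of customers whose last purchase is more than `threshold` days old."""
    churned = sum(1 for r in recency_days if r > threshold)
    return churned / len(recency_days)

# Hypothetical recency values (days since last purchase) for ten customers.
toy_recency = [5, 40, 61, 75, 95, 110, 130, 200, 300, 400]

rates = {t: churn_rate(toy_recency, t) for t in (60, 90, 120)}
for threshold, rate in rates.items():
    print(f"{threshold:>3} days -> {rate:.0%} labeled as churn")
```

Running the same comparison on the real recency column would give the business team concrete numbers to weigh when validating the threshold.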

---

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from sqlalchemy import create_engine, text
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report, confusion_matrix
from catboost import CatBoostClassifier
import warnings
warnings.filterwarnings('ignore')

engine = create_engine('sqlite:///../olist.db')
print("✅ Connection ready")

✅ Connection ready


## Step 1: RFM Analysis

To understand customer behavior, we compute the **RFM** (Recency, Frequency, Monetary) metrics:

| Metric | Description | Relation to Churn |
|--------|-------------|-------------------|
| **Recency** | Days since the last purchase | High = risk |
| **Frequency** | Total number of orders | Low = risk |
| **Monetary** | Total spend (BRL) | Low = risk |
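
Recency is simply a day difference between a fixed snapshot date and the customer's last purchase. The sketch below mirrors that arithmetic in plain Python, assuming the snapshot date 2018-09-01 used by the SQL in this step and assuming timestamps stored as `YYYY-MM-DD HH:MM:SS` strings (the helper name `recency_days` is illustrative):

```python
from datetime import datetime

REFERENCE_DATE = datetime(2018, 9, 1)  # snapshot date, assumed to match the SQL in this step

def recency_days(last_purchase, reference=REFERENCE_DATE):
    """Whole days between the last purchase timestamp and the reference date."""
    ts = datetime.strptime(last_purchase, "%Y-%m-%d %H:%M:%S")
    return (reference - ts).days

# Hypothetical timestamp in the "YYYY-MM-DD HH:MM:SS" format assumed here.
example = recency_days("2018-06-03 10:15:00")
print(example)
```

Because partial days are dropped, a customer who bought 89 days and 14 hours before the snapshot counts as 89 days, which matters for customers sitting right at the churn threshold.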

In [2]:
query = """
WITH customer_stats AS (
    SELECT 
        c.customer_unique_id,
        MAX(o.order_purchase_timestamp) as last_purchase,
        COUNT(DISTINCT o.order_id) as frequency,
        SUM(oi.price + oi.freight_value) as monetary
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE o.order_status = 'delivered'
    GROUP BY c.customer_unique_id
)
SELECT 
    customer_unique_id,
    CAST(JULIANDAY('2018-09-01') - JULIANDAY(last_purchase) AS INTEGER) as recency,
    frequency,
    monetary
FROM customer_stats
WHERE monetary > 0
"""

with engine.connect() as conn:
    df = pd.read_sql(text(query), conn)

print(f"✅ {len(df):,} customers loaded")
print(f"\n📊 RFM Summary:")
print(f"   Recency (mean): {df['recency'].mean():.0f} days")
print(f"   Frequency (mean): {df['frequency'].mean():.2f} orders")
print(f"   Monetary (mean): {df['monetary'].mean():.0f} BRL")

✅ 93,358 customers loaded

📊 RFM Summary:
   Recency (mean): 239 days
   Frequency (mean): 1.03 orders
   Monetary (mean): 165 BRL


### 📊 Interpreting the RFM Metrics

**What Is RFM?**
- **R (Recency):** Number of days since the last purchase. Lower is better.
- **F (Frequency):** Total number of orders. Higher = loyal customer.
- **M (Monetary):** Total spend. Higher = valuable customer.

**RFM Distribution for Olist:**

| Metric | Mean | Median | Std | Comment |
|--------|------|--------|-----|---------|
| Recency | 239 days | ~180 | ~150 | Most customers have been inactive for a long time |
| Frequency | 1.03 | ~1 | ~0.2 | Almost everyone has a single order |
| Monetary | R$165 | ~R$110 | ~R$200 | Wide price range |

The means are taken from the cell output above; the medians and standard deviations are approximate.

> ⚠️ A frequency this close to 1 confirms the retention problem
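
The cell in Step 1 prints only the means, so the median and standard deviation columns need a separate computation. A dependency-free sketch on hypothetical spend values is shown below. With the notebook's DataFrame, the pandas equivalent would be `df[['recency', 'frequency', 'monetary']].agg(['mean', 'median', 'std'])`.

```python
import statistics

def summarize(values):
    """Return mean, median and sample standard deviation for one metric."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
    }

# Hypothetical, right-skewed spend values (BRL) for five customers.
toy_monetary = [50, 80, 110, 160, 600]
summary = summarize(toy_monetary)
print(summary)
```

A mean well above the median, as in this toy example, is the usual signature of a few large orders pulling the average up.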

---

## Step 2: Churn Definition

**Question:** When does a customer count as "churned"?

**Definition:** A customer with no purchase in the last 90 days → **Churn**

This threshold varies by industry:
- E-commerce: 60-90 days
- SaaS: 30-60 days
- Gaming: 7-14 days

In [3]:
CHURN_THRESHOLD = 90  # days

df['is_churn'] = (df['recency'] > CHURN_THRESHOLD).astype(int)

churn_rate = df['is_churn'].mean() * 100

print(f"📊 Churn Analysis (threshold: {CHURN_THRESHOLD} days):")
print(f"   Active customers: {(df['is_churn']==0).sum():,}")
print(f"   Churned customers: {(df['is_churn']==1).sum():,}")
print(f"   Churn rate: {churn_rate:.1f}%")

# Visualization
fig = px.pie(df, names='is_churn', title=f'🔄 Churn Distribution ({CHURN_THRESHOLD}-day threshold)',
             color_discrete_sequence=['#2ecc71', '#e74c3c'],
             labels={'is_churn': 'Status'})
fig.update_traces(textinfo='percent+label', 
                  texttemplate='%{label}: %{percent:.1%}')
fig.show()

print(f"\n💡 Takeaway: {churn_rate:.0f}% of customers have not purchased in the last {CHURN_THRESHOLD} days (churn risk).")

📊 Churn Analysis (threshold: 90 days):
   Active customers: 18,211
   Churned customers: 75,147
   Churn rate: 80.5%



💡 Takeaway: 80% of customers have not purchased in the last 90 days (churn risk).


## Step 3: Exploratory Analysis

Let's look at how churned and retained customers differ.

In [4]:
# Recency distribution - churned vs. active
fig = px.histogram(df, x='recency', color='is_churn', barmode='overlay',
                   title='📅 Recency Distribution (Days Since Last Purchase)',
                   labels={'recency': 'Recency (days)', 'is_churn': 'Churn'},
                   color_discrete_map={0: 'green', 1: 'red'},
                   opacity=0.7)
fig.add_vline(x=CHURN_THRESHOLD, line_dash='dash', line_color='black',
              annotation_text=f"Threshold: {CHURN_THRESHOLD} days")
fig.show()

print("💡 Takeaway: Green (active) customers cluster at low recency values.")
print("   Red (churned) customers have made no purchase in 90+ days.")

💡 Takeaway: Green (active) customers cluster at low recency values.
   Red (churned) customers have made no purchase in 90+ days.


In [5]:
# Monetary vs Frequency
sample_df = df.sample(min(5000, len(df)), random_state=42)

# Cast is_churn to str so Plotly Express treats it as a category;
# a numeric color column would get a continuous scale and ignore color_discrete_map.
fig = px.scatter(sample_df, x='monetary', y='frequency', 
                 color=sample_df['is_churn'].astype(str), opacity=0.5,
                 title='💰 Monetary vs Frequency (Spend vs. Number of Orders)',
                 labels={'monetary': 'Total Spend (BRL)', 
                        'frequency': 'Number of Orders',
                        'is_churn': 'Churn'},
                 color_discrete_map={'0': 'green', '1': 'red'})
fig.show()

print("💡 Takeaway: High-value customers (top right) generally stay active.")
print("   One-time, low-spend customers (bottom left) are at risk of churn.")

💡 Takeaway: High-value customers (top right) generally stay active.
   One-time, low-spend customers (bottom left) are at risk of churn.
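
The visual comparison can be backed by a per-group summary. The sketch below averages frequency and monetary for each churn label on hypothetical rows; the column names follow the notebook's DataFrame, while `group_means` is an illustrative helper. With pandas, `df.groupby('is_churn')[['frequency', 'monetary']].mean()` would give the same table on the real data.

```python
from collections import defaultdict

def group_means(rows, group_key, value_keys):
    """Average each value column separately for every group label."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        label = row[group_key]
        counts[label] += 1
        for key in value_keys:
            sums[label][key] += row[key]
    return {
        label: {key: sums[label][key] / counts[label] for key in value_keys}
        for label in counts
    }

# Hypothetical customers; column names follow the notebook's DataFrame.
toy_rows = [
    {"is_churn": 0, "frequency": 2, "monetary": 300.0},
    {"is_churn": 0, "frequency": 1, "monetary": 100.0},
    {"is_churn": 1, "frequency": 1, "monetary": 80.0},
    {"is_churn": 1, "frequency": 1, "monetary": 120.0},
]
comparison = group_means(toy_rows, "is_churn", ["frequency", "monetary"])
print(comparison)
```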


## Step 4: Model Training (CatBoost)

**Why CatBoost?**
- 🎯 Handles imbalanced datasets well (class imbalance)
- 🚀 Fast training
- 📊 Provides feature importance
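
Class weighting is the usual way to handle imbalance: the minority class gets a larger weight so its errors count more during training. The sketch below assumes the inverse-frequency rule "largest class size divided by this class's size", which is how CatBoost's `'Balanced'` option is commonly described; treat the exact formula as an assumption and verify it against the CatBoost documentation for your version. `balanced_class_weights` is an illustrative helper, not a library function.

```python
def balanced_class_weights(class_counts):
    """Weight each class by (largest class size) / (its own size)."""
    largest = max(class_counts.values())
    return {label: largest / count for label, count in class_counts.items()}

# Class sizes reported by the notebook's 90-day churn labeling.
counts = {0: 18_211, 1: 75_147}  # 0 = active, 1 = churn
weights = balanced_class_weights(counts)
print({label: round(w, 3) for label, w in weights.items()})
```

Under this rule the majority class keeps weight 1 and each active customer counts roughly four times as much as a churned one.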

In [6]:
feature_cols = ['recency', 'frequency', 'monetary']
# NOTE: is_churn is defined as recency > CHURN_THRESHOLD, so using recency
# as a feature leaks the label into the model inputs.
X = df[feature_cols]
y = df['is_churn']

# Train/test split (stratified to preserve class proportions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("📊 Model Data:")
print(f"   Train: {len(X_train):,} customers")
print(f"   Test: {len(X_test):,} customers")

# CatBoost model
model = CatBoostClassifier(
    iterations=200,
    depth=6,
    learning_rate=0.1,
    random_seed=42,
    verbose=0,
    auto_class_weights='Balanced'  # for imbalanced data
)

model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Metrics
acc = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)

print("\n🎯 Model Performance:")
print(f"   Accuracy: {acc*100:.1f}%")
print(f"   AUC-ROC: {auc:.4f}")

if auc > 0.80:
    print("\n✅ Target AUC (>0.80) exceeded!")
else:
    print("\n⚠️ Target AUC not reached; the model needs further work.")

📊 Model Data:
   Train: 74,686 customers
   Test: 18,672 customers

🎯 Model Performance:
   Accuracy: 100.0%
   AUC-ROC: 1.0000

✅ Target AUC (>0.80) exceeded!


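### 📊 Interpreting the Churn Model

**Model Performance:**

| Metric | Value | Comment |
|--------|-------|---------|
| Accuracy | 100.0% | Too good to be true, see below |
| AUC-ROC | 1.0000 | Perfect separation, a symptom of label leakage |
| Precision | Not reported | `classification_report` is imported but never called |
| Recall | Not reported | `classification_report` is imported but never called |

**Why Are the Scores Perfect?**

1. **Label leakage:** `is_churn` is defined as `recency > 90`, and `recency` is also a model input, so the model only has to relearn the threshold
2. **Limited features:** Only RFM is used, and frequency and monetary contribute little
3. **Class imbalance:** 80.5% churn vs. 19.5% active, handled with `auto_class_weights='Balanced'`

These scores therefore say nothing about how well the model predicts future churn. A meaningful evaluation would compute the features from orders before a cutoff date and the label from activity after it.

**Feature Importance Ranking:**
1. **Recency** - Most critical (how long the customer has been away); about 95.8% of total importance in Step 5
2. **Frequency** - Secondary (1 order = high risk)
3. **Monetary** - Low impact (the amount spent is not decisive)

**Business Actions:**
- **Win-back campaign** for customers with recency > 60 days
- **Second-order discount** for customers with frequency = 1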
---

## Step 5: Feature Importance

**Question:** Which factor most strongly determines whether a customer churns?

In [7]:
importance_df = pd.DataFrame({
    'Feature': feature_cols,
    'Importance': model.get_feature_importance()
}).sort_values('Importance', ascending=True)

fig = px.bar(importance_df, x='Importance', y='Feature', orientation='h',
             title='🏆 Feature Importance (Churn Drivers)',
             color='Importance', color_continuous_scale='Reds')
fig.show()

top_feature = importance_df.iloc[-1]
print(f"🏆 Most important factor: {top_feature['Feature']} ({top_feature['Importance']:.1f}%)")
print("\n💡 Takeaway: Recency is the strongest churn indicator - a customer who has not returned for a long time is gone.")

🏆 Most important factor: recency (95.8%)

💡 Takeaway: Recency is the strongest churn indicator - a customer who has not returned for a long time is gone.


---
## 🔗 Where Are This Notebook's Outputs Used?

| Output | Used In | Description |
|--------|---------|-------------|
| `churn_model.pkl` | API `/predict/churn` | Churn prediction service |
| RFM metrics | NB4 segmentation | Clustering input |
| Churn rates | Dashboard | Customer view |

### 🔄 Relationship to Previous Notebooks

| NB | Connection |
|----|------------|
| NB1 | 97% one-time buyers → source of the churn problem |
| NB2 | Delivery delays → churn trigger |

> 📌 **Next Step:** In NB4 we will segment customers using the RFM data.


## 📋 Conclusions and Recommendations

### ✅ Achievements

- **Churn definition** set in line with the business logic
- **RFM metrics** computed and analyzed
- **CatBoost model** trained and saved

### 🔧 Model Improvement Suggestions

1. **Additional Features:**
   - Average review score
   - Most recent delivery experience
   - Category preferences

2. **Time-based Features:**
   - Days between first and last order
   - Weekday/weekend preference

3. **Ensemble Methods:**
   - LightGBM + CatBoost combination
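
As an illustration of the time-based features suggested above, the sketch below derives the span between first and last order and the share of weekend orders from one customer's timestamps. It assumes `YYYY-MM-DD HH:MM:SS` strings; `time_features` and the feature names are hypothetical.

```python
from datetime import datetime

def time_features(order_timestamps):
    """Derive simple time-based features from one customer's order timestamps."""
    orders = sorted(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in order_timestamps
    )
    # weekday() returns 5 for Saturday and 6 for Sunday.
    weekend_orders = sum(1 for ts in orders if ts.weekday() >= 5)
    return {
        "days_first_to_last": (orders[-1] - orders[0]).days,
        "weekend_share": weekend_orders / len(orders),
    }

# Hypothetical customer with three orders.
features = time_features([
    "2018-01-06 09:30:00",  # Saturday
    "2018-02-14 20:00:00",  # Wednesday
    "2018-03-07 12:00:00",  # Wednesday
])
print(features)
```

For single-order customers, who make up most of this dataset, `days_first_to_last` is always 0, so this feature mainly helps separate the small repeat-buyer group.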

### 🚀 Production Note

The model is saved as `models/churn_model.pkl` and can be queried through the API's `/predict/churn` endpoint.
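
Once a churn probability is available, it can be mapped to a coarse risk tier for campaign targeting. The sketch below is one possible mapping; the cutoffs 0.4 and 0.7, the tier names and the function `risk_tier` are hypothetical, and the response format of the endpoint is not specified here.

```python
def risk_tier(churn_probability, medium_cutoff=0.4, high_cutoff=0.7):
    """Map a churn probability in [0, 1] to a coarse risk tier."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability >= high_cutoff:
        return "high"
    if churn_probability >= medium_cutoff:
        return "medium"
    return "low"

# Hypothetical probabilities, e.g. values such as model.predict_proba(X)[:, 1].
tiers = [risk_tier(p) for p in (0.15, 0.55, 0.92)]
print(tiers)
```

The cutoffs would normally be chosen with the business team, for example from the campaign budget or the expected cost of a missed churner.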

---

> 📌 **Next:** In NB4 we will perform customer segmentation.

In [8]:
import pickle
import os

# Create the models directory if it does not exist
os.makedirs('../models', exist_ok=True)

save_path = '../models/churn_model.pkl'
try:
    with open(save_path, 'wb') as f:
        pickle.dump(model, f)
    print(f'✅ Model saved successfully: {save_path}')
except Exception as e:
    print(f'⚠️ Model could not be saved. The variable (model) may not be in memory: {e}')


✅ Model saved successfully: ../models/churn_model.pkl
