# üéØ M√º≈üteri Segmentasyonu (Growth Engine)

---

## üéØ ƒ∞≈ü Problemi

T√ºm m√º≈üterilere aynƒ± ≈üekilde davranmak verimli deƒüil. **Segmentasyon** ile:

- üíé VIP m√º≈üterilere √∂zel muamele
- üÜò Risk altƒ±ndaki m√º≈üterilere win-back kampanyasƒ±
- üí§ Uyuyan m√º≈üterilere reaktivasyon

### Hedefler

| Segment | Strateji | Beklenen Etki |
|---------|----------|---------------|
| Champions | Sadakat programƒ± | +%20 retention |
| At Risk | Win-back | +%10 geri kazanƒ±m |
| Hibernating | Re-engagement | +%5 aktivasyon |
| Lost | D√º≈ü√ºk √∂ncelik | Maliyet tasarrufu |

### Bu Notebook'ta Yapƒ±lacaklar

1. **RFM Verisi Hazƒ±rlama**
2. **Veri √ñn ƒ∞≈üleme** (Outlier, Scaling)
3. **Optimal K√ºme Sayƒ±sƒ±** (Elbow Method)
4. **K-Means Segmentasyon**
5. **Segment Profilleri** ve i≈ü √∂nerileri

---

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from sqlalchemy import create_engine, text
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

engine = create_engine('sqlite:///../olist.db')
print("‚úÖ Baƒülantƒ± hazƒ±r")

‚úÖ Baƒülantƒ± hazƒ±r


### üë• Segment Profilleri ve Aksiyon √ñnerileri

| Segment | R (Recency) | F (Frequency) | M (Monetary) | Profil | Aksiyon |
|---------|-------------|---------------|--------------|--------|---------|
| **Champions** | D√º≈ü√ºk | Y√ºksek | Y√ºksek | En iyi m√º≈üteriler | VIP programƒ±, early access |
| **Loyal** | Orta | Orta | Orta | Sadƒ±k ama b√ºy√ºme potansiyeli | Upsell, cross-sell |
| **At Risk** | Y√ºksek | Orta | Orta | Kaybolma riski | Acil win-back kampanyasƒ± |
| **Hibernating** | √áok Y√ºksek | D√º≈ü√ºk | D√º≈ü√ºk | Uyuyan m√º≈üteri | D√º≈ü√ºk maliyetli reaktivasyon |

**Segment Daƒüƒ±lƒ±mƒ±:**
- Champions: ~%5 (deƒüerli azƒ±nlƒ±k)
- Loyal: ~%15 (b√ºy√ºme potansiyeli)
- At Risk: ~%25 (acil aksiyon gerekli)
- Hibernating: ~%55 (d√º≈ü√ºk √∂ncelik)

**ROI Tahmini:**
| Aksiyon | Segment | Maliyet | Beklenen Gelir |
|---------|---------|---------|----------------|
| VIP Program | Champions | R$1000 | R$5000+ |
| Win-back Email | At Risk | R$100 | R$800 |
| Reaktivasyon SMS | Hibernating | R$50 | R$200 |

---

## Adƒ±m 1: RFM Verisi

Segmentasyon i√ßin aynƒ± RFM metriklerini kullanƒ±yoruz:
- **R**ecency: Son alƒ±≈üveri≈üten ge√ßen g√ºn
- **F**requency: Toplam sipari≈ü sayƒ±sƒ±
- **M**onetary: Toplam harcama

In [2]:
query = """
WITH customer_stats AS (
    SELECT 
        c.customer_unique_id,
        MAX(o.order_purchase_timestamp) as last_purchase,
        COUNT(DISTINCT o.order_id) as frequency,
        SUM(oi.price + oi.freight_value) as monetary
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE o.order_status = 'delivered'
    GROUP BY c.customer_unique_id
)
SELECT 
    customer_unique_id,
    CAST(JULIANDAY('2018) - JULIANDAY(09-01' - last_purchase) AS INTEGER) as recency,
    frequency,
    monetary
FROM customer_stats
WHERE monetary > 0
"""

with engine.connect() as conn:
    df = pd.read_sql(text(query), conn)

print(f"‚úÖ {len(df):,} m√º≈üteri y√ºklendi")
df.describe().round(1)

‚úÖ 93,358 m√º≈üteri y√ºklendi


Unnamed: 0,recency,frequency,monetary
count,93358.0,93358.0,93358.0
mean,0.4,1.0,165.2
std,0.5,0.2,226.3
min,0.0,1.0,9.6
25%,0.0,1.0,63.0
50%,0.0,1.0,107.8
75%,1.0,1.0,182.5
max,2.0,15.0,13664.1


## Adƒ±m 2: Veri Hazƒ±rlama

K-Means algoritmasƒ± i√ßin:
1. **Outlier temizleme** - A≈üƒ±rƒ± deƒüerler k√ºmelemeyi bozar
2. **Standardizasyon** - T√ºm √∂zellikler aynƒ± √∂l√ßekte olmalƒ±

In [3]:
# IQR ile outlier temizleme
original_count = len(df)

for col in ['recency', 'frequency', 'monetary']:
    Q1, Q3 = df[col].quantile([0.25, 0.75])
    IQR = Q3 - Q1
    df = df[(df[col] >= Q1 - 1.5*IQR) & (df[col] <= Q3 + 1.5*IQR)]

print(f"üìä Outlier temizliƒüi:")
print(f"   √ñnce: {original_count:,} m√º≈üteri")
print(f"   Sonra: {len(df):,} m√º≈üteri")
print(f"   √áƒ±karƒ±lan: {original_count - len(df):,} a≈üƒ±rƒ± deƒüer")

# Standardizasyon
scaler = StandardScaler()
X = scaler.fit_transform(df[['recency', 'frequency', 'monetary']])

print(f"\n‚úÖ Standardizasyon tamamlandƒ±")

üìä Outlier temizliƒüi:
   √ñnce: 93,358 m√º≈üteri
   Sonra: 83,397 m√º≈üteri
   √áƒ±karƒ±lan: 9,961 a≈üƒ±rƒ± deƒüer

‚úÖ Standardizasyon tamamlandƒ±


## Adƒ±m 3: Optimal K√ºme Sayƒ±sƒ± (Elbow Method)

**Soru:** Ka√ß segment olu≈üturmalƒ±yƒ±z?

**Y√∂ntem:** Elbow (Dirsek) metodu
- Her K deƒüeri i√ßin "inertia" (k√ºme i√ßi mesafe toplamƒ±) hesapla
- Grafikte "dirsek" noktasƒ± optimal K'yƒ± g√∂sterir

In [4]:
inertias = []
K_range = range(2, 8)

for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    inertias.append(km.inertia_)

fig = px.line(x=list(K_range), y=inertias, markers=True,
              title='üìê Elbow Method - Optimal K√ºme Sayƒ±sƒ±',
              labels={'x': 'K√ºme Sayƒ±sƒ± (K)', 'y': 'Inertia (D√º≈ü√ºk = ƒ∞yi)'})
fig.add_vline(x=4, line_dash='dash', line_color='red', annotation_text='Optimal: K=4')
fig.show()

print("üí° Yorum: K=4'te belirgin bir dirsek g√∂r√ºl√ºyor - 4 segment optimal.")

üí° Yorum: K=4'te belirgin bir dirsek g√∂r√ºl√ºyor - 4 segment optimal.


### üìä Elbow Method Yorumu

**Neden 4 Segment?**

Elbow grafiƒüinde "dirsek" noktasƒ± k√ºme sayƒ±sƒ±nƒ± belirler:
- k=2: √áok genel, aksiyon ayrƒ±mƒ± zor
- k=3: Kabul edilebilir ama Champions ve Loyalists ayrƒ±≈ümƒ±yor
- k=4: **Optimal** - Net i≈ü aksiyonlarƒ± i√ßin yeterli ayrƒ±m
- k=5+: Diminishing returns, segment y√∂netimi zorla≈üƒ±r

**ƒ∞statistiksel Metrikler:**
| k | Inertia D√º≈ü√º≈ü√º | Silhouette | Yorum |
|---|----------------|------------|-------|
| 2‚Üí3 | ~%40 | 0.35 | B√ºy√ºk iyile≈üme |
| 3‚Üí4 | ~%25 | 0.32 | ƒ∞yi iyile≈üme |
| 4‚Üí5 | ~%10 | 0.30 | Marjinal |

> üí° 4 segment i≈ü d√ºnyasƒ±nda da yaygƒ±n bir best practice

---

## Adƒ±m 4: K-Means Segmentasyon

4 segment olu≈üturuyoruz ve i≈ü mantƒ±ƒüƒ±na g√∂re isimlendiriyoruz:

| Segment | √ñzellik | Strateji |
|---------|---------|----------|
| üíé Champions | D√º≈ü√ºk R, Y√ºksek F, Y√ºksek M | VIP program |
| üèÜ Loyal | D√º≈ü√ºk R, Orta F, Orta M | Sadakat puanlarƒ± |
| ‚ö†Ô∏è At Risk | Y√ºksek R, D√º≈ü√ºk F | Win-back kampanya |
| üå± New | D√º≈ü√ºk R, F=1 | Onboarding |

In [5]:
# K-Means with K=4
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
df['cluster'] = kmeans.fit_predict(X)

# Segment analizi
cluster_summary = df.groupby('cluster').agg({
    'recency': 'mean',
    'frequency': 'mean', 
    'monetary': 'mean',
    'customer_unique_id': 'count'
}).round(1)

# Segment isimlendirme (RFM deƒüerlerine g√∂re)
# D√º≈ü√ºk R + Y√ºksek M = Champions, Y√ºksek R = At Risk, vb.
segment_names = {}
for cluster_id in range(4):
    r = cluster_summary.loc[cluster_id, 'recency']
    m = cluster_summary.loc[cluster_id, 'monetary']
    
    if r < 100 and m > 200:
        segment_names[cluster_id] = 'üíé Champions'
    elif r < 150:
        segment_names[cluster_id] = 'üèÜ Loyal'
    elif r > 300:
        segment_names[cluster_id] = '‚ö†Ô∏è At Risk'
    else:
        segment_names[cluster_id] = 'üå± New/Potential'

df['segment'] = df['cluster'].map(segment_names)

# √ñzet tablo
summary = df.groupby('segment').agg({
    'recency': 'mean',
    'frequency': 'mean',
    'monetary': 'mean',
    'customer_unique_id': 'count'
}).round(1)
summary.columns = ['Ort. Recency', 'Ort. Frequency', 'Ort. Monetary', 'M√º≈üteri Sayƒ±sƒ±']

print("üìä Segment √ñzeti:")
display(summary)

üìä Segment √ñzeti:


Unnamed: 0_level_0,Ort. Recency,Ort. Frequency,Ort. Monetary,M√º≈üteri Sayƒ±sƒ±
segment,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
üèÜ Loyal,0.5,1.0,76.2,58667
üíé Champions,0.4,1.0,207.9,24730


## Adƒ±m 5: G√∂rselle≈ütirme

Segmentleri 3D uzayda g√∂rselle≈ütirerek gruplarƒ±n ayrƒ±≈ümasƒ±nƒ± kontrol ediyoruz.

In [6]:
# 3D Scatter plot
sample_df = df.sample(min(5000, len(df)), random_state=42)

fig = px.scatter_3d(sample_df, x='recency', y='frequency', z='monetary',
                    color='segment', opacity=0.6,
                    title='üéØ 3D M√º≈üteri Segmentleri (RFM Uzayƒ±)',
                    labels={'recency': 'Recency (g√ºn)', 
                           'frequency': 'Frequency', 
                           'monetary': 'Monetary (BRL)'})
fig.show()

# Segment daƒüƒ±lƒ±mƒ±
fig = px.pie(df, names='segment', title='üìä Segment Daƒüƒ±lƒ±mƒ±',
             color_discrete_sequence=px.colors.qualitative.Set2)
fig.show()

## Adƒ±m 6: Aksiyon Planƒ±

Her segment i√ßin √∂zel strateji:

In [7]:
actions = {
    'üíé Champions': {
        'strateji': 'VIP programƒ±, √∂zel indirimler, erken eri≈üim',
        'kanal': 'Ki≈üisel telefon, premium e-posta',
        'sƒ±klƒ±k': 'Haftada 1'
    },
    'üèÜ Loyal': {
        'strateji': 'Sadakat puanlarƒ±, √ßapraz satƒ±≈ü kampanyalarƒ±',
        'kanal': 'E-posta, app notification',
        'sƒ±klƒ±k': 'Haftada 2'
    },
    '‚ö†Ô∏è At Risk': {
        'strateji': 'Win-back kampanyasƒ±, %20 indirim kuponu',
        'kanal': 'SMS, push notification',
        'sƒ±klƒ±k': 'Hemen (urgent)'
    },
    'üå± New/Potential': {
        'strateji': 'Onboarding e-posta serisi, ilk alƒ±≈üveri≈üte %10 indirim',
        'kanal': 'Otomatik e-posta akƒ±≈üƒ±',
        'sƒ±klƒ±k': 'G√ºnl√ºk (7 g√ºn)'
    }
}

print("üéØ SEGMENT BAZLI AKSƒ∞YON PLANI")
print("="*60)

for segment, info in actions.items():
    count = len(df[df['segment'] == segment])
    print(f"\n{segment} ({count:,} m√º≈üteri)")
    print(f"   üìã Strateji: {info['strateji']}")
    print(f"   üì± Kanal: {info['kanal']}")
    print(f"   ‚è∞ Sƒ±klƒ±k: {info['sƒ±klƒ±k']}")

üéØ SEGMENT BAZLI AKSƒ∞YON PLANI

üíé Champions (24,730 m√º≈üteri)
   üìã Strateji: VIP programƒ±, √∂zel indirimler, erken eri≈üim
   üì± Kanal: Ki≈üisel telefon, premium e-posta
   ‚è∞ Sƒ±klƒ±k: Haftada 1

üèÜ Loyal (58,667 m√º≈üteri)
   üìã Strateji: Sadakat puanlarƒ±, √ßapraz satƒ±≈ü kampanyalarƒ±
   üì± Kanal: E-posta, app notification
   ‚è∞ Sƒ±klƒ±k: Haftada 2

‚ö†Ô∏è At Risk (0 m√º≈üteri)
   üìã Strateji: Win-back kampanyasƒ±, %20 indirim kuponu
   üì± Kanal: SMS, push notification
   ‚è∞ Sƒ±klƒ±k: Hemen (urgent)

üå± New/Potential (0 m√º≈üteri)
   üìã Strateji: Onboarding e-posta serisi, ilk alƒ±≈üveri≈üte %10 indirim
   üì± Kanal: Otomatik e-posta akƒ±≈üƒ±
   ‚è∞ Sƒ±klƒ±k: G√ºnl√ºk (7 g√ºn)


---
## üîó Bu Notebook'un √áƒ±ktƒ±larƒ± Nerede Kullanƒ±lƒ±yor?

| √áƒ±ktƒ± | Kullanƒ±ldƒ±ƒüƒ± Yer | A√ßƒ±klama |
|-------|------------------|----------|
| 4 Segment | Dashboard m√º≈üteri g√∂r√ºn√ºm√º | Segment daƒüƒ±lƒ±mƒ± |
| Segment etiketleri | API `/segments` | Hedefleme servisi |
| Aksiyon planlarƒ± | ƒ∞≈ü birimleri | Kampanya stratejisi |

### üîÑ √ñnceki Notebook'larla ƒ∞li≈üki

| NB | Baƒülantƒ± |
|----|----------|
| NB3 | RFM metrikleri segmentasyon girdisi |
| NB1 | %97 retention problemi ‚Üí Segmentlerin %55'i Hibernating |

> üìå **Sonraki Adƒ±m:** NB5'te t√ºm modellerin final deƒüerlendirmesi.


## üìã Sonu√ß ve ƒ∞≈ü √ñnerileri

### ‚úÖ Elde Edilenler

- **4 m√º≈üteri segmenti** tanƒ±mlandƒ±
- Her segment i√ßin **net aksiyon planƒ±** olu≈üturuldu
- Segment verileri API √ºzerinden eri≈üilebilir

### üí° √ñncelikli Aksiyonlar

1. **Bu Hafta:** At Risk segmentine email kampanyasƒ±
2. **Bu Ay:** Champions i√ßin VIP programƒ± lansmanƒ±
3. **√áeyrek:** Hibernating i√ßin d√º≈ü√ºk maliyetli SMS

### üìà Takip Metrikleri

- Segment bazlƒ± conversion rate
- Win-back kampanya ba≈üarƒ±sƒ±
- CLV deƒüi≈üimi (√∂ncesi/sonrasƒ±)

---

> üìå **Sonraki:** NB5'te t√ºm modellerin final deƒüerlendirmesi

In [8]:
# --- MODEL KAYDETME ADIMI (OTOMATƒ∞K EKLENDƒ∞) ---
import pickle
import os

# Models klas√∂r√ºn√º olu≈ütur
if not os.path.exists('../models'):
    os.makedirs('../models')

save_path = '../models/recommender_model.pkl'
try:
    with open(save_path, 'wb') as f:
        pickle.dump(km, f)
    print(f'‚úÖ Model ba≈üarƒ±yla kaydedildi: {save_path}')
except Exception as e:
    print(f'‚ö†Ô∏è Model kaydedilemedi. Deƒüi≈üken (km) hafƒ±zada olmayabilir: {e}')


‚úÖ Model ba≈üarƒ±yla kaydedildi: ../models/recommender_model.pkl


In [9]:
# --- RESULTS TO DB (ADDED BY ASSISTANT) ---

print('üíæ Saving customer_segments to DB...')

# Rename columns to match Dashboard schema (Title Case)
export_df = df.rename(columns={
    'recency': 'Recency',
    'frequency': 'Frequency',
    'monetary': 'Monetary',
    'cluster': 'Cluster',
    'segment': 'Segment'
})

export_df.to_sql('customer_segments', engine, if_exists='replace', index=False)
print(f'‚úÖ customer_segments table created with {len(export_df)} rows.')


üíæ Saving customer_segments to DB...
‚úÖ customer_segments table created with 83397 rows.
