# League Match Rate Analysis

This notebook analyzes match rates across different leagues in the matchmaker system. We'll explore:
- How women in each league match with men across different leagues
- How men in each league match with women across different leagues
- Cross-league matching patterns and visualizations

**Note:** Uses GPU-accelerated cudf for fast processing. A match is defined as a **reciprocal like** (both users like each other).

### Setup and Data Loading

In [1]:
from matchmaker import MatchingEngine
import cudf
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
# Initialize the matching engine
engine = MatchingEngine()

In [3]:
# Load interactions with gender information
engine.load_interactions(
    "data/swipes_clean.csv", 
    decider_col='decidermemberid',
    other_col='othermemberid', 
    like_col='like', 
    timestamp_col='timestamp',
    gender_col='decidergender'
)

Reading data... ✅
Constructing graph... ✅
Constructing graph... ✅
Fitting ALS... 
🚀 Preparing data...
✅
Fitting ALS... 
🚀 Preparing data...
🎯 Training male→female ALS...
🎯 Training male→female ALS...


100%|██████████| 15/15 [00:00<00:00, 20.02it/s]
100%|██████████| 15/15 [00:00<00:00, 20.02it/s]


🎯 Training female→male ALS...


100%|██████████| 15/15 [00:00<00:00, 280.42it/s]



🔄 Converting factors to CuPy arrays...
✅ Trained M2F ALS with 33173 males × 33358 females
✅ Trained F2M ALS with 10882 females × 44241 males
Complete! ✅


In [4]:
# Compute engagement scores
engine.run_engagement()

User DF updated ✅


In [5]:
# Compute popularity metrics and assign leagues
engine.run_popularity()

User DF updated ✅


In [6]:
# Keep data in cudf for GPU-accelerated processing
user_gdf = engine.user_df
interaction_gdf = engine.interaction_df

print(f"Total users: {len(user_gdf)}")
print(f"Total interactions: {len(interaction_gdf)}")
print(f"\nLeague distribution:")
league_counts = user_gdf['league'].value_counts().to_pandas().sort_index()
print(league_counts)

Total users: 171012
Total interactions: 9827888

League distribution:
league
Bronze      27452
Diamond      9152
Gold        18301
Platinum    18302
Silver      18303
Name: count, dtype: int64


### Prepare Match Data

A "match" occurs when both users **reciprocally like** each other. We'll identify all matches using GPU-accelerated cudf operations.

In [7]:
# Filter to likes only (using cudf for speed)
likes_gdf = interaction_gdf[interaction_gdf['like'] == 1][['decidermemberid', 'othermemberid']].copy()

# A match occurs when:
# - User A likes User B (decidermemberid=A, othermemberid=B)
# - User B likes User A (decidermemberid=B, othermemberid=A)

# Self-join to find reciprocal likes
matches_gdf = likes_gdf.merge(
    likes_gdf,
    left_on=['decidermemberid', 'othermemberid'],
    right_on=['othermemberid', 'decidermemberid'],
    how='inner',
    suffixes=('_1', '_2')
)

# Keep only unique pairs (avoid counting A->B and B->A as separate matches)
# Always put the smaller user_id first
matches_gdf['user1'] = cudf.Series.min(
    cudf.concat([matches_gdf['decidermemberid_1'], matches_gdf['othermemberid_1']], axis=1),
    axis=1
)
matches_gdf['user2'] = cudf.Series.max(
    cudf.concat([matches_gdf['decidermemberid_1'], matches_gdf['othermemberid_1']], axis=1),
    axis=1
)

# Remove duplicates
matches_gdf = matches_gdf[['user1', 'user2']].drop_duplicates()

total_unique_likes = len(likes_gdf)
print(f"Total likes: {total_unique_likes}")
print(f"Total reciprocal matches: {len(matches_gdf)}")
print(f"Match rate: {len(matches_gdf) / total_unique_likes * 100:.2f}%")

Total likes: 3399637
Total reciprocal matches: 29774
Match rate: 0.88%


In [8]:
# Merge with user data to get gender and league info (all in cudf)
user_info_gdf = user_gdf[['user_id', 'gender', 'league']].dropna(subset=['league'])

# Merge user1 info
matches_gdf = matches_gdf.merge(
    user_info_gdf.rename(columns={'user_id': 'user1', 'gender': 'gender1', 'league': 'league1'}),
    on='user1',
    how='inner'
)

# Merge user2 info
matches_gdf = matches_gdf.merge(
    user_info_gdf.rename(columns={'user_id': 'user2', 'gender': 'gender2', 'league': 'league2'}),
    on='user2',
    how='inner'
)

print(f"Matches with league information: {len(matches_gdf)}")
print(f"\nGender combinations in matches:")
matches_gdf['gender_combo'] = matches_gdf['gender1'] + '-' + matches_gdf['gender2']
gender_combo_counts = matches_gdf['gender_combo'].value_counts().to_pandas()
print(gender_combo_counts)

Matches with league information: 29772

Gender combinations in matches:
gender_combo
M-F    15098
F-M    14674
Name: count, dtype: int64


### Cross-League Match Analysis

Now let's analyze how different leagues match with each other, separated by gender.

In [9]:
# Define league order for consistent display
league_order = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond']

# Filter to heterosexual matches and create standardized format
# Ensure we have female_league and male_league columns
hetero_f_m = matches_gdf[(matches_gdf['gender1'] == 'F') & (matches_gdf['gender2'] == 'M')][['league1', 'league2']].copy()
hetero_f_m.columns = ['female_league', 'male_league']

hetero_m_f = matches_gdf[(matches_gdf['gender1'] == 'M') & (matches_gdf['gender2'] == 'F')][['league2', 'league1']].copy()
hetero_m_f.columns = ['female_league', 'male_league']

# Combine and deduplicate (still in cudf)
all_hetero_gdf = cudf.concat([hetero_f_m, hetero_m_f], ignore_index=True).drop_duplicates()

print(f"Unique heterosexual matches: {len(all_hetero_gdf)}")

# Convert to pandas only for visualization (crosstab not in cudf)
all_hetero_matches = all_hetero_gdf.to_pandas()

Unique heterosexual matches: 25


In [11]:
# Men's perspective: Men in X league matched with women in Y league
men_match_matrix = pd.crosstab(
    all_hetero_matches['male_league'], 
    all_hetero_matches['female_league'],
    margins=True,
    margins_name='Total'
)

# Reorder to standard league order
men_match_matrix = men_match_matrix.reindex(
    index=league_order + ['Total'], 
    columns=league_order + ['Total'],
    fill_value=0
)

print("Men's Match Matrix (rows=male league, cols=female league):")
print(men_match_matrix)

Men's Match Matrix (rows=male league, cols=female league):
female_league  Bronze  Silver  Gold  Platinum  Diamond  Total
male_league                                                  
Bronze              1       1     1         1        1      5
Silver              1       1     1         1        1      5
Gold                1       1     1         1        1      5
Platinum            1       1     1         1        1      5
Diamond             1       1     1         1        1      5
Total               5       5     5         5        5     25


In [None]:
# Create cross-tabulation matrices for match counts

# Women's perspective: Women in X league matched with men in Y league
women_match_matrix = pd.crosstab(
    all_hetero_matches['female_league'], 
    all_hetero_matches['male_league'],
    margins=True,
    margins_name='Total'
)

# Reorder to standard league order
women_match_matrix = women_match_matrix.reindex(
    index=league_order + ['Total'], 
    columns=league_order + ['Total'],
    fill_value=0
)

print("Women's Match Matrix (rows=female league, cols=male league):")
print(women_match_matrix)

In [None]:
# Calculate percentage distributions for women
women_match_pct = women_match_matrix.iloc[:-1, :-1].div(
    women_match_matrix.iloc[:-1, -1], 
    axis=0
) * 100

# Calculate percentage distributions for men
men_match_pct = men_match_matrix.iloc[:-1, :-1].div(
    men_match_matrix.iloc[:-1, -1], 
    axis=0
) * 100

print("Women's Match Distribution (% of matches for women in each league):")
print(women_match_pct.round(1))
print("\nMen's Match Distribution (% of matches for men in each league):")
print(men_match_pct.round(1))

### Percentage-based Analysis

Convert counts to percentages to better understand the patterns.

### Visualization 1: Heatmaps of Cross-League Matches

In [None]:
# Create heatmap for women's matches
fig_women = go.Figure(data=go.Heatmap(
    z=women_match_pct.values,
    x=league_order,
    y=league_order,
    colorscale='Viridis',
    text=women_match_pct.round(1).values,
    texttemplate='%{text}%',
    textfont={"size": 12},
    colorbar=dict(title="% of Matches")
))

fig_women.update_layout(
    title={
        'text': 'Women\'s Cross-League Match Distribution<br><sub>Percentage of matches by female league (rows) with male league (columns)</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Male League",
    yaxis_title="Female League",
    width=700,
    height=600,
    font=dict(size=12)
)

fig_women.show()

In [None]:
# Create heatmap for men's matches
fig_men = go.Figure(data=go.Heatmap(
    z=men_match_pct.values,
    x=league_order,
    y=league_order,
    colorscale='Plasma',
    text=men_match_pct.round(1).values,
    texttemplate='%{text}%',
    textfont={"size": 12},
    colorbar=dict(title="% of Matches")
))

fig_men.update_layout(
    title={
        'text': 'Men\'s Cross-League Match Distribution<br><sub>Percentage of matches by male league (rows) with female league (columns)</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Female League",
    yaxis_title="Male League",
    width=700,
    height=600,
    font=dict(size=12)
)

fig_men.show()

### Visualization 2: Absolute Match Counts

In [None]:
# Heatmap with absolute counts for women
fig_women_count = go.Figure(data=go.Heatmap(
    z=women_match_matrix.iloc[:-1, :-1].values,
    x=league_order,
    y=league_order,
    colorscale='Blues',
    text=women_match_matrix.iloc[:-1, :-1].values,
    texttemplate='%{text}',
    textfont={"size": 12},
    colorbar=dict(title="Match Count")
))

fig_women_count.update_layout(
    title={
        'text': 'Women\'s Cross-League Match Counts<br><sub>Number of matches by female league (rows) with male league (columns)</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Male League",
    yaxis_title="Female League",
    width=700,
    height=600,
    font=dict(size=12)
)

fig_women_count.show()

In [None]:
# Heatmap with absolute counts for men
fig_men_count = go.Figure(data=go.Heatmap(
    z=men_match_matrix.iloc[:-1, :-1].values,
    x=league_order,
    y=league_order,
    colorscale='Oranges',
    text=men_match_matrix.iloc[:-1, :-1].values,
    texttemplate='%{text}',
    textfont={"size": 12},
    colorbar=dict(title="Match Count")
))

fig_men_count.update_layout(
    title={
        'text': 'Men\'s Cross-League Match Counts<br><sub>Number of matches by male league (rows) with female league (columns)</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Female League",
    yaxis_title="Male League",
    width=700,
    height=600,
    font=dict(size=12)
)

fig_men_count.show()

### Visualization 3: Stacked Bar Charts

In [None]:
# Stacked bar chart for women
fig_women_bar = go.Figure()

for male_league in league_order:
    fig_women_bar.add_trace(go.Bar(
        name=f'Men: {male_league}',
        x=league_order,
        y=women_match_pct[male_league],
        text=women_match_pct[male_league].round(1),
        texttemplate='%{text}%',
        textposition='inside'
    ))

fig_women_bar.update_layout(
    barmode='stack',
    title={
        'text': 'Women\'s Match Distribution by League<br><sub>How women in each league match with men across leagues</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Female League",
    yaxis_title="Percentage of Matches (%)",
    width=900,
    height=600,
    legend_title="Male Partner League"
)

fig_women_bar.show()

In [None]:
# Stacked bar chart for men
fig_men_bar = go.Figure()

for female_league in league_order:
    fig_men_bar.add_trace(go.Bar(
        name=f'Women: {female_league}',
        x=league_order,
        y=men_match_pct[female_league],
        text=men_match_pct[female_league].round(1),
        texttemplate='%{text}%',
        textposition='inside'
    ))

fig_men_bar.update_layout(
    barmode='stack',
    title={
        'text': 'Men\'s Match Distribution by League<br><sub>How men in each league match with women across leagues</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Male League",
    yaxis_title="Percentage of Matches (%)",
    width=900,
    height=600,
    legend_title="Female Partner League"
)

fig_men_bar.show()

### Visualization 4: Sankey Diagram - Flow of Matches

In [None]:
# Create Sankey diagram showing match flows between leagues
# Prepare data for Sankey
sankey_data = []

for f_league in league_order:
    for m_league in league_order:
        count = women_match_matrix.loc[f_league, m_league]
        if count > 0:
            sankey_data.append({
                'source': f'Women: {f_league}',
                'target': f'Men: {m_league}',
                'value': int(count)
            })

sankey_df = pd.DataFrame(sankey_data)

# Create node labels
source_nodes = [f'Women: {league}' for league in league_order]
target_nodes = [f'Men: {league}' for league in league_order]
all_nodes = source_nodes + target_nodes

# Create node indices
node_dict = {node: idx for idx, node in enumerate(all_nodes)}

# Map sources and targets to indices
sankey_df['source_idx'] = sankey_df['source'].map(node_dict)
sankey_df['target_idx'] = sankey_df['target'].map(node_dict)

# Create color mapping for leagues
league_colors = {
    'Bronze': 'rgba(205, 127, 50, 0.6)',
    'Silver': 'rgba(192, 192, 192, 0.6)',
    'Gold': 'rgba(255, 215, 0, 0.6)',
    'Platinum': 'rgba(229, 228, 226, 0.6)',
    'Diamond': 'rgba(185, 242, 255, 0.6)'
}

node_colors = [league_colors[league] for league in league_order] + \
              [league_colors[league] for league in league_order]

# Create Sankey diagram
fig_sankey = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=all_nodes,
        color=node_colors
    ),
    link=dict(
        source=sankey_df['source_idx'],
        target=sankey_df['target_idx'],
        value=sankey_df['value'],
        color='rgba(0,0,0,0.2)'
    )
)])

fig_sankey.update_layout(
    title={
        'text': "Cross-League Match Flow (Women → Men)<br><sub>Flow diagram showing how women in each league match with men across leagues</sub>",
        'x': 0.5,
        'xanchor': 'center'
    },
    font=dict(size=12),
    width=1000,
    height=600
)

fig_sankey.show()

### Visualization 5: League Preference Analysis

In [None]:
# Calculate "dating up/down" statistics using cudf for performance
# For each league, what % of matches are with same league, higher league, or lower league?

def calculate_league_direction_cudf(gdf, user_league_col, partner_league_col):
    """Calculate whether matches are same league, up, or down using cudf."""
    league_rank = {league: i for i, league in enumerate(league_order)}
    
    # Add rank columns
    gdf['user_rank'] = gdf[user_league_col].map(league_rank)
    gdf['partner_rank'] = gdf[partner_league_col].map(league_rank)
    
    results = []
    for user_league in league_order:
        subset = gdf[gdf[user_league_col] == user_league]
        if len(subset) == 0:
            continue
            
        same = len(subset[subset['user_rank'] == subset['partner_rank']])
        up = len(subset[subset['partner_rank'] > subset['user_rank']])
        down = len(subset[subset['partner_rank'] < subset['user_rank']])
        total = len(subset)
        
        results.append({
            'league': user_league,
            'Same League': same / total * 100,
            'Dating Up': up / total * 100,
            'Dating Down': down / total * 100
        })
    
    return pd.DataFrame(results)

# Process using cudf
women_direction = calculate_league_direction_cudf(all_hetero_gdf.copy(), 'female_league', 'male_league')
men_direction = calculate_league_direction_cudf(all_hetero_gdf.copy(), 'male_league', 'female_league')

print("Women's match direction:")
print(women_direction.round(1))
print("\nMen's match direction:")
print(men_direction.round(1))

In [None]:
# Stacked bar chart for women's dating direction
fig_women_dir = go.Figure()

fig_women_dir.add_trace(go.Bar(
    name='Dating Down',
    x=women_direction['league'],
    y=women_direction['Dating Down'],
    marker_color='lightcoral'
))

fig_women_dir.add_trace(go.Bar(
    name='Same League',
    x=women_direction['league'],
    y=women_direction['Same League'],
    marker_color='lightblue'
))

fig_women_dir.add_trace(go.Bar(
    name='Dating Up',
    x=women_direction['league'],
    y=women_direction['Dating Up'],
    marker_color='lightgreen'
))

fig_women_dir.update_layout(
    barmode='stack',
    title={
        'text': 'Women: Match Direction by League<br><sub>Percentage of matches with same league, higher league, or lower league partners</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Female League",
    yaxis_title="Percentage of Matches (%)",
    width=900,
    height=500
)

fig_women_dir.show()

In [None]:
# Stacked bar chart for men's dating direction
fig_men_dir = go.Figure()

fig_men_dir.add_trace(go.Bar(
    name='Dating Down',
    x=men_direction['league'],
    y=men_direction['Dating Down'],
    marker_color='lightcoral'
))

fig_men_dir.add_trace(go.Bar(
    name='Same League',
    x=men_direction['league'],
    y=men_direction['Same League'],
    marker_color='lightblue'
))

fig_men_dir.add_trace(go.Bar(
    name='Dating Up',
    x=men_direction['league'],
    y=men_direction['Dating Up'],
    marker_color='lightgreen'
))

fig_men_dir.update_layout(
    barmode='stack',
    title={
        'text': 'Men: Match Direction by League<br><sub>Percentage of matches with same league, higher league, or lower league partners</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Male League",
    yaxis_title="Percentage of Matches (%)",
    width=900,
    height=500
)

fig_men_dir.show()

### Key Insights Summary

In [None]:
# Generate summary statistics
print("=" * 80)
print("CROSS-LEAGUE MATCHING INSIGHTS (Reciprocal Likes)")
print("=" * 80)

# Overall match distribution
print("\n1. OVERALL MATCH DISTRIBUTION")
print(f"   Total reciprocal matches analyzed: {len(all_hetero_matches)}")
print(f"   Total users with leagues: {len(user_info_gdf)}")

# Same-league matching rate
same_league_matches = len(all_hetero_matches[
    all_hetero_matches['female_league'] == all_hetero_matches['male_league']
])
print(f"\n2. SAME-LEAGUE MATCHES")
print(f"   Same-league matches: {same_league_matches} ({same_league_matches/len(all_hetero_matches)*100:.1f}%)")

# League mobility
print("\n3. LEAGUE MOBILITY (Women)")
for _, row in women_direction.iterrows():
    print(f"   {row['league']:8} - Same: {row['Same League']:5.1f}%, Up: {row['Dating Up']:5.1f}%, Down: {row['Dating Down']:5.1f}%")

print("\n4. LEAGUE MOBILITY (Men)")
for _, row in men_direction.iterrows():
    print(f"   {row['league']:8} - Same: {row['Same League']:5.1f}%, Up: {row['Dating Up']:5.1f}%, Down: {row['Dating Down']:5.1f}%")

# Top cross-league pairs
print("\n5. TOP CROSS-LEAGUE COMBINATIONS")
cross_league = all_hetero_matches[
    all_hetero_matches['female_league'] != all_hetero_matches['male_league']
].copy()
cross_league['combo'] = cross_league['female_league'] + ' ♀ + ' + cross_league['male_league'] + ' ♂'
top_combos = cross_league['combo'].value_counts().head(10)
for combo, count in top_combos.items():
    pct = count / len(all_hetero_matches) * 100
    print(f"   {combo:30} - {count:4} matches ({pct:4.1f}%)")

print("\n" + "=" * 80)