# League Match Rate Analysis

This notebook analyzes match rates across different leagues in the matchmaker system. We'll explore:
- How women in each league match with men across different leagues
- How men in each league match with women across different leagues
- Cross-league matching patterns and visualizations

**Note:** The notebook uses GPU-accelerated cuDF for data processing. A match is defined as a **reciprocal like**: both users have liked each other.

### Setup and Data Loading

In [1]:
from matchmaker import MatchingEngine
import cudf
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots



In [2]:
# Initialize the matching engine
engine = MatchingEngine()

In [3]:
# Load interactions with gender information
engine.load_interactions(
    "data/swipes_clean.csv", 
    decider_col='decidermemberid',
    other_col='othermemberid', 
    like_col='like', 
    timestamp_col='timestamp',
    gender_col='decidergender'
)

Reading data... ✅
Fitting ALS... 
🚀 Preparing data...
🎯 Training male→female ALS...


100%|██████████| 15/15 [00:01<00:00, 14.98it/s]


🎯 Training female→male ALS...


100%|██████████| 15/15 [00:00<00:00, 233.81it/s]



🔄 Converting factors to CuPy arrays...
✅ Trained M2F ALS with 33173 males × 33358 females
✅ Trained F2M ALS with 10882 females × 44241 males
Complete! ✅


In [4]:
# Compute engagement scores
engine.run_engagement()

User DF updated ✅


In [5]:
# Compute popularity metrics and assign leagues
engine.run_elo()

User DF updated ✅


In [6]:
# Keep data in cudf for GPU-accelerated processing
user_gdf = engine.user_df
interaction_gdf = engine.interaction_df

print(f"Total users: {len(user_gdf)}")
print(f"Total interactions: {len(interaction_gdf)}")
print(f"\nLeague distribution:")
league_counts = user_gdf['league'].value_counts().to_pandas().sort_index()
print(league_counts)

Total users: 171012
Total interactions: 9827888

League distribution:
league
Bronze      27024
Diamond      9010
Gold        18017
Platinum    18016
Silver      18016
Name: count, dtype: int64


### Prepare Match Data

A "match" occurs when two users **like each other**, that is, when a like in one direction is answered by a like in the other. We'll identify all matches using GPU-accelerated cuDF operations.
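
To make the definition concrete, here is a minimal sketch of reciprocal-like detection on a toy swipe log. It uses pandas rather than cuDF so it runs without a GPU, and the five rows of data are made up for illustration; only the column names follow the notebook.

```python
import pandas as pd

# Toy swipe log (hypothetical data): each row means "decider liked other".
likes = pd.DataFrame({
    "decidermemberid": [1, 2, 1, 3, 4],
    "othermemberid":   [2, 1, 3, 4, 3],
})

# Self-join: a row survives only if the reverse like (other -> decider) exists.
pairs = likes.merge(
    likes,
    left_on=["decidermemberid", "othermemberid"],
    right_on=["othermemberid", "decidermemberid"],
    suffixes=("_1", "_2"),
)

# Canonical ordering (smaller ID first) so A->B and B->A collapse into one match.
ids = pairs[["decidermemberid_1", "othermemberid_1"]]
matches = (
    pd.DataFrame({"user1": ids.min(axis=1), "user2": ids.max(axis=1)})
    .drop_duplicates()
    .sort_values(["user1", "user2"])
    .reset_index(drop=True)
)

print(matches)
```

User 1's like of user 3 is never returned, so only the pairs (1, 2) and (3, 4) count as matches.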

In [7]:
# Filter to likes only (using cudf for speed)
likes_gdf = interaction_gdf[interaction_gdf['like'] == 1][['decidermemberid', 'othermemberid']].copy()

# A match occurs when:
# - User A likes User B (decidermemberid=A, othermemberid=B)
# - User B likes User A (decidermemberid=B, othermemberid=A)

# Self-join to find reciprocal likes
matches_gdf = likes_gdf.merge(
    likes_gdf,
    left_on=['decidermemberid', 'othermemberid'],
    right_on=['othermemberid', 'decidermemberid'],
    how='inner',
    suffixes=('_1', '_2')
)

# Keep only unique pairs (avoid counting A->B and B->A as separate matches)
# by always putting the smaller user ID first
pair_ids = matches_gdf[['decidermemberid_1', 'othermemberid_1']]
matches_gdf['user1'] = pair_ids.min(axis=1)
matches_gdf['user2'] = pair_ids.max(axis=1)

# Remove duplicates
matches_gdf = matches_gdf[['user1', 'user2']].drop_duplicates()

total_likes = len(likes_gdf)  # like rows; not deduplicated
print(f"Total likes: {total_likes}")
print(f"Total reciprocal matches: {len(matches_gdf)}")
# Match rate = unique matched pairs per like sent
print(f"Match rate: {len(matches_gdf) / total_likes * 100:.2f}%")

Total likes: 3399637
Total reciprocal matches: 29774
Match rate: 0.88%
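
The printed rate divides unique matched pairs by likes. Since every match is made of two likes, the share of likes that were reciprocated is twice as large. The short calculation below reuses the two totals from the output above; it assumes each directed (decider, other) pair appears at most once among the likes, which the notebook does not verify.

```python
# Totals copied from the cell output above.
total_likes = 3_399_637
total_matches = 29_774

# Rate as printed by the notebook: unique matched pairs per like.
match_rate = total_matches / total_likes * 100

# Each match is made of two likes (A->B and B->A), so the share of likes
# that were reciprocated is twice the printed rate. Assumption: no directed
# pair is duplicated in the like table.
reciprocated_share = 2 * total_matches / total_likes * 100

print(f"Matches per 100 likes: {match_rate:.2f}")
print(f"Likes that were reciprocated: {reciprocated_share:.2f}%")
```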


In [8]:
# Merge with user data to get gender and league info (all in cudf)
user_info_gdf = user_gdf[['user_id', 'gender', 'league']].dropna(subset=['league'])

# Merge user1 info
matches_gdf = matches_gdf.merge(
    user_info_gdf.rename(columns={'user_id': 'user1', 'gender': 'gender1', 'league': 'league1'}),
    on='user1',
    how='inner'
)

# Merge user2 info
matches_gdf = matches_gdf.merge(
    user_info_gdf.rename(columns={'user_id': 'user2', 'gender': 'gender2', 'league': 'league2'}),
    on='user2',
    how='inner'
)

print(f"Matches with league information: {len(matches_gdf)}")
print(f"\nGender combinations in matches:")
matches_gdf['gender_combo'] = matches_gdf['gender1'] + '-' + matches_gdf['gender2']
gender_combo_counts = matches_gdf['gender_combo'].value_counts().to_pandas()
print(gender_combo_counts)

Matches with league information: 29772

Gender combinations in matches:
gender_combo
M-F    15098
F-M    14674
Name: count, dtype: int64


### Cross-League Match Analysis

Now let's analyze how different leagues match with each other, separated by gender.

In [9]:
# Define league order for consistent display
league_order = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond']

# Filter to heterosexual matches and create standardized format
# Ensure we have female_league and male_league columns
hetero_f_m = matches_gdf[(matches_gdf['gender1'] == 'F') & (matches_gdf['gender2'] == 'M')][['league1', 'league2']].copy()
hetero_f_m.columns = ['female_league', 'male_league']

hetero_m_f = matches_gdf[(matches_gdf['gender1'] == 'M') & (matches_gdf['gender2'] == 'F')][['league2', 'league1']].copy()
hetero_m_f.columns = ['female_league', 'male_league']

# Combine without collapsing repeated league pairings so match counts stay accurate
all_hetero_gdf = cudf.concat([hetero_f_m, hetero_m_f], ignore_index=True)

print(f"Total heterosexual matches: {len(all_hetero_gdf)}")

# Convert to pandas only for visualization (crosstab not in cudf)
all_hetero_matches = all_hetero_gdf.to_pandas()


Total heterosexual matches: 29772


In [10]:
# Consistent league colors (Bronze → Diamond)
league_colors = {
    'Bronze': 'rgba(205, 127, 50, 0.8)',    # bronze
    'Silver': 'rgba(192, 192, 192, 0.8)',   # silver
    'Gold': 'rgba(255, 215, 0, 0.8)',       # gold
    'Platinum': 'rgba(135, 206, 250, 0.8)', # light blue
    'Diamond': 'rgba(0, 71, 171, 0.8)'      # rich blue
}

In [11]:
# Men's perspective: Men in X league matched with women in Y league
men_match_matrix = pd.crosstab(
    all_hetero_matches['male_league'], 
    all_hetero_matches['female_league'],
    margins=True,
    margins_name='Total'
)

# Reorder to standard league order
men_match_matrix = men_match_matrix.reindex(
    index=league_order + ['Total'], 
    columns=league_order + ['Total'],
    fill_value=0
)

print("Men's Match Matrix (rows=male league, cols=female league):")
print(men_match_matrix)

Men's Match Matrix (rows=male league, cols=female league):
female_league  Bronze  Silver  Gold  Platinum  Diamond  Total
male_league                                                  
Bronze           1417     687   693       832      681   4310
Silver           1070     547   586       705      593   3501
Gold             1048     548   609       718      618   3541
Platinum         1961     947  1109      1334     1094   6445
Diamond          2988    1760  2200      2795     2232  11975
Total            8484    4489  5197      6384     5218  29772


In [12]:
# Create cross-tabulation matrices for match counts

# Women's perspective: Women in X league matched with men in Y league
women_match_matrix = pd.crosstab(
    all_hetero_matches['female_league'], 
    all_hetero_matches['male_league'],
    margins=True,
    margins_name='Total'
)

# Reorder to standard league order
women_match_matrix = women_match_matrix.reindex(
    index=league_order + ['Total'], 
    columns=league_order + ['Total'],
    fill_value=0
)

print("Women's Match Matrix (rows=female league, cols=male league):")
print(women_match_matrix)

Women's Match Matrix (rows=female league, cols=male league):
male_league    Bronze  Silver  Gold  Platinum  Diamond  Total
female_league                                                
Bronze           1417    1070  1048      1961     2988   8484
Silver            687     547   548       947     1760   4489
Gold              693     586   609      1109     2200   5197
Platinum          832     705   718      1334     2795   6384
Diamond           681     593   618      1094     2232   5218
Total            4310    3501  3541      6445    11975  29772


In [13]:
# Calculate percentage distributions for women
women_match_pct = women_match_matrix.iloc[:-1, :-1].div(
    women_match_matrix.iloc[:-1, -1], 
    axis=0
) * 100

# Calculate percentage distributions for men
men_match_pct = men_match_matrix.iloc[:-1, :-1].div(
    men_match_matrix.iloc[:-1, -1], 
    axis=0
) * 100

print("Women's Match Distribution (% of matches for women in each league):")
print(women_match_pct.round(1))
print("\nMen's Match Distribution (% of matches for men in each league):")
print(men_match_pct.round(1))

Women's Match Distribution (% of matches for women in each league):
male_league    Bronze  Silver  Gold  Platinum  Diamond
female_league                                         
Bronze           16.7    12.6  12.4      23.1     35.2
Silver           15.3    12.2  12.2      21.1     39.2
Gold             13.3    11.3  11.7      21.3     42.3
Platinum         13.0    11.0  11.2      20.9     43.8
Diamond          13.1    11.4  11.8      21.0     42.8

Men's Match Distribution (% of matches for men in each league):
female_league  Bronze  Silver  Gold  Platinum  Diamond
male_league                                           
Bronze           32.9    15.9  16.1      19.3     15.8
Silver           30.6    15.6  16.7      20.1     16.9
Gold             29.6    15.5  17.2      20.3     17.5
Platinum         30.4    14.7  17.2      20.7     17.0
Diamond          25.0    14.7  18.4      23.3     18.6
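
The percentage tables are row-normalized counts: each cell is divided by its row total. As a small sketch on made-up data, `pd.crosstab` can produce the same kind of table in one step through its `normalize="index"` option. The last line spot-checks one cell of the women's table using counts printed earlier in the notebook.

```python
import pandas as pd

# Hypothetical match list: one row per match.
toy = pd.DataFrame({
    "female_league": ["Bronze", "Bronze", "Bronze", "Bronze", "Silver", "Silver"],
    "male_league":   ["Bronze", "Diamond", "Diamond", "Silver", "Diamond", "Bronze"],
})

# normalize="index" divides every row by its own total, so each row sums to 1.
pct = pd.crosstab(toy["female_league"], toy["male_league"], normalize="index") * 100
print(pct.round(1))

# Spot check against the notebook's tables: Bronze women matched with
# Diamond men 2988 times out of 8484 matches in total.
bronze_diamond_pct = 2988 / 8484 * 100
print(f"Bronze women -> Diamond men: {bronze_diamond_pct:.1f}%")
```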


### Visualisations

In [14]:
# Prepare helpers in cudf for diagnostics
# We'll analyze heterosexual interactions only and compute per-female stats

# Attach gender and league for both sides of every like (done once, reused below)
likes_with_info = interaction_gdf[(interaction_gdf['like'] == 1)]
likes_with_info = likes_with_info.merge(
    user_info_gdf.rename(columns={'user_id': 'decidermemberid', 'gender': 'decider_gender', 'league': 'decider_league'}),
    on='decidermemberid', how='left'
)
likes_with_info = likes_with_info.merge(
    user_info_gdf.rename(columns={'user_id': 'othermemberid', 'gender': 'other_gender', 'league': 'other_league'}),
    on='othermemberid', how='left'
)

# 1) Outbound likes by women (female -> male)
likes_f_to_m = likes_with_info[(likes_with_info['decider_gender'] == 'F') & (likes_with_info['other_gender'] == 'M')]

# 2) Inbound likes to women (male -> female)
likes_m_to_f = likes_with_info[(likes_with_info['decider_gender'] == 'M') & (likes_with_info['other_gender'] == 'F')]

# 3) Matches already computed; convert to female<->male frame for ease
fm_matches = cudf.concat([
    matches_gdf[(matches_gdf['gender1'] == 'F') & (matches_gdf['gender2'] == 'M')][['user1','league1','user2','league2']].rename(columns={'user1':'female_id','league1':'female_league','user2':'male_id','league2':'male_league'}),
    matches_gdf[(matches_gdf['gender1'] == 'M') & (matches_gdf['gender2'] == 'F')][['user2','league2','user1','league1']].rename(columns={'user2':'female_id','league2':'female_league','user1':'male_id','league1':'male_league'})
], ignore_index=True)

# league_order and league_colors are reused from the cells above

# A) Outbound target mix of women (row-normalized by female league)
outbound_counts = likes_f_to_m.groupby(['decider_league','other_league']).size().reset_index(name='cnt')
outbound_pivot = outbound_counts.pivot(index='decider_league', columns='other_league', values='cnt').fillna(0)
# Normalize rows to percentages. cuDF only supports axis=1 (column alignment) when
# dividing a DataFrame by a Series, so transpose, divide, and transpose back.
row_sums = outbound_pivot.sum(axis=1).replace(0, 1)
outbound_pct_cudf = (outbound_pivot.T.divide(row_sums, axis=1).T) * 100
# Ensure standard league order in pandas for display
outbound_pct = outbound_pct_cudf.to_pandas().reindex(index=league_order, columns=league_order, fill_value=0)

print("Outbound target mix by women (% of likes sent to each male league):")
print(outbound_pct.round(1))

# B) Reciprocity rate: matches per outbound like, by female league x male league
# Count matches by (female_league, male_league)
match_counts = fm_matches.groupby(['female_league','male_league']).size().reset_index(name='m_cnt')
# Count outbound likes by (female_league, male_league)
like_counts = likes_f_to_m.groupby(['decider_league','other_league']).size().reset_index(name='l_cnt')
like_counts = like_counts.rename(columns={'decider_league':'female_league','other_league':'male_league'})

recip = like_counts.merge(match_counts, on=['female_league','male_league'], how='left')
recip['m_cnt'] = recip['m_cnt'].fillna(0)
# Safe division avoiding divide-by-zero: replace 0 with null then fillna
den = recip['l_cnt'].astype('float64').replace(0, None)
recip['reciprocity'] = ((recip['m_cnt'].astype('float64') * 100) / den).fillna(0)
recip_pivot = recip.pivot(index='female_league', columns='male_league', values='reciprocity').fillna(0)
recip_pct = recip_pivot.to_pandas().reindex(index=league_order, columns=league_order, fill_value=0)

print("\nReciprocity rate (matches per 100 outbound likes), by female x male league:")
print(recip_pct.round(1))

# C) Inbound attention: avg inbound likes per woman by female league
inbound_counts = likes_m_to_f.groupby('othermemberid').size().reset_index(name='inbound_likes')
# Attach female league
female_leagues = user_info_gdf[user_info_gdf['gender']=='F'][['user_id','league']].rename(columns={'user_id':'othermemberid','league':'female_league'})
inbound_with_league = inbound_counts.merge(female_leagues, on='othermemberid', how='left')

avg_inbound = inbound_with_league.groupby('female_league')['inbound_likes'].mean().to_pandas().reindex(index=league_order, fill_value=0)

print("\nAverage inbound likes per woman (by female league):")
print(avg_inbound.round(2))

# Quick textual interpretation scaffold
print("\nInterpretation guide:")
print("- If Silver shows higher outbound focus on higher male leagues plus lower reciprocity, no-match rates can increase.")
print("- If inbound attention is lower for Silver than Bronze, fewer opportunities lead to fewer matches despite similar effort.")

Outbound target mix by women (% of likes sent to each male league):
other_league    Bronze  Silver  Gold  Platinum  Diamond
decider_league                                         
Bronze            11.0    12.7  16.0      25.6     34.7
Silver            11.2    12.3  15.1      24.4     37.0
Gold               9.3    10.9  14.4      26.8     38.6
Platinum          10.1    12.1  14.2      24.4     39.2
Diamond           10.7    13.2  15.3      24.3     36.6

Reciprocity rate (matches per 100 outbound likes), by female x male league:
male_league    Bronze  Silver  Gold  Platinum  Diamond
female_league                                         
Bronze           31.2    20.4  15.9      18.6     20.9
Silver           28.4    20.5  16.8      18.0     22.0
Gold             34.4    24.7  19.5      19.0     26.2
Platinum         34.6    24.5  21.2      23.0     30.1
Diamond          33.8    23.8  21.3      23.8     32.3

Average inbound likes per woman (by female league):
female_league
Bronze     
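
The reciprocity table is built from a left join followed by a guarded division. Here is a minimal pandas sketch of that pattern on made-up counts, including a cell with zero outbound likes to show why the guard matters. The column names mirror the notebook; the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical counts per (female_league, male_league) cell.
like_counts = pd.DataFrame({
    "female_league": ["Bronze", "Bronze", "Silver"],
    "male_league":   ["Bronze", "Silver", "Bronze"],
    "l_cnt":         [200, 50, 0],   # outbound likes; the last cell has none
})
match_counts = pd.DataFrame({
    "female_league": ["Bronze"],
    "male_league":   ["Bronze"],
    "m_cnt":         [40],
})

# The left join keeps every like cell; cells without matches get m_cnt = 0.
recip = like_counts.merge(match_counts, on=["female_league", "male_league"], how="left")
recip["m_cnt"] = recip["m_cnt"].fillna(0)

# Mask zero denominators as NaN so the division never evaluates 0/0,
# then report those cells as 0 matches per 100 likes.
den = recip["l_cnt"].astype("float64").replace(0, np.nan)
recip["reciprocity"] = (recip["m_cnt"] * 100 / den).fillna(0)

print(recip)
```

Reporting an empty cell as 0 is a display choice; keeping it as NaN would distinguish "no likes sent" from "likes sent but never returned".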

In [15]:
# Build outbound target mix matrices for women and men using GPU-safe normalization

# Women outbound mix
outbound_counts_w = likes_f_to_m.groupby(['decider_league','other_league']).size().reset_index(name='cnt')
outbound_pivot_w = outbound_counts_w.pivot(index='decider_league', columns='other_league', values='cnt').fillna(0)
row_sums_w = outbound_pivot_w.sum(axis=1).replace(0, 1)
outbound_pct_w = (outbound_pivot_w.T.divide(row_sums_w, axis=1).T * 100).to_pandas().reindex(index=league_order, columns=league_order, fill_value=0)

# Men outbound mix
likes_m_to_f_out = likes_m_to_f  # already filtered male -> female likes
outbound_counts_m = likes_m_to_f_out.groupby(['decider_league','other_league']).size().reset_index(name='cnt')
outbound_pivot_m = outbound_counts_m.pivot(index='decider_league', columns='other_league', values='cnt').fillna(0)
row_sums_m = outbound_pivot_m.sum(axis=1).replace(0, 1)
outbound_pct_m = (outbound_pivot_m.T.divide(row_sums_m, axis=1).T * 100).to_pandas().reindex(index=league_order, columns=league_order, fill_value=0)

# Create side-by-side stacked bar charts comparing outbound vs matches
# (make_subplots and go are already imported in the setup cell)

# Women: outbound (left) vs match distribution (right) - colored by male league
fig_w_compare = make_subplots(rows=1, cols=2, subplot_titles=(
    "Women: Outbound Target Mix", "Women: Match Distribution"
))

for m_league in league_order:
    color = league_colors[m_league]
    fig_w_compare.add_trace(
        go.Bar(name=f"Men: {m_league}", x=league_order, y=outbound_pct_w[m_league],
               marker_color=color, legendgroup=f"male-{m_league}",
               text=outbound_pct_w[m_league].round(1), texttemplate='%{text}%', textposition='inside'),
        row=1, col=1
    )
    fig_w_compare.add_trace(
        go.Bar(name=f"Men: {m_league}", x=league_order, y=women_match_pct[m_league],
               marker_color=color, legendgroup=f"male-{m_league}", showlegend=False,
               text=women_match_pct[m_league].round(1), texttemplate='%{text}%', textposition='inside'),
        row=1, col=2
    )

fig_w_compare.update_layout(barmode='stack', width=1100, height=550,
    title={
        'text': "Women: Outbound vs Matches by League",
        'x': 0.5,
        'xanchor': 'center'
    }, legend_title="Male League")
fig_w_compare.update_xaxes(title_text="Female League", row=1, col=1)
fig_w_compare.update_xaxes(title_text="Female League", row=1, col=2)
fig_w_compare.update_yaxes(title_text="Percentage (%)", row=1, col=1)
fig_w_compare.update_yaxes(title_text="Percentage (%)", row=1, col=2)
fig_w_compare.show()

# Men: outbound (left) vs match distribution (right) - colored by female league
fig_m_compare = make_subplots(rows=1, cols=2, subplot_titles=(
    "Men: Outbound Target Mix", "Men: Match Distribution"
))

for f_league in league_order:
    color = league_colors[f_league]
    fig_m_compare.add_trace(
        go.Bar(name=f"Women: {f_league}", x=league_order, y=outbound_pct_m[f_league],
               marker_color=color, legendgroup=f"female-{f_league}",
               text=outbound_pct_m[f_league].round(1), texttemplate='%{text}%', textposition='inside'),
        row=1, col=1
    )
    fig_m_compare.add_trace(
        go.Bar(name=f"Women: {f_league}", x=league_order, y=men_match_pct[f_league],
               marker_color=color, legendgroup=f"female-{f_league}", showlegend=False,
               text=men_match_pct[f_league].round(1), texttemplate='%{text}%', textposition='inside'),
        row=1, col=2
    )

fig_m_compare.update_layout(barmode='stack', width=1100, height=550,
    title={
        'text': "Men: Outbound vs Matches by League",
        'x': 0.5,
        'xanchor': 'center'
    }, legend_title="Female League")
fig_m_compare.update_xaxes(title_text="Male League", row=1, col=1)
fig_m_compare.update_xaxes(title_text="Male League", row=1, col=2)
fig_m_compare.update_yaxes(title_text="Percentage (%)", row=1, col=1)
fig_m_compare.update_yaxes(title_text="Percentage (%)", row=1, col=2)
fig_m_compare.show()

In [16]:
# Calculate proportion of users with no matches by league and gender
# Only include users who have made at least 5 likes

# Get all users with league info who have made at least 5 likes
users_with_leagues = user_info_gdf.merge(
  user_gdf[['user_id', 'total_likes']],
  on='user_id',
  how='inner'
).to_pandas()

# Filter for users with at least 5 likes
users_with_leagues = users_with_leagues[users_with_leagues['total_likes'] >= 5]

# Get users who have at least one match
users_with_matches_raw = cudf.concat([
  matches_gdf[['user1', 'gender1', 'league1']].rename(columns={'user1': 'user_id', 'gender1': 'gender', 'league1': 'league'}),
  matches_gdf[['user2', 'gender2', 'league2']].rename(columns={'user2': 'user_id', 'gender2': 'gender', 'league2': 'league'})
]).drop_duplicates(subset=['user_id']).to_pandas()

# Filter to only include users who have at least 5 likes
users_with_matches = users_with_matches_raw[users_with_matches_raw['user_id'].isin(users_with_leagues['user_id'])]

# Calculate no-match statistics
no_match_stats = []

for gender in ['F', 'M']:
  for league in league_order:
    # Total users in this league and gender (with at least 5 likes)
    total = len(users_with_leagues[(users_with_leagues['league'] == league) & (users_with_leagues['gender'] == gender)])
    
    # Users with at least one match
    with_match = len(users_with_matches[(users_with_matches['league'] == league) & (users_with_matches['gender'] == gender)])
    
    # Users with no matches
    no_match = total - with_match
    no_match_pct = (no_match / total * 100) if total > 0 else 0
    
    no_match_stats.append({
      'Gender': 'Women' if gender == 'F' else 'Men',
      'League': league,
      'Total Users': total,
      'With Matches': with_match,
      'No Matches': no_match,
      'No Match %': no_match_pct
    })

no_match_df = pd.DataFrame(no_match_stats)

# Create visualization with league colors
fig_no_match = make_subplots(
  rows=1, cols=2,
  subplot_titles=('Women: % With No Matches', 'Men: % With No Matches'),
  specs=[[{'type': 'bar'}, {'type': 'bar'}]]
)

women_data = no_match_df[no_match_df['Gender'] == 'Women']
men_data = no_match_df[no_match_df['Gender'] == 'Men']

# Determine the maximum y value across both datasets for consistent scaling
max_y = max(women_data['No Match %'].max(), men_data['No Match %'].max())
y_range = [0, max_y * 1.15]  # Add 15% padding for text labels

# Women's bars - colored by league
women_colors = [league_colors[league] for league in women_data['League']]
fig_no_match.add_trace(
  go.Bar(
    x=women_data['League'],
    y=women_data['No Match %'],
    text=women_data['No Match %'].round(1),
    texttemplate='%{text}%',
    textposition='outside',
    marker_color=women_colors,
    showlegend=False,
    name='Women'
  ),
  row=1, col=1
)

# Men's bars - colored by league
men_colors = [league_colors[league] for league in men_data['League']]
fig_no_match.add_trace(
  go.Bar(
    x=men_data['League'],
    y=men_data['No Match %'],
    text=men_data['No Match %'].round(1),
    texttemplate='%{text}%',
    textposition='outside',
    marker_color=men_colors,
    showlegend=False,
    name='Men'
  ),
  row=1, col=2
)

fig_no_match.update_xaxes(title_text="League", row=1, col=1)
fig_no_match.update_xaxes(title_text="League", row=1, col=2)
fig_no_match.update_yaxes(title_text="% With No Matches", range=y_range, row=1, col=1)
fig_no_match.update_yaxes(title_text="% With No Matches", range=y_range, row=1, col=2)

fig_no_match.update_layout(
  title={
    'text': 'Users With No Reciprocal Matches by League<br><sub>Percentage of users (with ≥5 likes) in each league who received zero matches</sub>',
    'x': 0.5,
    'xanchor': 'center'
  },
  showlegend=False,
  height=500,
  width=1000
)

fig_no_match.show()

# Print detailed statistics
print("=" * 80)
print("USERS WITH NO MATCHES BY LEAGUE (Users with at least 5 likes)")
print("=" * 80)
print("\nWOMEN:")
print(women_data[['League', 'Total Users', 'With Matches', 'No Matches', 'No Match %']].to_string(index=False))
print("\nMEN:")
print(men_data[['League', 'Total Users', 'With Matches', 'No Matches', 'No Match %']].to_string(index=False))
print("\n" + "=" * 80)

USERS WITH NO MATCHES BY LEAGUE (Users with at least 5 likes)

WOMEN:
  League  Total Users  With Matches  No Matches  No Match %
  Bronze         1618          1351         267   16.501854
  Silver          993           808         185   18.630413
    Gold          975           837         138   14.153846
Platinum         1099           981         118   10.737034
 Diamond          684           642          42    6.140351

MEN:
  League  Total Users  With Matches  No Matches  No Match %
  Bronze        12494          1869       10625   85.040820
  Silver         4647          1814        2833   60.964063
    Gold         3106          1662        1444   46.490663
Platinum         3660          2463        1197   32.704918
 Diamond         3129          2562         567   18.120805
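
The nested loop over gender and league can also be written as a single grouped aggregation. The sketch below shows the idea in pandas on six made-up users; `matched_ids` is a hypothetical stand-in for the set of user IDs that appear in at least one match.

```python
import pandas as pd

# Hypothetical eligible users (assumed already filtered to >= 5 likes).
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "gender":  ["F", "F", "F", "M", "M", "M"],
    "league":  ["Bronze", "Bronze", "Silver", "Bronze", "Bronze", "Silver"],
})
# Hypothetical set of user IDs that appear in at least one match.
matched_ids = {1, 4, 6}

users["has_match"] = users["user_id"].isin(matched_ids)

# One pass: group size and number of matched users per (gender, league).
summary = (
    users.groupby(["gender", "league"])["has_match"]
    .agg(total="size", with_match="sum")
    .reset_index()
)
summary["no_match_pct"] = (1 - summary["with_match"] / summary["total"]) * 100

print(summary)
```

Groups with no eligible users simply do not appear in the result, so no divide-by-zero guard is needed here.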



### League Preference Analysis

In [17]:
# Calculate "dating up/down" statistics using cudf for performance
# For each league, what % of matches are with same league, higher league, or lower league?

def calculate_league_direction_cudf(gdf, user_league_col, partner_league_col):
    """Calculate whether matches are same league, up, or down using cudf."""
    league_rank = {league: i for i, league in enumerate(league_order)}
    
    # Add rank columns
    gdf['user_rank'] = gdf[user_league_col].map(league_rank)
    gdf['partner_rank'] = gdf[partner_league_col].map(league_rank)
    
    results = []
    for user_league in league_order:
        subset = gdf[gdf[user_league_col] == user_league]
        if len(subset) == 0:
            continue
            
        same = len(subset[subset['user_rank'] == subset['partner_rank']])
        up = len(subset[subset['partner_rank'] > subset['user_rank']])
        down = len(subset[subset['partner_rank'] < subset['user_rank']])
        total = len(subset)
        
        results.append({
            'league': user_league,
            'Same League': same / total * 100,
            'Dating Up': up / total * 100,
            'Dating Down': down / total * 100
        })
    
    return pd.DataFrame(results)

# Process using cudf
women_direction = calculate_league_direction_cudf(all_hetero_gdf.copy(), 'female_league', 'male_league')
men_direction = calculate_league_direction_cudf(all_hetero_gdf.copy(), 'male_league', 'female_league')

print("Women's match direction:")
print(women_direction.round(1))
print("\nMen's match direction:")
print(men_direction.round(1))

Women's match direction:
     league  Same League  Dating Up  Dating Down
0    Bronze         16.7       83.3          0.0
1    Silver         12.2       72.5         15.3
2      Gold         11.7       63.7         24.6
3  Platinum         20.9       43.8         35.3
4   Diamond         42.8        0.0         57.2

Men's match direction:
     league  Same League  Dating Up  Dating Down
0    Bronze         32.9       67.1          0.0
1    Silver         15.6       53.8         30.6
2      Gold         17.2       37.7         45.1
3  Platinum         20.7       17.0         62.3
4   Diamond         18.6        0.0         81.4
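
These percentages can be cross-checked directly from the women's count matrix printed earlier: with leagues in rank order, the diagonal holds same-league matches, the upper triangle holds matches with a higher-ranked partner, and the lower triangle holds matches with a lower-ranked partner. A short NumPy sketch of that check, using the counts from the matrix above:

```python
import numpy as np
import pandas as pd

leagues = ["Bronze", "Silver", "Gold", "Platinum", "Diamond"]

# Women's match counts copied from the matrix printed earlier
# (rows = female league, columns = male league, both in rank order).
counts = np.array([
    [1417, 1070, 1048, 1961, 2988],
    [687, 547, 548, 947, 1760],
    [693, 586, 609, 1109, 2200],
    [832, 705, 718, 1334, 2795],
    [681, 593, 618, 1094, 2232],
])

totals = pd.Series(counts.sum(axis=1), index=leagues)
direction = pd.DataFrame(
    {
        "Same League": np.diag(counts),                   # partner in the same league
        "Dating Up": np.triu(counts, k=1).sum(axis=1),    # partner ranked higher
        "Dating Down": np.tril(counts, k=-1).sum(axis=1), # partner ranked lower
    },
    index=leagues,
).div(totals, axis=0) * 100

print(direction.round(1))
```

Applying the same steps to the transposed matrix gives the men's direction table.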


In [18]:
# Stacked bar chart for women's dating direction
# Bars are colored by direction (down / same / up), one stack per league

# Collect the per-league percentages for each direction
bars_down = []
bars_same = []
bars_up = []
for league in league_order:
    league_data = women_direction[women_direction['league'] == league]
    if len(league_data) > 0:
        bars_down.append(league_data['Dating Down'].values[0])
        bars_same.append(league_data['Same League'].values[0])
        bars_up.append(league_data['Dating Up'].values[0])
    else:
        bars_down.append(0)
        bars_same.append(0)
        bars_up.append(0)

fig_women_dir = go.Figure()

fig_women_dir.add_trace(go.Bar(
    name='Dating Down',
    x=league_order,
    y=bars_down,
    marker_color='rgba(255, 127, 127, 0.8)',  # light red
    text=[f'{v:.1f}%' if v > 0 else '' for v in bars_down],
    textposition='inside'
))

fig_women_dir.add_trace(go.Bar(
    name='Same League',
    x=league_order,
    y=bars_same,
    marker_color='rgba(144, 238, 144, 0.8)',  # light green
    text=[f'{v:.1f}%' if v > 0 else '' for v in bars_same],
    textposition='inside'
))

fig_women_dir.add_trace(go.Bar(
    name='Dating Up',
    x=league_order,
    y=bars_up,
    marker_color='rgba(173, 216, 230, 0.8)',  # light blue
    text=[f'{v:.1f}%' if v > 0 else '' for v in bars_up],
    textposition='inside'
))

fig_women_dir.update_layout(
    barmode='stack',
    title={
        'text': 'Women: Match Direction by League<br><sub>Percentage of matches with same league, higher league, or lower league partners</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Female League",
    yaxis_title="Percentage of Matches (%)",
    width=900,
    height=500
)

fig_women_dir.show()

In [19]:
# Stacked bar chart for men's dating direction
bars_down = []
bars_same = []
bars_up = []
for league in league_order:
    league_data = men_direction[men_direction['league'] == league]
    if len(league_data) > 0:
        bars_down.append(league_data['Dating Down'].values[0])
        bars_same.append(league_data['Same League'].values[0])
        bars_up.append(league_data['Dating Up'].values[0])
    else:
        bars_down.append(0)
        bars_same.append(0)
        bars_up.append(0)

fig_men_dir = go.Figure()

fig_men_dir.add_trace(go.Bar(
    name='Dating Down',
    x=league_order,
    y=bars_down,
    marker_color='rgba(255, 127, 127, 0.8)',  # light red
    text=[f'{v:.1f}%' if v > 0 else '' for v in bars_down],
    textposition='inside'
))

fig_men_dir.add_trace(go.Bar(
    name='Same League',
    x=league_order,
    y=bars_same,
    marker_color='rgba(144, 238, 144, 0.8)',  # light green
    text=[f'{v:.1f}%' if v > 0 else '' for v in bars_same],
    textposition='inside'
))

fig_men_dir.add_trace(go.Bar(
    name='Dating Up',
    x=league_order,
    y=bars_up,
    marker_color='rgba(173, 216, 230, 0.8)',  # light blue
    text=[f'{v:.1f}%' if v > 0 else '' for v in bars_up],
    textposition='inside'
))

fig_men_dir.update_layout(
    barmode='stack',
    title={
        'text': 'Men: Match Direction by League<br><sub>Percentage of matches with same league, higher league, or lower league partners</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title="Male League",
    yaxis_title="Percentage of Matches (%)",
    width=900,
    height=500
)

fig_men_dir.show()

### Key Insights Summary

In [20]:
# Generate summary statistics
print("=" * 80)
print("CROSS-LEAGUE MATCHING INSIGHTS (Reciprocal Likes)")
print("=" * 80)

# Overall match distribution
print("\n1. OVERALL MATCH DISTRIBUTION")
print(f"   Total reciprocal matches analyzed: {len(all_hetero_matches)}")
print(f"   Total users with leagues: {len(user_info_gdf)}")

# Same-league matching rate
same_league_matches = len(all_hetero_matches[
    all_hetero_matches['female_league'] == all_hetero_matches['male_league']
])
print(f"\n2. SAME-LEAGUE MATCHES")
print(f"   Same-league matches: {same_league_matches} ({same_league_matches/len(all_hetero_matches)*100:.1f}%)")

# League mobility
print("\n3. LEAGUE MOBILITY (Women)")
for _, row in women_direction.iterrows():
    print(f"   {row['league']:8} - Same: {row['Same League']:5.1f}%, Up: {row['Dating Up']:5.1f}%, Down: {row['Dating Down']:5.1f}%")

print("\n4. LEAGUE MOBILITY (Men)")
for _, row in men_direction.iterrows():
    print(f"   {row['league']:8} - Same: {row['Same League']:5.1f}%, Up: {row['Dating Up']:5.1f}%, Down: {row['Dating Down']:5.1f}%")

# Top cross-league pairs
print("\n5. TOP CROSS-LEAGUE COMBINATIONS")
cross_league = all_hetero_matches[
    all_hetero_matches['female_league'] != all_hetero_matches['male_league']
].copy()
cross_league['combo'] = cross_league['female_league'] + ' ♀ + ' + cross_league['male_league'] + ' ♂'
top_combos = cross_league['combo'].value_counts().head(10)
for combo, count in top_combos.items():
    pct = count / len(all_hetero_matches) * 100
    print(f"   {combo:30} - {count:4} matches ({pct:4.1f}%)")

print("\n" + "=" * 80)

CROSS-LEAGUE MATCHING INSIGHTS (Reciprocal Likes)

1. OVERALL MATCH DISTRIBUTION
   Total reciprocal matches analyzed: 29772
   Total users with leagues: 90083

2. SAME-LEAGUE MATCHES
   Same-league matches: 6139 (20.6%)

3. LEAGUE MOBILITY (Women)
   Bronze   - Same:  16.7%, Up:  83.3%, Down:   0.0%
   Silver   - Same:  12.2%, Up:  72.5%, Down:  15.3%
   Gold     - Same:  11.7%, Up:  63.7%, Down:  24.6%
   Platinum - Same:  20.9%, Up:  43.8%, Down:  35.3%
   Diamond  - Same:  42.8%, Up:   0.0%, Down:  57.2%

4. LEAGUE MOBILITY (Men)
   Bronze   - Same:  32.9%, Up:  67.1%, Down:   0.0%
   Silver   - Same:  15.6%, Up:  53.8%, Down:  30.6%
   Gold     - Same:  17.2%, Up:  37.7%, Down:  45.1%
   Platinum - Same:  20.7%, Up:  17.0%, Down:  62.3%
   Diamond  - Same:  18.6%, Up:   0.0%, Down:  81.4%

5. TOP CROSS-LEAGUE COMBINATIONS
   Bronze ♀ + Diamond ♂           - 2988 matches (10.0%)
   Platinum ♀ + Diamond ♂         - 2795 matches ( 9.4%)
   Gold ♀ + Diamond ♂             - 2200 matche