# Polymarket User Win Rate Analysis

This notebook calculates the **Profitability Win Rate** for users.

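**Definition:** A "Win" is a net profit on a **Resolved Market**. To filter out dust, Section 4 treats any PnL within ±$0.01 as neutral and excludes it, so in practice a win requires PnL > $0.01 and the denominator counts only non-neutral markets.

$$ \text{Win Rate} = \frac{\text{Count of Profitable Markets}}{\text{Total Resolved Markets Traded (excluding neutral)}} $$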
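As a quick illustration of the formula, the sketch below computes the win rate for one hypothetical user using the plain "PnL > $0" rule. The `pnl_by_market` dictionary and its values are made up for the example and are not taken from the dataset.

```python
# Hypothetical per-market net PnL (USD) for one user on resolved markets.
pnl_by_market = {
    "market-a": 12.50,
    "market-b": -3.10,
    "market-c": 0.75,
    "market-d": -20.00,
}

def win_rate(pnls):
    """Fraction of markets with strictly positive net PnL."""
    pnls = list(pnls)
    if not pnls:
        return float("nan")  # no markets traded, so the rate is undefined
    wins = sum(1 for p in pnls if p > 0)
    return wins / len(pnls)

rate = win_rate(pnl_by_market.values())
print(f"Win rate: {rate:.0%}")  # 2 profitable markets out of 4 -> 50%
```

Note that the size of each profit or loss does not matter here: a user can have a high win rate and still lose money overall.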

In [1]:
import pandas as pd
import numpy as np
import json

# 1. Load Data
trades_path = 'data/trades.csv'
markets_path = 'data/markets.csv'

# Handle potential encoding issues
df_trades = pd.read_csv(trades_path, encoding='utf-8', encoding_errors='replace')
df_markets = pd.read_csv(markets_path, encoding='utf-8', encoding_errors='replace')

print(f"Loaded {len(df_trades)} trades and {len(df_markets)} markets.")

Loaded 2140596 trades and 12331 markets.


## 2. Identify Resolved Markets

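We keep only markets whose metadata marks them as `closed`. Where a closed market also reports payouts, we record the payout per outcome so that positions can be valued at resolution.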

In [2]:
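market_resolution_map = {}  # slug -> {outcome name: payout per share}
resolved_slugs = set()      # slugs of all markets flagged as closed

for idx, row in df_markets.iterrows():
    try:
        # Market metadata is stored as a JSON string in the 'data' column
        data = json.loads(row['data'])
        slug = row['slug']
        is_closed = data.get('closed', False)

        if is_closed:
            resolved_slugs.add(slug)
            # 'outcomes' is itself a JSON-encoded list inside the metadata
            outcomes = json.loads(data.get('outcomes', '[]'))
            payouts = data.get('resolution', {}).get('payouts', [])

            # Note: closed markets without payouts stay in resolved_slugs
            # and are valued at a payout of 0 in the PnL step below.
            if outcomes and payouts:
                res_info = {outcomes[i]: float(payouts[i]) for i in range(len(outcomes))}
                market_resolution_map[slug] = res_info
    except (ValueError, TypeError, KeyError, IndexError, AttributeError):
        # Skip rows with missing or malformed metadata
        continue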

print(f"Found {len(resolved_slugs)} resolved markets.")

Found 546 resolved markets.
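To make the parsing step easier to follow, here is a minimal standalone sketch of the same logic applied to a single record. The helper name `parse_resolution` and the sample record are hypothetical. The field names (`closed`, `outcomes`, `resolution.payouts`) are the ones the cell above reads, and the real metadata may contain cases this sketch does not handle.

```python
import json

def parse_resolution(data_json):
    """Return {outcome: payout} for a closed market, or None otherwise."""
    data = json.loads(data_json)
    if not data.get("closed", False):
        return None
    # 'outcomes' is assumed to be a JSON-encoded list stored as a string
    outcomes = json.loads(data.get("outcomes", "[]"))
    payouts = (data.get("resolution") or {}).get("payouts", [])
    # Require one payout per outcome; otherwise treat the market as unusable
    if not outcomes or len(outcomes) != len(payouts):
        return None
    return {o: float(p) for o, p in zip(outcomes, payouts)}

# Hypothetical record mimicking the assumed schema of the 'data' column.
sample = json.dumps({
    "closed": True,
    "outcomes": json.dumps(["Up", "Down"]),
    "resolution": {"payouts": ["1", "0"]},
})
print(parse_resolution(sample))  # {'Up': 1.0, 'Down': 0.0}
```

Unlike the loop above, this variant returns `None` when the number of payouts does not match the number of outcomes, which makes it explicit which markets cannot be valued.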


## 3. Filter & Calculate PnL for Resolved Markets

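1. Filter trades to keep only those in `resolved_slugs`.
2. Calculate the net cash flow and the resolution payout for each user, market, and outcome.
3. Compute PnL as net cash flow plus payout value, then sum across outcomes to get one PnL per user and market.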

In [3]:
# Filter trades
df_resolved_trades = df_trades[df_trades['market_slug'].isin(resolved_slugs)].copy()

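# Calculate Cash Flow (vectorized; a row-wise apply is slow on millions of trades)
# BUY:  cash leaves the account (negative flow), shares are added
# SELL: cash enters the account (positive flow), shares are removed
is_buy = df_resolved_trades['side'] == 'BUY'
df_resolved_trades['cash_flow'] = np.where(
    is_buy, -df_resolved_trades['usdc_volume'], df_resolved_trades['usdc_volume']
)
df_resolved_trades['share_change'] = np.where(
    is_buy, df_resolved_trades['shares'], -df_resolved_trades['shares']
)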

# Group by User + Market + Outcome
position_df = df_resolved_trades.groupby(['maker', 'market_slug', 'outcome']).agg(
    net_cash_flow=('cash_flow', 'sum'),
    net_shares=('share_change', 'sum')
).reset_index()

# Calculate Final Payout
def calculate_payout(row):
    market = row['market_slug']
    outcome = row['outcome']
    shares = row['net_shares']
    
    payout_price = 0.0
    if market in market_resolution_map and outcome in market_resolution_map[market]:
        payout_price = market_resolution_map[market][outcome]
        
    return shares * payout_price

position_df['payout_value'] = position_df.apply(calculate_payout, axis=1)
position_df['pnl'] = position_df['net_cash_flow'] + position_df['payout_value']

# Consolidate to User + Market level (summing PnL across outcomes)
market_pnl_df = position_df.groupby(['maker', 'market_slug'])['pnl'].sum().reset_index()
market_pnl_df.head()

| | maker | market_slug | pnl |
|---|---|---|---|
| 0 | 0x0012FDbC568eAC386bAEb465f5F8a867341Fcdb5 | bitcoin-up-or-down-on-december-18 | 1.031 |
| 1 | 0x0012FDbC568eAC386bAEb465f5F8a867341Fcdb5 | ethereum-up-or-down-on-december-18 | 1.003 |
| 2 | 0x0013667cEb2474c99C1A406A1Bef55c8CbAD3C30 | bitcoin-above-90k-on-december-18 | -3.099999 |
| 3 | 0x0013667cEb2474c99C1A406A1Bef55c8CbAD3C30 | bitcoin-up-or-down-december-18-3pm-et | -22.848398 |
| 4 | 0x0013667cEb2474c99C1A406A1Bef55c8CbAD3C30 | bitcoin-up-or-down-december-18-4am-et | -84.394399 |
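The cash-flow and payout arithmetic above can be checked by hand on a tiny example. The sketch below uses plain Python and entirely hypothetical trades and payouts (user `alice`, market `m1`); it mirrors the sign conventions of the cell above but is not the notebook's own code.

```python
from collections import defaultdict

# Hypothetical trades: (user, market, outcome, side, usdc_volume, shares)
trades = [
    ("alice", "m1", "Up", "BUY", 4.0, 10.0),   # pay $4 for 10 Up shares
    ("alice", "m1", "Up", "SELL", 3.0, 5.0),   # receive $3 for 5 Up shares
    ("alice", "m1", "Down", "BUY", 2.0, 4.0),  # pay $2 for 4 Down shares
]
# Hypothetical resolution: Up pays $1 per share, Down pays nothing.
payouts = {"m1": {"Up": 1.0, "Down": 0.0}}

cash = defaultdict(float)    # net cash flow per (user, market, outcome)
shares = defaultdict(float)  # net shares held per (user, market, outcome)
for user, market, outcome, side, usdc, qty in trades:
    sign = -1.0 if side == "BUY" else 1.0
    key = (user, market, outcome)
    cash[key] += sign * usdc   # buying costs cash, selling returns it
    shares[key] -= sign * qty  # buying adds shares, selling removes them

# PnL per (user, market): cash flow plus the value of shares held at resolution.
market_pnl = defaultdict(float)
for (user, market, outcome), net_cash in cash.items():
    price = payouts.get(market, {}).get(outcome, 0.0)
    market_pnl[(user, market)] += net_cash + shares[(user, market, outcome)] * price

print(dict(market_pnl))  # {('alice', 'm1'): 2.0}
```

Here the Up position contributes -4 + 3 + 5 × 1 = 4 and the Down position contributes -2 + 4 × 0 = -2, so the market-level PnL is 2.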


## 4. Compute Win Rate

Classify each market as a Win or Loss and aggregate by user.

- **Filter:** We exclude market interactions with negligible impact (e.g., PnL between -$0.01 and +$0.01) to avoid dust noise.

In [5]:
# Define Win/Loss
def get_result(pnl):
    if pnl > 0.01: return 'WIN'
    elif pnl < -0.01: return 'LOSS'
    else: return 'NEUTRAL'

market_pnl_df['result'] = market_pnl_df['pnl'].apply(get_result)

# Filter out neutrals
active_markets = market_pnl_df[market_pnl_df['result'] != 'NEUTRAL']

# Aggregation
win_rate_df = active_markets.groupby('maker').agg(
    total_markets=('market_slug', 'count'),
    wins=('result', lambda x: (x == 'WIN').sum()),
    total_profit=('pnl', 'sum')
).reset_index()

win_rate_df['win_rate_pct'] = (win_rate_df['wins'] / win_rate_df['total_markets']) * 100

# Require a minimum sample size (at least 5 non-neutral markets).
# This is a simple cutoff, not a statistical significance test.
significant_traders = win_rate_df[win_rate_df['total_markets'] >= 5].sort_values('win_rate_pct', ascending=False)

# Save the full ranking to CSV
significant_traders.to_csv('data/top_traders_by_win_rate.csv', index=False)

print("Top Traders by Win Rate (min 5 markets):")
# Must be the last expression in the cell for the notebook to display it
significant_traders.head(100)

Top Traders by Win Rate (min 5 markets):
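The dust filter changes the result compared with the strict "PnL > $0" rule from the introduction. The following sketch, using made-up PnL values, shows the difference; the `classify` helper is a hypothetical stand-in for `get_result` with the same ±$0.01 threshold.

```python
DUST = 0.01  # same threshold as in the cell above

def classify(pnl, dust=DUST):
    """Label a market-level PnL as WIN, LOSS, or NEUTRAL (dust)."""
    if pnl > dust:
        return "WIN"
    if pnl < -dust:
        return "LOSS"
    return "NEUTRAL"

pnls = [5.0, 0.005, 0.002, -1.5]  # hypothetical market-level PnLs
results = [classify(p) for p in pnls]

# Win rate with neutral (dust) markets removed from the denominator
active = [r for r in results if r != "NEUTRAL"]
rate_filtered = active.count("WIN") / len(active)

# Win rate under the strict "PnL > 0" rule, with no filtering
rate_strict = sum(p > 0 for p in pnls) / len(pnls)

print(results)                     # ['WIN', 'NEUTRAL', 'NEUTRAL', 'LOSS']
print(rate_filtered, rate_strict)  # 0.5 0.75
```

Two tiny positive PnLs count as wins under the strict rule but are dropped by the filter, so the filtered rate is lower in this example.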
