# Sector Heterogeneity (INCOMPLETE)

Goal: test whether the post-lockup effect differs by sector (cloud vs. fintech vs. social, etc.).

Problem: Not enough data per sector. Only 71 IPOs total.

In [1]:
import pandas as pd
import numpy as np
from linearmodels.panel import PanelOLS

df = pd.read_csv('../../data/processed/stock_prices_ipo_adjusted.csv',
                 parse_dates=['Date', 'IPO_Date'])

# Load IPO metadata to get sectors
ipo_meta = pd.read_csv('../../data/raw/tech_ipos_curated.csv')

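# Left merge keeps every price row; tickers missing from the metadata get Sector = NaN
df = df.merge(ipo_meta[['Ticker', 'Sector']], on='Ticker', how='left')

# Post-lockup indicator: 1 once more than 180 days have passed since the IPO
df['Post_Lockup'] = (df['Days_Since_IPO'] > 180).astype(int)
df_clean = df.dropna(subset=['Abnormal_Return']).copy()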

print(f"Loaded {len(df_clean):,} observations from {df_clean['Ticker'].nunique()} IPOs")

Loaded 17,802 observations from 71 IPOs


In [2]:
sector_counts = df_clean.groupby('Sector')['Ticker'].nunique().sort_values(ascending=False)

print("\nIPOs per sector:")
print(sector_counts.head(10))

print(f"\nProblem: {len(sector_counts)} unique sectors, most have 1-2 IPOs")


IPOs per sector:
Fintech                    6
Cloud                      5
E-commerce                 5
Cybersecurity              4
Enterprise Software        3
...

Problem: 58 unique sectors, most have 1-2 IPOs


## Attempt 1: Group into broad categories

Manually re-categorize into: Cloud/SaaS, Fintech, E-commerce, Other. (Social was on the original list but never got a keyword list, so it is not a separate category below.)

In [3]:
# This is tedious...
cloud_keywords = ['Cloud', 'SaaS', 'Software', 'DevOps', 'Database', 'Platform']
fintech_keywords = ['Fintech', 'Payments', 'Banking', 'Trading', 'Financial']
ecom_keywords = ['E-commerce', 'Marketplace', 'Retail']

def categorize_sector(sector):
    # Case-sensitive substring match; first matching list wins (cloud, then fintech, then e-commerce)
    if pd.isna(sector):
        return 'Other'
    sector = str(sector)
    if any(k in sector for k in cloud_keywords):
        return 'Cloud/SaaS'
    elif any(k in sector for k in fintech_keywords):
        return 'Fintech'
    elif any(k in sector for k in ecom_keywords):
        return 'E-commerce'
    else:
        return 'Other'

df_clean['Sector_Broad'] = df_clean['Sector'].apply(categorize_sector)

print("\nBroad categories:")
print(df_clean.groupby('Sector_Broad')['Ticker'].nunique())

# Still not great, but better


Broad categories:
Cloud/SaaS        18
Fintech           15
E-commerce        12
Other             26


## Attempt 2: Run DiD by sector

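Rule of thumb: need at least ~10 IPOs per category for reasonable power. The three named categories clear that bar (12-18 IPOs each), so run the DiD on those and skip Other.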

In [4]:
results = []

for sector in ['Cloud/SaaS', 'Fintech', 'E-commerce']:
    df_sector = df_clean[df_clean['Sector_Broad'] == sector].copy()
    
    df_panel = df_sector.set_index(['Ticker', 'Date'])
    
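    # This fails with "Unable to estimate the model. The model has 0 degrees of freedom"
    # Likely cause: daily time FE add one parameter per calendar date, and with so few
    # tickers per sector there are too few observations per date to estimate anything else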
    model = PanelOLS(
        dependent=df_panel['Abnormal_Return'],
        exog=df_panel[['Post_Lockup']],
        entity_effects=True,
        time_effects=True
    ).fit(cov_type='clustered', cluster_entity=True)

ValueError: Unable to estimate the model. The model has 0 degrees of freedom

## Dead end

Can't estimate sector-specific effects because:
1. Too few IPOs per sector (12-18 even after grouping into broad categories)
2. Time fixed effects eat all the degrees of freedom within a single sector
3. Without time FE, estimates are confounded by market conditions (some sectors IPO'd in a bull market, others in a bear market)
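
Point 2 can be checked with a rough parameter count before fitting. The sketch below is a hypothetical diagnostic, not part of the original analysis: the helper name `fe_df_report` and the toy data are assumptions, and the count (one parameter per ticker, one per date, minus one) is only an approximation of what the estimator does internally.

```python
import pandas as pd


def fe_df_report(panel: pd.DataFrame, n_regressors: int = 1) -> dict:
    """Rough residual degrees of freedom for a two-way fixed-effects model.

    Assumes columns 'Ticker' and 'Date'. Counts one parameter per entity,
    one per date (minus one for the shared intercept), plus the regressors.
    This is an approximation, not the estimator's internal calculation.
    """
    n_obs = len(panel)
    n_entities = panel['Ticker'].nunique()
    n_dates = panel['Date'].nunique()
    n_params = n_entities + n_dates - 1 + n_regressors
    # A date on which only one ticker trades is fully absorbed by its own time dummy
    tickers_per_date = panel.groupby('Date')['Ticker'].nunique()
    return {
        'n_obs': n_obs,
        'n_entities': n_entities,
        'n_dates': n_dates,
        'residual_df': n_obs - n_params,
        'share_single_ticker_dates': float((tickers_per_date == 1).mean()),
    }


# Toy example (made-up data): two tickers whose trading windows never overlap
toy = pd.DataFrame({
    'Ticker': ['AAA'] * 3 + ['BBB'] * 3,
    'Date': pd.date_range('2020-01-01', periods=6, freq='D'),
})
report = fe_df_report(toy)
print(report)
```

Calling `fe_df_report(df_sector)` inside the sector loop, before the `PanelOLS` fit, would show how many dates have only a single ticker and whether any degrees of freedom are left.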

Options:
- Collect more data (non-tech sectors)
- Drop time FE (not ideal, introduces confounds)
- Interact Post_Lockup with sector dummies in the pooled model (might work?)
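
The interaction option could be set up roughly as follows. This is a minimal sketch, not a tested specification: the helper `add_sector_interactions`, the `Post_x_...` column names, the choice of `Other` as baseline, and the toy rows are all assumptions introduced here. It builds one Post_Lockup-by-sector column per non-baseline category, so each coefficient would measure how that sector's post-lockup effect differs from the baseline.

```python
import pandas as pd


def add_sector_interactions(df, sector_col='Sector_Broad', baseline='Other'):
    """Add Post_Lockup x sector dummy columns, omitting the baseline sector.

    Returns a copy of df and the list of new column names (hypothetical naming).
    """
    out = df.copy()
    new_cols = []
    for sector in sorted(out[sector_col].dropna().unique()):
        if sector == baseline:
            continue  # the baseline effect is captured by Post_Lockup itself
        col = f'Post_x_{sector}'
        out[col] = out['Post_Lockup'] * (out[sector_col] == sector).astype(int)
        new_cols.append(col)
    return out, new_cols


# Toy example (made-up rows): one pre- and one post-lockup row per ticker
toy = pd.DataFrame({
    'Ticker': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'CCC'],
    'Sector_Broad': ['Fintech', 'Fintech', 'Cloud/SaaS', 'Cloud/SaaS', 'Other', 'Other'],
    'Post_Lockup': [0, 1, 0, 1, 0, 1],
})
toy_int, interaction_cols = add_sector_interactions(toy)
print(interaction_cols)
print(toy_int[interaction_cols].sum().to_dict())
```

The columns `['Post_Lockup'] + interaction_cols` could then be passed as `exog` to the same `PanelOLS` call on the full 71-IPO panel, where time effects are shared across sectors instead of being estimated within each one. Whether that leaves enough degrees of freedom has not been checked here.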

**Decided**: Focus on size heterogeneity instead (large vs small). Have enough data for that.

**Time wasted**: 2 hours manually categorizing sectors

Keeping this notebook in case I collect more data later.