### Question 5: Implementation Gap

**Compare yield-space P&L to bond-space returns.**

This question confronts the gap between academic factor analysis and implementable trades.

a) **Design a roll schedule.** Using the bond panel (`treasury_panel_pca.xlsx`), you need to select actual bonds to proxy 2Y, 5Y, and 10Y maturities. Decide on:
   - **Roll frequency:** monthly or quarterly?
   - **Selection criteria:** closest TTM to target? Prefer on-the-run (low `age_days`)?
   
   Document your choices and rationale. There is no single correct answer—tradeoffs exist between tracking error, transaction costs, and liquidity.

b) **Construct the butterfly using actual bonds.** At each roll date:
   - Select the bond closest to each target maturity (2Y, 5Y, 10Y)
   - Apply your PCA-neutral weights, scaled by duration
   - Hold these bonds until the next roll date
   
   Compute daily bond returns between roll dates (price changes plus accrued interest changes).
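
A minimal sketch of the daily return calculation for a single held bond. It assumes the panel exposes a clean price and accrued interest; the column names `clean_price` and `accrued_int` are hypothetical placeholders, so substitute whatever the panel actually provides. It also assumes no coupon is paid inside the window.

```python
import pandas as pd

def daily_total_return(bond_df, price_col='clean_price', accrued_col='accrued_int'):
    """Daily return on the dirty price (clean price + accrued interest) for one bond.

    Column names are hypothetical. Assumes all rows belong to one bond and that
    no coupon is paid in the window; on a coupon date the cash coupon would
    need to be added back to the numerator.
    """
    df = bond_df.sort_values('caldt').set_index('caldt')
    dirty = df[price_col] + df[accrued_col]
    return (dirty.diff() / dirty.shift(1)).dropna()

# Toy data for illustration only
toy = pd.DataFrame({
    'caldt': pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
    'clean_price': [99.00, 99.50, 99.25],
    'accrued_int': [1.00, 1.01, 1.02],
})
toy_returns = daily_total_return(toy)
print(toy_returns)
```

Returns are computed only within a holding period; at a roll date the series for the outgoing bond ends and the series for the incoming bond begins.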

c) **Compare yield-space vs bond-space P&L.** Compute the correlation and R² between:
   - Yield-space butterfly changes (from Question 3)
   - Bond-space portfolio returns
   
   Plot both series on the same chart.
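
The comparison can be sketched as follows, using synthetic series in place of the real P&L. The helper name `corr_and_r2` is hypothetical. For a univariate regression with an intercept, R² equals the squared Pearson correlation, so both numbers come from one calculation.

```python
import numpy as np
import pandas as pd

def corr_and_r2(x, y):
    """Pearson correlation of two series on their common dates, and its square (R²)."""
    aligned = pd.concat([x, y], axis=1, join='inner').dropna()
    rho = aligned.iloc[:, 0].corr(aligned.iloc[:, 1])
    return rho, rho ** 2

# Synthetic stand-ins for the yield-space and bond-space series
rng = np.random.default_rng(0)
idx = pd.date_range('2024-01-02', periods=250, freq='B')
yield_pnl = pd.Series(rng.normal(size=250), index=idx, name='yield_space')
noise = pd.Series(rng.normal(size=250), index=idx)
bond_pnl = (0.8 * yield_pnl + 0.2 * noise).rename('bond_space')

rho, r2 = corr_and_r2(yield_pnl, bond_pnl)
print(f"corr = {rho:.3f}, R2 = {r2:.3f}")
```

Correlation is scale-invariant, so it does not matter that one series is in yield units and the other in return units. The sign does depend on your weighting convention, since bond prices fall when yields rise.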

d) **Analyze the sources of discrepancy:**

   - **Yield changes**: At each time step, an individual bond's yield change differs from the fixed-maturity yield change, both because the bond's maturity differs from the target and because of factors idiosyncratic to the bond itself. How correlated are GSW yield changes with the yield changes of the held bonds? Show the relationship with a plot.
     
   - **First-order approximation**: We can approximate bond returns from yield changes using duration. How correlated are the approximated returns with the actual bond returns? Show the relationship with a plot.
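
A minimal sketch of the duration approximation, r_t ≈ -D_{t-1} (y_t - y_{t-1}). It assumes `ytm` is quoted in decimal units (0.045 for 4.5%) and that `duration` is modified duration in years; both are assumptions about the panel's conventions that should be checked against the data. The helper name `duration_approx_return` is hypothetical.

```python
import pandas as pd

def duration_approx_return(ytm, duration):
    """First-order return estimate from yield changes.

    Uses the previous day's duration so the estimate relies only on
    information available at the start of each return interval.
    """
    return (-duration.shift(1) * ytm.diff()).dropna()

# Toy data for illustration only
idx = pd.date_range('2024-01-02', periods=4, freq='B')
ytm = pd.Series([0.0450, 0.0455, 0.0452, 0.0460], index=idx)
duration = pd.Series([4.50, 4.50, 4.49, 4.49], index=idx)

approx = duration_approx_return(ytm, duration)
print(approx)
```

This term captures only the price effect of the yield move. It omits carry and convexity, so some gap relative to actual returns is expected even when the yields match exactly.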

e) **Discuss:** Is the yield-space backtest a reliable guide to actual trading performance? What adjustments would you make for a production implementation?

## Data Loading

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from IPython.display import display

In [None]:
# GSW yields (same as Question 3)
gsw = pd.read_excel('data/gsw_yields.xlsx', index_col=0, parse_dates=True)
maturities = [2, 5, 10]
yields = gsw[maturities].dropna()
yields.columns = ['2Y', '5Y', '10Y']
yields = yields.loc['2015':]
yields_diff = yields.diff(periods=1).dropna()

# PCA weights from Question 2
%store -r weights
print(f"Butterfly weights: {weights}")

# Bond panel
panel = pd.read_excel('data/treasury_panel_pca.xlsx', parse_dates=['caldt', 'issue_date', 'maturity_date'])
print(f"Panel shape: {panel.shape}")
print(f"Date range: {panel['caldt'].min().date()} to {panel['caldt'].max().date()}")
print(f"Unique dates: {panel['caldt'].nunique()}, Unique bonds: {panel['kytreasno'].nunique()}")

## Explore the Bond Panel

In [None]:
panel.head(10)

In [None]:
panel.describe()

In [None]:
# How many bonds fall near each target maturity on a typical day?
target_mats = [2, 5, 10]
tolerance = 0.5  # +/- 6 months

sample_date = panel['caldt'].sort_values().iloc[len(panel)//2]  # median observation date (does not rely on row order)
day = panel[panel['caldt'] == sample_date]

print(f"Sample date: {sample_date.date()} ({len(day)} bonds)")
print()
for t in target_mats:
    nearby = day[(day['ttm'] >= t - tolerance) & (day['ttm'] <= t + tolerance)]
    print(f"--- {t}Y bucket (TTM {t-tolerance:.1f} to {t+tolerance:.1f}) ---")
    print(f"  {len(nearby)} bonds, age_days range: {nearby['age_days'].min():.0f} to {nearby['age_days'].max():.0f}")
    print(f"  TTM range: {nearby['ttm'].min():.2f} to {nearby['ttm'].max():.2f}")
    print(nearby[['kytreasno', 'cusip', 'ttm', 'age_days', 'duration', 'ytm', 'type']].sort_values('ttm').to_string(index=False))
    print()

In [None]:
# Distribution of TTM across the panel — where are bonds concentrated?
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: TTM histogram for the full panel
axes[0].hist(panel['ttm'], bins=100, alpha=0.7, edgecolor='black', linewidth=0.3)
for t in target_mats:
    axes[0].axvline(t, color='red', linestyle='--', linewidth=1.5, label=f'{t}Y target')
axes[0].set_title('TTM Distribution (All Dates)')
axes[0].set_xlabel('Time to Maturity (years)')
axes[0].set_ylabel('Count')
axes[0].legend()
axes[0].set_xlim(0, 12)
axes[0].grid(True, alpha=0.3)

# Right: age_days for bonds near each target (how fresh are our candidates?)
for t in target_mats:
    nearby = panel[(panel['ttm'] >= t - tolerance) & (panel['ttm'] <= t + tolerance)]
    axes[1].hist(nearby['age_days'], bins=50, alpha=0.5, label=f'{t}Y bucket')
axes[1].set_title('Age Distribution for Bonds Near Targets')
axes[1].set_xlabel('Age (days since issuance)')
axes[1].set_ylabel('Count')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Align date ranges — the panel only covers 2022-2025, GSW goes back to 2015
# We need the overlap period for comparison
panel_dates = panel['caldt'].sort_values().unique()
gsw_dates = yields_diff.index

overlap_start = max(panel_dates.min(), gsw_dates.min())
overlap_end = min(panel_dates.max(), gsw_dates.max())
print(f"Overlap period: {pd.Timestamp(overlap_start).date()} to {pd.Timestamp(overlap_end).date()}")

# Filter both datasets to overlap
yields_overlap = yields.loc[overlap_start:overlap_end]
yields_diff_overlap = yields_diff.loc[overlap_start:overlap_end]
panel_overlap = panel[(panel['caldt'] >= overlap_start) & (panel['caldt'] <= overlap_end)]
print(f"GSW trading days in overlap: {len(yields_diff_overlap)}")
print(f"Panel trading days in overlap: {panel_overlap['caldt'].nunique()}")

## Part (a): Design a Roll Schedule

**Decision point:** You need to choose two things:

1. **Roll frequency** — how often do you swap out the held bonds?
2. **Selection criteria** — how do you pick *which* bond at each roll?

The exploration above should show you what's available. Consider:
- Monthly rolls track the target maturity more tightly but incur more transaction costs
- Quarterly rolls are cheaper but let TTM drift further from target
- On-the-run bonds (low `age_days`) are more liquid but may carry a premium
- Closest-TTM selection minimizes maturity mismatch regardless of liquidity
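
To make the two selection rules concrete, here is one possible sketch on toy data. The helper `pick_bond_sketch` and its `band` parameter are hypothetical and are not the intended answer to the TODO below; the sketch only assumes the `ttm`, `age_days`, and `kytreasno` columns seen in the panel.

```python
import pandas as pd

def pick_bond_sketch(day_panel, target_maturity, method='closest_ttm', band=0.5):
    """Illustrative selection rule (hypothetical helper).

    closest_ttm: smallest |ttm - target|, ties broken by lower age_days.
    on_the_run:  youngest bond within +/- band years of the target.
    Returns None when no candidate exists.
    """
    df = day_panel.assign(ttm_gap=(day_panel['ttm'] - target_maturity).abs())
    if method == 'on_the_run':
        df = df[df['ttm_gap'] <= band].sort_values(['age_days', 'ttm_gap'])
    else:
        df = df.sort_values(['ttm_gap', 'age_days'])
    return None if df.empty else df.iloc[0]

# Toy cross-section for one date
toy_day = pd.DataFrame({
    'kytreasno': [101, 102, 103],
    'ttm': [4.95, 5.10, 4.60],
    'age_days': [700, 20, 150],
})
closest = pick_bond_sketch(toy_day, 5)
freshest = pick_bond_sketch(toy_day, 5, method='on_the_run')
print(closest['kytreasno'], freshest['kytreasno'])
```

On this toy date the two rules disagree: the closest-TTM bond is a seasoned issue, while the on-the-run rule accepts a larger maturity gap in exchange for a recently issued bond.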

In [None]:
# TODO: Set your roll schedule parameters
# ROLL_FREQ = 'MS'  # 'MS' for monthly, 'QS' for quarterly (period start, matching build_roll_schedule's default)
# SELECTION = ...    # 'closest_ttm', 'on_the_run', or a custom approach

def select_bond(day_panel, target_maturity, method='closest_ttm'):
    """Select a single bond to proxy the target maturity on a given date.
    
    Parameters
    ----------
    day_panel : DataFrame — all bonds available on this date
    target_maturity : float — target TTM in years (2, 5, or 10)
    method : str — selection approach
    
    Returns
    -------
    Series — the selected bond's row
    """
    # TODO: implement your selection logic here (5-10 lines)
    # Consider: what if multiple bonds have similar TTM? 
    # How do you break ties — by age_days? by duration closeness?
    pass


def build_roll_schedule(panel, target_maturities, freq='MS', method='closest_ttm'):
    """Build a schedule of which bonds to hold at each roll date.
    
    Parameters
    ----------
    panel : DataFrame — full bond panel
    target_maturities : list — [2, 5, 10]
    freq : str — pandas offset alias for roll frequency
    method : str — bond selection method
    
    Returns
    -------
    DataFrame with one row per roll date. Columns: roll_date and, for each
                target t, {t}Y_id (kytreasno), {t}Y_ttm, {t}Y_duration,
                {t}Y_age, {t}Y_cusip
    """
    dates = panel['caldt'].sort_values().unique()
    # Generate nominal roll dates (calendar start of each month/quarter for 'MS'/'QS')
    roll_dates = pd.date_range(dates.min(), dates.max(), freq=freq)
    # Snap to the nearest actual trading day (this can be the last trading day of the prior period)
    roll_dates = [min(dates, key=lambda d: abs(d - rd)) for rd in roll_dates]
    roll_dates = sorted(set(roll_dates))
    
    records = []
    for rd in roll_dates:
        day = panel[panel['caldt'] == rd]
        record = {'roll_date': rd}
        for t in target_maturities:
            bond = select_bond(day, t, method=method)
            if bond is not None:
                record[f'{t}Y_id'] = bond['kytreasno']
                record[f'{t}Y_ttm'] = bond['ttm']
                record[f'{t}Y_duration'] = bond['duration']
                record[f'{t}Y_age'] = bond['age_days']
                record[f'{t}Y_cusip'] = bond['cusip']
        records.append(record)
    
    return pd.DataFrame(records)