# CDS Par Spread Returns Construction

## Paper Introduction

This construction follows the structure proposed by {AUTHORS} in {PAPER}. The original paper studies implied arbitrage returns across many different markets. If markets were truly frictionless, we would expect all of these arbitrage returns to be perfectly correlated. Efficient capital allocation dictates that capital flows to the best available opportunity, so whenever one arbitrage looked more attractive, capital would move toward it and away from the others. The arbitrage returns we calculate from different products should therefore move together.

## CDS Par Spread Returns

In this notebook, we walk through the steps for constructing the implied arbitrage return between the CDS and corporate bond markets.

INSERT EXPLANATION HERE and SOME LATEX EQUATIONS





In [38]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
from tqdm import tqdm

import ctypes
from scipy.interpolate import CubicSpline

from merge_bond_treasury_redcode import *
from merge_cds_bond import *
from process_final_product import *

%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [39]:
DATA_DIR = "../../FS-project-files"

# Replace with below later
# DATA_DIR = config("DATA_DIR")


# Initial Pull and analysis
TREASURY_ISSUE_FILE_NAME = "issue_data.parquet"
TREASURY_MONTHLY_FILE_NAME = "monthly_ts_data.parquet"
CORPORATES_MONTHLY_FILE_NAME = "wrds_bond.parquet"
RED_CODE_FILE_NAME = "RED_and_ISIN_mapping.parquet"

# Secondary Pull and final analysis
BOND_RED_CODE_FILE_NAME = "merged_bond_treas_red.pkl"
# BOND_RED_CODE_FILE_NAME = "merged_bond_treas_red.parquet"
CDS_FILE_NAME = "cds_final.pkl"
# CDS_FILE_NAME = "cds_final.parquet"
FINAL_ANALYSIS_FILE_NAME = "final_data.pkl"
# FINAL_ANALYSIS_FILE_NAME = "final_data.parquet"


### Step 1: Data Pull Part 1 (WRDS bond returns, Markit Redcode Mappings, WRDS Historical Treasuries)

First, we pull corporate bond time series data (WRDS bond returns), Treasury time series data, and the Markit redcode mappings.

The corporate bond time series is used to generate the FR value, calculated as the Z-spread of the corporate bonds. The Treasury time series is used for the same purpose and to generate Treasury yields.

### Z-Spread of a Corporate Bond

The **Z-spread** (zero-volatility spread) of a corporate bond is the constant spread that must be added to the **zero-coupon Treasury yield curve** to make the **present value of the bond's cash flows equal to its market price**. 

### **Formula**
The Z-spread is found by solving the equation:

$$
P = \sum_{t=1}^{N} \frac{C_t}{(1 + r_t + Z)^t}
$$

where:
- $P$ = Market price of the bond  
- $C_t$ = Cash flow (coupon or principal) at time $t$  
- $r_t$ = Discount rate of the zero-coupon Treasury bond at time $t$  
- $Z$ = Z-spread  
- $N$ = Number of periods  
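
As a minimal sketch of how this equation can be solved numerically, the snippet below finds $Z$ with a bracketing root finder. The cash flows, zero rates, bracket bounds, and the function names `present_value` and `solve_z_spread` are illustrative assumptions, not part of the project code. Periods are taken to be annual, as in the formula.

```python
import numpy as np
from scipy.optimize import brentq


def present_value(cash_flows, zero_rates, spread):
    """PV of cash flows at periods 1..N, discounted at zero rate + spread."""
    cash_flows = np.asarray(cash_flows, dtype=float)
    zero_rates = np.asarray(zero_rates, dtype=float)
    periods = np.arange(1, len(cash_flows) + 1)
    return float(np.sum(cash_flows / (1.0 + zero_rates + spread) ** periods))


def solve_z_spread(price, cash_flows, zero_rates, lo=-0.05, hi=1.0):
    """Find the constant spread Z that reprices the bond to `price`."""
    # PV falls as the spread rises, so the root is unique inside the bracket.
    return brentq(
        lambda z: present_value(cash_flows, zero_rates, z) - price, lo, hi
    )


# Illustrative (made-up) inputs: a 3-period bond paying a 5% coupon.
cash_flows = [5.0, 5.0, 105.0]
zero_rates = [0.02, 0.025, 0.03]
price = present_value(cash_flows, zero_rates, 0.015)  # priced at Z = 1.5%
z = solve_z_spread(price, cash_flows, zero_rates)     # recovers that spread
```

The bracket `[lo, hi]` must contain the true spread; for deeply distressed bonds a wider upper bound may be needed.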

### Our Construction: Z-Spread Surrogate

The true **Z-spread** uses a different Treasury discount rate for each cash flow date. Our surrogate **Z-spread** ($z$), which is in turn what we use to calculate **FR**, replaces these rates with a single Treasury yield:

$$
P = \sum_{t=1}^{N} \frac{C_t}{(1 + y_m + z)^t}
$$

where:
- $P$ = Market price of the bond  
- $C_t$ = Cash flow (coupon or principal) at time $t$  
- $y_m$ = Yield of the zero-coupon Treasury bond at time $t_m$
    - This differs from $r_t$: the yield is taken at the time the market price is set and acts as a single fixed-rate substitute for the time-varying discount rates
- $z$ = Z-spread surrogate
- $N$ = Number of periods 
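
Because $y_m$ is the same for every cash flow, $y_m + z$ acts as one flat discount rate. Under the annual compounding of the formula above, that rate is the bond's own yield to maturity, so the surrogate reduces to a yield difference. The sketch below illustrates this under those assumptions. The names `flat_pv` and `surrogate_z_spread` and all inputs are hypothetical, and the project's own code may compute the quantity differently.

```python
import numpy as np
from scipy.optimize import brentq


def flat_pv(cash_flows, rate):
    """PV of cash flows at periods 1..N under one flat discount rate."""
    cash_flows = np.asarray(cash_flows, dtype=float)
    periods = np.arange(1, len(cash_flows) + 1)
    return float(np.sum(cash_flows / (1.0 + rate) ** periods))


def surrogate_z_spread(price, cash_flows, treasury_yield):
    """Solve for the flat rate y_m + z, then subtract y_m."""
    bond_yield = brentq(lambda y: flat_pv(cash_flows, y) - price, -0.5, 2.0)
    return bond_yield - treasury_yield


# Illustrative (made-up) inputs.
cash_flows = [6.0, 6.0, 106.0]
y_m = 0.03
price = flat_pv(cash_flows, y_m + 0.02)  # bond priced at a 2% surrogate spread
z = surrogate_z_spread(price, cash_flows, y_m)
```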




### Step 1.1: Treasury merge

This step runs quickly, so we execute it here. The Treasury time series dataframe does not contain maturity dates, so we use the issue information to recover them. Our processing is below:

In [40]:
treasury_monthly_data = pd.read_parquet(f"{DATA_DIR}/{TREASURY_MONTHLY_FILE_NAME}")
treasury_issue_data = pd.read_parquet(f"{DATA_DIR}/{TREASURY_ISSUE_FILE_NAME}")

In [41]:
treasury_monthly_data.head()

Unnamed: 0,kycrspid,kytreasno,mcaldt,tmpubout,tmyld
0,19610622.8,200001.0,1960-10-31,,7.6e-05
1,19610622.8,200001.0,1960-11-30,,7.9e-05
2,19610622.8,200001.0,1960-12-30,,6.7e-05
3,19610622.8,200001.0,1961-01-31,,6.8e-05
4,19610622.8,200001.0,1961-02-28,,7.4e-05


In [42]:
treasury_issue_data.head()

Unnamed: 0,kycrspid,kytreasno,tmatdt
0,19610622.8,200001.0,1961-06-22
1,19610623.4,200002.0,1961-06-23
2,19610629.4,200003.0,1961-06-29
3,19610706.4,200004.0,1961-07-06
4,19610713.4,200005.0,1961-07-13


In [43]:
treasury_data_combined = generate_treasury_data(treasury_issue_data, treasury_monthly_data)

In [44]:
treasury_data_combined.head()

Unnamed: 0,kycrspid,kytreasno,mcaldt,tmpubout,tmatdt,treas_yld
0,19610622.8,200001.0,1960-10-31,,1961-06-22,0.028055
1,19610622.8,200001.0,1960-11-30,,1961-06-22,0.029241
2,19610622.8,200001.0,1960-12-30,,1961-06-22,0.024777
3,19610622.8,200001.0,1961-01-31,,1961-06-22,0.025169
4,19610622.8,200001.0,1961-02-28,,1961-06-22,0.027243
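
The implementation of `generate_treasury_data` lives in the project modules and is not shown in this notebook. A minimal sketch of a merge of this kind might look like the following. It assumes the two frames join on **kytreasno** and that **treas_yld** is the daily **tmyld** compounded over 365 days. The annualization rule is an assumption that is only roughly consistent with the magnitudes displayed above, and `sketch_generate_treasury_data` is a hypothetical name.

```python
import pandas as pd


def sketch_generate_treasury_data(issue_df, monthly_df):
    """Hypothetical stand-in for the project's generate_treasury_data."""
    # Bring the maturity date from the issue table onto each monthly row.
    merged = monthly_df.merge(
        issue_df[["kytreasno", "tmatdt"]], on="kytreasno", how="left"
    )
    # Assumption: tmyld is a daily yield, annualized by 365-day compounding.
    merged["treas_yld"] = (1.0 + merged["tmyld"]) ** 365 - 1.0
    return merged.drop(columns="tmyld")


# Toy inputs shaped like the frames shown above.
monthly = pd.DataFrame({
    "kytreasno": [200001.0, 200001.0],
    "mcaldt": pd.to_datetime(["1960-10-31", "1960-11-30"]),
    "tmyld": [7.6e-05, 7.9e-05],
})
issue = pd.DataFrame({
    "kytreasno": [200001.0],
    "tmatdt": pd.to_datetime(["1961-06-22"]),
})
combined = sketch_generate_treasury_data(issue, monthly)
```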


# Step 1.2 and 1.3: Merging treasury and bond time series, adding redcode mapping

## Step 1.2

Step 1.2 is relatively time-intensive, so we do not run it here. The code is in **merge_bond_treasury_redcode.py**, in the function **merge_treasuries_into_bonds**. The specific inputs are described within the function itself.

The process merges Treasuries onto the corporate bonds using the start-of-month data and the maturity dates. Because the end-of-month dates do not always match, we match report dates on the year-month pair. The maturity dates rarely line up exactly. However, since the Treasuries are usually long-term, we merge based upon a "day-window": we take the Treasuries whose maturities fall within the day-window of the corporate bond's maturity and then choose the one with the highest public outstanding value. A larger amount held by the public generally implies more liquidity, and therefore a more accurately priced Treasury yield. Public outstanding data is scarce, however, so when it is missing we choose the first matching Treasury in the dataframe.
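
The matching rule described above can be sketched as follows. This is not the code in **merge_treasuries_into_bonds**: the helper name `match_treasury_by_window`, the 30-day default window, and the toy data are assumptions made for illustration. Only the column names are taken from the dataframes shown in this notebook.

```python
import numpy as np
import pandas as pd


def match_treasury_by_window(bonds, treasuries, day_window=30):
    """Attach one Treasury yield to each bond row (hypothetical helper)."""
    bonds = bonds.reset_index(drop=True).copy()
    bonds["bond_row"] = bonds.index
    # Match report dates on the year-month pair rather than the exact day.
    bonds["ym"] = bonds["date"].dt.strftime("%Y-%m")
    treas = treasuries[["mcaldt", "tmatdt", "tmpubout", "treas_yld"]].copy()
    treas["ym"] = treas["mcaldt"].dt.strftime("%Y-%m")

    pairs = bonds.merge(treas.drop(columns="mcaldt"), on="ym", how="inner")
    # Keep Treasuries maturing within the day-window of the bond's maturity.
    gap = (pairs["tmatdt"] - pairs["maturity"]).abs()
    pairs = pairs[gap <= pd.Timedelta(days=day_window)]
    # Largest public outstanding first. The stable sort keeps rows with
    # missing values in their original order, so the first one wins.
    pairs = pairs.sort_values(
        "tmpubout", ascending=False, na_position="last", kind="mergesort"
    )
    best = pairs.drop_duplicates("bond_row", keep="first")
    best = best.sort_values("bond_row")
    return best.drop(columns=["bond_row", "ym"]).reset_index(drop=True)


# Toy data: one bond observed in two months.
bonds = pd.DataFrame({
    "cusip": ["AAA", "AAA"],
    "date": pd.to_datetime(["2002-07-31", "2002-08-31"]),
    "maturity": pd.to_datetime(["2004-04-01", "2004-04-01"]),
})
treasuries = pd.DataFrame({
    "mcaldt": pd.to_datetime(["2002-07-31"] * 3 + ["2002-08-30"] * 2),
    "tmatdt": pd.to_datetime(
        ["2004-03-31", "2004-04-15", "2010-01-01", "2004-03-31", "2004-04-02"]
    ),
    "tmpubout": [100.0, 500.0, 9999.0, np.nan, np.nan],
    "treas_yld": [0.020, 0.021, 0.040, 0.019, 0.0195],
})
matched = match_treasury_by_window(bonds, treasuries)
```

In the toy data, the July row picks the in-window Treasury with the largest public outstanding value, and the August row falls back to the first in-window Treasury because no public outstanding data is available.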

## Step 1.3

Step 1.3 is simpler than step 1.2. Since step 1.2 has no convenient displayable product, we describe step 1.3 and display its output. The code is in **merge_bond_treasury_redcode.py**, in the function **merge_redcode_into_bond_treas**. The specific inputs are described within the function itself.

The CDS tables identify the issuers of the credit default swaps by redcode, while the bond tables only have CUSIPs. We therefore merge a redcode-CUSIP mapping table onto the output of step 1.2 so that the CDS data can be merged in later.

We load the stored results here, before any processing for CDS implied arbitrage returns.

In [45]:
bond_redcode_merged_data = pd.read_pickle(f"{DATA_DIR}/{BOND_RED_CODE_FILE_NAME}")

In [46]:
bond_redcode_merged_data.head()

Unnamed: 0,cusip,company_symbol,date,maturity,amount_outstanding,yield,rating,price_eom,t_spread,treas_yld,issuer_cusip,redcode
0,001957AM1,T,2002-07-31,2004-04-01,400000.0,0.0856,1,97.213129,0.014847,0.02034,1957,001AEC
1,001957AM1,T,2002-07-31,2004-04-01,400000.0,0.0856,1,97.213129,0.014847,0.02034,1957,0A226X
2,001957AM1,T,2002-08-31,2004-04-01,400000.0,0.062781,1,100.684813,0.011224,0.019416,1957,001AEC
3,001957AM1,T,2002-08-31,2004-04-01,400000.0,0.062781,1,100.684813,0.011224,0.019416,1957,0A226X
4,001957AM1,T,2002-09-30,2004-04-01,400000.0,0.06696,1,100.066504,0.007308,0.015797,1957,001AEC
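
A minimal sketch of a redcode merge of this kind is shown below. It assumes the mapping table is keyed by the six-character issuer CUSIP; the mapping layout and variable names are hypothetical and are not taken from **merge_redcode_into_bond_treas**. It also shows why rows repeat in the output above: an issuer that maps to more than one redcode yields one row per redcode.

```python
import pandas as pd

# Toy bond rows and a toy mapping with two redcodes for the same issuer.
bonds = pd.DataFrame({
    "cusip": ["001957AM1", "001957AM1"],
    "date": pd.to_datetime(["2002-07-31", "2002-08-31"]),
})
mapping = pd.DataFrame({
    "issuer_cusip": ["001957", "001957"],
    "redcode": ["001AEC", "0A226X"],
})

# Assumption: the issuer is identified by the first six CUSIP characters.
# The key is kept as a string so that leading zeros are preserved.
bonds["issuer_cusip"] = bonds["cusip"].str[:6]

# One-to-many merge: each bond row is repeated once per matching redcode.
merged = bonds.merge(mapping, on="issuer_cusip", how="inner")
```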


# Step 2: CDS data pull and CDS data processing

## Step 2.1: CDS data pull

The CDS data pull is filtered using the redcodes from the **bond_redcode_merged_data** dataframe above, so that only firms with corporate bond data are pulled from the CDS table. The Markit data is daily.

## Step 2.2: CDS data processing

Let's first observe the data to see what we are working with:


In [48]:
cds_data = pd.read_pickle(f"{DATA_DIR}/{CDS_FILE_NAME}")

In [49]:
cds_data.head()

Unnamed: 0,date,ticker,redcode,parspread,tenor,tier,country,year
0,2002-01-01,T,001AEC,0.017589,10Y,SNRFOR,United States,2002
1,2002-01-01,T,001AEC,0.016295,10Y,SNRFOR,United States,2002
2,2002-01-01,T,001AEC,0.015566,10Y,SNRFOR,United States,2002
3,2002-01-01,T,001AEC,0.013413,1Y,SNRFOR,United States,2002
4,2002-01-01,T,001AEC,0.012417,1Y,SNRFOR,United States,2002


The CDS data has a limitation: it reports the **tenor** rather than the **maturity date**, which would allow more accurate cubic splines of the par spread. To approximate the number of days to maturity, we take the tenor as given and convert it from years to days.

For example, if the tenor is 3Y, the number of days we use is $3 \times 365 = 1095$.
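
The conversion can be sketched in a few lines. The function name `tenor_to_days` is hypothetical, and the handling of month tenors such as 6M is an assumption, since only year tenors are described here.

```python
def tenor_to_days(tenor):
    """Convert a tenor label such as '3Y' to an approximate day count."""
    value, unit = float(tenor[:-1]), tenor[-1].upper()
    if unit == "Y":
        return value * 365
    if unit == "M":
        # Assumption: a month is treated as one twelfth of a 365-day year.
        return value * 365 / 12
    raise ValueError(f"Unsupported tenor: {tenor!r}")


days_3y = tenor_to_days("3Y")
days_10y = tenor_to_days("10Y")
```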

In our processing function **merge_cds_into_bonds**, we collect the **(redcode, date)** pairs for which we can fit a good cubic spline and filter the bond and Treasury dataframe (the output of step 1) to those pairs.

Then, for each corporate bond, we use the number of days between the **date** and the **maturity** as the input to the cubic spline to generate the par spread. We end up with a final dataframe in which the bond, Treasury, and CDS data are all merged together.
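
For a single **(redcode, date)** pair, the interpolation step can be sketched with `scipy.interpolate.CubicSpline`, which this notebook already imports. The tenor grid and par spreads below are made-up values, and the sketch is not the code inside **merge_cds_into_bonds**.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

# Made-up par spread curve for one (redcode, date) pair.
# Tenors are in days and must be strictly increasing for CubicSpline.
tenor_days = np.array([365, 1095, 1825, 2555, 3650])
par_spreads = np.array([0.0125, 0.0140, 0.0155, 0.0165, 0.0175])
spline = CubicSpline(tenor_days, par_spreads)

# Days between the observation date and the bond's maturity.
date = pd.Timestamp("2002-07-31")
maturity = pd.Timestamp("2004-04-01")
days_to_maturity = (maturity - date).days

# Par spread implied by the spline at the bond's remaining maturity.
bond_par_spread = float(spline(days_to_maturity))
```

A spline needs several distinct tenors to be meaningful, which is one reason to restrict the merge to pairs with enough quoted points.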

In [50]:
final_data = merge_cds_into_bonds(bond_redcode_merged_data, cds_data)



In [51]:
final_data.head()

Unnamed: 0,cusip,date,maturity,yield,rating,treas_yld,par_spread,t_spread,price_eom,amount_outstanding
0,001957AM1,2002-07-31,2004-04-01,0.0856,1,0.02034,0.069154,0.014847,97.213129,400000.0
4,001957AM1,2002-09-30,2004-04-01,0.06696,1,0.015797,0.046843,0.007308,100.066504,400000.0
6,001957AM1,2002-10-31,2004-04-01,0.06667,0,0.014777,0.039719,0.007682,100.112053,400000.0
10,001957AM1,2002-12-31,2004-04-01,0.036044,0,0.012791,0.022669,0.010935,103.810897,123856.0
12,001957AM1,2003-01-31,2004-04-01,0.036545,0,0.013383,0.024194,0.01009,103.5,123856.0
