# Data Processing for Replicating Table 1 and Table 2

- This notebook walks through the data processing steps used to compute Tables 1 and 2, following the methodology in "The Illiquidity of Corporate Bonds" by Bao, Pan, and Wang (2010).


In [1]:
!pip uninstall decouple --yes
!pip install python-decouple
# Make sure `decouple` (provided by python-decouple) can be loaded

import pandas as pd
import config
import load_wrds_bondret
import load_opensource
import data_processing

pd.set_option('display.max_columns', None)

OUTPUT_DIR = config.OUTPUT_DIR
DATA_DIR = config.DATA_DIR

Found existing installation: decouple 0.0.7
Uninstalling decouple-0.0.7:
  Successfully uninstalled decouple-0.0.7




In [2]:
df_bondret = load_wrds_bondret.load_bondret(data_dir=DATA_DIR)
df_daily = load_opensource.load_daily_bond(data_dir=DATA_DIR)



# Data Processing

In this part, we merge and process the data needed to reproduce Table 1 in the paper. The inputs are the daily open-source pre-processed data downloaded from https://openbondassetpricing.com/ and the monthly Bondret data from WRDS.

- all_trace_data_merge function:

    This function merges the TRACE open-source pre-processed data downloaded from https://openbondassetpricing.com/ with the monthly Bondret data from WRDS, matching on CUSIP and time.
    The open-source data is reported daily, while Bondret is reported monthly. To merge them, we map each daily observation to its calendar month, assuming that the time-dependent Bondret variables remain unchanged within a given month.

The merged data lets us produce the summary statistics for Table 1 in the paper, including bond characteristics such as issuance, maturity, age, and rating.
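The daily-to-monthly matching described above can be illustrated with a minimal pandas sketch. It is not the code of `data_processing.all_trace_data_merge`; it only assumes that the join key is the pair (CUSIP, calendar month) and reuses column names visible in the output below (`cusip`, `trd_exctn_dt`, `date`, `month_time`, `r_mr`, `n_mr`). The two small frames are toy stand-ins for the real inputs.

```python
import pandas as pd

# Toy daily trades (stand-in for the open-source TRACE data).
daily = pd.DataFrame({
    "cusip": ["000361AB1", "000361AB1", "000361AB1", "38141EN88"],
    "trd_exctn_dt": pd.to_datetime(
        ["2003-04-14", "2003-04-15", "2003-05-06", "2009-04-18"]),
    "prclean": [98.6012, 82.7696, 87.5, 99.843891],
})

# Toy monthly bond characteristics (stand-in for WRDS Bondret).
monthly = pd.DataFrame({
    "cusip": ["000361AB1", "000361AB1"],
    "date": pd.to_datetime(["2003-04-30", "2003-05-31"]),
    "r_mr": ["B1", "B1"],
    "n_mr": [14.0, 14.0],
})

# Build the same year-month key on both sides.
daily["month_time"] = daily["trd_exctn_dt"].dt.to_period("M")
monthly["month_time"] = monthly["date"].dt.to_period("M")

# A left join keeps every daily trade; the monthly values are repeated
# for each trading day of the month, and unmatched trades get NaT/NaN.
merged = daily.merge(monthly, on=["cusip", "month_time"], how="left")
print(merged)
```

Under this assumption, a trade with no matching Bondret month keeps its price but has missing monthly fields, which is consistent with the rows showing `NaT` in the `date` column of the merged output.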


In [3]:
df_all = data_processing.all_trace_data_merge(df_daily, df_bondret)
df_all

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_daily['trd_exctn_dt'] = pd.to_datetime(df_daily['trd_exctn_dt'])


Unnamed: 0,cusip,trd_exctn_dt,prclean,month_time,date,price_eom,tmt,t_volume,t_dvolume,t_spread,offering_amt,offering_price,principal_amt,maturity,coupon,ncoups,amount_outstanding,r_mr,n_mr,offering_date,year
0,000361AB1,2003-04-14,98.601200,2003-04,2003-04-30,99.000,0.466667,808000.0,694028.5,0.025447,50000.0,100.0,1000.0,2003-10-15,7.250,2.0,50000.0,B1,14.0,1993-10-12,2003
1,000361AB1,2003-04-15,82.769600,2003-04,2003-04-30,99.000,0.466667,808000.0,694028.5,0.025447,50000.0,100.0,1000.0,2003-10-15,7.250,2.0,50000.0,B1,14.0,1993-10-12,2003
2,000361AB1,2003-04-16,99.000000,2003-04,2003-04-30,99.000,0.466667,808000.0,694028.5,0.025447,50000.0,100.0,1000.0,2003-10-15,7.250,2.0,50000.0,B1,14.0,1993-10-12,2003
3,000361AB1,2003-05-06,87.500000,2003-05,2003-05-31,85.000,0.380556,342000.0,305580.0,0.013705,50000.0,100.0,1000.0,2003-10-15,7.250,2.0,50000.0,B1,14.0,1993-10-12,2003
4,000361AB1,2003-05-07,91.359100,2003-05,2003-05-31,85.000,0.380556,342000.0,305580.0,0.013705,50000.0,100.0,1000.0,2003-10-15,7.250,2.0,50000.0,B1,14.0,1993-10-12,2003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4933437,949746PF2,2008-12-17,55.563510,2008-12,2008-12-31,54.500,4.494444,114000.0,62130.0,0.000000,3750.0,100.0,1000.0,2013-06-06,0.000,0.0,3750.0,,,2008-05-30,2008
4933438,38141EN88,2009-04-18,99.843891,2009-04,NaT,,,,,,,,,,,,,,,,2009
4933439,35671DAZ8,2005-10-11,91.433990,2005-10,2005-10-31,91.434,17.622222,30000.0,27430.2,,1998776.0,,1000.0,2023-03-15,3.875,2.0,,,,2013-09-09,2005
4933440,931142DF7,2007-02-17,99.570096,2007-02,NaT,,,,,,,,,,,,,,,,2007


- sample_selection function:

  This function selects the bonds included in the paper's sample, following the steps outlined in the paper:

  1) Select Phase I and II bonds from 2003-04-14 to 2009-06-30.

  2) Drop all bonds that exist only after the start of Phase III (2005-02-07).

  3) Keep only bonds that are traded on at least 75% of their relevant business days.

  4) Keep only bonds that are traded on more than 11 days, so as to have 10 observations of (p_t, p_(t-1)).

  5) Keep only bonds that exist for at least one full year.

  6) Drop all non-investment-grade bonds, based on Moody's rating.

Applying these filters yields the shortlist of bonds that make up the paper's sample.
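Some of these filters can be sketched in a few lines of pandas. The function below is hypothetical and is not the project's `data_processing.sample_selection`; it covers only steps 1, 3, 4, and 6, leaving out steps 2 and 5 for brevity. It rests on three assumptions: "relevant business days" means the Monday-to-Friday days between a bond's first and last trade (no holiday calendar); step 4 is a count of distinct trading days; and investment grade corresponds to `n_mr <= 10`, a cutoff inferred from the rows shown in this notebook (B1 is 14, BAA2 is 9, A2 is 6) rather than taken from the paper.

```python
import numpy as np
import pandas as pd

def sketch_sample_filters(df, start="2003-04-14", end="2009-06-30",
                          min_share=0.75, min_days=11, max_ig_rating=10):
    """Hypothetical sketch of filters 1, 3, 4 and 6 (assumptions in the text)."""
    # Step 1: keep trades inside the sample window.
    in_window = df["trd_exctn_dt"].between(pd.Timestamp(start), pd.Timestamp(end))
    df = df.loc[in_window].copy()

    # Per-bond activity: distinct trade days, first and last trade date.
    stats = df.groupby("cusip")["trd_exctn_dt"].agg(
        n_days="nunique", first="min", last="max")

    # Weekdays between first and last trade, inclusive (assumed definition
    # of "relevant business days"; holidays are not excluded).
    first = stats["first"].values.astype("datetime64[D]")
    last = stats["last"].values.astype("datetime64[D]")
    stats["n_bdays"] = np.busday_count(first, last + np.timedelta64(1, "D"))

    # Step 3: traded on at least 75% of those days.
    # Step 4: traded on more than 11 distinct days.
    keep = (stats["n_days"] >= min_share * stats["n_bdays"]) & (
        stats["n_days"] > min_days)
    df = df[df["cusip"].isin(stats.index[keep])]

    # Step 6: keep investment grade only; rows with a missing rating are dropped.
    return df[df["n_mr"] <= max_ig_rating]

# Toy data: A trades every weekday, B trades rarely, C is rated B1 (14).
days = pd.bdate_range("2004-01-05", periods=20)
toy = pd.concat([
    pd.DataFrame({"cusip": "A", "trd_exctn_dt": days, "n_mr": 9.0}),
    pd.DataFrame({"cusip": "B", "trd_exctn_dt": days[::4], "n_mr": 9.0}),
    pd.DataFrame({"cusip": "C", "trd_exctn_dt": days, "n_mr": 14.0}),
], ignore_index=True)
result = sketch_sample_filters(toy)
print(result["cusip"].unique())
```

In the toy data only bond A survives: B fails the trading-frequency filters and C fails the rating filter. A different reading of "relevant business days" would change the denominator in step 3, so the counts from this sketch should not be expected to match the paper's sample exactly.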


In [4]:
df_sample = data_processing.sample_selection(df_all)
df_sample

Unnamed: 0,cusip,trd_exctn_dt,prclean,month_time,date,price_eom,tmt,t_volume,t_dvolume,t_spread,offering_amt,offering_price,principal_amt,maturity,coupon,ncoups,amount_outstanding,r_mr,n_mr,offering_date,year
5447,001957AP4,2003-04-14,106.402800,2003-04,2003-04-30,108.103919,3.133333,33119000.0,3.540985e+07,0.008763,500000.0,99.530,1000.0,2006-06-01,7.50,2.0,320167.0,BAA2,9.0,1994-06-02,2003
5448,001957AP4,2003-04-15,106.392299,2003-04,2003-04-30,108.103919,3.133333,33119000.0,3.540985e+07,0.008763,500000.0,99.530,1000.0,2006-06-01,7.50,2.0,320167.0,BAA2,9.0,1994-06-02,2003
5449,001957AP4,2003-04-16,106.953001,2003-04,2003-04-30,108.103919,3.133333,33119000.0,3.540985e+07,0.008763,500000.0,99.530,1000.0,2006-06-01,7.50,2.0,320167.0,BAA2,9.0,1994-06-02,2003
5450,001957AP4,2003-04-17,106.930699,2003-04,2003-04-30,108.103919,3.133333,33119000.0,3.540985e+07,0.008763,500000.0,99.530,1000.0,2006-06-01,7.50,2.0,320167.0,BAA2,9.0,1994-06-02,2003
5451,001957AP4,2003-04-21,106.180801,2003-04,2003-04-30,108.103919,3.133333,33119000.0,3.540985e+07,0.008763,500000.0,99.530,1000.0,2006-06-01,7.50,2.0,320167.0,BAA2,9.0,1994-06-02,2003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3995670,59018YUZ2,2009-06-24,101.142400,2009-06,2009-06-30,101.346440,0.619444,112424000.0,1.133605e+08,0.005601,1500000.0,99.715,1000.0,2010-02-08,4.25,2.0,1500000.0,A2,6.0,2005-02-02,2009
3995671,59018YUZ2,2009-06-25,100.956900,2009-06,2009-06-30,101.346440,0.619444,112424000.0,1.133605e+08,0.005601,1500000.0,99.715,1000.0,2010-02-08,4.25,2.0,1500000.0,A2,6.0,2005-02-02,2009
3995672,59018YUZ2,2009-06-26,101.185300,2009-06,2009-06-30,101.346440,0.619444,112424000.0,1.133605e+08,0.005601,1500000.0,99.715,1000.0,2010-02-08,4.25,2.0,1500000.0,A2,6.0,2005-02-02,2009
3995673,59018YUZ2,2009-06-29,100.907400,2009-06,2009-06-30,101.346440,0.619444,112424000.0,1.133605e+08,0.005601,1500000.0,99.715,1000.0,2010-02-08,4.25,2.0,1500000.0,A2,6.0,2005-02-02,2009
