# Illiquidity Calculation

  - This notebook walks through illiquidity calculations based on methodology in The Illiquidity of Corporate Bonds, Bao, Pan, and Wang (2010).

  - In order to avoid re-running the notebook every time it changes (it changes often, even by the act of opening it) and to only rerun it if meaningful changes have been made, the build system only looks for changes in the plaintext version of the notebook. That is, the notebook is converted to a Python script via [nbconvert](https://nbconvert.readthedocs.io/en/latest/), which is often packaged with Jupyter.
  Then, DoIt looks for changes to the Python version. If it detects a difference, then the notebook is re-run. (Note, that you could also convert to a Markdown file with 
  [JupyText](https://github.com/mwouts/jupytext). However, this package is often not packaged with Jupyter.)
  - Since we want to use Jupyter Notebooks for exploratory reports, we want to keep fully-computed versions of the notebook (with the output intact). However, earlier I said that I strip the notebook of its output before committing to version control. Well, to keep the output, every time PyDoit runs the notebook, it outputs an HTML version of the freshly run notebook and saves that HTML report in the `output` directory. That way, you will be able to view the finished report at any time without having to open Jupyter.

In [3]:
import config

OUTPUT_DIR = config.OUTPUT_DIR
DATA_DIR = config.DATA_DIR

In [4]:
import pandas as pd
from tqdm import tqdm
import numpy as np
import glob
from scipy import stats
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
from statsmodels.stats.stattools import durbin_watson
from statsmodels.regression.linear_model import OLS
from statsmodels.stats.sandwich_covariance import cov_hac
from statsmodels.tools.tools import add_constant

In [5]:
import misc_tools
import load_wrds_bondret
import load_opensource
import load_intraday
import data_processing as data

In [62]:
# Load the raw intraday data


# Step 1: Clean Merged Data for Intraday Illiquidity Calculation

Before calculating illiquidity measures, it's essential to ensure that our corporate bond data is accurate and relevant. The `clean_intraday` function takes care of preparing the intraday data by performing several critical cleaning steps:

- Merges trade data and trade time to timestamp
- Runs Dickerson filters to remove trades that the per-filtering steps missed


In [60]:
def clean_intraday(start_date, end_date):
    df = load_intraday.load_intraday_TRACE()
    df = df[(df['trd_exctn_dt'] >= start_date) & (df['trd_exctn_dt'] <= end_date)]
    df['trd_exctn_dt'] = pd.to_datetime(df['trd_exctn_dt'])
    df['trd_exctn_tm'] = pd.to_datetime(df['trd_exctn_tm'], format='%H:%M:%S').dt.time
    df['trd_tmstamp'] = pd.to_datetime(df['trd_exctn_dt'].dt.strftime('%Y-%m-%d') + ' ' + df['trd_exctn_tm'].astype(str))
    
    # dickerson clean
    df = df[(df['days_to_sttl_ct'] <= 2.0) | (df['days_to_sttl_ct'] == None) | (df['days_to_sttl_ct'] == np.NAN)]
    df = df[df['wis_fl'] != 'Y']
    df = df[(df['lckd_in_ind'] != 'Y')]
    df = df[(df['sale_cndtn_cd'] == 'None') | (df['sale_cndtn_cd'] == '@')]
    df = df[df['entrd_vol_qt'] >= 10000]
    df = df[((df['rptd_pr'] > 5) & (df['rptd_pr'] < 1000))]
    
    df['month_year'] = df['trd_exctn_dt'].dt.to_period('M') 
    df.rename(columns={'rptd_pr': 'prclean', 'cusip_id':'cusip'}, inplace=True)
    df.sort_values(by=['cusip', 'trd_tmstamp'], inplace=True)
    return df

In [63]:
df = clean_intraday('2003-04-14', '2009-06-30')
df.head()

Unnamed: 0,cusip,bond_sym_id,trd_exctn_dt,trd_exctn_tm,days_to_sttl_ct,lckd_in_ind,wis_fl,sale_cndtn_cd,msg_seq_nb,trc_st,trd_rpt_dt,trd_rpt_tm,entrd_vol_qt,prclean,yld_pt,asof_cd,orig_msg_seq_nb,rpt_side_cd,cntra_mp_id,trd_tmstamp,month_year
2901513,001546AE0,AKS.GC,2003-04-14,12:05:33,0.0,,N,@,10675,T,2003-04-14,12:05:47,5000000.0,94.375,9.1368,,,B,C,2003-04-14 12:05:33,2003-04
2901514,001546AE0,AKS.GC,2003-04-14,13:00:16,0.0,,N,@,14878,T,2003-04-14,13:00:47,1000000.0,93.625,9.312567,,,B,C,2003-04-14 13:00:16,2003-04
2901515,001546AE0,AKS.GC,2003-04-14,13:00:44,0.0,,N,@,14891,T,2003-04-14,13:01:07,1000000.0,93.9375,9.23912,,,S,C,2003-04-14 13:00:44,2003-04
2901516,001546AE0,AKS.GC,2003-04-14,13:07:28,0.0,,N,@,15340,T,2003-04-14,13:07:42,2000000.0,93.5,9.342031,,,B,C,2003-04-14 13:07:28,2003-04
2901517,001546AE0,AKS.GC,2003-04-14,13:07:52,0.0,,N,@,15352,T,2003-04-14,13:07:57,1000000.0,94.0,9.224466,,,S,C,2003-04-14 13:07:52,2003-04


# Step 2: Calculate Price Changes and Perform Additional Cleaning

In this part of the analysis pipeline, we use the `calc_deltaprc` function to compute daily price changes for corporate bonds, designed to operate on cleaned and merged daily corporate bond trade data.

This calculation is based on the Measure of Illiquidity on page 10 and 11 of the peper: $ \gamma = -\text{Cov}(p_t - p_{t-1}, p_{t+1} - p_t) $. The process involves several steps:
- Calculation of Log Prices: Transform cleaned prices to log prices for more stable numerical properties.
- Lagged and Lead Price Changes: Determine the price changes by computing lagged and lead log prices.
- Restricting Returns: Ensure that calculated price changes (returns) are within the range of -100% to 100%.
- Conversion to Percentage: Change the representation of price changes from decimal to percentage for clarity.
- Cleaning Data: Remove entries with incomplete information to maintain the quality of the dataset.
- Filtering by Trade Count: Exclude bonds with fewer than 10 trade observations to focus on more reliable data.

This function is essential for preparing the bond price data for accurate calculation of financial metrics such as illiquidity.

In [64]:
def calc_deltaprc(df):
    """Calculate delta price and delta price_lag for each intraday trade with additional cleaning.
    """

    # Calculate lagged and lead log prices, and corresponding delta p (percentage returns)
    df['logprc'] = np.log(df['prclean'])
    df['logprc_lag'] = df.groupby('cusip')['logprc'].shift(1)
    df['deltap'] = df['logprc'] - df['logprc_lag']

    # Restrict log returns to be in the interval [1,1]
    df['deltap'] = np.where(df['deltap'] > 1, 1, df['deltap'])
    df['deltap'] = np.where(df['deltap'] < -1, -1, df['deltap'])

    # Convert deltap to % i.e. returns in % as opposed to decimals
    df['deltap'] = df['deltap'] * 100

    # Repeat similar process for deltap_lag
    df['logprc_lead'] = df.groupby('cusip')['logprc'].shift(-1)
    df['deltap_lag'] = df['logprc_lead'] - df['logprc']
    df['deltap_lag'] = np.where(df['deltap_lag'] > 1, 1, df['deltap_lag'])
    df['deltap_lag'] = np.where(df['deltap_lag'] < -1, -1, df['deltap_lag'])
    df['deltap_lag'] = df['deltap_lag'] * 100

    # Drop NAs in deltap, deltap_lag and bonds < 10 observations of the paired price changes
    df_final = df.dropna(subset=['deltap', 'deltap_lag',
                                 'prclean'])  # 'offering_date', 'price_ldm', 'offering_price', 'amount_outstanding'])

    return df_final

In [65]:
df = calc_deltaprc(df)
df.head()

Unnamed: 0,cusip,bond_sym_id,trd_exctn_dt,trd_exctn_tm,days_to_sttl_ct,lckd_in_ind,wis_fl,sale_cndtn_cd,msg_seq_nb,trc_st,trd_rpt_dt,trd_rpt_tm,entrd_vol_qt,prclean,yld_pt,asof_cd,orig_msg_seq_nb,rpt_side_cd,cntra_mp_id,trd_tmstamp,month_year,logprc,logprc_lag,deltap,logprc_lead,deltap_lag
2901514,001546AE0,AKS.GC,2003-04-14,13:00:16,0.0,,N,@,14878,T,2003-04-14,13:00:47,1000000.0,93.625,9.312567,,,B,C,2003-04-14 13:00:16,2003-04,4.539297,4.547276,-0.797877,4.54263,0.333223
2901515,001546AE0,AKS.GC,2003-04-14,13:00:44,0.0,,N,@,14891,T,2003-04-14,13:01:07,1000000.0,93.9375,9.23912,,,S,C,2003-04-14 13:00:44,2003-04,4.54263,4.539297,0.333223,4.537961,-0.466823
2901516,001546AE0,AKS.GC,2003-04-14,13:07:28,0.0,,N,@,15340,T,2003-04-14,13:07:42,2000000.0,93.5,9.342031,,,B,C,2003-04-14 13:07:28,2003-04,4.537961,4.54263,-0.466823,4.543295,0.533335
2901517,001546AE0,AKS.GC,2003-04-14,13:07:52,0.0,,N,@,15352,T,2003-04-14,13:07:57,1000000.0,94.0,9.224466,,,S,C,2003-04-14 13:07:52,2003-04,4.543295,4.537961,0.533335,4.547276,0.398143
2901518,001546AE0,AKS.GC,2003-04-14,13:10:53,0.0,,N,@,19051,T,2003-04-14,13:58:00,1000000.0,94.375,9.137,,,B,D,2003-04-14 13:10:53,2003-04,4.547276,4.543295,0.398143,4.546614,-0.066247


# Step 3: Annual Illiquidity Metrics Calculation

This step involves using the `calc_annual_illiquidity_table_intraday` function to calculate and summarize annual illiquidity metrics for corporate bonds. The function takes intraday bond data as input and computes several statistics that capture the illiquidity of bonds on an annual basis.

- Computes the illiquidity for each bond and month by taking the negative of the covariance between intraday price changes (`deltap`) and their lagged values (`deltap_lag`).

- Aggregated the monthly illiquidity measures to obtain annual statistics, including mean and median illiquidity.

- Calculates t-statistics for the mean illiquidity of each bond and year and determines the percentage of these t-stats that are significant (>= 1.96).

- Calculates robust t-stats are calculated using OLS with HAC (heteroskedasticity and autocorrelation consistent) standard errors.

- Calculate overall statistics across the full sample period.

- Compiles all these metrics into a table that presents the mean and median illiquidity, the percentage of significant t-statistics, and robust t-statistics for each year, as well as for the full sample period.

This comprehensive illiquidity metric calculation allows us to understand the annual and overall liquidity characteristics of the corporate bond market.

In [66]:
def create_annual_illiquidity_table(Illiq_month):
    """Create Panel A illquidity table with cleaned monthly illiquidity data."""

    overall_illiq_mean = np.mean(Illiq_month['illiq'])
    overall_illiq_median = Illiq_month['illiq'].median()

    # Calculate t-statistics for each cusip in each year
    Illiq_month['t stat'] = Illiq_month.groupby(['cusip', 'year'])['illiq'].transform(
        lambda x: (x.mean() / x.sem()) if x.sem() > 0 else np.nan)

    # Identify the entries with t-stat >= 1.96 and calculate the percentage of significant t-stats for each year
    Illiq_month['significant'] = Illiq_month['t stat'] >= 1.96
    percent_significant = Illiq_month.groupby('year')['significant'].mean() * 100
    Illiq_month = Illiq_month.dropna(subset=['illiq', 't stat'])
    overall_percent_significant = Illiq_month['significant'].mean() * 100
    
    # Calculate robust t-stat for each year
    def get_robust_t_stat(group):
        """Run OLS on a constant term only (mean of illiq) to get the intercept's t-stat."""
        X = add_constant(group['illiq'])
        ols_result = OLS(group['illiq'], X).fit(cov_type='HAC', cov_kwds={'maxlags':1})

        return abs(ols_result.tvalues[0])


    robust_t_stats = Illiq_month.groupby('year').apply(get_robust_t_stat)
    
    
    def calculate_overall_robust_t_stat(series):
        X = add_constant(series)
        ols_result = OLS(series, X).fit(cov_type='HAC', cov_kwds={'maxlags':1})
        return abs(ols_result.tvalues[0])

    # Call the function and assign the result to overall_robust_t_stat
    overall_robust_t_stat = calculate_overall_robust_t_stat(Illiq_month['illiq'].dropna())

    # Combine the results
    table2_daily = pd.DataFrame({
        'Year': robust_t_stats.index,
        'Mean illiq': Illiq_month.groupby('year')['illiq'].mean(),
        'Median illiq': Illiq_month.groupby('year')['illiq'].median(),
        'Per t greater 1.96': percent_significant,
        'Robust t stat': robust_t_stats.values
    }).reset_index(drop=True)
    
    overall_data = pd.DataFrame({
        'Year': ['Full'],
        'Mean illiq': [overall_illiq_mean],
        'Median illiq': [overall_illiq_median],
        'Per t greater 1.96': [overall_percent_significant],
        'Robust t stat': [overall_robust_t_stat]
    })

    table2_daily = pd.concat([table2_daily, overall_data], ignore_index=True)

    return Illiq_month, table2_daily

In [67]:
def calc_annual_illiquidity_table_intraday(df):
    """Calculate illiquidity = -cov(deltap, deltap_lag) using daily data, by month."""

    tqdm.pandas()
    
    Illiq_month = df.groupby(['cusip','month_year'] )[['deltap','deltap_lag']]\
        .progress_apply(lambda x: x.cov().iloc[0,1]) * -1
    Illiq_month = Illiq_month.reset_index()
    Illiq_month.columns = ['cusip','month_year','illiq']
    Illiq_month['year'] = Illiq_month['month_year'].dt.year
    Illiq_month = Illiq_month.dropna(subset=['illiq'])
    # Illiq_month = Illiq_month[Illiq_month['illiq'] < 2000]  # for outliers
    Illiq_month, table2_daily = create_annual_illiquidity_table(Illiq_month)
    
    return Illiq_month, table2_daily

In [69]:
table2_intraday = calc_annual_illiquidity_table_intraday(df)
table2_intraday

  base_cov = np.cov(mat.T, ddof=ddof)
  c *= np.true_divide(1, fact)
  c *= np.true_divide(1, fact)
100%|██████████| 53709/53709 [00:12<00:00, 4335.45it/s]
  return abs(ols_result.tvalues[0])
  return abs(ols_result.tvalues[0])
  return abs(ols_result.tvalues[0])
  return abs(ols_result.tvalues[0])
  return abs(ols_result.tvalues[0])
  return abs(ols_result.tvalues[0])
  return abs(ols_result.tvalues[0])
  return abs(ols_result.tvalues[0])


(           cusip month_year      illiq  year    t stat  significant
 0      001546AE0    2003-04   0.109455  2003  1.670839        False
 1      001546AE0    2003-05   0.501095  2003  1.670839        False
 2      001546AE0    2003-06   1.098252  2003  1.670839        False
 3      001546AE0    2003-07   3.128340  2003  1.670839        False
 4      001546AE0    2003-08  12.506482  2003  1.670839        False
 ...          ...        ...        ...   ...       ...          ...
 53704  984121BL6    2009-02   0.087494  2009  4.688525         True
 53705  984121BL6    2009-03   0.263273  2009  4.688525         True
 53706  984121BL6    2009-04   0.085292  2009  4.688525         True
 53707  984121BL6    2009-05   0.107095  2009  4.688525         True
 53708  984121BL6    2009-06   0.096407  2009  4.688525         True
 
 [53568 rows x 6 columns],
    Year  Mean illiq  Median illiq  Per t greater 1.96  Robust t stat
 0  2003    1.422394      0.380685           89.684691       3.558605
 1 