# Table 2 Measure of Illiquidity

This notebook walks through the illiquidity calculations based on the methodology in *The Illiquidity of Corporate Bonds* by Bao, Pan, and Wang (2010). The paper's calculations use corporate bond data from 2003-04-14 to 2009-06-30.

  - The notebook file changes often, even by the act of opening it. To avoid re-running it on every such change, and to rerun it only when meaningful changes have been made, the build system looks for changes only in a plaintext version of the notebook. The notebook is converted to a Python script via [nbconvert](https://nbconvert.readthedocs.io/en/latest/), which is often packaged with Jupyter. PyDoit then looks for changes to the Python version and re-runs the notebook if it detects a difference. (You could also convert to a Markdown file with [JupyText](https://github.com/mwouts/jupytext), but that package is often not packaged with Jupyter.)
  - Since we use Jupyter Notebooks for exploratory reports, we want to keep fully computed versions of the notebook with the output intact, even though the output is stripped from the notebook before it is committed to version control. To keep the output, every time PyDoit runs the notebook it saves an HTML version of the freshly run notebook in the `output` directory. That way, you can view the finished report at any time without opening Jupyter.

### <font color='purple'>Overview of Outputs</font>

#### * Table 2 Measure of Illiquidity:
- ##### Panel A Individual Bonds (The mean and median monthly illiquidity per bond per year)
    - Using trade-by-trade data
    - Using daily data
- ##### Panel B Bond Portfolio
    - Equal-weighted: Consider a daily portfolio composed of all bonds, with equally weighted bond returns used to calculate annual illiquidity
    - Issuance-weighted: Consider a daily portfolio composed of all bonds, with issuance-weighted bond returns used to calculate annual illiquidity
- ##### Panel C Implied by quoted bid-ask spread
    - Mean and median monthly bond bid-ask spread per year

#### * Summary Statistics of Monthly Per Bond Illiquidity Using Daily Data
#### * Panel A and Summary Statistics Using MMN corrected data
#### * Replicate the Tables in the Paper (2003-04-14 to 2009-06-30) 
#### * Update the Tables to the present (2003-04-14 to present)

In [None]:
from IPython.display import Image
Image("../assets/table2_screenshot.jpg")

In [None]:
import config

OUTPUT_DIR = config.OUTPUT_DIR
DATA_DIR = config.DATA_DIR

In [None]:
import pandas as pd
from tqdm import tqdm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from scipy import stats
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
from statsmodels.stats.stattools import durbin_watson
from statsmodels.regression.linear_model import OLS
from statsmodels.stats.sandwich_covariance import cov_hac
from statsmodels.tools.tools import add_constant

import warnings
warnings.filterwarnings('ignore')

In [None]:
import misc_tools
import load_wrds_bondret
import load_opensource
import data_processing as data
import table2_calc_illiquidity as calc_illiquidity
import table2_plot_illiquidity as plot

In [None]:
# Define time frames used in the paper and the updated time stamp
today = datetime.today().strftime('%Y-%m-%d')
start_date = '2003-04-14'
end_date = '2009-06-30' 

# Step 1: Clean Merged Data for Daily Illiquidity Calculation

Before calculating illiquidity measures, it's essential to ensure that our corporate bond data is accurate and relevant. The `clean_merged_data` function takes care of preparing the pre-cleaned merged monthly and daily data by performing several critical cleaning steps:

- Loads and merges the relevant datasets within the specified date range.
- Removes any records with missing crucial price information and sorts the data chronologically.
- Adjusts for trade execution dates by incorporating a time lag to identify consecutive trades for the same bond, and filters out those that do not fall within a one-week window, accounting for holidays.
- Consolidates the cleaned data, readying it for the subsequent illiquidity analysis.

This step is crucial to ensure that the subsequent calculations are based on a dataset that reflects true trading activity without distortions from missing data or trades too far apart in time.
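
The one-week filter described above can be sketched as follows. This is a minimal illustration, not the body of `clean_merged_data`: the column names (`trd_exctn_dt`, `prclean`), the toy trades, and the five-business-day threshold are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical toy data: one bond with four trade dates around year-end holidays
trades = pd.DataFrame({
    "cusip": ["A"] * 4,
    "trd_exctn_dt": pd.to_datetime(
        ["2008-12-22", "2008-12-26", "2009-01-20", "2009-01-21"]),
    "prclean": [100.0, 99.5, 98.0, 98.4],
})

# Drop rows without a price and sort each bond chronologically
trades = (trades.dropna(subset=["prclean"])
                .sort_values(["cusip", "trd_exctn_dt"]))

# Date of the previous trade in the same bond; the first trade has none
trades["prev_dt"] = trades.groupby("cusip")["trd_exctn_dt"].shift(1)
pairs = trades.dropna(subset=["prev_dt"]).copy()

# US federal holidays as datetime64[D], the form np.busday_count expects
holidays = (USFederalHolidayCalendar()
            .holidays(start="2008-01-01", end="2009-12-31")
            .values.astype("datetime64[D]"))

# Business days between consecutive trades, skipping weekends and holidays
pairs["bdays_apart"] = np.busday_count(
    pairs["prev_dt"].values.astype("datetime64[D]"),
    pairs["trd_exctn_dt"].values.astype("datetime64[D]"),
    holidays=holidays,
)

# Assumed window: keep pairs at most 5 business days apart
MAX_BDAYS = 5
kept = pairs[pairs["bdays_apart"] <= MAX_BDAYS]
print(kept[["cusip", "prev_dt", "trd_exctn_dt", "bdays_apart"]])
```

Counting business days rather than calendar days means a pair of trades that straddles a holiday is not penalized for the days on which the market was closed.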


In [None]:
cleaned_df_paper = calc_illiquidity.clean_merged_data(start_date, end_date)
cleaned_df_paper.head()

In [None]:
cleaned_df_new = calc_illiquidity.clean_merged_data(start_date, today)
cleaned_df_new.head()

# Step 2: Calculate Price Changes and Perform Additional Cleaning

In this part of the analysis pipeline, we use the `calc_deltaprc` function to compute daily price changes for corporate bonds. The function operates on the cleaned and merged daily corporate bond trade data.

This calculation follows the measure of illiquidity defined on pages 10 and 11 of the paper: $ \gamma = -\text{Cov}(p_t - p_{t-1}, p_{t+1} - p_t) $, where $p_t$ is the log price. The process involves several steps:
- Calculation of Log Prices: Transform cleaned prices to log prices for more stable numerical properties.
- Lagged and Lead Price Changes: Determine the price changes by computing lagged and lead log prices.
- Restricting Returns: Ensure that calculated price changes (returns) are within the range of -100% to 100%.
- Conversion to Percentage: Change the representation of price changes from decimal to percentage for clarity.
- Cleaning Data: Remove entries with incomplete information to maintain the quality of the dataset.
- Filtering by Trade Count: Exclude bonds with fewer than 10 trade observations to focus on more reliable data.

This function is essential for preparing the bond price data for accurate calculation of financial metrics such as illiquidity.
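
The steps above can be sketched on toy data. This is an illustration under stated assumptions, not the code of `calc_deltaprc`: the `prclean` and `date` columns are hypothetical, `deltap_lag` is built here as the previous day's `deltap`, and the minimum trade count is lowered from 10 to 2 so the toy sample survives the filter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: bond A has four daily prices, bond B only two
daily = pd.DataFrame({
    "cusip": ["A"] * 4 + ["B"] * 2,
    "date": pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-04",
                            "2008-01-07", "2008-01-02", "2008-01-03"]),
    "prclean": [100.0, 101.0, 99.0, 100.0, 50.0, 51.0],
}).sort_values(["cusip", "date"])

# Log prices, then the log price change in percent: p_t - p_{t-1}
daily["logprc"] = np.log(daily["prclean"])
daily["deltap"] = (daily["logprc"]
                   - daily.groupby("cusip")["logprc"].shift(1)) * 100

# Previous price change for the same bond: p_{t-1} - p_{t-2}
daily["deltap_lag"] = daily.groupby("cusip")["deltap"].shift(1)

# Keep changes between -100% and 100%; rows with missing values fail the
# test and are dropped at the same time
in_range = (daily["deltap"].between(-100, 100)
            & daily["deltap_lag"].between(-100, 100))
out = daily[in_range]

# Assumed threshold for the toy sample (the text above uses 10)
MIN_OBS = 2
n_obs = out.groupby("cusip")["deltap"].transform("size")
out = out[n_obs >= MIN_OBS]
print(out[["cusip", "date", "deltap", "deltap_lag"]])
```

Pairing each change with the previous one gives the same autocovariance as pairing it with the next one, so either convention can feed the $\gamma$ formula.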


In [None]:
df_paper = calc_illiquidity.calc_deltaprc(cleaned_df_paper)
df_paper.head()

In [None]:
df_new = calc_illiquidity.calc_deltaprc(cleaned_df_new)
df_new.head()

# Step 3: Panel A Individual Bond: Illiquidity Metrics Calculation Using Daily Bond Data

This step uses the `calc_annual_illiquidity_table_daily` function to calculate and summarize annual illiquidity metrics for corporate bonds. The function takes daily bond data as input and computes several statistics that capture the illiquidity of bonds on an annual basis. The `create_annual_illiquidity_table` function is called as the last step of `calc_annual_illiquidity_table_daily` to generate the illiquidity table with the significance percentage, robust t-stat, mean, and median. The function:

- Computes the illiquidity for each bond by month as the negative of the covariance between daily price changes (`deltap`) and their lagged values (`deltap_lag`).

- Aggregates the monthly illiquidity measures to obtain annual statistics, including mean and median illiquidity.

- Calculates t-statistics for the mean illiquidity of each bond and year and determines the percentage of these t-stats that are significant (>= 1.96).

- Calculates robust t-stats using OLS with HAC (heteroskedasticity and autocorrelation consistent) standard errors.

- Calculates overall statistics across the full sample period.

- Compiles all these metrics into a table that presents the mean and median illiquidity, the percentage of significant t-statistics, and robust t-statistics for each year, as well as for the full sample period.

This comprehensive illiquidity metric calculation allows us to understand the annual and overall liquidity characteristics of the corporate bond market.
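
The core of these calculations can be sketched on toy data. The block below is a hedged illustration rather than the implementation in `table2_calc_illiquidity`: the data are made up, the helper names `neg_cov` and `newey_west_tstat` are hypothetical, and the robust t-stat is a hand-rolled Newey-West (Bartlett kernel) statistic for a mean instead of the statsmodels HAC routine imported at the top of the notebook. The percentage of significant per-bond t-stats is omitted.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one bond, four daily observations in each of two months
dates = pd.to_datetime(["2008-01-07", "2008-01-08", "2008-01-09", "2008-01-10",
                        "2008-02-04", "2008-02-05", "2008-02-06", "2008-02-07"])
df = pd.DataFrame({
    "cusip": "A",
    "date": dates,
    "deltap":     [1.0, -1.0, 1.0, -1.0, 0.5, -0.5, 0.5, -0.5],
    "deltap_lag": [-1.0, 1.0, -1.0, 1.0, -0.5, 0.5, -0.5, 0.5],
})
df["year"] = df["date"].dt.year
df["month_year"] = df["date"].dt.to_period("M")


def neg_cov(g):
    """gamma = -Cov(deltap, deltap_lag) within one bond-month."""
    return -g["deltap"].cov(g["deltap_lag"])


# Monthly illiquidity per bond
illiq = (df.groupby(["cusip", "year", "month_year"])[["deltap", "deltap_lag"]]
           .apply(neg_cov)
           .rename("illiq")
           .reset_index())

# Annual mean and median of the monthly measure
annual = illiq.groupby("year")["illiq"].agg(["mean", "median"])


def newey_west_tstat(x, maxlags=1):
    """t-stat of the mean of x with a Bartlett-kernel long-run variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    e = x - x.mean()
    lrv = np.dot(e, e) / n
    for lag in range(1, maxlags + 1):
        weight = 1.0 - lag / (maxlags + 1.0)
        lrv += 2.0 * weight * np.dot(e[lag:], e[:-lag]) / n
    return x.mean() / np.sqrt(lrv / n)


print(illiq)
print(annual)
print(newey_west_tstat(illiq["illiq"], maxlags=1))
```

A positive $\gamma$ corresponds to negatively autocorrelated price changes, which is why the toy series alternates in sign.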

In [None]:
illiq_daily_paper, table2_daily_paper = calc_illiquidity.calc_annual_illiquidity_table_daily(df_paper)
table2_daily_paper

In [None]:
illiq_daily_new, table2_daily_new = calc_illiquidity.calc_annual_illiquidity_table_daily(df_new)
table2_daily_new

# Step 4: Summary Statistics Compilation Using Daily Illiquidity Data

This step uses the `create_summary_stats` function to compile summary statistics that characterize the daily-data illiquidity measure for each year: the min, 25th percentile, median, 75th percentile, max, mean, and standard deviation of monthly illiquidity per CUSIP, plus the mean t-stat. These statistics show the distribution and central tendency of bond illiquidity and t-statistics on an annual basis.
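
A summary of this kind can be sketched with `groupby(...).describe()`. The frame below is hypothetical toy data, and the column names `illiq` and `t_stat` are assumptions; the actual layout produced by `create_summary_stats` may differ.

```python
import pandas as pd

# Hypothetical bond-level illiquidity with one large outlier in 2008
illiq = pd.DataFrame({
    "cusip": ["A", "B", "C", "D", "E", "A", "B"],
    "year": [2008] * 5 + [2009] * 2,
    "illiq": [1.0, 2.0, 3.0, 4.0, 100.0, 1.0, 3.0],
    "t_stat": [2.0, 1.0, 3.0, 2.0, 2.0, 1.0, 2.0],
})

# describe() returns count, mean, std, min, 25%, 50%, 75%, max per year
summary = (illiq.groupby("year")["illiq"]
                .describe()
                .drop(columns="count")
                .rename(columns={"50%": "median"}))

# Average t-stat per year
summary["mean t-stat"] = illiq.groupby("year")["t_stat"].mean()
print(summary)
```

In the toy 2008 row the mean sits far above the median, which is the signature of a few very illiquid observations.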

In [None]:
illiq_daily_summary_paper = calc_illiquidity.create_summary_stats(illiq_daily_paper)
illiq_daily_summary_paper

In [None]:
illiq_daily_paper[illiq_daily_paper['illiq'] > 2000]

In [None]:
illiq_daily_summary_new = calc_illiquidity.create_summary_stats(illiq_daily_new)
illiq_daily_summary_new

# Step 5: Panel A Using MMN Corrected Daily Bond Data

Now we repeat the calculations from Steps 3 and 4 using MMN-corrected daily bond data. Since the MMN-corrected data already contains illiquidity directly, `calc_illiq_w_mmn_corrected` cleans the MMN-corrected data and applies `create_annual_illiquidity_table` to generate the corresponding Panel A (daily data) illiquidity table, ready for comparison. We then use `create_summary_stats` from Step 4 to produce summary statistics from the cleaned MMN-corrected data.

In [None]:
mmn_paper, table2_daily_mmn_paper = calc_illiquidity.calc_illiq_w_mmn_corrected(
    start_date, end_date, cleaned_df_paper)
table2_daily_mmn_paper

In [None]:
illiq_daily_summary_mmn_paper = calc_illiquidity.create_summary_stats(mmn_paper)
illiq_daily_summary_mmn_paper

In [None]:
mmn_paper.head()

In [None]:
mmn_new, table2_daily_mmn_new = calc_illiquidity.calc_illiq_w_mmn_corrected(
    start_date, today, cleaned_df_new)
table2_daily_mmn_new

In [None]:
illiq_daily_summary_mmn_new = calc_illiquidity.create_summary_stats(mmn_new)
illiq_daily_summary_mmn_new

# Step 6: Panel B Bond Portfolios: Portfolio-Based Annual Illiquidity Metrics Calculation

The `calc_annual_illiquidity_table_portfolio` function computes illiquidity metrics for corporate bonds by constructing equal-weighted and issuance-weighted portfolio returns on a daily basis and then calculating portfolio illiquidity on an annual basis. Aggregating transaction-level bond data into portfolios gives a market-wide view of liquidity rather than a bond-by-bond one.

- Equal-Weighted Portfolio Calculation: Creates an equal-weighted portfolio for each trading day by averaging the daily price changes (`deltap`) and their lagged values (`deltap_lag`) across bonds. It then groups these daily averages by year and takes the negative covariance between `deltap` and `deltap_lag` as the illiquidity measure for each year. A t-statistic for the illiquidity of the equal-weighted portfolio is also computed.

- Issuance-Weighted Portfolio Calculation: Each bond's issuance is calculated as $ \text{issuance} = \text{offering amount} \times \text{principal amount} \times \text{offering price} / 100 / 1{,}000{,}000 $, and the `deltap` and `deltap_lag` of all bonds are aggregated on a daily basis, weighted by issuance. The remaining steps mirror the equal-weighted calculation.

- Calculates overall statistics across the full sample period.

- Compiles all these metrics into a table that presents the equal-weighted portfolio illiquidity and t-stat and the issuance-weighted portfolio illiquidity and t-stat for each year, as well as for the full sample period.
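
The two weighting schemes can be sketched on toy data. This is an illustration under stated assumptions, not the body of `calc_annual_illiquidity_table_portfolio`: the column names `offering_amt`, `principal_amt`, and `offering_price`, the helper `annual_gamma`, and all values are hypothetical, and the t-statistics are left out.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two bonds observed on the same four days
dates = pd.to_datetime(["2008-03-03", "2008-03-04", "2008-03-05", "2008-03-06"])
bond_a = pd.DataFrame({"cusip": "A", "date": dates,
                       "deltap": [1.0, -1.0, 1.0, -1.0],
                       "deltap_lag": [-1.0, 1.0, -1.0, 1.0],
                       "offering_amt": 300_000.0})
bond_b = pd.DataFrame({"cusip": "B", "date": dates,
                       "deltap": [3.0, -3.0, 3.0, -3.0],
                       "deltap_lag": [-3.0, 3.0, -3.0, 3.0],
                       "offering_amt": 100_000.0})
df = pd.concat([bond_a, bond_b], ignore_index=True)
df["principal_amt"] = 1_000.0
df["offering_price"] = 100.0

# Issuance as defined in the text above
df["issuance"] = (df["offering_amt"] * df["principal_amt"]
                  * df["offering_price"] / 100 / 1_000_000)

cols = ["deltap", "deltap_lag"]

# Equal-weighted daily portfolio: simple cross-sectional average
ew = df.groupby("date")[cols].mean()

# Issuance-weighted daily portfolio: weights sum to one on each day
weights = df["issuance"] / df.groupby("date")["issuance"].transform("sum")
iw = df[cols].mul(weights, axis=0).groupby(df["date"]).sum()


def annual_gamma(port):
    """Negative covariance of the daily portfolio series within each year."""
    years = port.index.year
    return pd.Series({year: -g["deltap"].cov(g["deltap_lag"])
                      for year, g in port.groupby(years)})


ew_gamma = annual_gamma(ew)
iw_gamma = annual_gamma(iw)
print(ew_gamma)
print(iw_gamma)
```

Because the larger issue in the toy data has the smaller price swings, the issuance-weighted portfolio shows a lower $\gamma$ than the equal-weighted one.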

In [None]:
table2_port_paper = calc_illiquidity.calc_annual_illiquidity_table_portfolio(df_paper)
table2_port_paper

In [None]:
table2_port_new = calc_illiquidity.calc_annual_illiquidity_table_portfolio(df_new)
table2_port_new

# Step 7: Panel C Implied by Quoted Bid-Ask Spreads: Annual Implied Illiquidity Using Monthly Quoted Bid-Ask Spread

In this section, we focus on analyzing the illiquidity implied by quoted bid-ask spreads of corporate bonds on an annual basis using `calc_annual_illiquidity_table_spd`. 


- For each year, calculates the mean and median of the monthly `t_spread`, which represents the implied gamma.

- Calculates overall statistics across the full sample period.

- Compiles all these metrics into a table that presents the mean and median implied illiquidity for each year, as well as for the full sample period.

By computing these statistics, the function provides insights into the liquidity of the corporate bond market as implied by the bid-ask spreads over time. As shown in the paper, not only does the quoted bid-ask spread fail to capture the overall level of illiquidity, but it also fails to explain the cross-sectional variation in bond illiquidity and its asset pricing implications.

In [None]:
def calc_annual_illiquidity_table_spd(df):
    """Calculate mean and median gamma implied by quoted bid-ask spreads by year.
    """
    # Keep one spread observation per bond-month
    df_unique = df.groupby(['cusip', 'month_year'])['t_spread'].first().reset_index()
    df_unique['year'] = df_unique['month_year'].dt.year

    # Annual mean and median of the monthly implied gamma
    yearly = df_unique.groupby('year')['t_spread'].agg(['mean', 'median'])
    table2_spd = pd.DataFrame({
        'Year': yearly.index,
        'Mean implied gamma': yearly['mean'].to_numpy(),
        'Median implied gamma': yearly['median'].to_numpy(),
    })

    # Full-sample row appended below the annual rows
    overall_data = pd.DataFrame({
        'Year': ['Full'],
        'Mean implied gamma': [df_unique['t_spread'].mean()],
        'Median implied gamma': [df_unique['t_spread'].median()],
    })

    return pd.concat([table2_spd, overall_data], ignore_index=True)

In [None]:
table2_spd_paper = calc_illiquidity.calc_annual_illiquidity_table_spd(df_paper) 
table2_spd_paper

In [None]:
table2_spd_new = calc_illiquidity.calc_annual_illiquidity_table_spd(df_new) 
table2_spd_new

# Step 8: Monthly Illiquidity Per Bond and Average Illiquidity By Year

The `plot_illiquidity` function visualizes both monthly bond illiquidity observations and annual trends.

- Monthly Illiquidity Per Bond: Each bond's monthly illiquidity is shown as a point in a scatter plot, allowing liquidity to be examined at the bond level, month by month.

- Annual Illiquidity Summary Insights: The mean and median illiquidity from Table 2 Panel A (daily data) are drawn as lines on the plot. The red line indicates the mean and the purple line the median. The mean is much higher than the median around 2008-2009, suggesting high-illiquidity outliers.

- The Zoomed-In Analysis: Because extreme values distort the scale of the first subplot, the second subplot restricts the axis to a more typical range of illiquidity values. Leaving the outliers out of view gives a clearer picture of the prevailing liquidity patterns.

We have used both the original data and the MMN-corrected data to generate separate plots.

In [None]:
# Plot using original data, 2003-2009
plot.plot_illiquidity(illiq_daily_paper, illiq_daily_summary_paper, "2003-2009")

In [None]:
# Plot using original data, 2003-2023
plot.plot_illiquidity(illiq_daily_new, illiq_daily_summary_new, "2003-2023")

In [None]:
# Plot using MMN corrected data, 2003-2009
plot.plot_illiquidity(mmn_paper, illiq_daily_summary_mmn_paper, "MMN_Corrected, 2003-2009")

In [None]:
# Plot using MMN corrected data, 2003-2023
plot.plot_illiquidity(mmn_new, illiq_daily_summary_mmn_new, "MMN_Corrected, 2003-2023")