## Question 3 (60 Points)

### Background
Fama and French famously used the Book-to-Market ratio (BM) and Market Capitalization (Size) to construct their well-known value and size factors. By performing an independent double sort on these two variables, they were able to capture the relationship between a stock’s size and value characteristics and its future returns. In this assignment, you will replicate this type of factor construction by performing an independent double sort on BM and Size.

You are provided with three datasets:
1. **Book-to-Market Ratio (BM)** (a DataFrame): Each row corresponds to the end of a month, and each column represents a stock. The data file name is `bm.csv`.
2. **Market Capitalization (Size)** (a DataFrame): Each row corresponds to the end of a month, and each column represents a stock. The data file name is `market_cap.csv`.
3. **Monthly Stock Returns** (a DataFrame): Each row corresponds to a month, and each column represents a stock. The data file name is `stock_returns.csv`.

### Task
Your objective is to perform an independent double sort based on BM and Size, using the following guidelines:
- At the end of each month, divide stocks into 3 groups on BM and, independently, into 3 groups on Size.
- For BM: Sort the stocks into low, medium, and high groups using the 30th and 70th percentiles of BM values for that month.
- For Size: Sort the stocks into small, medium, and large groups using the 30th and 70th percentiles of market capitalization for that month.

This will result in 9 portfolios (3 BM groups × 3 Size groups) for each month.
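
As a minimal sketch of the sorting rule above for a single month, assume two hypothetical Series, `bm` and `size`, indexed by made-up tickers (not the assignment data). The helper `three_way_labels` is an illustrative name, not part of the assignment. Each stock is labelled separately on each characteristic, and the two labels are then combined:

```python
import pandas as pd

# Hypothetical one-month cross-section; tickers and values are made up.
bm = pd.Series([0.2, 0.5, 0.9, 1.4, 0.7], index=list("ABCDE"))
size = pd.Series([50.0, 10.0, 300.0, 80.0, 20.0], index=list("ABCDE"))

def three_way_labels(values, names):
    """Label each stock by its position relative to the 30th/70th percentiles."""
    values = values.dropna()                          # stocks without data get no label
    lo, hi = values.quantile([0.3, 0.7])
    labels = pd.Series(names[1], index=values.index)  # default: middle group
    labels[values <= lo] = names[0]
    labels[values > hi] = names[2]
    return labels

bm_group = three_way_labels(bm, ["low_bm", "medium_bm", "high_bm"])
size_group = three_way_labels(size, ["small_size", "medium_size", "large_size"])

# Independent sort: the two labels are assigned separately, then combined.
portfolio = bm_group + "_" + size_group
print(portfolio)
```

Because the breakpoints for BM never look at Size (and vice versa), some of the 9 cells can end up with very few stocks, or none, in a given month.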

### Steps
- **Portfolio construction:**
  - For each month-end, assign stocks into one of the 9 portfolios based on their BM and Size values, using the 30th and 70th percentiles as cutoffs.
  - Calculate the value-weighted return for each portfolio for the following month. In a value-weighted portfolio, each stock’s weight is proportional to its market capitalization relative to the total market capitalization of all stocks in that portfolio.
  - If a stock’s return is missing in any month, ignore that stock for the return calculation of that month.
  - Rebalance the portfolios at the end of each month.

- **Return Analysis:** Test whether the average return of each portfolio is statistically different from zero, reporting the t-statistic and p-value for each.
  
- **Discussion:** Within each Size group (small, medium, large), examine whether there is a pattern in the returns across the three BM groups (low, medium, high). Provide a brief discussion on any observable patterns.
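
The value-weighting and missing-return rules in the steps above can be illustrated on a toy portfolio. This is only a sketch: the three stocks, their market caps, and their returns are made up, and the names `cap_t` and `ret_next` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical portfolio of three stocks; numbers are made up.
cap_t = pd.Series({"A": 100.0, "B": 300.0, "C": 600.0})     # market cap at end of month t
ret_next = pd.Series({"A": 0.02, "B": np.nan, "C": -0.01})  # returns in month t+1

# Drop stocks whose return is missing, then renormalise weights over the rest.
valid = ret_next.dropna().index
weights = cap_t[valid] / cap_t[valid].sum()
vw_return = (weights * ret_next[valid]).sum()
print(weights.to_dict(), vw_return)
```

Stock B is ignored because its return is missing, so the weights of A and C are 100/700 and 600/700 and the portfolio return is (100 × 0.02 − 600 × 0.01) / 700.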

### Submission and Deliverables:
- (30 Points) Please submit your Python code with a clear explanation of key steps/functions. We should be able to REPLICATE your results with your code!
- (15 Points) The t-statistics and p-values for the average returns of the 9 portfolios.
- (15 Points) A brief discussion on whether you observe any pattern in returns across the BM groups within each Size group.


---

### Load the data

In [1]:
import pandas as pd

# Load data
bm_df = pd.read_csv('bm.csv', index_col=0, parse_dates=True)
size_df = pd.read_csv('market_cap.csv', index_col=0, parse_dates=True)
returns_df = pd.read_csv('stock_returns.csv', index_col=0, parse_dates=True)

# Ensure the date format is consistent across all dataframes
bm_df.index = bm_df.index.strftime('%Y-%m-%d')
size_df.index = size_df.index.strftime('%Y-%m-%d')
returns_df.index = returns_df.index.strftime('%Y-%m-%d')
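
Since the later cells look up all three frames by the same date label, it may be worth checking which labels they actually share. The sketch below uses a hypothetical helper, `common_months`, and toy frames with made-up dates rather than the assignment files:

```python
import pandas as pd

def common_months(*frames):
    """Return the sorted date labels present in every frame's index."""
    shared = frames[0].index
    for frame in frames[1:]:
        shared = shared.intersection(frame.index)
    return shared.sort_values()

# Toy frames with made-up dates; the returns frame lacks the first month.
bm_toy = pd.DataFrame({"A": [1.0, 1.1]}, index=["2000-01-28", "2000-02-29"])
cap_toy = pd.DataFrame({"A": [10.0, 11.0]}, index=["2000-01-28", "2000-02-29"])
ret_toy = pd.DataFrame({"A": [0.05]}, index=["2000-02-29"])

shared_months = common_months(bm_toy, cap_toy, ret_toy)
print(list(shared_months))
```

On the real data, the same call on `bm_df`, `size_df`, and `returns_df` would show whether any month is missing from one of the files.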

### Setting partitioning groups

In [2]:
def create_portfolio_groups(df, percentiles=(0.3, 0.7)):
    # Row-wise (per-month) breakpoints across stocks; quantile() skips NaN values
    q1 = df.quantile(percentiles[0], axis=1)
    q2 = df.quantile(percentiles[1], axis=1)
    return pd.DataFrame({ '30th Percentile': q1, '70th Percentile': q2 })

bm_partition = create_portfolio_groups(bm_df)
size_partition = create_portfolio_groups(size_df)

In [3]:
# Partition the bm and size dataframes into their groups based on partitions
def set_partition_groups(bm_df, size_df, bm_partition, size_partition):
    # Initialise empty dictionary for storing the portfolios by month by group
    portfolios = {}

    # Iterate over each month
    for mth in bm_df.index:
        low_bm = bm_df.loc[mth][bm_df.loc[mth] <= bm_partition.loc[mth, '30th Percentile']].index
        medium_bm = bm_df.loc[mth][(bm_df.loc[mth] > bm_partition.loc[mth, '30th Percentile']) & (bm_df.loc[mth] <= bm_partition.loc[mth, '70th Percentile'])].index
        high_bm = bm_df.loc[mth][bm_df.loc[mth] > bm_partition.loc[mth, '70th Percentile']].index
        
        small_size = size_df.loc[mth][size_df.loc[mth] <= size_partition.loc[mth, '30th Percentile']].index
        medium_size = size_df.loc[mth][(size_df.loc[mth] > size_partition.loc[mth, '30th Percentile']) & (size_df.loc[mth] <= size_partition.loc[mth, '70th Percentile'])].index
        large_size = size_df.loc[mth][size_df.loc[mth] > size_partition.loc[mth, '70th Percentile']].index
        
        portfolios[mth] = {
            'low_bm_small_size': low_bm.intersection(small_size),
            'low_bm_medium_size': low_bm.intersection(medium_size),
            'low_bm_large_size': low_bm.intersection(large_size),
            'medium_bm_small_size': medium_bm.intersection(small_size),
            'medium_bm_medium_size': medium_bm.intersection(medium_size),
            'medium_bm_large_size': medium_bm.intersection(large_size),
            'high_bm_small_size': high_bm.intersection(small_size),
            'high_bm_medium_size': high_bm.intersection(medium_size),
            'high_bm_large_size': high_bm.intersection(large_size)
        }
        
    return portfolios

# call the function on the dataframes and partitions
portfolios = set_partition_groups(bm_df, size_df, bm_partition, size_partition)
portfolios


{'2000-01-28': {'low_bm_small_size': Index(['000038.SZ', '000065.SZ', '000430.SZ', '000511.SZ', '000516.SZ',
         '000517.SZ', '000519.SZ', '000552.SZ', '000565.SZ', '000582.SZ',
         '000591.SZ', '000592.SZ', '000603.SZ', '000605.SZ', '000606.SZ',
         '000619.SZ', '000655.SZ', '000672.SZ', '000676.SZ', '000711.SZ',
         '000719.SZ', '000736.SZ', '000790.SZ', '000805.SZ', '000811.SZ',
         '000819.SZ', '000827.SZ', '000838.SZ', '000861.SZ', '000863.SZ',
         '600067.SH', '600079.SH', '600137.SH', '600139.SH', '600148.SH',
         '600608.SH', '600616.SH', '600636.SH', '600656.SH', '600659.SH',
         '600671.SH', '600676.SH', '600711.SH', '600749.SH', '600762.SH',
         '600763.SH', '600782.SH', '600791.SH', '600794.SH', '600817.SH',
         '600825.SH', '600840.SH', '600850.SH', '600883.SH'],
        dtype='object'),
  'low_bm_medium_size': Index(['000018.SZ', '000019.SZ', '000032.SZ', '000045.SZ', '000078.SZ',
         '000402.SZ', '000409.SZ', '000415

### Calculate portfolio returns

In [None]:
import pandas as pd
import numpy as np

def calculate_portfolio_returns(portfolios, size_df, returns_df):
    portfolio_returns = {}

    for month, groupings in portfolios.items():
        if month not in returns_df.index:
            print(f"No stock return data for {month}. Skip.")
            continue
        
        print(f"\nProcessing month: {month}")
        month_returns = {}
        
        for portfolio, stocks in groupings.items():
            print(f"\n  Portfolio: {portfolio}")
            print(f"    Initial stocks in portfolio: {list(stocks)}")
            
            # Filter valid stocks
            valid_stocks = stocks.intersection(returns_df.columns).intersection(size_df.columns)
            valid_stocks = valid_stocks[returns_df.loc[month, valid_stocks].notna()]
            print(f"    Valid stocks after filtering non-null returns: {list(valid_stocks)}")
            
            if valid_stocks.size > 0:
                total_size_df = size_df.loc[month, valid_stocks].sum()
                print(f"    Total market cap for valid stocks: {total_size_df}")
                
                if total_size_df > 0:
                    weights = size_df.loc[month, valid_stocks] / total_size_df
                else:
                    print("    Warning: Since total market cap is 0, using equal weights.")
                    weights = pd.Series(1 / len(valid_stocks), index=valid_stocks)
                
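                # Calculate value-weighted return.
                # Note: `month` is the formation month, so this uses the return of the
                # same month as the sort; the task asks for the following month's return.
                month_returns[portfolio] = (returns_df.loc[month, valid_stocks] * weights).sum()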
                print(f"    Value-weighted return for {portfolio}: {month_returns[portfolio]}")
            else:
                print(f"    No valid stocks for {portfolio}. Assigning NaN.")
                month_returns[portfolio] = np.nan

        portfolio_returns[month] = month_returns
    
    return pd.DataFrame(portfolio_returns).T


portfolio_returns = calculate_portfolio_returns(portfolios, size_df, returns_df)
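
The task statement asks for the return in the month following formation, whereas the cell above reads returns from the same `month` label that was used for the sort. A minimal sketch of the lagged version is given below. It assumes the index labels are sorted chronologically and shared by both frames, and the function name `next_month_vw_returns` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd

def next_month_vw_returns(members_by_month, cap, rets):
    """Value-weighted returns in month t+1 for portfolios formed at end of month t.

    members_by_month: {month: {portfolio_name: iterable of tickers}}
    cap, rets: DataFrames indexed by the same chronologically sorted month labels.
    """
    months = list(rets.index)
    out = {}
    for form_month, hold_month in zip(months[:-1], months[1:]):
        row = {}
        for name, tickers in members_by_month.get(form_month, {}).items():
            r = rets.loc[hold_month].reindex(list(tickers)).dropna()  # ignore missing returns
            w = cap.loc[form_month].reindex(r.index).dropna()         # caps from formation month
            r = r[w.index]
            row[name] = (r * w).sum() / w.sum() if w.sum() > 0 else np.nan
        out[hold_month] = row
    return pd.DataFrame(out).T

# Toy data with made-up numbers: one portfolio "p" holding stocks A and B.
months = ["2000-01", "2000-02", "2000-03"]
cap = pd.DataFrame({"A": [100.0, 120.0, 110.0], "B": [300.0, 310.0, 290.0]}, index=months)
rets = pd.DataFrame({"A": [0.01, 0.10, -0.02], "B": [0.02, 0.00, np.nan]}, index=months)
members = {"2000-01": {"p": ["A", "B"]}, "2000-02": {"p": ["A", "B"]}}

lagged = next_month_vw_returns(members, cap, rets)
print(lagged)
```

Each output row is labelled by the holding month, so the first formation month produces the first row and the last month in the sample forms no portfolio. Results computed this way would generally differ from the same-month figures.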


### Return Analysis

In [None]:
from scipy import stats

def test_portfolio_significance(portfolio_returns_df):
    # Calculate average returns and perform t-tests
    results = {}
    alpha = 0.05
    
    for portfolio in portfolio_returns_df.columns:
        returns = portfolio_returns_df[portfolio].dropna()
        
        if len(returns) > 1:  # Ensure there's enough data for t-test
            mean_return = returns.mean()
            t_stat, p_value = stats.ttest_1samp(returns, 0)  # Test against zero return
            
            results[portfolio] = {
                'mean_return': mean_return,
                't_statistic': t_stat,
                'p_value': p_value,
                'significant?': 'yes' if p_value < alpha else 'no'
            }
        else:
            results[portfolio] = {
                'mean_return': np.nan,
                't_statistic': np.nan,
                'p_value': np.nan,
                'significant?': 'N/A'
            }
    
    return pd.DataFrame(results).T


significance_results = test_portfolio_significance(portfolio_returns)
print("======================== Return Analysis ========================\n")
print(significance_results)


                      mean_return t_statistic   p_value significant?
low_bm_small_size        0.018362    3.032076  0.002648          yes
low_bm_medium_size       0.037059    6.147385       0.0          yes
low_bm_large_size        0.041197    7.284621       0.0          yes
medium_bm_small_size     0.004708    0.804272  0.421898           no
medium_bm_medium_size    0.013955    2.485537  0.013499          yes
medium_bm_large_size     0.017774    3.627603  0.000338          yes
high_bm_small_size      -0.005604   -1.002316  0.317026           no
high_bm_medium_size      0.000316    0.059573  0.952537           no
high_bm_large_size        0.00494    1.048383  0.295335           no
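
For reference, the one-sample t-statistic reported above is the sample mean divided by its standard error, with n − 1 degrees of freedom. The sketch below reproduces that calculation with the standard library only, on made-up returns; `one_sample_t` is a hypothetical helper, and the two-sided p-value (which needs the Student t distribution) is left to `stats.ttest_1samp`.

```python
import math
import statistics

def one_sample_t(returns, mu0=0.0):
    """t-statistic for H0: mean(returns) == mu0, with n - 1 degrees of freedom."""
    n = len(returns)
    mean = statistics.fmean(returns)
    se = statistics.stdev(returns) / math.sqrt(n)  # sample std (ddof=1) over sqrt(n)
    return (mean - mu0) / se, n - 1

# Made-up monthly returns, purely illustrative.
sample = [0.02, -0.01, 0.03, 0.00, 0.01]
t_stat, dof = one_sample_t(sample)
print(round(t_stat, 4), dof)
```

A large mean with a large standard deviation can therefore be less significant than a small but stable mean, which is why the table reports the t-statistic alongside the mean return.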


## Discussion of Results

### Size and Value Relationship

**Overall Trend:**

We observe a consistent negative relationship between the Book-to-Market (BM) ratio and mean returns within every size group (small, medium, large). In other words, regardless of firm size, **low-BM stocks (growth stocks) outperform high-BM stocks (value stocks)** in this sample.

**Specific Observations:**

* **Small-Cap Stocks:**
    * The mean return decreases as BM increases.
    * The difference in mean returns between low BM and high BM stocks is approximately `0.0240`.
* **Medium-Cap Stocks:**
    * The mean return also decreases with increasing BM. 
    * The difference in mean returns is more pronounced, at approximately `0.0367`.
* **Large-Cap Stocks:**
    * The trend of decreasing returns with increasing BM continues.
    * The difference in mean returns is similar to medium-cap stocks, around `0.0363`.
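
The low-minus-high differences can be recomputed directly from the mean returns in the results table. The sketch below simply copies those nine reported means into a dictionary (the variable names are illustrative) and subtracts:

```python
# Mean monthly returns copied from the results table above.
mean_return = {
    ("low_bm", "small"): 0.018362,
    ("low_bm", "medium"): 0.037059,
    ("low_bm", "large"): 0.041197,
    ("medium_bm", "small"): 0.004708,
    ("medium_bm", "medium"): 0.013955,
    ("medium_bm", "large"): 0.017774,
    ("high_bm", "small"): -0.005604,
    ("high_bm", "medium"): 0.000316,
    ("high_bm", "large"): 0.00494,
}

# Low-BM minus high-BM mean return within each size group.
spreads = {
    size: mean_return[("low_bm", size)] - mean_return[("high_bm", size)]
    for size in ("small", "medium", "large")
}
for size, spread in spreads.items():
    print(f"{size}: {spread:.4f}")
```

These are differences of sample means only; judging whether a spread is itself statistically significant would require the monthly low-minus-high return series rather than the two averages.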

**Conclusion:**

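The inverse relationship between BM and returns holds in every market capitalization segment. Because high-BM (value) stocks earn lower average returns than low-BM (growth) stocks here, the pattern is the opposite of the **value premium** documented by Fama and French, in which high-BM stocks earn higher average returns. The mean returns of all three high-BM portfolios are also statistically indistinguishable from zero, while those of all three low-BM portfolios are significant at the 5% level.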