# Arbitrage Strategy Based on Book-to-Market

This notebook implements an arbitrage model based on the spread between accounting value (book value) and market value of major listed companies.
Enhancements include:
- Sector neutrality
- Transaction cost adjustments
- Combined value signals (Book-to-Market and Free Cash Flow Yield)
- Quality filtering (positive earnings, low accruals)

## Theoretical Framework

Let:
- $P_{i,t}$ = Price of stock *i* at time *t*
- $BVPS_{i,t}$ = Book Value per Share
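- $MV_{i,t} = P_{i,t} \times Shares_{i,t}$ = Market value (market capitalization), where $Shares_{i,t}$ is the number of shares outstanding
- $BV_{i,t} = BVPS_{i,t} \times Shares_{i,t}$ = Total book value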

### Value Signal:
The book-to-market ratio measures how much accounting value an investor obtains per unit of market value paid:
$$
BTM_{i,t} = \frac{BV_{i,t}}{MV_{i,t}}
$$
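
The share count appears in both the numerator and the denominator, so it cancels. The ratio can therefore be read per share, or as the inverse of the price-to-book ratio (written $PB_{i,t}$ here, a shorthand introduced only for this note):
$$
BTM_{i,t} = \frac{BVPS_{i,t} \times Shares_{i,t}}{P_{i,t} \times Shares_{i,t}} = \frac{BVPS_{i,t}}{P_{i,t}} = \frac{1}{PB_{i,t}}
$$
As a hypothetical example, a stock with a book value per share of 5 and a price of 100 has $BTM = 5 / 100 = 0.05$.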

### FCF Yield:
Free cash flow $FCF_{i,t}$ scaled by market capitalization:
$$
FCFY_{i,t} = \frac{FCF_{i,t}}{MV_{i,t}}
$$

### Composite Signal (Z-Score):
$$
Z_{i,t} = \frac{(Signal_{i,t} - \mu_t)}{\sigma_t}
$$
where $Signal_{i,t}$ combines $BTM_{i,t}$ and $FCFY_{i,t}$, and $\mu_t$ and $\sigma_t$ are the cross-sectional mean and standard deviation of the signal at time $t$.

We go **long** on stocks with high Z (cheap relative to book value and free cash flow) and **short** on stocks with low Z (expensive), with positions adjusted for sector neutrality.
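
A minimal sketch of how the composite score and the sector adjustment could be computed. The equal-weighted average of the two z-scored signals, the column names (`btm`, `fcfy`, `sector`), the `direction` flag and the sample figures are all assumptions made for illustration, not the notebook's own method. With `direction=+1` high scores receive positive (long) weights; with `direction=-1` low scores do.

```python
import numpy as np
import pandas as pd


def zscore(s: pd.Series) -> pd.Series:
    """Cross-sectional z-score using the population standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)


def sector_neutral_weights(df: pd.DataFrame, direction: int = 1) -> pd.Series:
    """Long/short weights that sum to zero within every sector.

    Assumes columns 'btm', 'fcfy' and 'sector' (hypothetical names).
    """
    # Composite signal: average of the two standardized signals, re-standardized
    composite = zscore((zscore(df["btm"]) + zscore(df["fcfy"])) / 2)
    # Sector neutrality: subtract each sector's mean score
    neutral = composite - composite.groupby(df["sector"]).transform("mean")
    # Scale so that the absolute weights sum to 1 (gross exposure of 100%)
    return direction * neutral / neutral.abs().sum()


# Hypothetical cross-section of six stocks in two sectors
sample = pd.DataFrame(
    {
        "btm": [0.02, 0.10, 0.05, 0.60, 0.90, 0.75],
        "fcfy": [0.03, 0.04, 0.035, 0.08, 0.12, 0.10],
        "sector": ["TECH"] * 3 + ["FIN"] * 3,
    },
    index=["A", "B", "C", "D", "E", "F"],
)

weights = sector_neutral_weights(sample, direction=1)
print(weights)
print(weights.groupby(sample["sector"]).sum())  # each sector nets to ~0
```

Demeaning within sectors means a sector with a single stock receives a zero weight, so the approach needs at least two names per sector to take any position there.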

In [None]:
# Run this cell sparingly: the API key allows only a limited number of requests

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import requests

from utils import get_current_fundamentals

# CONFIG
TICKERS = ['AAPL', 'MSFT', 'GOOGL', 'META', 'AMZN']
TRANSACTION_COST = 0.001  # 0.1% per trade
START_DATE = '2018-01-01'
END_DATE = '2023-12-31'

fundamentals = {}
for ticker in TICKERS:
    fundamentals[ticker] = get_current_fundamentals(ticker)

fund_df = pd.DataFrame(fundamentals).T
pd.DataFrame({'Attributes': fund_df.columns.to_list()})

In [None]:
calc = fund_df.copy()
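# Convert to numeric every column containing at least one value that starts with a digit
# (optionally preceded by a minus sign); non-numeric entries in those columns become NaN
calc = calc.apply(lambda x: pd.to_numeric(x, errors='coerce') if x.str.contains(r'^-?\d', na=False).any() else x)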

# Quality filter: keep only companies with positive diluted EPS (TTM) and positive profit margin
calc = calc[(calc['DilutedEPSTTM'] > 0) & (calc['ProfitMargin'] > 0)]

calc = calc[['BookValue', 'SharesOutstanding', 'EPS', 'DilutedEPSTTM', 'MarketCapitalization',
             'Sector', 'ProfitMargin','PriceToBookRatio']]

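# Estimate 1: inverse of the reported price-to-book ratio
calc['Book to Market1'] = 1 / calc['PriceToBookRatio']
# Estimate 2: book value per share x shares outstanding, divided by market capitalization
calc['Book to Market2'] = (calc['BookValue'] * calc['SharesOutstanding']) / calc['MarketCapitalization']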
#calc.insert(0, 'Book to Market', calc.pop('Book to Market'))

# The 'Book to Market1' and 'Book to Market2' columns are close but NOT identical.
# Which differences in the source data drive the gap remains to be investigated.
calc.iloc[0]

BookValue                       4.471
SharesOutstanding         14935800000
EPS                              6.41
DilutedEPSTTM                    6.41
MarketCapitalization    3194468958000
Sector                     TECHNOLOGY
ProfitMargin                    0.243
PriceToBookRatio                47.82
Book to Market1              0.020912
Book to Market2              0.020904
Name: AAPL, dtype: object
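
As a first step toward that open question, the gap can be quantified from the AAPL values printed above. The sketch below assumes, without confirmation from the data source, that `PriceToBookRatio` and `BookValue` are rounded to the number of decimals displayed, and compares the relative gap with the approximate largest gap such rounding could produce.

```python
# Values copied from the AAPL row shown above
price_to_book = 47.82
bvps = 4.471                      # 'BookValue' is per share
shares = 14_935_800_000
market_cap = 3_194_468_958_000

btm1 = 1 / price_to_book
btm2 = bvps * shares / market_cap
rel_gap = abs(btm1 - btm2) / btm2

# Assumption: both figures are rounded to the displayed precision
# (2 decimals for price-to-book, 3 decimals for book value per share).
# First-order bound on the relative gap that rounding alone could cause.
rounding_bound = 0.005 / price_to_book + 0.0005 / bvps

# Price-to-book ratio implied by market cap, share count and book value per share
implied_price_to_book = market_cap / (shares * bvps)

print(f"relative gap:          {rel_gap:.4%}")
print(f"rounding bound:        {rounding_bound:.4%}")
print(f"implied price-to-book: {implied_price_to_book:.2f}")
```

For these inputs the relative gap (about 0.036%) is larger than the rounding bound (about 0.022%), and the implied price-to-book ratio (about 47.84) differs from the reported 47.82. Under the stated assumptions rounding alone does not account for the difference; one hypothesis to check against the provider's documentation is that the reported ratio and the market capitalization are computed from prices sampled at different times.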