# Summary

## Update 
### 202402
`akshare` stopped working for retrieving fundamentals of US stocks. After some research, I used https://www.dolthub.com/repositories/post-no-preference/earnings/query/master to get the book value (BV) needed to compute PB. The code is modified accordingly. Alternative APIs are in the Backup section.  
Changes include:
- Since the APIs and queries all limit the amount of data per call, I first need to limit the number of stocks in the analysis instead of retrieving info for all SP500 stocks
### 202502
- Use `yfinance` to get book value
- Change which BV date is used to compute PB  
  - Original logic: Use the BV from 2 quarters before QUARTER_ACT, to make sure BV is available for all stocks. For example, if I act in February (Q1), use BV from the previous Q3. **The problem** is that the definition of quarters differs by stock: Q3 could end in July for one stock and in September for another.
  - **New logic**: Use the latest BV available, regardless of which quarter it's from
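
The new logic amounts to keeping, for each stock, the row with the most recent report date. A minimal sketch on made-up numbers (the tickers, dates and values are placeholders, not real data):

```python
import pandas as pd

# Toy book-value table; tickers, dates and values are made up for illustration
df_bv_toy = pd.DataFrame({
    "stock": ["AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(["2024-09-30", "2024-12-31", "2024-07-31", "2024-10-31"]),
    "book_value_per_share": [10.0, 11.0, 20.0, 21.0],
})

# Keep the most recent report per stock, whatever fiscal quarter it belongs to
latest_idx = df_bv_toy.groupby("stock")["date"].idxmax()
df_bv_latest = df_bv_toy.loc[latest_idx].reset_index(drop=True)
print(df_bv_latest)
```

Note that the two toy stocks end up with different report dates (December vs. October), which is exactly the situation a fixed quarter label cannot handle.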

## Methodology

**Caveat**
- In my first test (Stock_MA.ipynb), I used the first trading day of MONTH_ACT and the first trading day of MONTH_PREV to define the two ends of the 6m momentum. Here I use the Nth trading day instead, so the results may vary
- If a stock splits, the price momentum is unreasonable. I will check for it and likely skip that stock that year
- If using finviz
  - The BV is updated with Q4 results for some stocks
  - I can only use the whole SP500 instead of top 300
  - The momentum filter is not by rank, but by value like 10% increase 
- The SP500 components change by about 20 each year. To apply the magic formula accurately, I should backtest with the SP500 components of each year, but I don't have them. Here I'm using the latest SP500 list instead. This is an information leak (survivorship bias): stocks that are in today's SP500 but were not 5 years ago must have performed well in the past 5 years to get into the list. So I'm picking stocks that I already know increased a lot in value and putting them in my portfolio to test the performance.
  - That said, I'm using the top 300 of the SP500, which are hopefully more stable over time, and I apply strict criteria on momentum and PB. So the results should not be too far off


**Methodology (for backtest)** 
Here I get the daily price history and use the price on the action day (the Nth trading day of the action month) to compute price momentum. I also use the latest full SP500 list, but take only the first N_TOP_BY_MKT_CAP stocks.

The following is for the backtest. Prediction is similar, except that I only use one year's data and the optimized parameters.

Method: 
- Select the N_TOP_BY_MKT_CAP top stocks from SP500
- Starting from 2010, take actions on the Nth trading day of Feb (MONTH_ACT) of each year
- First choose the top 20% (TOP_BY_MMT = 0.2) of stocks ranked by 6-month price momentum (price on the Nth trading day of MONTH_ACT divided by price on the Nth trading day of MONTH_PREV, minus 1)
- Then choose the top N_STOCKS ranked by PB. PB = price on the Nth trading day of MONTH_ACT / book value per share from 2 quarters earlier (the previous Q3 if MONTH_ACT = Feb)
- On the Nth trading day of MONTH_ACT of each year, sell all stocks from the previous year, paying a 10% tax on gains (TAX_FACTOR = (1 - 0.1)), and buy new stocks with the proceeds by repeating the previous two steps  
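
The two selection steps can be sketched on toy data as follows. This is only an illustration: `select_candidates` is a hypothetical helper, and it assumes that "top by PB" means the lowest positive PB first. The actual ranking lives in `stocklib.pick_stocks`, which is not shown here and may differ.

```python
import pandas as pd

def select_candidates(df, top_by_mmt=0.2, n_stocks=2, mmt_var="stock_price_mmt_6m"):
    """Two-step screen: momentum filter first, then rank by PB.

    Assumption: "top by PB" is read here as lowest positive PB first;
    the real ranking is done by stocklib.pick_stocks, which is not shown.
    """
    n_keep = round(len(df) * top_by_mmt)
    by_mmt = df.sort_values(mmt_var, ascending=False).iloc[:n_keep]
    by_mmt = by_mmt[by_mmt["price_to_book_ratio"] > 0]
    return by_mmt.sort_values("price_to_book_ratio").head(n_stocks)

# Made-up tickers and values
toy = pd.DataFrame({
    "stock": list("ABCDEFGHIJ"),
    "stock_price_mmt_6m": [0.50, 0.40, 0.30, 0.20, 0.10, 0.05, 0.00, -0.05, -0.10, -0.20],
    "price_to_book_ratio": [8.0, 2.0, 5.0, 1.0, 3.0, 4.0, 6.0, 7.0, 9.0, 0.5],
})
picked = select_candidates(toy, top_by_mmt=0.3, n_stocks=2)
print(picked["stock"].tolist())
```

In this toy run, stock D has a low PB but is never considered because it fails the momentum filter first; the order of the two steps matters.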

**Parameters**
- Tried different N_TOP_BY_MKT_CAP values; 300 is better than the full list
- Choosing the top 40 gives a good return among (30, 40, 50)
- Each year, the gain (after the 10% long-term tax) is better than SPY
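
As a small worked example of the after-tax figure, the sketch below taxes only the gain over the amount invested the previous year, which is how the backtest cell applies TAX_FACTOR. The function name `cash_after_tax` and the dollar amounts are made up for illustration.

```python
TAX_RATE = 0.10              # long-term tax rate used in this notebook
TAX_FACTOR = 1 - TAX_RATE

def cash_after_tax(value_now, cost_basis, tax_factor=TAX_FACTOR):
    """Cash available to reinvest after selling: only gains are taxed."""
    if value_now > cost_basis:
        return cost_basis + (value_now - cost_basis) * tax_factor
    return value_now  # a loss is not taxed (and no tax credit is modeled)

gain_case = cash_after_tax(50_000, 40_000)   # 10,000 gain, 1,000 goes to tax
loss_case = cash_after_tax(35_000, 40_000)   # loss, nothing to tax
print(gain_case, loss_case)
```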

In [11]:
%reset -f
from urllib.request import urlopen
import requests
import certifi
import json
import time

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import akshare as ak  # https://github.com/akfamily/akshare
import yfinance as yf

from stocklib import pick_stocks, get_quarter

pd.set_option('display.max_rows', 500)
%matplotlib inline

# Parameters

In [None]:
# 1. Download the SP500 list manually
# 2. Change month, year and day below

# Which stock was split in the past year; check back after finding those anomalous momentum
STOCK_SPLIT = ['IDONTKNOW']

## When to buy and sell
MONTH_ACT = 2  
YEAR_ACT = 2025
QUARTER_ACT = f'Q{(MONTH_ACT - 1) // 3 + 1}'  # Dummy variable. If acting in Feb, the quarter is Q1

# Starting year and month to compute 6m momentum
if MONTH_ACT - 6 > 0:
    MONTH_PREV = MONTH_ACT - 6
    YEAR_PREV = YEAR_ACT
else:
    YEAR_PREV = YEAR_ACT - 1
    MONTH_PREV = MONTH_ACT + 6


## The quarter for the BV
# Original logic: Since the ER date differs, this should be 2 quarters ahead of QUARTER_ACT
# so that the BV is available for every stock. If acting in Q1, use previous Q3 
# New logic: Use the latest BV available, so QUARTER_BV is obsolete
QUARTER_BV = f'Q{((MONTH_ACT - 1) // 3 - 2) % 4 + 1}'  # 2 quarters back, wrapping around the year (Feb -> Q3)


# Trade on which **trading day of the month** 
# 0. This is the Nth trading day, not the day of month
# 1. The maximum number of trading days varies by month, so be careful with a big value (> 20)
# 2. Previously I used the first day of MONTH_ACT for convenience. Now DOM_TRADING allows me to
# compute on any day of the month. Note that the start and end dates of the 6m momentum are defined as
# the Nth trading day of MONTH_PREV and that of MONTH_ACT, where N = DOM_TRADING. Therefore they do not
# always fall on the same calendar day of the month
# 3. Some stocks don't have the previous day's price in the early AM; wait a bit
DOM_TRADING = 17  
DATE_CUTOFF = f"{YEAR_ACT}-{MONTH_ACT:02d}-01"  # Cutoff to filter out stocks without recent data

## Filters to pick stocks and compute gains
N_TOP_BY_MKT_CAP = 300  # Choose from the top N of sp500
TOP_BY_MMT = 0.2  # The top fraction of stocks ranked by MMT
MMT_VAR = 'stock_price_mmt_6m'
TAX_FACTOR = (1 - 0.1)
N_STOCKS = 40  # Number of stocks to buy
TOTAL_CASH = 248000.0  # Total cash

## Parameters for both backtest and prediction
# Remove stocks with more than a certain number of quarters with negative BV. This is to reduce risk based on history.
# When applying the magic formula, only stocks with PB > 0 in MONTH_ACT are considered
MAX_QUARTERS_NEG_PB = 40

## Backtest only parameters
YEAR_START = 2011  # backtest starting year
CASH_FOR_EACH_STOCK = 1000  # backtest cash for each stock
# Remove stocks with fewer than some quarters
MIN_YEARS_TEST = 3  # Keep stocks with >= MIN_YEARS_TEST years of data

In [13]:
file_name = {
    'fundamental': f'sp500_history_raw_{str(YEAR_ACT)}.csv',
    'price': f'sp500_history_price_raw_{str(YEAR_ACT)}.csv',
}

# Download SP500 price data
Downloading price takes about 0.5hr

In [14]:
## Get fundamental data (quarterly) or price data (daily) 
# Download all SP500 instead of N_TOP_BY_MKT_CAP, since later filters will remove some

is_download = True
is_from_scratch = False  # If starting from scratch and no stock data has been downloaded already
to_download = 'price'  # 'fundamental' or 'price', download fundamental or price data  # fundamental not working with akshare any more

if is_download:    
    min_row = {
        'fundamental': 3,
        'price': 50,
    }
    anom = {'Failed_PB': [], 'Short': [], 'Failed_P': []}

    # Get stock list (ordered by capital)
    df_sp500_list = pd.read_excel('sp500_fulllist_ranked.xlsx', engine='openpyxl', sheet_name=str(YEAR_ACT))
    stocks = df_sp500_list.loc[df_sp500_list['stock'] != 'GOOG', 'stock'].values.tolist()
    stocks.append('OHI')  # I like OHI

    if not is_from_scratch: 
        print('Read downloaded stocks from local file')
        df_stock_all = pd.read_csv(file_name[to_download])

    # If df_stock_all does not exist, declare it
    try:
        df_stock_all       
    except NameError: 
        df_stock_all = pd.DataFrame()
    stock_downloaded = [] if df_stock_all.empty else df_stock_all.stock.unique()
    print(f'Downloaded {len(stock_downloaded)}, {stock_downloaded}')

    count = 0
    for stock_symbol in stocks:
        if stock_symbol in stock_downloaded:
            continue
        try:
            if to_download == 'price':  # Download price
                df_stock = ak.stock_us_daily(symbol=stock_symbol)
                df_stock = df_stock.reset_index()  # Set the date index to a column
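            else:
                # Only 'price' is supported here; fail loudly instead of reusing a stale df_stock
                raise ValueError(f"Unsupported to_download value: {to_download!r}")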
        except IndexError:
            print(f'Failed for {stock_symbol}')
            anom['Failed_P'].append(stock_symbol)
            continue        
        df_stock['stock'] = stock_symbol
        df_stock_all = pd.concat([df_stock_all, df_stock], ignore_index=True)
        print(f"""{(stock_symbol, 
                    df_stock.date.dt.date.min().isoformat(), 
                    df_stock.date.dt.date.max().isoformat(), 
                    len(df_stock))}""")
        if len(df_stock) < min_row[to_download]:
            anom['Short'].append(stock_symbol)

        count += 1
        if count % 50 == 1:
            print(f'{count} stocks downloaded; saving...')
            df_stock_all.to_csv(file_name[to_download], index=False)    
        time.sleep(1)        

    df_stock_all.to_csv(file_name[to_download], index=False)    

Read downloaded stocks from local file
Downloaded 504, ['MSFT' 'AAPL' 'NVDA' 'AMZN' 'META' 'GOOGL' 'BRK.B' 'AVGO' 'LLY' 'TSLA'
 'JPM' 'UNH' 'V' 'XOM' 'MA' 'JNJ' 'PG' 'HD' 'MRK' 'COST' 'ABBV' 'ADBE'
 'CRM' 'AMD' 'CVX' 'NFLX' 'WMT' 'KO' 'PEP' 'ACN' 'BAC' 'MCD' 'TMO' 'CSCO'
 'ABT' 'LIN' 'CMCSA' 'ORCL' 'INTC' 'VZ' 'DIS' 'INTU' 'WFC' 'AMGN' 'IBM'
 'DHR' 'NOW' 'QCOM' 'CAT' 'PFE' 'UNP' 'SPGI' 'GE' 'TXN' 'PM' 'AMAT' 'UBER'
 'ISRG' 'RTX' 'COP' 'HON' 'T' 'LOW' 'GS' 'NKE' 'BKNG' 'NEE' 'PLD' 'BA'
 'MDT' 'AXP' 'ELV' 'SYK' 'VRTX' 'TJX' 'BLK' 'MS' 'LRCX' 'SBUX' 'C' 'ETN'
 'PANW' 'DE' 'PGR' 'MDLZ' 'UPS' 'REGN' 'ADP' 'CB' 'BMY' 'GILD' 'ADI' 'MMC'
 'BSX' 'CVS' 'LMT' 'MU' 'SCHW' 'AMT' 'CI' 'BX' 'ZTS' 'FI' 'TMUS' 'SNPS'
 'KLAC' 'EQIX' 'CDNS' 'SO' 'DUK' 'ICE' 'CME' 'MO' 'SHW' 'CSX' 'CL' 'BDX'
 'ITW' 'SLB' 'WM' 'CMG' 'PYPL' 'MCK' 'TGT' 'ANET' 'EOG' 'PH' 'PSX' 'ABNB'
 'USB' 'MPC' 'MCO' 'NOC' 'HCA' 'TT' 'ORLY' 'TDG' 'APH' 'GD' 'MAR' 'AON'
 'ROP' 'PNC' 'APD' 'NSC' 'FCX' 'FDX' 'NXPI' 'ADSK' 'MSI' 'EMR' 'CTAS'
 

# Get price momentum and filter by market cap
## Get price momentum
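
The "Nth trading day" idea can be illustrated on a toy price series: rank the dates within each (stock, year, month) group, then pick the rows whose rank equals N. The sketch below uses made-up prices and treats every business day as a trading day (holidays are ignored, which is an assumption of this sketch only).

```python
import pandas as pd

# Toy daily prices on business days only (holidays are ignored in this sketch)
dates = pd.bdate_range("2024-08-01", "2025-02-28")
toy = pd.DataFrame({
    "stock": "AAA",
    "date": dates,
    "close": [100.0 + i for i in range(len(dates))],
})

toy["year"] = toy.date.dt.year
toy["month"] = toy.date.dt.month
# Rank the dates within each (stock, year, month): 1.0 is the first trading day
toy["dom_trading"] = toy.groupby(["stock", "year", "month"])["date"].rank()

N = 2  # plays the role of DOM_TRADING
start = toy[(toy.year == 2024) & (toy.month == 8) & (toy.dom_trading == N)].iloc[0]
end = toy[(toy.year == 2025) & (toy.month == 2) & (toy.dom_trading == N)].iloc[0]
mmt_6m = end["close"] / start["close"] - 1
print(start["date"].date(), end["date"].date(), round(mmt_6m, 4))
```

The two dates selected are the 2nd trading day of each month, and they fall on different calendar days (the 2nd and the 4th).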

In [15]:
df_p_history = pd.read_csv(file_name["price"])
df_p_history["date"] = pd.to_datetime(df_p_history.date.str[0:10])

## Remove stocks whose price data does not extend to the cutoff date
dt_cutoff = pd.to_datetime(DATE_CUTOFF)
df_p_history["max_date"] = df_p_history.groupby("stock")["date"].transform('max')
df_p_history = (df_p_history[df_p_history["max_date"] >= dt_cutoff])
max_date = df_p_history[df_p_history["max_date"] == df_p_history["date"]]
print("Data of last day: \n")
display(max_date['date'].value_counts())

## Get the start and end dates to compute the momentum
# The start date is the DOM_TRADING-th trading day of MONTH_PREV, and the end date is
# the DOM_TRADING-th trading day of MONTH_ACT. The two may not be the same calendar day of the month
df_p_history = df_p_history.sort_values(["stock", "date"])
df_p_history["year"] = df_p_history.date.dt.year
df_p_history["month"] = df_p_history.date.dt.month
df_p_history["dom_trading"] = df_p_history.groupby(["stock", "year", "month"])[
    "date"
].rank()
df_p_history_prev = df_p_history[
    (df_p_history.month == MONTH_PREV) & (df_p_history.dom_trading == DOM_TRADING)
].copy()
df_p_history_curr = df_p_history[
    (df_p_history.month == MONTH_ACT) & (df_p_history.dom_trading == DOM_TRADING)
].copy()

## Use year_prev to join the two datasets.
# For the 6-month range, if MONTH_ACT is Jul-Dec, the year (year_prev) of MONTH_PREV is
# the same year; otherwise it's the previous year
df_p_history_curr["year_prev"] = (
    df_p_history_curr.year if MONTH_ACT - 6 > 0 else df_p_history_curr.year - 1
)

df_p_history_prev["year_prev"] = df_p_history_prev.year
cols = ["date", "close", "stock", "year_prev"]
df_p_history_mmt = pd.merge(
    df_p_history_prev[cols],
    df_p_history_curr[cols + ["year"]],
    on=["stock", "year_prev"],
    suffixes=["_prev", ""],
)

## Get the momentum
df_p_history_mmt["stock_price_mmt_6m"] = (
    df_p_history_mmt["close"] / df_p_history_mmt["close_prev"] - 1
)

Data of last day: 



date
2025-02-26    504
Name: count, dtype: int64

## Get top N by THIS YEAR'S market cap

In [16]:
# Choose only the top N companies of SP500 to start with
df_rank = pd.read_excel('sp500_fulllist_ranked.xlsx', engine='openpyxl', sheet_name=str(YEAR_ACT))
top_stocks = df_rank[df_rank['rank'] <= N_TOP_BY_MKT_CAP].stock.values
df_p_history_mmt = df_p_history_mmt[df_p_history_mmt.stock.isin(top_stocks)]

print(f'{df_p_history_mmt.stock.nunique()} of stocks after filtering by MKT CAP')

299 of stocks after filtering by MKT CAP


In [17]:
print(
    f"The stock with the lowest market cap is {df_rank[df_rank['rank'] == N_TOP_BY_MKT_CAP].stock.values}"
    )

The stock with the lowest market cap is ['FTV']


# Apply the magic formula for a certain year
## Look for anomalous mmt to detect splits

In [18]:
df_row = df_p_history_mmt[df_p_history_mmt.year == YEAR_ACT]

In [19]:
## Check mmt anomalies manually
display(df_row.sort_values('stock_price_mmt_6m').head())
display(df_row.sort_values('stock_price_mmt_6m').tail())

# # Check stocks with super low mmt on tradingview to see if they had a split
# STOCK_SPLIT = ['SRE', 'CPRT' ]  # And their momentum is low; just remove them
# stocks_split = STOCK_SPLIT
# df_row = df_row[~df_row.stock.isin(stocks_split)]

Unnamed: 0,date_prev,close_prev,stock,year_prev,date,close,year,stock_price_mmt_6m
6573,2024-08-23,847.37,LRCX,2024,2025-02-26,81.3,2025,-0.904056
10116,2024-08-23,269.18,TSCO,2024,2025-02-26,55.14,2025,-0.795156
2700,2024-08-23,788.5,CTAS,2024,2025-02-26,204.31,2025,-0.740888
726,2024-08-23,355.13,ANET,2024,2025-02-26,96.38,2025,-0.728606
8094,2024-08-23,350.75,PANW,2024,2025-02-26,189.55,2025,-0.459587


Unnamed: 0,date_prev,close_prev,stock,year_prev,date,close,year,stock_price_mmt_6m
1019,2024-08-23,370.7,AXON,2024,2025-02-26,572.4,2025,0.544106
10606,2024-08-23,85.77,VST,2024,2025-02-26,148.19,2025,0.72776
4585,2024-08-23,183.29,GEV,2024,2025-02-26,335.24,2025,0.829014
10255,2024-08-23,43.32,UAL,2024,2025-02-26,97.4,2025,1.248384
8507,2024-08-23,31.78,PLTR,2024,2025-02-26,89.31,2025,1.810258


## Filter by momentum

In [20]:
mmt_var = MMT_VAR
top_by_mmt = TOP_BY_MMT
df_top_by_mmt = (
    df_row.sort_values(mmt_var, ascending=False).iloc[: round(len(df_row) * top_by_mmt), :]
)
print(f'{df_top_by_mmt.stock.nunique()} of stocks after filtering by momentum')

60 of stocks after filtering by momentum


## Download price book ratio of filtered stocks

In [21]:
## A test case
# df_sp500_list_2023 = pd.read_excel('magic_stocks.xlsx', engine='openpyxl', sheet_name='2023')
# stock_list = df_sp500_list_2023.stock.values
# print(stock_list)

stock_list = df_top_by_mmt.stock.unique().tolist()

In [None]:
# Method 1: use dolthub
# with open('dolthub.key', 'r') as f:
#     key = f.readline().strip()

# repo = "earnings"
# query = f"""
# SELECT act_symbol as stock, `date`, book_value_per_share
# FROM `balance_sheet_equity`
# WHERE act_symbol IN ('{"', '".join(stock_list)}')
#     AND period = 'Quarter'
#     AND `date` > '{YEAR_ACT}-01-01' 
# ORDER BY `date` DESC
# LIMIT 1000;
# """

# def get_data(query, repo, key, owner="post-no-preference", branch="master", timeout=40):
#     res = requests.get(
#         f"https://www.dolthub.com/api/v1alpha1/{owner}/{repo}/{branch}",
#         params={"q": query},
#         headers={"authorization": f"token {key}" },
#         timeout=timeout,
#     )
#     try:
#         res = res.json()
#         return pd.DataFrame(res['rows'])
#     except Exception as e:
#         return e, res


# df_bv0 = get_data(query, repo, key)
# df_bv0['book_value_per_share'] = df_bv0['book_value_per_share'].astype(float)
# df_bv0.to_csv(f"dolthub_bv_{str(YEAR_ACT)}.csv", index=False)


# Method 2: use yfinance
dataframes = []
for stock in stock_list:
    ticker = yf.Ticker(stock)
    quarterly_balance_sheet_latest = ticker.quarterly_balance_sheet.iloc[:, 0:2].T
    bv_stock = (quarterly_balance_sheet_latest["Stockholders Equity"] / quarterly_balance_sheet_latest["Ordinary Shares Number"]).to_frame().reset_index()
    bv_stock.columns = ['date', 'book_value_per_share']
    bv_stock['stock'] = stock
    dataframes.append(bv_stock)
    # # Fetch the current market price
    # price_latest = ticker.history(period="1d")["Close"]
    # market_price = price_latest.iloc[-1]
    # market_price_date = price_latest.index.strftime('%Y-%m-%d')[-1]

    # # Calculate Price-to-Book Ratio
    # price_to_book_ratio = market_price / book_value_per_share

    # Print the results
    # print(f"Book Value: {book_value}")
    # print(f"Book Value per Share: {book_value_per_share:.2f}")
    # print(f"Market Price: {market_price:.2f}")
    # print(f"Price-to-Book Ratio: {price_to_book_ratio:.2f}")
    
    time.sleep(3)

df_bv0 = pd.concat(dataframes, ignore_index=True)

In [92]:
df_bv = df_bv0.copy()
df_bv['book_value_per_share'] = df_bv['book_value_per_share'].astype(float)

# Get BV of the latest quarter
df_bv['latest'] = df_bv.groupby('stock')['date'].transform('max')
df_bv = df_bv[df_bv.date == df_bv.latest].drop('latest', axis=1)
df_stock_sub = get_quarter(df_bv)
df_stock_sub = df_stock_sub.drop(['DATE', 'dayofyear', ], axis=1).sort_values(['date', 'stock'])

## Check if there are any anomalous PBs
## Replace inf PB with 0
# df_stock_sub = df_stock_sub.replace(np.inf, 0)
# Remove stocks with over MAX_QUARTERS_NEG_PB quarters with neg equity (pb)
# df_stock_sub_neg = df_stock_sub[df_stock_sub.price_to_book_ratio < 0].groupby('stock').size()
# stocks_sub_neg = df_stock_sub_neg[df_stock_sub_neg >= MAX_QUARTERS_NEG_PB].index.values
# df_stock_sub = df_stock_sub[~df_stock_sub.stock.isin(stocks_sub_neg)]
# print(f'{df_stock_sub.stock.nunique()} stocks remained after filtering by neg BV')

## Get data for the quarter needed - obsolete
# df_pb_history = df_stock_sub
# df_pb_quarter = (df_pb_history[df_pb_history.index.str.endswith(QUARTER_BV)][['date', 'book_value_per_share', 'stock', 'year']]
#                      .reset_index(drop=True)
#                      .rename({'year': 'year_prev'}, axis=1)
#                 )

df_pb_quarter = (df_stock_sub[['date', 'book_value_per_share', 'stock', 'year']]
                     .reset_index(drop=True)
                     .rename({'year': 'year_prev'}, axis=1)
                )

# Filter out non-positive BV
df_pb_quarter = df_pb_quarter[df_pb_quarter['book_value_per_share'] > 0]

# df_pb_history is only defined in the obsolete (commented-out) block above, so compare against df_stock_sub
print(f'{df_pb_quarter.stock.nunique()} stocks with the right quarter and positive BV', 
      f'\nStocks without {QUARTER_BV} data or with non-positive BV are {set(df_stock_sub.stock.unique()) - set(df_pb_quarter.stock.unique())}'
     )

55 stocks with the right quarter and positive BV 
Stocks without Q3 data or with non-positive BV are {'BKNG', 'SBUX', 'MAR', 'PM', 'HLT'}


### Manually check and fill stocks with BV failure
I filled in data for the last quarter only, since it is used for prediction only.
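
A minimal sketch of such a manual fill, assuming the target table has the columns `date`, `book_value_per_share`, `stock` and `year_prev`. All tickers and numbers below are placeholders; real values would be looked up by hand.

```python
import pandas as pd

# Toy stand-in for df_pb_quarter; all tickers and numbers are placeholders
df_pb_toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-12-31", "2024-12-31"]),
    "book_value_per_share": [27.5, 14.8],
    "stock": ["AAA", "BBB"],
    "year_prev": [2024, 2024],
})

# Values looked up by hand for stocks whose download failed (hypothetical)
manual_bv = {"CCC": 48.95, "DDD": 15.26}

df_manual = pd.DataFrame({
    "date": df_pb_toy["date"].max(),
    "book_value_per_share": list(manual_bv.values()),
    "stock": list(manual_bv.keys()),
    "year_prev": df_pb_toy["year_prev"].max(),
})

# pd.concat is used because DataFrame.append was removed in pandas 2.0
df_pb_filled = pd.concat([df_pb_toy, df_manual], ignore_index=True)
print(df_pb_filled)
```

Building the manual rows from a dict keeps each ticker next to its value, which makes a mismatch between the two harder to introduce than with two parallel lists.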

In [94]:
# df_pb_quarter

In [95]:
# stocks_failed_pb = ['GM', 'HSY', 'CSGP', 'STT' ]  #  anom['Failed_P']
# stocks_failed_pb_bv = [48.95, 15.26, 16.49, 69.7]   # get from https://www.macrotrends.net/

# df_stocks_failed_pb = pd.DataFrame([
#     [df_pb_quarter.date.max()] * len(stocks_failed_pb),
#     [np.nan] * len(stocks_failed_pb),
#     stocks_failed_pb_bv,
#     stocks_failed_pb,
#     [df_pb_quarter.year_prev.max()] * len(stocks_failed_pb)
# ]).T
# df_stocks_failed_pb.columns = df_pb_quarter.columns

# df_pb_quarter = pd.concat([df_pb_quarter, df_stocks_failed_pb], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
# print(f'{df_pb_quarter.stock.nunique()} stocks after manually filling in those failing in PB download')

## Merge Data

In [97]:
# Original code compatible for backtest
# df_p_pb = pd.merge(df_p_history_mmt[['stock', 'year_prev', 'date', 'close', 
#                                      'year', 'stock_price_mmt_6m']], 
#                    df_pb_quarter, 
#                    on=['stock', 'year_prev'], 
#                    suffixes=['', '_pb'])\
#             .rename({'close': 'stock_price'}, axis=1)\
#             .drop(['year_prev'], axis=1)

df_p_pb = pd.merge(df_top_by_mmt[['stock', 'date', 'close', 'year', 'stock_price_mmt_6m']], 
                   df_pb_quarter, 
                   on=['stock'], 
                   suffixes=['', '_pb'])\
            .rename({'close': 'stock_price'}, axis=1)\
            .drop(['year_prev'], axis=1)
df_p_pb['price_to_book_ratio'] = df_p_pb['stock_price'] / df_p_pb['book_value_per_share']

## Pick stocks

In [None]:
stocks_invested = pick_stocks(df_p_pb, 
                              cash_to_invest=TOTAL_CASH,
                              n_stocks=N_STOCKS, 
                              mmt_var=MMT_VAR, 
                             )
stocks_invested = stocks_invested.round(2)
stocks_invested.sort_values('stock').reset_index(drop=True)

# Copy to magic_stocks.xlsx
# Compare with https://finviz.com/screener.ashx?v=150&f=idx_sp500,ta_perf_26w20o&ft=4&o=pb

Unnamed: 0,stock,shares,stock_price,price_to_book_ratio,book_value_per_share,stock_price_mmt_6m_pct
0,ABT,45.6,135.96,4.94,27.52,21.0
1,AMZN,28.92,214.35,7.94,27.0,21.0
2,APO,41.99,147.65,4.84,30.5,32.0
3,BK,71.81,86.34,1.49,57.75,30.0
4,BKR,142.5,43.51,2.55,17.07,24.0
5,BSX,61.05,101.56,6.88,14.76,28.0
6,C,78.41,79.07,0.71,111.13,27.0
7,CBRE,44.07,140.68,5.05,27.85,20.0
8,CEG,22.95,270.14,6.42,42.09,39.0
9,CME,25.0,247.99,3.16,78.47,20.0


### Sanity check
- Use the cell below to check at which step a stock is removed
- If a stock is in the finviz filter but not here, it's possibly due to
  - Filter of Market Cap (only the top N is used)
  - Filter of PB (only the top M is used)
- If a stock is here but not in the finviz filter, it's possibly due to
  - Filter of momentum (e.g., mmt < 30%)

finviz filter: https://finviz.com/screener.ashx?v=151&f=idx_sp500,ta_perf_26w30o&ft=4&o=ticker&r=41
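
One way to automate this check is to walk through the filtering stages in order and report the first one that no longer contains the stock. The helper below, `first_missing_stage`, is a hypothetical sketch with toy tickers; in the notebook the stage contents would come from the `stock` column of the corresponding dataframes.

```python
def first_missing_stage(stock, stages):
    """Return the name of the first stage whose ticker collection lacks `stock`.

    `stages` is an ordered mapping of stage name -> iterable of tickers;
    returns None when the stock survives every stage.
    """
    for name, members in stages.items():
        if stock not in set(members):
            return name
    return None

# Toy stages; tickers are made up
stages = {
    "sp500 list": ["AAA", "BBB", "CCC"],
    "market-cap filter": ["AAA", "BBB"],
    "momentum filter": ["AAA"],
}
print(first_missing_stage("BBB", stages))
```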

In [76]:
stock = 'AAPL'
print(stock in df_sp500_list.stock.values,
      stock in df_p_history.stock.values,
      stock in df_p_history_mmt.stock.values,
      stock in df_row.stock.values,
      stock in df_top_by_mmt.stock.values,
      )
df_mmt = df_row.sort_values(mmt_var, ascending=False).reset_index(drop=True)
print(f'Rank of {stock} by momentum: {df_mmt[df_mmt.stock == stock].index.values[0]}')

True True True True False
Rank of AAPL by momentum: 131


# Backup

In [27]:
import yfinance as yf


ticker = yf.Ticker("MSFT")

quarterly_balance_sheet = ticker.quarterly_balance_sheet

# Book value
quarterly_balance_sheet_latest = quarterly_balance_sheet.iloc[:, 0]
shares_outstanding = quarterly_balance_sheet_latest["Ordinary Shares Number"]
book_value = quarterly_balance_sheet_latest["Stockholders Equity"]
book_value_date = quarterly_balance_sheet.columns[0]  # report dates are the columns; the index holds line items

# Calculate Book Value per Share
book_value_per_share = book_value / shares_outstanding

# Fetch the current market price
price_latest = ticker.history(period="1d")["Close"]
market_price = price_latest.iloc[-1]
market_price_date = price_latest.index.strftime('%Y-%m-%d')[-1]

# Calculate Price-to-Book Ratio
price_to_book_ratio = market_price / book_value_per_share

# Print the results
print(f"Book Value: {book_value}")
print(f"Book Value per Share: {book_value_per_share:.2f}")
print(f"Market Price: {market_price:.2f}")
print(f"Price-to-Book Ratio: {price_to_book_ratio:.2f}")

## Use API to download fundamental data

In [None]:
with open('api.my', 'r') as hf:
    fmg_api_key = hf.read().strip()  # financialmodelingprep, no quarterly data
def get_jsonparsed_data(url):
    headers = {
        'Content-Type': 'application/json'
    }
    requestResponse = requests.get(url, headers=headers)
    return requestResponse.json()
# def get_jsonparsed_data(url):
#     response = urlopen(url, cafile=certifi.where())
#     data = response.read().decode("utf-8")
#     return json.loads(data)


# APIs
av_api_key = ''  # alphavantage, Working. Seems not straightforward to cancel subscription
pg_api_key = ''  # Polygon, can't retrieve fundamental data
tg_api_key = ''  # tiingo, fundamental data requires add-on and monthly subs
# Other apis: https://www.fmpcloud.io/plans/, https://data.nasdaq.com/databases/SF1#usage, https://github.com/theOGognf/finagg (promising)
# More at https://www.reddit.com/r/algotrading/comments/1akqgki/is_there_an_inexpensive_api_for_company/, 
# https://www.reddit.com/r/algotrading/comments/122ixxl/cheapest_quote_and_historical_data_provider_for_a/


bv_raw = {}
bv_list = []
stock_list = df_top_by_mmt['stock'].unique()
for symbol in stock_list[:1]:
    # url = (f"https://financialmodelingprep.com/api/v3/balance-sheet-statement/{symbol}?period=annual&apikey={fmg_api_key}")
    # url = f"https://www.alphavantage.co/query?function=BALANCE_SHEET&symbol={symbol}&apikey={av_api_key}"
    url = f"https://api.tiingo.com/tiingo/fundamentals/{symbol}/statements?token={tg_api_key}"
    bv_raw[symbol] = get_jsonparsed_data(url)
    for data in bv_raw[symbol]:
        balance_sheet = {i['dataCode']: i['value'] for i in data['statementData']['balanceSheet'] if i['dataCode'] in ['equity', 'sharesBasic']}
        bv_list.append((symbol, data['date'], balance_sheet['equity'], balance_sheet['sharesBasic']))
df_bv = pd.DataFrame(bv_list, columns=('stock', 'date', 'book_value', 'shares_outstanding'))

In [None]:
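# Market Cap data 
us_stock_em = ak.stock_us_spot_em()
# '代码' is the code column (prefix.ticker); '总市值' is the total market cap
us_stock_em[['symbol_prefix', 'symbol']] = us_stock_em['代码'].str.split('.').tolist()
us_stock_em = us_stock_em.rename({'总市值': 'mktCap'}, axis=1)
# df_bv stores the ticker in 'stock', while us_stock_em stores it in 'symbol'
df_merged = pd.merge(df_bv, us_stock_em[['mktCap', 'symbol']], left_on='stock', right_on='symbol')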
df_merged['pb'] = df_merged['mktCap'] / df_merged['book_value']
df_merged

## Backtest
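
The comparison against SPY at the end of this section expresses both the portfolio and the benchmark as a multiple of their starting-year value. A minimal sketch of that normalization on made-up numbers (the values and the `beats_benchmark` column are illustrative only):

```python
import pandas as pd

# Toy yearly values on the action day; numbers are made up
portfolio = pd.DataFrame({"year": [2011, 2012, 2013], "value": [40_000, 46_000, 52_000]})
benchmark = pd.DataFrame({"year": [2011, 2012, 2013], "close": [130.0, 136.5, 150.8]})

# Express both series as a multiple of their starting-year value
portfolio["return_overall"] = portfolio["value"] / portfolio["value"].iloc[0]
benchmark["return_overall"] = benchmark["close"] / benchmark["close"].iloc[0]

df_cmp = portfolio.merge(
    benchmark[["year", "return_overall"]], on="year", suffixes=("", "_spy")
)
df_cmp["beats_benchmark"] = df_cmp["return_overall"] > df_cmp["return_overall_spy"]
print(df_cmp)
```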

In [12]:
# Remove data with too short length and too late starting date
df_p_pb_size = df_p_pb.groupby('stock').size() 
df_p_pb_cut = df_p_pb[(df_p_pb.year >= YEAR_START) & 
                      (df_p_pb.stock.isin(df_p_pb_size[df_p_pb_size >= MIN_YEARS_TEST].index.values))]
print(f'{df_p_pb_cut.stock.nunique()} of stocks after filtering by length and starting date')

286 of stocks after filtering by length and starting date


In [50]:
top_by_mmt = TOP_BY_MMT
rsl_lt = pd.DataFrame()
# Number of stocks to choose by PB
for n_stocks in [30, 40, 50]:  
    # Invest method:
    # - At the beginning, get the top_by_mmt fraction of stocks ranked by MMT, 
    #   And choose the top n_stocks stocks ranked by PB
    # - After each holding period (PERIOD_HOLD), sell all stocks, and repurchase 
    #   with the initial methods
    cash_to_invest = CASH_FOR_EACH_STOCK * n_stocks
    cash_to_invest_prev = cash_to_invest

    # Run over time
    is_start = True
    period_cnt = 1
    for k, df_row in df_p_pb_cut.groupby('year'):    

        # Initialize
        if is_start:
            stocks_invested = pick_stocks(df_row, 
                              cash_to_invest=cash_to_invest, 
                              n_stocks=n_stocks, 
                              mmt_var=MMT_VAR, 
                              top_by_mmt=top_by_mmt
                             )
            is_start = False
            continue
        
        df_start = pd.merge(df_row[['stock', 'stock_price']], stocks_invested, on='stock', suffixes=('', '_bought'))
        values = (df_start['shares'] * df_start['stock_price']).sum()
        
        # Take action
        cash_to_invest_prev = cash_to_invest
        # Tax
        if values > cash_to_invest_prev:
            cash_to_invest = (values - cash_to_invest_prev) * TAX_FACTOR + cash_to_invest_prev
        else:
            cash_to_invest = values
        stocks_invested = pick_stocks(df_row, 
                                      cash_to_invest=cash_to_invest, 
                                      n_stocks=n_stocks, 
                                      mmt_var=MMT_VAR, 
                                      top_by_mmt=top_by_mmt
                                     )
        # DataFrame.append was removed in pandas 2.0; use pd.concat instead
        rsl_lt = pd.concat([
            rsl_lt,
            pd.DataFrame([k, round(values), 
                          round(values / (CASH_FOR_EACH_STOCK * n_stocks), 3), len(df_start),
                          n_stocks, MMT_VAR, 'lt_1', ], 
                         index=['year', 'value', 'return_overall', 'n_stocks_actual',
                                'n_stocks', 'MMT_VAR', 'method', ]).T,
        ])

        period_cnt += 1
        

In [51]:
# Get SPY history
spy0 = ak.stock_us_daily(symbol="SPY", adjust="").reset_index()
spy = spy0.copy()
spy['year'] = spy.date.dt.year
spy['month'] = spy.date.dt.month
spy['dom_trading'] = spy.groupby(['year', 'month'])['date'].rank()

# Get the DOM_TRADING-th trading day of MONTH_ACT in each year
spy_curr = spy[(spy.month == MONTH_ACT) & (spy.dom_trading == DOM_TRADING)]

# .copy() avoids SettingWithCopyWarning when adding the column below
spy_curr_cut = spy_curr[spy_curr.year >= YEAR_START].copy()
spy_curr_cut['return_overall'] = (spy_curr_cut['close'] / 
                                  spy_curr_cut[spy_curr_cut.year == YEAR_START]['close']
                                  .values[0]
                                 ).round(2)

In [52]:
df_compare = pd.merge(rsl_lt, spy_curr_cut[['year', 'return_overall']], 
                      on='year', 
                      suffixes=['', '_spy']).sort_values(['n_stocks', 'year'])
df_compare[df_compare.year.isin(
    [2020, 2021, 2022, 2023]
#     range(2011, 2024)
)][
    ['year', 'n_stocks_actual', 'n_stocks', 'return_overall', 'return_overall_spy']
].sort_values(['year', 'n_stocks'])

Unnamed: 0,year,n_stocks_actual,n_stocks,return_overall,return_overall_spy
24,2020,30,30,2.255,2.48
25,2020,40,40,2.805,2.48
26,2020,50,50,3.042,2.48
27,2021,30,30,3.059,2.88
28,2021,40,40,3.664,2.88
29,2021,50,50,4.032,2.88
30,2022,30,30,5.077,3.46
31,2022,40,40,5.6,3.46
32,2022,50,50,5.846,3.46
33,2023,30,30,5.259,3.14


In [54]:
# Rank n_stocks settings by return within each year; a higher mean rank means better returns in more years
df_compare['return_overall'] = df_compare['return_overall'].astype(float)
df_compare['rank_n_stock'] = \
    df_compare.groupby('year')['return_overall'].rank()
df_compare.groupby('n_stocks')['rank_n_stock'].mean()

n_stocks
30    1.0
40    2.5
50    2.5
Name: rank_n_stock, dtype: float64