# Trade
- Swing trading/long-term trading
    - Exposed to overnight risk (the previous day's close may differ from the next
    day's open if major events happen between market close and market open).
- Assume I already know which days to go long and which days to go short
- Conduct post-trade analysis
- Refine risk management techniques (stock price changes compared starting from 2023-12-22)
    - Boeing: Main character in the events
        - Stock -18.61%

    - Direct competitors
        - Airbus (EPA: AIR): Boeing's primary competitor in commercial aircraft manufacturing
            - Stock +5.93%
        - Lockheed Martin (LMT): More focused on defense but also competes in aerospace
    - Suppliers
        - General Electric (GE): Supplies engines for Boeing aircraft
            - Have presence in aviation, healthcare, power, renewable energy
            - Doesn't seem to be affected
            - Can also supply engines to other aircraft manufacturers (effect on
            stock price is complicated)
    - Customers
        - Alaska Airlines (ALK): Main airline involved
            - Stock -11.73%
        - United Airlines (UAL - NasdaqGS)
            - Stock -4.91%
        - Delta Air Lines (DAL)
            - -11.73%
        - Southwest Airlines
- Trading timing (NYSE) vs news timing
    - The news was updated on January 18, 2024, at 4:36 AM GMT+8, which translates to January 17, 2024, at 3:36 PM Eastern Time (since GMT+8 is 13 hours ahead of Eastern Time). Since the NYSE closes at 4:00 PM ET, this news would have come out just before the market close.
    - Different stock exchanges may also operate at different hours
- No separate training and validation phases: go straight to validation (backtesting)
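
The time-zone arithmetic in the news-timing note can be checked with the standard library. This is a minimal sketch: `Asia/Singapore` is an assumed stand-in for "GMT+8", and the 16:00 close is the regular NYSE session end mentioned above.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# News timestamp from the note above, in a GMT+8 zone.
# "Asia/Singapore" is an assumption standing in for GMT+8.
news_local = datetime(2024, 1, 18, 4, 36, tzinfo=ZoneInfo("Asia/Singapore"))

# Convert to US Eastern time, the zone the NYSE trades in.
news_et = news_local.astimezone(ZoneInfo("America/New_York"))

# Regular NYSE close on the same Eastern calendar day.
close_et = news_et.replace(hour=16, minute=0, second=0, microsecond=0)
minutes_before_close = (close_et - news_et).total_seconds() / 60

print(news_et.isoformat(), minutes_before_close)
```

Using named zones rather than a fixed 13-hour offset keeps the conversion right when US daylight saving time is in effect, when the gap is 12 hours.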



# Set Up

In [1]:
import os
import ast
import requests
import logging

import yfinance as yf
from backtesting import Backtest, Strategy
import pandas as pd
import numpy as np
import finnhub
from dotenv import load_dotenv
from pathlib import Path    
import sys
import time
sys.path.append('../') # Change the python path at runtime

# Self-created modules
from src.utils import path as path_yq



In [208]:
load_dotenv()
POLYGON_API_KEY = os.environ.get('POLYGON_API_KEY')

BT_START_DATE = '2023-11-01'
BT_START_STR = '20231101'
BT_END_DATE = '2024-01-31'
BT_END_STR = '20240131'

cur_dir = Path.cwd()
root_dir = path_yq.get_root_dir(cur_dir)

logging.basicConfig(filename=Path.joinpath(root_dir, 'logs', 'trading_system.log'),
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.DEBUG)

# Fetch Tick Data

## Polygon

Polygon docs: https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to

- FIXME: The timings include those in pre-market hours
- The timestamp is in ms, not sec

Similar to the data-download code.
TODO: Assumption: other stocks share the same timezone

- The data is incomplete (not every minute has a bar), e.g. there is no bar between 21:56 and 22:00:

Unnamed: 0,Volume,VWAP,Open,Close,High,Low,Timestamp,Transactions,Datetime
24265,47739.0,217.6996,217.6800,217.7000,217.7000,217.6800,1705096560000,21,2024-01-12 21:56:00
24266,171.0,217.5423,217.5000,217.5000,217.5000,217.5000,1705096800000,5,2024-01-12 22:00:00
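
One possible way to address the pre-market FIXME above is to convert the millisecond timestamps to Eastern time and keep only the regular session. This is a sketch on a hypothetical three-row sample, not the notebook's data: the `bars` frame and two of its close prices are made up, and only the column names `t` and `c` follow the Polygon response fields used below.

```python
import pandas as pd

# Hypothetical sample of Polygon-style bars; 't' is Unix time in milliseconds.
bars = pd.DataFrame({
    "t": [1704188280000, 1704205800000, 1704229200000],
    "c": [260.30, 259.00, 251.00],  # illustrative close prices
})

# unit='ms' matters: the timestamps are milliseconds, not seconds.
bars["Datetime"] = pd.to_datetime(bars["t"], unit="ms", utc=True)
bars["Datetime_ET"] = bars["Datetime"].dt.tz_convert("America/New_York")

# Regular NYSE session: 09:30 (inclusive) to 16:00 (exclusive) Eastern.
bars = bars.set_index("Datetime_ET")
regular = bars.between_time("09:30", "16:00", inclusive="left")

print(regular)
```

The first sample row (09:38 UTC, i.e. 04:38 Eastern) is a pre-market bar and is dropped by the filter. Exchange holidays and early closes are not handled here.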

In [122]:
url = f"https://api.polygon.io/v2/aggs/ticker/BA/range/1/minute/{BT_START_DATE}/{BT_END_DATE}?adjusted=true&sort=asc&limit=50000&apiKey={POLYGON_API_KEY}"

# Make the GET request
resp = requests.get(url)

In [123]:
# Check if the request was successful
if resp.status_code == 200:
    # Convert the 'results' list to a DataFrame
    df = pd.DataFrame(resp.json().get('results'))

    # Rename the columns to more descriptive names
    column_mapping = {
        "v": "Volume",
        "vw": "VWAP",
        "o": "Open",
        "c": "Close",
        "h": "High",
        "l": "Low",
        "t": "Timestamp",
        "n": "Transactions"
        # Add more mappings as necessary
    }

    df.rename(columns=column_mapping, inplace=True)

    # Optionally, convert the 'Timestamp' column from Unix milliseconds to a datetime format
    df['Datetime'] = pd.to_datetime(df['Timestamp'], unit='ms')

    # Display the updated DataFrame
    print(df)
else:
    # Handle errors (e.g., logging, raising an exception)
    print(f"Error fetching data: {resp.status_code}, {resp.text}")



      Volume      VWAP    Open   Close    High     Low      Timestamp  \
0     2881.0  260.2149  260.35  260.30  260.35  260.10  1704188280000   
1      679.0  260.2014  260.23  260.23  260.23  260.23  1704188340000   
2      174.0  260.0000  260.00  260.00  260.00  260.00  1704189900000   
3      200.0  260.0000  260.00  260.00  260.00  260.00  1704190080000   
4     2241.0  259.2437  259.21  259.10  259.35  259.10  1704190800000   
...      ...       ...     ...     ...     ...     ...            ...   
5204   632.0  217.2991  217.30  217.30  217.30  217.30  1705107000000   
5205   425.0  217.3412  217.35  217.35  217.35  217.35  1705107060000   
5206   836.0  217.3015  217.30  217.30  217.30  217.30  1705107300000   
5207  2492.0  217.0491  217.16  217.02  217.16  217.00  1705107420000   
5208   356.0  217.1003  217.10  217.10  217.10  217.10  1705107540000   

      Transactions            Datetime  
0               56 2024-01-02 09:38:00  
1               25 2024-01-02 09:39:00  


In [124]:
# Boeing open high low close data
raw_path = Path.joinpath(root_dir, 'data', 'raw', 'BA_OHLC_20231101_to_20240131.csv')
df.to_csv(raw_path, index=False)

## Yahoo

In [None]:
# Define the ticker list
ticker_list = ['BA']

# Fetch the data
dl_data = yf.download(ticker_list, start=BT_START_DATE, end=BT_END_DATE) # Auto adjust is false

dl_data = pd.DataFrame(dl_data)
data = dl_data.drop(columns=['Close'], axis=1)
data = data.rename(columns={'Adj Close': 'Close'})
display(data.isna().sum(axis=0))  # axis=0 sums down the rows, giving one NaN count per column
data.index = pd.to_datetime(data.index)
data


In [None]:
dates = pd.DataFrame(data.index.strftime('%Y-%m-%d'))
# dates.to_csv("trading_dates.csv", index=False)

In [None]:
# After performing sentiment
stm_path = root_dir.joinpath('data', 'proc', 'boeing_stm_20231101_to_20240131.csv')
news = pd.read_csv(stm_path, index_col=False)
news2 = news[['datetime2', 'news_pol_blob']].copy()  # Copy so later column assignments do not hit a view of news
news2

news2.plot()
# # data['Sentiment'] = np.random.random(len(data)) * 2 - 1
# display(len(data))
# sentiment = np.array([0, -1, -0.8, 0, 0, 0]) # Put -1 on 01-05 (Before the whole thing Boeing case appeared after market closed on 01-05 to prepare to trade for 01-08)
# data['Sentiment'] = sentiment
# display(data.tail(20))

In [None]:
# Ensure datetime2 in news2 is in pandas datetime format
news2['datetime2'] = pd.to_datetime(news2['datetime2'])

# Assuming data.index is already a DatetimeIndex, no need to convert it again
# Just ensure it's sorted
data.sort_index(inplace=True)

# Function to find the closest previous date in data for each date in news2
def find_closest_previous_date(target_date, date_index):
    previous_dates = date_index[date_index <= target_date]
    if not previous_dates.empty:
        return previous_dates.max()
    else:
        return pd.NaT  # Return Not-A-Time (NaT) if no previous date is found

# Apply the function to each date in news2['datetime2']
closest_dates = news2['datetime2'].apply(lambda x: find_closest_previous_date(x, data.index))

# Add this closest date information to news2
news2['closest_date'] = closest_dates
news2
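
The row-by-row lookup above could also be expressed with `pd.merge_asof`, which matches each news timestamp to the last price date at or before it in one vectorised call. This is a sketch on hypothetical miniature frames (`prices`, `news_demo`), not a replacement tested against the notebook's data.

```python
import pandas as pd

# Hypothetical miniature inputs: price dates and news timestamps.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-04", "2024-01-05", "2024-01-08"]),
})
news_demo = pd.DataFrame({
    "datetime2": pd.to_datetime(
        ["2024-01-03 10:00", "2024-01-05 18:00", "2024-01-06 12:00"]
    ),
    "news_pol_blob": [0.1, -0.4, -0.2],
})

# Keep a copy of the key so the matched price date is explicit after the merge.
prices["closest_date"] = prices["Date"]

# Both frames must be sorted on their keys for merge_asof.
matched = pd.merge_asof(
    news_demo.sort_values("datetime2"),
    prices.sort_values("Date"),
    left_on="datetime2",
    right_on="Date",
    direction="backward",  # last price date <= news timestamp
)

print(matched[["datetime2", "closest_date", "news_pol_blob"]])
```

News that predates every price date gets `NaT`, matching the behaviour of the `apply`-based function, and weekend news maps to the preceding Friday.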

In [None]:
# TODO: Need to think of how to combine the data (might have many neutral etc.)
# as_index=False keeps closest_date as a regular column instead of the index
news3 = news2.groupby('closest_date', as_index=False)['news_pol_blob'].mean()
news3

In [None]:
merged = pd.merge(data, news3, left_on='Date', right_on='closest_date', how='left')
merged

In [None]:
# Clean for 2 lines only
merged2 = merged.dropna().reset_index(drop=True)
merged2

# Merge data

In [126]:

def convert_data(row):
    """
    A function from sentiment.ipynb.

    Parse a cell that holds a Python literal stored as text (e.g. "[0.0, -0.1]")
    back into the corresponding object; return the cell unchanged if it cannot be parsed.
    """
    try:
        # literal_eval handles lists as well as scalars (int, float, etc.)
        return ast.literal_eval(row)
    except (ValueError, SyntaxError):
        # Not a valid Python literal (e.g. free text or a datetime string): keep as-is
        return row
    except Exception as e:
        print(f'Exception: {e}')
        return row

# TODO:
score_path = root_dir.joinpath('data', 'proc', f'boeing_score_{BT_START_STR}_to_{BT_END_STR}.csv') # TODO: Change dates
df9 = pd.read_csv(score_path, index_col=False)

# Apply the conversion function to each specified column
for col in df9.columns:
    df9[col] = df9[col].apply(convert_data)
df9['datetime2'] = pd.to_datetime(df9['datetime2'])

# print(df8.equals(df7))
# print(type(df8['datetime2'][0]))

In [125]:
# Boeing open high low close data
raw_path = Path.joinpath(root_dir, 'data', 'raw', 'BA_OHLC_20231101_to_20240131.csv')
tick = pd.read_csv(raw_path, index_col=False)
# Parse before sorting so the ordering is chronological rather than lexical
tick['Datetime'] = pd.to_datetime(tick['Datetime'])
tick = tick.sort_values(by='Datetime')
# BT_START_DATE / BT_END_DATE are the constants defined in Set Up
tick = tick[(tick['Datetime'] >= BT_START_DATE) & (tick['Datetime'] <= BT_END_DATE)]
tick

Unnamed: 0,Volume,VWAP,Open,Close,High,Low,Timestamp,Transactions,Datetime
0,2881.0,260.2149,260.35,260.30,260.35,260.10,1704188280000,56,2024-01-02 09:38:00
1,679.0,260.2014,260.23,260.23,260.23,260.23,1704188340000,25,2024-01-02 09:39:00
2,174.0,260.0000,260.00,260.00,260.00,260.00,1704189900000,2,2024-01-02 10:05:00
3,200.0,260.0000,260.00,260.00,260.00,260.00,1704190080000,1,2024-01-02 10:08:00
4,2241.0,259.2437,259.21,259.10,259.35,259.10,1704190800000,53,2024-01-02 10:20:00
...,...,...,...,...,...,...,...,...,...
5204,632.0,217.2991,217.30,217.30,217.30,217.30,1705107000000,7,2024-01-13 00:50:00
5205,425.0,217.3412,217.35,217.35,217.35,217.35,1705107060000,3,2024-01-13 00:51:00
5206,836.0,217.3015,217.30,217.30,217.30,217.30,1705107300000,16,2024-01-13 00:55:00
5207,2492.0,217.0491,217.16,217.02,217.16,217.00,1705107420000,52,2024-01-13 00:57:00


In [5]:
tick[tick['Datetime'] >= '2024-01-12 21:47:23'].head()

Unnamed: 0,Volume,VWAP,Open,Close,High,Low,Timestamp,Transactions,Datetime
24265,47739.0,217.6996,217.68,217.7,217.7,217.68,1705096560000,21,2024-01-12 21:56:00
24266,171.0,217.5423,217.5,217.5,217.5,217.5,1705096800000,5,2024-01-12 22:00:00
24267,202.0,217.6949,217.69,217.6999,217.6999,217.69,1705096920000,3,2024-01-12 22:02:00
24268,119.0,217.6883,217.6892,217.6892,217.6892,217.6892,1705097280000,5,2024-01-12 22:08:00
24269,100.0,217.69,217.69,217.69,217.69,217.69,1705097520000,1,2024-01-12 22:12:00


In [127]:
# Ensure both timestamp columns are pandas datetimes
df9['datetime2'] = pd.to_datetime(df9['datetime2'])
tick['Datetime'] = pd.to_datetime(tick['Datetime'])

# Make sure to sort first
df9 = df9.sort_values(by='datetime2')
tick = tick.sort_values(by='Datetime')

# Function to find the closest previous tick time for each news time in df9
def find_closest_prev_date(target_date, date_col):
    # The information obtained at this time point can only be used at the next time point
    prev_dates = date_col[date_col <= target_date]
    if not prev_dates.empty:
        return prev_dates.max()
    else:
        # Happens when a news item is earlier than the first tick (start of the backtest window)
        print(f"WARNING: no tick at or before {target_date}.")
        return pd.NaT  # Return Not-A-Time (NaT) if no previous date is found


# Apply the function to each news time in df9['datetime2']
closest_dates = df9['datetime2'].apply(lambda x: find_closest_prev_date(x, tick['Datetime']))

# Add this closest date information to df9
df9['closest_date'] = closest_dates
df9.sort_values(by='datetime2')

Unnamed: 0,datetime2,cln_hdl,cln_smr,cln_news,cln_hdl_lemma,cln_smr_lemma,cln_news_lemma,cln_hdl_pol_blob,cln_smr_pol_blob,cln_news_pol_blob,...,cln_hdl_lemma_pol_bert_score,cln_smr_lemma_pol_bert_score,cln_news_lemma_pol_bert_score,cln_hdl_pol_finbert_score,cln_smr_pol_finbert_score,cln_news_pol_finbert_score,cln_hdl_lemma_pol_finbert_score,cln_smr_lemma_pol_finbert_score,cln_news_lemma_pol_finbert_score,closest_date
3,2024-01-02 12:10:00,[Bell-Boeing Secures Contract to Aid MV-22 Osp...,"[Bell-Boeing, a joint venture between Boeing (...","[Bell-Boeing, a joint venture (JV) between The...",[Bell-Boeing Secures Contract Aid MV-22 Osprey...,"[Bell-Boeing , joint venture Boeing ( BA ) Bel...","[Bell-Boeing , joint venture ( JV ) Boeing Com...",[0.0],[0.0],"[0.0, 0.0, -0.1, 0.0, 0.0, 0.0, 0.020833333333...",...,0.727048,0.733606,0.311212,0.939747,0.764613,0.787006,0.926775,0.558279,0.730209,2024-01-02 12:10:00
2,2024-01-02 13:48:00,[Boeing Stock Just Got Downgraded.],[Goldman Sachs analyst Noah Poponak took share...,[Goldman Sachs analyst Noah Poponak took share...,[Boeing Stock Got Downgraded .],[Goldman Sachs analyst Noah Poponak took share...,[Goldman Sachs analyst Noah Poponak took share...,[0.0],[0.0],[0.0],...,-0.211657,-0.749018,-0.749018,-0.876752,0.000000,0.000000,-0.946479,0.000000,0.000000,2024-01-02 13:48:00
1,2024-01-02 17:17:13,[A loose bolt is the least of Boeings problems...,[Boeing (BA) shares tick down Tuesday morning ...,[Boeing (BA) shares tick down Tuesday morning ...,"[loose bolt least Boeings problem , analyst say]",[Boeing ( BA ) share tick Tuesday morning foll...,[Boeing ( BA ) share tick Tuesday morning foll...,[-0.18846153846153846],"[-0.07777777777777779, -0.07692307692307693, 0...","[-0.07777777777777779, -0.07692307692307693, 0...",...,-0.685612,-0.071711,-0.279021,-0.853149,-0.332891,-0.140990,-0.541524,-0.509943,-0.288236,2024-01-02 17:17:00
0,2024-01-02 19:19:53,"[US STOCKS-SP, Nasdaq start 2024 in subdued fa...",[The SP 500 and Nasdaq Composite dropped in th...,[Apple down on Barclays downgrade Tesla flat d...,"[US STOCKS-SP , Nasdaq start 2024 subdued fash...",[SP 500 Nasdaq Composite dropped first trading...,[Apple Barclays downgrade Tesla flat despite r...,[0.0],"[0.08333333333333333, 0.0]","[-0.034523809523809526, -0.17916666666666667, ...",...,0.659485,-0.800565,-0.380766,0.000000,-0.971305,-0.279670,0.000000,-0.961590,0.085610,2024-01-02 19:19:00
8,2024-01-03 14:09:00,[RTX Secures a $345M Deal to Build StormBreake...,[RTX wins deal to manufacture the 10th lot of ...,[RTX Corp. RTX recently secured a modification...,[RTX Secures $ 345M Deal Build StormBreaker Mu...,[RTX win deal manufacture 10th lot StormBreake...,[RTX Corp. RTX recently secured modification c...,[0.0],[0.3],"[0.0, -0.45, -0.1, 0.0, 0.0, -0.1125, 0.0, 0.3...",...,0.770482,0.812814,0.280963,0.897753,0.867298,0.863019,0.894005,0.000000,0.878530,2024-01-03 14:07:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2024-01-12 20:40:54,[UPDATE 1-Delta orders up to 40 Airbus A350-10...,[Delta Air Lines on Friday unveiled an order f...,"[(New throughout, adds comments from Deltas CE...",[UPDATE 1-Delta order 40 Airbus A350-1000 wide...,[Delta Air Lines Friday unveiled order 40 new ...,"[( New throughout , add comment Deltas CEO par...",[0.0],"[0.19318181818181818, 0.125, 0.0]","[0.17424242424242423, 0.125, 0.0, 0.2047619047...",...,0.633881,-0.253080,0.115502,0.857537,0.711469,0.820392,0.644400,0.770639,0.814048,2024-01-12 20:40:00
94,2024-01-12 21:06:00,"[Boeing Stock Is Down Again., The FAA Is Inten...",[Keeping up with the flow of news about the Al...,[Keeping up with the flow of news about the Al...,"[Boeing Stock ., FAA Intensifying Oversight .]",[Keeping flow news Alaska Airlines incident im...,[Keeping flow news Alaska Airlines incident im...,"[-0.15555555555555559, 0.0]",[0.2],[0.2],...,0.442367,0.182889,0.182889,-0.174863,0.000000,0.000000,0.000000,0.000000,0.000000,2024-01-12 21:06:00
93,2024-01-12 21:08:00,"[Top Stock Reports for NVIDIA, Boeing Sony]",[Todays Research Daily features new research r...,"[Friday, January 12, 2024 The Zacks Research D...","[Top Stock Reports NVIDIA , Boeing Sony]",[Todays Research Daily feature new research re...,"[Friday , January 12 , 2024 Zacks Research Dai...",[0.5],[0.06628787878787878],"[0.5, 0.06628787878787878, -0.0999999999999999...",...,0.929975,0.728547,0.186859,0.000000,0.000000,0.059901,0.000000,0.000000,0.108448,2024-01-12 21:07:00
92,2024-01-12 21:47:23,[FAA increasing oversight of Boeing 737 Max ma...,[In light of recent Boeing (BA) 737 Max 9 inci...,[In light of recent Boeing (BA) 737 Max 9 inci...,[FAA increasing oversight Boeing 737 Max manuf...,[light recent Boeing ( BA ) 737 Max 9 incident...,[light recent Boeing ( BA ) 737 Max 9 incident...,[0.0],"[0.2, 0.0, -0.15555555555555559, 0.31727272727...","[0.2, 0.0, -0.15555555555555559, 0.31727272727...",...,0.701685,-0.063543,0.003233,0.000000,-0.743456,-0.268857,0.000000,0.000000,-0.587720,2024-01-12 21:45:00


In [None]:
# def drop_na(df):
#     # Drop all the news_content with na
#     print(f"Before dropping na: {df.isna().sum().sum()}")
#     df1 = df.dropna()
#     df1.reset_index(inplace=True, drop=True)
#     print(f"After dropping na: {df.isna().sum().sum()}")
#     return df1

# drop_na(df9).head()

In [128]:
techs = ['sentic', 'blob', 'sid', 'bert', 'finbert']
cols = ['cln_hdl', 'cln_smr', 'cln_news',
        'cln_hdl_lemma', 'cln_smr_lemma', 'cln_news_lemma']

col_list = []
for tech in techs:
    for col in cols:
        col_name = f'{col}_pol_{tech}_score'
        if col_name in df9.columns:
            col_list.append(col_name)
print(col_list)



['cln_hdl_pol_bert_score', 'cln_smr_pol_bert_score', 'cln_news_pol_bert_score', 'cln_hdl_lemma_pol_bert_score', 'cln_smr_lemma_pol_bert_score', 'cln_news_lemma_pol_bert_score', 'cln_hdl_pol_finbert_score', 'cln_smr_pol_finbert_score', 'cln_news_pol_finbert_score', 'cln_hdl_lemma_pol_finbert_score', 'cln_smr_lemma_pol_finbert_score', 'cln_news_lemma_pol_finbert_score']


In [129]:
df_list = []

for col_name in col_list:
    tmp = df9.groupby('closest_date', as_index=False)[col_name].mean().reset_index(drop=True) 
    df_list.append(tmp)
# print(df_list)

# # Assumes df_list has at least two elements
# merged = df_list[0]
# for i in range(1, len(df_list)):
#     merged = pd.merge(left=merged, right=df_list[i], on='closest_date', how='inner')
# merged

from functools import reduce
# A simpler implementation
merged = reduce(lambda left, right: pd.merge(left, right, on='closest_date', how='inner'), df_list)
merged

Unnamed: 0,closest_date,cln_hdl_pol_bert_score,cln_smr_pol_bert_score,cln_news_pol_bert_score,cln_hdl_lemma_pol_bert_score,cln_smr_lemma_pol_bert_score,cln_news_lemma_pol_bert_score,cln_hdl_pol_finbert_score,cln_smr_pol_finbert_score,cln_news_pol_finbert_score,cln_hdl_lemma_pol_finbert_score,cln_smr_lemma_pol_finbert_score,cln_news_lemma_pol_finbert_score
0,2024-01-02 12:10:00,0.711817,0.723454,0.356488,0.727048,0.733606,0.311212,0.939747,0.764613,0.787006,0.926775,0.558279,0.730209
1,2024-01-02 13:48:00,-0.215074,-0.651566,-0.651566,-0.211657,-0.749018,-0.749018,-0.876752,0.000000,0.000000,-0.946479,0.000000,0.000000
2,2024-01-02 17:17:00,-0.692009,-0.040292,-0.168793,-0.685612,-0.071711,-0.279021,-0.853149,-0.332891,-0.140990,-0.541524,-0.509943,-0.288236
3,2024-01-02 19:19:00,0.634973,-0.488042,-0.303654,0.659485,-0.800565,-0.380766,0.000000,-0.971305,-0.279670,0.000000,-0.961590,0.085610
4,2024-01-03 14:07:00,0.756132,0.811201,0.372040,0.770482,0.812814,0.280963,0.897753,0.867298,0.863019,0.894005,0.000000,0.878530
...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,2024-01-12 20:40:00,-0.642240,0.019090,0.305772,0.633881,-0.253080,0.115502,0.857537,0.711469,0.820392,0.644400,0.770639,0.814048
90,2024-01-12 21:06:00,-0.269338,0.000000,0.000000,0.442367,0.182889,0.182889,-0.174863,0.000000,0.000000,0.000000,0.000000,0.000000
91,2024-01-12 21:07:00,0.925480,0.715107,0.168182,0.929975,0.728547,0.186859,0.000000,0.000000,0.059901,0.000000,0.000000,0.108448
92,2024-01-12 21:45:00,0.687002,-0.051605,0.101750,0.701685,-0.063543,0.003233,0.000000,-0.743456,-0.268857,0.000000,0.000000,-0.587720
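
The per-column `groupby` followed by `reduce`/`merge` can probably be collapsed into a single `groupby` over all score columns. The sketch below checks that the two approaches agree on a hypothetical miniature frame; `demo`, `score_a` and `score_b` are illustrative names, not columns from `df9`.

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature frame with two score columns.
demo = pd.DataFrame({
    "closest_date": pd.to_datetime(
        ["2024-01-02 12:10", "2024-01-02 12:10", "2024-01-02 13:48"]
    ),
    "score_a": [0.2, 0.4, -0.6],
    "score_b": [1.0, 0.0, 0.5],
})
score_cols = ["score_a", "score_b"]

# Approach used in the cell above: one groupby per column, then inner merges.
parts = [demo.groupby("closest_date", as_index=False)[c].mean() for c in score_cols]
via_reduce = reduce(
    lambda left, right: pd.merge(left, right, on="closest_date", how="inner"), parts
)

# Single-pass alternative: average every score column in one groupby.
via_groupby = demo.groupby("closest_date", as_index=False)[score_cols].mean()

print(via_groupby)
```

On the notebook's data the equivalent call would be `df9.groupby('closest_date', as_index=False)[col_list].mean()`, assuming every column in `col_list` is numeric.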


In [130]:
merged2 = pd.merge(left=tick, right=merged, left_on='Datetime', right_on='closest_date', how='left')
merged2.reset_index(inplace=True, drop=True)

In [131]:
merged2[merged2.index == 1873]

Unnamed: 0,Volume,VWAP,Open,Close,High,Low,Timestamp,Transactions,Datetime,closest_date,...,cln_news_pol_bert_score,cln_hdl_lemma_pol_bert_score,cln_smr_lemma_pol_bert_score,cln_news_lemma_pol_bert_score,cln_hdl_pol_finbert_score,cln_smr_pol_finbert_score,cln_news_pol_finbert_score,cln_hdl_lemma_pol_finbert_score,cln_smr_lemma_pol_finbert_score,cln_news_lemma_pol_finbert_score
1873,100.0,248.99,248.99,248.99,248.99,248.99,1704493620000,1,2024-01-05 22:27:00,2024-01-05 22:27:00,...,-0.394173,-0.706452,-0.719321,-0.563449,-0.438621,0.183951,-0.45101,0.0,0.0,-0.563889


In [134]:
# Choose col_name to describe
merged2[col_name].describe()

count    94.000000
mean     -0.260669
std       0.585878
min      -0.970300
25%      -0.843885
50%      -0.301096
75%       0.079261
max       0.951613
Name: cln_news_lemma_pol_finbert_score, dtype: float64

In [135]:
merged2[merged2['Datetime'] >= pd.to_datetime('2024-01-03T14:17:00')]


Unnamed: 0,Volume,VWAP,Open,Close,High,Low,Timestamp,Transactions,Datetime,closest_date,...,cln_news_pol_bert_score,cln_hdl_lemma_pol_bert_score,cln_smr_lemma_pol_bert_score,cln_news_lemma_pol_bert_score,cln_hdl_pol_finbert_score,cln_smr_pol_finbert_score,cln_news_pol_finbert_score,cln_hdl_lemma_pol_finbert_score,cln_smr_lemma_pol_finbert_score,cln_news_lemma_pol_finbert_score
547,207.0,247.7027,247.70,247.7000,247.7000,247.7000,1704291420000,6,2024-01-03 14:17:00,2024-01-03 14:17:00,...,0.204607,0.715425,0.752041,0.121834,0.920367,0.936394,0.921957,0.511063,0.860812,0.869526
548,308.0,247.7006,247.70,247.7000,247.7000,247.7000,1704291540000,8,2024-01-03 14:19:00,NaT,...,,,,,,,,,,
549,264.0,247.6105,247.60,247.6000,247.6000,247.6000,1704291780000,9,2024-01-03 14:23:00,NaT,...,,,,,,,,,,
550,1157.0,247.5724,247.60,247.5031,247.6001,247.5031,1704291840000,49,2024-01-03 14:24:00,NaT,...,,,,,,,,,,
551,563.0,247.5684,247.60,247.6000,247.6000,247.6000,1704291900000,19,2024-01-03 14:25:00,NaT,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5204,632.0,217.2991,217.30,217.3000,217.3000,217.3000,1705107000000,7,2024-01-13 00:50:00,NaT,...,,,,,,,,,,
5205,425.0,217.3412,217.35,217.3500,217.3500,217.3500,1705107060000,3,2024-01-13 00:51:00,NaT,...,,,,,,,,,,
5206,836.0,217.3015,217.30,217.3000,217.3000,217.3000,1705107300000,16,2024-01-13 00:55:00,NaT,...,,,,,,,,,,
5207,2492.0,217.0491,217.16,217.0200,217.1600,217.0000,1705107420000,52,2024-01-13 00:57:00,NaT,...,,,,,,,,,,


# Backtesting
- Pros
    - Test single strategy
    - Have optimizer, graphs
- Cons
    - Cannot trade multiple assets (FIXME: not applicable to a portfolio)
    - Does not trade fractional shares
- Docs: https://kernc.github.io/backtesting.py/#example


- Other backtesting framework: backtrader, zipline - both can do multi-asset trading
- Backtrader works with Pandas DataFrames, CSV, and real-time data feeds from Interactive Brokers, Oanda, and Visual Chart. 
- 2% rule: https://www.investopedia.com/terms/t/two-percent-rule.asp#:~:text=What%20Is%20the%202%25%20Rule,capital%20on%20any%20single%20trade.
- Try to have less than 10% of drawdown: https://www.quora.com/How-do-I-use-the-never-risk-more-than-2-rule-in-Forex-trading
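
The 2% rule linked above can be turned into a share count: size the position so that hitting the stop loses at most 2% of equity, then cap it by what the account can afford. This is a hedged sketch: `shares_for_risk` and its `leverage` parameter are hypothetical helpers, not part of backtesting.py, and gaps through the stop are ignored.

```python
def shares_for_risk(equity, entry, stop, risk_frac=0.02, leverage=1.0):
    """Whole shares such that a stop-out loses at most risk_frac of equity.

    Hypothetical helper; the result is also capped by buying power
    (equity * leverage), since an unleveraged account cannot buy more.
    """
    risk_per_share = abs(entry - stop)
    if risk_per_share == 0:
        raise ValueError("stop must differ from entry")
    by_risk = int((equity * risk_frac) // risk_per_share)
    by_cash = int((equity * leverage) // entry)
    return min(by_risk, by_cash)


# Example: 10,000 in cash, entry at 247.60, stop 1% below entry.
entry = 247.60
stop = entry * (1 - 0.01)
n = shares_for_risk(10_000, entry, stop)
n_leveraged = shares_for_risk(10_000, entry, stop, leverage=10)
print(n, n_leveraged)
```

With a tight 1% stop and no leverage, the cash cap binds before the 2% risk budget does, so the realised risk at the stop is below 2% of equity.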


Hypothesis
- Takes in a df from start to end, with all the ticker data (including rows where the sentiment is NA)
- Enters the trade at row 549, the bar after the signal on row 548 (My information should backfill)
548	308.0	247.7006	247.7000	247.7000	247.7000	247.7000	1704291540000	8	2024-01-03 14:19:00	2024-01-03 14:19:00	0.156808
549	264.0	247.6105	247.6000	247.6000	247.6000	247.6000	1704291780000	9	2024-01-03 14:23:00	NaT	NaN
550	1157.0	247.5724	247.6000	247.5031	247.6001	247.5031	1704291840000	49	2024-01-03 14:24:00	NaT	NaN
- I can compare results with and without lemmatization, holding the other variables constant
- I can compare results across content types (headline, summary, full news), holding the others constant



In [207]:
class SimpleStmStrat(Strategy):
    """
    Use a proportional amount of cash to trade with the sentiment score indicator.
    """
    # Initialising here does not work
    def init(self):
        super().init()
        # Initialize additional indicators here if needed
        # self.trade_size = 40 # This times the next open price cannot exceed equity
        self.sl_pct = 0.01
        self.tp_pct = 0.02
        self.risk_per_trade = 0.5 # Scaling factor for position size: at most 50% of available equity per trade (not 2%)
        # ['cln_hdl_pol_bert_score', 'cln_smr_pol_bert_score', 'cln_news_pol_bert_score', 'cln_hdl_lemma_pol_bert_score', 'cln_smr_lemma_pol_bert_score', 'cln_news_lemma_pol_bert_score', 'cln_hdl_pol_finbert_score', 'cln_smr_pol_finbert_score', 'cln_news_pol_finbert_score', 'cln_hdl_lemma_pol_finbert_score', 'cln_smr_lemma_pol_finbert_score', 'cln_news_lemma_pol_finbert_score']
        self.col = 'cln_smr_pol_finbert_score'
    def next(self):
        cur_stm = self.data[self.col][-1]
        # print(self.data['closest_date'][-1])
        cur_price = self.data['Close'][-1]

        # print(f"-----{self.data['Datetime'][-1]}-----")
        trade_size = (0.5 * (abs(cur_stm) ** 2) + 0.5) * self.risk_per_trade
        if (cur_stm > 0): # Many losses if I don't take
            self.buy(size=trade_size, sl=(1 - self.sl_pct) * cur_price, tp=(1 + self.tp_pct) * cur_price)
            # If size is a value between 0 and 1, it is interpreted as a fraction of current available liquidity (cash plus Position.pl minus used margin). A value greater than or equal to 1 indicates an absolute number of units.

            # print("Trade here.")
        elif cur_stm < 0:
            self.sell(size=trade_size, sl=(1 + self.sl_pct) * cur_price, tp=(1 - self.tp_pct) * cur_price)
            # print("Trade here.")
        elif (cur_stm == 0):
            pass
            # print("No trade.")
        # print(cur_stm)
# Running the backtest
bt = Backtest(data=merged2, 
              strategy=SimpleStmStrat, 
              cash=10000, 
              margin=1,
              commission=.0,
              trade_on_close=False,
              hedging=True
              ) # TODO: Adjust commission
results = bt.run()
# TODO: 
# bt.optimize(maximise='SQN',
#             method='grid',
#             max_tries=None,
#             constraint=None,
#             return_heatmap=True,
#             return_optimization=False # For the method 'skopt
#             ) 
display(results)
bt.plot(results=results, plot_return=True) # TODO: Can have filename, plot in html

# These are the main results that we need
print(results.get('Return [%]'), results.get('Max. Drawdown [%]'), results.get('# Trades'), results.get('Win Rate [%]'))



Start                                     0.0
End                                    5208.0
Duration                               5208.0
Exposure Time [%]                   76.195047
Equity Final [$]                  10092.30228
Equity Peak [$]                    10164.0893
Return [%]                           0.923023
Buy & Hold Return [%]              -16.596235
Return (Ann.) [%]                         0.0
Volatility (Ann.) [%]                     NaN
Sharpe Ratio                              NaN
Sortino Ratio                             NaN
Calmar Ratio                              0.0
Max. Drawdown [%]                   -5.490053
Avg. Drawdown [%]                   -0.417077
Max. Drawdown Duration                 3453.0
Avg. Drawdown Duration             235.227273
# Trades                                 68.0
Win Rate [%]                        47.058824
Best Trade [%]                       2.110485
Worst Trade [%]                     -6.827858
Avg. Trade [%]                    


0.9230228000000534 -5.490053142291851 68.0 47.05882352941176
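
The sizing rule in `SimpleStmStrat.next()` maps the absolute sentiment score to a fraction of available equity. The standalone sketch below reproduces that formula so its range is easy to inspect; `sentiment_to_size` is a hypothetical helper, and returning 0.0 for a zero or NaN score is an assumption that mirrors the strategy placing no order in those cases.

```python
import math


def sentiment_to_size(score, risk_per_trade=0.5):
    """Fraction of available equity to commit for a given sentiment score.

    Hypothetical helper mirroring the formula in SimpleStmStrat.next():
    (0.5 * |score|**2 + 0.5) * risk_per_trade.
    """
    if math.isnan(score) or score == 0:
        return 0.0  # assumption: no signal means no position
    return (0.5 * abs(score) ** 2 + 0.5) * risk_per_trade


for s in (0.1, -0.5, 1.0, float("nan")):
    print(s, sentiment_to_size(s))
```

With `risk_per_trade = 0.5`, any non-zero score commits between 25% and 50% of available equity, so even a very weak signal opens a sizeable position.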


In [None]:
# TODO: Shift the polarity down or the date up (the data with today's date is the data from ytd)
# TODO: How to add the number of positions to put? Or leverage?
# Assume your data includes 'Open', 'Close', and 'Sentiment' columns
# data = pd.read_csv('your_stock_data.csv', parse_dates=True, index_col='Date')

class SentimentStrategy(Strategy):
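    """
    Trade a fixed number of shares when the sentiment score crosses the long or short
    threshold; on a strong opposing signal, close the current position and reverse.
    """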
    def init(self):
        super().init()
        # Initialize additional indicators here if needed
        self.long_pos_tsh = 0.4
        self.short_pos_tsh = -0.4
        self.trade_size = 100
        self.sl_pct = 0.005
        self.tp_pct = 0.005
        # self.COL = 'cln_news_pol_sid_score'
        self.COL = 'cln_hdl_pol_bert_score'
    def next(self):
        
        print(f"-----{self.data['Datetime'][-1]}-----")
        # It doesn't display the first, only start from the next
        # print(self.data)
        curr_stm = self.data[self.COL][-1]
        # print(self.data['closest_date'][-1])

        cur_price = self.data['Close'][-1]
        # TODO: Position represents all trades

        # Strong signals
        if curr_stm > self.long_pos_tsh and self.position.is_short:
            self.position.close()
            self.buy(size=self.trade_size * 2, sl=(1 - self.sl_pct) * cur_price, tp=(1 + self.tp_pct) * cur_price)
            # self.buy(size)
            logging.warning(f"curr_stm > {self.long_pos_tsh} but position is short.")
        elif curr_stm < self.short_pos_tsh and self.position.is_long:
            self.position.close()
            self.sell(size=self.trade_size * 2, sl=(1 + self.sl_pct) * cur_price, tp=(1 - self.tp_pct) * cur_price)
            logging.warning(f"curr_stm < {self.short_pos_tsh} but position is long.")
        else:
            if curr_stm > self.long_pos_tsh:  # Threshold for going long
                self.buy(size=self.trade_size, sl=(1 - self.sl_pct) * cur_price, tp=(1 + self.tp_pct) * cur_price)
            elif curr_stm < self.short_pos_tsh:  # Threshold for going short
                self.sell(size=self.trade_size, sl=(1 + self.sl_pct) * cur_price, tp=(1 - self.tp_pct) * cur_price)
            # for trade in self.trades:
            #     print(f'Trade entry price, time:{trade.entry_price}, {trade.entry_time}')
            #     print(f'Is long, PL: {trade.is_long}, {trade.pl}')

# 1. Slippage: Incorporate slippage into your trades, if your backtesting framework allows.
# 2. Execution Price: Decide whether to execute at the current day's close or the next day's open.
# 3. Risk Management: Implement risk management strategies like stop-loss orders.

# Running the backtest
bt = Backtest(data=merged2, 
              strategy=SentimentStrategy, 
              cash=10000, 
              margin=0.1,
              commission=.0,
              trade_on_close=False,
              hedging=True
              ) # TODO: Adjust commission
results = bt.run()
# TODO: 
# bt.optimize(maximise='SQN',
#             method='grid',
#             max_tries=None,
#             constraint=None,
#             return_heatmap=True,
#             return_optimization=False # For the method 'skopt
#             ) 
display(results)
bt.plot() # TODO: Can have filename, plot in html

# TODO: Change to use close instead of open, change html location

Return in %
- Basic: -4.060443
- Only buy with strong signals: -3.495643
- SID: 11.621226

In [22]:
results.get('Return [%]')

-4.96074300000002