# QF603 Project - Alpha Generation, Portfolio Construction 

## by Team Alpha: DENG Ke, LEE How-Chih Howard, LEI YuMing, SOH Giap Hong Daniel, WANG WenJie, XUE Yuanhuang, YU ChunXue


### SCENARIO: We are an ETF Manager, actively managing our ETF products linked to popular stock indices for Asian clients. In particular, we seek to generate Alpha for our "China A-50" ETF investors who like the diversity of the FTSE China A-50 Index, but are looking for outperformance versus the Index by taking a bit more risk from increased volatility of returns.

### We look to examine the period from 01/01/2016 to 31/03/2022, a six-year period spanning Covid-19, which dramatically affected the economy and financial markets in China.  We seek to understand:
### <font color = 'red'> (1) how the advent of Covid-19 in Feb/Mar 2020 and subsequent social measures taken by governments affect our Alpha strategy, and </font>
### <font color = 'red'> (2) how the resulting regime change requires us to take into account new explanatory variables and recalibrate our models, in order to improve our Alpha strategy. </font>


### (1) Define The Question
#### How can we generate Alpha for our actively managed ETF, benchmarked to the FTSE China A-50 Index, under normal market conditions and following external shocks that result in regime change?
#### Null hypothesis: our ETF does not beat the China A-50 Index in a statistically significant way over the six-year time period.


In [None]:
# Load libraries, silence warnings, and set the baseline random seed

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib as mpl

import seaborn as sns

import time

import datetime as dt
import re

import plotly.express as px

import warnings
warnings.filterwarnings("ignore")

# Setting baseline seed
np.random.seed(230218)



In [None]:
# Set print options.

np.set_printoptions(precision = 3)

plt.style.use("ggplot") # Grammar of Graphics Theme

mpl.rcParams["axes.grid"] = True
mpl.rcParams["grid.color"] = "grey"
mpl.rcParams["grid.alpha"] = 0.25

mpl.rcParams["axes.facecolor"] = "white"

mpl.rcParams["legend.fontsize"] = 14

%matplotlib inline

### (2) Import Data from External Source/Wrangle Data for given time period (01/01/2016 to 31/08/2023)
### Convert absolute price change to pct_change, Clean up NaN values, ffill where required
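
The wrangling steps named above (forward-fill, conversion to percentage changes, NaN clean-up, restriction to the time window) can be sketched as follows. This is a minimal illustration on made-up prices, not the team's actual pipeline: the column names, the toy values, and the `start`/`end` dates are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices; the real notebook reads prices from all_data.csv.
prices = pd.DataFrame(
    {
        "stock_a": [10.0, np.nan, 11.0, 12.1],
        "stock_b": [20.0, 21.0, np.nan, 23.1],
    },
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07"]),
)

# Carry the last known price forward over gaps (for example, trading halts).
prices_filled = prices.ffill()

# Convert price levels to simple daily returns.
# The first row has no prior price, so it is all-NaN and is dropped.
returns = prices_filled.pct_change().dropna(how="all")

# Restrict to the window under study (placeholder dates, adjust as needed).
start, end = "2016-01-01", "2022-03-31"
returns = returns.loc[start:end]

print(returns)
```

Forward-filling before calling `pct_change` means a gap shows up as a zero return on the missing day, followed by the full move on the day trading resumes.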

In [None]:
# Download the csv files from Google Drive QF603/data and read each csv into a DataFrame
# A50 China Index.csv ==> Index_df, used to check that our own A50 Index series is consistent
# all_data.csv ==> df_data, covers more time periods than our examination period and includes the FTSE A50 Index
# all_weights.csv ==> all_weights
# [all_industry.csv] ==> all_class


In [None]:
all_weights = pd.read_csv('all_weights.csv',index_col=0)
all_weights

In [None]:
df_data = pd.read_csv('all_data.csv',index_col=0)
df_data

### (3-1) Construct BBoss_ETF using Total Capital = RMB 100 mio, for first training period (q-1) where q = 2Q2016 
### BBoss_ETF = (W1 x Index) + (W2 x Big_10) + (W3 x Best_10)
### where: W1 = 50%, W2 = 25%, W3 = 25%
### Index {'a50'} = A-50 Index, but scaled down by 0.5X
### Big_10 {'size'} = 10 largest component Stocks by market value (in any one quarter), with an aggregate weighting within Index = A% (normally around 44%~48%). Each of Big 10 will have weightings a1 to a10 from the Index, and within the BBoss ETF, their weightings shall be (an/A)*W2%.
### Best_10 {'mom'} = 10 component Stocks with the highest expected growth in stock price, forecast using Momentum Indicators. For simplicity, each of Best 10 will have equal weightings b within the ETF, where b = W3/10 = 2.5%.
### However, for first training period, there were no prior training periods, so start with Best_10 based on historical price data in period = (q-1)
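
The three-sleeve weighting described above can be sketched as a single function. This is a minimal sketch under stated assumptions: `bboss_weights` is a hypothetical helper name, the 50-stock index and its weights are made up, and the index weights are assumed to sum to 1.

```python
import pandas as pd

W1, W2, W3 = 0.50, 0.25, 0.25  # sleeve weights from the text


def bboss_weights(index_weights, best_10):
    """Combine the three sleeves into one ETF weight per stock.

    index_weights: pd.Series of index weights per stock, summing to 1.
    best_10: list of tickers picked by the momentum screen.
    """
    # Sleeve 1: the whole index, scaled down by W1 (0.5X).
    index_sleeve = W1 * index_weights

    # Sleeve 2: ten largest index weights, rescaled by their aggregate A
    # so that each stock gets (a_n / A) * W2.
    big_10 = index_weights.nlargest(10)
    A = big_10.sum()
    big_sleeve = W2 * big_10 / A

    # Sleeve 3: equal weight b = W3 / 10 for each momentum pick.
    best_sleeve = pd.Series(W3 / len(best_10), index=best_10)

    # A stock can sit in more than one sleeve, so weights are summed per ticker.
    return index_sleeve.add(big_sleeve, fill_value=0.0).add(best_sleeve, fill_value=0.0)


# Hypothetical 50-stock index with linearly decreasing weights.
tickers = [f"S{i:02d}" for i in range(1, 51)]
raw = pd.Series(range(50, 0, -1), index=tickers, dtype=float)
toy_index_weights = raw / raw.sum()
toy_best_10 = tickers[-10:]  # placeholder picks, not a real momentum screen

etf_weights = bboss_weights(toy_index_weights, toy_best_10)
print(etf_weights.sum())
```

Multiplying `etf_weights` by the total capital (RMB 100 million in the text) gives the amount allocated to each stock.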
### Introduce a Moving Window to your training/test sets for q1 to q22, where q1 = 4Q2016 and q22 = 1Q2022. The ETF_stats should exhibit outperformance until 1Q2020, and then underperformance from 2Q2020 onwards. You will end up with 22 train/test sets.
### Alternatively, use one quarter to train and the next quarter to test, where q1 = 3Q2016 and q23 = 1Q2022. You will end up with 23 train/test sets.
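
The quarterly moving window can be generated with pandas periods. This is a sketch only: `rolling_splits` is a hypothetical helper, and reading the 22-set variant as a two-quarter training window starting in 2Q2016 is an assumption made to reproduce the counts given above.

```python
import pandas as pd


def rolling_splits(first="2016Q2", last="2022Q1", train_len=1):
    """Return (train_quarters, test_quarter) pairs for a moving window.

    train_len is the number of consecutive quarters used for training;
    the test quarter is the one immediately after the training window.
    """
    quarters = pd.period_range(first, last, freq="Q")
    return [
        (list(quarters[i - train_len:i]), quarters[i])
        for i in range(train_len, len(quarters))
    ]


one_quarter = rolling_splits(train_len=1)  # train on q-1, test on q
two_quarter = rolling_splits(train_len=2)  # assumed reading of the 22-set variant

print(len(one_quarter), one_quarter[0][1], one_quarter[-1][1])
print(len(two_quarter), two_quarter[0][1], two_quarter[-1][1])
```

Given a DataFrame with a `DatetimeIndex`, the rows for one test quarter `q` can be selected with `df[df.index.to_period("Q") == q]`.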
  

### (3-2) Run Exploratory Data Analysis on BBoss_ETF.  Run some stats on BBoss_ETF vs Index_df to set baseline for A50_Return, Strategy_Returns, Index_Return, and pass into dict_summary { }. Also work out formulas for Alpha, Beta, Sharpe, Sortino, etc. and pass your results into  dict_summary, timestamped for Start Date of your q1 period.  
### Assume risk-free rate for RMB is 2.00% pa
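
One possible set of formulas for Alpha, Beta, Sharpe and Sortino is sketched below. It assumes daily simple returns, 252 trading days per year, a CAPM-style regression of excess ETF returns on excess index returns, and the 2.00% pa risk-free rate stated above. `performance_stats` and the random toy returns are hypothetical, and other annualisation conventions are equally defensible.

```python
import numpy as np
import pandas as pd

RF_ANNUAL = 0.02     # risk-free rate for RMB, from the text
TRADING_DAYS = 252   # assumption: daily data, 252 trading days per year


def performance_stats(strategy, benchmark, rf_annual=RF_ANNUAL, periods=TRADING_DAYS):
    """Annualised Alpha, Beta, Sharpe and Sortino from periodic simple returns."""
    rf = rf_annual / periods
    ex_s = strategy - rf   # excess strategy returns
    ex_b = benchmark - rf  # excess benchmark returns

    beta = ex_s.cov(ex_b) / ex_b.var()
    alpha = (ex_s.mean() - beta * ex_b.mean()) * periods
    sharpe = ex_s.mean() / ex_s.std() * np.sqrt(periods)

    # Downside deviation: root mean square of the negative excess returns.
    downside = np.sqrt((np.minimum(ex_s, 0.0) ** 2).mean())
    sortino = ex_s.mean() / downside * np.sqrt(periods)

    return {"alpha": alpha, "beta": beta, "sharpe": sharpe, "sortino": sortino}


# Toy data: an ETF built to have beta 1.5 and a small daily alpha.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-07-01", periods=60)
index_ret = pd.Series(rng.normal(0.0003, 0.01, 60), index=dates)
rf_daily = RF_ANNUAL / TRADING_DAYS
etf_ret = rf_daily + 1.5 * (index_ret - rf_daily) + 0.0001

# Results are stored under the start date of the period, as the text suggests.
dict_summary = {dates[0]: performance_stats(etf_ret, index_ret)}
print(dict_summary[dates[0]])
```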

### (4-1) Model Data using Momentum Indicators to predict the weights for Best_10 for test period = q. Choose your own Momentum Indicators, for example Rate of Change (n) where n is a positive integer (currently n = 20), MACD (S, L) where Short = 5 and Long = 14, RSI (S, L) where Short = 5 and Long = 14, ARIMA (p, d, q), or other combinations you deem effective.
### See which one works best for you to maximise returns for BBoss_ETF for period = q
### Need to think about how to switch component stocks in/out of BBoss_ETF while maintaining the integrity of BBoss_ETF. You may end up with one df for each test period.
### We are not testing y-hat as a prediction of future price, instead we are actually using y-hat to maximize Alpha, so no need to use Algos for testing RMSE
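
As one illustration of the selection step, the sketch below ranks stocks by Rate of Change with n = 20 at the end of the training window and keeps the top 10. The helper names and the toy price paths are hypothetical, and MACD, RSI or ARIMA forecasts could replace the ranking signal without changing the rest.

```python
import pandas as pd


def rate_of_change(prices, n=20):
    """Rate of Change: price today relative to the price n rows earlier, minus 1."""
    return prices / prices.shift(n) - 1.0


def pick_best_10(train_prices, n=20, k=10):
    """Tickers with the k highest ROC(n) values on the last training day."""
    latest_roc = rate_of_change(train_prices, n).iloc[-1]
    return latest_roc.nlargest(k).index.tolist()


# Toy training window: 15 stocks, 30 days, stock i grows 0.1% * i per day.
days = pd.bdate_range("2016-04-01", periods=30)
toy_prices = pd.DataFrame(
    {f"S{i:02d}": [100.0 * (1 + 0.001 * i) ** t for t in range(30)] for i in range(15)},
    index=days,
)

best_10 = pick_best_10(toy_prices)
print(best_10)
```

Rerunning `pick_best_10` on each training window yields the list of stocks to switch in and out at the start of each test quarter.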

### (4-2) Insert write-up on Covid Measures and download all_industry.csv. 
### Allocate Industry Classifications to df_data through dummy variables
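
Industry dummy variables can be built with `pd.get_dummies`. The layout of all_industry.csv is not shown here, so the ticker-to-industry mapping below is entirely made up for illustration.

```python
import pandas as pd

# Hypothetical mapping of tickers to industry labels; the real labels
# would come from all_industry.csv, whose layout is not shown here.
industry = pd.Series(
    {"stock_a": "Consumer", "stock_b": "Financials", "stock_c": "Financials"},
    name="industry",
)

# One column per industry, 1 where the stock belongs to it and 0 otherwise.
industry_dummies = pd.get_dummies(industry, dtype=int)
print(industry_dummies)
```

Each row contains exactly one 1, so the dummy columns can be joined onto a per-stock table or used directly as regressors.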

### (5) Introduce a test for whether Alpha is significant enough to reject the Null Hypothesis; this should be a t-test. What about a chi-square test?
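
A one-sample t-test on active returns (ETF return minus Index return, per period) is one way to frame the test. The sketch computes the t-statistic by hand with NumPy, where `t_statistic` is a hypothetical helper and the three active returns are made-up numbers. The p-value could then be obtained from `scipy.stats.ttest_1samp`, assuming SciPy is available.

```python
import numpy as np


def t_statistic(active_returns):
    """One-sample t-statistic for H0: mean active return equals zero.

    Returns the statistic and its degrees of freedom (n - 1).
    """
    x = np.asarray(active_returns, dtype=float)
    n = x.size
    standard_error = x.std(ddof=1) / np.sqrt(n)  # sample std, hence ddof=1
    return x.mean() / standard_error, n - 1


# Made-up active returns for three periods, for illustration only.
toy_active = [0.01, 0.02, 0.03]
t_stat, dof = t_statistic(toy_active)
print(t_stat, dof)
```

Because the claim is outperformance, a one-sided alternative (mean active return greater than zero) matches the Null Hypothesis stated in section (1).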

### (6) Presentation and Interpretation 