# Index Performance Analysis

Refer to the [Introduction](/home/gnewy/workspace/starxetp/notebooks/introduction.ipynb) for instructions on how to use the notebook.

The intent of this notebook is to discuss the types of indices and their potential for analyzing an ETP's performance. Of particular interest are:

1) Average Directional Index (ADX) &nbsp; 2) Sharpe ratio &nbsp; 3) Sortino ratio &nbsp; 4) Calmar ratio <br>
5) Treynor ratio &nbsp; 6) Jensen’s alpha &nbsp; 7) Tracking error &nbsp; 8) Information ratio


The methodologies and their relevance are discussed in the subsequent sections.

### ETP market quality
* Exchange Traded Product (ETP) investors typically look at [five key metrics](https://www.ishares.com/us/insights/etf-trends/etp-market-quality-metrics) when assessing market quality. _For the purpose of designing the ETP, we are only interested in the first three_.
   1. ___Usage___: the liquidity of an ETP is a key component of market quality. ETP trading volumes matter because increased liquidity can create a network effect; that is, the most heavily traded ETPs are typically the cheapest to trade, which spurs even more ___usage___.
   1. ___Tracking___: tracking difference and tracking volatility reflect an ETP's ability to deliver returns that are consistent with its benchmark and to closely replicate benchmark performance over time. An index ETP with high market quality should deliver this consistency in all market conditions.
   1. ___Trading costs___: when the cost of trading an ETP is lower than (or within a tolerance band of) the cost of trading its underlying holdings, or is less sensitive to stressed market conditions, it is a potential signal of high market quality.
   1. ___Premium/discount behavior___: ETP premiums and discounts in illiquid or volatile markets can indicate that an ETP is providing price discovery and liquidity, both of which are signals of market quality.
   1. ___Primary market efficiency___: a diverse set of authorized participants and a stable platform give insight into the ETP's market quality, because the ETP's primary market operations must be efficient.
* Preparations for the [backtesting](https://teddykoker.com/2019/05/momentum-strategy-from-stocks-on-the-move-in-python/)
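
The *Tracking* metric above is commonly quantified with two numbers: the tracking difference (here, the mean per-period return of the ETP minus that of its benchmark) and the tracking error (the standard deviation of that difference). The sketch below is a minimal illustration, assuming both return series are already aligned by date; `tracking_stats` is a hypothetical helper and not part of the notebook's `lib` classes.

```python
import pandas as pd


def tracking_stats(etp_returns: pd.Series, benchmark_returns: pd.Series) -> dict:
    """Hypothetical helper: mean tracking difference and tracking error.

    Assumes both inputs are per-period returns aligned on the same index.
    """
    # Active return: how much the ETP beat (or lagged) its benchmark each period
    active = (etp_returns - benchmark_returns).dropna()
    return {
        "tracking_difference": active.mean(),  # average per-period gap
        "tracking_error": active.std(),        # sample standard deviation (ddof=1)
    }


etp = pd.Series([0.01, 0.02, 0.03])
benchmark = pd.Series([0.00, 0.02, 0.01])
print(tracking_stats(etp, benchmark))
```

Tracking error is often annualized by multiplying by the square root of the number of periods per year; whether to do so is a design choice for the backtesting phase.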


In [1]:
'''
    WARNING CONTROL to display or ignore all warnings
'''
import warnings
warnings.simplefilter('default')     # switch between 'default' and 'ignore'

''' Set debug flag to view extended error messages; else set it to False to turn off debugging mode '''
debug = True

## Load data

* The current test dataset spans ___2021-01-01___ to ___2022-06-01___. The full dataset covering the past decade and beyond is not yet available, but it will be made available in a subsequent phase to support backtesting. Retrieving it requires a script that systematically downloads the data, because CoinDesk, for example, only allows small payloads of data at a time.
* To filter the dataset to a shorter time span, change the year (YYYY), month (m), and day (d) of the parameters:
   * `start_dt` ("start date") and `end_dt` ("end date")
   * For example, set the year, month, and day as desired:
      * ```start_dt = datetime.date(2022,1,1)``` &nbsp; &nbsp; &nbsp; &nbsp;# implies 2022 January 01; (must be ${\ge}$ 2021 January 01)
      * ```end_dt = datetime.date(2022,3,1)``` &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # implies 2022 March 01; (must be ${\le}$ 2022 June 01)

In [2]:
import sys
sys.path.insert(1, '../lib')
import clsDataETL as etl
import datetime

'''
    To filter data by a date range change the two date parameters below
'''
start_dt = datetime.date(2022,1,1)
end_dt = datetime.date(2022,3,1)

if debug:
    import importlib
    etl = importlib.reload(etl)

''' Set the data source and temporal range '''
_path = "../data/market_cap_2021-01-01_2022-06-01/"

''' Initialize the dataETL class '''
print("Loading and filtering data ... this may take a while.")
clsETL = etl.ExtractLoadTransform()

''' Load data into dataframe '''
rec_marketcap_df=clsETL.load_data(dataPath=_path, start_date=start_dt, end_date=end_dt)
rec_marketcap_df.dropna(axis=0,how='any',subset=['market_cap'],inplace=True)
print("Loaded %d rows %s" % (rec_marketcap_df.shape[0],str(rec_marketcap_df.columns)))

''' Transform data with coin ids in columns '''
piv_marketcap_df = rec_marketcap_df.pivot_table(values=['market_cap'], index=rec_marketcap_df.Date, columns='ID', aggfunc='first')
piv_marketcap_df.columns = piv_marketcap_df.columns.droplevel(0)
piv_marketcap_df.dropna(axis=1,how='all', inplace=True)
piv_marketcap_df.reset_index(inplace=True)

print("Data from %s to %s loaded and transformed into a pivot table with %d rows complete!" % (str(rec_marketcap_df.Date.min()),
                                                                    str(rec_marketcap_df.Date.max()),
                                                                    piv_marketcap_df.shape[0]))

All packages in ExtractLoadTransform loaded successfully!
Loading and filtering data ... this may take a while.
Loaded 419 rows Index(['Date', 'ID', 'Symbol', 'market_cap'], dtype='object')
Data from 2022-01-01 to 2022-03-01 loaded and transformed into a pivot table with 60 rows complete!


## Compute and Augment the Dataset

The Simple Moving Average (SMAvg), Simple Moving Standard Deviation (SMStd), Simple Moving Sum (SMSum), and Momentum are essential for computing the indices across the entire time series of ticker-wise market caps.
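
As a rough illustration of what these rolling measures compute, the sketch below adds them per ticker with pandas `groupby` and `rolling`. It assumes a long-format dataframe with `Date`, `ID`, and `market_cap` columns (as in `rec_marketcap_df`) and mimics the notebook's column naming. `add_rolling_measures` is hypothetical, and defining momentum as the change over `window` periods is an assumption, since the internals of `clsETL.get_rolling_measures` are not shown here.

```python
import pandas as pd


def add_rolling_measures(df: pd.DataFrame, value_col: str = "market_cap",
                         window: int = 7) -> pd.DataFrame:
    """Hypothetical sketch: per-ticker rolling mean, std, sum, and momentum."""
    out = df.sort_values(["ID", "Date"]).copy()
    grouped = out.groupby("ID")[value_col]

    def rolling(stat: str) -> pd.Series:
        # min_periods=1 lets each ticker's first days use a shorter window
        return grouped.transform(
            lambda s: getattr(s.rolling(window, min_periods=1), stat)())

    out[f"simp_move_avg_{value_col}"] = rolling("mean")
    out[f"simp_move_std_{value_col}"] = rolling("std")   # NaN for a single observation
    out[f"simp_move_sum_{value_col}"] = rolling("sum")
    # Assumed momentum definition: value today minus value `window` periods earlier
    out[f"momentum_{value_col}"] = grouped.diff(window)
    return out


demo = pd.DataFrame({
    "Date": list(range(4)) * 2,
    "ID": ["a"] * 4 + ["b"] * 4,
    "market_cap": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})
print(add_rolling_measures(demo, window=2))
```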

Run this cell to augment the dataset with the desired new values.

In [3]:
from datetime import date

start_dt = date(2022,1,1)
end_dt = date(2022,3,1)

_cal_ops_dict = {
    "simp_move_avg" : "market_cap",
    "simp_move_std" : "market_cap",
    "simp_move_sum" : "market_cap",
    "momentum" : "market_cap",
}
_results_df = clsETL.get_rolling_measures(ticker_data=rec_marketcap_df,
                                                rolling_window_length=7,
                                                window_start_date = start_dt,
                                                window_end_date = end_dt,
                                                rolling_measure_dict = _cal_ops_dict,)
_results_df.reset_index(inplace=True)
''' Print the outputs '''
for op_key, src_col in _cal_ops_dict.items():
    col_name = op_key + '_' + src_col
    col_count = _results_df[col_name].count()     # number of non-NaN rows
    if col_count > 0:
        print("%s has %d non-empty rows" % (col_name, col_count))
# _results_df.to_csv('../data/rolling_values.csv')
print("rolling value computations complete!")


simp_move_avg_market_cap has 419 non-empty rows
simp_move_std_market_cap has 412 non-empty rows
simp_move_sum_market_cap has 419 non-empty rows
momentum_market_cap has 412 non-empty rows
rolling value computations complete!


# Compute the Log ROR

Next, we compute the logarithmic rate of return (Log ROR) for the assets.
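
The Log ROR used in this notebook is defined as $\log\left(\frac{R(t,x_i)}{R(t+1,x_i)}\right)$, where $R(t,x_i)$ is the market-cap value of asset $x_i$ on day $t$. A minimal pandas sketch of that definition, computed per asset, is shown below. `add_log_ror` is a hypothetical helper, and `clsROR.get_logarithmic_returns` may differ in details such as NaN handling. Note that this quantity is the negative of the more common log return $\log\left(\frac{R(t+1,x_i)}{R(t,x_i)}\right)$.

```python
import numpy as np
import pandas as pd


def add_log_ror(df: pd.DataFrame, value_col: str = "simp_move_avg_market_cap") -> pd.DataFrame:
    """Hypothetical sketch: Log ROR = log(R(t, x_i) / R(t+1, x_i)) for each asset ID."""
    out = df.sort_values(["ID", "Date"]).copy()
    # R(t+1, x_i): the next day's value for the same asset (NaN on its last day)
    next_value = out.groupby("ID")[value_col].shift(-1)
    out[f"{value_col}_ror"] = np.log(out[value_col] / next_value)
    return out


demo = pd.DataFrame({
    "Date": [1, 2, 3],
    "ID": ["bitcoin"] * 3,
    "simp_move_avg_market_cap": [100.0, 110.0, 121.0],
})
print(add_log_ror(demo))
```

With this convention a rising market cap produces a negative Log ROR, which is why the Top-N selection below looks for values ${\le 0}$.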

## Instantiate ETPreturns class

In [59]:
import sys
sys.path.insert(1, '../lib')
import clsETPreturns as returns

if debug:
    import importlib
    returns = importlib.reload(returns)

data_name = "coindesk"
clsROR = returns.RateOfReturns(name=data_name)
print(dir(clsROR))

All packages in clsETPReturns loaded successfully!
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'days_offset', 'get_coin_cov_cor_coef_matrix', 'get_holding_period_returns', 'get_logarithmic_returns', 'get_simple_returns', 'maximize_weights', 'name', 'p_val', 'sum_weighted_returns', 'window_length']


## Top-N Assets

The top-N assets are selected based on the statistical significance of their Log ROR, defined as lying one standard deviation from the mean. After some testing, this policy might be changed to taking the assets that lie within one or two standard deviations of the asset with the highest market cap. Since the mean is skewed towards the assets with the highest market caps, this may not differ significantly from the current approach.

___This logic needs to be further investigated and improved because:___
* a continuously declining trend may require rebalancing based on criteria other than the top-N significant assets;
* the selected assets within one standard deviation might include assets with a declining trend.

In this cell, we demonstrate the process of selecting the top-N market-cap assets based on their Log ROR and its statistical significance.
1. Calculate the Log ROR of each asset ${x_i}$ for each day ${t}$: ${\log\left(\frac{R(t,x_i)}{R(t+1,x_i)}\right)}$, where ${R(t,x_i)}$ is the market-cap value of asset ${x_i}$ on day ${t}$ (here, its 7-day simple moving average).
1. Select the assets with a Log ROR ${\le 0}$, because a non-positive value means the market cap did not decrease from day ${t}$ to day ${t+1}$, i.e., ${R(t+1,x_i) \ge R(t,x_i)}$.
1. From that list, select the _top-N_ assets whose negative Log ROR is largest in magnitude and statistically significant.
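
The selection steps above can be sketched in pandas as follows. This is one possible reading of the rule, not the logic of `clsETL.get_significant_topN_assets`: for each day, the sign of the Log ROR is flipped so that larger means a bigger increase, the threshold is that day's mean plus `n_std` population standard deviations across all assets, and at most `max_n` assets are kept. The function name, the `log_ror` column name, and the threshold choice are all assumptions.

```python
import pandas as pd


def significant_top_n(df: pd.DataFrame, ror_col: str = "log_ror",
                      n_std: float = 1.0, max_n: int = 5) -> pd.DataFrame:
    """Hypothetical sketch of the per-day, significance-based top-N selection."""
    scored = df.dropna(subset=[ror_col]).copy()
    # Flip the sign: with log(R(t)/R(t+1)), a negative value is a market-cap increase
    scored["score"] = -scored[ror_col]
    by_date = scored.groupby("Date")["score"]
    day_mean = by_date.transform("mean")
    day_std = by_date.transform(lambda s: s.std(ddof=0))
    # Step 2 (Log ROR <= 0) and step 3 (at least n_std std above the day's mean)
    keep = (scored[ror_col] <= 0) & (scored["score"] >= day_mean + n_std * day_std)
    picked = scored[keep].sort_values(["Date", "score"], ascending=[True, False])
    return picked.groupby("Date").head(max_n).drop(columns="score")


demo = pd.DataFrame({
    "Date": ["2022-01-02"] * 4,
    "ID": ["solana", "cardano", "ripple", "litecoin"],
    "log_ror": [-0.10, -0.01, 0.02, 0.0],
})
print(significant_top_n(demo))
```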

Run this cell to compute the top-N assets.

In [60]:
import numpy as np

_kwargs = {'greater than': 0,
            'max num coins': 5}
#topN = 5
''' get the log rate of returns '''
actual_log_ror = clsROR.get_logarithmic_returns(_results_df, value_col_name='simp_move_avg_market_cap')

_neg_log_df = actual_log_ror.copy()
_neg_log_df.dropna(axis=0, how='any', inplace=True)
_neg_log_df = _neg_log_df.sort_values(by=['Date','simp_move_avg_market_cap_ror'])
_neg_log_df['simp_move_avg_market_cap_ror'] = _neg_log_df['simp_move_avg_market_cap_ror']*(-1)
#_topNassets_df = clsETL.get_fixed_topN_assets(_neg_log_df, N=topN, val_col_name='ror')
_topNassets_df = clsETL.get_significant_topN_assets(_neg_log_df,
                                                    val_col_name='simp_move_avg_market_cap_ror',
                                                    **_kwargs)
_topNassets_df['simp_move_avg_market_cap_ror'] = _topNassets_df['simp_move_avg_market_cap_ror']*(-1)
_topNassets_df.reset_index(inplace=True)
print("Completed getting %d list with top assets" % (_topNassets_df.shape[0]))

Completed getting 188 list with top assets


### Plot the Top-N assets
Run this cell to view the top-N assets that were selected based on the criteria explained in the previous cell.

In [62]:
''' Plot the top-N assets'''
import plotly.express as px
import pandas as pd

''' Merge the top-N assets with their market caps for plotting '''
_plot_topN_df = pd.merge(_topNassets_df,
                         rec_marketcap_df, how='inner',
                         on=['Date','ID'])

_min_date = _plot_topN_df["Date"].min()
_max_date = _plot_topN_df["Date"].max()

_title = "Top-N statistically significant Assets "+str(_min_date)+" to "+str(_max_date)
fig = px.scatter(_plot_topN_df, x="Date", y=['market_cap'],
#               color='red',
              hover_data={"Date": "|%B %d, %Y", "ID": True},
              title=_title,)

fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()

## Weighted Sum of the Log ROR

Now we calculate the weighted sum of the Log ROR of the selected significant top-N assets. This follows the Modern Portfolio Theory (MPT) approach.

The sum of the weighted actual portfolio allocation at time ${t}$ is $F(t,Y) = \sum_{k_i = 1}^{N<n} w(t,x_{k_i}) \times \log\left(\frac{R(t,x_{k_i})}{R(t+1,x_{k_i})}\right)$

In this procedure, instead of running a Monte Carlo simulation to select both the weights and the assets, we use a randomized search that keeps the weights maximizing the weighted Log ROR:
1. A set of ```_size = 100``` random weights is generated for each iteration.
1. The Log ROR of the significant top-N assets for each day is multiplied by the weights.
1. The set of weights that produces the highest weighted Log ROR is selected.
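
The randomized weight search can be sketched with NumPy as below. Assumptions: weights are long-only and sum to 1 (drawn from a flat Dirichlet distribution), the returns passed in are oriented so that larger is better (with this notebook's $\log\left(\frac{R(t)}{R(t+1)}\right)$ convention, that means negating them first), and `best_random_weights` is a hypothetical name; `clsROR.sum_weighted_returns` may work differently.

```python
import numpy as np


def best_random_weights(day_returns, size=100, seed=None):
    """Hypothetical sketch: randomized search over long-only weights that sum to 1.

    `day_returns` must be oriented so that larger is better.
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(day_returns, dtype=float)
    # Each row is one candidate weight vector drawn uniformly from the simplex
    candidates = rng.dirichlet(np.ones(r.size), size=size)
    scores = candidates @ r                  # weighted sum for every candidate
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])


returns = [0.05, -0.02, 0.01]
weights, score = best_random_weights(returns, size=100, seed=0)
print(weights.round(3), round(score, 4))
```

Because the objective is linear in the weights, its exact maximum would put all the weight on the single best asset; a finite random search with `size=100` draws generally returns a more diversified allocation.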

In [82]:
import pandas as pd

''' initialize the parameters '''
_size=100

''' merge the data with the top-N assets'''
_merged_actual_ror = pd.merge(_topNassets_df,
                              rec_marketcap_df,how='inner',
                              on=['Date','ID'])
_merged_actual_ror.dropna(axis=0, how='any', inplace=True)
_merged_actual_ror = _merged_actual_ror.sort_values(by=['Date','simp_move_avg_market_cap_ror'], ascending=True)

_actual_weighted_sum_df = clsROR.sum_weighted_returns(
    _merged_actual_ror,
    size=_size,
    value_col_name='simp_move_avg_market_cap_ror'
)
# _actual_weighted_sum_df.rename(columns={'date':'Date'}, inplace=True)
print('Data merge complete!')

Data merge complete!


## Plot Weighted Sum and Bitcoin Market Cap
The next two cells plot the ETP's weighted sum of Log ROR and Bitcoin's market cap so that the two can be compared.

### Weighted sum plot

In [83]:
import plotly.express as px

''' Convert the weighted-sum output into a dataframe for plotting '''
weighted_sum_df = pd.DataFrame(_actual_weighted_sum_df)
_min_date = weighted_sum_df["date"].min()
_max_date = weighted_sum_df["date"].max()
weighted_sum_df.reset_index(inplace=True)

_title = "Weighted ETP Logarithmic Returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(weighted_sum_df, x="date", y=['weighted_market_cap_returns'],
#               color='red',
              hover_data={"date": "|%B %d, %Y", "coins": True, "best_weights": True},
              title=_title,)

fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()

### Bitcoin plot

In [65]:
mask = (_results_df.Date >= _min_date) & (_results_df.Date <= _max_date) & (_results_df.ID == 'bitcoin')
# sort_values returns a new dataframe, which avoids pandas' SettingWithCopyWarning
bench_mark_df = _results_df.loc[mask].sort_values(by=['Date'])

_title = "Bitcoin 7-day Moving-Average Market Cap from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(bench_mark_df, x="Date", y=['simp_move_avg_market_cap'],
#               color='red',
              hover_data={"Date": "|%B %d, %Y"},
              title=_title,)

fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()






## Indicator-Based Trading Strategies

### Momentum trading 
It assumes that big moves will continue in the same direction: if the ETP's weighted sum of returns is above its moving average, an uptrend is indicated. The Directional Movement Index (DMI) is a technical indicator that traders use to identify the strength of an uptrend or a downtrend in the market. Its Average Directional Index (ADX) measures the relative strength of the directional trend indicated by the Positive Directional Indicator (DI+) and the Negative Directional Indicator (DI-). Momentum trading can reap large gains because getting into a strong price trend relatively early can be highly profitable.
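
The moving-average rule in the first sentence can be sketched as a simple signal. Assumptions: the input is a series such as the ETP's weighted sum of returns, the moving average is a simple `window`-period average that includes the current value, and `trend_signal` is a hypothetical helper.

```python
import pandas as pd


def trend_signal(values: pd.Series, window: int = 7) -> pd.Series:
    """Hypothetical sketch: +1 above the simple moving average, -1 below, 0 otherwise."""
    sma = values.rolling(window).mean()
    signal = pd.Series(0, index=values.index)
    signal[values > sma] = 1     # uptrend indicated
    signal[values < sma] = -1    # downtrend indicated
    return signal                # 0 where the average is undefined or equal


up = pd.Series(range(1, 11), dtype=float)
print(trend_signal(up, window=3).tolist())
```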

### Mean reversion strategy
It allows traders to determine whether big moves will partly reverse. It assumes that the price of an asset tends to move back toward its average price over time, because extreme events are often followed by a period of normalization. For example, if the index dropped 20% this month, mean reversion theory would predict that it will fall by less than that percentage the following month. Traders often use the [Relative Strength Index](https://phemex.com/academy/rsi-indicator-crypto-trading) (RSI) to complement a mean reversion strategy:

* RSI helps determine which assets exhibit overbought or oversold price levels.
* RSI > 70 indicates overbought; RSI < 30 indicates oversold.
* The core of this indicator is the ratio of the average upward market-cap change to the average downward market-cap change over a given period.
* $\text{RSI} = 100 - \frac{100}{1 + RS}$, where $RS$ is the average gain divided by the average loss over the period.

We look at correlated assets to confirm the prediction. The quintessential mean reversion trading strategy has low profit expectations per trade and a high trading frequency.
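
The RSI formula above can be sketched with pandas. This version uses simple rolling averages of gains and losses (sometimes called Cutler's RSI); Wilder's original definition smooths them exponentially instead, so values will differ slightly. The 14-period default and the `rsi` name are assumptions for illustration.

```python
import pandas as pd


def rsi(values: pd.Series, window: int = 14) -> pd.Series:
    """Hypothetical sketch: RSI = 100 - 100 / (1 + RS), RS = avg gain / avg loss."""
    change = values.diff()
    avg_gain = change.clip(lower=0).rolling(window).mean()
    avg_loss = change.clip(upper=0).abs().rolling(window).mean()
    rs = avg_gain / avg_loss          # inf when the window had no losses -> RSI 100
    return 100 - 100 / (1 + rs)


zigzag = pd.Series([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
print(rsi(zigzag, window=4))
```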


## Instantiate ETP Index class

In [70]:
import sys
sys.path.insert(1, '../lib')
import clsIndex as perform

if debug:
    import importlib
    perform = importlib.reload(perform)

data_name = "coindesk"
clsPerfIndex = perform.PortfolioPerformance(name=data_name)
print(dir(clsPerfIndex))

All packages loaded successfully!
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'get_adx', 'get_value_index', 'name', 'p_val', 'rebalance_etp', 'sharp_ratio', 'sortino_ratio']


## Indicator based portfolio selection
___The intent is to use indicators to make decisions on portfolio asset selection and rebalancing___

## ADX Indicator
The Average Directional Index (ADX) indicator measures the intensity or strength of a trend. It helps ensure that we do not rebalance on a weak trend, because a weak trend has a higher probability of reversal than a strong one. Hence, restricting rebalancing to directional moves with a strong trend is expected to achieve a higher [hit ratio](https://www.investopedia.com/terms/w/win-loss-ratio.asp) and a higher ROR for the weighted ETP.

* The Directional Movement Index (DMI) consists of three lines that together measure a trend's strength and direction: the ADX line, the DI+ line, and the DI- line.
* A [calculated ADX](https://www.investopedia.com/terms/w/wilders-dmi-adx.asp#:~:text=The%20DMI%20is%20a%20collection,25%20indicates%20a%20strong%20trend.) value above 25 indicates that the trend is relatively strong, and a value below 20 indicates that the trend is weak or that the market is trading sideways.
   * When the DI+ line is above the DI- line, the market is in a bullish trend. Conversely, when the DI- line is above the DI+ line, the market is in a bearish trend.
   * The ADX indicator helps traders gauge the expansion or contraction of an asset's price range over a specific period.
* The DMI and ADX values are conventionally determined from the range of price movements over the last 14 trading periods; the cells below use a 7-day rolling window (`rolling_window_length=7`).
* We follow a process similar to the one described in [Mathematical Intuition of the ADX Indicator: A Python Approach](https://blog.quantinsti.com/adx-indicator-python/).
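
For reference, a Wilder-style DI+/DI-/ADX calculation can be sketched as below, together with the 25/20 thresholds quoted above. This is a sketch under stated assumptions, not `clsPerfIndex.get_adx`: `adx` and `trend_strength` are hypothetical names, Wilder's smoothing is approximated with `ewm(alpha=1/window, adjust=False)`, and because this notebook only has one daily value per asset (no high/low/close), one option is to pass the same series for all three inputs, in which case the true range reduces to the absolute daily change.

```python
import numpy as np
import pandas as pd


def adx(high: pd.Series, low: pd.Series, close: pd.Series, window: int = 14) -> pd.DataFrame:
    """Hypothetical sketch of Wilder's DI+, DI-, and ADX."""
    up_move = high.diff()
    down_move = -low.diff()
    # Directional movement: only the larger positive move of the day counts
    plus_dm = pd.Series(np.where((up_move > down_move) & (up_move > 0), up_move, 0.0),
                        index=high.index)
    minus_dm = pd.Series(np.where((down_move > up_move) & (down_move > 0), down_move, 0.0),
                         index=high.index)
    prev_close = close.shift(1)
    true_range = pd.concat([high - low, (high - prev_close).abs(),
                            (low - prev_close).abs()], axis=1).max(axis=1)

    def wilder(s: pd.Series) -> pd.Series:
        # Wilder's smoothing is an exponential average with alpha = 1 / window
        return s.ewm(alpha=1 / window, adjust=False).mean()

    atr = wilder(true_range)
    plus_di = 100 * wilder(plus_dm) / atr
    minus_di = 100 * wilder(minus_dm) / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    return pd.DataFrame({"DI+": plus_di, "DI-": minus_di, "ADX": wilder(dx)})


def trend_strength(adx_value: float) -> str:
    """Label a trend using the 25 / 20 ADX thresholds."""
    if adx_value > 25:
        return "strong"
    if adx_value < 20:
        return "weak or sideways"
    return "undetermined"


series = pd.Series(np.arange(1.0, 31.0))
result = adx(series, series, series, window=7)
print(result.tail(3))
print(trend_strength(result["ADX"].iloc[-1]))
```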

In [84]:
''' Create ADX relevant columns '''

_value_col_name = 'simp_move_avg_market_cap_ror'

if debug:
    import importlib
    etl = importlib.reload(etl)

''' ADX function is defined in clsIndex.get_adx() It will return 
    a time series dataframe with daily ADX, +DI, and -DI values '''
''' Set the temporal range from the Log ROR data '''
_start_dt = actual_log_ror.Date.min()
_end_dt = actual_log_ror.Date.max()
adx_df = clsPerfIndex.get_adx(ticker_data=actual_log_ror,
                            rolling_window_length=7,
                            value_col_name=_value_col_name,
                            window_start_date = _start_dt,
                            window_end_date = _end_dt,
                            )

# adx_df.to_csv("../data/adx.csv")

All packages in ExtractLoadTransform loaded successfully!
Index(['index', 'Date', 'ID', 'Symbol', 'market_cap',
       'simp_move_avg_market_cap', 'simp_move_std_market_cap',
       'simp_move_sum_market_cap', 'momentum_market_cap',
       'simp_move_avg_market_cap_ror'],
      dtype='object')
All packages in ExtractLoadTransform loaded successfully!


In [85]:
''' Select one asset (solana) and sort a copy of the slice by date '''
plot_adx_df = adx_df.loc[adx_df['ID']=='solana'].sort_values(by=['Date'])

_min_date = plot_adx_df["Date"].min()
_max_date = plot_adx_df["Date"].max()
_title = "Solana smoothed +DM and -DM from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(plot_adx_df, x="Date", y=['smooth+DM','smooth-DM'],
#               color='red',
              hover_data={"Date": "|%B %d, %Y", "ID": True,},
              title=_title,)

fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()






## Index based optimization
___The objective is to optimize the performance ratios for the ETP portfolio's asset picks and rebalancing___

### Performance Ratios
The calculations are adapted to an ETP over a given time period ${\left[{T_{min},T_{max}}\right]}$.
* For an ETP portfolio consisting of the top ${N < n}$ assets ${Y}$:
    * the weighted Log ROR at time ${t}$ is $F(t,Y) = \sum_{k_i = 1}^{N<n} w(t,x_{k_i}) \times \log\left(\frac{R(t,x_{k_i})}{R(t+1,x_{k_i})}\right)$
* Let ${z}$ be the benchmark asset (e.g., Bitcoin) whose ROR plays the role of the risk-free ROR.
    * Similarly to the set of assets ${Y}$, the weighted Log ROR for ${z}$ at time ${t}$ is $F(t,z) = w(t,z) \times \log\left(\frac{R(t,z)}{R(t+1,z)}\right)$

### Sharpe Ratio
It measures the performance of an investment (e.g., a security or portfolio) compared to a risk-free asset, after adjusting for its risk. It is poor at estimating tail risks, a shortcoming that helped give rise to the Post-Modern Portfolio Theory ([PMPT](https://www.investopedia.com/terms/p/pmpt.asp)).
* Over all ${t \in \left[{T_{min},T_{max}}\right]}$:
    * ___Sharpe ratio___ = $\frac{\mu\left(F(t,x_{k_i})\right) - \mu\left(F(t,z)\right)}{\sigma\left(F(t,x_{k_i})\right)}$, where ${\mu}$ is the expected value and ${\sigma}$ is the standard deviation
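
The formula above can be sketched directly for a single asset's per-period Log ROR series. This is an illustration, not `clsPerfIndex.sharp_ratio`: `sharpe_ratio` is a hypothetical name, `benchmark` may be either the benchmark's return series $F(t,z)$ or a constant per-period rate (such as the `0.01/365` passed in the cell below), and the result is per period rather than annualized.

```python
import pandas as pd


def sharpe_ratio(returns: pd.Series, benchmark=0.0) -> float:
    """Hypothetical sketch: (mean(asset) - mean(benchmark)) / std(asset)."""
    if isinstance(benchmark, pd.Series):
        benchmark_mean = benchmark.mean()      # mu(F(t, z)) over the window
    else:
        benchmark_mean = float(benchmark)      # constant per-period rate
    return (returns.mean() - benchmark_mean) / returns.std()  # sample std (ddof=1)


asset = pd.Series([0.01, 0.02, 0.03])
print(sharpe_ratio(asset, benchmark=0.0))
print(sharpe_ratio(asset, benchmark=pd.Series([0.0, 0.0, 0.03])))
```

Multiplying by $\sqrt{365}$ is a common way to annualize a daily ratio for assets that trade every day, under the assumption of independent returns.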

In [86]:
sharp = clsPerfIndex.sharp_ratio(piv_marketcap_df, investment=100, risk_free_rate=0.01/365)
print("\n Sharpe Ratio")
print(sharp.sort_values(ascending=False))

-0.0004980844352952042

 Sharpe Ratio
ID
ripple          0.016981
bitcoin         0.000000
ethereum       -0.051617
litecoin       -0.060197
cardano        -0.075909
solana         -0.104899
bitcoin_cash         NaN
dtype: float64


### Sortino Ratio
It is a variation of the Sharpe ratio that penalizes only downside risk: it uses the downside deviation of returns instead of the total standard deviation used in the mean-variance framework of [MPT](https://blog.quantinsti.com/modern-portfolio-capital-asset-pricing-fama-french-three-factor-model/).
* Over all ${t \in \left[{T_{min},T_{max}}\right]}$:
    * ___Sortino ratio___ = $\frac{\mu\left(F(t,x_{k_i})\right) - \mu\left(F(t,z)\right)}{\sigma^-\left(F(t,x_{k_i})\right)}$, where ${\sigma^-}$ is the downside standard deviation of returns
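
Using a constant per-period `target` (for example, the risk-free rate) in place of $\mu(F(t,z))$, the Sortino ratio can be sketched as below. `sortino_ratio` is a hypothetical name and not `clsPerfIndex.sortino_ratio`; dividing the squared shortfalls by the total number of periods is one common convention for the downside deviation, and others exist.

```python
import numpy as np
import pandas as pd


def sortino_ratio(returns: pd.Series, target: float = 0.0) -> float:
    """Hypothetical sketch: mean excess return over the downside deviation."""
    excess = returns - target
    # Downside deviation: only below-target periods contribute, averaged over all periods
    downside = np.sqrt((excess.clip(upper=0) ** 2).mean())
    return excess.mean() / downside if downside > 0 else np.nan


asset = pd.Series([0.02, -0.01, 0.03, -0.02])
print(sortino_ratio(asset))
```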

In [40]:
sortino = clsPerfIndex.sortino_ratio(piv_marketcap_df, investment=100, risk_free_rate=0.01/365)
print("\n Sortino Ratio")
print(sortino.sort_values(ascending=False))


 Sortino Ratio
ID
ripple          0.030394
bitcoin         0.000000
ethereum       -0.080442
litecoin       -0.094637
cardano        -0.149946
solana         -0.169963
bitcoin_cash         NaN
dtype: float64


### Calmar ratio
The Calmar ratio is the average annual rate of return over the last 36 months divided by the maximum drawdown over the same 36 months, calculated on a monthly basis. It changes gradually and smooths out periods of over- and underachievement more readily than the Sharpe ratio.
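
The sketch below computes a Calmar-style ratio from a value series (e.g., an asset's market cap): annualized compound return divided by maximum drawdown. The classic definition uses 36 months of monthly data; since the filtered window in this notebook (2022-01-01 to 2022-03-01) is much shorter, the sketch simply uses whatever window it is given and annualizes with `periods_per_year` (365 assumes daily crypto data). `max_drawdown` and `calmar_ratio` are hypothetical helpers.

```python
import numpy as np
import pandas as pd


def max_drawdown(values: pd.Series) -> float:
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    running_peak = values.cummax()
    return float(-(values / running_peak - 1.0).min())


def calmar_ratio(values: pd.Series, periods_per_year: int = 365) -> float:
    """Hypothetical sketch: annualized (compound) return divided by maximum drawdown."""
    n_periods = len(values) - 1
    total_growth = values.iloc[-1] / values.iloc[0]
    annual_return = total_growth ** (periods_per_year / n_periods) - 1.0
    mdd = max_drawdown(values)
    return annual_return / mdd if mdd > 0 else np.nan


prices = pd.Series([100.0, 120.0, 90.0, 110.0])
print(max_drawdown(prices))
print(calmar_ratio(prices, periods_per_year=3))
```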

## Value Index

In [41]:
import plotly.express as px

index_df = clsPerfIndex.get_value_index(piv_marketcap_df)

_min_date = (index_df["Date"].min()).date()
_max_date = (index_df["Date"].max()).date()
_title = "Asset class value index "+str(_min_date)+" to "+str(_max_date)
fig = px.line(index_df, x="Date", y=[c for c in index_df.columns if c != "Date"],
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()