# Logarithmic Ratio of Returns (LogRoR)
Refer to the [Introduction](/home/gnewy/workspace/starxetp/notebooks/introduction.ipynb) on how to use the notebook.

Investors must consider factors such as downside risk, market conditions, and the length of time it will take for each investment to realize returns. They also need to consider opportunity costs: the potential benefits that an individual, investor, or business misses out on when choosing the ETP over another investment.

The _Return On Investment_ (ROI) of a single investment is the net price gain from holding the asset divided by the asset's original cost. The cost of an asset includes not only the purchase price, but also any commissions, management fees, or other expenses associated with the acquisition. __ETP ROI__ is the differential gain (or loss) between the weighted sum of the original market capitalization and the weighted sum of the current market capitalization.
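
As a minimal sketch of the single-asset ROI described above: the function name `simple_roi` and all figures are hypothetical and chosen only for illustration; they are not part of the notebook's library.

```python
def simple_roi(purchase_price, current_price, fees=0.0):
    """Return on investment: net gain divided by the total original cost.

    The total cost is the purchase price plus any commissions or fees.
    Names and numbers are illustrative assumptions.
    """
    total_cost = purchase_price + fees
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (current_price - total_cost) / total_cost


# Hypothetical example: bought at 1000 with 20 in fees, now worth 1224.
roi = simple_roi(purchase_price=1000.0, current_price=1224.0, fees=20.0)
print(f"ROI = {roi:.2%}")
```

Including the fees in the denominator keeps the result consistent with the definition of cost given above.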

## Modern Portfolio Theory
Modern Portfolio Theory (MPT) is instrumental to developing the ETP because it adopts a methodology that builds a low-risk portfolio as a weighted sum of assets based on their volatility (i.e., the risk defined by the standard deviation of the market value). While MPT assumes allocation of weights to different asset classes, in this work we consider a single asset class and construct an ETP that is the weighted sum of the crypto assets that offer lucrative returns based on their market cap. The [post-modern portfolio theory](https://www.investopedia.com/terms/p/pmpt.asp) (PMPT) attempts to improve on modern portfolio theory by minimizing downside risk instead of variance.

## Defining the single asset class ETP

* Let ${X}$ = {${x_i | i \in I}$} be the set of cryptocurrency assets, where $I \subset \mathbb{N}$ is an index set
* The corresponding set of asset market capitalizations at time ${t}$: ${R}$(${t,X}$) = {${R}$(${t,x_i}$) | ${i \in I}$}
* The set of top ${N}$ market capitalization assets ${Y}$ at time ${t}$: ${R}$(${t,{Y}}$) = {${R}$(${t,{x_{k_i}}}$) | ${k_i \in I_N}$}; where ${I_N}$ = {${k_i \in I}$}${_{i=1}^{N<n}}$
* The portfolio weight allocation for the top ${N<n}$ assets at time ${t}$: ${W}$(${t,Y}$) = {${w}$(${t,x_{k_i}}$) | ${k_i \in I_N}$}; such that $\sum_{k_i=1}^{N<n} {w(t,x_{k_{i}})} = 1.0$
* Sum of the weighted portfolio (i.e., the _ETP_) allocation at time ${t}$: ${F_{t}({Y})}$  = $\sum_{k_i = 1}^{N<n} {w(t,x_{k_{i}})} \times {R(t,x_{k_i})}$
* __Objective__ is ${Max(F(t,Y))}$ > ${R^{'}(t,z)}$ by finding the optimal set of weights ${W(t,Y)}$; where ${R^{'}(t,z)}$ is an alternative ROI from a risk free investment
   * In this stage of the work, ${z = bitcoin}$ is considered the benchmark and risk free investment; i.e., ${R^{'}(t, bitcoin)}$
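
The definitions above can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's implementation: the helper names `top_n_assets` and `weighted_sum` are hypothetical, the market cap figures are the 2022-01-01 values from the test dataset, and the cap-proportional weights are an arbitrary choice made only so that the weights sum to 1.

```python
def top_n_assets(market_caps, n):
    """Return the n asset ids with the largest market capitalization (the set Y)."""
    ranked = sorted(market_caps, key=market_caps.get, reverse=True)
    return ranked[:n]


def weighted_sum(values, weights):
    """F_t(Y) = sum of w(t, x) * R(t, x) over the selected assets."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weights[k] * values[k] for k in weights)


# Market caps R(t, x_i) on 2022-01-01 from the test dataset.
caps = {
    "bitcoin": 8.761929e11,
    "ethereum": 4.397909e11,
    "cardano": 4.219625e10,
    "litecoin": 1.016561e10,
}

top = top_n_assets(caps, n=3)

# Assumed weighting scheme (illustration only): proportional to market cap.
total = sum(caps[k] for k in top)
weights = {k: caps[k] / total for k in top}

etp_value = weighted_sum(caps, weights)
print(top, etp_value)
```

Finding the weights that maximize the weighted sum is a separate optimization step that this sketch does not attempt.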

In [1]:
'''
    WARNING CONTROL to display or ignore all warnings
'''
import warnings; warnings.simplefilter('default')     #switch between 'default' and 'ignore'

''' Set debug flag to view extended error messages; else set it to False to turn off debugging mode '''
debug = True

### Load data

* The current test dataset spans ___2021-01-01___ to ___2022-06-01___. At this stage the full dataset from the past decade and beyond is unavailable, but it will be made available in the subsequent phase to support backtesting. This requires writing a script that systematically retrieves the data because coindesk, for example, only allows small payloads of data at a time.
* To filter the data set for a shorter time span, change the year (YYYY), month (m), and day (d) of the parameters
   * `start_dt` ("start date") and `end_dt` ("end date")
   * Example: change the year, month, and day as you desire:
      * ```start_dt = datetime.date(2022,1,1)``` &nbsp; &nbsp; &nbsp; &nbsp;# implies 2022 January 01; (must be ${\ge}$ 2021 January 01)
      * ```end_dt = datetime.date(2022,3,1)``` &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # implies 2022 March 01; (must be ${\le}$ 2022 June 01)
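
The date constraints above can be checked before loading any data. The sketch below is an assumption-laden illustration: `validate_range`, `DATA_START`, and `DATA_END` are hypothetical names that do not exist in `clsDataETL`, and the two sample rows are made up.

```python
import datetime

# Bounds of the test dataset as stated above.
DATA_START = datetime.date(2021, 1, 1)
DATA_END = datetime.date(2022, 6, 1)


def validate_range(start_dt, end_dt):
    """Raise ValueError unless DATA_START <= start_dt <= end_dt <= DATA_END."""
    if not (DATA_START <= start_dt <= end_dt <= DATA_END):
        raise ValueError(
            f"date range must lie within {DATA_START} and {DATA_END}"
        )
    return start_dt, end_dt


start_dt, end_dt = validate_range(
    datetime.date(2022, 1, 1), datetime.date(2022, 3, 1)
)

# Hypothetical rows of (date, coin id); only the in-range row is kept.
rows = [
    (datetime.date(2021, 12, 31), "bitcoin"),
    (datetime.date(2022, 1, 15), "bitcoin"),
]
filtered = [row for row in rows if start_dt <= row[0] <= end_dt]
print(filtered)
```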

In [83]:
import sys
sys.path.insert(1, '../lib')
import clsDataETL as etl
import datetime

'''
    To filter data by a date range change one or both date parameters below
'''
start_dt = datetime.date(2022,1,1)
end_dt = datetime.date(2022,3,1)

if debug:
    import importlib
    etl = importlib.reload(etl)

''' Set the data source and temporal range '''
_path = "../data/market_cap_2021-01-01_2022-06-01/"
''' Initialize the dataETL class '''
print("Loading and filtering data ... this may take a while.")
clsETL = etl.ExtractLoadTransform()
''' Load data into dataframe '''
rec_marketcap_df=clsETL.load_data(dataPath=_path, start_date=start_dt, end_date=end_dt)
rec_marketcap_df.dropna(axis=0,how='any',subset='market_cap',inplace=True)
rec_marketcap_df.sort_values(by=['Date','ID'], ascending=True,inplace=True)
print("Loaded %d rows %s" % (rec_marketcap_df.shape[0],str(rec_marketcap_df.columns)))
''' Transform data with coin ids in columns '''
piv_marketcap_df = rec_marketcap_df.pivot_table(values='market_cap', index=rec_marketcap_df.Date, columns='ID', aggfunc='first')
piv_marketcap_df.dropna(axis=1,how='all', inplace=True)
piv_marketcap_df.sort_values(by=['Date'],ascending=True,inplace=True)
piv_marketcap_df.reset_index(inplace=True)
print("Data from %s to %s loaded and transformed into a pivot table with %d rows complete!" % (str(rec_marketcap_df.Date.min()),
                                                                    str(rec_marketcap_df.Date.max()),
                                                                    piv_marketcap_df.shape[0]))

All packages in ExtractLoadTransform loaded successfully!
Loading and filtering data ... this may take a while.
Loaded 419 rows Index(['Date', 'ID', 'Symbol', 'market_cap'], dtype='object')
Data from 2022-01-01 to 2022-03-01 loaded and transformed into a pivot table with 60 rows complete!


# COMPUTE LogRoR

* The [logarithmic return](https://www.rateofreturnexpert.com/log-return/) at ${t}$ is expressed in terms of the market cap at time ${t}$ and ${t+1}$ in the following way 
   * rolling mean market cap values [CORRECTION]: ${S(t,x_{i}) = log \left({S(t,x_{i}) \over S(t+1,x_{i})}\right)}$
   * actual market cap values: ${R(t,x_{i}) = log \left(R(t,x_{i}) \over R(t+1,x_{i})\right)}$
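
A minimal sketch of the formula above in plain Python follows. It uses this notebook's convention of placing the earlier value in the numerator, which is the negative of the conventional log return $log(R(t+1)/R(t))$, so an increase in market cap yields a negative value. The function name `log_ror` and the sample series are hypothetical; how `get_logarithmic_returns` handles zero values is not shown in this notebook, so returning `None` is an assumption.

```python
import math


def log_ror(values):
    """Log ratio of consecutive values: log(R(t) / R(t+1)).

    Returns one entry per consecutive pair. Pairs containing a zero or
    negative value yield None because the logarithm is undefined there.
    """
    out = []
    for cur, nxt in zip(values, values[1:]):
        if cur > 0 and nxt > 0:
            out.append(math.log(cur / nxt))
        else:
            out.append(None)
    return out


# Hypothetical market cap series (not taken from the dataset).
caps = [100.0, 110.0, 99.0, 0.0]
ratios = log_ror(caps)
print(ratios)
```

The zero guard matters for this dataset because some coins, such as `bitcoin_cash`, are recorded with a market cap of 0.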

## Instantiate ETP class

In [60]:
import sys
sys.path.insert(1, '../lib')
import clsETPreturns as returns

if debug:
    import importlib
    returns = importlib.reload(returns)

data_name = "coindesk"
clsROR = returns.RateOfReturns(name=data_name)
print(dir(clsROR))
print("\nClass initiated!")

All packages in clsETPReturns loaded successfully!
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'days_offset', 'get_coin_cov_cor_coef_matrix', 'get_holding_period_returns', 'get_logarithmic_returns', 'get_simple_returns', 'maximize_weights', 'name', 'p_val', 'sum_weighted_returns', 'window_length']

Class initiated!


## Moving Average
* T-day rolling mean (commonly known as the [moving average](https://mathworld.wolfram.com/MovingAverage.html)) of the market cap values for each coin id
   * Given a sequence of market cap values ${\{{R_t(x_i)}\}_{t=T_{min}}^{T_{max}}}$, where ${\left[{T_{min},T_{max}}\right]}$ is a time interval, the moving average is a new sequence $\{{S_t(x_i)}\}_{t=T_{min}}^{T_{max}}$, where ${T}$ is the time window length (e.g. ${T=7}$)
   * ${S_t(x_i) = {1 \over T} \sum_{j=t-T+1}^{t} R_j(x_i)}$
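
The moving average can be sketched directly from its definition. The function name `moving_average` and the sample series are hypothetical; the notebook itself delegates this step to `get_rolling_measures`, whose internals are not shown here.

```python
def moving_average(values, window):
    """T-day simple moving average.

    The first window-1 entries are None because a full window of
    observations is not yet available.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[t - window + 1 : t + 1]) / window)
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
sma = moving_average(series, window=3)
print(sma)  # [None, None, 2.0, 3.0, 4.0]
```

With pandas, `Series.rolling(window=3).mean()` should produce the same numbers, with `NaN` in place of `None` for the incomplete windows.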
   

In [86]:
from datetime import date

start_dt = date(2022,1,1)
end_dt = date(2022,3,1)

_cal_ops_dict = {
    "simp_move_avg" : "market_cap",
    "simp_move_std" : "market_cap",
}
_results_df = clsETL.get_rolling_measures(ticker_data=rec_marketcap_df,
                                                rolling_window_length=7,
                                                window_start_date = start_dt,
                                                window_end_date = end_dt,
                                                rolling_measure_dict = _cal_ops_dict,)
_results_df.to_csv('../data/rolling_values.csv')
print("rolling values computation complete!")

end dt 2022-03-01
start dt 2022-01-01
          Date            ID Symbol    market_cap
0   2022-01-01       bitcoin    btc  8.761929e+11
0   2022-01-01  bitcoin_cash    bch  0.000000e+00
0   2022-01-01       cardano    ada  4.219625e+10
0   2022-01-01      ethereum    eth  4.397909e+11
0   2022-01-01      litecoin    ltc  1.016561e+10
..         ...           ...    ...           ...
0   2022-03-01       cardano    ada  3.063100e+10
0   2022-03-01      ethereum    eth  3.490563e+11
0   2022-03-01      litecoin    ltc  7.901459e+09
0   2022-03-01        ripple    xrp  3.750062e+10
0   2022-03-01        solana    sol  3.184609e+10

[419 rows x 4 columns]
rolling values computation complete!


## Logarithmic ROR



### LogROR for rolling mean market cap values

In [100]:
import plotly.express as px
''' calculating the LogROR for the rolling mean '''
mean_log_ror = clsROR.get_logarithmic_returns(_results_df, value_col_name='simp_move_avg_market_cap')
''' Transform data with coin ids in columns '''
value_col_name='simp_move_avg_market_cap_ror'
# trans_mean_log_ror = mean_log_ror[['Date','ID',value_col_name]]
# trans_mean_log_ror[trans_mean_log_ror.index.duplicated()]
# trans_mean_log_ror = trans_mean_log_ror.dropna(axis=0,subset=value_col_name)
trans_mean_log_ror = mean_log_ror.pivot_table(values=value_col_name,
                                            index='Date',
                                            columns=['ID'],
                                            aggfunc='first')
trans_mean_log_ror.dropna(axis=1,how='all', inplace=True)
trans_mean_log_ror.reset_index(inplace=True)
_subset = [col for col in trans_mean_log_ror if col != 'Date']
# _mean_log_ror_plot = trans_mean_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_mean_log_ror_plot = trans_mean_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_mean_log_ror_plot.to_csv('../data/mean_log_ror_plot.csv')
# _subset = [col for col in trans_mean_log_ror if col != 'Date']
# _mean_log_ror_plot = trans_mean_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
# ''' recreate rolling mean dataframe '''
# _l_coin_ids = [col for col in _common_log_ror if (col != 'Date' and '_mean' in col)]
# _l_coin_ids.append('Date')
# _mean_log_ror_plot = _common_log_ror[_l_coin_ids]
# _mean_log_ror_plot.columns = _mean_log_ror_plot.columns.str.replace('_mean','')

''' Plot the data with each coin id as an individual column '''
_min_date = (_mean_log_ror_plot["Date"].min())
_max_date = (_mean_log_ror_plot["Date"].max())
_title = "Moving Average Logarithmic Returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(_mean_log_ror_plot, x="Date", y=_subset,
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()


distutils Version classes are deprecated. Use packaging.version instead.



In [103]:
import plotly.express as px

''' calculating the LogROR for the rolling standard deviation '''
stdv_log_ror = clsROR.get_logarithmic_returns(_results_df, value_col_name='simp_move_std_market_cap')
value_col_name='simp_move_std_market_cap_ror'
# trans_stdv_log_ror = mean_log_ror[['Date','ID',value_col_name]]
# trans_stdv_log_ror[trans_stdv_log_ror.index.duplicated()]
# trans_stdv_log_ror = trans_stdv_log_ror.dropna(axis=0,subset=value_col_name)
trans_stdv_log_ror = stdv_log_ror.pivot_table(values=value_col_name,
                                            index='Date',
                                            columns=['ID'],
                                            aggfunc='first')
trans_stdv_log_ror.dropna(axis=1,how='all', inplace=True)
trans_stdv_log_ror.reset_index(inplace=True)
_subset = [col for col in trans_stdv_log_ror if col != 'Date']
# _stdv_log_ror_plot = trans_stdv_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_stdv_log_ror_plot = trans_stdv_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_stdv_log_ror_plot.to_csv('../data/stdv_log_ror_plot.csv')

''' Plot the data with each coin id as an individual column '''
_min_date = (_stdv_log_ror_plot["Date"].min())
_max_date = (_stdv_log_ror_plot["Date"].max())
_title = "Moving Standard Deviation Logarithmic Returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(_stdv_log_ror_plot, x="Date", y=_subset,
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()



### LogROR for actual market cap values

In [105]:
import pandas as pd
import plotly.express as px

''' calculating the LogROR for the actual market cap values '''
actual_log_ror = clsROR.get_logarithmic_returns(_results_df, value_col_name='market_cap')
value_col_name='market_cap_ror'
# trans_stdv_log_ror = mean_log_ror[['Date','ID',value_col_name]]
# trans_stdv_log_ror[trans_stdv_log_ror.index.duplicated()]
# trans_stdv_log_ror = trans_stdv_log_ror.dropna(axis=0,subset=value_col_name)
trans_actual_log_ror = actual_log_ror.pivot_table(values=value_col_name,
                                            index='Date',
                                            columns=['ID'],
                                            aggfunc='first')
trans_actual_log_ror.dropna(axis=1,how='all', inplace=True)
trans_actual_log_ror.reset_index(inplace=True)
_subset = [col for col in trans_actual_log_ror if col != 'Date']
# _actual_log_ror_plot = trans_actual_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_actual_log_ror_plot = trans_actual_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_actual_log_ror_plot.to_csv('../data/actual_log_ror_plot.csv')

_min_date = (_actual_log_ror_plot["Date"].min())
_max_date = (_actual_log_ror_plot["Date"].max())
_title = "Actuals logarithmic Returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(_actual_log_ror_plot, x="Date", y=_subset,
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()



In [11]:
import pandas as pd
import plotly.express as px

rec_momentum_mcap_df = clsETL.get_rolling_momentum(ticker_data=actual_log_ror,
                                                rolling_window_length=7,
                                                value_col_name='ror',
                                                window_start_date = start_dt,
                                                window_end_date = end_dt)
# print(rec_momentum_mcap_df)
trans_momentum_mcap_df = clsETL.transfrom_data(rec_momentum_mcap_df,value_col_name="ror")
_subset = [col for col in trans_momentum_mcap_df if col != 'Date']
_momentum_mcap_plot = trans_momentum_mcap_df.dropna(axis=0, how='all', subset=_subset, inplace=False)

_min_date = (_momentum_mcap_plot["Date"].min()).date()
_max_date = (_momentum_mcap_plot["Date"].max()).date()
_title = "Momentum of logarithmic Returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(_momentum_mcap_plot, x="Date", y=_subset,
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()



### Covariance
* The [covariance](https://mathworld.wolfram.com/Covariance.html) of ${S(t,x_i)}$ and ${R(t,x_i)}$, denoted ${COV\left(S(t,x_i),R(t,x_i)\right)}$, is calculated as ${E\left(S(t,x_i) R(t,x_i)\right) - E\left(S(t,x_i)\right) E\left(R(t,x_i)\right)}$
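
The covariance identity above can be checked with a short sketch. The function name `covariance` and the two sample series are hypothetical; the notebook's `get_coin_cov_cor_coef_matrix` may compute the statistic differently.

```python
def covariance(xs, ys):
    """Population covariance via E[XY] - E[X]E[Y] over paired observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - mean_x * mean_y


# Hypothetical rolling-mean (s) and actual (r) log returns for one coin.
s = [1.0, 2.0, 3.0, 4.0]
r = [2.0, 4.0, 6.0, 8.0]
cov_sr = covariance(s, r)
print(cov_sr)  # 2.5
```

This is the population form (divide by the number of observations). pandas' `DataFrame.cov` uses the sample form by default, so its result would differ by a factor of n/(n-1).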

In [None]:
import json

_l_cov = clsROR.get_coin_cov_cor_coef_matrix(
    trans_actual_log_ror.dropna(axis=1, how='all', inplace=False), # dataframe a - actual
    trans_mean_log_ror.dropna(axis=1, how='all', inplace=False),   # dataframe b - rolling mean
    suffix = ('_actual','_mean'))

print(json.dumps(_l_cov, indent=2))

[]


# DEPRECATED

___moved to etpPerformIndex notebook___

## Top-N Assets
${R(t,x_{i}) = log \left({R(t,x_{i}) \over R(t+1,x_{i})}\right) \le 0}$ implies that the ratio ${{R(t,x_{i}) \over R(t+1,x_{i})} \le 1.0}$, so the market cap value does not decrease from ${t}$ to ${t+1}$. Therefore, the _top-N_ assets are the ones with the most negative log ROR.  
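
The selection rule can be sketched as a sort on the log ROR values. The function name `top_n_by_growth` is hypothetical and is not the implementation of `get_significant_topN_assets`; the values are the 2022-01-04 `ror` figures from this notebook's data, and keeping only strictly negative values is an assumption.

```python
def top_n_by_growth(log_ror_by_coin, n):
    """Rank coins by log(R(t)/R(t+1)) in ascending order.

    The most negative value (largest market cap increase) comes first.
    Only strictly negative values are kept (assumed threshold).
    """
    gains = {
        coin: value
        for coin, value in log_ror_by_coin.items()
        if value is not None and value < 0
    }
    return sorted(gains, key=gains.get)[:n]


# ror values for 2022-01-04 from this notebook's data.
ror_2022_01_04 = {
    "cardano": -0.039437,
    "solana": -0.031966,
    "ripple": -0.031305,
    "litecoin": -0.020565,
    "bitcoin": -0.019356,
    "ethereum": -0.018207,
}
top3 = top_n_by_growth(ror_2022_01_04, n=3)
print(top3)  # ['cardano', 'solana', 'ripple']
```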

In [40]:
import numpy as np

_kwargs = {'greater than': 0,
            'max num coins': 5}
#topN = 5
_neg_log_df = actual_log_ror.copy()
_neg_log_df.dropna(axis=0, how='any', inplace=True)
_neg_log_df = _neg_log_df.sort_values(by=['Date','ror'])
_neg_log_df['ror'] = _neg_log_df['ror']*(-1)
#_topNassets_df = clsETL.get_fixed_topN_assets(_neg_log_df, N=topN, val_col_name='ror')
_topNassets_df = clsETL.get_significant_topN_assets(_neg_log_df,
                                                    val_col_name='ror', **_kwargs)
_topNassets_df['ror'] = _topNassets_df['ror']*(-1)
_topNassets_df=_topNassets_df.reindex()
print("Completed getting %d list with top assets" % (_topNassets_df.shape[0]))

Completed getting 164 list with top assets


In [42]:
type(_topNassets_df.Date.max())

datetime.date

## Risk measure
* The expected return on an investment is the expected value of the probability distribution of its possible returns. The purpose of calculating it is to give an investor an idea of probable profit versus risk and a basis for comparison with the risk-free rate of return. The interest rate on 3-month U.S. Treasury bills is often used to represent the risk-free rate of return.
* The risk of a single asset is the standard deviation ${\sigma}$ for a given time. We calculated the T-day (e.g. 7-day) [moving standard deviation](https://www.danielstrading.com/education/technical-analysis-learning-center/moving-standard-deviation) ${\sigma}^{(T)}$
* MPT uses variance as its measure of risk
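
Both measures can be illustrated with the standard library. The function names, the scenario probabilities, and the sample series below are all hypothetical; the sketch uses the sample standard deviation, which may differ from the estimator used by `get_rolling_measures`.

```python
import statistics


def expected_return(outcomes):
    """Expected value of a discrete distribution of (probability, return) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1.0")
    return sum(p * r for p, r in outcomes)


def moving_std(values, window):
    """T-day moving sample standard deviation; None until a full window exists."""
    if window < 2:
        raise ValueError("window must be >= 2 for a sample standard deviation")
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(values[t - window + 1 : t + 1]))
    return out


# Hypothetical scenarios: 50% chance of +10%, 30% of 0%, 20% of -15%.
scenarios = [(0.5, 0.10), (0.3, 0.00), (0.2, -0.15)]
exp_ret = expected_return(scenarios)

# Hypothetical value series with a 3-day window.
sigma = moving_std([1.0, 2.0, 3.0, 4.0, 6.0], window=3)
print(exp_ret, sigma)
```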

In [47]:
import sys
sys.path.insert(1, '../lib')
import clsETPreturns as returns

if debug:
    import importlib
    returns = importlib.reload(returns)

data_name = "coindesk"
clsROR = returns.RateOfReturns(name=data_name)
print(dir(clsROR))

All packages loaded successfully!
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'days_offset', 'get_coin_cov_cor_coef_matrix', 'get_geometric_return', 'get_holding_period_returns', 'get_logarithmic_returns', 'get_simple_returns', 'maximize_weights', 'name', 'p_val', 'sum_weighted_returns', 'window_length']


## Actual weighted returns SUM
Sum of the weighted actual portfolio allocation at time ${t}$: ${F_{t}({Y})}$  = $\sum_{k_i = 1}^{N<n} {w(t,x_{k_{i}})} \times {R(t,x_{k_i})}$
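
As a check on the formula, the weighted sum can be recomputed by hand. The sketch below uses the `ror` and `best_weights` values that `sum_weighted_returns` reports for 2022-01-03 in this notebook; the helper name `weighted_return_sum` is hypothetical.

```python
def weighted_return_sum(weights, returns):
    """Sum of w(t, x) * R(t, x) over the selected assets."""
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(w * r for w, r in zip(weights, returns))


# Values reported for 2022-01-03 (coins: solana, bitcoin).
ror = [-0.008846735483933856, -0.0077864032143415245]
best_weights = [0.0006683812220425357, 0.9993316187779576]

weighted_ror_sum = weighted_return_sum(best_weights, ror)
print(weighted_ror_sum)
```

The result agrees with the reported `weighted_ror_sum` of about -0.0077871 for that date.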

### Merge dataframes to get all columns

In [49]:

_size=100
_merged_actual_ror = pd.merge(_topNassets_df,
                              rec_marketcap_df,how='outer',
                              on=['Date','ID'])
_merged_actual_ror.dropna(axis=0, how='any', inplace=True)
_merged_actual_ror = _merged_actual_ror.sort_values(by=['Date','ror'], ascending=True)
print(_merged_actual_ror.head(10))

         Date        ID       ror Symbol    market_cap
0  2022-01-03    solana -0.008847    sol  5.460443e+10
1  2022-01-03   bitcoin -0.007786    btc  8.975361e+11
2  2022-01-04   cardano -0.039437    ada  4.257930e+10
3  2022-01-04    solana -0.031966    sol  5.288655e+10
4  2022-01-04    ripple -0.031305    xrp  3.966750e+10
5  2022-01-04  litecoin -0.020565    ltc  1.030904e+10
6  2022-01-04   bitcoin -0.019356    btc  8.803302e+11
7  2022-01-04  ethereum -0.018207    eth  4.486096e+11
8  2022-01-05    solana -0.011679    sol  5.227248e+10
9  2022-01-05    ripple -0.007475    xrp  3.937207e+10


### get weighted sum of returns metrics

In [56]:

_l_actual_weighted_sum = clsROR.sum_weighted_returns(
    _merged_actual_ror,
    size=_size,
    value_col_name='ror'
)
print(_l_actual_weighted_sum)

[{'date': datetime.date(2022, 1, 3), 'coins': ['solana', 'bitcoin'], 'max_sum_row': 46, 'ror': [-0.008846735483933856, -0.0077864032143415245], 'best_weights': [0.0006683812220425357, 0.9993316187779576], 'weighted_ror_sum': -0.007787111920519647, 'weighted_market_cap_returns': 49733590928450.25}, {'date': datetime.date(2022, 1, 4), 'coins': ['cardano', 'solana', 'ripple', 'litecoin', 'bitcoin', 'ethereum'], 'max_sum_row': 30, 'ror': [-0.039437498732642046, -0.031965883608941766, -0.03130514703901118, -0.020565149890809785, -0.01935627635018719, -0.018207372396231266], 'best_weights': [0.016570940171056536, 0.06736159481489631, 0.03352324094761667, 0.1538221027444311, 0.022043778408610454, 0.706678342913389], 'weighted_ror_sum': -0.020313055138563556, 'weighted_market_cap_returns': 23380009763641.46}, {'date': datetime.date(2022, 1, 5), 'coins': ['solana', 'ripple', 'litecoin', 'cardano', 'bitcoin'], 'max_sum_row': 8, 'ror': [-0.011679010839337992, -0.007475413472633437, -0.00595942068

In [None]:
''' DEPRECATED after completing all actual, mean, and std logROR plots'''

import pandas as pd

''' get the moving average logarithmic returns '''
# mean_log_ror = clsROR.get_logarithmic_returns(rec_sma_marketcap_df, value_col_name='sma')

# trans_mean_log_ror = clsETL.transfrom_data(mean_log_ror,value_col_name="ror")
# _subset = [col for col in trans_mean_log_ror if col != 'Date']
# _mean_log_ror_plot = trans_mean_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
''' get the moving standard deviation of the logarithmic returns '''
# stdv_log_ror = clsROR.get_logarithmic_returns(rec_smd_marketcap_df, value_col_name='smd')
trans_stdv_log_ror = clsETL.transfrom_data(stdv_log_ror,value_col_name="ror")
_subset = [col for col in trans_stdv_log_ror if col != 'Date']
_stdv_log_ror_plot = trans_stdv_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
''' moving logarithmic returns of actual market return '''
actual_log_ror = clsROR.get_logarithmic_returns(rec_marketcap_df, value_col_name='market_cap')
trans_actual_log_ror = clsETL.transfrom_data(actual_log_ror,value_col_name="ror")
_subset = [col for col in trans_actual_log_ror if col != 'Date']
_actual_log_ror_plot = trans_actual_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)

_common_log_ror = pd.merge(_actual_log_ror_plot,_mean_log_ror_plot,
                                how='inner', on=['Date'], suffixes=('_actual','_mean'))
''' re-create actuals dataframe  '''
_l_coin_ids = [col for col in _common_log_ror if (col != 'Date' and '_actual' in col)]
_l_coin_ids.append('Date')
_actual_log_ror_plot = _common_log_ror[_l_coin_ids] 
_actual_log_ror_plot.columns = _actual_log_ror_plot.columns.str.replace('_actual','')
# ''' recreate rolling mean dataframe '''
# _l_coin_ids = [col for col in _common_log_ror if (col != 'Date' and '_mean' in col)]
# _l_coin_ids.append('Date')
# _mean_log_ror_plot = _common_log_ror[_l_coin_ids]
# _mean_log_ror_plot.columns = _mean_log_ror_plot.columns.str.replace('_mean','')

print("Dataframes merged and cleaned, ready for plotting!")

[Error]Class <ExtractLoadTransform> Function <transfrom_data> "['ror'] not in index"
Traceback (most recent call last):
  File "/home/gnewy/workspace/starxetp/notebooks/../lib/clsDataETL.py", line 488, in transfrom_data
    tmp_df=tmp_df[['Date','ID',value_col_name]]
  File "/home/gnewy/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3511, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/gnewy/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/gnewy/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5845, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['ror'] not in index"

[Error]Class <ExtractLoadTransform> Function <transfrom_data> "['ror'] not in index"
Traceback (most recent call last):
  File "/home/gnewy/workspace/starxetp/notebooks/../lib/clsDataETL.py

In [None]:
''' DEPRECATED moved to Plot'''
''' Transform data with coin ids in columns '''
value_col_name='simp_move_avg_market_cap_ror'
trans_mean_log_ror = mean_log_ror[['Date','ID',value_col_name]]
# trans_mean_log_ror[trans_mean_log_ror.index.duplicated()]
trans_mean_log_ror = trans_mean_log_ror.dropna(axis=0,subset=value_col_name)
trans_mean_log_ror = mean_log_ror.pivot_table(values=value_col_name,
                                            index='Date',
                                            columns=['ID'],
                                            aggfunc='first')
trans_mean_log_ror.dropna(axis=1,how='all', inplace=True)
trans_mean_log_ror.reset_index(inplace=True)
_subset = [col for col in trans_mean_log_ror if col != 'Date']
# _mean_log_ror_plot = trans_mean_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_mean_log_ror_plot = trans_mean_log_ror.dropna(axis=0, how='all', subset=_subset, inplace=False)
_mean_log_ror_plot.to_csv('../data/mean_log_ror_plot.csv')


In [None]:
''' DEPRECATED moved to plots'''
mean_log_ror = clsROR.get_logarithmic_returns(_results_df, value_col_name='simp_move_avg_market_cap')
stdv_log_ror = clsROR.get_logarithmic_returns(_results_df, value_col_name='simp_move_std_market_cap')
print(stdv_log_ror[100:120])

          Date            ID Symbol    market_cap  simp_move_avg_market_cap  \
40  2022-02-10  bitcoin_cash    bch  0.000000e+00              3.561344e+10   
41  2022-02-11  bitcoin_cash    bch  0.000000e+00              3.589823e+10   
42  2022-02-12  bitcoin_cash    bch  0.000000e+00              3.525097e+10   
43  2022-02-13  bitcoin_cash    bch  0.000000e+00              3.443105e+10   
44  2022-02-14  bitcoin_cash    bch  0.000000e+00              3.345269e+10   
45  2022-02-15  bitcoin_cash    bch  0.000000e+00              3.252121e+10   
46  2022-02-16  bitcoin_cash    bch  0.000000e+00              3.215757e+10   
47  2022-02-17  bitcoin_cash    bch  0.000000e+00              3.164524e+10   
48  2022-02-18  bitcoin_cash    bch  0.000000e+00              3.108687e+10   
49  2022-02-19  bitcoin_cash    bch  0.000000e+00              3.083422e+10   
50  2022-02-20  bitcoin_cash    bch  0.000000e+00              3.067157e+10   
51  2022-02-21  bitcoin_cash    bch  0.000000e+00   

## Maximize weights

In [287]:
#_new_list = clsROR.maximize_weights(_l_actual_weighted_sum,
#                                    value_col_name = "market_cap")

[Error]Class <RateOfReturns> Function <maximize_weights> 'market_cap'


NameError: name 'traceback' is not defined

In [254]:
_actuals_df = pd.DataFrame()
for returns in _l_actual_weighted_sum:
    for exp_ret in returns['Expected Return']:
        _actuals_df = pd.concat([_actuals_df,pd.DataFrame([{'Date' : returns['Date'], 'return' : exp_ret}])])
_min_date = (_actuals_df["Date"].min()).date()
_max_date = (_actuals_df["Date"].max()).date()
_title = "Sum of actual weighted returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.scatter(_actuals_df, x="Date", y=_actuals_df.columns,
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()



## Sum of the T-day rolling mean Logarithmic weighted return
Sum of the weighted mean portfolio allocation at time ${t}$: ${E\left(F_{t}({Y})\right)}$  = $\sum_{k_i = 1}^{N<n} {w(t,x_{k_{i}})} \times {E\left(R(t,x_{k_i})\right)}$

In [190]:
_merged_mean_ror = pd.merge(_topNassets_df,
                            rec_sma_marketcap_df,
                            how='outer',on=['Date','ID'])
_merged_mean_ror.dropna(axis=0, how='any', inplace=True)
_merged_mean_ror = _merged_mean_ror.sort_values(by=['Date','ror'], ascending=True)
#log_mean_weigthed_return = clsROR.sum_weighted_returns(mean_log_returns, _weights, value_col_name='Value')
_l_mean_weighted_sum = clsROR.sum_weighted_returns(
    _merged_mean_ror,
    _weights,
    value_col_name='sma'
)
_mean_df = pd.DataFrame()
for returns in _l_mean_weighted_sum:
    for exp_ret in returns['Expected Return']:
        _mean_df = pd.concat([_mean_df,pd.DataFrame([{'Date' : returns['Date'], 'return' : exp_ret}])])

_min_date = (_mean_df["Date"].min()).date()
_max_date = (_mean_df["Date"].max()).date()
_title = "Sum of log weighted returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.scatter(_mean_df, x="Date", y=_mean_df.columns,
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()



In [106]:
_log_returns = clsROR.get_logarithmic_returns(_topNassets_df, value_col_name='market_cap')
trans_log_returns = clsETL.transfrom_data(_log_returns,value_col_name="log")
_mean_log_returns = clsETL.rolling_mean(trans_log_returns, period=7)
print(trans_log_returns)
print(_mean_log_returns)

         Date   bitcoin   cardano  ethereum    solana
0  2022-01-01       NaN       NaN       NaN       NaN
1  2022-01-02  0.031853       NaN  0.022514  0.040423
2  2022-01-03 -0.007786       NaN  0.015547 -0.008847
3  2022-01-04 -0.019356       NaN -0.018207 -0.031966
4  2022-01-05 -0.004654       NaN  0.013208 -0.011679
5  2022-01-06 -0.056964       NaN -0.069419 -0.081436
6  2022-01-07 -0.009956       NaN -0.039528 -0.027983
7  2022-01-08 -0.039560       NaN -0.066872 -0.094945
8  2022-01-09  0.006525       NaN -0.030758  0.047429
9  2022-01-10 -0.001570       NaN  0.014726 -0.018885
10 2022-01-11  0.000959       NaN -0.020367 -0.035378
11 2022-01-12  0.021491       NaN  0.052587  0.034906
12 2022-01-13  0.027818       NaN  0.038618  0.081268
13 2022-01-14 -0.031668       NaN -0.037241 -0.034455
14 2022-01-15  0.011855       NaN  0.016928 -0.001632
15 2022-01-16  0.000822       NaN  0.005646  0.010584
16 2022-01-17 -0.000581       NaN  0.008362  0.002446
17 2022-01-18 -0.019174     

In [173]:
_date = datetime.datetime(2022,1,7)
_merged_df = pd.merge(_mean_df, _actuals_df, how ='inner', on=['Date'])
_merged_df = _merged_df.rename(columns={_merged_df.columns[1]:'mean',_merged_df.columns[2]:'actual'})
_merged_df['variance'] = _merged_df['actual'].sub(_merged_df['mean'])
_merged_df = _merged_df.loc[_merged_df['Date']==_date]

_min_date = (_merged_df["Date"].min()).date()
_max_date = (_merged_df["Date"].max()).date()
_title = "Efficient Frontier for data from "+str(_min_date)+" to "+str(_max_date)
fig = px.scatter(_merged_df, x='variance', y="mean",
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()



In [164]:
from datetime import date, timedelta

pred_marketcap_df = piv_marketcap_df.copy()
num_days = (end_dt - start_dt).days + 1
sharp = (_07_day_roll_simp_return_mean[_07_day_roll_simp_return_mean['Date'] == end_dt][_l_coin_ids] - 0.02/365)/ \
    _07_day_roll_simp_return_stdv[_07_day_roll_simp_return_stdv['Date'] == end_dt][_l_coin_ids]
#for now_date in (start_dt + timedelta(n) for n in range(num_days)):
#    print(now_date)
print(sharp)
print(simple_returns[simple_returns['Date'] == end_dt][_l_coin_ids])

      bitcoin   cardano  ethereum  litecoin   ripple    solana
334 -0.023394 -0.385817  0.184135 -0.077693 -0.19041 -0.145532
      bitcoin   cardano  ethereum  litecoin    ripple    solana
334 -0.014434 -0.033933  0.042531  0.003433  0.005133  0.019014


## Simple returns of Top N assets

In [55]:
import plotly.express as px

#transp_topNassets = clsETL.transpose_pivot(_topNassets_df)
#simple_returns_df = cls_etp.get_simple_returns(piv_marketcap_df)
simple_returns_df = clsROR.get_simple_returns(piv_marketcap_df)
_min_date = (simple_returns_df["Date"].min()).date()
_max_date = (simple_returns_df["Date"].max()).date()
_title = "Expected Returns from "+str(_min_date)+" to "+str(_max_date)
fig = px.line(simple_returns_df, x="Date", y=simple_returns_df.columns,
              hover_data={"Date": "|%B %d, %Y"},
              title=_title)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()



## Holding Period Return

In [58]:
hpr_df = clsROR.get_holding_period_returns(piv_marketcap_df, value_col_name = "market_cap")
print("holding period return from \n%s to %s" % (str(start_dt.date()), str(end_dt.date())))
print("\n",hpr_df)

holding period return from 
2022-01-01 to 2022-02-01

 bitcoin        -0.169241
bitcoin_cash         NaN
cardano        -0.200925
ethereum       -0.271575
litecoin       -0.251447
ripple         -0.252219
solana         -0.407364
dtype: float64


# DEPRECATED

## Rolling mean and standard deviation

In [88]:
#rolling_marketcap_mean = clsETL.rolling_mean(piv_marketcap_df, period=7)
#transp_rolling_marketcap_mean = clsETL.transpose_pivot(rolling_marketcap_mean)
rolling_marketcap_stdv = clsETL.rolling_stdv(piv_marketcap_df, period=7)
transp_rolling_marketcap_stdv = clsETL.transpose_pivot(rolling_marketcap_stdv)

In [282]:
''' DEPRECATED '''
import numpy as np
topN = 5
_neg_log_df = actual_log_ror.copy()
_neg_log_df.dropna(axis=0, how='any', inplace=True)
_neg_log_df = _neg_log_df.sort_values(by=['Date','ror'])
_neg_log_df['ror'] = _neg_log_df['ror']*(-1)
_topNassets_df = clsETL.get_fixed_topN_assets(_neg_log_df, N=topN, val_col_name='ror')
_sifnif_topNassets_df = clsETL.get_significant_topN_assets(_neg_log_df, val_col_name='ror')
_topNassets_df['ror'] = _topNassets_df['ror']*(-1)
_topNassets_df=_topNassets_df.reindex()
print("Completed getting %d list with top assets" % (_topNassets_df.shape[0]))

Completed getting 296 list with top assets
