<a href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL/blob/master/FinRL_Explainable_DRL_for_Portfolio_Management_An_Empirical_Approach.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Explainable Deep Reinforcement Learning for Portfolio Management: an Empirical Approach

A tutorial on using the FinRL library for explainable portfolio allocation, in a single [Jupyter Notebook](https://colab.research.google.com/drive/117v2qWo-qPC7OPd7paY1wYkOUywU_DWZ?usp=sharing).

* This tutorial is based on the [portfolio allocation tutorial](https://github.com/AI4Finance-Foundation/FinRL/blob/master/FinRL_portfolio_allocation_NeurIPS_2020.ipynb) in the FinRL library.
* This blog is based on our paper: Explainable Deep Reinforcement Learning for Portfolio Management: an Empirical Approach
* Please report any issues on our GitHub: https://github.com/AI4Finance-LLC/FinRL-Library/issues
* **PyTorch Version**



# Content

* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Environment](#4)
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)
    * [5.3. Initialize Environment](#4.3)
* [6. Implement DRL Algorithms](#5)
* [7. Backtesting Performance](#6)
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)
    * [7.3. Baseline Stats](#6.3)
    * [7.4. Compare to Stock Market Index](#6.4)

<a id='0'></a>
# Part 1. Problem Definition

The goal is to explain empirically the trading performance of DRL agents on the portfolio management task.

The agent is trained with Deep Reinforcement Learning (DRL) algorithms. The components of the reinforcement learning environment are:


* Action: The action space describes the allowed portfolio weights through which the agent interacts with the environment. Each element of the portfolio weights lies in [0, 1].

* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. It is the logarithmic rate of portfolio return when action a is taken at state s and the agent arrives at the new state s′, i.e., r(s, a, s′) = ln(v′/v), where v′ and v are the portfolio values at states s′ and s, respectively.

* State: The state space describes an agent's perception of a market. Just as a human trader needs to analyze various information before executing a trade, our trading agent observes many different features to learn better in an interactive environment.

* Environment: Dow 30 constituents

We use the Yahoo Finance API as the data source.
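
As a worked illustration of the reward definition above, the following minimal sketch computes r = ln(v′/v) for a single rebalancing step. The helper name `log_return_reward` and the example prices and weights are hypothetical and not part of FinRL; the sketch assumes a fully invested, long-only portfolio with no transaction costs.

```python
import numpy as np


def log_return_reward(weights, prices_now, prices_next, value_now):
    """Return (reward, value_next) with reward = ln(v'/v).

    Hypothetical helper for illustration only; assumes weights sum to 1
    and ignores transaction costs.
    """
    weights = np.asarray(weights, dtype=float)
    price_relatives = np.asarray(prices_next, dtype=float) / np.asarray(prices_now, dtype=float)
    # The portfolio grows by the weighted average of the per-asset price relatives.
    value_next = value_now * float(np.dot(weights, price_relatives))
    reward = float(np.log(value_next / value_now))
    return reward, value_next


reward, value_next = log_return_reward(
    weights=[0.5, 0.3, 0.2],
    prices_now=[10.0, 20.0, 40.0],
    prices_next=[11.0, 20.0, 38.0],
    value_now=100_000.0,
)
print(round(value_next, 2), round(reward, 6))
```

Note that the environment class defined later in this notebook assigns the new portfolio value itself as the reward, so this sketch illustrates the definition in the text rather than that class's behaviour.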


<a id='1'></a>
# Part 2. Getting Started - Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages through FinRL library


In [1]:
# ## install finrl library
# !pip install plotly==4.4.1
# !wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
# !chmod +x /usr/local/bin/orca
# !apt-get install xvfb libgtk2.0-0 libgconf-2-4
# !pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
# !pip install PyPortfolioOpt


<a id='1.2'></a>
## 2.2. Check that the additional packages are present; install any that are missing
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines
* tensorflow
* pyfolio

<a id='1.3'></a>
## 2.3. Import Packages

In [2]:
import datetime
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
%matplotlib inline

from finrl.apps import config
from finrl.finrl_meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.finrl_meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.finrl_meta.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.drl_agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline, get_baseline_tdx,convert_daily_return_to_pyfolio_ts


# import sys
# sys.path.append("../FinRL-Library")

  'Module "zipline.assets" not found; multipliers will not be applied'


In [3]:
from lutils.stock import LTdxHq

<a id='1.4'></a>
## 2.4. Create Folders

In [4]:
import os

# Create the working directories if they do not exist yet.
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

<a id='2'></a>
# Part 3. Download Data
Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.
* FinRL uses the **YahooDownloader** class to fetch data from the Yahoo Finance API.
* Call limit: using the public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).


In [5]:
# df = YahooDownloader(start_date = '2008-01-01',
#                      end_date = '2021-09-02',
#                      ticker_list = config.DOW_30_TICKER).fetch_data()

In [6]:
# stock_codes_all = ['600603', '600583', '600582', '600565', '600558', 
#                    '600551', '600509', '600503', '600488', '600469', 
#                    '600439', '600433', '600428', '600425', '600387', 
#                    '600383', '600382', '600369', '600339', '600337',
#                    '600335', '600326', '600320', '600312', '002666', 
#                    '002654', '002641', '002628', '002627', '002620', 
# #                    '002608', '002593', '002586', '002582', '002566', '002560',
# #                    '002314', '002307', '002303', '002285', '002267', '002251',
# #                    '002244', '002225', '002224', '002195', '002181', '002160',
# #                    '002146', '002144', '002101', '002094', '002087', '002083',
# #                    '002081', '002077', '002062', '002047', '002042', '002040',
# #                    '002033', '002031', '002029', '002004', '000961', #, '001965'
# #                    '000950', '000937', '000936', '000919', '000917', '000899',
# #                    '000888', '000886', '000861', '000813', '000797', '000767',
# #                    '000750', '000726', '000722', '000719', '000715', '000709',
# #                    '000705', '000690', '000686', '000685', '000680', '000671',
# #                    '000666', '000652', '000632', '000627', '000616', '000608',
# #                    '000589', '000563', '000560', '000557', '000553', '000552',
# #                    '000541', '000531', '000528', '000525', '000521', '000517',
# #                    '000514', '000507', '000501', '000419', '000404', '000402',
# #                    '000156', '000078', '000069', '000059', '000056', '000042',
# #                    '000035', '000031', '000006', '601113', '600387', '600382',
                   
#                    '600266', '600235', '600208', '600191', '600190', '600187',
#                    '600185', '600168', '600166', '600162', '600159', '600157',
#                    '600155', '600133', '600128', '600126', '600125', '600120',
#                    '600106', '600101', '600094', '600076', '600052', '600051',
                   
#                    '600509', '600503', '600488', '600469', '600439', '600433',
#                    '600428', '600425', '600387', '600383', '600382', '600369',
#                    '600339', '600337', '600335', '600326', '600320', '600312',
                   
#                    '300559', '300554', '300550', '300535', '300510', '300509',
#                    '300486', '300483', '300462', '300446', '300434', '300425',
#                    '300423', '300419', '300417', '300412', '300411', '300407',
                   
#                    '002958', '002873', '002818', '002798', '002788', '002763',
#                    '002758', '002757', '002753', '002734', '002697', '002637',
#                    '002636', '002592', '002558', '002532', '002517', '002496',
                   
#                    '002043', '002035', '002014', '002002', '000987', '000959',
#                    '000951', '000932', '000923', '000921', '000898', '000885',
#                    '000883', '000877', '000833', '000830', '000828', '000825',
                   
#                    '300009', '300008', '300005', '003032', '003006', '003000',
#                    '002961', '002945', '002937', '002935', '002931', '002923',
#                    '002908', '002907', '002892', '002886', '002878', '002858',
                   
#                   ]

In [7]:
# ltdxhq = LTdxHq()

# indexs = None
# dfs = []
# for code in stock_codes_all:
#     df = ltdxhq.get_k_data_daily(code, start='2020-01-01') # 2012-01-01
    
#     if indexs is None:
#         indexs = df.index
#     else:
#         indexs = indexs.union(df.index)
    
# #     df = df.assign(date = df.index)
# #     df = df.assign(day = df.index.weekday)
# #     df.date = df.date.dt.strftime('%Y-%m-%d')
#     df = df.assign(tic = code)
# #     df.index = range(df.shape[0])
    
#     dfs.append(df)
#     print('----------- over %s min: %s max: %s -----------' % (code, df.index.min(), df.index.max()))

# for i, df in enumerate(dfs):
#     df = df.reindex(indexs)
#     df = df.assign(date = df.index)
#     df = df.assign(day = df.index.weekday)
#     df.index = range(df.shape[0])
    
#     dfs[i] = df.ffill()
    
# ddf = pd.concat(dfs)
# # df.index = range(df.shape[0])

# ltdxhq.close()

In [8]:
ddf = pd.read_pickle('d:/ddf.pkl').dropna()

In [9]:
# gdf = ddf[-252:].groupby('tic')

In [10]:
# gdf.filter(lambda x: x['close'].mean() > 30).tic.unique()

In [11]:
ddf = ddf.sort_values(['date','tic'], ignore_index=True)
ddf.index = ddf.date.factorize()[0]

In [12]:
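# NOTE: ddf.index holds non-negative day numbers, so the label-based slice .loc[-60:]
# selects every row rather than the last 60 days.
# For a true 60-day window, use ddf.loc[ddf.index.max() - 59:, :].
dl = ddf.loc[-60:,:]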
dd = dl.pivot_table(index = 'date',columns = 'tic', values = 'close').pct_change().dropna()
corr = dd.cov().corr()
# corr.style.background_gradient(cmap='coolwarm')

# c1 = corr.abs().unstack().sort_values(ascending = True)
# c1 = corr.unstack().sort_values(ascending = True)

In [13]:
# corr.style.background_gradient(cmap='coolwarm')

In [14]:
# corr.sum().sort_values()

In [15]:
# s = corr.unstack()
# so = s.sort_values(kind="quicksort")

In [16]:
# set(so.head(20).index.get_level_values(1).values)

In [17]:
# c1.head(5)

In [18]:
# stock_codes = []
# for c in c1.head(200).index.get_level_values(0).values:
#     if c not in stock_codes:
#         stock_codes.append(c)
#     if len(stock_codes) >= 40:
#         break

In [19]:
# stock_codes = list(corr[corr.max() < 0.5].sum().sort_values().index.values[:40])

In [20]:
stock_codes = list(corr.sum().sort_values().index.values[:6]) # ascending = False

In [21]:
stock_codes

['000921', '600365', '000609', '601868', '002372', '001965']

In [22]:
# stock_codes = ['000921', '002032', '300406', '603789']
# stock_codes = ['300452', '600908', '601166', '601186', '601288', '601398', '601618', '601988', '603323', '603538', '605218']

In [23]:
# print(config.DOW_30_TICKER)
# stock_codes = ['603636']
# stock_codes = ['600603', '600583', '600582', '600565', '600558', 
#                '600551', '600509', '600503', '600488', '600469', 
#                '600439', '600433', '600428', '600425', '600387', 
#                '600383', '600382', '600369', '600339', '600337',
#                '600335', '600326', '600320', '600312', '002666', 
#                '002654', '002641', '002628', '002627', '002620', 
#                '002608', '002593', '002586', '002582', '002566', '002560',]
# stock_codes = list(c1.head(200).index.get_level_values(0).values)[:20]

In [24]:
ltdxhq = LTdxHq()

indexs = None
dfs = []
for code in stock_codes:
    df = ltdxhq.get_k_data_daily(code, start='2018-01-01') # 2014-01-01
    
    if indexs is None:
        indexs = df.index
    else:
        indexs = indexs.union(df.index)
    
#     df = df.assign(date = df.index)
#     df = df.assign(day = df.index.weekday)
#     df.date = df.date.dt.strftime('%Y-%m-%d')
    df = df.assign(tic = code)
#     df.index = range(df.shape[0])
    
    dfs.append(df)
    print('----------- over %s min: %s max: %s -----------' % (code, df.index.min(), df.index.max()))

for i, df in enumerate(dfs):
    df = df.reindex(indexs)
    df = df.assign(date = df.index)
    df = df.assign(day = df.index.weekday)
    df.index = range(df.shape[0])
    
    dfs[i] = df.ffill()
    
df = pd.concat(dfs)
# df.index = range(df.shape[0])

ltdxhq.close()

----------- over 000921 min: 2018-01-02 00:00:00 max: 2022-01-20 00:00:00 -----------
----------- over 600365 min: 2018-03-20 00:00:00 max: 2022-01-20 00:00:00 -----------
----------- over 000609 min: 2018-01-02 00:00:00 max: 2022-01-20 00:00:00 -----------
----------- over 601868 min: 2021-09-28 00:00:00 max: 2022-01-20 00:00:00 -----------
----------- over 002372 min: 2018-01-02 00:00:00 max: 2022-01-20 00:00:00 -----------
----------- over 001965 min: 2018-01-02 00:00:00 max: 2022-01-20 00:00:00 -----------


# Part 4: Preprocess Data
Data preprocessing is a crucial step in training a high-quality machine learning model. We need to check for missing data and perform feature engineering to convert the data into a model-ready state.
* Add technical indicators. In practical trading, various information needs to be taken into account, for example historical stock prices, current holdings, and technical indicators. In this article, we demonstrate two trend-following technical indicators: MACD and RSI.
* Add turbulence index. Risk aversion reflects whether an investor will choose to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index, which measures extreme asset price fluctuations.
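
To make the two feature types above concrete, here is a minimal sketch of a MACD line and a turbulence index computed with pandas. It is not FinRL's `FeatureEngineer` implementation: the function names, the 12/26 EMA spans, the 60-day lookback, and the synthetic returns are all assumptions chosen for illustration. Turbulence is taken here to be the Mahalanobis distance of a day's returns from the trailing-window mean.

```python
import numpy as np
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA minus slow EMA of the closing price (spans are assumed defaults)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow


def turbulence(returns: pd.DataFrame, lookback: int = 60) -> pd.Series:
    """Mahalanobis distance of each day's returns from the trailing window's mean."""
    values = []
    for i in range(lookback, len(returns)):
        hist = returns.iloc[i - lookback:i]
        diff = (returns.iloc[i] - hist.mean()).to_numpy()
        # pinv tolerates a (near-)singular covariance matrix
        cov_inv = np.linalg.pinv(hist.cov().to_numpy())
        values.append(float(diff @ cov_inv @ diff))
    return pd.Series(values, index=returns.index[lookback:], name="turbulence")


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2021-01-01", periods=120)
rets = pd.DataFrame(rng.normal(0, 0.01, size=(120, 3)), index=dates, columns=["A", "B", "C"])
rets.iloc[-1] = [0.08, -0.07, 0.09]  # inject one extreme day
prices = 100 * (1 + rets).cumprod()

macd_a = macd(prices["A"])
turb = turbulence(rets)
print(macd_a.tail(3))
print(turb.tail(3))
```

In this synthetic example the injected extreme day produces by far the largest turbulence value, which is the kind of signal a turbulence threshold is meant to react to.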

In [25]:
fe = FeatureEngineer(use_technical_indicator=True,
                     use_turbulence=False,
                     user_defined_feature = False)

df = fe.preprocess_data(df)

Successfully added technical indicators


## Add covariance matrix as states

In [26]:
# add covariance matrix as states
df=df.sort_values(['date','tic'], ignore_index=True)
df.index = df.date.factorize()[0]

cov_list = []
return_list = []

# look back is one year
lookback=252
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i-lookback:i,:]
    price_lookback = data_lookback.pivot_table(index = 'date',columns = 'tic', values = 'close')
    return_lookback = price_lookback.pct_change().dropna()
    return_list.append(return_lookback)

    covs = return_lookback.cov().values
    cov_list.append(covs)


df_cov = pd.DataFrame({'date':df.date.unique()[lookback:],'cov_list': cov_list,'return_list': return_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)

In [27]:
df['daily_variance'] = (df.high-df.low) / df.close

In [28]:
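# NOTE: after reset_index the index is a non-negative row number, so the label-based
# slice .loc[-30:] selects every row, not the last 30 days.
dl = df.loc[-30:,:]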
dd = dl.pivot_table(index = 'date',columns = 'tic', values = 'close').pct_change().dropna()
dd.cov().corr().style.background_gradient(cmap='coolwarm')

| tic | 000609 | 000921 | 001965 | 002372 |
|---|---|---|---|---|
| 000609 | 1.0 | -0.228613 | -0.230215 | -0.293139 |
| 000921 | -0.228613 | 1.0 | -0.39942 | 0.104902 |
| 001965 | -0.230215 | -0.39942 | 1.0 | -0.767548 |
| 002372 | -0.293139 | 0.104902 | -0.767548 | 1.0 |


<a id='4'></a>
# Part 5. Design Environment
Given the stochastic and interactive nature of automated stock trading, we model the task as a **Markov Decision Process (MDP)**. Training consists of observing stock price changes, taking an action, and computing the reward, so that the agent can adjust its strategy accordingly. By interacting with the environment, the trading agent derives a strategy that maximizes rewards over time.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.
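
The observe, act, and reward cycle described above can be sketched with a toy environment that advances one trading day per step. `ToyMarketEnv` is a hypothetical stand-in that only mimics the `reset`/`step` shape of a Gym-style environment; it is not the `StockPortfolioEnv` used in this notebook, and the random prices and random policy are assumptions for illustration.

```python
import numpy as np


class ToyMarketEnv:
    """Hypothetical Gym-style environment that steps through a fixed price table."""

    def __init__(self, prices, initial_amount=100_000.0):
        self.prices = np.asarray(prices, dtype=float)  # shape: (days, assets)
        self.initial_amount = initial_amount

    def reset(self):
        self.day = 0
        self.value = self.initial_amount
        return self.prices[self.day]

    def step(self, weights):
        # Time-driven simulation: each step moves the clock forward by one day.
        growth = self.prices[self.day + 1] / self.prices[self.day]
        new_value = self.value * float(np.dot(weights, growth))
        reward = float(np.log(new_value / self.value))
        self.value = new_value
        self.day += 1
        done = self.day >= len(self.prices) - 1
        return self.prices[self.day], reward, done, {}


rng = np.random.default_rng(42)
prices = 10 * np.cumprod(1 + rng.normal(0, 0.01, size=(50, 4)), axis=0)
env = ToyMarketEnv(prices)

obs, done, total_reward = env.reset(), False, 0.0
while not done:
    weights = rng.dirichlet(np.ones(4))  # random policy: non-negative weights summing to 1
    obs, reward, done, _ = env.step(weights)
    total_reward += reward
print(env.day, round(env.value, 2), round(total_reward, 4))
```

Because each reward here is a log return, the rewards accumulated over an episode equal the log of the final portfolio value divided by the initial one. A trained agent would replace the random policy with one that maps observations to weights.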

The action space describes the allowed portfolio weights through which the agent interacts with the environment. Each element of the portfolio weight vector is non-negative and at most 100%, and the elements of each weight vector must sum to 100%.
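
One common way to satisfy both constraints is to pass the raw actions through a softmax, which is also what the `softmax_normalization` method of the environment below does. The following is a minimal sketch; the function name `softmax_weights` is hypothetical, and subtracting the maximum before exponentiating is an added numerical-stability step that is not in the notebook's own method.

```python
import numpy as np


def softmax_weights(actions):
    """Map arbitrary real-valued actions to non-negative weights that sum to 1."""
    actions = np.asarray(actions, dtype=float)
    # Shifting by the maximum leaves the result unchanged but avoids overflow in exp.
    shifted = actions - actions.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


weights = softmax_weights([0.0, 1.0, 0.5, 1.0])
print(weights.round(4), weights.sum())
```

Equal actions always receive equal weights, and larger actions receive larger weights, so the agent's raw outputs keep their ordering after normalization.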

## Training data split: 2019-01-01 to 2022-01-01

In [29]:
train = data_split(df, '2019-01-01','2022-01-01') # 2021-07-01 2022-01-01

In [30]:
train.head()

Unnamed: 0,open,close,high,low,volume,amount,tic,date,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list,daily_variance
0,5.27,5.29,5.32,5.25,5759311.0,30399482.0,609,2019-01-15,1,-0.08255,5.946131,4.840869,48.0639,-59.971539,0.911034,5.441,5.467333,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.013233
0,7.34,7.55,7.55,7.22,20004778.0,161089488.0,921,2019-01-15,1,0.171956,7.579382,5.955618,57.9551,171.516755,60.304291,6.799667,6.725167,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.043709
0,7.39,7.42,7.43,7.34,1422853.0,11541926.0,1965,2019-01-15,1,-0.0086,7.496838,7.139162,49.760481,27.822945,12.144795,7.363,7.4935,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.012129
0,11.7,12.04,12.08,11.68,4112392.0,64759288.0,2372,2019-01-15,1,0.160269,12.085685,10.869315,56.642725,165.038307,13.980941,11.421,11.001833,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.033223
1,5.28,5.3,5.59,5.16,13215401.0,70665424.0,609,2019-01-16,2,-0.076507,5.937906,4.832094,48.18694,-33.431536,13.031968,5.425,5.483167,"[[0.000959630829728534, 0.00031221297698058804...",tic 000609 000921 001965 00...,0.081132



## Environment for Portfolio Allocation


In [31]:
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from stable_baselines3.common.vec_env import DummyVecEnv


class StockPortfolioEnv(gym.Env):
    """A portfolio allocation environment for OpenAI gym

    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date

    Methods
    -------
    step()
        normalize the actions into portfolio weights, advance one day,
        calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        return the current state
    softmax_normalization()
        map raw actions to non-negative weights that sum to 1
    save_asset_memory()
        return the daily portfolio return at each time step
    save_action_memory()
        return actions/positions at each time step
    get_sb_env()
        wrap the environment in a DummyVecEnv for Stable-Baselines3
        

    """
    metadata = {'render.modes': ['human']}

    def __init__(self, 
                 df,
                 stock_dim,
                 hmax,
                 initial_amount,
                 transaction_cost_pct,
                 reward_scaling,
                 state_space,
                 action_space,
                 tech_indicator_list,
                 turbulence_threshold=None,
                 lookback=252,
                 day=0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback=lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct =transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1,shape=(self.action_space,)) 
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape = (self.state_space+len(self.tech_indicator_list),self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day,:]
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize state: initial portfolio return + individual stock return + individual weights
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1 / self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]]

        
    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
#             plt.plot(df.daily_return.cumsum(),'r')
#             plt.savefig('results/cumulative_reward.png')
#             plt.close()
            
#             plt.plot(self.portfolio_return_memory,'r')
#             plt.savefig('results/rewards.png')
#             plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                sharpe = (252 ** 0.5) * df_daily_return['daily_return'].mean() / df_daily_return['daily_return'].std()
                print("Sharpe: ", sharpe)
            print("=================================")
            
            return self.state, self.reward, self.terminal,{}

        else:
            weights = self.softmax_normalization(actions)
            self.actions_memory.append(weights)
            last_day_memory = self.data

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values) - 1) * weights)
#             log_portfolio_return = np.log(sum((self.data.close.values / last_day_memory.close.values) * weights))
            # update portfolio value
            new_portfolio_value = self.portfolio_value * (1 + portfolio_return)
#             new_portfolio_value = self.portfolio_value * (1 + log_portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value (or the end portfolio value)
            self.reward = new_portfolio_value

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]]
        return self.state
    
    def render(self, mode='human'):
        return self.state
        
    def softmax_normalization(self, actions):
        numerator = np.exp(actions)
        denominator = np.sum(np.exp(actions))
        softmax_output = numerator / denominator
        return softmax_output

    
    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']
        
        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs

In [32]:
stock_dimension = len(train.tic.unique())
state_space = stock_dimension
print(f'Stock Dimension: {stock_dimension}, State Space: {state_space}')

Stock Dimension: 4, State Space: 4


In [33]:
config.TECHNICAL_INDICATORS_LIST

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']

In [34]:
# ['daily_variance', 'change', 'log_volume', 'close','day', 'macd', 'rsi_30', 'boll_ub', 'dx_30']
# ['macd', 'boll_ub', 'boll_lb', 'rsi_30', 'boll_ub', 'dx_30', 'close_30_sma', 'close_60_sma'] # cci_30
tech_indicator_list = ['macd', 'boll_ub', 'boll_lb', 'rsi_30', 'daily_variance', 'dx_30', 'close_30_sma', 'close_60_sma']
env_kwargs = {
    "hmax": 100, 
    "initial_amount": 100000, 
    "transaction_cost_pct": 0, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": tech_indicator_list, # config.TECHNICAL_INDICATORS_LIST, 
    "action_space": stock_dimension, 
    "reward_scaling": 1e-1
    
}

e_train_gym = StockPortfolioEnv(df = train, **env_kwargs)

In [35]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>


<a id='5'></a>
# Part 6: Implement DRL Algorithms
* The implementation of the DRL algorithms is based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines with a major structural refactoring and code cleanups.
* We use two DRL algorithms from the FinRL library: PPO and A2C.

### Model 1: **A2C**


In [36]:
agent = DRLAgent(env = env_train)

A2C_PARAMS = {"n_steps": 10, "ent_coef": 0.005, "learning_rate": 0.0001}
model_a2c = agent.get_model(model_name="a2c", model_kwargs = A2C_PARAMS)

{'n_steps': 10, 'ent_coef': 0.005, 'learning_rate': 0.0001}
Using cuda device


In [37]:
trained_a2c = agent.train_model(model=model_a2c, tb_log_name='a2c', total_timesteps=1000)

begin_total_asset:100000
end_total_asset:145543.4677899015
Sharpe:  0.6127363110147936
-------------------------------------
| time/                 |           |
|    fps                | 279       |
|    iterations         | 100       |
|    time_elapsed       | 3         |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -5.67     |
|    explained_variance | -3.58e-07 |
|    learning_rate      | 0.0001    |
|    n_updates          | 99        |
|    policy_loss        | 3.01e+06  |
|    reward             | 101608.2  |
|    std                | 0.998     |
|    value_loss         | 3.9e+11   |
-------------------------------------


### Model 2: **PPO**


In [38]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs = PPO_PARAMS)

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.0001, 'batch_size': 128}
Using cuda device


In [39]:
trained_ppo = agent.train_model(model=model_ppo, tb_log_name='ppo', total_timesteps=1000)

begin_total_asset:100000
end_total_asset:169101.45476955493
Sharpe:  0.8092037346819474
begin_total_asset:100000
end_total_asset:173364.77201910227
Sharpe:  0.8373545804698366
---------------------------------
| time/              |          |
|    fps             | 609      |
|    iterations      | 1        |
|    time_elapsed    | 3        |
|    total_timesteps | 2048     |
| train/             |          |
|    reward          | 130566.4 |
---------------------------------


## Back-Testing
Assume that we have $100,000 of initial capital on 2021-01-01 (the `initial_amount` and trade start date used in this notebook's code). We use the PPO, A2C, SVM, Linear Regression, Decision Tree, and Random Forest models to trade Dow Jones 30 constituent stocks.
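
The two headline statistics printed during backtesting, cumulative return and annualized Sharpe ratio, can be sketched from a series of daily returns as follows. The helper names and the sample returns are hypothetical; the Sharpe formula mirrors the `(252 ** 0.5) * mean / std` expression used in the environment above and assumes a zero risk-free rate.

```python
import numpy as np
import pandas as pd


def cumulative_return(daily_returns: pd.Series) -> pd.Series:
    """Compound daily returns into a running cumulative return."""
    return (1 + daily_returns).cumprod() - 1


def annualized_sharpe(daily_returns: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate; NaN when volatility is zero."""
    std = daily_returns.std()
    if std == 0 or np.isnan(std):
        return float("nan")
    return float(np.sqrt(periods_per_year) * daily_returns.mean() / std)


# Made-up daily returns for illustration only.
daily = pd.Series([0.01, -0.005, 0.02, 0.0, -0.01])
cum = cumulative_return(daily)
print(cum.iloc[-1], annualized_sharpe(daily))
```

The same `(1 + r).cumprod() - 1` pattern is used later in this notebook to build the cumulative return curves for A2C, PPO, and the baseline.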


In [103]:
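# NOTE: this window overlaps the training window (2019-01-01 to 2022-01-01) used above,
# so most of this backtest is in-sample.
trade = data_split(df, '2021-01-01', '2022-01-22') # '2021-12-20', '2023-01-01'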
e_trade_gym = StockPortfolioEnv(df = trade, **env_kwargs)

In [104]:
import torch
%matplotlib inline
import plotly.express as px

In [105]:
import pandas as pd
from pypfopt import EfficientFrontier, expected_returns, objective_functions, risk_models

unique_tic = trade.tic.unique()
unique_trade_date = trade.date.unique()

In [106]:
import pyfolio
%matplotlib inline
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline, get_baseline_tdx, convert_daily_return_to_pyfolio_ts

baseline_df = get_baseline_tdx(ticker='600469', start='2022-01-01', end='2023-01-01') # "600558"

baseline_df['date'] = baseline_df.index # pd.to_datetime(baseline_df["date"], format="%Y-%m-%d")
baseline_df.index = range(baseline_df.shape[0])

baseline_df_stats = backtest_stats(baseline_df, value_col_name='close')
baseline_returns = get_daily_return(baseline_df, value_col_name='close')

dji_cumpod =(baseline_returns + 1).cumprod() - 1

Annual return         -0.720442
Cumulative returns    -0.068359
Annual volatility      0.397178
Sharpe ratio          -3.260455
Calmar ratio          -8.782534
Stability              0.438016
Max drawdown          -0.082031
Omega ratio            0.579891
Sortino ratio         -3.781332
Skew                        NaN
Kurtosis                    NaN
Tail ratio             0.566184
Daily value at risk   -0.055178
dtype: float64


In [107]:
from pyfolio import timeseries

df_daily_return_a2c, df_actions_a2c = DRLAgent.DRL_prediction(model=trained_a2c, environment=e_trade_gym)
df_daily_return_ppo, df_actions_ppo = DRLAgent.DRL_prediction(model=trained_ppo, environment=e_trade_gym)
time_ind = pd.Series(df_daily_return_a2c.date)
a2c_cumpod =(df_daily_return_a2c.daily_return + 1).cumprod() - 1
ppo_cumpod =(df_daily_return_ppo.daily_return + 1).cumprod() - 1
DRL_strat_a2c = convert_daily_return_to_pyfolio_ts(df_daily_return_a2c)
DRL_strat_ppo = convert_daily_return_to_pyfolio_ts(df_daily_return_ppo)

perf_func = timeseries.perf_stats 
perf_stats_all_a2c = perf_func(returns=DRL_strat_a2c, 
                               factor_returns=DRL_strat_a2c, 
                               positions=None, transactions=None, turnover_denom="AGB")
perf_stats_all_ppo = perf_func(returns=DRL_strat_ppo, 
                               factor_returns=DRL_strat_ppo, 
                               positions=None, transactions=None, turnover_denom="AGB")

begin_total_asset:100000
end_total_asset:120474.36783310909
Sharpe:  0.8596234101821464
hit end!
begin_total_asset:100000
end_total_asset:128564.78886694447
Sharpe:  1.0898618276604934
hit end!


In [108]:
ppo_cumpod

0      0.000000
1     -0.014925
2     -0.002292
3     -0.027836
4     -0.011583
         ...   
251    0.215407
252    0.210270
253    0.244560
254    0.283721
255    0.285648
Name: daily_return, Length: 256, dtype: float64

In [109]:
def extract_weights(drl_actions_list):
    """Convert a date-indexed action frame into one (tic, weight) frame per date."""
    weight_records = {'date':[], 'weights':[]}
    tic_list = list(drl_actions_list.columns)
    for date, row in drl_actions_list.iterrows():
        weight_records['date'] += [date]
        weight_records['weights'] += [pd.DataFrame({'tic': tic_list, 'weight': row.values})]

    return pd.DataFrame(weight_records)

a2c_weights = extract_weights(df_actions_a2c)
ppo_weights = extract_weights(df_actions_ppo)

## Machine Learning Models

We train the machine learning models on the technical indicators in `tech_indicator_list`, such as MACD, RSI, CCI, and DX.

In [110]:
train.head() # [train['date'] == '2020-03-01']

Unnamed: 0,open,close,high,low,volume,amount,tic,date,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list,daily_variance
0,5.27,5.29,5.32,5.25,5759311.0,30399482.0,609,2019-01-15,1,-0.08255,5.946131,4.840869,48.0639,-59.971539,0.911034,5.441,5.467333,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.013233
0,7.34,7.55,7.55,7.22,20004778.0,161089488.0,921,2019-01-15,1,0.171956,7.579382,5.955618,57.9551,171.516755,60.304291,6.799667,6.725167,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.043709
0,7.39,7.42,7.43,7.34,1422853.0,11541926.0,1965,2019-01-15,1,-0.0086,7.496838,7.139162,49.760481,27.822945,12.144795,7.363,7.4935,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.012129
0,11.7,12.04,12.08,11.68,4112392.0,64759288.0,2372,2019-01-15,1,0.160269,12.085685,10.869315,56.642725,165.038307,13.980941,11.421,11.001833,"[[0.0009595613956845235, 0.0003124109814202052...",tic 000609 000921 001965 00...,0.033223
1,5.28,5.3,5.59,5.16,13215401.0,70665424.0,609,2019-01-16,2,-0.076507,5.937906,4.832094,48.18694,-33.431536,13.031968,5.425,5.483167,"[[0.000959630829728534, 0.00031221297698058804...",tic 000609 000921 001965 00...,0.081132


In [111]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def prepare_data(trainData):
    """Pair each day's technical indicators with the next day's return per ticker."""
    train_date = sorted(set(trainData.date.values))
    pieces = []
    for i in range(0, len(train_date) - 1):
        d = train_date[i]
        d_next = train_date[i+1]
        # Next-day return of every ticker (read from the argument, not the global `train`)
        y = trainData.loc[trainData['date'] == d_next].return_list.iloc[0].loc[d_next].reset_index()
        y.columns = ['tic', 'return']
        x = trainData.loc[trainData['date'] == d][list(tech_indicator_list) + ['tic']]
        train_piece = pd.merge(x, y, on = 'tic')
        train_piece['date'] = [d] * len(train_piece)
        pieces += [train_piece]
    trainDataML = pd.concat(pieces)
    X = trainDataML[tech_indicator_list].values
    Y = trainDataML[['return']].values

    return X, Y

train_X, train_Y = prepare_data(train)
rf_model = RandomForestRegressor(max_depth = 50, min_samples_split = 15, random_state = 0).fit(train_X, train_Y.reshape(-1))
dt_model = DecisionTreeRegressor(random_state = 0, max_depth = 50, min_samples_split = 15).fit(train_X, train_Y.reshape(-1))
svm_model = SVR(epsilon=0.14).fit(train_X, train_Y.reshape(-1))
lr_model = LinearRegression().fit(train_X, train_Y)
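
The `prepare_data` function above pairs the indicators observed on day t with the return realised on day t+1, so each model learns to predict the next day's return. The sketch below shows the same alignment on synthetic data using only NumPy; the random features, the single-asset setup, and the least-squares fit are assumptions made for illustration and stand in for the scikit-learn models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_features = 6, 2
features = rng.normal(size=(n_days, n_features))  # hypothetical indicator values per day
returns = rng.normal(scale=0.01, size=n_days)     # hypothetical daily returns

# Pair features on day t with the return realised on day t+1.
X = features[:-1]
y = returns[1:]

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
print(coef)
```

Dropping the last feature row and the first return row is what keeps the model from seeing same-day information.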

In [112]:
def output_predict(model, reference_model = False):
    meta_coefficient = {"date":[], "weights":[]}

    portfolio = pd.DataFrame(index = range(1), columns = unique_trade_date)
    initial_capital = 100000
    portfolio.loc[0,unique_trade_date[0]] = initial_capital

    for i in range(len(unique_trade_date) - 1):
      
        current_date = unique_trade_date[i]
        next_date = unique_trade_date[i+1]
        df_current = df[df.date == current_date].reset_index(drop=True)
        tics = df_current['tic'].values
        features = df_current[tech_indicator_list].values # cci_30
        df_next = df[df.date == next_date].reset_index(drop=True)
        if not reference_model:
            predicted_y = model.predict(features)
            mu = predicted_y
            Sigma = risk_models.sample_cov(df_current.return_list[0], returns_data=True)
        else:
            mu = df_next.return_list[0].loc[next_date].values
            Sigma = risk_models.sample_cov(df_next.return_list[0], returns_data=True)
        predicted_y_df = pd.DataFrame({"tic":tics.reshape(-1,), "predicted_y":mu.reshape(-1,)})
        min_weight, max_weight = 0, 1
        ef = EfficientFrontier(mu, Sigma)
        weights = ef.nonconvex_objective(
            objective_functions.sharpe_ratio,
            objective_args=(ef.expected_returns, ef.cov_matrix),
            weights_sum_to_one=True,
            constraints=[
                {"type": "ineq", "fun": lambda w: w - min_weight},  # greater than min_weight
                {"type": "ineq", "fun": lambda w: max_weight - w},  # less than max_weight
            ],)
      
        weight_df = {"tic":[], "weight":[]}
        meta_coefficient["date"] += [current_date]
        # it = 0
        for item in weights:
            weight_df['tic'] += [item]
            weight_df['weight'] += [weights[item]]
      
        weight_df = pd.DataFrame(weight_df).merge(predicted_y_df, on = ['tic'])
        meta_coefficient["weights"] += [weight_df]
        cap = portfolio.iloc[0, i]
        #current cash invested for each stock
        current_cash = [element * cap for element in list(weights.values())]
        # current held shares
        current_shares = list(np.array(current_cash) / np.array(df_current.close))
        # next time period price
        next_price = np.array(df_next.close)
        portfolio.iloc[0, i+1] = np.dot(current_shares, next_price)
      
    portfolio=portfolio.T
    portfolio.columns = ['account_value']
    portfolio = portfolio.reset_index()
    portfolio.columns = ['date', 'account_value']
    stats = backtest_stats(portfolio, value_col_name = 'account_value')
    portfolio_cumprod =(portfolio.account_value.pct_change()+1).cumprod()-1

    return portfolio, stats, portfolio_cumprod, pd.DataFrame(meta_coefficient)

lr_portfolio, lr_stats, lr_cumprod, lr_weights = output_predict(lr_model)
dt_portfolio, dt_stats, dt_cumprod, dt_weights = output_predict(dt_model)
svm_portfolio, svm_stats, svm_cumprod, svm_weights = output_predict(svm_model)
rf_portfolio, rf_stats, rf_cumprod, rf_weights = output_predict(rf_model)
reference_portfolio, reference_stats, reference_cumprod, reference_weights = output_predict(None, True)

Annual return         -0.121388
Cumulative returns    -0.123191
Annual volatility      0.417570
Sharpe ratio          -0.103524
Calmar ratio          -0.260743
Stability              0.084810
Max drawdown          -0.465548
Omega ratio            0.982627
Sortino ratio         -0.149903
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.016154
Daily value at risk   -0.052780
dtype: float64
Annual return           57.853208
Cumulative returns      61.785843
Annual volatility        0.445839
Sharpe ratio             9.466666
Calmar ratio           540.699371
Stability                0.986023
Max drawdown            -0.106997
Omega ratio              5.333395
Sortino ratio           29.685309
Skew                          NaN
Kurtosis                      NaN
Tail ratio               2.484689
Daily value at risk     -0.039422
dtype: float64
Annual return         -0.481521
Cumulative returns    -0.486899
Annual volatility      0.436202
Sharpe ratio    
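
`output_predict` above does two things at every step: it turns predicted returns and a covariance matrix into maximum-Sharpe weights, and it carries the portfolio value forward by one day. The sketch below illustrates both on toy numbers. It is a simplification under stated assumptions: it uses the closed-form tangency portfolio with a zero risk-free rate and no weight bounds instead of PyPortfolioOpt's constrained optimiser, and all helper names and inputs are hypothetical.

```python
import numpy as np

def tangency_weights(mu, sigma):
    """Unconstrained max-Sharpe weights (zero risk-free rate), scaled to sum to one."""
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()

def sharpe(weights, mu, sigma):
    """Expected return divided by volatility for a given weight vector."""
    return float(weights @ mu / np.sqrt(weights @ sigma @ weights))

def next_value(capital, weights, price_now, price_next):
    """Buy weights * capital of each stock today and value the shares tomorrow."""
    shares = capital * weights / price_now
    return float(shares @ price_next)

mu = np.array([0.10, 0.05, 0.07])        # hypothetical predicted returns
sigma = np.diag([0.04, 0.01, 0.02])      # hypothetical covariance matrix
w = tangency_weights(mu, sigma)
equal = np.full(3, 1 / 3)

value = next_value(100_000, w,
                   price_now=np.array([10.0, 20.0, 50.0]),
                   price_next=np.array([11.0, 20.0, 50.0]))
print(w, sharpe(w, mu, sigma), sharpe(equal, mu, sigma), value)
```

In this toy case all weights happen to be non-negative, so the bounds used in the notebook would not bind; with real data they often do, which is why the notebook relies on a constrained solver.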

# Part 7: Explanation Method Implementation


### Integrated Gradient
Implement the explanation method using integrated gradients and regression coefficients.

In [113]:
# trade.loc[0].iloc[0].cov_list.shape

In [114]:
#         print(input.reshape(-1, stock_dimension*(stock_dimension + 4)).shape)
#         print(actions.reshape(-1, stock_dimension).shape)

In [115]:
meta_Q = {"date":[], "feature":[], "Saliency Map":[], "algo":[]}
# Look up each trained model and its recorded actions by name instead of using eval().
drl_runs = {"A2C": (trained_a2c, df_actions_a2c), "PPO": (trained_ppo, df_actions_ppo)}
obs_dim = stock_dimension * (stock_dimension + len(tech_indicator_list))
for algo, (model, df_actions) in drl_runs.items():
    for i in range(len(unique_trade_date)-1):
        date = unique_trade_date[i]
        covs = trade[trade['date'] == date].cov_list.iloc[0]
        features = trade[trade['date'] == date][tech_indicator_list]
        obs = np.append(covs, features.T, axis = 0)
        actions = torch.cuda.FloatTensor(df_actions.loc[date].values).reshape(-1, stock_dimension)
        # evaluate_actions returns (values, log_prob, entropy); keep the value estimate
        orig_Q = model.policy.evaluate_actions(torch.cuda.FloatTensor(obs).reshape(-1, obs_dim), actions)
        orig_Q = orig_Q[0].detach().cpu().numpy()[0]
        for idx in range(len(tech_indicator_list)):
            # Copy first: writing into `features` itself would leave earlier columns zeroed.
            perturbed_feature = features.copy()
            perturbed_feature.iloc[:, idx] = 0.0
            perturbed_input = np.append(covs, perturbed_feature.T, axis = 0)
            perturbed_Q = model.policy.evaluate_actions(torch.cuda.FloatTensor(perturbed_input).reshape(-1, obs_dim), actions)
            perturbed_Q = perturbed_Q[0].detach().cpu().numpy()[0]
            meta_Q['date'] += [date]
            meta_Q['algo'] += [algo]
            meta_Q['feature'] += [tech_indicator_list[idx]]
            meta_Q['Saliency Map'] += [orig_Q[0] - perturbed_Q[0]]

meta_Q = pd.DataFrame(meta_Q)
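
The loop above scores each feature by setting it to zero and recording how much the critic's value estimate changes. For a linear function this difference coincides with integrated gradients taken from an all-zero baseline, which the sketch below checks numerically. The toy function `f`, its weights, and the helper `integrated_gradients` are assumptions made for illustration; they are not the DRL policy or a FinRL API, and for a non-linear network the two quantities generally differ.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG_i = (x_i - b_i) * integral of df/dx_i along the path b -> x."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps          # midpoint rule on [0, 1]
    grads = np.array([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([0.5, -1.0, 2.0])                         # hypothetical linear model

def f(x):
    return float(w @ x)

def grad_f(x):
    return w                                           # gradient of a linear function

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
ig = integrated_gradients(grad_f, x, baseline)

# Feature ablation: zero one feature at a time and measure the change in f.
ablation = np.array([f(x) - f(np.where(np.arange(3) == i, 0.0, x)) for i in range(3)])
print(ig, ablation)
```

The attributions also sum to `f(x) - f(baseline)`, the completeness property that makes integrated gradients convenient for comparing features.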

### Regression Coefficient
Implement the linear regression to measure the feature weights.

In [116]:
import statsmodels.api as sm
meta_score_coef = {"date":[], "coef":[], "algo":[]}

for algo in ["LR", "RF", "Reference Model", "SVM", "DT", "A2C", "PPO"]:
    if algo == "LR":
        weights = lr_weights
    elif algo == "RF":
        weights = rf_weights
    elif algo == "DT":
        weights = dt_weights
    elif algo == "SVM":
        weights = svm_weights
    elif algo == "A2C":
        weights = a2c_weights
    elif algo == "PPO":
        weights = ppo_weights
    else:
        weights = reference_weights

    for i in range(len(unique_trade_date) - 1):
        date = unique_trade_date[i]
        next_date = unique_trade_date[i+1]
        df_temp = df[df.date==date].reset_index(drop=True)
        df_temp_next = df[df.date==next_date].reset_index(drop=True)
        weight_piece = weights[weights.date == date].iloc[0]['weights']
        piece_return = pd.DataFrame(df_temp_next.return_list.iloc[0].loc[next_date]).reset_index()
        piece_return.columns = ['tic', 'return']
        X = df_temp[np.concatenate((tech_indicator_list, ['tic'])).tolist()] # ['macd','rsi_30', 'boll_lb', 'dx_30', 'tic']] # cci_30
        X_next = df_temp_next[np.concatenate((tech_indicator_list, ['tic'])).tolist()] # [['macd','rsi_30', 'boll_lb', 'dx_30', 'tic']] # cci_30
        piece = weight_piece.merge(X, on = 'tic').merge(piece_return, on = 'tic')
        piece['Y'] = piece['return'] * piece['weight']
        X = piece[tech_indicator_list] # ['macd','rsi_30', 'boll_lb', 'dx_30']] # cci_30
        X = sm.add_constant(X)
        Y = piece[['Y']]
        model = sm.OLS(Y,X)
        results = model.fit()
        meta_score_coef["coef"] += [(X * results.params).sum(axis = 0)]
        meta_score_coef["date"] += [date]
        meta_score_coef["algo"] += [algo]

meta_score_coef = pd.DataFrame(meta_score_coef)


In a future version of pandas all arguments of concat except for the argument 'objs' will be keyword-only
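
The regression cell above fits, for each day, the weighted return of every stock on that day's indicators and then stores `(X * results.params).sum(axis = 0)`, the coefficient of each feature multiplied by the feature values and summed over stocks. The NumPy sketch below reproduces that calculation on synthetic inputs; the random data and equal weights are assumptions, and `np.linalg.lstsq` is used here in place of `statsmodels`.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stocks, n_features = 8, 3
X = rng.normal(size=(n_stocks, n_features))       # hypothetical indicators, one row per stock
weights = np.full(n_stocks, 1.0 / n_stocks)       # hypothetical portfolio weights
returns = rng.normal(scale=0.02, size=n_stocks)   # hypothetical next-day returns
y = weights * returns                             # weighted return per stock

A = np.column_stack([np.ones(n_stocks), X])       # intercept column, like sm.add_constant
params, *_ = np.linalg.lstsq(A, y, rcond=None)

# Per-feature contribution: coefficient times feature value, summed over stocks.
contrib = (A * params).sum(axis=0)
print(contrib)
```

Because the regression includes an intercept, the contributions add up to the total weighted return, so each entry can be read as the share of the day's portfolio return attributed to that feature.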



### Correlation Coefficient
Calculate the single-step and multi-step correlation coefficients.

In [117]:
meta_score_coef.coef[0]

const            -0.044508
macd             -0.010997
boll_ub          -0.196528
boll_lb           1.290190
rsi_30            0.219260
daily_variance   -0.000172
dx_30            -0.060048
close_30_sma     -0.242147
close_60_sma     -0.985820
dtype: float64

In [118]:
performance_score = {'date':[], 'algo':[], 'score':[]}

for i in range(0, len(unique_trade_date)):
    date_ = unique_trade_date[i]
    if len(meta_score_coef[(meta_score_coef['date'] == date_)]) == 0:
        continue 
    lr_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'LR')]['coef'].values[0][tech_indicator_list].values # cci_30
    rf_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'RF')]['coef'].values[0][tech_indicator_list].values # cci_30
    reference_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'Reference Model')]['coef'].values[0][tech_indicator_list].values # cci_30
    dt_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'DT')]['coef'].values[0][tech_indicator_list].values # cci_30
    svm_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'SVM')]['coef'].values[0][tech_indicator_list].values # cci_30

    saliency_coef_a2c = meta_Q[(meta_Q['date'] == date_) & (meta_Q['algo'] == "A2C")]['Saliency Map'].values
    saliency_coef_ppo = meta_Q[(meta_Q['date'] == date_) & (meta_Q['algo'] == "PPO")]['Saliency Map'].values

    lr_score = np.corrcoef(lr_coef, reference_coef)[0][1]
    rf_score = np.corrcoef(rf_coef, reference_coef)[0][1]
    dt_score = np.corrcoef(dt_coef, reference_coef)[0][1]
    svm_score = np.corrcoef(svm_coef, reference_coef)[0][1]
    saliency_score_a2c = np.corrcoef(saliency_coef_a2c, reference_coef)[0][1]
    saliency_score_ppo = np.corrcoef(saliency_coef_ppo, reference_coef)[0][1]

    for algo in ['LR', 'A2C', 'PPO', 'RF', 'DT', 'SVM']:
        performance_score['date'] += [date_]
        performance_score['algo'] += [algo]
        if algo == 'LR':
            score = lr_score
        elif algo == 'RF':
            score = rf_score
        elif algo == 'DT':
            score = dt_score
        elif algo == 'A2C':
            score = saliency_score_a2c
        elif algo == 'SVM':
            score = svm_score
        else:
            score = saliency_score_ppo
        performance_score['score'] += [score]
performance_score = pd.DataFrame(performance_score)


invalid value encountered in true_divide


invalid value encountered in true_divide
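
The `invalid value encountered in true_divide` warnings above are what `np.corrcoef` emits when one of its inputs has zero variance, in which case the correlation is undefined and the result is `NaN`. A small guard makes that case explicit; the `safe_corr` helper below is a hypothetical addition and not part of the notebook.

```python
import numpy as np

def safe_corr(a, b):
    """Pearson correlation, returning NaN explicitly when either input is constant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1])

print(safe_corr([1, 2, 3], [2, 4, 6]))   # perfectly correlated inputs
print(safe_corr([0, 0, 0], [1, 2, 3]))   # constant input, correlation undefined
```

Scores that come back as `NaN` should be excluded or counted separately when the per-algorithm averages are computed.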



In [119]:
multi_performance_score = {"date": [], "algo": [], "score": []}
window = 15
for i in range(len(unique_trade_date) - window ):
    date_ = unique_trade_date[i]
    if len(meta_score_coef[(meta_score_coef['date'] == date_)]) == 0:
        continue 
    lr_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'LR')]['coef'].values[0][tech_indicator_list].values # cci_30
    rf_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'RF')]['coef'].values[0][tech_indicator_list].values # cci_30
    reference_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'Reference Model')]['coef'].values[0][tech_indicator_list].values # cci_30
    for w in range(1, window):
        date_f = unique_trade_date[i + w]
        prx_coef = meta_score_coef[(meta_score_coef['date'] == date_f) & (meta_score_coef['algo'] == 'Reference Model')]['coef'].values[0][tech_indicator_list].values # cci_30
        reference_coef += prx_coef
    reference_coef = reference_coef / window
    dt_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'DT')]['coef'].values[0][tech_indicator_list].values # cci_30
    svm_coef = meta_score_coef[(meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'SVM')]['coef'].values[0][tech_indicator_list].values # cci_30
    saliency_coef_a2c = meta_Q[(meta_Q['date'] == date_) & (meta_Q['algo'] == "A2C")]['Saliency Map'].values
    saliency_coef_ppo = meta_Q[(meta_Q['date'] == date_) & (meta_Q['algo'] == "PPO")]['Saliency Map'].values
    lr_score = np.corrcoef(lr_coef, reference_coef)[0][1]
    rf_score = np.corrcoef(rf_coef, reference_coef)[0][1]
    dt_score = np.corrcoef(dt_coef, reference_coef)[0][1]
    svm_score = np.corrcoef(svm_coef, reference_coef)[0][1]
    saliency_score_a2c = np.corrcoef(saliency_coef_a2c, reference_coef)[0][1]
    saliency_score_ppo = np.corrcoef(saliency_coef_ppo, reference_coef)[0][1]

    for algo in ["LR", "A2C", "RF", "PPO", "DT", "SVM"]:
        multi_performance_score["date"] += [date_]
        multi_performance_score["algo"] += [algo]
        if algo == "LR":
            score = lr_score 
        elif algo == "RF":
            score = rf_score
        elif algo == "DT":
            score = dt_score
        elif algo == "A2C":
            score = saliency_score_a2c
        elif algo == "SVM":
            score = svm_score
        else:
            score = saliency_score_ppo
        multi_performance_score["score"] += [score]

multi_performance_score = pd.DataFrame(multi_performance_score)
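
The multi-step score differs from the single-step score only in its reference: instead of the reference model's coefficients for one day, it uses their average over the current day and the following `window - 1` days. A minimal sketch of that forward average, with a hypothetical helper name and toy vectors:

```python
import numpy as np

def forward_average(vectors, start, window):
    """Average the vectors for days start, start + 1, ..., start + window - 1."""
    block = np.asarray(vectors[start:start + window], dtype=float)
    return block.mean(axis=0)

# Hypothetical reference coefficients: one row per day, one column per feature.
daily_ref = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]])
avg = forward_average(daily_ref, start=0, window=3)
print(avg)
```

Each model's day-t explanation is then correlated with this averaged vector, which rewards explanations that stay relevant over the next few days.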

### Data Visualization

In [120]:
from datetime import datetime as dt

import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go
trace1_portfolio = go.Scatter(x = time_ind, y = a2c_cumpod, mode = 'lines', name = 'A2C')
trace2_portfolio = go.Scatter(x = time_ind, y = ppo_cumpod, mode = 'lines', name = 'PPO')
trace3_portfolio = go.Scatter(x = time_ind, y = dji_cumpod, mode = 'lines', name = 'DJIA')
trace4_portfolio = go.Scatter(x = time_ind, y = lr_cumprod, mode = 'lines', name = 'LR')
trace5_portfolio = go.Scatter(x = time_ind, y = rf_cumprod, mode = 'lines', name = 'RF')
trace6_portfolio = go.Scatter(x = time_ind, y = dt_cumprod, mode = 'lines', name = 'DT')
trace7_portfolio = go.Scatter(x = time_ind, y = svm_cumprod, mode = 'lines', name = 'SVM')

In [121]:
fig = go.Figure()
fig.add_trace(trace1_portfolio)
fig.add_trace(trace2_portfolio)
fig.add_trace(trace3_portfolio)
fig.add_trace(trace4_portfolio)
fig.add_trace(trace5_portfolio)
fig.add_trace(trace6_portfolio)
fig.add_trace(trace7_portfolio)

fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=15,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2
        
    ),
)
fig.update_layout(title={
        #'text': "Cumulative Return using FinRL",
        'y':0.85,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})

fig.update_layout(
    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(title = dict(text = "Cumulative Return", font = dict(size = 30))),
    font=dict(
        size=40,
    ),
)
fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))

fig.update_xaxes(showline=True, linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show()

#### We found that A2C and PPO succeeded in the portfolio management task and outperformed all other algorithms and the benchmark.

In [122]:
meta_score = {"Annual return":[], "Annual volatility":[], "Max drawdown":[], "Sharpe ratio":[], "Algorithm":[], "Calmar ratio":[]}
# Map every model name to its statistics so the metrics can be collected in one loop.
stats_by_name = {
    "LR": lr_stats,
    "A2C": perf_stats_all_a2c,
    "RF": rf_stats,
    "Reference Model": reference_stats,
    "PPO": perf_stats_all_ppo,
    "SVM": svm_stats,
    "DT": dt_stats,
    "DJI": baseline_df_stats,
}
for name, stats in stats_by_name.items():
    meta_score["Algorithm"] += [name]
    for metric in ["Annual return", "Annual volatility", "Max drawdown", "Sharpe ratio", "Calmar ratio"]:
        meta_score[metric] += [stats[metric]]

meta_score = pd.DataFrame(meta_score).sort_values("Sharpe ratio")
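
The table built above ranks the models by Sharpe ratio and also records maximum drawdown. For readers unfamiliar with these metrics, the sketch below computes both from a series of daily returns. The conventions are assumptions of this sketch (252 trading days per year, sample standard deviation, zero risk-free rate); the values reported by `backtest_stats` come from pyfolio and may follow slightly different conventions.

```python
import math
import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualised mean return divided by annualised volatility."""
    r = np.asarray(daily_returns, dtype=float)
    return float(r.mean() / r.std(ddof=1) * math.sqrt(periods_per_year))

def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded wealth curve (a negative number)."""
    r = np.asarray(daily_returns, dtype=float)
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + r)])  # start from 1 unit of capital
    running_peak = np.maximum.accumulate(wealth)
    return float((wealth / running_peak - 1.0).min())

sample = [0.2, -0.25, 0.1]               # hypothetical daily returns
print(sharpe_ratio(sample), max_drawdown(sample))
```

The Calmar ratio in the same table is the annual return divided by the absolute value of the maximum drawdown.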

In [125]:
postiveRatio = pd.DataFrame(performance_score.groupby("algo").apply(lambda x : np.mean(x['score'])))

postiveRatio = postiveRatio.reset_index()
postiveRatio.columns = ['algo', 'avg_correlation_coefficient']
postiveRatio['Sharpe Ratio'] = [0] * 6

# postiveRatio.plot.bar(x = 'algo', y = 'avg_correlation_coefficient')

postiveRatiom = pd.DataFrame(multi_performance_score.groupby("algo").apply(lambda x : np.mean(x['score'])))
postiveRatiom = postiveRatiom.reset_index()
postiveRatiom.columns = ['algo', 'avg_correlation_coefficient']
postiveRatiom['Sharpe Ratio'] = [0] * 6

# postiveRatiom.plot.bar(x = 'algo', y = 'avg_correlation_coefficient')


for algo in ['A2C', 'PPO', 'LR','DT', 'RF', 'SVM']:
    postiveRatio.loc[postiveRatio['algo'] == algo, 'Sharpe Ratio'] = meta_score.loc[meta_score['Algorithm'] == algo,'Sharpe ratio'].values[0]
    postiveRatiom.loc[postiveRatiom['algo'] == algo, 'Sharpe Ratio'] = meta_score.loc[meta_score['Algorithm'] == algo,'Sharpe ratio'].values[0]

postiveRatio.sort_values("Sharpe Ratio", inplace= True)

postiveRatiom.sort_values("Sharpe Ratio", inplace= True)

In [126]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=postiveRatiom['algo'], y=postiveRatiom['Sharpe Ratio'], name="Sharpe Ratio", marker_size = 15, line_width=5),
    secondary_y=True,
)

fig.add_trace(
    go.Bar(x=postiveRatiom['algo'], y=postiveRatiom['avg_correlation_coefficient'], name="Multi-Step Average Correlation Coefficient          ", width=0.38),
    secondary_y=False,
)
fig.add_trace(
    go.Bar(x=postiveRatio['algo'], y=postiveRatio['avg_correlation_coefficient'], name="Single-Step Average Correlation Coefficient           ", width=0.38),
    secondary_y=False,
)
    
fig.update_layout(
    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
)
fig.update_layout(legend=dict(
    yanchor="top",
    y=1.5,
    xanchor="right",
    x=0.95
))
fig.update_layout(font_size = 15)

# Set x-axis title
fig.update_xaxes(title_text="Model")
fig.update_xaxes(showline=True, linecolor='black',showgrid=True,gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(showline=True,linecolor='black',showgrid=True, secondary_y=False, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')
# Set y-axes titles
fig.update_yaxes(title_text="Average Correlation Coefficient", secondary_y=False, range = [-0.1,0.1])
fig.update_yaxes(title_text="Sharpe Ratio", secondary_y=True,range = [-0.5,2.5])

fig.show()

#### The correlation coefficient represents the level of predictive power.

We found that:
>* The Sharpe ratio is consistent with both the single-step and the multi-step average correlation coefficients.
>* DRL agents are better at multi-step prediction than the ML algorithms, but worse at single-step prediction.

In [127]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots


fig = make_subplots(rows=2, cols=3)

trace0 = go.Histogram(x=performance_score[performance_score['algo'] == 'A2C']['score'].values, nbinsx=25, name = 'A2C',histnorm='probability')
trace1 = go.Histogram(x=performance_score[performance_score['algo'] == 'PPO']['score'].values, nbinsx=25, name = 'PPO',histnorm='probability')
trace2 = go.Histogram(x=performance_score[performance_score['algo'] == 'DT']['score'].values, nbinsx=25, name = 'DT',histnorm='probability')
trace3 = go.Histogram(x=performance_score[performance_score['algo'] == 'LR']['score'].values, nbinsx=25, name = 'LR',histnorm='probability')
trace4 = go.Histogram(x=performance_score[performance_score['algo'] == 'SVM']['score'].values, nbinsx=25, name = 'SVM',histnorm='probability')
trace5 = go.Histogram(x=performance_score[performance_score['algo'] == 'RF']['score'].values, nbinsx=25, name = 'RF',histnorm='probability')


fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)
fig.append_trace(trace2, 1, 3)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig.append_trace(trace5, 2, 3)
# Update xaxis properties
fig.update_xaxes(title_text="Correlation coefficient", row=2, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=1)
fig.update_yaxes(title_text="Frequency", row=2, col=1)

fig.update_layout(

    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
     font=dict(
       
        size=18,
    ),

)
fig.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="left",
    x=1
))

fig.update_xaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show()

#### Histogram of single-step correlation coefficient

In [128]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots


fig = make_subplots(rows=2, cols=3)

trace0 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'A2C']['score'].values, nbinsx=25, name = 'A2C',histnorm='probability')
trace1 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'PPO']['score'].values, nbinsx=25, name = 'PPO',histnorm='probability')
trace2 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'DT']['score'].values, nbinsx=25, name = 'DT',histnorm='probability')
trace3 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'LR']['score'].values, nbinsx=25, name = 'LR',histnorm='probability')
trace4 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'SVM']['score'].values, nbinsx=25, name = 'SVM',histnorm='probability')
trace5 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'RF']['score'].values, nbinsx=25, name = 'RF',histnorm='probability')

fig.update_layout(yaxis1 = dict(range=[0, 0.2]))
fig.update_layout(yaxis2 = dict(range=[0, 0.2]))
fig.update_layout(yaxis3 = dict(range=[0, 0.4]))
fig.update_layout(yaxis4 = dict(range=[0, 0.4]))
fig.update_layout(yaxis5 = dict(range=[0, 0.4]))
fig.update_layout(yaxis6 = dict(range=[0, 0.4]))

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)
fig.append_trace(trace2, 1, 3)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig.append_trace(trace5, 2, 3)
# Update xaxis properties
fig.update_xaxes(title_text="Correlation coefficient", row=2, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=1)
fig.update_yaxes(title_text="Frequency", row=2, col=1)

fig.update_layout(

    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
     font=dict(
       
        size=18,
    ),

)
fig.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="left",
    x=1
))

fig.update_xaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show()

#### Histogram of multi-step correlation coefficient