# Content

# Prepping

## Drive location

The following cells install `FinRL`, which our group has adapted for the Vietnamese stock market.

In [None]:
%mkdir CS106-Vietnam-Stock-Trading

In [None]:
%cd CS106-Vietnam-Stock-Trading

In [None]:
!sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx

The `!pip install` below already includes the `vnquant` repo.

In [None]:
!pip install git+https://github.com/ThangDuong59/CS106-Vietnam-Stock-Trading.git

## Essential modules

In [3]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
import datetime
import itertools

%matplotlib inline
from pprint import pprint

In [None]:
import os
from finrl.config import config  # needed here: the directory names live in config

# Create the output directories used by the rest of the notebook
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

# A. Problem definition

This problem is to design an automated trading solution for single/multiple stock trading. We model the stock trading process as a **Markov Decision Process (MDP)**. We then formulate our trading goal as a maximization problem.

The agent is trained with Deep Reinforcement Learning (DRL) algorithms. The components of the reinforcement learning environment are:


* ***Action:*** The action space describes the actions through which the agent interacts with the environment. Normally, $a \in A$ includes three actions: $a \in \{-1, 0, 1\}$, where $-1$, $0$, and $1$ represent selling, holding, and buying one share. An action can also be carried out on multiple shares, so we use the action space $\{-k, \dots, -1, 0, 1, \dots, k\}$, where $k$ denotes the number of shares. For example, "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" are $10$ and $-10$, respectively.

* ***Reward function:*** $r(s, a, s')$ is the incentive mechanism for the agent to learn a better action. It is the change in portfolio value when action $a$ is taken at state $s$ and the agent arrives at the new state $s'$, i.e., $r(s, a, s') = v' - v$, where $v'$ and $v$ are the portfolio values at states $s'$ and $s$, respectively.

* ***State:*** The state space describes the observations that the agent receives from the environment. Just as a human trader analyzes various information before executing a trade, our trading agent observes many different features to learn better in an interactive environment.

* ***Environment:*** VN30 for multi-stock trading and VNM for single-stock trading
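The action and reward definitions above can be sketched in a few lines of plain Python. This is a toy illustration, not FinRL's environment: the function names `portfolio_value` and `step` are hypothetical, transaction costs are ignored, and prices are made-up numbers.

```python
def portfolio_value(cash, shares, prices):
    """Cash plus the market value of every holding."""
    return cash + sum(n * p for n, p in zip(shares, prices))


def step(cash, shares, prices, next_prices, actions):
    """Apply one integer action per stock, then return the reward v' - v."""
    value_before = portfolio_value(cash, shares, prices)
    shares = list(shares)
    for i, a in enumerate(actions):
        if a < 0:
            a = -min(-a, shares[i])             # cannot sell more than is held
        else:
            a = min(a, int(cash // prices[i]))  # cannot spend more than the cash
        shares[i] += a
        cash -= a * prices[i]
    value_after = portfolio_value(cash, shares, next_prices)
    return cash, shares, value_after - value_before


# Buy 5 shares at 100,000 VND; the price then moves to 110,000 VND.
cash, shares, reward = step(1_000_000, [0], [100_000], [110_000], [5])
print(cash, shares, reward)
```

The reward is positive only because the price rose after the purchase; the trade itself leaves the portfolio value unchanged when costs are ignored.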

The data is provided through [**vnquant**](https://github.com/ThangDuong59/vnquant), which we adapted to work with FinRL.

This repo is based on [**FinRL: A Deep Reinforcement Learning Framework for Quantitative Finance**](https://github.com/AI4Finance-LLC/FinRL#finrl-a-deep-reinforcement-learning-framework-for-quantitative-finance----).

In [None]:
from finrl.config import config
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.marketdata.vnquantdownloader import vnquantDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.model.models import DRLAgent, DRLEnsembleAgent
from finrl.env.env_stocktrading import StockTradingEnv
from finrl.trade.backtest import backtest_stats, backtest_plot, get_daily_return, get_baseline

## Data

`vnquant`, written by Pham Dinh Khanh, is a package that lets you crawl data through the **VNDIRECT** API.

We do not use the default crawler because **Yahoo Downloader** can only download a few common Vietnamese stocks.

-----
class vnquantDownloader:
    Provides methods for retrieving daily stock data from
    VNDirect

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()
        Fetches data from VNDirect API
        Returns
        -------
        `pd.DataFrame`
            7 columns: date, open, high, low, close, volume, and ticker symbol
            for the specified stock ticker
        """


In [9]:
config.VN_30_TICKER

['BID',
 'BVH',
 'CTG',
 'FPT',
 'GAS',
 'HDB',
 'HPG',
 'KDH',
 'MBB',
 'MSN',
 'MWG',
 'NVL',
 'PDR',
 'PLX',
 'PNJ',
 'POW',
 'REE',
 'SBT',
 'SSI',
 'STB',
 'TCB',
 'VCB',
 'VHM',
 'VIC',
 'VJC',
 'VNM',
 'VPB',
 'VRE']

In [8]:
ticker = "VNM"#@param{type:"string"}
type(ticker)
config.SINGLE_TICKER = [ticker]
config.SINGLE_TICKER

['VNM']

In [None]:
config.START_DATE = '2013-01-02' #@param {type:"date"}
config.START_TRADE_DATE = "2019-01-01" #@param {type:"date"}
config.END_DATE = "2021-07-01" #@param {type:"date"}
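The train/trade split later in the notebook only makes sense when the three dates are in chronological order. A minimal check, assuming ISO `YYYY-MM-DD` strings (the helper `check_split_dates` is hypothetical and not part of FinRL):

```python
from datetime import date


def check_split_dates(start, start_trade, end):
    """Return True when start < start_trade < end."""
    s, t, e = (date.fromisoformat(d) for d in (start, start_trade, end))
    return s < t < e


# A valid ordering: training window first, trading window second.
ok = check_split_dates("2013-01-02", "2019-01-01", "2021-07-01")
# An invalid ordering: the trading window would be empty.
bad = check_split_dates("2013-01-02", "2021-07-01", "2019-01-01")
print(ok, bad)
```

Running such a check right after setting the dates catches an empty trading window before any data is downloaded.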

Users can choose their own ticker

In [None]:
ticker_list = config.VN_30_TICKER #@param ["config.VN_30_TICKER", "config.SINGLE_TICKER"] {type:"raw", allow-input: true}
df = vnquantDownloader(start_date=config.START_DATE,
                        end_date=config.END_DATE,
                        ticker_list=ticker_list, ).fetch_data()

In [14]:
df.tail(20)

| | date | open | high | low | close | volume | tic | day |
|---|---|---|---|---|---|---|---|---|
| 41883 | 2021-07-01 | 43400.0 | 43700.0 | 42750.0 | 32185.0 | 25460700.0 | MBB | 3 |
| 41884 | 2021-07-01 | 111400.0 | 114000.0 | 110100.0 | 113500.0 | 1706400.0 | MSN | 3 |
| 41885 | 2021-07-01 | 152000.0 | 152500.0 | 150500.0 | 151100.0 | 917200.0 | MWG | 3 |
| 41886 | 2021-07-01 | 123100.0 | 123600.0 | 119200.0 | 120000.0 | 6497433.0 | NVL | 3 |
| 41887 | 2021-07-01 | 93300.0 | 96200.0 | 92900.0 | 95900.0 | 5189800.0 | PDR | 3 |
| 41888 | 2021-07-01 | 54900.0 | 56000.0 | 54000.0 | 55500.0 | 2307300.0 | PLX | 3 |
| 41889 | 2021-07-01 | 100700.0 | 101200.0 | 99500.0 | 100200.0 | 362600.0 | PNJ | 3 |
| 41890 | 2021-07-01 | 12050.0 | 12150.0 | 11950.0 | 12100.0 | 7807900.0 | POW | 3 |
| 41891 | 2021-07-01 | 57500.0 | 58400.0 | 57300.0 | 57600.0 | 434400.0 | REE | 3 |
| 41892 | 2021-07-01 | 20800.0 | 21500.0 | 20800.0 | 21350.0 | 3495900.0 | SBT | 3 |


### Download data

Save data to `.csv` file

In [None]:
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")
df.to_csv("./" + config.DATA_SAVE_DIR + "/" + now + ".csv")

## Data adjustment

-----
class FeatureEngineer:
    Provides methods for preprocessing the stock price data

    Attributes
    ----------
        use_technical_indicator : boolean
            use technical indicators or not
        tech_indicator_list : list
            a list of technical indicator names (modified from config.py)
        use_turbulence : boolean
            use turbulence index or not
        user_defined_feature : boolean
            use user-defined features or not

    Methods
    -------
    preprocess_data()
        main method to do the feature engineering

In [6]:
# Using technical indicators, check https://pypi.org/project/stockstats/ for different names
config.TECHNICAL_INDICATORS_LIST

['macd', 'rsi_30', 'cci_30', 'dx_30']

In [None]:
fe = FeatureEngineer(use_technical_indicator=True,
                    tech_indicator_list = config.TECHNICAL_INDICATORS_LIST,
                    use_turbulence=True,
                    user_defined_feature = False)

In [None]:
processed = fe.preprocess_data(df)

list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(
    processed['date'].min(), processed['date'].max()).astype(str))
combination = list(itertools.product(list_date, list_ticker))

processed_full = pd.DataFrame(combination, columns=["date", "tic"]).merge(
    processed, on=["date", "tic"], how="left")
processed_full = processed_full[processed_full['date'].isin(
    processed['date'])]
processed_full = processed_full.sort_values(['date', 'tic'])

processed_full = processed_full.fillna(0)

Successfully added technical indicators
Successfully added turbulence index
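The cell above builds a complete (date, ticker) grid so that every ticker has a row on every trading date, filling gaps with 0. The same idea in plain Python, as a rough sketch: the tickers `AAA` and `BBB`, the prices, and the helper `fill_grid` are invented for illustration, and the sketch starts from the observed dates directly rather than filtering a calendar range.

```python
import itertools

rows = [
    {"date": "2021-06-30", "tic": "AAA", "close": 10.0},
    {"date": "2021-06-30", "tic": "BBB", "close": 20.0},
    {"date": "2021-07-01", "tic": "AAA", "close": 11.0},
    # BBB has no row on 2021-07-01
]


def fill_grid(rows, value_cols=("close",)):
    """Return one row per (date, ticker) pair, with 0 for missing values."""
    dates = sorted({r["date"] for r in rows})
    tickers = sorted({r["tic"] for r in rows})
    lookup = {(r["date"], r["tic"]): r for r in rows}
    full = []
    for d, t in itertools.product(dates, tickers):
        src = lookup.get((d, t), {})  # empty dict when the pair is missing
        full.append({"date": d, "tic": t,
                     **{c: src.get(c, 0) for c in value_cols}})
    return full


full = fill_grid(rows)
for r in full:
    print(r)
```

Note that a filled row carries a price of 0, which an environment would read as a real price, so it may be worth checking how many rows were filled before training.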


In [None]:
processed_full.sample(n = 20)

In [None]:
train = data_split(processed_full, config.START_DATE,
                    config.START_TRADE_DATE)
trade = data_split(
    processed_full, config.START_TRADE_DATE, config.END_DATE)
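The two `data_split` calls share `config.START_TRADE_DATE` as a boundary. The sketch below assumes the split is half-open (start inclusive, end exclusive), so the boundary date lands in the trade set only. That is an assumption about `data_split`, and `split_by_date` is a hypothetical stand-in written in plain Python.

```python
def split_by_date(rows, start, end):
    """Keep rows with start <= date < end (ISO date strings sort correctly)."""
    return [r for r in rows if start <= r["date"] < end]


rows = [{"date": d} for d in
        ("2018-12-28", "2018-12-31", "2019-01-01", "2019-01-02")]

train_rows = split_by_date(rows, "2013-01-02", "2019-01-01")
trade_rows = split_by_date(rows, "2019-01-01", "2021-07-01")
print(len(train_rows), len(trade_rows))
```

With a half-open interval no row appears in both sets, which keeps trading data out of training.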

In [None]:
stock_dimension = len(train.tic.unique())
state_space = (
    1
    + 2 * stock_dimension
    + len(config.TECHNICAL_INDICATORS_LIST) * stock_dimension
)

In [None]:
stock_dimension, state_space

(1, 7)
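The `state_space` formula can be checked by hand. In FinRL's usual layout (an assumption here, since the notebook does not spell it out) the leading 1 is the cash balance, the `2 * stock_dimension` term covers one price and one holding per stock, and the last term is one value per technical indicator per stock. The helper name `state_space_size` is hypothetical.

```python
def state_space_size(n_stocks, n_indicators):
    """1 cash balance + price and holding per stock + indicators per stock."""
    return 1 + 2 * n_stocks + n_indicators * n_stocks


single = state_space_size(n_stocks=1, n_indicators=4)   # one ticker
multi = state_space_size(n_stocks=28, n_indicators=4)   # 28 tickers in the list above
print(single, multi)
```

With one ticker and the four indicators listed earlier this gives 7, matching the output above; the 28 tickers shown in `config.VN_30_TICKER` would give 169.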

# B. Implementation

## Env initial params

In [None]:
env_kwargs = {
        "hmax": 50000, 
        "initial_amount": 100000000,
        "buy_cost_pct": 0.0015,
        "sell_cost_pct": 0.0015,
        "state_space": state_space,
        "stock_dim": stock_dimension,
        "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST,
        "action_space": stock_dimension,
        "reward_scaling": 1e-4
    }
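To make the parameters in `env_kwargs` concrete, here is a rough sketch of how they are commonly applied in a trading environment. It reflects an assumed reading of the environment and is not copied from `StockTradingEnv`; all four helper names are hypothetical.

```python
def shares_from_action(action, hmax):
    """Map a policy output in [-1, 1] to a signed whole number of shares."""
    clipped = max(-1.0, min(1.0, action))
    return int(clipped * hmax)


def buy_cost(price, n_shares, buy_cost_pct):
    """Cash spent on a purchase, including the proportional fee."""
    return price * n_shares * (1 + buy_cost_pct)


def sell_proceeds(price, n_shares, sell_cost_pct):
    """Cash received from a sale, after the proportional fee."""
    return price * n_shares * (1 - sell_cost_pct)


def scaled_reward(value_after, value_before, reward_scaling):
    """Change in portfolio value, scaled down to keep rewards small."""
    return (value_after - value_before) * reward_scaling


print(shares_from_action(0.5, hmax=50000))
print(buy_cost(100_000, 100, buy_cost_pct=0.0015))
print(scaled_reward(101_000_000, 100_000_000, reward_scaling=1e-4))
```

Under this reading, `hmax` caps the shares traded per stock per step, and `reward_scaling` keeps the raw VND differences from producing very large learning signals.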

## Env initialization

In [None]:
e_train_gym = StockTradingEnv(df=train, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()

agent = DRLAgent(env=env_train)

## Model training

### SAC model

In [None]:
print("==============Model Training===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")

model_sac = agent.get_model("sac")
trained_sac = agent.train_model(
    model=model_sac, tb_log_name="sac", total_timesteps=100000
)

### A2C model

In [None]:
print("==============Model Training===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")

model_a2c = agent.get_model("a2c")
trained_a2c = agent.train_model(
    model=model_a2c, tb_log_name="a2c", total_timesteps=100000
)

{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cuda device
Logging to tensorboard_log/a2c/a2c_1
------------------------------------
| time/                 |          |
|    fps                | 182      |
|    iterations         | 100      |
|    time_elapsed       | 2        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -1.42    |
|    explained_variance | 0        |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | -99.7    |
|    std                | 1        |
|    value_loss         | 8.95e+03 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 241      |
|    iterations         | 200      |
|    time_elapsed       | 4        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -1.43    |
|    explained_variance | 0        |

### PPO model

In [None]:
print("==============Model Training===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")

model_ppo = agent.get_model("ppo")
trained_ppo = agent.train_model(
    model=model_ppo, tb_log_name="ppo", total_timesteps=100000
)

{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 64}
Using cuda device
Logging to tensorboard_log/ppo/ppo_1
-----------------------------
| time/              |      |
|    fps             | 399  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 391          |
|    iterations           | 2            |
|    time_elapsed         | 10           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0013894257 |
|    clip_fraction        | 0.0102       |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.42        |
|    explained_variance   | -0.000149    |
|    learning_rate        | 0.00025      |
|    loss                 | 1.28e+04     |
|    n_updates            | 10           |
|    polic

### DDPG model

In [None]:
print("==============Model Training===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")

model_ddpg = agent.get_model("ddpg")
trained_ddpg = agent.train_model(
    model=model_ddpg, tb_log_name="ddpg", total_timesteps=50000
)

### TD3 model

In [None]:
print("==============Model Training===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")

model_td3 = agent.get_model("td3")
trained_td3 = agent.train_model(
    model=model_td3, tb_log_name="td3", total_timesteps=30000
)

{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cuda device
Logging to tensorboard_log/td3/td3_1
---------------------------------
| time/              |          |
|    episodes        | 4        |
|    fps             | 158      |
|    time_elapsed    | 37       |
|    total timesteps | 5976     |
| train/             |          |
|    actor_loss      | 3.97e+04 |
|    critic_loss     | 2.6e+09  |
|    learning_rate   | 0.001    |
|    n_updates       | 4482     |
---------------------------------
---------------------------------
| time/              |          |
|    episodes        | 8        |
|    fps             | 143      |
|    time_elapsed    | 83       |
|    total timesteps | 11952    |
| train/             |          |
|    actor_loss      | 2.98e+04 |
|    critic_loss     | 1.99e+07 |
|    learning_rate   | 0.001    |
|    n_updates       | 10458    |
---------------------------------
day: 1493, episode: 10
begin_total_asset: 100000000.00
end_to

### Ensemble Agent

Ensemble agent contains 3 DRL algorithms: A2C, PPO, DDPG

The `DRLEnsembleAgent` class was adapted in the original repo (AI4Finance-LLC) to support the ensemble method.
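The core idea of an ensemble strategy is to validate each candidate model over a window and trade with the one that scores best. The sketch below assumes the score is the Sharpe ratio of daily returns annualized with 252 periods; the return series are made-up numbers and both helpers are hypothetical, not the `DRLEnsembleAgent` code.

```python
import statistics


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized mean/std of returns; 0.0 when returns never vary."""
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return 0.0
    return (periods_per_year ** 0.5) * statistics.mean(daily_returns) / sd


def pick_best_model(validation_returns):
    """Return the model name whose validation returns score highest."""
    return max(validation_returns,
               key=lambda name: sharpe_ratio(validation_returns[name]))


validation_returns = {
    "a2c": [0.01, -0.02, 0.015, 0.0],
    "ppo": [0.005, 0.004, 0.006, 0.005],
    "ddpg": [-0.01, 0.0, 0.01, -0.005],
}
best = pick_best_model(validation_returns)
print(best)
```

In the cells below, `rebalance_window` and `validation_window` set how many trading days each such selection round covers.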

In [None]:
# env_ensemble_kwargs
train_period = [config.START_DATE, config.START_TRADE_DATE]
val_test_period = [config.START_TRADE_DATE, config.END_DATE]

env_ensemble_kwargs = {
        "df": processed_full,
        "train_period": train_period,
        "val_test_period": val_test_period,
        "rebalance_window": 63,
        "validation_window": 63,
        "stock_dim": stock_dimension,
        "hmax": 50000,
        "initial_amount": 100000000,
        "buy_cost_pct": 0.0015,
        "sell_cost_pct": 0.0015,
        "reward_scaling": 1e-4,
        "state_space": state_space,
        "action_space": stock_dimension,
        "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST,
        "print_verbosity": 10
    }

timesteps_dict = {
        'a2c': 100000,
        'ppo': 100000,
        'ddpg': 50000
    }

In [None]:
# Initialize Ensemble Agent
ensembleAgent = DRLEnsembleAgent(**env_ensemble_kwargs)

In [None]:
# Train 
print("==============Model Training===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")
ensemble_model = ensembleAgent.run_ensemble_strategy(
    config.A2C_PARAMS, config.PPO_PARAMS, config.DDPG_PARAMS, timesteps_dict=timesteps_dict)
ensemble_model.to_csv(
    "./" + config.RESULTS_DIR + "/df_ensemble_summary" + now + ".csv")

## Model trading

**Change the trading model to the model that was just trained above**

In [None]:
print("==============Start Trading===========")
e_trade_gym = StockTradingEnv(
    df=trade, turbulence_threshold=250, **env_kwargs)

df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_ddpg, environment=e_trade_gym                          # Change the model here
)
df_account_value.to_csv(
    "./" + config.RESULTS_DIR + "/df_account_value_" + now + ".csv"
)
df_actions.to_csv("./" + config.RESULTS_DIR +
                    "/df_actions_" + now + ".csv")

hit end!


## Backtest

In [None]:
print("==============Get Backtest Results===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")
perf_stats_all = backtest_stats(df_account_value)              # Change back to df_account_value
perf_stats_all = pd.DataFrame(perf_stats_all)
perf_stats_all.to_csv("./" + config.RESULTS_DIR +
                        "/perf_stats_all_" + now + ".csv")
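Two of the statistics a backtest typically reports, cumulative return and maximum drawdown, can be computed directly from a series of account values. This is a hand-rolled sketch for intuition only, with made-up values and hypothetical helper names; it is not what `backtest_stats` runs internally.

```python
def cumulative_return(values):
    """Total return from the first to the last account value."""
    return values[-1] / values[0] - 1


def max_drawdown(values):
    """Largest peak-to-trough decline, as a negative fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                # highest value seen so far
        worst = min(worst, v / peak - 1)   # decline relative to that peak
    return worst


account_values = [100.0, 120.0, 90.0, 110.0]
print(cumulative_return(account_values))
print(max_drawdown(account_values))
```

The example series ends above where it started, yet it still lost a quarter of its value from the peak at one point, which is why both numbers are worth reading together.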