# Trade Smarter w/ Reinforcement Learning
## A deep dive into TensorTrade - the Python framework for trading and investing using deep reinforcement learning

Winning high-stakes poker tournaments, beating world-class StarCraft players, and autonomously driving Tesla's futuristic sports cars. What do they all have in common? Each of these extremely complex tasks was long thought to be impossible for machines, until recent, massive advances in deep reinforcement learning made them possible. Reinforcement learning is beginning to take over the world.

A little over two months ago, I decided I wanted to be a part of the revolution, so I set out on a journey to create a profitable Bitcoin trading strategy using state-of-the-art deep reinforcement learning algorithms. While I made quite a bit of progress on that front, I realized that the tooling for this sort of project can be quite daunting to wrap your head around, and as such, it is very easy to get lost in the details.

In between optimizing my previous library for distributed high-performance computing (HPC) systems; getting lost in endless pipelines of data and feature optimizations; and running around in circles over efficient model set-up, tuning, training, and evaluation, I realized that there had to be a better way of doing things. After countless hours of researching existing projects, spending endless nights watching PyData conference talks, and having many back-and-forth conversations with the hundreds of members of the RL trading Discord community, I realized there weren't any existing solutions that were all that good.

There were many bits and pieces of great reinforcement learning trading systems spread across the inter-webs, but nothing solid and complete. For this reason, I've decided to create an open source Python framework for getting any trading strategy from idea to production, efficiently, using deep reinforcement learning. 

Enter TensorTrade. The idea was to create a highly modular framework for building efficient reinforcement learning trading strategies in a composable, maintainable way. Sounds like a mouthful of buzz-words if you ask me, so let's get into the meat.

# Overview

TensorTrade is an open source Python framework for training, evaluating, and deploying robust trading strategies using deep reinforcement learning. The framework focuses on being highly composable and extensible, to allow the system to scale from simple trading strategies on a single CPU to complex investment strategies run on a distribution of HPC machines. Under the hood, the framework uses many of the APIs from existing machine learning frameworks to maintain high quality data pipelines and learning models.

One of the main goals of TensorTrade is to enable fast experimentation with algorithmic trading strategies by leveraging the existing tools and pipelines provided by pandas, gym, sklearn, ray, keras, and tensorflow. It aims to simplify the process of testing and deploying robust trading agents using deep reinforcement learning, so you can focus on creating profitable strategies.

## RL Primer

In case your reinforcement learning chops are a bit rusty, let's quickly go over the basic concepts.

Every reinforcement learning problem starts out with an environment and one or more agents that can interact with the environment.

This technique is based on Markov Decision Processes (MDPs), which date back to the 1950s. The agent will first observe the environment, then build a model of the current state and the expected value of actions within that environment. Based on that model, the agent will then take the action it has deemed as having the highest expected value.

The agent will then be rewarded by an amount corresponding to the actual value of the action taken within the environment. The reinforcement learning agent can then, through the process of trial and error (i.e. reinforcement), improve its underlying model and take more rewarding actions over time.
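To make the observe, act, and reward loop concrete, here is a toy sketch in plain Python. It is not TensorTrade code. The two-action ToyEnvironment, its payout probabilities, and the epsilon-greedy agent are all hypothetical, chosen only to show trial and error gradually improving the agent's estimate of each action's expected value.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: action 1 pays off more often than action 0."""

    PAYOUT_PROBS = (0.3, 0.7)

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        return 0  # a single, constant observation keeps the example small

    def step(self, action):
        reward = 1.0 if self.rng.random() < self.PAYOUT_PROBS[action] else 0.0
        return 0, reward, False, {}  # gym-style (observation, reward, done, info)


class EpsilonGreedyAgent:
    """Keeps a running estimate of each action's expected reward."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, observation):
        # Occasionally explore; otherwise take the action with the highest expected value.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental mean: nudge the estimate toward the reward actually received.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


env = ToyEnvironment()
agent = EpsilonGreedyAgent(n_actions=2)
observation = env.reset()
for _ in range(2000):
    action = agent.act(observation)
    observation, reward, done, info = env.step(action)
    agent.learn(action, reward)

print([round(v, 2) for v in agent.values])
```

A deep RL agent replaces the table of averages with a neural network and the single observation with real market data, but the loop itself has the same shape.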

If you still need a bit of refreshment on the subject, there is a link to an article titled Introduction to Deep Reinforcement Learning in the references for this article, which is much more in-depth. Let's move on.

## Getting Started

The following tutorial should provide enough examples to get you started with creating simple trading strategies using TensorTrade, although you will quickly see the framework is capable of handling much more complex configurations.

## Installation

TensorTrade requires Python 3.6 or later, so make sure you've got a valid version before pip installing the framework.

```bash
pip install tensortrade
```


# TensorTrade Components

At the core of TensorTrade are trading strategies. Trading strategies combine reinforcement learning agents with composable trading logic in the form of a gym environment. A trading environment is made up of a set of modular components that can be mixed and matched to create highly diverse trading and investment strategies. I will explain this in further detail later, but for now it is enough to know the basics.

The code snippets in this section should serve as guidelines for creating new components. There will likely be missing implementation details that will become more clear in a later section or as more components are defined.

## Trading Strategy

A TradingStrategy consists of a learning agent and one or more trading environments to tune, train, and evaluate on. If only one environment is provided, it will be used for tuning, training, and evaluating. Otherwise, a separate environment may be provided for each step.

Trading environments are fully configurable gym environments with highly composable InstrumentExchange, FeaturePipeline, ActionStrategy, and RewardStrategy components. The InstrumentExchange provides observations to the environment and executes the agent's trades, the FeaturePipeline optionally transforms the exchange output into a more meaningful set of features before it is passed to the agent, the ActionStrategy converts the agent's actions into executable trades, and the RewardStrategy calculates the reward for each time step based on the agent's performance.

If it seems a bit complicated now, trust me, it's not. That is all there is to it; now it's just a matter of composing each of these components into a complete strategy. Let's begin.

```python
from tensortrade.strategies import TensorforceTradingStrategy

agent_spec = {
    "type": "ppo_agent",
    "step_optimizer": {
        "type": "adam",
        "learning_rate": 1e-4
    },
    "discount": 0.99,
    "likelihood_ratio_clipping": 0.2,
}
network_spec = [
    dict(type='dense', size=64, activation="tanh"),
    dict(type='dense', size=32, activation="tanh")
]

strategy = TensorforceTradingStrategy(environment=environment,
                                      agent_spec=agent_spec,
                                      network_spec=network_spec)
```


Don't worry if agent_spec and network_spec seem a bit confusing now; I will go over them in more detail later on.

## Trading Environment

A trading environment is a reinforcement learning environment that follows OpenAI's gym.Env specification. This allows us to leverage many of the existing reinforcement learning models in our trading agent, if we'd like. 

```python
from tensortrade.environments import TradingEnvironment

environment = TradingEnvironment(exchange=exchange,
                                 feature_pipeline=feature_pipeline,
                                 action_strategy=action_strategy,
                                 reward_strategy=reward_strategy)
```


While the recommended use case is to plug a trading environment into a trading strategy, you can obviously use the trading environment separately, however you'd like.
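As a rough mental model of how the exchange, feature pipeline, action strategy, and reward strategy cooperate inside a gym-style environment, here is a heavily simplified, hypothetical sketch of a composed step function. None of these classes are TensorTrade's. The method names (next_observation, transform, get_trade, execute, get_reward) and the toy components are assumptions made purely for illustration.

```python
class ListExchange:
    """Hypothetical exchange that replays a fixed list of prices."""

    def __init__(self, prices, cash=100.0):
        self.prices = prices
        self.t = 0
        self.cash, self.units = cash, 0.0

    def next_observation(self):
        return {"close": self.prices[self.t]}

    def execute(self, trade):
        price = self.prices[self.t]
        if trade == "buy" and self.cash > 0:
            self.units, self.cash = self.cash / price, 0.0
        elif trade == "sell" and self.units > 0:
            self.cash, self.units = self.units * price, 0.0
        self.t += 1  # advance the clock after every step

    def net_worth(self):
        return self.cash + self.units * self.prices[self.t]

    def done(self):
        return self.t >= len(self.prices) - 1


class ScaleFeatures:
    """Hypothetical feature pipeline: divides every value by a constant."""

    def __init__(self, scale):
        self.scale = scale

    def transform(self, observation):
        return {k: v / self.scale for k, v in observation.items()}


class HoldBuySell:
    """Hypothetical action strategy: 0 = hold, 1 = buy, 2 = sell."""

    def get_trade(self, action):
        return ("hold", "buy", "sell")[action]


class NetWorthChange:
    """Hypothetical reward strategy: reward is the change in net worth."""

    def get_reward(self, before, after):
        return after - before


class ComposedEnvironment:
    def __init__(self, exchange, feature_pipeline, action_strategy, reward_strategy):
        self.exchange = exchange
        self.feature_pipeline = feature_pipeline
        self.action_strategy = action_strategy
        self.reward_strategy = reward_strategy

    def _observe(self):
        obs = self.exchange.next_observation()
        # The feature pipeline is optional, mirroring the description above.
        return self.feature_pipeline.transform(obs) if self.feature_pipeline else obs

    def step(self, action):
        before = self.exchange.net_worth()
        trade = self.action_strategy.get_trade(action)  # agent action -> trade
        self.exchange.execute(trade)                     # exchange fills the trade
        reward = self.reward_strategy.get_reward(before, self.exchange.net_worth())
        return self._observe(), reward, self.exchange.done(), {"trade": trade}


env = ComposedEnvironment(ListExchange([10.0, 11.0, 12.0]), ScaleFeatures(10.0),
                          HoldBuySell(), NetWorthChange())
observation, reward, done, info = env.step(1)  # buy at 10; the price then moves to 11
```

Swapping any one component, for example a different reward strategy, leaves the other three untouched, which is the point of the modular design.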

## Instrument Exchanges

Instrument exchanges determine the universe of tradable instruments within a trading environment, return observations to the environment on each time step, and execute trades made within the environment. There are two types of instrument exchanges: live and simulated. 

Live exchanges are implementations of InstrumentExchange backed by live pricing data and a live trade execution engine. For example, CCXTExchange is a live exchange, which is capable of returning pricing data and executing trades on hundreds of live cryptocurrency exchanges, such as Binance and Coinbase. 

```python
import ccxt
from tensortrade.exchanges.live import CCXTExchange

coinbase = ccxt.coinbasepro()
exchange = CCXTExchange(exchange=coinbase, base_instrument='USD')
```
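Live exchanges ultimately deal in raw candle data. For example, ccxt's fetch_ohlcv method returns a list of [timestamp_ms, open, high, low, close, volume] rows. The helper below is a hypothetical sketch, not part of TensorTrade, that turns such rows into a pandas DataFrame. It is exercised on made-up candles so it runs without network access.

```python
import pandas as pd

OHLCV_COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]


def ohlcv_to_frame(rows):
    """Convert ccxt-style OHLCV rows into a time-indexed DataFrame."""
    df = pd.DataFrame(rows, columns=OHLCV_COLUMNS)
    # ccxt timestamps are milliseconds since the Unix epoch (UTC).
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
    return df.set_index("timestamp").sort_index()


# Made-up hourly candles; with network access, similar rows could come from
# ccxt.coinbasepro().fetch_ohlcv("BTC/USD", timeframe="1h").
candles = [
    [1_567_296_000_000, 9600.0, 9650.0, 9580.0, 9620.0, 12.5],
    [1_567_299_600_000, 9620.0, 9700.0, 9610.0, 9690.0, 18.1],
]
frame = ohlcv_to_frame(candles)
print(frame)
```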


Simulated exchanges, on the other hand, are implementations of InstrumentExchange backed by simulated pricing data and trade execution. For example, FBMExchange is a simulated exchange, which generates pricing and volume data using fractional Brownian motion (FBM). Since its price is simulated, the trades it executes must be simulated as well. The exchange uses a simple slippage model to simulate price and volume slippage on trades, though like almost everything in TensorTrade, this slippage model can easily be swapped out for something more complex.
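For intuition about what FBMExchange simulates, here is a small, self-contained way to sample fractional Brownian motion with the Cholesky method (exact, but O(n³), so only suitable for short series), then exponentiate it into a strictly positive price path. This is an illustrative sketch, not TensorTrade's implementation. The Hurst exponent, volatility, and starting price are arbitrary assumptions.

```python
import numpy as np


def fractional_brownian_motion(n_steps, hurst, t_max=1.0, seed=None):
    """Sample fBm at n_steps equally spaced times via a Cholesky factorisation."""
    rng = np.random.default_rng(seed)
    t = np.linspace(t_max / n_steps, t_max, n_steps)
    s, u = np.meshgrid(t, t)
    # Covariance of fBm: 0.5 * (s^2H + u^2H - |s - u|^2H)
    cov = 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))
    chol = np.linalg.cholesky(cov)
    path = chol @ rng.standard_normal(n_steps)
    return np.concatenate([[0.0], path])  # fBm starts at 0


def fbm_prices(n_steps, hurst=0.6, volatility=0.5, start_price=10_000.0, seed=None):
    """Turn an fBm path into a strictly positive, hypothetical price series."""
    fbm = fractional_brownian_motion(n_steps, hurst, seed=seed)
    return start_price * np.exp(volatility * fbm)


prices = fbm_prices(200, seed=42)
```

A Hurst exponent above 0.5 produces persistent (trending) paths, below 0.5 produces mean-reverting ones, and exactly 0.5 recovers ordinary Brownian motion.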

Though the FBMExchange generates fake price and volume data using a stochastic model, it is simply an implementation of SimulatedExchange. Under the hood, SimulatedExchange only requires a data_frame of price history to generate its simulations. This data_frame can either be provided by a coded implementation such as FBMExchange, or at runtime such as in the following example.

```python
import pandas as pd
from tensortrade.exchanges.simulated import SimulatedExchange

df = pd.read_csv('./data/btc_ohlcv_1h.csv')
exchange = SimulatedExchange(data_frame=df, base_instrument='USD')
```
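Conceptually, replaying a price history means stepping through the data frame one row at a time. The sketch below is hypothetical: the ReplayExchange name and its methods are not TensorTrade's API, and the open/high/low/close/volume column names are an assumption about the CSV. A tiny in-memory frame stands in for the file so the example runs on its own.

```python
import pandas as pd


class ReplayExchange:
    """Hypothetical simulated exchange that emits one row of price history per step."""

    def __init__(self, data_frame, columns=("open", "high", "low", "close", "volume")):
        self.data_frame = data_frame.loc[:, list(columns)].reset_index(drop=True)
        self.current_step = 0

    def reset(self):
        self.current_step = 0
        return self.next_observation()

    def next_observation(self):
        row = self.data_frame.iloc[self.current_step]
        self.current_step += 1
        return row.to_numpy(dtype=float)

    def has_next_observation(self):
        return self.current_step < len(self.data_frame)


df = pd.DataFrame({
    "open": [100.0, 101.0, 102.5],
    "high": [101.5, 103.0, 103.0],
    "low": [99.5, 100.5, 101.0],
    "close": [101.0, 102.5, 101.5],
    "volume": [12.0, 9.5, 14.2],
})
exchange = ReplayExchange(df)
observations = [exchange.reset()]
while exchange.has_next_observation():
    observations.append(exchange.next_observation())
```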


## Feature Pipelines

Feature pipelines are meant for transforming observations from the environment into meaningful features for an agent to learn from. If a pipeline has been set for a particular trading environment, then observations will be passed through the FeaturePipeline before being output to the agent. For example, a feature pipeline could normalize all price values, make a time series stationary, add a moving average column, and remove an unnecessary column, before the observation is returned to the agent.

Feature pipelines can be initialized with an arbitrary number of transformers, passed as separate arguments. Each transformer is initialized with the set of columns to transform; if no columns are passed, all columns are transformed.
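The pipeline behaviour just described, with transformers applied in order, each to its own columns or to every column when none are given, can be sketched in a few lines of pandas. SimpleMinMax and SimplePipeline are hypothetical stand-ins, not TensorTrade's MinMaxNormalizer or FeaturePipeline.

```python
import pandas as pd


class SimpleMinMax:
    """Hypothetical transformer: rescales selected columns to [0, 1]."""

    def __init__(self, columns=None):
        self.columns = columns  # None means "transform every column"

    def transform(self, frame):
        columns = self.columns if self.columns is not None else list(frame.columns)
        out = frame.copy()
        lo, hi = frame[columns].min(), frame[columns].max()
        # Note: a constant column would divide by zero here.
        out[columns] = (frame[columns] - lo) / (hi - lo)
        return out


class SimplePipeline:
    """Hypothetical pipeline: applies each transformer in the order given."""

    def __init__(self, *transformers):
        self.transformers = transformers

    def transform(self, frame):
        for transformer in self.transformers:
            frame = transformer.transform(frame)
        return frame


frame = pd.DataFrame({"close": [10.0, 15.0, 20.0], "volume": [3.0, 1.0, 2.0]})
prices_only = SimplePipeline(SimpleMinMax(["close"])).transform(frame)
everything = SimplePipeline(SimpleMinMax()).transform(frame)
```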

```python
from tensortrade.features import FeaturePipeline
from tensortrade.features.scalers import MinMaxNormalizer
from tensortrade.features.stationarity import FractionalDifference

normalize_price = MinMaxNormalizer(["open", "high", "low", "close"])
difference_all = FractionalDifference(difference_order=0.6)
feature_pipeline = FeaturePipeline(normalize_price, difference_all)
```



This feature pipeline normalizes the price values between 0 and 1, and then makes the entire time series stationary using fractional differencing, which subtracts a weighted sum of past values from each value.
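Fractional differencing generalizes the ordinary first difference. Instead of subtracting only the previous value, it subtracts a decaying, weighted sum of past values, with weights given by the recurrence w_0 = 1, w_k = -w_(k-1) * (d - k + 1) / k. The numpy sketch below uses a fixed window. The window length is an assumption, and TensorTrade's FractionalDifference may handle it differently. With d = 1 the weights collapse to [1, -1, 0, ...], so the result is the plain first difference.

```python
import numpy as np


def frac_diff_weights(d, window):
    """Weights w_k for k = 0..window-1 of the binomial expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, window):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return np.array(weights)


def frac_diff(series, d, window=10):
    """Fixed-window fractional difference; the first window-1 values are NaN."""
    series = np.asarray(series, dtype=float)
    weights = frac_diff_weights(d, window)
    out = np.full(series.shape, np.nan)
    for t in range(window - 1, len(series)):
        # weights[0] multiplies the current value, weights[k] the value k steps back
        out[t] = weights @ series[t - window + 1 : t + 1][::-1]
    return out


prices = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
stationary = frac_diff(prices, d=0.6, window=4)
```

Orders between 0 and 1 trade off stationarity against memory: smaller d keeps more of the original series' long-range structure.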

## Action Strategies

Action strategies define the action space of the environment and convert an agent's actions into executable trades. For example, if we were using a discrete action space of 3 actions (0 = hold, 1 = buy, 2 = sell), our learning agent does not need to know that returning an action of 1 is equivalent to buying an instrument. Rather, our agent needs to know the reward for returning an action of 1 in specific circumstances, and can leave the implementation details of converting actions to trades to the ActionStrategy.

```python
from tensortrade.actions import DiscreteActionStrategy

action_strategy = DiscreteActionStrategy(n_actions=20,
                                         instrument_symbol='BTC')
```


This discrete action strategy uses 20 discrete actions, which equates to 4 discrete amounts for each of the 5 trade types (market buy/sell, limit buy/sell, and hold). For example, actions 0, 5, 10, and 15 all mean hold; 1 = market buy 25%, 2 = market sell 25%, 3 = limit buy 25%, 4 = limit sell 25%, 6 = market buy 50%, 7 = market sell 50%, and so on.
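The mapping in that example follows a simple pattern: action % 5 picks the trade type and action // 5 picks the amount. The decoder below is a sketch that reproduces the listed examples. The function name and tuple output are hypothetical, and the ordering of trade types is inferred from the example rather than taken from TensorTrade's source.

```python
TRADE_TYPES = ("hold", "market buy", "market sell", "limit buy", "limit sell")


def decode_action(action, n_actions=20):
    """Map a discrete action index to a (trade type, fraction of balance) pair."""
    if not 0 <= action < n_actions:
        raise ValueError(f"action must be in [0, {n_actions}), got {action}")
    n_amounts = n_actions // len(TRADE_TYPES)            # 20 // 5 = 4 amount levels
    trade_type = TRADE_TYPES[action % len(TRADE_TYPES)]  # which of the 5 trade types
    if trade_type == "hold":
        return trade_type, 0.0                            # amount is irrelevant for hold
    amount = (action // len(TRADE_TYPES) + 1) / n_amounts  # 0.25, 0.5, 0.75 or 1.0
    return trade_type, amount


for action in (0, 1, 2, 3, 4, 6, 7, 15, 19):
    print(action, decode_action(action))
```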

## Reward Strategies

Reward strategies receive the trade taken at each time step and return a float, corresponding to the benefit of that specific action. For example, if the action taken this step was a sell that resulted in positive profits, our RewardStrategy could return a positive number to encourage more trades like this. On the other hand, if the action was a sell that resulted in a loss, the strategy could return a negative reward to teach the agent not to make similar actions in the future. A version of this example algorithm is implemented in SimpleProfitStrategy, however more complex strategies can obviously be used instead.

```python
from tensortrade.rewards import SimpleProfitStrategy

reward_strategy = SimpleProfitStrategy()
```

The simple profit strategy returns a reward of -1 for not holding a trade, 1 for holding a trade, 2 for purchasing an instrument, and a value corresponding to the (positive/negative) profit earned by a trade if an instrument was sold.
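Read literally, that description fits in a few lines. The sketch below encodes the rules exactly as stated in the previous paragraph. The function name, argument names, and trade labels are hypothetical, and the real SimpleProfitStrategy may differ in details such as how profit is measured.

```python
def simple_profit_reward(trade_type, holding_position, profit=0.0):
    """Hypothetical reward following the rules described for SimpleProfitStrategy.

    trade_type: "buy", "sell", or "hold" for this step.
    holding_position: whether an instrument is currently held.
    profit: realised (positive or negative) profit when trade_type is "sell".
    """
    if trade_type == "sell":
        return float(profit)  # reward equals the realised profit or loss
    if trade_type == "buy":
        return 2.0            # reward for purchasing an instrument
    return 1.0 if holding_position else -1.0  # holding is rewarded, idling is penalised
```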

## Learning Agents

Up until this point, we haven't seen the "deep" part of the deep reinforcement learning framework. This is where learning agents come in. Learning agents are where the math (read: magic) happens.

At each time step, the agent takes the observation from the environment as input, runs it through its underlying model (a neural network most of the time), and outputs the action to take. For example, the observation might be the previous open, high, low, and close price from the exchange. The learning model would take these values as input and output a value corresponding to the action to take, such as buy, sell, or hold.

It is important to remember the learning model has no intuition of the prices or trades being represented by these values. Rather, the model is simply learning which values to output for specific input values or sequences of input values, to earn the highest reward.
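To show what "runs it through its underlying model" means, here is a forward pass through a small numpy network with the same hidden layers as the network_spec used later in this article (dense 64, then dense 32, both tanh), followed by a 20-way output matching DiscreteActionStrategy(n_actions=20). The weights are random and the softmax readout is an assumption for illustration. Training is what would make these outputs useful.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYER_SIZES = [4, 64, 32, 20]  # OHLC input -> dense 64 -> dense 32 -> 20 action scores

# Randomly initialised weights; training would adjust these to earn more reward.
weights = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
           for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
biases = [np.zeros(n_out) for n_out in LAYER_SIZES[1:]]


def policy_scores(observation):
    """Forward pass: tanh hidden layers, linear output layer."""
    x = np.asarray(observation, dtype=float)
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ w + b)
    return x @ weights[-1] + biases[-1]


def softmax(scores):
    z = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return z / z.sum()


observation = [0.42, 0.47, 0.40, 0.45]  # e.g. normalised open, high, low, close
probs = softmax(policy_scores(observation))
action = int(rng.choice(len(probs), p=probs))  # sample, as a stochastic policy would
greedy_action = int(np.argmax(probs))          # or take the most likely action
```

The network only ever sees four numbers and produces twenty; the meaning of "buy" or "sell" lives entirely in the action strategy.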

In this example, we will be using the Tensorforce library to provide learning agents to our trading strategy, although the TensorTrade framework is compatible with many reinforcement learning libraries such as Ray's RLlib, OpenAI's Baselines (or the much better maintained Stable Baselines), Intel's Coach, or anything from the TensorFlow line such as TF Agents.

It is possible that custom learning agents will be added to this framework in the future, though it will always be a goal of the framework to be interoperable with as many existing reinforcement learning libraries as possible, since there is so much concurrent growth in the space.

```python
from tensorforce.agents import Agent

agent = Agent.from_spec(spec=agent_spec,
                        kwargs=dict(states=environment.states,
                                    actions=environment.actions,
                                    network=network_spec))
```


Note: This example uses the tensorforce library to provide learning agents. This is not required to use TensorTrade, though it is required for this tutorial.

# Putting it All Together

Now that we know about each component that makes up a TradingStrategy, let's build and evaluate one.

For a quick recap, a TradingStrategy is made up of a TradingEnvironment and a learning agent. A TradingEnvironment is a gym environment that takes an InstrumentExchange, an ActionStrategy, a RewardStrategy, and an optional FeaturePipeline, and returns observations and rewards that the learning agent can be trained and evaluated on.

## Creating an Environment

The first step is to create a TradingEnvironment using the components outlined above.

```python
from tensortrade.exchanges.simulated import FBMExchange
from tensortrade.features.scalers import MinMaxNormalizer
from tensortrade.features.stationarity import FractionalDifference
from tensortrade.features import FeaturePipeline
from tensortrade.rewards import SimpleProfitStrategy
from tensortrade.actions import DiscreteActionStrategy
from tensortrade.environments import TradingEnvironment

exchange = FBMExchange(base_instrument='BTC', timeframe='1h')
normalize_price = MinMaxNormalizer(["open", "high", "low", "close"])
difference = FractionalDifference(difference_order=0.6)
feature_pipeline = FeaturePipeline(normalize_price, difference)
reward_strategy = SimpleProfitStrategy()
action_strategy = DiscreteActionStrategy(n_actions=20, instrument_symbol='ETH/BTC')

environment = TradingEnvironment(exchange=exchange,
                                 feature_pipeline=feature_pipeline,
                                 action_strategy=action_strategy,
                                 reward_strategy=reward_strategy)
```


Simple enough! Now environment is a gym environment that can be used by any compatible trading strategy or learning agent.

## Defining the Agent

Now that the environment is set up, it's time to create our learning agent. Again, we will be using Tensorforce for this, but feel free to drop in any other reinforcement learning agent here.

Since we are using TensorforceTradingStrategy, all we need to do is provide an agent specification and a network specification for the underlying neural network to be trained. For this example, we will be using a simple proximal policy optimization (PPO) agent and a simple dense network.
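The two reinforcement-learning numbers in the agent spec below are the discount (0.99) and likelihood_ratio_clipping (0.2). The discount weights future rewards, and the clipping value plays the role of epsilon in PPO's clipped surrogate objective, which limits how far a single update can move the policy. The numpy sketch below computes both quantities for illustration. It follows the standard PPO formula rather than Tensorforce's internal implementation.

```python
import numpy as np


def discounted_returns(rewards, discount=0.99):
    """G_t = r_t + discount * G_(t+1), computed backwards over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip=0.2):
    """Mean of min(r * A, clip(r, 1 - clip, 1 + clip) * A), with r = pi_new / pi_old."""
    ratio = np.exp(np.asarray(new_log_probs) - np.asarray(old_log_probs))
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))  # the optimizer maximises this
```

Taking the minimum means the policy gains nothing from pushing the probability ratio beyond 1 ± 0.2 in the direction the advantage favours, which keeps updates conservative.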

For more examples of agent and network specifications, see the Tensorforce GitHub repository.

```python
agent_spec = {
    "type": "ppo_agent",
    "step_optimizer": {
        "type": "adam",
        "learning_rate": 1e-4
    },
    "discount": 0.99,
    "likelihood_ratio_clipping": 0.2,
}

network_spec = [
    dict(type='dense', size=64, activation="tanh"),
    dict(type='dense', size=32, activation="tanh")
]
```

## Training a Strategy

Creating our trading strategy is as simple as plugging in the environment, the agent specification, and the network specification.

```python
from tensortrade.strategies import TensorforceTradingStrategy

strategy = TensorforceTradingStrategy(environment=environment,
                                      agent_spec=agent_spec,
                                      network_spec=network_spec)
```


Then to train the strategy (i.e. train the agent on the current environment), all we need to do is pass should_train=True to strategy.run().

```python
performance = strategy.run(steps=100000, should_train=True)
```


And voilà! Three hours later, you will see how your agent has done. If this feedback loop is a bit slow for you, you can pass a callback function to the run function, which will be called at the end of each episode. The callback receives a data frame containing the agent's progress during that episode and must return a bool: if it returns True, the agent will continue training; otherwise, the agent will stop and return its performance.
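A callback following that contract (receive the episode's data frame, return True to keep training) might look like the hypothetical sketch below. The net_worth column name, the target threshold, and the episode cap are assumptions. Check which columns your version of TensorTrade actually reports and which keyword argument run uses for the callback before relying on it.

```python
import pandas as pd


def make_stop_at_target(target_net_worth, max_episodes=100):
    """Build a callback that stops training at a net-worth target or an episode cap."""
    episodes_seen = 0

    def episode_callback(episode_frame: pd.DataFrame) -> bool:
        nonlocal episodes_seen
        episodes_seen += 1
        final_net_worth = episode_frame["net_worth"].iloc[-1]  # assumed column name
        print(f"episode {episodes_seen}: final net worth {final_net_worth:.2f}")
        # True continues training; False stops and returns the performance.
        return bool(final_net_worth < target_net_worth) and episodes_seen < max_episodes

    return episode_callback
```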

TODO: Example Performance

## Saving and Restoring

All trading strategies are capable of saving their agent to a file for later restoring. The environment is not saved, as it does not have any state that we care about preserving. To save our TensorforceTradingStrategy to a file, we just need to provide the file path to our strategy.

```python
strategy.save_agent(path="../agents/ppo_btc_1h")
```


This specific strategy saves multiple files, including a directory of models to the path provided.

To restore the agent from the file, we first need to instantiate our strategy, before calling restore_agent.

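```python
from tensortrade.strategies import TensorforceTradingStrategy

strategy = TensorforceTradingStrategy(environment=environment)
# Use the same path the agent was saved to above
strategy.restore_agent(path="../agents/ppo_btc_1h")
```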


Our strategy is now restored back to its previous state, and ready to be used again. Let's see how it does.

## Strategy Evaluation

To evaluate our strategy's performance on unseen data, we will need to run it on a new environment.

```python
import pandas as pd
from tensortrade.environments import TradingEnvironment
from tensortrade.exchanges.simulated import SimulatedExchange

df = pd.read_csv('./btc_ohlcv_1h.csv')
exchange = SimulatedExchange(data_frame=df, base_instrument='BTC')
environment = TradingEnvironment(exchange=exchange,
                                 feature_pipeline=feature_pipeline,
                                 action_strategy=action_strategy,
                                 reward_strategy=reward_strategy)

strategy.environment = environment

test_performance = strategy.run(episodes=1, should_train=False)
```
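"Unseen" matters a lot with time series: shuffling rows before splitting would leak future prices into training. A simple, commonly used alternative is a chronological hold-out, sketched below with pandas. The 80/20 ratio and the helper name are arbitrary assumptions, not part of TensorTrade.

```python
import pandas as pd


def chronological_split(df, train_fraction=0.8):
    """Split a time-ordered frame into earlier (train) and later (test) rows, no shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split = int(len(df) * train_fraction)
    return df.iloc[:split].copy(), df.iloc[split:].copy()


df = pd.DataFrame({"close": [float(price) for price in range(100, 110)]})
train_df, test_df = chronological_split(df)
```

The training environment could then wrap SimulatedExchange(data_frame=train_df, ...) and the evaluation environment the later test_df rows, so the agent is always evaluated on prices that come after everything it trained on.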


## Tuning a Strategy

TODO:

## Live Trading

TODO:

```python
import ccxt
from tensortrade.environments import TradingEnvironment
from tensortrade.exchanges.live import CCXTExchange
from tensortrade.strategies import TensorforceTradingStrategy

coinbase = ccxt.coinbasepro(...)
exchange = CCXTExchange(exchange=coinbase,
                        base_instrument='USD',
                        timeframe='1h')

environment = TradingEnvironment(exchange=exchange,
                                 feature_pipeline=feature_pipeline,
                                 action_strategy=action_strategy,
                                 reward_strategy=reward_strategy)

# Restore the previously trained agent and attach it to the live environment
strategy = TensorforceTradingStrategy(environment=environment)
strategy.restore_agent(path="../agents/ppo_btc_1h")


def trading_cb(episode_performance):
    # Return True to keep trading, False to stop
    return True


test_perf = strategy.evaluate(steps=0, callback=trading_cb)
```


# Final Thoughts

TensorTrade is a powerful framework capable of building highly modular, high performance trading systems. It is fairly simple and easy to experiment with new trading and investment strategies, while allowing you to leverage components from one strategy in another. But don't take my word for it, create a strategy of your own and start teaching your robots to take over the world!

While this tutorial should be enough to get you started, there is still quite a lot more to learn if you want to create a profitable trading strategy. I encourage you to head over to the GitHub repository and dive into the codebase, or take a look at our documentation at tensortrade.org. There is also quite an active Discord community with over 750 total members, so if you have questions, feedback, or feature requests, feel free to drop them there!

I've gotten the project to a highly usable state. However, my time is limited, and I believe there are many of you out there who could make valuable contributions to the open source codebase. So if you are a developer or data scientist with an interest in building state-of-the-art trading systems, I'd love to see you open a pull request, even if it's just a simple test case!

Others have asked how they can contribute to the project without writing code. There are currently two ways that you can do that. The first is to help write documentation for the existing code, which you can get paid to do. If you'd like to do this, please talk to me on Discord. The other way to contribute is to sponsor this project either on Patreon or with BTC/ETH donations. Your support means a lot to me and allows me to spend more of my free time working on developing the framework and writing articles like this.

Thanks for reading! As always, all of the code for this tutorial can be found on my GitHub. Leave a comment below if you have any questions or feedback, I'd love to hear from you! I can also be reached on Twitter at @notadamking.

## References

### Introduction to Deep Reinforcement Learning

https://medium.com/@jonathan_hui/rl-introduction-to-deep-reinforcement-learning-35c25e04c199

### Policy Gradient Algorithms

https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#reinforce