## Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

# Quote Rule and Lee-Ready Algorithm

In the previous unit, we discussed how to use the quote rule and the Lee-Ready algorithm to calculate the order flow. The quote rule uses the bid and ask prices, the trade price and the trade volume to calculate the order flow. However, the quote rule has a limitation that the Lee-Ready algorithm overcomes.

In this notebook, we will calculate the order flow using the quote rule and see where it falls short. We will then use the Lee-Ready algorithm to calculate the order flow and overcome that limitation.

The notebook is structured as follows:
1. [Read the Data](#data)
1. [Calculate Midpoint and Trade Direction](#midpoint)
1. [Calculate Order Flow with Quote Rule](#orderflow)
1. [Lee-Ready Algorithm](#leeready)
1. [Calculate the Order Flow](#lrorderflow)



## Import Libraries

In [1]:
# For data manipulation
import pandas as pd
import numpy as np

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

<a id='data'></a>
## Read the Data
Import the pickle file `tick_data_es500_2013_10_02.bz2` as the dataframe `tick_data`. This pickle file contains the tick data and the best bid and ask prices of E-mini S&P 500 futures for the date `2013-10-02`. The dataset has the columns `bid`, `ask`, `trade_price` and `trade_size`, with datetime in milliseconds as the index.

**Note:** _Python version 3.9.5 and pandas version 1.4.4 were used to create this pickle file. To avoid issues reading this pickle file, ensure that your Python and pandas versions are equal to or higher than the versions mentioned._

In [2]:
# Read the 'tick_data_es500_2013_10_02.bz2' pickle file as 'tick_data' dataframe
tick_data = pd.read_pickle('../data_modules/tick_data_es500_2013_10_02.bz2')

# Print first 5 trades of the 'tick_data' dataframe
tick_data.head()

| | bid | ask | trade_price | trade_size |
|---|---|---|---|---|
| 2013-10-02 00:00:00.604800 | 1428.5 | 1428.75 | 1428.75 | 1.0 |
| 2013-10-02 00:00:10.972800 | 1428.5 | 1428.75 | 1428.75 | 25.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 17.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 11.0 |
| 2013-10-02 00:00:19.526400 | 1428.75 | 1429.0 | 1429.0 | 28.0 |


<a id='midpoint'></a>
## Calculate Midpoint and Trade Direction
The quote rule uses the midpoint to calculate the order flow. The midpoint is the average of the bid price and ask price.

In [3]:
# Storing the average of bid, ask price in the column 'midpoint'
tick_data['midpoint'] = (tick_data['bid'] + tick_data['ask']) / 2

According to the quote rule, the order flow is positive if the trade price is above the midpoint. Similarly, the order flow is negative if the trade price is below the midpoint.

Create a column `trade_direction`. For each trade, assign 1 as the trade direction if the trade price is above the midpoint and assign -1 as the trade direction if the trade price is below the midpoint.
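The rule can also be written as a formula. The symbols are introduced here only for illustration and are not part of the course material: $b_t$ and $a_t$ are the bid and ask prices, $p_t$ is the trade price, $m_t$ is the midpoint and $d_t$ is the trade direction.

```latex
m_t = \frac{b_t + a_t}{2},
\qquad
d_t =
\begin{cases}
+1 & \text{if } p_t > m_t \\
-1 & \text{if } p_t < m_t \\
\phantom{+}0 & \text{if } p_t = m_t
\end{cases}
```

For the first trade in `tick_data`, $m_t = (1428.5 + 1428.75) / 2 = 1428.625$ and $p_t = 1428.75 > m_t$, so $d_t = +1$.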

In [4]:
# Create a column 'trade_direction' with initial value of 0 for each trade
tick_data['trade_direction'] = 0

# Assign values to the 'trade_direction' by comparing 'midpoint' with 'trade_price'
tick_data.loc[tick_data['midpoint'] <
              tick_data['trade_price'], 'trade_direction'] = 1
tick_data.loc[tick_data['midpoint'] >
              tick_data['trade_price'], 'trade_direction'] = -1

# Print first 5 trades of the 'tick_data' dataframe
tick_data.head()

| | bid | ask | trade_price | trade_size | midpoint | trade_direction |
|---|---|---|---|---|---|---|
| 2013-10-02 00:00:00.604800 | 1428.5 | 1428.75 | 1428.75 | 1.0 | 1428.625 | 1.0 |
| 2013-10-02 00:00:10.972800 | 1428.5 | 1428.75 | 1428.75 | 25.0 | 1428.625 | 1.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 17.0 | 1428.625 | 1.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 11.0 | 1428.625 | 1.0 |
| 2013-10-02 00:00:19.526400 | 1428.75 | 1429.0 | 1429.0 | 28.0 | 1428.875 | 1.0 |


<a id='orderflow'></a>
## Calculate Order Flow with Quote Rule
To calculate the order flow of each trade, we will multiply the column `trade_size` with the column `trade_direction`. Finally, the result is stored in a new column called `order_flow` in the `tick_data` dataframe.

In [5]:
# Calculate the order flow using quote rule
tick_data['order_flow'] = tick_data['trade_direction'] * \
    tick_data['trade_size']

# Print first 5 trades of the 'tick_data' dataframe
tick_data.head()

| | bid | ask | trade_price | trade_size | midpoint | trade_direction | order_flow |
|---|---|---|---|---|---|---|---|
| 2013-10-02 00:00:00.604800 | 1428.5 | 1428.75 | 1428.75 | 1.0 | 1428.625 | 1.0 | 1.0 |
| 2013-10-02 00:00:10.972800 | 1428.5 | 1428.75 | 1428.75 | 25.0 | 1428.625 | 1.0 | 25.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 17.0 | 1428.625 | 1.0 | 17.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 11.0 | 1428.625 | 1.0 | 11.0 |
| 2013-10-02 00:00:19.526400 | 1428.75 | 1429.0 | 1429.0 | 28.0 | 1428.875 | 1.0 | 28.0 |


So far, we have classified trades whose price is above or below the midpoint. But what if a trade occurs exactly at the midpoint, i.e. its trade price is equal to the midpoint? In this case, the `trade_direction` stays 0, so the `order_flow` is also 0. The trade is left unclassified and its order flow is undefined.
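To make the midpoint case concrete, here is a minimal sketch on three made-up trades (the prices and sizes are hypothetical, not taken from `tick_data`). It uses `np.sign` as a compact stand-in for the two `.loc` assignments used in this notebook; the third trade sits exactly at the midpoint and ends up with an order flow of 0.

```python
import numpy as np
import pandas as pd

# Three hypothetical trades, all quoted at bid 100.00 / ask 100.50
toy = pd.DataFrame({
    'bid': [100.00, 100.00, 100.00],
    'ask': [100.50, 100.50, 100.50],
    'trade_price': [100.50, 100.00, 100.25],
    'trade_size': [10.0, 20.0, 30.0],
})

# Midpoint of the bid and ask prices
toy['midpoint'] = (toy['bid'] + toy['ask']) / 2

# np.sign gives +1 above the midpoint, -1 below it and 0 at the midpoint
toy['trade_direction'] = np.sign(toy['trade_price'] - toy['midpoint'])

# Signed volume; the trade at the midpoint contributes 0
toy['order_flow'] = toy['trade_direction'] * toy['trade_size']

print(toy[['trade_price', 'midpoint', 'trade_direction', 'order_flow']])
```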

Let's see how many trades in `tick_data` have a trade price equal to the midpoint and therefore an undefined order flow.

In [6]:
# Total number of trades
print(f'Total number of trades: {len(tick_data)}')

# Number of trades with undefined order flow
undefined_trades = len(tick_data[tick_data['order_flow'] == 0])
print(f'Total number of trades with undefined order flow: {undefined_trades}')

Total number of trades: 77390
Total number of trades with undefined order flow: 158


Out of 77390 trades, 158 have an undefined order flow. The quote rule cannot calculate the order flow when the trade price is equal to the midpoint. This is the limitation of using the quote rule to calculate the order flow.

We can overcome this limitation by using the Lee-Ready algorithm.

<a id='leeready'></a>
## Lee-Ready Algorithm
The Lee-Ready algorithm uses both the tick rule and the quote rule to calculate the order flow. <br>
It uses:
* Quote rule for trades away from the midpoint
* Tick rule for trades at the midpoint
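The two bullets above can be sketched as a single function. This is a minimal illustration on made-up quotes, not the course's implementation: `lee_ready_direction` is a hypothetical helper name, and the toy prices are assumptions chosen so that the last two trades fall exactly on the midpoint.

```python
import numpy as np
import pandas as pd


def lee_ready_direction(df):
    """Hypothetical helper: sign each trade with the quote rule,
    falling back to the tick rule for trades at the midpoint."""
    midpoint = (df['bid'] + df['ask']) / 2

    # Quote rule: +1 above the midpoint, -1 below it, 0 at the midpoint
    quote_sign = np.sign(df['trade_price'] - midpoint)

    # Tick rule: sign of the price change from the previous trade,
    # with zero ticks inheriting the last non-zero direction
    tick_sign = np.sign(df['trade_price'].diff()).replace(0, np.nan).ffill()

    # Keep the quote-rule sign where it is defined, else use the tick rule
    return quote_sign.where(quote_sign != 0, tick_sign)


# Hypothetical trades quoted at bid 100.00 / ask 100.50 (midpoint 100.25)
toy = pd.DataFrame({
    'bid': [100.00, 100.00, 100.00, 100.00],
    'ask': [100.50, 100.50, 100.50, 100.50],
    'trade_price': [100.00, 100.50, 100.25, 100.25],
})
toy['direction'] = lee_ready_direction(toy)
print(toy)
```

The third trade is at the midpoint but follows a price drop from 100.50 to 100.25, so the tick rule marks it as a sell. The fourth trade is at the midpoint with no price change, so it inherits the previous direction.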

Let's import the pickle file `tick_data_es500_2013_10_02.bz2` again as the dataframe `tick_data`, so that we start from the raw data, and calculate the order flow using the Lee-Ready algorithm.

**Note:** _Python version 3.9.5 and pandas version 1.4.4 were used to create this pickle file. To avoid issues reading this pickle file, ensure that your Python and pandas versions are equal to or higher than the versions mentioned._

In [7]:
# Read the 'tick_data_es500_2013_10_02.bz2' pickle file as 'tick_data' dataframe
tick_data = pd.read_pickle('../data_modules/tick_data_es500_2013_10_02.bz2')

# Print first 5 trades of the 'tick_data' dataframe
tick_data.head()

| | bid | ask | trade_price | trade_size |
|---|---|---|---|---|
| 2013-10-02 00:00:00.604800 | 1428.5 | 1428.75 | 1428.75 | 1.0 |
| 2013-10-02 00:00:10.972800 | 1428.5 | 1428.75 | 1428.75 | 25.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 17.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 11.0 |
| 2013-10-02 00:00:19.526400 | 1428.75 | 1429.0 | 1429.0 | 28.0 |


Since the Lee-Ready algorithm uses both the tick rule and the quote rule, let's create the data required by each of them.

For the tick rule, we need the tick direction of each trade.
* Calculate the tick direction as the sign of the change in trade price from the previous trade and store it in the column `tick_direction`.
* Exclude the first trade, since it has no previous trade to compare against.
* Use the forward-fill (`ffill`) method to replace each 0 in the `tick_direction` column with the previous non-zero tick direction value.

In [8]:
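# Store the tick direction of each trade in the column 'tick_direction'
tick_data['tick_direction'] = np.sign(tick_data['trade_price'].diff())

# Exclude the first trade from the order flow calculations
# (copy so that later column assignments do not act on a slice)
tick_data = tick_data.iloc[1:].copy()

# Replace all 0 values in the 'tick_direction' column
# with the previous non-zero value in the same column.
# The 'method' argument of replace() is deprecated in recent pandas
# releases, so mark the zeros as missing, forward-fill them, and
# leave a leading 0 (which has nothing to inherit from) as 0
tick_data['tick_direction'] = tick_data['tick_direction'].replace(
    0, np.nan).ffill().fillna(0)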

# Print first 5 trades of the 'tick_data' dataframe
tick_data.head()

| | bid | ask | trade_price | trade_size | tick_direction |
|---|---|---|---|---|---|
| 2013-10-02 00:00:10.972800 | 1428.5 | 1428.75 | 1428.75 | 25.0 | 0.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 17.0 | 1.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 11.0 | 1.0 |
| 2013-10-02 00:00:19.526400 | 1428.75 | 1429.0 | 1429.0 | 28.0 | 1.0 |
| 2013-10-02 00:00:19.612800 | 1428.75 | 1429.0 | 1429.0 | 3.0 | 1.0 |


The tick direction of the first remaining trade is still 0, since there is no earlier non-zero tick direction to carry forward. For an accurate order flow calculation using the tick rule, we will remove this trade as well, since its tick direction is undefined.
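The leading zero is easy to reproduce on a small example. The sketch below uses a made-up price series (an assumption, not data from `tick_data`) and fills zeros with `replace(0, np.nan)` followed by `ffill()`, which is one way to carry the last non-zero direction forward.

```python
import numpy as np
import pandas as pd

# Hypothetical trade prices; the second trade repeats the first price
prices = pd.Series([100.00, 100.00, 100.25, 100.25, 100.00, 100.00])

# Sign of the price change, dropping the first trade (its change is NaN)
tick = np.sign(prices.diff()).iloc[1:]

# Carry the last non-zero direction forward; the leading zero has
# nothing before it to inherit from, so it is left at 0
filled = tick.replace(0, np.nan).ffill().fillna(0)

print(pd.DataFrame({'raw': tick, 'filled': filled}))
```

Only the first value of `filled` remains 0, which is why that single trade is dropped before the order flow is calculated.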

In [9]:
# Exclude the first trade from the order flow calculations since its tick direction is undefined
# (copy so that later column assignments do not act on a slice)
tick_data = tick_data.iloc[1:].copy()

For the quote rule, we need the midpoint and the trade direction of each trade.
* Let's calculate the midpoint of each trade from the bid and ask prices and store it in the column `midpoint`.
* Similarly, we will calculate the trade direction by comparing the trade price with the midpoint and store it in the column `trade_direction`.

In [10]:
# Storing the average of bid, ask price in the column 'midpoint'
tick_data['midpoint'] = (tick_data['bid'] + tick_data['ask']) / 2

# Create a column 'trade_direction' with initial value of 0 for each trade
tick_data['trade_direction'] = 0

# Assign values to the 'trade_direction' by comparing 'midpoint' with 'trade_price'
tick_data.loc[tick_data['midpoint'] <
              tick_data['trade_price'], 'trade_direction'] = 1
tick_data.loc[tick_data['midpoint'] >
              tick_data['trade_price'], 'trade_direction'] = -1

# Print first 5 trades of the 'tick_data' dataframe
tick_data.head()

| | bid | ask | trade_price | trade_size | tick_direction | midpoint | trade_direction |
|---|---|---|---|---|---|---|---|
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 17.0 | 1.0 | 1428.625 | 1.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 11.0 | 1.0 | 1428.625 | 1.0 |
| 2013-10-02 00:00:19.526400 | 1428.75 | 1429.0 | 1429.0 | 28.0 | 1.0 | 1428.875 | 1.0 |
| 2013-10-02 00:00:19.612800 | 1428.75 | 1429.0 | 1429.0 | 3.0 | 1.0 | 1428.875 | 1.0 |
| 2013-10-02 00:00:24.019199 | 1428.75 | 1429.0 | 1429.0 | 5.0 | 1.0 | 1428.875 | 1.0 |


<a id='lrorderflow'></a>
## Calculate the Order Flow

To calculate the order flow using the Lee-Ready algorithm, we first check whether the `trade_direction` of each trade is non-zero.
* If the `trade_direction` of a trade is non-zero, the quote rule is applied and `order_flow` is the product of `trade_direction` and `trade_size`.

* If the `trade_direction` of a trade is zero, the tick rule is applied and `order_flow` is the product of `tick_direction` and `trade_size`.
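The two cases can be summarised in one expression. The symbols are introduced here only for illustration: $d_t$ stands for `trade_direction`, $\tau_t$ for `tick_direction`, $v_t$ for `trade_size` and $OF_t$ for `order_flow`.

```latex
OF_t =
\begin{cases}
d_t \, v_t & \text{if } d_t \neq 0 \quad \text{(quote rule)} \\
\tau_t \, v_t & \text{if } d_t = 0 \quad \text{(tick rule)}
\end{cases}
```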

Use the `where` function from the NumPy library to calculate the `order_flow`.

In [11]:
# Calculate the order flow using the Lee-Ready algorithm:
# quote rule where 'trade_direction' is non-zero, tick rule otherwise
tick_data['order_flow'] = np.where(
    tick_data['trade_direction'] != 0,
    tick_data['trade_direction'] * tick_data['trade_size'],
    tick_data['tick_direction'] * tick_data['trade_size'])

# Print first 5 trades of the 'tick_data' dataframe
tick_data.head()

| | bid | ask | trade_price | trade_size | tick_direction | midpoint | trade_direction | order_flow |
|---|---|---|---|---|---|---|---|---|
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 17.0 | 1.0 | 1428.625 | 1.0 | 17.0 |
| 2013-10-02 00:00:19.439999 | 1428.5 | 1428.75 | 1429.0 | 11.0 | 1.0 | 1428.625 | 1.0 | 11.0 |
| 2013-10-02 00:00:19.526400 | 1428.75 | 1429.0 | 1429.0 | 28.0 | 1.0 | 1428.875 | 1.0 | 28.0 |
| 2013-10-02 00:00:19.612800 | 1428.75 | 1429.0 | 1429.0 | 3.0 | 1.0 | 1428.875 | 1.0 | 3.0 |
| 2013-10-02 00:00:24.019199 | 1428.75 | 1429.0 | 1429.0 | 5.0 | 1.0 | 1428.875 | 1.0 | 5.0 |


In [12]:
# Total number of trades
print(f'Total number of trades: {len(tick_data)}')

# Number of trades with undefined order flow
undefined_trades = len(tick_data[tick_data['order_flow'] == 0])
print(f'Total number of trades with undefined order flow: {undefined_trades}')

Total number of trades: 77388
Total number of trades with undefined order flow: 0


The Lee-Ready algorithm calculated the order flow of all trades in the `tick_data` dataframe and overcame the limitation of the quote rule.

## Conclusion

So far, we have used the bid and ask prices, the trade price and the trade volume to calculate the order flow. However, the order flow can also be calculated from bar data alone using the Bulk Volume Classification (BVC) method. In the next unit, you will learn the BVC method for order flow calculation.