Final Project: Enhanced Sign Trading Strategy on High Frequency Data

James Chen (12449658), Will Duckett (12446921), Eduardo Scheffer (12449668), Coby Tran (12449760)

### Abstract
This project explores a Sign Trading Strategy that leverages Order Flow Imbalance (OFI) to predict short-term price movements in high-frequency markets. We classify trades into BUY and SELL transactions based on whether they match the standing bid or offer price and analyze their clustering behavior.

By measuring order flow imbalances, we identify periods of persistent buying or selling pressure and use them as trade signals. Our approach capitalizes on the fact that trade events tend to cluster: past BUY trades make further BUY trades more likely, and the same holds for SELL trades.

### Introduction
The volume of Tick-By-Tick (TBT) data has surged in recent years, with many contracts exceeding 10 million ticks per trading day. This increase provides deeper insight into order flow and reveals systematic trade clustering effects that influence short-term price movements. One key observation is that trades are not independent events; instead, BUY and SELL trades cluster together, creating directional momentum in the market.
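One way to check the clustering claim is to look at the autocorrelation of the trade-sign series (+1 for BUY, -1 for SELL): persistent positive values at short lags indicate that trades of the same sign follow each other. The sketch below is illustrative only. The helper `sign_autocorrelation` is a hypothetical name, and the sign series is simulated (each trade repeats the previous sign with probability 0.8), not taken from TAQ data.

```python
import numpy as np
import pandas as pd

def sign_autocorrelation(signs: pd.Series, max_lag: int = 5) -> pd.Series:
    """Autocorrelation of a +1/-1 trade-sign series at lags 1..max_lag."""
    return pd.Series({lag: signs.autocorr(lag) for lag in range(1, max_lag + 1)})

# Synthetic data (an assumption for illustration): the sign flips with probability 0.2
rng = np.random.default_rng(0)
flips = rng.random(5000) < 0.2
signs = pd.Series(np.where(np.cumsum(flips) % 2 == 0, 1, -1))

acf = sign_autocorrelation(signs)
print(acf)
```

On real data the same function could be applied to the classified trade signs; a slowly decaying positive autocorrelation would support the clustering assumption.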

This project develops a Sign Trading Strategy that utilizes Order Flow Imbalance (OFI) to detect and exploit these trade clustering patterns. A SELL trade occurs when a market sell order matches the highest bid price, while a BUY trade occurs when a market buy order matches the lowest ask price. When there is an imbalance in the number of BUY versus SELL trades over a given time window, directional market pressure emerges. BUY trade dominance signals upward pressure, while SELL trade dominance signals downward pressure.

OFI quantifies this imbalance, offering a real-time signal for market direction. Since past BUY trades influence future BUY and SELL trades, OFI helps traders anticipate short-term price movements. The rise of high-frequency algorithmic trading has made order flow analysis essential, as market makers adjust bid-ask spreads based on trade clustering, arbitrageurs exploit pricing inefficiencies, and speculators capitalize on directional pressure.

By integrating Sign Trading with OFI, this strategy systematically identifies and responds to trade clustering effects, providing a structured framework to capture short-term market inefficiencies.

### Data
This project focuses on high-frequency trading, requiring highly liquid assets to ensure efficient trade execution and minimize slippage. Liquidity is critical in strategies relying on Order Flow Imbalance (OFI), as it allows for accurate tracking of trade clustering effects without market impact distorting signals.

To capture a range of market conditions, we will select equities from sectors with varying volatilities, allowing us to assess the robustness of our strategy across different levels of market activity. The dataset will be sourced from the WRDS Trade and Quote (TAQ) database, which provides granular tick-by-tick trade and quote data, including the national best bid and offer (NBBO).

The specific equities for analysis have not yet been determined, but selection criteria will prioritize high-volume stocks with active limit order books, ensuring reliable trade execution data. This dataset will allow us to systematically test the effectiveness of Sign Trading with OFI.

In [None]:
import pull_taq
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
NTLS = pd.read_csv('NBBO/28082.csv')
GLP = pd.read_csv('NBBO/27667.csv')
IEF = pd.read_csv('NBBO/23870.csv')
ITUB = pd.read_csv('NBBO/23444.csv')
IIT = pd.read_csv('NBBO/14081.csv')

In [None]:
NTLS

In [None]:
NTLS['Timestamp'] = pd.to_datetime(NTLS['Date'].astype(str) + ' ' + NTLS['Time'].astype(str))
NTLS.set_index('Timestamp', inplace=True)

market_open = pd.to_datetime("09:30:00").time()
market_close = pd.to_datetime("16:00:00").time()
NTLS = NTLS[(NTLS.index.time >= market_open) & (NTLS.index.time <= market_close)]

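# Drop weekends; copy so the column assignments below do not act on a view
NTLS = NTLS[~NTLS.index.dayofweek.isin([5, 6])].copy()

# Quoted spread, plotted below (computed here in case the CSV does not include it)
NTLS['Bid-Ask Spread'] = NTLS['Ask Price'] - NTLS['Bid Price']
NTLS['Mid Price'] = (NTLS['Bid Price'] + NTLS['Ask Price']) / 2
# Weights each quote by the opposite side's size, so the price leans toward the thinner side
NTLS['Size Weighted Price'] = (NTLS['Bid Price'] * NTLS['Ask Size'] + NTLS['Ask Price'] * NTLS['Bid Size']) / (NTLS['Ask Size'] + NTLS['Bid Size'])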

NTLS['Market Index'] = range(len(NTLS))
NTLS.reset_index(inplace=True)

def format_xaxis(ax):
    tick_interval = 1200
    tick_positions = NTLS['Market Index'][::tick_interval]
    tick_labels = NTLS['Timestamp'][::tick_interval].dt.strftime('%Y-%m-%d %H:%M')
    ax.set_xticks(tick_positions)
    ax.set_xticklabels(tick_labels)
    ax.tick_params(axis='x', rotation=45)

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(NTLS.index, NTLS['Bid-Ask Spread'], label='Bid-Ask Spread', color='blue', alpha=0.6)
ax.set_xlabel('Time')
ax.set_ylabel('Spread ($)')
ax.set_title('Bid-Ask Spread Over Time (Market Hours Only)')
format_xaxis(ax)
ax.legend()
plt.show()

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(NTLS.index, NTLS['Bid Size'], label='Bid Size', color='green', alpha=0.6)
ax.plot(NTLS.index, NTLS['Ask Size'], label='Ask Size', color='red', alpha=0.6)
ax.set_xlabel('Time')
ax.set_ylabel('Size')
ax.set_title('Market Depth Over Time (Market Hours Only)')
format_xaxis(ax)
ax.legend()
plt.show()

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(NTLS.index, NTLS['Mid Price'], label='Mid Price', color='purple', alpha=0.6)
ax.set_xlabel('Time')
ax.set_ylabel('Mid Price ($)')
ax.set_title('Mid Price Over Time (Market Hours Only)')
format_xaxis(ax)
ax.legend()
plt.show()


In [None]:
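X = []
y = []
# String comparison below assumes zero-padded HH:MM:SS(.fff) values in 'Time'
times = NTLS['Time'].astype(str)
last = len(NTLS) - 1
i = 0
# Skip quotes before 10:00 so the first sample also spans about one minute
while i < last and times.iloc[i] < '10:00:00':
    i += 1
for h in range(10, 15):
    for m in range(60):
        # End of the current one-minute bucket, e.g. '10:05:59.999'
        s = f"{h}:{m:02d}:59.999"
        midprice = NTLS['Mid Price'].iloc[i]
        bid_ask_ratio = np.log(NTLS['Bid Size'].iloc[i] / NTLS['Ask Size'].iloc[i])
        # Advance to the first quote after the bucket ends (bounded to avoid running off the data)
        while i < last and times.iloc[i] <= s:
            i += 1
        X.append(bid_ask_ratio)
        y.append(NTLS['Mid Price'].iloc[i] - midprice)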
        
plt.scatter(X, y)
a, b = np.polyfit(X, y, 1)
plt.xlabel('log(Bid size/Ask size)')
plt.ylabel('1m midprice change')
plt.plot(X, [a*x+b for x in X], 'r')
plt.show()

print(f"slope of line of best fit: {a}")

In [None]:
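X = []
y = []
# String comparison below assumes zero-padded HH:MM:SS(.fff) values in 'Time'
times = NTLS['Time'].astype(str)
last = len(NTLS) - 1
i = 0
# Skip quotes before 10:00 so the first sample also spans about one minute
while i < last and times.iloc[i] < '10:00:00':
    i += 1
for h in range(10, 15):
    for m in range(60):
        # End of the current one-minute bucket, e.g. '10:05:59.999'
        s = f"{h}:{m:02d}:59.999"
        weightedprice = NTLS['Size Weighted Price'].iloc[i]
        bid_ask_ratio = np.log(NTLS['Bid Size'].iloc[i] / NTLS['Ask Size'].iloc[i])
        # Advance to the first quote after the bucket ends (bounded to avoid running off the data)
        while i < last and times.iloc[i] <= s:
            i += 1
        X.append(bid_ask_ratio)
        y.append(NTLS['Size Weighted Price'].iloc[i] - weightedprice)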
        
plt.scatter(X, y)
a, b = np.polyfit(X, y, 1)
plt.xlabel('log(Bid size/Ask size)')
plt.ylabel('1m size-weighted price change')
plt.plot(X, [a*x+b for x in X], 'r')
plt.show()

print(f"slope of line of best fit: {a}")

### Strategy
Trade classification is a fundamental component of our Sign Trading with Order Flow Imbalance (OFI) strategy. Since WRDS TAQ data does not record which side initiated each trade, we infer whether a trade was a BUY or SELL from its execution price relative to the prevailing bid-ask quotes.

A trade is classified as a BUY if it occurs at or near the best ask price, indicating an aggressive market buy order. Conversely, a trade is classified as a SELL if it occurs at or near the best bid price, indicating an aggressive market sell order.
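The classification rule can be sketched as follows. This is a minimal illustration, assuming each trade has already been matched to the quote prevailing at its timestamp. The function name `classify_trades`, the `Price` column, and the toy numbers are hypothetical, and "at or near" is interpreted here as "closer to that side than to the other", so trades exactly at the midpoint are left unclassified (0).

```python
import numpy as np
import pandas as pd

def classify_trades(trades: pd.DataFrame) -> pd.Series:
    """Return +1 for BUY, -1 for SELL, 0 for trades exactly at the midpoint."""
    mid = (trades['Bid Price'] + trades['Ask Price']) / 2
    # Above the midpoint is closer to the ask (BUY); below is closer to the bid (SELL)
    return np.sign(trades['Price'] - mid).astype(int).rename('Sign')

# Hypothetical toy trades with their prevailing quotes
trades = pd.DataFrame({
    'Price':     [10.50, 10.00, 10.25, 10.50],
    'Bid Price': [10.00, 10.00, 10.00, 10.25],
    'Ask Price': [10.50, 10.50, 10.50, 10.50],
})
signs = classify_trades(trades)
print(signs.tolist())
```

Midpoint trades could instead be resolved with a tick test (comparing against the previous trade price); that choice is left open here.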

#### Order Flow Imbalance (OFI) Calculation
Order Flow Imbalance (OFI) measures the net BUY vs. SELL pressure over a time window $h$, where $\Delta N$ represents the number of trades executed:

$$
\mathrm{OFI}_T = \frac{\Delta N^{s}_{T-h,T} - \Delta N^{b}_{T-h,T}}{\Delta N^{s}_{T-h,T} + \Delta N^{b}_{T-h,T}}
$$

where:

$\Delta N^{s}_{T-h,T}$ = Number of SELL trades executed in the time window $[T-h, T]$

$\Delta N^{b}_{T-h,T}$ = Number of BUY trades executed in the time window $[T-h, T]$

A positive OFI indicates strong selling pressure, implying a higher probability of price declines, whereas a negative OFI signals strong buying pressure, suggesting upward price momentum.
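The OFI definition above can be sketched with a time-based rolling window. This is an illustrative sketch rather than the project's final implementation: the function name `rolling_ofi`, the one-minute default window, and the toy trades are assumptions, and the input is a +1 (BUY) / -1 (SELL) series indexed by trade timestamp.

```python
import pandas as pd

def rolling_ofi(signs: pd.Series, window: str = '1min') -> pd.Series:
    """OFI = (sells - buys) / (sells + buys) over the trailing window [T - h, T]."""
    # closed='both' includes both endpoints, matching the closed interval [T - h, T]
    buys = (signs == 1).astype(int).rolling(window, closed='both').sum()
    sells = (signs == -1).astype(int).rolling(window, closed='both').sum()
    return (sells - buys) / (sells + buys)

# Hypothetical toy trades: three BUYs and one SELL within 30 seconds
idx = pd.to_datetime([
    '2024-01-02 09:30:00', '2024-01-02 09:30:10',
    '2024-01-02 09:30:20', '2024-01-02 09:30:30',
])
signs = pd.Series([1, 1, -1, 1], index=idx)
ofi = rolling_ofi(signs)
print(ofi)
```

With this sign convention the value lies between -1 (only BUY trades in the window) and +1 (only SELL trades). The last toy observation has one SELL and three BUYs, giving (1 - 3) / 4 = -0.5, which reads as buying pressure.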