# LSEG US Equity Flow™ Analytics Demo Notebook

# Unifier Data Warehouse & API: Simplify Data Access with a Single Interface

In [1]:
import os
from IPython.display import display

### If you have not installed Unifier, you can do so by running the following command:

In [2]:
# pip install unifier

## Simply Import and Initialize Unifier and Go!

In [3]:
from unifier import unifier

In [None]:
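# Replace the placeholders with your own credentials (see the note below)
unifier.user = 'unifier username'
unifier.token = 'unifier api token'
# Also expose them as environment variables for this Python session
os.environ['UNIFIER_USER'] = unifier.user
os.environ['UNIFIER_TOKEN'] = unifier.token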

##### Replace "unifier username" with your account email and "unifier api token" with your Unifier API token, found on the Access My Data page of the Exponential website.
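If you would rather not type the token into the notebook, you can export it in your shell and read it back. The sketch below assumes you have already set `UNIFIER_USER` and `UNIFIER_TOKEN` (the same variable names the cell above writes). `load_unifier_credentials` is a hypothetical helper and is not part of the Unifier package.

```python
import os


def load_unifier_credentials(environ=os.environ):
    """Hypothetical helper: return (user, token) from environment variables.

    Raises KeyError naming any missing variable instead of silently
    falling back to the placeholder strings.
    """
    missing = [name for name in ("UNIFIER_USER", "UNIFIER_TOKEN") if not environ.get(name)]
    if missing:
        raise KeyError(f"Set these environment variables first: {', '.join(missing)}")
    return environ["UNIFIER_USER"], environ["UNIFIER_TOKEN"]


# Usage in the notebook, after `from unifier import unifier`:
# unifier.user, unifier.token = load_unifier_credentials()
```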

# LSEG US Equity Flow™ Analytics Documentation

## Overview
The **LSEG US Equity Flow™ Analytics** dataset provides unparalleled insights into US equity market flows with daily granularity and over 15 years of historical coverage. Designed for both fundamental and systematic investors, it enables detailed analysis of market actions and reactions.

### Key Details
- **Product**: LSEG US Equity Flow™ Analytics  
- **Version**: Indigo Panther  
- **Coverage**: US Equities  
- **Delivery Frequency**: Daily and Minutely / 15-minute delayed / Real-time 
- **Delivery Time**: 3 am ET / 15-minute delayed / Real-time  
- **Delivery Method**: Unifier API  
- **Data Frequency**: 1 minute, Hourly, Daily, Weekly  
- **Data Size**: 50–200 MB/day  
- **Deep History**: January 1, 2007 to Present (17+ years)  

---

## Datasets: **`lseg_us_equity_flow_daily`** and **`lseg_us_equity_flow_1min`**
### Description
These datasets provide daily and 1-minute intervals and capture comprehensive market flow details for the US equity market. Each row is identified by the `ticker` field, which holds the security's trading symbol.
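Dollar volumes and trade counts are additive, so 1-minute rows can be rolled up to coarser bars. Below is a minimal pandas sketch on synthetic rows. It assumes a frame indexed by the bar `timestamp` and uses column names as they appear in the `get_dataframe` outputs further down (for example `method5_inst_buy`). The numbers are made up.

```python
import pandas as pd

# Synthetic 1-minute rows for one ticker (illustrative numbers only)
idx = pd.date_range("2024-01-26 09:31", periods=90, freq="min")
minute_df = pd.DataFrame(
    {
        "ticker": "AAPL",
        "trade_count": 10,
        "dollar_volume": 1000.0,
        "method5_inst_buy": 400.0,
        "method5_inst_sell": 300.0,
    },
    index=idx,
)

# Sum the additive columns within each clock hour; each bin is labelled by its start
additive = ["trade_count", "dollar_volume", "method5_inst_buy", "method5_inst_sell"]
hourly_df = minute_df[additive].resample("1h").sum()
print(hourly_df)
```

The regular session is described as starting at 09:31:00, which suggests each `timestamp` may mark the end of its minute. If so, passing `closed='right', label='right'` to `resample` groups the bar stamped 10:00 with those stamped 09:01 to 09:59 and labels that bin 10:00.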

### Applications
- **Support/Resistance Identification**: Pinpoint critical levels by symbol.  
- **Market Impact Analysis**: Estimate impact functions overall and by investor type.  
- **Flow Correlation Visualization**: Understand cross-asset flow correlations.  
- **Momentum & Reversal Signals**: Identify actionable market signals.  
- **HFT Behavior Analysis**: Detect patterns in unexplained or curious high-frequency trading behaviors.  
- **Risk Management**: Analyze concentration risk and other systematic factors.  
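One simple illustration of the market-impact application is to regress each bar's return on its institutional net flow. The sketch below assumes a linear impact model. That is a simplification chosen for illustration, not a method the dataset prescribes. It uses synthetic data generated with a known slope so that the fit can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bars: net flow in $ millions and returns in basis points
net_flow_musd = rng.normal(0.0, 5.0, size=2_000)
true_impact_bps_per_musd = 0.8
noise_bps = rng.normal(0.0, 2.0, size=net_flow_musd.size)
returns_bps = true_impact_bps_per_musd * net_flow_musd + noise_bps

# Ordinary least squares fit of: return = slope * flow + intercept
slope, intercept = np.polyfit(net_flow_musd, returns_bps, deg=1)
print(f"estimated impact: {slope:.3f} bps per $1M of net buying")
```

With real data, `net_flow_musd` could be, for example, `(method5_inst_buy - method5_inst_sell) / 1e6` per bar, and `returns_bps` the matching price return. Fitting institutional and retail columns separately gives one slope per investor type. Impact is often nonlinear in trade size, so treat a linear slope only as a first look.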

---

### Field Descriptions

| **Column Name**           | **Data Type** | **Description**                                                                |
|---------------------------|---------------|--------------------------------------------------------------------------------|
| `asof_datetime`           | string        | Timestamp of the latest moment at which the data would be available in a real-time feed, in America/New_York time. |
| `extended_hours`          | int           | Indicates whether the minute bar falls in, or completes during, the extended-hours session (1: extended session; 0: regular session, which starts at 09:31:00 and ends at 16:01:00 New York time). The daily dataset returns a corresponding `includes_extended_hours` column. |
| `ticker`                  | string        | Security trading symbol                                                        |
| `trade_count`             | int           | Number of discrete trades in the time period                                   |
| `dollar_volume`           | double        | Dollar volume of trades in the time period                                     |
| `method1_inst_buy`        | double        | Buy dollar volume calculated using Method 1 to identify institutional trades.  |
| `method1_inst_sell`       | double        | Sell dollar volume calculated using Method 1 to identify institutional trades. |
| `method1_inst_buy_count`  | int           | Buy trade count calculated using Method 1 to identify institutional trades.    |
| `method1_inst_sell_count` | int           | Sell trade count calculated using Method 1 to identify institutional trades.   |
| `method2_inst_buy`        | double        | Buy dollar volume calculated using Method 2 to identify institutional trades.  |
| `method2_inst_sell`       | double        | Sell dollar volume calculated using Method 2 to identify institutional trades. |
| `method2_inst_buy_count`  | int           | Buy trade count calculated using Method 2 to identify institutional trades.    |
| `method2_inst_sell_count` | int           | Sell trade count calculated using Method 2 to identify institutional trades.   |
| `method3_inst_buy`        | double        | Buy dollar volume calculated using Method 3 to identify institutional trades.  |
| `method3_inst_sell`       | double        | Sell dollar volume calculated using Method 3 to identify institutional trades. |
| `method3_inst_buy_count`  | int           | Buy trade count calculated using Method 3 to identify institutional trades.    |
| `method3_inst_sell_count` | int           | Sell trade count calculated using Method 3 to identify institutional trades.   |
| `method4_inst_buy`        | double        | Buy dollar volume calculated using Method 4 to identify institutional trades.  |
| `method4_inst_sell`       | double        | Sell dollar volume calculated using Method 4 to identify institutional trades. |
| `method4_inst_buy_count`  | int           | Buy trade count calculated using Method 4 to identify institutional trades.    |
| `method4_inst_sell_count` | int           | Sell trade count calculated using Method 4 to identify institutional trades.   |
| `method5_inst_buy`        | double        | Buy dollar volume calculated using Method 5 to identify institutional trades.  |
| `method5_inst_sell`       | double        | Sell dollar volume calculated using Method 5 to identify institutional trades. |
| `method5_inst_buy_count`  | int           | Buy trade count calculated using Method 5 to identify institutional trades.    |
| `method5_inst_sell_count` | int           | Sell trade count calculated using Method 5 to identify institutional trades.   |
| `retail_buy`              | double        | Buy dollar volume calculated using Method 6 to identify retail trades.         |
| `retail_sell`             | double        | Sell dollar volume calculated using Method 6 to identify retail trades.        |
| `retail_buy_count`        | int           | Buy trade count calculated using Method 6 to identify retail trades.           |
| `retail_sell_count`       | int           | Sell trade count calculated using Method 6 to identify retail trades.          |
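Each method reports the buy and sell sides separately, so a signed net flow and an imbalance ratio follow directly. The minimal sketch below assumes a DataFrame with the `methodN_inst_buy` / `methodN_inst_sell` and `retail_buy` / `retail_sell` columns returned by `get_dataframe` in the examples below. The `add_net_flow_columns` helper and the `_netflow` / `_imbalance` suffixes are illustrative names, not fields delivered by the dataset.

```python
import pandas as pd


def add_net_flow_columns(df, prefixes):
    """Hypothetical helper: add <prefix>_netflow and <prefix>_imbalance columns.

    netflow   = buy - sell                  (signed dollars)
    imbalance = (buy - sell) / (buy + sell) (NaN when both sides are zero)
    """
    out = df.copy()
    for prefix in prefixes:
        buy, sell = out[f"{prefix}_buy"], out[f"{prefix}_sell"]
        gross = (buy + sell).where(lambda s: s != 0)  # NaN instead of dividing by zero
        out[f"{prefix}_netflow"] = buy - sell
        out[f"{prefix}_imbalance"] = (buy - sell) / gross
    return out


# Two toy bars; the values are made up for illustration
bars = pd.DataFrame(
    {
        "method5_inst_buy": [600.0, 0.0],
        "method5_inst_sell": [400.0, 0.0],
        "retail_buy": [50.0, 10.0],
        "retail_sell": [150.0, 30.0],
    }
)
bars = add_net_flow_columns(bars, ["method5_inst", "retail"])
print(bars[["method5_inst_netflow", "method5_inst_imbalance", "retail_imbalance"]])
```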

---

### Why This Dataset is Unique
This is a new dataset with unprecedented 1-minute granularity and more than 15 years of history. It gives analysts
the unique ability to distinguish institutional from retail flow. The team that developed this product has
over 20 years of experience in HFT and other systematic strategies across all major asset classes. The dataset
provides near-real-time market flow color across the entire US market. Fundamental and systematic investors
alike can use it to interpret market action and reaction to new events, and to disentangle directly whether
returns are explained by market impact or by changes in expectations based on new information. In the near
future, extended versions of this data will build on this foundational dataset and provide even greater
granularity, flow decomposition, and more investor types.

---

### Potential Use Cases
- Identify critical support/resistance levels by symbol
- Estimate market impact functions overall and by investor type
- Visualize cross-asset flow correlations overall and by investor type
- Identify Momentum Signals
- Identify Reversal Signals
- Overlay on stat arb models to understand when temporary market impact is ending
- Identify what types of players are responsible for unexplained or curious HFT behaviors observed in other strategies
- Analyze Concentration Risk
- Understand Stock Option Short Gamma Behavior of Dealers
- Other Example Strategies Might Be:
  - Enter after large position changes
  - Enter positions based on price threshold + position increases
  - Enter based on a position threshold and exit after a certain period of time
- Identify large market-moving trades as they happen
- Intraday momentum trades
- 13D/13F Announcement Predictions
- M&A Position Tracking
- Index Add/Delete Strategy Tracking
- Closed-End-Fund Arbitrage Strategies
- Open/Closing Auction Imbalance Prediction
- Close-Open Returns
- Open-Close Returns
- Can be combined with XTech Option Flow Analytic to provide a complete view of order imbalances each minute
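Several of the strategies above ("enter after large position changes", "exit after a certain period of time") can be prototyped as a rule on net flow. The sketch below is one hypothetical version, not a strategy supplied with the dataset. It enters when the rolling z-score of net flow crosses a threshold, holds for a fixed number of bars, and starts the position on the bar after the signal so that it uses no future data. The window, threshold, and holding period are arbitrary placeholders.

```python
import numpy as np
import pandas as pd


def flow_zscore_signal(net_flow, window=20, threshold=2.0, hold=5):
    """Hypothetical momentum rule: long (+1) / short (-1) after the rolling
    z-score of net flow crosses +/- threshold, held for `hold` bars.

    The z-score at bar t uses bars up to and including t; the position
    starts on bar t + 1. A new signal restarts the holding period.
    """
    mean = net_flow.rolling(window).mean()
    std = net_flow.rolling(window).std().replace(0, np.nan)
    z = (net_flow - mean) / std

    entries = pd.Series(0, index=net_flow.index)
    entries[z >= threshold] = 1
    entries[z <= -threshold] = -1

    positions, remaining, side = [], 0, 0
    for signal in entries.to_numpy():
        positions.append(side if remaining > 0 else 0)
        remaining = max(remaining - 1, 0)
        if signal != 0:
            side, remaining = int(signal), hold
    return pd.Series(positions, index=net_flow.index)


# Toy series: alternating small flows, then one large buy spike at bar 30
net_flow = pd.Series([1.0, -1.0] * 15 + [50.0] + [1.0, -1.0] * 5)
positions = flow_zscore_signal(net_flow, window=20, threshold=2.0, hold=5)
print(positions[positions != 0])
```

Flipping the sign of the positions turns the same rule into a reversal signal.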

---


## Retrieve data for a specific date using the `asof_date` parameter

In [5]:
df = unifier.get_dataframe(name="lseg_us_equity_flow_daily", asof_date='2024-01-23', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,includes_extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2022-01-13 20:00:03.000,2022-01-13 09:31:00.000,0,MCAF,3,1.023371e+04,6822.47,3411.24,2,1,...,5116.85,5116.85,0,0,0.00,0.00,0,0,2022-01-13,2022-01-13
1,2022-01-13 20:00:03.000,2022-01-13 09:44:00.000,0,RMGCU,3,2.772000e+02,0.00,277.20,0,3,...,138.60,138.60,0,0,0.00,0.00,0,0,2022-01-13,2022-01-13
2,2022-01-13 20:00:03.000,2022-01-13 10:31:00.000,0,FWAC,1,9.825000e+00,4.91,4.91,0,0,...,4.91,4.91,0,0,0.00,0.00,0,0,2022-01-13,2022-01-13
3,2022-01-13 20:00:03.000,2022-01-13 10:32:00.000,0,BRLI,3,3.068000e+01,5.12,25.57,0,2,...,10.23,20.46,0,1,0.00,0.00,0,0,2022-01-13,2022-01-13
4,2022-01-13 20:00:03.000,2022-01-13 10:39:00.000,0,GHACU,7,6.930060e+03,0.00,3960.03,0,7,...,3465.03,495.00,0,0,0.00,2970.03,0,3,2022-01-13,2022-01-13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2022-01-13 20:00:03.000,2022-01-13 16:00:00.000,0,DAX,76,6.531154e+05,168873.01,483047.10,28,48,...,105886.22,546033.89,13,63,1195.25,0.00,2,0,2022-01-13,2022-01-13
96,2022-01-13 20:00:03.000,2022-01-13 16:00:00.000,1,DCTH,334,3.493620e+05,229769.87,105675.85,147,169,...,165780.84,169664.91,9,9,10507.44,3408.88,8,7,2022-01-13,2022-01-13
97,2022-01-13 20:00:03.000,2022-01-13 16:00:00.000,0,DHCNI,163,5.587572e+05,301046.79,228906.92,93,69,...,300432.52,229521.20,76,76,9823.00,18980.51,4,4,2022-01-13,2022-01-13
98,2022-01-13 20:00:03.000,2022-01-13 16:00:00.000,1,DIBS,1850,1.575766e+06,730261.83,759547.13,850,939,...,535633.96,954174.99,267,966,48753.12,37203.43,60,28,2022-01-13,2022-01-13


In [6]:
df = unifier.get_dataframe(name="lseg_us_equity_flow_1min", asof_date='2024-01-23', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2023-01-19 04:01:03.000,2023-01-19 04:01:00.000,1,APRN,1,0.9298,0.93,0.00,1,0,...,0.46,0.46,0,0,0,0,0,0,2023-01-19,2023-01-19
1,2023-01-19 04:01:03.000,2023-01-19 04:01:00.000,1,ARKK,8,75170.2300,75170.23,0.00,8,0,...,37585.12,37585.12,0,0,0,0,0,0,2023-01-19,2023-01-19
2,2023-01-19 04:01:03.000,2023-01-19 04:01:00.000,1,ARVL,16,899.1000,899.10,0.00,16,0,...,730.52,168.58,10,0,0,0,0,0,2023-01-19,2023-01-19
3,2023-01-19 04:01:03.000,2023-01-19 04:01:00.000,1,BTU,6,7144.2800,3572.14,3572.14,3,3,...,4762.85,2381.43,4,2,0,0,0,0,2023-01-19,2023-01-19
4,2023-01-19 04:01:03.000,2023-01-19 04:01:00.000,1,CTXR,8,329.8700,41.23,288.64,1,7,...,164.94,164.94,0,0,0,0,0,0,2023-01-19,2023-01-19
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2023-01-19 04:05:03.000,2023-01-19 04:05:00.000,1,GOOGL,6,12263.4600,3065.86,9197.59,1,4,...,6131.73,6131.73,0,0,0,0,0,0,2023-01-19,2023-01-19
96,2023-01-19 04:05:03.000,2023-01-19 04:05:00.000,1,HOOD,1,18.5800,18.58,0.00,1,0,...,9.29,9.29,0,0,0,0,0,0,2023-01-19,2023-01-19
97,2023-01-19 04:05:03.000,2023-01-19 04:05:00.000,1,MEGL,4,1454.2600,1454.26,0.00,4,0,...,727.13,727.13,0,0,0,0,0,0,2023-01-19,2023-01-19
98,2023-01-19 04:05:03.000,2023-01-19 04:05:00.000,1,NUZE,19,18360.8800,966.36,17394.52,1,18,...,11596.35,6764.53,12,7,0,0,0,0,2023-01-19,2023-01-19
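`asof_datetime` records when each row would have been available in a real-time feed, so a backtest can restrict itself to rows that were known at the decision time. The sketch below assumes `asof_datetime` arrives as a string in New York time, as described in the field table. The `known_as_of` helper and the sample rows are illustrative and are not taken from the dataset.

```python
import pandas as pd


def known_as_of(df, decision_time):
    """Hypothetical helper: keep rows whose asof_datetime is at or before decision_time."""
    available_at = pd.to_datetime(df["asof_datetime"])
    return df[available_at <= pd.Timestamp(decision_time)]


# Illustrative rows in the same string format as the output above
rows = pd.DataFrame(
    {
        "asof_datetime": ["2024-01-12 04:01:03.000", "2024-01-12 04:02:03.000"],
        "timestamp": ["2024-01-12 04:01:00.000", "2024-01-12 04:02:00.000"],
        "ticker": ["MSFT", "MSFT"],
    }
)
# At 04:02:00 the 04:02 bar is not yet published (it becomes available at 04:02:03)
print(known_as_of(rows, "2024-01-12 04:02:00"))
```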


## Retrieve data for a specific date range using the `back_to` and `up_to` parameters


In [7]:
df = unifier.get_dataframe(name='lseg_us_equity_flow_daily', back_to='2024-01-01', up_to='2024-02-01', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,includes_extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2024-01-31 20:00:03.000,2024-01-31 08:02:00.000,1,ZVV,4,7787.4400,3893.72,3893.72,0,0,...,3893.72,3893.72,0,0,0.00,0.00,0,0,2024-01-31,2024-01-31
1,2024-01-31 20:00:03.000,2024-01-31 09:31:00.000,0,MITAU,24,46118.0000,0.00,46118.00,0,24,...,3843.17,42274.83,0,20,0.00,0.00,0,0,2024-01-31,2024-01-31
2,2024-01-31 20:00:03.000,2024-01-31 09:32:00.000,0,BYNOW,9,376.1100,0.00,376.11,0,9,...,188.06,188.06,0,0,0.00,0.00,0,0,2024-01-31,2024-01-31
3,2024-01-31 20:00:03.000,2024-01-31 09:36:00.000,0,BCSAW,2,1259.6446,0.00,1259.64,0,2,...,629.82,629.82,0,0,0.00,0.00,0,0,2024-01-31,2024-01-31
4,2024-01-31 20:00:03.000,2024-01-31 10:07:00.000,0,SVIIR,4,3527.1100,2645.33,881.78,3,1,...,1763.56,1763.56,0,0,0.00,0.00,0,0,2024-01-31,2024-01-31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2024-01-31 20:00:03.000,2024-01-31 16:00:00.000,0,RILYP,93,140637.9205,44984.01,92112.64,38,55,...,106435.11,30661.54,67,26,2418.93,1122.34,2,2,2024-01-31,2024-01-31
96,2024-01-31 20:00:03.000,2024-01-31 16:00:00.000,0,SGMA,66,33880.3806,14356.63,7395.20,39,26,...,11458.74,10293.08,2,1,10183.16,1945.38,4,3,2024-01-31,2024-01-31
97,2024-01-31 20:00:03.000,2024-01-31 16:00:00.000,1,SIEB,112,9472.4463,2806.07,5144.71,67,39,...,3958.61,3992.16,0,1,1511.55,10.13,1,2,2024-01-31,2024-01-31
98,2024-01-31 20:00:03.000,2024-01-31 16:00:00.000,0,SONDW,5,22.8237,15.21,5.97,2,3,...,11.41,9.76,0,0,0.00,1.65,0,1,2024-01-31,2024-01-31


In [8]:
df = unifier.get_dataframe(name='lseg_us_equity_flow_1min', back_to='2024-01-01', up_to='2024-02-01', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2024-01-12 04:01:03.000,2024-01-12 04:01:00.000,1,ACHR,6,459.98,0.00,459.98,0,6,...,229.99,229.99,0,0,0,0,0,0,2024-01-12,2024-01-12
1,2024-01-12 04:01:03.000,2024-01-12 04:01:00.000,1,BABA,23,68608.72,11931.95,56676.77,4,19,...,34304.36,34304.36,1,1,0,0,0,0,2024-01-12,2024-01-12
2,2024-01-12 04:01:03.000,2024-01-12 04:01:00.000,1,BCS,2,1091.09,0.00,1091.09,0,2,...,545.54,545.54,0,0,0,0,0,0,2024-01-12,2024-01-12
3,2024-01-12 04:01:03.000,2024-01-12 04:01:00.000,1,BITF,10,9071.19,8164.07,907.12,9,1,...,4535.60,4535.60,0,0,0,0,0,0,2024-01-12,2024-01-12
4,2024-01-12 04:01:03.000,2024-01-12 04:01:00.000,1,BTBT,8,22312.75,2789.09,19523.66,1,7,...,11156.38,11156.38,0,0,0,0,0,0,2024-01-12,2024-01-12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2024-01-12 04:04:03.000,2024-01-12 04:04:00.000,1,GBTC,1,163.60,163.60,0.00,1,0,...,0.00,163.60,0,1,0,0,0,0,2024-01-12,2024-01-12
96,2024-01-12 04:04:03.000,2024-01-12 04:04:00.000,1,HUT,1,11.03,0.00,11.03,0,1,...,11.03,0.00,1,0,0,0,0,0,2024-01-12,2024-01-12
97,2024-01-12 04:04:03.000,2024-01-12 04:04:00.000,1,MLEC,96,11257.16,2227.98,9029.18,17,75,...,5452.69,5804.47,0,3,0,0,0,0,2024-01-12,2024-01-12
98,2024-01-12 04:04:03.000,2024-01-12 04:04:00.000,1,MSFT,8,9225.56,4612.78,4612.78,4,4,...,9225.56,0.00,8,0,0,0,0,0,2024-01-12,2024-01-12
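These examples pass `limit=100`, so each call returns at most 100 rows. To pull a longer span, one option is to split it into smaller `back_to` / `up_to` windows and request each one in turn. The hypothetical `date_windows` helper below only builds the windows. It is not clear from this notebook whether `up_to` is inclusive, so adjacent windows share their boundary date and duplicates are dropped after concatenation. The commented-out loop assumes `get_dataframe` accepts the arguments shown above. Whether large pulls need a `limit` value is an assumption to check against the Unifier documentation.

```python
from datetime import date, timedelta


def date_windows(back_to, up_to, days=7):
    """Hypothetical helper: split [back_to, up_to] into consecutive windows
    of `days` days. Adjacent windows share their boundary date so that no
    day is skipped whether or not the API treats up_to as inclusive.
    """
    start, end = date.fromisoformat(back_to), date.fromisoformat(up_to)
    windows = []
    while start < end:
        stop = min(start + timedelta(days=days), end)
        windows.append((start.isoformat(), stop.isoformat()))
        start = stop
    return windows


print(date_windows("2024-01-01", "2024-02-01", days=7))

# Possible use with the calls above (not run here):
# frames = [unifier.get_dataframe(name='lseg_us_equity_flow_daily', back_to=b, up_to=u)
#           for b, u in date_windows('2024-01-01', '2024-02-01')]
# df = pd.concat(frames).drop_duplicates()
```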


## Retrieve data for a specific date with a specific ticker (`key=<ticker>`) using the `asof_date` parameter

In [9]:
df = unifier.get_dataframe(name='lseg_us_equity_flow_daily',key='AAPL', asof_date='2024-01-01', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,includes_extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2022-02-03 20:00:03.000,2022-02-03 20:00:00.000,1,AAPL,721432,1.511118e+10,6.490541e+09,7.552551e+09,351868,367781,...,5.949665e+09,8.093427e+09,273816,447616,5.888254e+08,4.792656e+08,33643,27719,2022-02-03,2022-02-03
1,2022-07-28 20:00:03.000,2022-07-28 20:00:00.000,1,AAPL,724244,1.245526e+10,6.130130e+09,5.098118e+09,382878,339353,...,7.450229e+09,3.778019e+09,453580,270653,7.604271e+08,4.665845e+08,53739,29574,2022-07-28,2022-07-28
2,2022-08-19 20:00:03.000,2022-08-19 20:00:00.000,1,AAPL,562557,1.177423e+10,4.679662e+09,5.435149e+09,275583,286413,...,3.948097e+09,6.166714e+09,243350,319163,1.112364e+09,5.470561e+08,65024,28565,2022-08-19,2022-08-19
3,2022-08-09 20:00:03.000,2022-08-09 20:00:00.000,1,AAPL,472765,9.496666e+09,4.599126e+09,3.672822e+09,247354,224623,...,4.638737e+09,3.633211e+09,304584,168181,8.166409e+08,4.080775e+08,50500,23131,2022-08-09,2022-08-09
4,2022-07-27 20:00:03.000,2022-07-27 20:00:00.000,1,AAPL,569107,1.171313e+10,5.386154e+09,5.064410e+09,309423,257240,...,6.313486e+09,4.137078e+09,419894,149209,7.565323e+08,5.060341e+08,46830,28605,2022-07-27,2022-07-27
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2023-01-25 20:00:03.000,2023-01-25 20:00:00.000,1,AAPL,532498,8.736258e+09,3.856750e+09,3.691151e+09,279801,251919,...,4.402354e+09,3.145546e+09,337373,193586,7.524067e+08,4.359510e+08,58370,27597,2023-01-25,2023-01-25
96,2023-11-15 20:00:03.000,2023-11-15 20:00:00.000,1,AAPL,556096,9.531350e+09,4.733168e+09,3.968774e+09,327140,227672,...,5.652683e+09,3.049259e+09,353209,201787,4.083944e+08,4.210134e+08,25289,26229,2023-11-15,2023-11-15
97,2023-10-23 20:00:03.000,2023-10-23 20:00:00.000,1,AAPL,633049,8.981178e+09,4.635545e+09,3.628542e+09,388222,243094,...,4.849949e+09,3.414139e+09,430433,202436,4.155547e+08,3.015356e+08,27549,19713,2023-10-23,2023-10-23
98,2023-11-13 20:00:03.000,2023-11-13 20:00:00.000,1,AAPL,522511,7.388626e+09,3.363324e+09,3.296622e+09,293893,226548,...,6.241047e+08,6.035842e+09,47548,474963,4.213336e+08,3.073463e+08,28867,19698,2023-11-13,2023-11-13


In [10]:
df = unifier.get_dataframe(name='lseg_us_equity_flow_1min',key='AAPL', asof_date='2024-01-01', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2013-08-26 04:02:03.000,2013-08-26 04:02:00.000,1,AAPL,2,2.192112e+05,219211.20,0.00,2,0,...,219211.20,0.00,2,0,0.00,0.00,0,0,2013-08-26,2013-08-26
1,2013-08-26 04:03:03.000,2013-08-26 04:03:00.000,1,AAPL,2,1.007500e+05,100750.00,0.00,2,0,...,100750.00,0.00,2,0,0.00,0.00,0,0,2013-08-26,2013-08-26
2,2013-08-26 04:16:03.000,2013-08-26 04:16:00.000,1,AAPL,1,1.395027e+05,0.00,139502.74,0,1,...,69751.37,69751.37,0,0,0.00,0.00,0,0,2013-08-26,2013-08-26
3,2013-08-26 04:31:03.000,2013-08-26 04:31:00.000,1,AAPL,2,1.005860e+05,50293.00,50293.00,1,1,...,0.00,100586.00,0,2,0.00,0.00,0,0,2013-08-26,2013-08-26
4,2013-08-26 05:14:03.000,2013-08-26 05:14:00.000,1,AAPL,2,2.515000e+05,0.00,251500.00,0,2,...,251500.00,0.00,2,0,0.00,0.00,0,0,2013-08-26,2013-08-26
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2013-08-26 09:42:03.000,2013-08-26 09:42:00.000,0,AAPL,406,3.114275e+07,17182207.86,11275823.91,239,167,...,28458031.77,0.00,386,20,1150594.28,1534125.70,15,20,2013-08-26,2013-08-26
96,2013-08-26 09:43:03.000,2013-08-26 09:43:00.000,0,AAPL,335,2.664336e+07,11929860.70,12486587.53,164,171,...,24336915.82,79532.41,320,15,1113453.66,1113453.66,14,14,2013-08-26,2013-08-26
97,2013-08-26 09:44:03.000,2013-08-26 09:44:00.000,0,AAPL,453,4.171900e+07,24957723.38,15103566.92,281,172,...,32877886.52,7183403.78,367,86,920949.20,736759.36,10,8,2013-08-26,2013-08-26
98,2013-08-26 09:45:03.000,2013-08-26 09:45:00.000,0,AAPL,286,2.389829e+07,12617628.04,9860133.18,161,125,...,15040881.10,7436880.11,190,96,835604.51,584923.15,10,7,2013-08-26,2013-08-26


## Retrieve data for a specific date range with a specific ticker (`key=<ticker>`) using the `back_to` and `up_to` parameters


In [11]:
df = unifier.get_dataframe(name='lseg_us_equity_flow_daily',key='AAPL', back_to='2024-01-01', up_to='2024-02-01', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,includes_extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2024-01-18 20:00:03.000,2024-01-18 20:00:00.000,1,AAPL,777009,14074120000.0,6525488000.0,6259469000.0,444246,329015,...,6136234000.0,6648724000.0,461588,313287,697219800.0,591946800.0,39269,31654,2024-01-18,2024-01-18
1,2024-01-02 20:00:03.000,2024-01-02 20:00:00.000,1,AAPL,990769,14124460000.0,6623136000.0,6059647000.0,542285,446477,...,5313102000.0,7369681000.0,246488,744247,936135300.0,505543300.0,69922,36466,2024-01-02,2024-01-02
2,2024-01-03 20:00:03.000,2024-01-03 20:00:00.000,1,AAPL,646672,9850288000.0,4962202000.0,3895633000.0,384971,260417,...,5887228000.0,2970608000.0,337365,309307,532395200.0,460056400.0,36499,31646,2024-01-03,2024-01-03
3,2024-01-04 20:00:03.000,2024-01-04 20:00:00.000,1,AAPL,704320,11606130000.0,5085719000.0,5255783000.0,410959,291530,...,3824973000.0,6516528000.0,285462,418792,852138200.0,412488400.0,52306,25296,2024-01-04,2024-01-04
4,2024-01-08 20:00:03.000,2024-01-08 20:00:00.000,1,AAPL,659243,10293350000.0,5566881000.0,3683946000.0,432591,223277,...,7146960000.0,2103867000.0,426813,231163,652208600.0,390315400.0,45879,26647,2024-01-08,2024-01-08
5,2024-01-05 20:00:03.000,2024-01-05 20:00:00.000,1,AAPL,675347,10732440000.0,5070129000.0,4547816000.0,394186,277242,...,3601071000.0,6016874000.0,349875,324250,712155900.0,402334900.0,46270,25440,2024-01-05,2024-01-05
6,2024-01-17 20:00:03.000,2024-01-17 20:00:00.000,1,AAPL,585485,8177436000.0,4069209000.0,3347815000.0,350633,232777,...,4263993000.0,3153031000.0,301896,283518,465238600.0,295173300.0,33131,20739,2024-01-17,2024-01-17
7,2024-01-24 20:00:03.000,2024-01-24 20:00:00.000,1,AAPL,587964,9464511000.0,4085833000.0,4527814000.0,329297,256206,...,2184623000.0,6429024000.0,217941,370023,429731700.0,421132300.0,27663,27274,2024-01-24,2024-01-24
8,2024-01-09 20:00:03.000,2024-01-09 20:00:00.000,1,AAPL,530846,7457582000.0,4200570000.0,2515406000.0,348168,182397,...,5819325000.0,896650600.0,453101,77735,423238100.0,318368000.0,32096,23905,2024-01-09,2024-01-09
9,2024-01-10 20:00:03.000,2024-01-10 20:00:00.000,1,AAPL,547961,8236493000.0,4186066000.0,3235051000.0,347582,199699,...,5489185000.0,1931931000.0,394720,153237,496595800.0,318780500.0,33865,20667,2024-01-10,2024-01-10
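The daily rows above are not returned in date order (for example, 2024-01-18 comes before 2024-01-02). You should index by date and sort the rows before computing rolling or cumulative statistics. Below is a minimal sketch on a few made-up rows that use the `date`, `ticker`, and `method5_inst_*` columns shown in the output.

```python
import pandas as pd

# Illustrative rows in the same unordered layout as the output above (values made up)
daily = pd.DataFrame(
    {
        "date": ["2024-01-18", "2024-01-02", "2024-01-03"],
        "ticker": ["AAPL", "AAPL", "AAPL"],
        "method5_inst_buy": [100.0, 300.0, 200.0],
        "method5_inst_sell": [50.0, 400.0, 100.0],
    }
)

daily["date"] = pd.to_datetime(daily["date"])
daily = daily.sort_values(["ticker", "date"]).set_index("date")

# Cumulative institutional net flow, now accumulated in chronological order per ticker
net = daily["method5_inst_buy"] - daily["method5_inst_sell"]
daily["cum_netflow"] = net.groupby(daily["ticker"]).cumsum()
print(daily)
```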


In [12]:
df = unifier.get_dataframe(name='lseg_us_equity_flow_1min',key='AAPL', back_to='2024-01-01', up_to='2024-02-01', limit=100)
display(df)

Unnamed: 0,asof_datetime,timestamp,extended_hours,ticker,trade_count,dollar_volume,method1_inst_buy,method1_inst_sell,method1_inst_buy_count,method1_inst_sell_count,...,method5_inst_buy,method5_inst_sell,method5_inst_buy_count,method5_inst_sell_count,retail_buy,retail_sell,retail_buy_count,retail_sell_count,date,asof_date
0,2024-01-26 04:01:03.000,2024-01-26 04:01:00.000,1,AAPL,46,107649.18,39783.39,67865.79,17,29,...,47974.09,59675.09,0,5,0,0,0,0,2024-01-26,2024-01-26
1,2024-01-26 04:02:03.000,2024-01-26 04:02:00.000,1,AAPL,26,156397.20,78198.60,78198.60,12,12,...,78198.60,78198.60,0,0,0,0,0,0,2024-01-26,2024-01-26
2,2024-01-26 04:03:03.000,2024-01-26 04:03:00.000,1,AAPL,30,88152.24,19099.65,69052.59,6,23,...,44076.12,44076.12,0,0,0,0,0,0,2024-01-26,2024-01-26
3,2024-01-26 04:04:03.000,2024-01-26 04:04:00.000,1,AAPL,14,60497.49,12963.75,47533.74,3,11,...,30248.75,30248.75,0,0,0,0,0,0,2024-01-26,2024-01-26
4,2024-01-26 04:05:03.000,2024-01-26 04:05:00.000,1,AAPL,26,156928.33,93553.43,63374.90,15,10,...,78464.16,78464.16,0,0,0,0,0,0,2024-01-26,2024-01-26
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2024-01-26 05:37:03.000,2024-01-26 05:37:00.000,1,AAPL,8,3685.76,0.00,3685.76,0,8,...,1842.88,1842.88,4,4,0,0,0,0,2024-01-26,2024-01-26
96,2024-01-26 05:38:03.000,2024-01-26 05:38:00.000,1,AAPL,4,1551.83,0.00,1551.83,0,4,...,387.96,1163.87,1,3,0,0,0,0,2024-01-26,2024-01-26
97,2024-01-26 05:39:03.000,2024-01-26 05:39:00.000,1,AAPL,5,4656.37,931.27,3725.10,1,4,...,4656.37,0.00,5,0,0,0,0,0,2024-01-26,2024-01-26
98,2024-01-26 05:40:03.000,2024-01-26 05:40:00.000,1,AAPL,6,13581.75,6790.88,6790.88,3,3,...,9054.50,4527.25,4,2,0,0,0,0,2024-01-26,2024-01-26


In [13]:
import pandas as pd

In [14]:
def plot_flow(
    flow_df,
    price_df,
    method="method5_inst",
    title="Flow Visualization",
    zscore_window=60,
    start_date=None,
    end_date=None,
    show_fig=True,
    use_deviation=False,
    short_window=10,
    long_window=60,
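):
    """
    Plot cumulative institutional net flow against price, together with
    buy/sell volumes and a rolling z-score of net flow.

    flow_df: equity-flow rows for a single ticker with `{method}_buy` and
        `{method}_sell` columns. For 1-minute flow the index should be the
        bar timestamp; for daily flow it should be the date.
    price_df: prices with a `close` column on a datetime index that uses the
        same timezone convention (naive or tz-aware) as flow_df.
    use_deviation: if True, replace net flow with the difference between its
        short and long moving averages before plotting.
    Returns the plotly Figure.
    """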
    import pandas as pd
    import numpy as np
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    # Work on copies so the caller's DataFrames are not modified, then ensure datetime indexes
    flow_df = flow_df.copy()
    flow_df.index = pd.to_datetime(flow_df.index)

    price_df = price_df.copy()
    price_df.index = pd.to_datetime(price_df.index)

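    def _to_index_time(ts, index):
        # Localize naive bounds when the index is tz-aware so comparisons don't raise
        ts = pd.to_datetime(ts)
        if getattr(index, "tz", None) is not None and ts.tzinfo is None:
            ts = ts.tz_localize(index.tz)
        return ts

    # Filter by date
    if start_date:
        flow_df = flow_df[flow_df.index >= _to_index_time(start_date, flow_df.index)]
        price_df = price_df[price_df.index >= _to_index_time(start_date, price_df.index)]
    if end_date:
        flow_df = flow_df[flow_df.index <= _to_index_time(end_date, flow_df.index)]
        price_df = price_df[price_df.index <= _to_index_time(end_date, price_df.index)]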

    # Align to shared datetime range (.copy() avoids SettingWithCopyWarning when adding columns)
    min_timestamp = max(flow_df.index.min(), price_df.index.min())
    max_timestamp = min(flow_df.index.max(), price_df.index.max())
    flow_df = flow_df[(flow_df.index >= min_timestamp) & (flow_df.index <= max_timestamp)].copy()
    price_df = price_df[(price_df.index >= min_timestamp) & (price_df.index <= max_timestamp)]

    # Net flow = buy - sell; optionally detrend it as short MA minus long MA
    flow_df[f'{method}_netflow'] = flow_df[f'{method}_buy'] - flow_df[f'{method}_sell']
    if use_deviation:
        short_ma = flow_df[f'{method}_netflow'].rolling(window=short_window).mean()
        long_ma = flow_df[f'{method}_netflow'].rolling(window=long_window).mean()
        flow_df[f'{method}_netflow'] = short_ma - long_ma
        # Drop only the moving-average warm-up rows, not rows with NaNs in unrelated columns
        flow_df = flow_df.dropna(subset=[f'{method}_netflow'])

    # Calculate z-score
    rolling_mean = flow_df[f'{method}_netflow'].rolling(zscore_window).mean()
    rolling_std = flow_df[f'{method}_netflow'].rolling(zscore_window).std().replace(0, np.nan)
    flow_df['zscore'] = (flow_df[f'{method}_netflow'] - rolling_mean) / rolling_std

    # Format datetime for category plotting
    flow_df['date_str'] = flow_df.index.astype(str)

    # Forward-fill price across flow timestamps so they share the same x-axis
    price_aligned = flow_df[['date_str']].copy()
    price_aligned['price'] = price_df['close'].reindex(flow_df.index, method='ffill')

    # Set up plot
    fig = make_subplots(
        rows=3, cols=1,
        shared_xaxes=True,
        row_heights=[0.5, 0.25, 0.25],
        vertical_spacing=0.08,
        specs=[[{"secondary_y": True}], [{}], [{}]]
    )

    # Row 1: Cumulative net flow
    name = "Cumulative NetFlow" if not use_deviation else "Cumulative NetFlow (detrended)"
    fig.add_trace(
        go.Scatter(
            x=flow_df['date_str'],
            y=flow_df[f'{method}_netflow'].cumsum(),
            name=name,
            line=dict(color='blue'),
            fill='tozeroy',
            fillcolor='rgba(0, 0, 255, 0.1)'
        ),
        row=1, col=1,
        secondary_y=False
    )

    # Price line (aligned to flow timestamps)
    fig.add_trace(
        go.Scatter(
            x=price_aligned['date_str'],
            y=price_aligned['price'],
            name="Price",
            line=dict(color='black'),
        ),
        row=1, col=1,
        secondary_y=True
    )

    # Row 2: Inst buy and sell
    fig.add_trace(
        go.Scatter(
            x=flow_df['date_str'],
            y=flow_df[f'{method}_buy'],
            name="Inst Buy",
            line=dict(color='green'),
            opacity=0.3
        ),
        row=2, col=1
    )
    fig.add_trace(
        go.Scatter(
            x=flow_df['date_str'],
            y=-flow_df[f'{method}_sell'],
            name="Inst Sell",
            line=dict(color='red'),
            opacity=0.3
        ),
        row=2, col=1
    )

    # Row 3: Z-score
    fig.add_trace(
        go.Scatter(
            x=flow_df['date_str'],
            y=flow_df['zscore'],
            name=f"Z-Score (window={zscore_window})",
            line=dict(color='#1E88E5')
        ),
        row=3, col=1
    )
    for level in [-2, 2]:
        fig.add_hline(y=level, line=dict(color="gray", dash="dash"), row=3, col=1)

    # Layout
    fig.update_layout(
        template="plotly_white",
        width=1200,
        height=900,
        images=[
            dict(
                source="Exponential-Title-Wide.png",
                xref="paper",
                yref="paper",
                x=0.5,
                y=0.5,
                sizex=0.8,
                sizey=0.5,
                xanchor="center",
                yanchor="middle",
                sizing="contain",
                opacity=0.12,
                layer="below",
            )
        ],
        title=dict(text=title, x=0.5, font=dict(size=20)),
        font=dict(size=14),
        legend=dict(orientation="h", y=1.02, x=1, xanchor="right", yanchor="bottom"),
        hovermode="x unified"
    )

    # X-axis: category mode avoids gaps over closed-market periods.
    # Reduce the number of tick labels by labelling every Nth point.
    max_ticks = 10
    all_ticks = flow_df['date_str'].tolist()
    tick_step = max(1, len(all_ticks) // max_ticks)
    sparse_ticks = all_ticks[::tick_step]

    fig.update_xaxes(type='category', tickangle=45, tickvals=sparse_ticks, row=1, col=1)
    fig.update_xaxes(type='category', tickangle=45, tickvals=sparse_ticks, row=2, col=1)
    fig.update_xaxes(title="Datetime", type='category', tickangle=45, tickvals=sparse_ticks, row=3, col=1)


    # Y-axes
    fig.update_yaxes(title="Net Flow (cumsum)", row=1, col=1, secondary_y=False)
    fig.update_yaxes(title="Price", row=1, col=1, secondary_y=True)
    fig.update_yaxes(title="Buy / Sell", row=2, col=1)
    fig.update_yaxes(title="Z-Score", row=3, col=1)

    if show_fig:
        fig.show()

    return fig
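The third panel plots a z-score of the flow series, with reference lines at ±2. The code that computes `flow_df['zscore']` sits earlier in `plot_flow` and is not shown in this section. The following is therefore only a minimal sketch that assumes a conventional rolling z-score: the series minus its rolling mean, divided by its rolling standard deviation over `zscore_window` rows. The helper name `rolling_zscore` is hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_zscore(series: pd.Series, window: int) -> pd.Series:
    """Rolling z-score: (x - rolling mean) / rolling std over `window` rows.

    Hypothetical helper. It is an assumption that plot_flow computes its
    'zscore' column this way.
    """
    mean = series.rolling(window=window, min_periods=window).mean()
    std = series.rolling(window=window, min_periods=window).std()
    # A flat window has zero std; return NaN there instead of inf
    return (series - mean) / std.replace(0, np.nan)


# Usage on a toy net-flow series; the first window - 1 values are NaN
net_flow = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
z = rolling_zscore(net_flow, window=3)
print(z.round(3).tolist())
```

Under this assumption, readings beyond the ±2 lines mark flow that is unusually large relative to its own recent history.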


### 1-minute flow data

In [15]:
# pip install yfinance

In [16]:
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta

# Fixed 7-day window starting 15 days ago (Yahoo Finance serves 1-minute bars
# only for recent dates, in requests of up to about 7 days)
start_date = datetime.today() - timedelta(days=15)
end_date = start_date + timedelta(days=7)

start_str = start_date.strftime('%Y-%m-%d')
end_str = end_date.strftime('%Y-%m-%d')

# === Load 1-minute equity flow data from Unifier ===
flow_df = unifier.get_dataframe(
    name='lseg_us_equity_flow_1min',
    key='AAPL',
    back_to=start_str,
    up_to=end_str
)
flow_df.set_index('timestamp', inplace=True)
flow_df.index = pd.to_datetime(flow_df.index)
flow_df.sort_index(inplace=True)

# === Load 1-minute price data from yfinance ===
price_df = yf.download(
    'AAPL',
    interval='1m',
    start=start_str,
    end=end_str,
    progress=False
)
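# Keep only the close; assigning the columns also flattens the (Price, Ticker)
# MultiIndex that recent yfinance versions return
price_df = price_df[['Close']]
price_df.columns = ['close']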
price_df.index = pd.to_datetime(price_df.index)
price_df.sort_index(inplace=True)

# yfinance returns a tz-aware intraday index; express it in US/Eastern
price_df.index = price_df.index.tz_convert('US/Eastern')

# The flow timestamps are assumed to be tz-naive Eastern time; localize them to US/Eastern
flow_df.index = flow_df.index.tz_localize('US/Eastern')

# Filter both frames to the regular session: 9:30 AM up to (not including) 4:00 PM
price_df = price_df[((price_df.index.hour > 9) & (price_df.index.hour < 16)) |
                    ((price_df.index.hour == 9) & (price_df.index.minute >= 30))]

flow_df = flow_df[((flow_df.index.hour > 9) & (flow_df.index.hour < 16)) |
                  ((flow_df.index.hour == 9) & (flow_df.index.minute >= 30))]

# === Align both frames to their overlapping time range ===
min_timestamp = max(flow_df.index.min(), price_df.index.min())
max_timestamp = min(flow_df.index.max(), price_df.index.max())

flow_df = flow_df[(flow_df.index >= min_timestamp) & (flow_df.index <= max_timestamp)]
price_df = price_df[(price_df.index >= min_timestamp) & (price_df.index <= max_timestamp)]




In [17]:
fig = plot_flow(flow_df, price_df, method="method5_inst")
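The alignment cell before this plot converts the tz-aware yfinance index to US/Eastern, localizes the flow index (assumed to be naive Eastern wall-clock time), and keeps only the regular session with hour/minute masks. pandas' `DataFrame.between_time` expresses the same 9:30 to 15:59 window more compactly on a `DatetimeIndex`. The sketch below runs on small synthetic frames rather than the Unifier and yfinance data, and `regular_session` is a hypothetical helper name.

```python
import pandas as pd


def regular_session(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows stamped 09:30 through 15:59 (inclusive) in the index's local time."""
    return df.between_time("09:30", "15:59")


# Synthetic naive 1-minute timestamps standing in for the flow data
flow = pd.DataFrame(
    {"net": [0, 1, 2, 3]},
    index=pd.to_datetime(["2024-03-01 09:29", "2024-03-01 09:30",
                          "2024-03-01 15:59", "2024-03-01 16:00"]),
)
# Assumption about the source: naive wall-clock times are Eastern, so just label them
flow.index = flow.index.tz_localize("US/Eastern")

# Synthetic UTC-stamped prices; tz_convert changes the labels, not the instants
price = pd.DataFrame(
    {"close": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2024-03-01 14:30", "2024-03-01 20:59",
                          "2024-03-01 21:00"]).tz_localize("UTC"),
)
price.index = price.index.tz_convert("US/Eastern")

flow, price = regular_session(flow), regular_session(price)

# Restrict both frames to their overlapping time window
start = max(flow.index.min(), price.index.min())
end = min(flow.index.max(), price.index.max())
flow, price = flow.loc[start:end], price.loc[start:end]
print(flow.index.strftime("%H:%M").tolist(), price.index.strftime("%H:%M").tolist())
```

The asymmetry matters: `tz_localize` attaches a zone to naive timestamps without moving them, while `tz_convert` re-expresses already-aware timestamps in another zone, which is why 14:30 UTC becomes 09:30 Eastern on this winter date.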

### Daily flow data

In [18]:
# pip install yfinance

In [19]:
# Load daily equity flow data for AAPL from Unifier
flow_df = unifier.get_dataframe(
    name='lseg_us_equity_flow_daily',
    key='AAPL',
    back_to='2022-04-05',
    up_to='2025-07-11'
)
flow_df.set_index('date', inplace=True)
flow_df.index = pd.to_datetime(flow_df.index)
flow_df.sort_index(inplace=True)

# Load daily price data from yfinance (end is exclusive, so 2025-07-12 includes 2025-07-11)
import yfinance as yf
price_df = yf.download('AAPL', start='2022-04-05', end='2025-07-12')
# Keep only the close; assigning the columns also flattens yfinance's (Price, Ticker) MultiIndex
price_df = price_df[['Close']]
price_df.columns = ['close']
price_df.index = pd.to_datetime(price_df.index)
price_df.sort_index(inplace=True)

# Align price and flow on the dates present in both
common_dates = flow_df.index.intersection(price_df.index).sort_values()
flow_df = flow_df.loc[common_dates]
price_df = price_df.loc[common_dates]





YF.download() has changed argument auto_adjust default to True



In [20]:
fig = plot_flow(flow_df, price_df, method="method5_inst", zscore_window=20)
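Aligning on the intersection of the two date indexes silently drops any date that appears in only one source. `DatetimeIndex.difference` is a quick way to see what was dropped, and an inner `join` yields the same date set as the intersection with flow and price side by side in one frame. The sketch uses small synthetic frames with made-up values; with the notebook's data you would pass `flow_df` and `price_df` (before the intersection step) instead.

```python
import pandas as pd

# Synthetic stand-ins: each frame has one date the other lacks
flow = pd.DataFrame({"net": [1, 2, 3]},
                    index=pd.to_datetime(["2024-07-03", "2024-07-04", "2024-07-05"]))
price = pd.DataFrame({"close": [10.0, 11.0, 12.0]},
                     index=pd.to_datetime(["2024-07-02", "2024-07-03", "2024-07-05"]))

# Dates that an intersection-based alignment would drop
only_flow = flow.index.difference(price.index)
only_price = price.index.difference(flow.index)
print("flow only:", list(only_flow.strftime("%Y-%m-%d")))
print("price only:", list(only_price.strftime("%Y-%m-%d")))

# An inner join keeps the shared dates and puts both series in one frame
combined = flow.join(price, how="inner")
print(combined)
```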

### LSEG US Equity Flow 1min Sample Query

In [None]:
import os
import time
import pandas as pd
from unifier import unifier


# ---- CONFIG ----
dataset_name = 'lseg_us_equity_flow_1min'


# ---- USER INPUT ----
# Set exact date range (YYYY-MM-DD)
back_to_date = "2024-03-01"   # start date
up_to_date   = "2024-03-05"   # end date


base_directory = os.getcwd()
output_folder = os.path.join(base_directory, dataset_name)
os.makedirs(output_folder, exist_ok=True)


# ---- TIMER START ----
start_time = time.time()


print(f"Querying data from {back_to_date} to {up_to_date}")


try:
    df = unifier.get_dataframe(
        name=dataset_name,
        key=None,
        asof_date=None,
        back_to=back_to_date,
        up_to=up_to_date,
        limit=None
    )

    if df is None or df.empty:
        print("⚠ No data returned for the given date range")
    else:
        if "timestamp" in df.columns:
            df = df.sort_values("timestamp")

        # ---- SAVE AS CSV ----
        filename = f"{dataset_name}_{back_to_date}_to_{up_to_date}.csv"
        filepath = os.path.join(output_folder, filename)

        df.to_csv(filepath, index=False)
        print(f"✓ Saved {len(df)} rows to {filename}")

        # ---- OUTPUT LINK ----
        abs_path = os.path.abspath(filepath)
        file_url = f"file://{abs_path}"
        print(f"🔗 CSV file link: {file_url}")

except Exception as e:
    print(f"✗ Error querying data: {e}")


# ---- TIMER END ----
elapsed = time.time() - start_time
print(f"\nAll files saved to: {output_folder}")
print(f"⏱ Total runtime: {elapsed:.2f} seconds ({elapsed/60:.2f} minutes)")


Querying data from 2024-03-01 to 2024-03-02
✓ Saved 2345856 rows to lseg_us_equity_flow_1min_2024-03-01_to_2024-03-02.csv
🔗 CSV file link: file:///Users/wyattcanderson06/Desktop/Entitlements/lseg_us_equity_flow_1min/lseg_us_equity_flow_1min_2024-03-01_to_2024-03-02.csv

All files saved to: /Users/wyattcanderson06/Desktop/Entitlements/lseg_us_equity_flow_1min
⏱ Total runtime: 135.78 seconds (2.26 minutes)


##### When you run this query, it downloads the 1-minute LSEG US Equity Flow data for the specified date range (no `key` is passed, so the result is not limited to one ticker) and saves it as a CSV file in a folder named after the dataset inside the current working directory. As written, the query covers 2024-03-01 to 2024-03-05; the output shown above comes from an earlier run over 2024-03-01 to 2024-03-02. We recommend downloading about four days at a time to avoid overloading the system. To download a different range, change the `back_to_date` and `up_to_date` variables.
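A longer period can be fetched in the recommended four-day pieces by splitting it into consecutive windows and running the query cell once per window. The sketch below only builds the windows. `date_windows` is a hypothetical helper, and because this notebook does not state whether `up_to` is inclusive, adjacent windows share a boundary date; drop duplicate rows after concatenating the results if that produces overlap.

```python
from datetime import date, timedelta


def date_windows(start: str, end: str, days: int = 4) -> list[tuple[str, str]]:
    """Split the range start..end into consecutive windows of at most `days` days.

    Each window's end date is the next window's start date, so the windows abut.
    """
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    windows = []
    while lo < hi:
        nxt = min(lo + timedelta(days=days), hi)
        windows.append((lo.isoformat(), nxt.isoformat()))
        lo = nxt
    return windows


for back_to_date, up_to_date in date_windows("2024-03-01", "2024-03-15"):
    # Set these two values and run the query/save cell once per window
    print(back_to_date, up_to_date)
```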

### LSEG US Equity Flow Daily Sample Query

In [None]:
import os
import time
import pandas as pd
from unifier import unifier  # make sure this is imported if not already


# ---- CONFIG ----
dataset_name = 'lseg_us_equity_flow_daily'


# ✅ Set the dates you want to query here
back_to_date = "2022-01-01"
up_to_date   = "2025-01-01"


base_directory = os.getcwd()
output_folder = os.path.join(base_directory, dataset_name)
os.makedirs(output_folder, exist_ok=True)


# ---- TIMER START ----
start_time = time.time()


print(f"Querying data from {back_to_date} to {up_to_date}")


try:
    df = unifier.get_dataframe(
        name=dataset_name,
        key=None,
        asof_date=None,
        back_to=back_to_date,
        up_to=up_to_date,
        limit=None
    )

    if df is None or df.empty:
        print(f"⚠ No data returned between {back_to_date} and {up_to_date}")
    else:
        # Sort chronologically; the daily dataset is keyed by 'date'
        for time_col in ("date", "timestamp"):
            if time_col in df.columns:
                df = df.sort_values(time_col)
                break

        # ---- SAVE AS CSV ----
        filename = f"{dataset_name}_{back_to_date}_to_{up_to_date}.csv"
        filepath = os.path.join(output_folder, filename)

        df.to_csv(filepath, index=False)
        print(f"✓ Saved {len(df)} rows to {filename}")

except Exception as e:
    print(f"✗ Error querying data: {e}")


# ---- TIMER END ----
elapsed = time.time() - start_time
print(f"\nAll files saved to: {output_folder}")
print(f"⏱ Total runtime: {elapsed:.2f} seconds ({elapsed/60:.2f} minutes)")


Querying data from 2022-01-01 to 2023-01-01
✓ Saved 2823381 rows to lseg_us_equity_flow_daily_2022-01-01_to_2023-01-01.csv

All files saved to: /Users/wyattcanderson06/Desktop/Entitlements/lseg_us_equity_flow_daily
⏱ Total runtime: 187.72 seconds (3.13 minutes)


##### When you run this query, it downloads the daily LSEG US Equity Flow data for the specified date range and saves it as a CSV file in a folder named after the dataset inside the current working directory. As written, the query covers 2022-01-01 to 2025-01-01; the output shown above comes from an earlier run over 2022-01-01 to 2023-01-01. We recommend downloading at most three years at a time to avoid overloading the system. To download a different range, change the `back_to_date` and `up_to_date` variables.
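The bulk files are large: the sample runs above each saved millions of rows. If you only need a few tickers afterwards, `pandas.read_csv` with `chunksize` lets you filter a saved CSV without loading it all into memory at once. This sketch assumes the file has a `symbol` column, as the dataset description states; the helper name `load_symbols`, the `value` column, and the tiny demo file are hypothetical.

```python
import os
import tempfile

import pandas as pd


def load_symbols(path: str, symbols: set[str], chunksize: int = 500_000) -> pd.DataFrame:
    """Stream a CSV in chunks and keep only rows whose 'symbol' is in `symbols`."""
    with pd.read_csv(path, chunksize=chunksize) as reader:
        parts = [chunk[chunk["symbol"].isin(symbols)] for chunk in reader]
    return pd.concat(parts, ignore_index=True)


# Demo on a small temporary file shaped like the saved output
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    pd.DataFrame({
        "date": ["2022-01-03"] * 3 + ["2022-01-04"] * 3,
        "symbol": ["AAPL", "MSFT", "TSLA"] * 2,
        "value": range(6),  # hypothetical payload column
    }).to_csv(demo_path, index=False)
    subset = load_symbols(demo_path, {"AAPL", "MSFT"}, chunksize=2)
    print(subset)
```

With the real files, pass the CSV path printed by the query cell; a larger `chunksize` trades memory for fewer iterations.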