# `Part 3/3: Order Book depth data collection with Binance API`

This series of notebooks showcases methods for collecting real-time candle, trade and order book data from the Binance exchange.

Data is collected through websockets and then stored in SQL databases.

Possible applications of the stored data include:
* creation of custom alerts for discretionary trading 
* development of a fully automated systematic trading system based on a set of predetermined rules

#### Author: Vladislav Semin

## 1. Import libraries

In [1]:
import time
import calendar
from datetime import datetime

import pandas as pd #to create a dataframe

import websocket
from binance.client import Client # Import the Binance Client
# Note: the binance.websockets module exists only in python-binance releases before 1.0
from binance.websockets import BinanceSocketManager # Import the Binance Socket Manager

import sqlalchemy as db
from sqlalchemy import create_engine

import itertools # Necessary for the OB data structure we will create

## 2. Setup Binance websocket connection

**IMPORTANT NOTE:**
This kernel will not run without API keys. Register an account on the Binance exchange website to get your API keys:
* https://www.binance.com/en/register

In [2]:
# Input Binance API keys (replace the empty strings with your own keys)
PUBLIC = ''
SECRET = ''
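
To keep keys out of a shared notebook, they can be read from environment variables instead of being typed into a cell. The following is a minimal sketch, not part of the original notebook: the helper name `load_api_keys` and the variable names `BINANCE_API_KEY` and `BINANCE_API_SECRET` are assumptions chosen for illustration.

```python
import os


def load_api_keys(public_var="BINANCE_API_KEY", secret_var="BINANCE_API_SECRET"):
    """Return (public, secret) read from environment variables.

    The default variable names are hypothetical; use whatever names you exported.
    """
    public = os.environ.get(public_var, "")
    secret = os.environ.get(secret_var, "")
    if not public or not secret:
        # Fail early with a clear message instead of a NameError later on
        raise RuntimeError(
            "Set {} and {} before running the notebook".format(public_var, secret_var)
        )
    return public, secret


# Usage (assumes both variables were exported in the shell that started Jupyter):
# PUBLIC, SECRET = load_api_keys()
```

With this approach the keys never appear in the saved notebook file.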

In [3]:
# Setup Binance websocket connection
client = Client(api_key=PUBLIC, api_secret=SECRET)
bm = BinanceSocketManager(client)

## 3. Select cryptocurrency pair

In [4]:
# We will be collecting data for the following cryptocurrency pair
pair = 'ETHBTC'

## 4. Setup SQL database

In [5]:
# Name of SQL database
depth_db = 'binance_depth_{}'.format(pair)

In [6]:
# Set up engine to append the values to the SQL database
engine = create_engine('sqlite:///{}.db'.format(depth_db), echo=False) 

## 5. Set up data collection format

#### Create a custom data structure to extract and store OB data

In [7]:
# process_message_depth is the callback function required by the .start_depth_socket method of the BinanceSocketManager class
def process_message_depth(msg):

    # engine, pair and depth_db are module-level names that are only read here,
    # so no global declarations are needed

    # Local timestamp at the moment the depth message is processed, formatted with microseconds
    timestamp = datetime.now()
    time_ = timestamp.strftime("%Y-%m-%d %H:%M:%S.%f")

    
    # Creating a data structure to store OB data
    # First, we deal with the bid side: prices (P) and quantities (Q)
    bids = msg['bids']

    # Flatten the [price, quantity] pairs into one list: [p1, q1, p2, q2, ...]
    bid_vals = list(itertools.chain.from_iterable(bids))

    # Reverse the values so that bids run from farthest to closest to the market price.
    # Reversing also swaps each pair, so quantity now precedes price: [q20, p20, ..., q1, p1]
    bid_vals.reverse()

    # Create a list of (price key, quantity key) tuples for levels 1 to 20 (our preliminary keys)
    b_k_t = [('Bid(-{}) P'.format(x), 'Bid(-{}) Q'.format(x)) for x in range(1, 21)]

    # Flatten the tuples into a list of keys
    bid_keys = list(itertools.chain.from_iterable(b_k_t))

    # Reverse the keys the same way so that they line up with the reversed values:
    # ['Bid(-20) Q', 'Bid(-20) P', ..., 'Bid(-1) Q', 'Bid(-1) P']
    bid_keys.reverse()

    # Zip the key and value lists to produce a dictionary of the 20 closest bids
    bids_dict = dict(zip(bid_keys, bid_vals))
    
    
    # Second, we deal with the ask side: prices (P) and quantities (Q)
    asks = msg['asks']

    ask_vals = list(itertools.chain.from_iterable(asks))

    # Asks already arrive ordered from closest to farthest from the market price,
    # so neither the keys nor the values need to be reversed
    a_k_t = [('Ask(+{}) P'.format(x), 'Ask(+{}) Q'.format(x)) for x in range(1, 21)]

    ask_keys = list(itertools.chain.from_iterable(a_k_t))

    # Zip the key and value lists to produce a dictionary of the 20 closest asks
    asks_dict = dict(zip(ask_keys, ask_vals))
    
    
    # Third, we merge the asks dictionary into the bids dictionary
    bids_dict.update(asks_dict)

    # Fourth, we collect supplementary time and identification data
    time_depth_dict = {"OB @ Time": time_, "Timestamp": timestamp,
                       "lastUpdateId": msg['lastUpdateId'], "Pair": pair}

    # Finally, we merge the bids and asks dictionary into the time dictionary to get the final OB row
    time_depth_dict.update(bids_dict)
               
        
    depth_stream_df = pd.DataFrame([time_depth_dict]).set_index('OB @ Time')  
    
    # real-time export of streaming dataframes to SQLite database        
    depth_stream_df.to_sql(depth_db, if_exists="append", con=engine)
    
    print('### {} OB depth data updated at {}'.format(pair, time_))
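
The key and value ordering used in the callback is easier to see on a small input. The sketch below repeats the same flatten, reverse and zip steps as a pure function and runs it on a synthetic two-level message. The function name `depth_to_row`, the `levels` parameter and the sample numbers are made up for illustration; the sketch also assumes the message contains exactly `levels` entries per side, since fewer entries would misalign the reversed bid keys and values.

```python
import itertools


def depth_to_row(msg, levels=20):
    """Flatten a partial-depth message into one {column name: value} dictionary.

    Assumes msg['bids'] and msg['asks'] each hold exactly `levels`
    [price, quantity] pairs, ordered from closest to farthest from the market.
    """
    # Bids: flatten to [p1, q1, p2, q2, ...] and reverse to [qN, pN, ..., q1, p1]
    bid_vals = list(itertools.chain.from_iterable(msg['bids']))[::-1]
    bid_keys = list(itertools.chain.from_iterable(
        ('Bid(-{}) P'.format(i), 'Bid(-{}) Q'.format(i))
        for i in range(1, levels + 1)))[::-1]

    # Asks: keep the original order, price before quantity
    ask_vals = list(itertools.chain.from_iterable(msg['asks']))
    ask_keys = list(itertools.chain.from_iterable(
        ('Ask(+{}) P'.format(i), 'Ask(+{}) Q'.format(i))
        for i in range(1, levels + 1)))

    row = dict(zip(bid_keys, bid_vals))
    row.update(zip(ask_keys, ask_vals))
    return row


# Synthetic two-level message (made-up numbers, same shape as the depth payload)
sample = {'lastUpdateId': 1,
          'bids': [['0.0222', '5.0'], ['0.0221', '7.0']],
          'asks': [['0.0223', '1.5'], ['0.0224', '2.5']]}

row = depth_to_row(sample, levels=2)
for key, value in row.items():
    print(key, value)
```

The printed keys start at the farthest bid and end at the farthest ask, which is why the `Bid` columns in the stored table list quantity before price while the `Ask` columns list price before quantity.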

## 6. Collect OB depth data

In [8]:
# Connect to collect data
# Select depth equal to 20: this means we get 20 bid and 20 ask quotes
conn_key_depth = bm.start_depth_socket(pair, process_message_depth, 
                                 depth=BinanceSocketManager.WEBSOCKET_DEPTH_20)
# Start data collection
bm.start()

# For testing purposes, streaming period is set to 15 seconds. Max connection time is 24 hrs.
time.sleep(15) 

bm.stop_socket(conn_key_depth)

bm.close()

### ETHBTC OB depth data updated at 2020-05-07 00:37:04.742808
### ETHBTC OB depth data updated at 2020-05-07 00:37:05.744521
### ETHBTC OB depth data updated at 2020-05-07 00:37:06.743008
### ETHBTC OB depth data updated at 2020-05-07 00:37:07.742335
### ETHBTC OB depth data updated at 2020-05-07 00:37:08.742666
### ETHBTC OB depth data updated at 2020-05-07 00:37:09.742945
### ETHBTC OB depth data updated at 2020-05-07 00:37:10.741422
### ETHBTC OB depth data updated at 2020-05-07 00:37:11.741885
### ETHBTC OB depth data updated at 2020-05-07 00:37:12.743192
### ETHBTC OB depth data updated at 2020-05-07 00:37:13.743545
### ETHBTC OB depth data updated at 2020-05-07 00:37:14.744350
### ETHBTC OB depth data updated at 2020-05-07 00:37:15.743678
### ETHBTC OB depth data updated at 2020-05-07 00:37:16.745036
### ETHBTC OB depth data updated at 2020-05-07 00:37:17.745333


## 7. Check the data appended to SQL database

In [9]:
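# Select and show the data stored in the SQL database
# (Engine.execute was removed in SQLAlchemy 2.0, so we query through a connection)
with engine.connect() as conn:
    binance_depth = conn.execute(db.text("SELECT * FROM " + depth_db)).fetchall()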

print(binance_depth)

[('2020-05-07 00:37:04.742808', '2020-05-07 00:37:04.742808', 1195301386, 'ETHBTC', '5.57500000', '0.02220300', '4.21100000', '0.02220400', '272.46300000', '0.02220800', '6.96800000', '0.02220900', '0.84200000', '0.02221100', '52.94900000', '0.02221200', '2.28100000', '0.02221300', '108.00000000', '0.02221400', '7.69900000', '0.02221600', '8.99500000', '0.02221700', '5.23500000', '0.02221800', '0.84200000', '0.02221900', '16.27000000', '0.02222000', '9.63600000', '0.02222100', '9.19700000', '0.02222200', '11.37000000', '0.02222300', '7.52900000', '0.02222400', '8.99500000', '0.02222500', '1.21200000', '0.02222600', '7.01500000', '0.02222700', '0.02222900', '6.45700000', '0.02223400', '0.19100000', '0.02223500', '0.00800000', '0.02223600', '1.14600000', '0.02224100', '6.37600000', '0.02224200', '16.96400000', '0.02224300', '0.63900000', '0.02224500', '104.29300000', '0.02224600', '0.05000000', '0.02224700', '1.83600000', '0.02224800', '18.99100000', '0.02224900', '21.88600000', '0.02225
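
The raw tuples are hard to read without column names. As a minimal sketch using only the standard library, the names can be taken from the cursor itself rather than typed by hand. The helper `fetch_rows_as_dicts` and the miniature table `demo_depth` are hypothetical stand-ins; against the notebook's data one would presumably connect to the `binance_depth_ETHBTC.db` file created earlier.

```python
import sqlite3


def fetch_rows_as_dicts(conn, table):
    """Return every row of `table` as a {column name: value} dictionary."""
    # Table names cannot be bound as SQL parameters, so the name is quoted by hand;
    # only pass table names you control.
    cur = conn.execute('SELECT * FROM "{}"'.format(table))
    # cursor.description holds one tuple per column; the name is its first item
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Hypothetical miniature table standing in for the real depth table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo_depth '
             '("OB @ Time" TEXT, "Pair" TEXT, "Bid(-1) P" TEXT, "Ask(+1) P" TEXT)')
conn.execute('INSERT INTO demo_depth VALUES (?, ?, ?, ?)',
             ('2020-05-07 00:37:04.742808', 'ETHBTC', '0.02222700', '0.02222900'))

rows = fetch_rows_as_dicts(conn, 'demo_depth')
print(rows[0]['Pair'], rows[0]['Bid(-1) P'], rows[0]['Ask(+1) P'])
```

Column names that contain spaces or parentheses, such as `OB @ Time`, must be wrapped in double quotes when they appear in SQL statements.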

In [10]:
# Alternatively, we can load the SQL data into a pandas DataFrame
df = pd.DataFrame(binance_depth, columns=["OB @ Time", "Timestamp", "lastUpdateId", "Pair", 
                                          'Bid(-20) Q', 'Bid(-20) P', 'Bid(-19) Q', 'Bid(-19) P',
                                          'Bid(-18) Q', 'Bid(-18) P', 'Bid(-17) Q', 'Bid(-17) P',
                                          'Bid(-16) Q', 'Bid(-16) P', 'Bid(-15) Q', 'Bid(-15) P', 
                                          'Bid(-14) Q', 'Bid(-14) P', 'Bid(-13) Q', 'Bid(-13) P', 
                                          'Bid(-12) Q', 'Bid(-12) P', 'Bid(-11) Q', 'Bid(-11) P', 'Bid(-10) Q',
                                          'Bid(-10) P', 'Bid(-9) Q', 'Bid(-9) P', 'Bid(-8) Q', 
                                          'Bid(-8) P', 'Bid(-7) Q', 'Bid(-7) P', 'Bid(-6) Q',
                                          'Bid(-6) P', 'Bid(-5) Q', 'Bid(-5) P', 'Bid(-4) Q',
                                          'Bid(-4) P', 'Bid(-3) Q', 'Bid(-3) P', 'Bid(-2) Q', 
                                          'Bid(-2) P', 'Bid(-1) Q', 'Bid(-1) P', 'Ask(+1) P', 
                                          'Ask(+1) Q', 'Ask(+2) P', 'Ask(+2) Q', 'Ask(+3) P', 
                                          'Ask(+3) Q', 'Ask(+4) P', 'Ask(+4) Q', 'Ask(+5) P',
                                          'Ask(+5) Q', 'Ask(+6) P', 'Ask(+6) Q', 'Ask(+7) P',
                                          'Ask(+7) Q', 'Ask(+8) P', 'Ask(+8) Q', 'Ask(+9) P',
                                          'Ask(+9) Q', 'Ask(+10) P', 'Ask(+10) Q', 'Ask(+11) P',
                                          'Ask(+11) Q', 'Ask(+12) P', 'Ask(+12) Q', 'Ask(+13) P',
                                          'Ask(+13) Q', 'Ask(+14) P', 'Ask(+14) Q', 'Ask(+15) P',
                                          'Ask(+15) Q', 'Ask(+16) P', 'Ask(+16) Q', 'Ask(+17) P',
                                          'Ask(+17) Q', 'Ask(+18) P', 'Ask(+18) Q', 'Ask(+19) P',
                                          'Ask(+19) Q', 'Ask(+20) P', 'Ask(+20) Q']).set_index('OB @ Time')

df

OB @ Time,Timestamp,lastUpdateId,Pair,Bid(-20) Q,Bid(-20) P,Bid(-19) Q,Bid(-19) P,Bid(-18) Q,Bid(-18) P,Bid(-17) Q,...,Ask(+16) P,Ask(+16) Q,Ask(+17) P,Ask(+17) Q,Ask(+18) P,Ask(+18) Q,Ask(+19) P,Ask(+19) Q,Ask(+20) P,Ask(+20) Q
2020-05-07 00:37:04.742808,2020-05-07 00:37:04.742808,1195301386,ETHBTC,5.575,0.022203,4.211,0.022204,272.463,0.022208,6.968,...,0.022254,16.719,0.022255,18.117,0.022256,14.31,0.022257,259.158,0.022258,15.027
2020-05-07 00:37:05.744521,2020-05-07 00:37:05.744521,1195301413,ETHBTC,5.775,0.022203,4.211,0.022204,272.463,0.022208,6.968,...,0.022257,259.158,0.022258,15.027,0.022259,39.785,0.02226,7.591,0.022261,0.08
2020-05-07 00:37:06.743008,2020-05-07 00:37:06.743008,1195301435,ETHBTC,5.575,0.022203,4.211,0.022204,272.463,0.022208,2.39,...,0.022257,259.158,0.022258,15.027,0.02226,7.591,0.022261,0.08,0.022262,9.597
2020-05-07 00:37:07.742335,2020-05-07 00:37:07.742335,1195301475,ETHBTC,5.575,0.022203,4.211,0.022204,200.0,0.022206,272.463,...,0.022261,0.08,0.022262,9.597,0.022263,0.039,0.022264,0.293,0.022265,5.533
2020-05-07 00:37:08.742666,2020-05-07 00:37:08.742666,1195301498,ETHBTC,0.2,0.022202,5.575,0.022203,4.211,0.022204,200.0,...,0.02226,7.591,0.022261,0.08,0.022262,9.597,0.022263,0.039,0.022264,0.293
2020-05-07 00:37:09.742945,2020-05-07 00:37:09.742945,1195301538,ETHBTC,14.684,0.0222,14.582,0.022201,0.2,0.022202,5.575,...,0.022259,7.232,0.02226,7.591,0.022261,0.08,0.022262,9.597,0.022263,0.039
2020-05-07 00:37:10.741422,2020-05-07 00:37:10.741422,1195301580,ETHBTC,10.107,0.022201,0.2,0.022202,28.575,0.022203,4.211,...,0.022259,0.856,0.02226,7.591,0.022261,0.08,0.022262,9.597,0.022263,0.039
2020-05-07 00:37:11.741885,2020-05-07 00:37:11.741885,1195301601,ETHBTC,28.575,0.022203,4.211,0.022204,7.285,0.022205,200.0,...,0.022259,0.856,0.02226,7.591,0.022261,0.08,0.022262,9.597,0.022263,0.039
2020-05-07 00:37:12.743192,2020-05-07 00:37:12.743192,1195301610,ETHBTC,4.211,0.022204,7.285,0.022205,200.0,0.022206,272.463,...,0.022259,0.856,0.02226,7.591,0.022261,0.08,0.022262,9.597,0.022263,0.039
2020-05-07 00:37:13.743545,2020-05-07 00:37:13.743545,1195301622,ETHBTC,4.211,0.022204,7.285,0.022205,200.0,0.022206,272.463,...,0.022258,15.027,0.022259,0.856,0.02226,7.591,0.022261,0.08,0.022262,9.597
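
Prices and quantities arrive from the exchange as strings, so they need converting before any arithmetic. The sketch below shows one way to derive the bid-ask spread and mid price from the `Bid(-1) P` and `Ask(+1) P` columns. The helper name `top_of_book` is hypothetical, and using `Decimal` rather than `float` is a design choice of this sketch, made to avoid rounding noise at eight decimal places.

```python
from decimal import Decimal


def top_of_book(row):
    """Compute the spread and mid price from one stored order book row."""
    best_bid = Decimal(row['Bid(-1) P'])   # highest bid price
    best_ask = Decimal(row['Ask(+1) P'])   # lowest ask price
    return {'best_bid': best_bid,
            'best_ask': best_ask,
            'spread': best_ask - best_bid,
            'mid': (best_bid + best_ask) / 2}


# Best bid and ask prices taken from the first stored row shown earlier
sample_row = {'Bid(-1) P': '0.02222700', 'Ask(+1) P': '0.02222900'}

book = top_of_book(sample_row)
print('spread:', book['spread'], 'mid:', book['mid'])
```

A custom alert, one of the applications mentioned in the introduction, could compare `book['spread']` against a threshold each time a new row is stored.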


## References:
#### Binance API Python libraries used in this demonstration:

https://gist.github.com/alexbrillant/961502146a7fc5d03205f9b07b8535f5 - Binance Socket Manager class and its methods

https://github.com/binance-exchange/python-binance