# Pythonic vs classical coding

In this notebook, we compare the pandas library with a "classical" style of coding. By classical coding we mean writing Python as if it were a low-level language such as C, with explicit loops over rows. We show that idiomatic pandas code can make a simple labelling routine more than 50 times faster.

In [1]:
import pandas as pd
import numpy as np
import datetime
from IPython.core.debugger import set_trace

In [2]:
PATH_DATA = '/Users/xavier/Projets/crypto/data/'

path_data = PATH_DATA+'binance/4hour/BTCUSDT_4hour.csv'

data = pd.read_csv(path_data)

## Classic way of coding:

In [3]:
def create_labels(df, col_name, window_size=11):
    """
    Data is labeled as per the logic in the research paper
    Label code : BUY => 1, SELL => 0, HOLD => 2
    params :
        df => Dataframe with data
        col_name => name of column which should be used to determine strategy
        window_size => number of rows in each window (should be odd)
    returns : numpy array of length len(df) with label codes; the first and
              last window_size//2 rows have no full window and stay NaN
    """

    row_counter = 0
    total_rows = len(df)
    labels = np.zeros(total_rows)
    labels[:] = np.nan
    print("Calculating labels")

    while row_counter < total_rows:
        if row_counter >= window_size - 1:
            window_begin = row_counter - (window_size - 1)
            window_end = row_counter
            window_middle = (window_begin + window_end) / 2

            min_ = np.inf
            min_index = -1
            max_ = -np.inf
            max_index = -1
            for i in range(window_begin, window_end + 1):
                price = df.iloc[i][col_name]
                if price < min_:
                    min_ = price
                    min_index = i
                if price > max_:
                    max_ = price
                    max_index = i

            if max_index == window_middle:
                labels[int(window_middle)] = 0
            elif min_index == window_middle:
                labels[int(window_middle)] = 1
            else:
                labels[int(window_middle)] = 2

        row_counter = row_counter + 1

    return labels

## "Pythonic" code:

In [4]:
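def get_peak(func):
    """Decorator: turn a rolling arg-extremum function into a peak finder."""
    def algo(data, window):
        # Position of the extremum inside each centered window (NaN at the edges)
        max_local = func(data, window)
        max_local = np.asarray(max_local)
        # Keep the rows where the extremum sits in the middle of its window
        max_local_valid = np.where(max_local==int(window/2))[0]
        return max_local_valid
    return algo
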
@get_peak
def get_max_peak(data, window):
    return data.rolling(window, center=True).apply(lambda x: np.where(x==np.max(x))[0][0], raw=True)

@get_peak
def get_min_peak(data, window):
    return data.rolling(window, center=True).apply(lambda x: np.where(x==np.min(x))[0][0], raw=True)

def return_min_max_peak(data, window=14):
    i_x = get_max_peak(data['close'], window) #i_x stands for Index_maX
    i_n = get_min_peak(data['close'], window) #i_n stands for Index_miN
    return (i_x,i_n)
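
To make the rolling trick above easier to follow, here is a minimal sketch on a toy series. The prices and the window of 3 are made up for illustration, and `np.argmax`/`np.argmin` stand in for the `np.where(...)[0][0]` lambdas (both return the first position of the extremum).

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices, chosen so the peaks are easy to see by eye.
prices = pd.Series([1.0, 3.0, 2.0, 0.5, 2.5, 4.0, 3.5])
window = 3

# Position (0 .. window-1) of the extremum inside each centered window.
# Rows without a full window (first and last row here) come back as NaN.
pos_of_max = prices.rolling(window, center=True).apply(np.argmax, raw=True)
pos_of_min = prices.rolling(window, center=True).apply(np.argmin, raw=True)

# A row is a local extremum when the extremum sits in the middle of its window.
peaks = np.where(pos_of_max.to_numpy() == window // 2)[0]
troughs = np.where(pos_of_min.to_numpy() == window // 2)[0]

print(peaks)    # rows 1 and 5 (prices 3.0 and 4.0)
print(troughs)  # row 3 (price 0.5)
```

The NaN rows at the edges never compare equal to `window // 2`, so they are dropped without any special handling.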

### We now test the performance of the two algorithms

First we test the classic approach, which uses explicit loops:

In [5]:
begin_time = datetime.datetime.now()
label = create_labels(data, 'close') 
end_time = datetime.datetime.now()
time_classic = end_time-begin_time

Calculating labels


In [6]:
time_classic

datetime.timedelta(seconds=6, microseconds=441868)

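Then we test the pandas version:

In [7]:
begin_time = datetime.datetime.now()
label = np.full(len(data), 2.0)  # default: HOLD => 2
i_x, i_n = return_min_max_peak(data,window=11)
label[i_n]=1  # local minimum: BUY => 1
label[i_x]=0  # local maximum: SELL => 0
end_time = datetime.datetime.now()
time_pythonic = end_time-begin_time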

In [8]:
time_pythonic

datetime.timedelta(microseconds=109017)

In [9]:
time_classic/time_pythonic

59.090490473962774
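
A single `datetime.now()` measurement can be noisy. As a hedged alternative, the sketch below times a callable several times with `time.perf_counter` and keeps the fastest run. The helper `best_time` and the two stand-in workloads are hypothetical names introduced here for illustration; they are not part of the notebook above.

```python
import time

def best_time(func, repeat=5):
    """Return the fastest wall-clock time in seconds over `repeat` calls of func()."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workloads (assumptions for this sketch): an explicit loop
# versus the equivalent built-in call.
def loop_sum():
    total = 0
    for i in range(100_000):
        total += i
    return total

def builtin_sum():
    return sum(range(100_000))

t_loop = best_time(loop_sum)
t_builtin = best_time(builtin_sum)
print(f"loop: {t_loop:.6f} s, builtin: {t_builtin:.6f} s, ratio: {t_loop / t_builtin:.1f}")
```

In this notebook the same helper could be called as `best_time(lambda: create_labels(data, 'close'))` and `best_time(lambda: return_min_max_peak(data, window=11))`.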

## Conclusion

On this dataset, the pandas version runs about 59 times faster than the loop-based version (roughly 0.11 s versus 6.4 s).