##### algom/playbooks

# etl

ETL pipeline for asset prices (OHLCV), standard indicators and engineered features. Loads output data to [BigQuery](https://console.cloud.google.com/bigquery?project=algomosaic-nyc&p=algomosaic-nyc&page=project).

#### VERSION i04

Includes
1. Price action data, calculated as log % difference since most recent Open
2. Volume change, calculated as log % change since most recent period
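
As a rough illustration of the two definitions above, the sketch below computes both log changes with pandas and NumPy. The column names `open`, `close` and `volume` match the output table in this notebook, but the helper `log_change_features` and its output column names are hypothetical. The actual logic in `get_features_hour_i04` may differ, for example in which Open counts as the most recent one.

```python
import numpy as np
import pandas as pd


def log_change_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add illustrative log-change features to an OHLCV frame sorted oldest-first."""
    out = df.copy()
    # Price action: log ratio of the close to the same bar's open
    # (an assumed reading of "most recent Open").
    out["close_vs_open"] = np.log(out["close"] / out["open"])
    # Volume change: log ratio of this period's volume to the previous period's.
    out["volume_change"] = np.log(out["volume"] / out["volume"].shift(1))
    return out


# Toy values for illustration only, not market data.
sample = pd.DataFrame({
    "open": [100.0, 110.0, 121.0],
    "close": [110.0, 121.0, 121.0],
    "volume": [10.0, 20.0, 10.0],
})
features = log_change_features(sample)
print(features[["close_vs_open", "volume_change"]])
```

Log changes are additive across periods, which is one common reason to prefer them over plain percentage changes for lagged features.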


#### Steps

1. Initialize ETL process
2. Specify data and feature libraries (optional)
3. Run ETL process without loading to BigQuery
4. Run ETL process and load to BigQuery

<br>

In [1]:
from src.extract import ticker_extract

<br><br>

### BTC-USDT -- hour -- i04 -- 2017-2020

In [2]:
years = [
    2017,
    2018,
    2019,
    2020,
]
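
Each year in this list becomes one ETL run covering January 1 of that year to January 1 of the next, written to a year-suffixed table. The sketch below shows how those values might expand. The helpers `year_window` and `resolve_table` are hypothetical, and the assumption that the template is filled with `str.format` and that `-` is replaced by `_` is inferred from the table names in the log output, not from the library source.

```python
years = [2017, 2018, 2019, 2020]

# Template string as passed to the ETL call in this notebook.
TABLE_TEMPLATE = "train_features.features_{ticker}_{interval}_{iteration}_{year}0101"


def year_window(year):
    """Return (start_date, end_date) strings spanning one calendar year."""
    return "{}-01-01".format(year), "{}-01-01".format(year + 1)


def resolve_table(year, ticker="BTC-USDT", interval="hour", iteration="i04"):
    """Fill the table template; hyphen replacement is an assumption based on the logs."""
    name = TABLE_TEMPLATE.format(
        ticker=ticker, interval=interval, iteration=iteration, year=year
    )
    return name.replace("-", "_")


for year in years:
    start, end = year_window(year)
    print(start, end, resolve_table(year))
```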

In [4]:
iteration = 'i04'

# Run the ETL once per calendar year; each run loads its own BigQuery table.
for year in years:
    print("RUNNING: {}.".format(year))
    model = ticker_extract.run_extract_process(
        ticker='BTC-USDT',
        start_date='{}-01-01'.format(year),
        end_date='{}-01-01'.format(year+1),
        project='algom-trading',
        destination_table='train_features.features_{ticker}_{interval}_{iteration}_{year}0101',
        table_params={
            'ticker': 'BTC-USDT',
            'interval': 'hour',
            'iteration': iteration,
            'year': str(year)
        },
        interval='hour',
        exchange='binance',
        data_library='src.extract.cryptocompare_ticker_data',
        features_library='src.features.algom_trading_v001.get_features_hour_{}'.format(iteration),
        to_bq=True,
    )

# `model` holds only the last run, so this shows the final year in `years`.
model.data.df.tail()

RUNNING: 2017.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2018-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2017-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2017-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2017-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2017-02-01 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i04.




RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:15, 15.24s/it]


SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i04_20170101 has been loaded to BigQuery. Runtime: 0:00:21.430378.
RUNNING: 2018.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2019-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2018-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2018-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2018-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2018-02-01 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i04.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [01:01, 61.56s/it]


SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i04_20180101 has been loaded to BigQuery. Runtime: 0:01:05.869798.
RUNNING: 2019.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2020-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2019-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2019-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2019-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2019-02-01 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i04.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:33, 33.88s/it]


SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i04_20190101 has been loaded to BigQuery. Runtime: 0:00:38.412738.
RUNNING: 2020.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2021-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2020-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2020-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2020-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2020-02-02 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i04.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:37, 37.59s/it]

SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i04_20200101 has been loaded to BigQuery. Runtime: 0:00:52.103591.
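
The "Extracting n of 5" timestamps in the log step back by exactly 2,000 hours per request, which suggests the extractor pages backwards from `end_date` in fixed-size batches. The sketch below reproduces those timestamps under that assumption. The function `chunk_end_times` and the constant `PAGE_HOURS` are hypothetical and are not part of `src.extract`.

```python
import math
from datetime import datetime, timedelta

PAGE_HOURS = 2000  # assumed page size, inferred from the 2,000-hour gaps in the log


def chunk_end_times(start_date, end_date, page_hours=PAGE_HOURS):
    """Return the end timestamp of each backward page covering start_date..end_date."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    total_hours = (end - start) // timedelta(hours=1)
    # Enough pages to cover the whole window; the oldest page overshoots start_date.
    n_pages = math.ceil(total_hours / page_hours)
    return [end - timedelta(hours=page_hours * i) for i in range(n_pages)]


ends = chunk_end_times("2017-01-01", "2018-01-01")
for i, ts in enumerate(ends, start=1):
    print("Extracting {} of {}: up to {}".format(i, len(ends), ts))
```

A full year of hourly bars is 8,760 rows (8,784 in a leap year), so five pages of 2,000 rows are needed, and the oldest page reaches back before `start_date`.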





| | ticker_time_sec | close | high | low | open | volume_base | volume | conversionType | conversionSymbol | partition_date | ... | volume_lag15 | volume_lag16 | volume_lag17 | volume_lag18 | volume_lag19 | volume_lag20 | volume_lag21 | volume_lag22 | volume_lag23 | volume_lag24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1609444800 | 29126.7 | 29139.65 | 28862.0 | 28897.83 | 1936.48 | 56103301.54 | force_direct | | 2021-01-04 | ... | -0.107137 | -0.357195 | -0.288114 | 0.015676 | -1.283829 | -1.048373 | -0.119924 | 0.010799 | -0.606369 | -0.791053 |
| 3 | 1609448400 | 28966.36 | 29169.55 | 28900.79 | 29126.7 | 2524.47 | 73351462.94 | force_direct | | 2021-01-04 | ... | 0.171483 | 0.158022 | -0.092036 | -0.022955 | 0.280835 | -1.01867 | -0.783214 | 0.145235 | 0.275958 | -0.341209 |
| 2 | 1609452000 | 29100.84 | 29143.73 | 28910.19 | 28966.36 | 1438.51 | 41807122.89 | force_direct | | 2021-01-04 | ... | -0.797065 | -0.39094 | -0.404401 | -0.654459 | -0.585379 | -0.281588 | -1.581093 | -1.345637 | -0.417188 | -0.286465 |
| 1 | 1609455600 | 28923.63 | 29110.35 | 28780.0 | 29100.84 | 1976.42 | 57243040.07 | force_direct | | 2021-01-04 | ... | -0.534471 | -0.479386 | -0.073261 | -0.086722 | -0.33678 | -0.267699 | 0.036091 | -1.263414 | -1.027958 | -0.099509 |
| 0 | 1609459200 | 28995.13 | 29031.34 | 28690.17 | 28923.63 | 2311.81 | 66768830.34 | force_direct | | 2021-01-04 | ... | -0.245201 | -0.377727 | -0.322642 | 0.083483 | 0.070022 | -0.180036 | -0.110956 | 0.192835 | -1.10667 | -0.871214 |
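
The `ticker_time_sec` column in the output above holds Unix timestamps in seconds. A minimal sketch for converting them to readable datetimes with pandas, assuming the values are in UTC (this matches the 2021-01-01 00:00:00 boundary in the extraction log but is not confirmed by the source):

```python
import pandas as pd

# ticker_time_sec values copied from the five rows shown above.
ticker_time_sec = pd.Series(
    [1609444800, 1609448400, 1609452000, 1609455600, 1609459200]
)

# Interpret the integers as seconds since the Unix epoch (UTC is an assumption).
timestamps = pd.to_datetime(ticker_time_sec, unit="s", utc=True)
print(timestamps)

# Consecutive rows should be exactly one hour apart for the 'hour' interval.
gaps = timestamps.diff().dropna()
print(gaps.unique())
```

A check like this is a quick way to confirm that a yearly extract ends on the expected boundary and has no missing hourly bars in the rows inspected.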


In [11]:
# list(model.data.df)
# model.data.df.head()