##### algom/playbooks

# etl

ETL pipeline that extracts asset prices (OHLCV), computes standard indicators and engineered features, and loads the output to [BigQuery](https://console.cloud.google.com/bigquery?project=algomosaic-nyc&p=algomosaic-nyc&page=project).


In [1]:
from src.extract import ticker_extract

In [2]:
# Add list of tickers
tickers = [
#     'ADA-USDT',
#     'BCH-USDT',
#     'BNB-USDT',
    'BTC-USDT',
#     'ETH-USDT',
#     'EOS-USDT',
#     'LTC-USDT',
#     'LINK-USDT',
#     'NEO-USDT',
#     'OMG-USDT',
#     'TRX-USDT',
#     'XRP-USDT',
#     'XLM-USDT',
#     'ZRX-USDT',
]

# Add list of years to process
years = [
    2017,
    2018,
    2019,
    2020,
    2021,
]


In [3]:
iteration = 'i05'
interval = 'hour'

for ticker in tickers:
    for year in years:
        print("RUNNING: {} for {}.".format(ticker, year))
        model = ticker_extract.run_extract_process(
            ticker=ticker,
            start_date='{}-01-01'.format(year),
            end_date='{}-01-01'.format(year+1),
            project_id='algom-trading',
            destination_table='train_features.features_{ticker}_{interval}_{iteration}_{year}0101',
            table_params={
                'ticker': ticker,
                'interval': interval,
                'iteration': iteration,
                'year': str(year)
            },
            interval=interval,
            exchange='binance',
            data_library='src.extract.cryptocompare_ticker_data',
            features_library='src.features.algom_trading_v001.get_features_{}_{}'.format(interval, iteration),
            to_bq=True,
            if_exists='replace'
        )
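
The `destination_table` argument above is passed as a template, with `table_params` supplying the values for its placeholders. The sketch below shows one way such a template could be resolved. It assumes plain `str.format` substitution with hyphens replaced by underscores, which matches the table names reported in the run log (`features_BTC_USDT_hour_i05_20170101`). The helper `resolve_table_name` is hypothetical and is not taken from `ticker_extract`.

```python
def resolve_table_name(template: str, params: dict) -> str:
    """Fill a table-name template. Hypothetical helper, not part of ticker_extract."""
    # Assumption: hyphens in values (e.g. 'BTC-USDT') become underscores,
    # as seen in the table names printed by the run log.
    safe = {key: str(value).replace('-', '_') for key, value in params.items()}
    return template.format(**safe)


template = 'train_features.features_{ticker}_{interval}_{iteration}_{year}0101'
name = resolve_table_name(
    template,
    {'ticker': 'BTC-USDT', 'interval': 'hour', 'iteration': 'i05', 'year': '2017'},
)
print(name)  # train_features.features_BTC_USDT_hour_i05_20170101
```

Keeping the template unformatted at the call site means the same string can be reused for every ticker and year in the loop.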


RUNNING: BTC-USDT for 2017.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2018-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2017-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2017-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2017-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2017-02-01 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i05.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:20, 20.62s/it]


SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i05_20170101 has been loaded to BigQuery. Runtime: 0:00:27.955031.
RUNNING: BTC-USDT for 2018.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2019-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2018-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2018-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2018-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2018-02-01 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i05.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:21, 21.01s/it]


SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i05_20180101 has been loaded to BigQuery. Runtime: 0:00:26.777075.
RUNNING: BTC-USDT for 2019.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2020-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2019-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2019-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2019-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2019-02-01 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i05.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:27, 27.15s/it]


SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i05_20190101 has been loaded to BigQuery. Runtime: 0:00:35.611292.
RUNNING: BTC-USDT for 2020.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2021-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2020-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2020-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2020-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2020-02-02 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i05.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:20, 20.56s/it]


SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i05_20200101 has been loaded to BigQuery. Runtime: 0:00:25.389915.
RUNNING: BTC-USDT for 2021.
RUNNING: algom-trading:train_features.features_{ticker}_{interval}_{iteration}_{year}0101 is being extracted and transformed.
RUNNING: Extracting data using src.extract.cryptocompare_ticker_data.
Extracting 1 of 5: BTC-USDT up to 2022-01-01 00:00:00
Extracting 2 of 5: BTC-USDT up to 2021-10-09 16:00:00
Extracting 3 of 5: BTC-USDT up to 2021-07-18 08:00:00
Extracting 4 of 5: BTC-USDT up to 2021-04-26 00:00:00
Extracting 5 of 5: BTC-USDT up to 2021-02-01 16:00:00
RUNNING: Applying feature engineering using src.features.algom_trading_v001.get_features_hour_i05.
RUNNING: Cleaning final dataset.
SUCCESS: Loaded DataFrame.
RUNNING: loading features into BigQuery.


1it [00:20, 20.40s/it]

SUCCESS: algom-trading:train_features.features_BTC_USDT_hour_i05_20210101 has been loaded to BigQuery. Runtime: 0:00:30.920356.
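
The "Extracting n of 5" timestamps in the log above step backwards from `end_date` in increments of exactly 2,000 hours, which suggests the extractor pages through the data in 2,000-row requests. Five such batches cover 10,000 hours, enough for a full year of hourly bars (8,760 or 8,784 hours). The sketch below reproduces those timestamps. The batch size is inferred from the log spacing rather than read from the source, and `batch_end_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def batch_end_times(end_date, batches=5, rows_per_batch=2000,
                    interval=timedelta(hours=1)):
    """Return the 'up to' timestamp of each backward batch, newest first.

    Hypothetical helper; rows_per_batch=2000 is an assumption inferred
    from the spacing of the timestamps in the run log.
    """
    end = datetime.strptime(end_date, '%Y-%m-%d')
    step = rows_per_batch * interval
    return [end - i * step for i in range(batches)]


for i, ts in enumerate(batch_end_times('2018-01-01'), start=1):
    print('Extracting {} of 5: BTC-USDT up to {}'.format(i, ts))
```

The same arithmetic accounts for the final 2020 batch ending on 2020-02-02 rather than 2020-02-01, because 2020 is a leap year.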





In [4]:
# model.data.df[[h for h in list(model.data.df) if 'Vortex' in h]]
model.data.df.sample(5)


| | ticker_time_sec | close | high | low | open | volume_base | volume | conversionType | conversionSymbol | partition_date | ... | MIN_48 | MIN_72 | MIN_168 | MIN_240 | MAX_12 | MAX_24 | MAX_48 | MAX_72 | MAX_168 | MAX_240 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1577 | 1605657600 | 17693.64 | 17820.0 | 17601.0 | 17659.38 | 5133.74 | 90948263.3 | force_direct | | 2021-01-22 | ... | 15935.15 | 15845.1 | 15359.49 | 14931.53 | 17808.66 | 17808.66 | 17808.66 | 17808.66 | 17808.66 | 17808.66 |
| 1922 | 1604415600 | 13710.38 | 13798.0 | 13676.65 | 13784.59 | 4314.15 | 59310310.71 | force_direct | | 2021-01-22 | ... | 13297.56 | 13297.56 | | | 13784.59 | 13784.59 | 13817.46 | 13866.99 | | |
| 1361 | 1606435200 | 17393.73 | 17400.0 | 17012.56 | 17149.47 | 4072.21 | 70145776.53 | force_direct | | 2021-01-22 | ... | 16516.87 | 16516.87 | 16516.87 | 16516.87 | 17393.73 | 18799.32 | 19291.59 | 19384.43 | 19384.43 | 19384.43 |
| 525 | 1609444800 | 29126.7 | 29139.65 | 28862.0 | 28897.83 | 1936.48 | 56103301.54 | force_direct | | 2021-01-22 | ... | 26913.12 | 26356.22 | 23413.51 | 22558.42 | 29136.49 | 29155.25 | 29155.25 | 29155.25 | 29155.25 | 29155.25 |
| 596 | 1609189200 | 26627.3 | 26958.64 | 26620.4 | 26866.97 | 2273.35 | 60999372.31 | force_direct | | 2021-01-22 | ... | 26223.51 | 24584.0 | 22558.42 | 22307.5 | 27276.11 | 27276.11 | 27822.16 | 27822.16 | 27822.16 | 27822.16 |
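
The `MIN_n` and `MAX_n` columns in the sample look like trailing-window minima and maxima over `n` hourly bars, with empty cells where a row has fewer than `n` bars of history. The sketch below illustrates that idea without dependencies. It assumes the window runs over `close` and includes the current row, neither of which is confirmed by the notebook. In pandas the equivalent would be `Series.rolling(n).min()` and `Series.rolling(n).max()`.

```python
import math


def rolling_extreme(values, window, func):
    """Apply func (min or max) over a trailing window that includes the current row.

    Rows with fewer than `window` observations yield NaN, mirroring the
    empty MIN_168 / MAX_168 cells in the sample. Hypothetical helper.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            out.append(func(values[i + 1 - window:i + 1]))
    return out


close = [10.0, 12.0, 11.0, 9.0, 13.0]  # illustrative prices, not pipeline data
min_3 = rolling_extreme(close, 3, min)
max_3 = rolling_extreme(close, 3, max)
print(min_3)  # [nan, nan, 10.0, 9.0, 9.0]
print(max_3)  # [nan, nan, 12.0, 12.0, 13.0]
```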
