# Model training

#### BTC-USDT for ROR_n25

<br>

#### Performance by Model ID


```
model_id: 1a198235336cc9cd417004b752cc80ffdf7b2705
    - n_estimators = 500
    - max_depth = 10
    - Mean Absolute Error:	 0.01207
    - Mean Absolute Outcome:	 0.03295
    - Mean Absolute Percent Error:	 0.36637
    - Error Variance:	 0.00042
    - R-Squared:		 0.81834

```


```
model_id: 68778f094c78320b4e733e2ed4744de98e76e6cc
    - n_estimators = 500
    - max_depth = 20
    - Mean Absolute Error:	 0.0133
    - Mean Absolute Outcome:	 0.03295
    - Mean Absolute Percent Error:	 0.40354
    - Error Variance:	 0.00051
    - R-Squared:		 0.77968

```


```
model_id: aa877090e5ae618c09838b02e4f398cc4d52d6c9
    - n_estimators = 500
    - max_depth = 30
    - Mean Absolute Error:	 0.01255
    - Mean Absolute Outcome:	 0.03295
    - Mean Absolute Percent Error:	 0.38099
    - Error Variance:	 0.00051
    - R-Squared:		 0.78235

```
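
The summary figures above can be reproduced from a vector of outcomes and a vector of predictions. The sketch below is an assumption about how the metrics are defined, not the `algom` implementation. In particular, it treats "Mean Absolute Percent Error" as the mean absolute error divided by the mean absolute outcome, which is consistent with the reported values (for the first model, 0.01207 / 0.03295 ≈ 0.366). The helper name `summarize_performance` and the toy data are hypothetical.

```python
import numpy as np


def summarize_performance(y_true, y_pred):
    """Return summary metrics for a regression model (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true

    mean_abs_error = np.mean(np.abs(errors))
    mean_abs_outcome = np.mean(np.abs(y_true))
    # Assumption: "percent error" is the MAE scaled by the mean absolute outcome.
    mean_abs_pct_error = mean_abs_error / mean_abs_outcome
    error_var = np.var(errors)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    rsquared = 1.0 - ss_res / ss_tot

    return {
        "mean_abs_error": mean_abs_error,
        "mean_abs_outcome": mean_abs_outcome,
        "mean_abs_pct_error": mean_abs_pct_error,
        "error_var": error_var,
        "rsquared": rsquared,
    }


# Toy data for illustration only; not taken from the models above.
performance = summarize_performance(
    y_true=[0.02, -0.04, 0.03, -0.01],
    y_pred=[0.01, -0.03, 0.05, -0.02],
)
for name, value in performance.items():
    print(f"{name}: {value:.5f}")
```

Scaling the error by the mean absolute outcome, rather than dividing row by row, avoids blow-ups when individual returns are close to zero.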


<br>

#### Resources
+ [BigQuery](https://console.cloud.google.com/bigquery?folder=&organizationId=&project=algomosaic-nyc)
+ [Storage](https://console.cloud.google.com/storage/browser/algom-trading/models/?project=algomosaic-nyc)



<br> 

#### Requirements


In [1]:
import numpy as np
from algom import configs
from algom.utils.data_object import dataObject
from algom.model_regression import modelRegression
from algom.model_plots import modelPlots

<br>

### Load training data


In [2]:
# Load model data
data = dataObject("""
SELECT DISTINCT * EXCEPT (
  conversionType,
  conversionSymbol,
  partition_date)
FROM `algom-trading.train_features.features_BTC_USDT_hour_i02_*`
WHERE 
  _table_suffix in (
    '20170101',
    '20180101',
    '20190101')
AND year BETWEEN 2017 AND 2019
AND close IS NOT NULL
""")

RUNNING: Querying SQL script.


Downloading: 100%|██████████| 28762/28762 [00:41<00:00, 687.93rows/s] 

SUCCESS: Loaded SQL query.





In [3]:
# Replace infinite values with NaN, then drop every row that contains a NaN
data.df = data.df.replace([np.inf, -np.inf], np.nan).dropna()
print(len(data.df))
data.df.head()

21562


|  | ticker_time_sec | close | high | low | open | volume_base | volume | etl_time | ticker_time | ticker | ... | MACDsign_9_12 | MACDdiff_9_12 | MACD_26_200 | MACDsign_26_200 | MACDdiff_26_200 | MACD_20_200 | MACDsign_20_200 | MACDdiff_20_200 | Mass_Index_9_25 | SO_pct_k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1504022400 | 4503.92 | 4584.0 | 4503.92 | 4570.0 | 30.83 | 139924.8 | 2021-01-12 09:13:55.699499+00:00 | 2017-08-29 16:00:00+00:00 | BTC-USDT | ... | 4470.849428 | -4450.499025 | 341.281305 | 4470.849428 | -4129.568123 | 358.909066 | 4470.849428 | -4111.940362 | 24.906926 | 0.0 |
| 1 | 1504026000 | 4555.55 | 4555.55 | 4496.95 | 4503.92 | 36.1 | 163394.15 | 2021-01-12 09:13:55.699499+00:00 | 2017-08-29 17:00:00+00:00 | BTC-USDT | ... | 4487.789542 | -4466.660714 | 348.085887 | 4487.789542 | -4139.703656 | 367.401453 | 4487.789542 | -4120.38809 | 25.005133 | 1.0 |
| 2 | 1504029600 | 4536.63 | 4555.55 | 4511.45 | 4555.55 | 25.24 | 114433.49 | 2021-01-12 09:13:55.699499+00:00 | 2017-08-29 18:00:00+00:00 | BTC-USDT | ... | 4497.557634 | -4477.425219 | 352.853952 | 4497.557634 | -4144.703682 | 373.046776 | 4497.557634 | -4124.510858 | 25.040033 | 0.570975 |
| 3 | 1504033200 | 4555.55 | 4555.55 | 4521.01 | 4536.63 | 33.56 | 152388.6 | 2021-01-12 09:13:55.699499+00:00 | 2017-08-29 19:00:00+00:00 | BTC-USDT | ... | 4509.156107 | -4489.444416 | 358.178048 | 4509.156107 | -4150.978059 | 379.363758 | 4509.156107 | -4129.792349 | 25.012151 | 1.0 |
| 4 | 1504036800 | 4577.54 | 4597.0 | 4535.09 | 4555.55 | 13.59 | 62093.41 | 2021-01-12 09:13:55.699499+00:00 | 2017-08-29 20:00:00+00:00 | BTC-USDT | ... | 4522.832886 | -4502.997583 | 364.2048 | 4522.832886 | -4158.628085 | 386.538231 | 4522.832886 | -4136.294654 | 25.022061 | 0.685673 |
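
The cleaning step in `In [3]` converts infinite values to `NaN` and then drops every row that contains a `NaN`; here it reduces the 28,762 downloaded rows to 21,562. A minimal sketch of the same pattern on a toy DataFrame (the column names and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame: one clean row, one row with +inf, one with NaN, one with -inf.
toy = pd.DataFrame({
    "close": [4503.92, 4555.55, np.nan, 4577.54],
    "ratio": [0.5, np.inf, 1.2, -np.inf],
})

# Same two-step pattern as the notebook cell.
cleaned = toy.replace([np.inf, -np.inf], np.nan).dropna()
print(len(toy), "->", len(cleaned))  # prints "4 -> 1"
```

Because `dropna()` removes a row when any column is missing, a single sparse feature can discard many rows. `dropna(subset=[...])` restricts the check to chosen columns if that becomes a problem.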


In [4]:
# list(data.df)

<br>

### Initialize modelRegression class

In [5]:
# Initialize model class
model = modelRegression(
    data, 
    outcome='ROR_n24',
    index_features=configs.INDEX_FEATURES, 
    omit_features=configs.OMIT_FEATURES
)

# Specify regression parameters
from sklearn import ensemble
reg = ensemble.GradientBoostingRegressor(
    loss='ls',  # renamed 'squared_error' in scikit-learn 1.0; 'ls' was removed in 1.2
    learning_rate=0.1,
    n_estimators=500,
    subsample=.9,
    criterion='friedman_mse', 
    min_samples_split=3, 
    min_samples_leaf=3,
    min_weight_fraction_leaf=0.0,
    max_depth=10,
    min_impurity_decrease=0.0, 
    min_impurity_split=None,  # removed in scikit-learn 1.0; delete this argument on newer versions
    init=None,
    random_state=None, 
    max_features=None, 
    alpha=0.9,  # only used by the 'huber' and 'quantile' losses
    verbose=0,
    max_leaf_nodes=None, 
    warm_start=False, 
    validation_fraction=0.1, 
    n_iter_no_change=None, 
    tol=0.0001
)


# Train model
model.train(reg)


SUCCESS: Loaded dataObject.
Initialized model. As a next step, run self.predict() or self.train().
Training model on ROR_n24.
Model metadata added to `self.metadata.metadata`
Model metadata added to `self.metadata.parameters`
Set feature_importance to `self.feature_importance.feature_importance`
Fit model in 0:00:00.000018.
Get model performance.
Set evaluation to self.evaluations in 0:00:00.000018.
Set R^2 to `self.rsquared`
The following performance measures have been added:
                - self.mean_abs_error
                - self.mean_abs_outcome
                - self.mean_abs_pct_error
                - self.error_var
            
Performance metrics added to `self.performance`

MODEL PERFORMANCE SUMMARY
        - Mean Absolute Error:	 0.01013
        - Mean Absolute Outcome:	 0.03236
        - Mean Absolute Percent Error:	 0.31295
        - Error Variance:	 0.00027
        - R-Squared:		 0.87886
        
PLOT PREDICTIONS: Use the following commandsto view model performance.
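
For readers without access to the `algom` package, the training step can be approximated with scikit-learn alone. This is a sketch under assumptions, not a description of what `modelRegression.train()` does internally: the data is synthetic, the chronological hold-out split is an illustrative choice, and `n_estimators` and `max_depth` are reduced so the example runs quickly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and the rate-of-return outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 0.03 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(scale=0.005, size=500)

# shuffle=False keeps row order, so the last 20% acts as a later hold-out period.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

reg = GradientBoostingRegressor(
    learning_rate=0.1,
    n_estimators=100,   # the notebook uses 500
    subsample=0.9,
    min_samples_split=3,
    min_samples_leaf=3,
    max_depth=3,        # the notebook uses 10
    random_state=0,     # fixed seed so the sketch is repeatable
)
reg.fit(X_train, y_train)

y_pred = reg.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```

The notebook does not state whether its R-squared of 0.87886 was measured on held-out rows. With 500 trees of depth 10, an in-sample score can be much higher than an out-of-sample one, so a split like the one above is worth adding before comparing models.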
 

In [None]:
# Model IDs
print('model_id: ' + model.model_id)
print('model_execution_id: ' + model.model_execution_id)


In [6]:
# Save model (optional)
model.save()


Dumped model to:
	/home/jovyan/algomosaic/data/models/20210112_GradientBoostingRegressor_7959fdd354a37ab43d2786edc7a6b041edb9c5f5.pickle
Uploaded pickle to Google Storage:
	https://storage.googleapis.com/algom-trading-sto/models/20210112_GradientBoostingRegressor_7959fdd354a37ab43d2786edc7a6b041edb9c5f5.pickle
SUCCESS: Loaded DataFrame.


1it [00:05,  5.93s/it]

Uploaded storage metadata to Google BigQuery:
	metadata.model_storage_YYYYMMDD
Saved model to Google Storage:
	models/20210112_GradientBoostingRegressor_7959fdd354a37ab43d2786edc7a6b041edb9c5f5.pickle
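
The file names in the output follow the pattern `<YYYYMMDD>_<estimator class>_<40-character hex ID>.pickle`. The sketch below writes a local pickle with a name of the same shape. How `algom` derives its model ID is not shown in this notebook, so hashing the estimator parameters with SHA-1 is purely an assumption, and `save_model` is a hypothetical helper. The upload to Google Storage and BigQuery is not reproduced.

```python
import hashlib
import json
import pickle
import tempfile
from datetime import date
from pathlib import Path

from sklearn.ensemble import GradientBoostingRegressor


def save_model(model, directory, today=None):
    """Pickle `model` as <YYYYMMDD>_<class>_<sha1>.pickle (hypothetical helper)."""
    today = today or date.today()
    # Assumption: the ID is a SHA-1 digest of the sorted estimator parameters.
    params = json.dumps(model.get_params(), sort_keys=True, default=str)
    model_id = hashlib.sha1(params.encode("utf-8")).hexdigest()
    filename = f"{today:%Y%m%d}_{type(model).__name__}_{model_id}.pickle"
    path = Path(directory) / filename
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


reg = GradientBoostingRegressor(n_estimators=500, max_depth=10)
with tempfile.TemporaryDirectory() as tmp:
    path = save_model(reg, tmp, today=date(2021, 1, 12))
    # Load the model back while the temporary directory still exists.
    with path.open("rb") as handle:
        restored = pickle.load(handle)

print(path.name)
```

Pickles should only be loaded from trusted sources, since unpickling can execute arbitrary code, and a model is best restored with the same scikit-learn version that saved it.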





<br>

### View performance

+ Trending predictions vs outcomes
+ Histogram of predictions vs outcomes


In [None]:
start_date='2019-06-01'
end_date='2019-07-01'
%matplotlib inline
model_plot = modelPlots(model)

In [None]:
model_plot.plot_predictions_by_date(start_date, end_date)

In [None]:
model_plot.plot_errors_by_date(start_date, end_date)

In [None]:
model_plot.plot_predictions_histogram(start_date, end_date)

In [None]:
model_plot.plot_errors_histogram(start_date, end_date)

In [None]:
model_plot.plot_predictions_scatterplot(start_date, end_date)

<br>

### Most important features


In [None]:
features = model.feature_importance.feature_importance
features[0:50]
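
A comparable ranking can be built directly from a fitted scikit-learn estimator through its `feature_importances_` attribute. The sketch below is an illustration on synthetic data, not the code behind `model.feature_importance`: the column names are borrowed from this notebook, but the values and the relationship to the outcome are made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic features; the outcome depends mostly on EMA_20 and weakly on ATR_7.
rng = np.random.default_rng(0)
feature_names = ["EMA_20", "ATR_7", "SO_pct_k", "noise"]
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=feature_names)
y = 0.05 * X["EMA_20"] + 0.01 * X["ATR_7"] + rng.normal(scale=0.002, size=300)

reg = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0)
reg.fit(X, y)

# Rank features from most to least important.
importance = (
    pd.DataFrame({"feature": feature_names, "importance": reg.feature_importances_})
    .sort_values("importance", ascending=False)
    .reset_index(drop=True)
)
print(importance.head(2))  # most important features
print(importance.tail(2))  # least important features
```

Impurity-based importances sum to 1 and describe how often and how usefully a feature was used for splits in training; they are not a measure of out-of-sample predictive value.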


<br>

### Least important features

In [None]:
features[len(features)-20:len(features)]

<br>

### View trending features


In [28]:
start_date = '2016-01-01'
end_date = '2017-01-01'
%matplotlib inline 

# from data_mgmt import data_mgmt as dm

model_plot.plot_features(
    df = model.df, 
    x = 'ticker_time', 
    y = 'ROR_n10',
    start_date=start_date, 
    end_date=end_date
)

AttributeError: 'modelPlots' object has no attribute 'plot_features'
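
The call above fails because this version of `modelPlots` has no `plot_features` method. Until the intended method is identified, a feature can be plotted directly with pandas and matplotlib. The helper below is hypothetical and assumes the DataFrame has a datetime column such as `ticker_time`; the data is synthetic. Note also that the training query restricts `year` to 2017 through 2019, so the 2016 window used above would match few or no rows.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; in a notebook use %matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_feature(df, x, y, start_date, end_date):
    """Line-plot column `y` against datetime column `x` for start <= x < end."""
    dates = pd.to_datetime(df[x], utc=True)
    start = pd.Timestamp(start_date, tz="UTC")
    end = pd.Timestamp(end_date, tz="UTC")
    window = df.loc[(dates >= start) & (dates < end)].sort_values(x)

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(window[x], window[y])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} from {start_date} to {end_date}")
    return ax


# Synthetic hourly series covering 40 days from 2019-06-01.
rng = np.random.default_rng(0)
hours = pd.Timestamp("2019-06-01", tz="UTC") + pd.to_timedelta(np.arange(24 * 40), unit="h")
df = pd.DataFrame({
    "ticker_time": hours,
    "EMA_20": 8000 + np.cumsum(rng.normal(size=len(hours))),
})

ax = plot_feature(df, x="ticker_time", y="EMA_20",
                  start_date="2019-06-01", end_date="2019-07-01")
```

With the real data, `df` would be `model.df` and `y` any feature column, for example `ATR_7` or `EMA_20`.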

In [None]:
model_plot.plot_features(df = model.df, x = 'ticker_time', y = 'ATR_7',
    start_date=start_date, end_date=end_date, chart_type = 'line')

model_plot.plot_features(df = model.df, x = 'ticker_time', y = 'EMA_20',
    start_date=start_date, end_date=end_date, chart_type='line')
