## 2. Pixel-based ensemble model

This method trains an ensemble of a linear model (LogisticRegression) and tree-based models (RandomForest, ExtraTrees, XGBoost) on individual pixel features: spectral band intensities, vegetative indices and time-difference values.

In [1]:
#Run this cell to automatically reload all modules (if they've been externally edited)
%load_ext autoreload
%autoreload 2

In [2]:
#Run this cell to silence warnings (not recommended!)
#Used here to silence LogReg convergence warning
import warnings
warnings.simplefilter('ignore')

### Load custom modules

In [3]:
from modules.process_data import SelectFeatures, Scale, OneHot
from modules.run_models import ModelEnsemble, PixelToObject, make_submission
from modules.metaclassifiers import UnweightedAverage

### Load python modules

In [4]:
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss
import pickle

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.svm import SVC

### Load processed feature datasets

In [6]:
train_data = pd.read_pickle('extracted_data/train_data.pkl')
expanded_pixels_train = pd.read_pickle('processed_data/train/expanded_pixels.pkl')


test_data = pd.read_pickle('extracted_data/test_data.pkl')
expanded_pixels_test = pd.read_pickle('processed_data/test/expanded_pixels.pkl')

### Select features

Select pixel values for each spectral band (B01-B12) as well as the vegetative indices (apart from ARVI) and time-difference values.

In [7]:
sf = SelectFeatures(drop_cols = ['Field_Id', 'Crop_Id_Ne', 'ARVI'])

fit_A = sf.transform([expanded_pixels_train])
predict_A = sf.transform([expanded_pixels_test])

Selected  120  columns 
Use .cols attribute to see all columns

Selected  120  columns 
Use .cols attribute to see all columns



### Pre-process features

* Fill NaN values with the mean of their column

In [8]:
fit_A = fit_A.fillna(fit_A.mean())
predict_A = predict_A.fillna(predict_A.mean())
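
As a small illustration of the cell above: `DataFrame.fillna` accepts a Series of per-column means, so each NaN is replaced by the mean of its own column. The toy frames and column names below are made up for the example. The last line shows one possible variation, which is an assumption and not what this notebook does: reusing the training-set means for the test set, so that both sets are imputed with the same values.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical columns, not the real feature set)
toy_train = pd.DataFrame({"B02": [1.0, np.nan, 3.0], "NDVI": [0.2, 0.4, np.nan]})
toy_test = pd.DataFrame({"B02": [np.nan, 5.0], "NDVI": [0.6, np.nan]})

# Per-column means, computed while skipping NaN (the pandas default)
train_means = toy_train.mean()

# Each NaN is replaced by the mean of its own column
toy_train_filled = toy_train.fillna(train_means)

# Variation (assumption): impute the test set with the training means
toy_test_filled = toy_test.fillna(train_means)
```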

In [9]:
fit_A.head()

| | Field_Id | 0322_B02 | 0804_B04 | 0322_B03 | 0131_B08 | 0322_B04 | 0804_B02 | 0804_B03 | 0819_B08 | 0620_B08 | ... | B02_time_diff_SPRSUM | B02_time_diff_SUMAUT | RVI_time_diff_WINSUM | RVI_time_diff_WINSPR | RVI_time_diff_SPRSUM | RVI_time_diff_SUMAUT | B03_time_diff_WINSUM | B03_time_diff_WINSPR | B03_time_diff_SPRSUM | B03_time_diff_SUMAUT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 942.0 | 1026.0 | 949.0 | 3672.0 | 875.0 | 1071.0 | 943.0 | 2458.0 | 1897.0 | ... | 0.12845 | 0.198495 | -0.482028 | 0.013225 | -0.493474 | -0.034537 | -0.050584 | -0.076848 | 0.005269 | 0.341719 |
| 1 | 1 | 938.0 | 1066.0 | 936.0 | 3849.0 | 833.0 | 1088.0 | 960.0 | 2470.0 | 1921.0 | ... | 0.164179 | 0.174908 | -0.541511 | -0.004492 | -0.555929 | 0.000794 | 0.030272 | -0.055499 | 0.057692 | 0.315152 |
| 2 | 1 | 909.0 | 1094.0 | 890.0 | 3995.0 | 770.0 | 1103.0 | 967.0 | 2470.0 | 1933.0 | ... | 0.194719 | 0.195212 | -0.614657 | -0.022676 | -0.60747 | 0.017715 | 0.112311 | -0.038877 | 0.108989 | 0.325228 |
| 3 | 1 | 873.0 | 1100.0 | 855.0 | 4048.0 | 686.0 | 1119.0 | 954.0 | 2555.0 | 2010.0 | ... | 0.25315 | 0.179159 | -0.671705 | -0.069641 | -0.65898 | 0.077821 | 0.076503 | -0.065574 | 0.139181 | 0.315195 |
| 4 | 1 | 842.0 | 1109.0 | 828.0 | 4159.0 | 614.0 | 1104.0 | 966.0 | 2563.0 | 1989.0 | ... | 0.30285 | 0.179581 | -0.704121 | -0.019632 | -0.707013 | 0.075262 | 0.13164 | -0.04388 | 0.210145 | 0.277445 |


### Define classifiers and metaclassifiers

Use a combination of linear and tree-based models, both for classification and for model stacking.
Note: `UnweightedAv` (the `UnweightedAverage` class) is a custom estimator that simply combines the base predictions by taking their unweighted average. A weighted-average estimator, with weights optimised by Nelder-Mead, was also tested but was prone to overfitting.
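
The real `UnweightedAverage` lives in `modules.metaclassifiers` and is not shown in this notebook. The class below is a hypothetical stand-in that sketches the idea, assuming the metaclassifier receives the base models' class probabilities concatenated column-wise (model 1's classes, then model 2's, and so on). Both the class name and that input layout are assumptions.

```python
import numpy as np

class UnweightedAverageSketch:
    """Hypothetical stand-in for an unweighted-average metaclassifier."""

    def __init__(self, n_classes):
        self.n_classes = n_classes

    def fit(self, X, y=None):
        # Nothing to learn: every base model gets the same weight
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Assumed layout: [model_1 class probs | model_2 class probs | ...]
        n_models = X.shape[1] // self.n_classes
        stacked = X.reshape(X.shape[0], n_models, self.n_classes)
        # Average over the model axis, keeping one probability per class
        return stacked.mean(axis=1)

# Two models, three classes, one sample
probs = np.array([[0.6, 0.3, 0.1, 0.2, 0.5, 0.3]])
averaged = UnweightedAverageSketch(n_classes=3).fit(probs).predict_proba(probs)
```

Because there are no weights to fit, an estimator of this kind cannot overfit the base-model predictions, which is consistent with the overfitting seen for the weighted variant.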

In [10]:
classifiers = {
'LogReg': LogisticRegression(solver='lbfgs', multi_class='multinomial'),
'RandomForest': RandomForestClassifier(n_estimators = 1000),
'ExtraTrees': ExtraTreesClassifier(n_estimators = 1000),
'XGB': XGBClassifier(silent=False, 
                    n_estimators=1000, learning_rate=0.3, 
                    scale_pos_weight=1, colsample_bytree = 0.4, subsample = 0.9, objective='multi:softprob', 
                    eval_metric='mlogloss', reg_alpha = 0.3, max_depth=6, gamma=5)}

metaclassifiers = {
'LogReg': LogisticRegression(solver='lbfgs', multi_class='multinomial'),
'RandomForest': RandomForestClassifier(n_estimators = 1000),
'UnweightedAv': UnweightedAverage(n_classes=9)}

### Fit ensemble

In [None]:
ensemble_A = ModelEnsemble(clfs=classifiers, mclfs=metaclassifiers).fit(fit_A, expanded_pixels_train)

Fitting classifiers... 

Classifier                  Fold 1 Score             Fold 2 Score
---------------------------------------------------------------------
LogReg                          1.108                    1.241
RandomForest                    0.784                    0.932
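
The internals of `ModelEnsemble` live in `modules.run_models` and are not shown here. A common way to build a stacked ensemble, sketched below under that assumption, is to collect out-of-fold class probabilities from each base classifier and use them as input features for the metaclassifier. The data is synthetic, the models are deliberately small, and the scores are log loss over all out-of-fold rows (not per fold as in the output above).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the pixel features and crop labels
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

base_models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Out-of-fold probabilities: each row is predicted by a model that never saw it
oof = {name: cross_val_predict(model, X, y, cv=2, method="predict_proba")
       for name, model in base_models.items()}
scores = {name: log_loss(y, proba) for name, proba in oof.items()}

# The metaclassifier is trained on the concatenated base-model probabilities
meta_features = np.hstack([oof[name] for name in base_models])
meta_model = LogisticRegression(max_iter=1000).fit(meta_features, y)
stacked_proba = meta_model.predict_proba(meta_features)
```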


### Predict ensemble

Make ensemble predictions for each pixel, then aggregate them into predictions for each field (using `PixelToObject`).
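
How `PixelToObject` aggregates is not shown in this notebook. One simple approach, assumed here purely for illustration, is to average the class probabilities of all pixels that share a `Field_Id`. The probability column names and values below are made up.

```python
import pandas as pd

# Toy pixel-level probabilities for two fields (hypothetical values)
pixel_probs = pd.DataFrame({
    "Field_Id": [1, 1, 1, 2, 2],
    "crop_1": [0.7, 0.6, 0.8, 0.2, 0.4],
    "crop_2": [0.3, 0.4, 0.2, 0.8, 0.6],
})

# Assumed aggregation: average the class probabilities over each field's pixels
field_probs = pixel_probs.groupby("Field_Id").mean()
```

Averaging keeps each field's probabilities summing to one, so the result can be scored with log loss in the same way as the pixel-level predictions.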

In [None]:
pixel_predictions = ensemble_A.predict(predict_A)

In [None]:
field_predictions = PixelToObject().transform(ensemble_A, train_data)

### Make submissions

In [None]:
make_submission(field_predictions, 'Ensemble_A')