# Fancy Approach (repeat upsampling)

**Description:**

Experiment that tries to combine statistics computed at different resolutions.

Same as the other Fancy approach, but with a different base dataset. This base was built without bilinear upsampling: pixels were simply repeated, which apparently yields better results.
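
The code that built the dataset is not part of this notebook, so the following is only a minimal sketch of what "repeating pixels" could look like for a low-resolution statistics map. The function name `upsample_repeat` and the factor are hypothetical.

```python
import numpy as np


def upsample_repeat(stat_map: np.ndarray, factor: int) -> np.ndarray:
    """Upsample a 2-D map by repeating every pixel `factor` times on both axes.

    Hypothetical helper: no interpolation is done, so each output block
    holds exactly the value of the source pixel it came from.
    """
    rows_repeated = np.repeat(stat_map, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)


low_res = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
high_res = upsample_repeat(low_res, 2)
print(high_res)
```

Unlike bilinear upsampling, this keeps the original block statistics unchanged and introduces no intermediate values at block borders.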

In [1]:
%load_ext autoreload

import sys
sys.path.insert(0, "/store/tveiga/IC2017/utilities")

import pandas as pd
import numpy as np

from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score

import xgboost as xgb

In [2]:
%autoreload
from IC2017 import *

<hr>

In [3]:
# Read database from disk
data = pd.read_csv("../IC2017_DATA/FancyApproach_dataset-Repeat.csv", header = [0, 1])

In [4]:
# Copy the non-feature columns out of the dataset before removing them for training

y = data["GT"].iloc[:, 0] # iloc turns the single-column frame into a Series
IMG = data["IMG"].iloc[:, 0]
# BLOCK = data["block_num"]
# solo = data["solo"]
base = data["solo"].iloc[:, 0]

#--------------------------

# Remove the non-feature columns from the dataset before training

del data["GT"]
del data["IMG"] # big jump in precision by keeping this feature
del data["solo"]

#--------------------------

solo = base # just for variable name compatibility

<hr>

In [5]:
# Definition of the experiments

LessCompact = [0]
MoreCompact = [1]
Calibrated = [0, 1]
NotCalibrated = [2]
experiments = [[LessCompact, MoreCompact], [MoreCompact, LessCompact],
              [Calibrated, NotCalibrated], [NotCalibrated, Calibrated]]
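
Each entry of `experiments` is read as a pair (groups used for fitting, groups used for testing). The sketch below shows, on toy data, how such a pair can drive a leave-one-image-out loop. The toy values of `solo` and `IMG` and the `summary` list are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import LeaveOneOut

# Toy stand-ins for the notebook's `solo` (group id per row) and `IMG` (image id per row).
solo = pd.Series([0, 0, 0, 0, 1, 1, 2, 2])
IMG = pd.Series([10, 10, 11, 12, 20, 21, 30, 31])

experiments = [([0], [1]), ([1], [0]), ([0, 1], [2]), ([2], [0, 1])]

loo = LeaveOneOut()
summary = []
for fit_groups, test_groups in experiments:
    # Folds are built over image ids, so all rows of one image stay together.
    fit_images = np.unique(IMG[solo.isin(fit_groups)])
    n_folds = 0
    for _in, _out in loo.split(fit_images):
        fit_mask = IMG.isin(fit_images[_in])
        val_mask = IMG.isin(fit_images[_out])
        assert not (fit_mask & val_mask).any()  # no row is in both sets
        n_folds += 1
    test_rows = int(solo.isin(test_groups).sum())
    summary.append((n_folds, test_rows))
    print(fit_groups, test_groups, "folds:", n_folds, "test rows:", test_rows)
```

Splitting by image rather than by row avoids having blocks of the same image on both sides of a fold.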

<hr>

## Individual Feature
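
The cells in this section rely on `getPerformance`, which comes from the local `IC2017` module and is not shown here. The sketch below is an assumption about what such a helper might do: pick the ROC threshold that maximizes the mean of the per-class accuracies. The function `best_threshold_sketch` is hypothetical and is not the module's actual implementation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def best_threshold_sketch(y_true, scores):
    """Hypothetical stand-in for `getPerformance`.

    Returns (auc, best mean accuracy, best threshold, fpr, tpr), where the
    mean accuracy is the average of sensitivity and specificity.
    """
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    mean_acc = (tpr + (1.0 - fpr)) / 2.0
    best = int(np.argmax(mean_acc))
    return roc_auc_score(y_true, scores), mean_acc[best], thresholds[best], fpr, tpr


y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.35, 0.4, 0.8, 0.9])
auc, mean_acc, th, fpr, tpr = best_threshold_sketch(y_true, scores)

# roc_curve treats `score >= threshold` as positive; the notebook applies
# `prediction > bestTH`, which differs only for scores equal to the threshold.
pred = (scores >= th).astype(int)
print(auc, mean_acc, th, pred)
```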

In [6]:
loo = LeaveOneOut()

In [7]:
VI_cols = names_VI = data.columns.to_list()

In [9]:
%%time

'''
As in the standard experiment, we want to know how well each feature generalizes on
its own. We therefore record the fit, validation and test accuracy, along with the
threshold found in each fold.
''' 

VI_scores = [[[] for vi in names_VI] for _ in range(4)] # 4 is the number of experiments
VI_scores_val = [[[] for vi in names_VI] for _ in range(4)]
VI_scores_test = [[0 for vi in names_VI] for _ in range(4)]
THS = [[[] for vi in names_VI] for _ in range(4)]

for e, (left, test) in enumerate(experiments):
    print("Experiment", left, test)
    
    images = np.unique(IMG.loc[solo.isin(left)])
    for vi, col in enumerate(VI_cols):
        print('>', col)
        
        for _in, _out in loo.split(images):
            train_imgs, test_imgs = images[_in], images[_out]
            
            # fit (find th in fit set)
            # frac = .2
            # fit_mask = IMG.isin(train_imgs) & (np.random.rand(len(data)) <= frac)
            fit_mask = IMG.isin(train_imgs)
            prediction = data.loc[fit_mask, col]
            fit_y = y.loc[fit_mask]
            _auc, mean_acc, bestTH, fpr, tpr = getPerformance(fit_y, prediction)
            
            # val score
            val_mask = IMG.isin(test_imgs)
            prediction = data.loc[val_mask, col]
            val_y = y.loc[val_mask]
            val_mean_acc = accuracy_score(val_y, 1 * (prediction > bestTH))
            
            # save results (fit, val and th)
            VI_scores[e][vi].append(mean_acc)
            VI_scores_val[e][vi].append(val_mean_acc)
            THS[e][vi].append(bestTH)
            print(" > > %s\t%s \t Fit vs Val (MAc) = %.3f vs %.3f" % (*col, mean_acc, val_mean_acc))
            
        ths = pd.Series(THS[e][vi])
        print(" > > BEST-TH: %.3f (%.3f)" % (ths.median(), ths.std()))
        
        # test score (note: uses bestTH from the last LOO fold, not the median printed above)
        test_mask = solo.isin(test)
        prediction = data.loc[test_mask, col]
        test_y = y.loc[test_mask]
        test_mean_acc = accuracy_score(test_y, 1 * (prediction > bestTH))
        
        VI_scores_test[e][vi] = test_mean_acc
        print(' > > Test Acc: %.3f' % test_mean_acc)
        
        
print("Done.")   

Experiment [0] [1]
> ('8', 'mean')
 > > 8	mean 	 Fit vs Val (MAc) = 0.913 vs 1.000
 > > 8	mean 	 Fit vs Val (MAc) = 0.917 vs 0.978
 > > 8	mean 	 Fit vs Val (MAc) = 0.924 vs 0.808
 > > 8	mean 	 Fit vs Val (MAc) = 0.916 vs 0.942
 > > 8	mean 	 Fit vs Val (MAc) = 0.922 vs 0.956
 > > 8	mean 	 Fit vs Val (MAc) = 0.915 vs 0.924
 > > 8	mean 	 Fit vs Val (MAc) = 0.916 vs 0.938
 > > 8	mean 	 Fit vs Val (MAc) = 0.910 vs 0.742
 > > 8	mean 	 Fit vs Val (MAc) = 0.917 vs 0.919
 > > 8	mean 	 Fit vs Val (MAc) = 0.916 vs 0.984
 > > 8	mean 	 Fit vs Val (MAc) = 0.894 vs 0.857
 > > 8	mean 	 Fit vs Val (MAc) = 0.915 vs nan
 > > 8	mean 	 Fit vs Val (MAc) = 0.913 vs 0.997
 > > 8	mean 	 Fit vs Val (MAc) = 0.923 vs 0.869
 > > 8	mean 	 Fit vs Val (MAc) = 0.913 vs 1.000
 > > 8	mean 	 Fit vs Val (MAc) = 0.920 vs 0.955
 > > 8	mean 	 Fit vs Val (MAc) = 0.910 vs 0.915
 > > BEST-TH: 0.365 (0.004)
 > > Test Acc: 0.838
> ('8', 'max')
 > > 8	max 	 Fit vs Val (MAc) = 0.870 vs 1.000
 > > 8	max 	 Fit vs Val (MAc) = 0.874 vs

 > > 32	max 	 Fit vs Val (MAc) = 0.901 vs 0.971
 > > 32	max 	 Fit vs Val (MAc) = 0.883 vs 0.942
 > > 32	max 	 Fit vs Val (MAc) = 0.905 vs 0.952
 > > 32	max 	 Fit vs Val (MAc) = 0.901 vs 0.980
 > > 32	max 	 Fit vs Val (MAc) = 0.863 vs 0.865
 > > 32	max 	 Fit vs Val (MAc) = 0.899 vs nan
 > > 32	max 	 Fit vs Val (MAc) = 0.899 vs 0.991
 > > 32	max 	 Fit vs Val (MAc) = 0.908 vs 0.860
 > > 32	max 	 Fit vs Val (MAc) = 0.899 vs 0.980
 > > 32	max 	 Fit vs Val (MAc) = 0.907 vs 0.841
 > > 32	max 	 Fit vs Val (MAc) = 0.887 vs 0.937
 > > BEST-TH: 0.569 (0.015)
 > > Test Acc: 0.778
> ('32', 'min')
 > > 32	min 	 Fit vs Val (MAc) = 0.732 vs 0.868
 > > 32	min 	 Fit vs Val (MAc) = 0.734 vs 0.874
 > > 32	min 	 Fit vs Val (MAc) = 0.743 vs 0.647
 > > 32	min 	 Fit vs Val (MAc) = 0.728 vs 0.856
 > > 32	min 	 Fit vs Val (MAc) = 0.738 vs 0.789
 > > 32	min 	 Fit vs Val (MAc) = 0.735 vs 0.765
 > > 32	min 	 Fit vs Val (MAc) = 0.735 vs 0.832
 > > 32	min 	 Fit vs Val (MAc) = 0.695 vs 0.807
 > > 32	min 	 Fit vs Val 

 > > 128	min 	 Fit vs Val (MAc) = 0.772 vs nan
 > > 128	min 	 Fit vs Val (MAc) = 0.773 vs 0.721
 > > 128	min 	 Fit vs Val (MAc) = 0.774 vs 0.753
 > > 128	min 	 Fit vs Val (MAc) = 0.768 vs 0.849
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.853
 > > 128	min 	 Fit vs Val (MAc) = 0.757 vs 0.775
 > > BEST-TH: 0.252 (0.013)
 > > Test Acc: 0.786
> ('128', 'std')
 > > 128	std 	 Fit vs Val (MAc) = 0.801 vs 0.901
 > > 128	std 	 Fit vs Val (MAc) = 0.804 vs 0.891
 > > 128	std 	 Fit vs Val (MAc) = 0.818 vs 0.815
 > > 128	std 	 Fit vs Val (MAc) = 0.805 vs 0.814
 > > 128	std 	 Fit vs Val (MAc) = 0.806 vs 0.880
 > > 128	std 	 Fit vs Val (MAc) = 0.803 vs 0.892
 > > 128	std 	 Fit vs Val (MAc) = 0.805 vs 0.844
 > > 128	std 	 Fit vs Val (MAc) = 0.785 vs 0.866
 > > 128	std 	 Fit vs Val (MAc) = 0.806 vs 0.885
 > > 128	std 	 Fit vs Val (MAc) = 0.805 vs 0.832
 > > 128	std 	 Fit vs Val (MAc) = 0.784 vs 0.778
 > > 128	std 	 Fit vs Val (MAc) = 0.803 vs nan
 > > 128	std 	 Fit vs Val (MAc) = 0.803 vs 0.859
 > > 12

 > > Test Acc: 0.944
> ('16', 'min')
 > > 16	min 	 Fit vs Val (MAc) = 0.720 vs 0.855
 > > 16	min 	 Fit vs Val (MAc) = 0.722 vs 0.880
 > > 16	min 	 Fit vs Val (MAc) = 0.734 vs 0.608
 > > 16	min 	 Fit vs Val (MAc) = 0.721 vs 0.804
 > > 16	min 	 Fit vs Val (MAc) = 0.726 vs 0.786
 > > 16	min 	 Fit vs Val (MAc) = 0.724 vs 0.719
 > > 16	min 	 Fit vs Val (MAc) = 0.723 vs 0.814
 > > 16	min 	 Fit vs Val (MAc) = 0.693 vs 0.787
 > > 16	min 	 Fit vs Val (MAc) = 0.727 vs 0.647
 > > 16	min 	 Fit vs Val (MAc) = 0.721 vs 0.873
 > > 16	min 	 Fit vs Val (MAc) = 0.776 vs 0.486
 > > 16	min 	 Fit vs Val (MAc) = 0.722 vs nan
 > > 16	min 	 Fit vs Val (MAc) = 0.722 vs 0.825
 > > 16	min 	 Fit vs Val (MAc) = 0.723 vs 0.857
 > > 16	min 	 Fit vs Val (MAc) = 0.718 vs 0.934
 > > 16	min 	 Fit vs Val (MAc) = 0.720 vs 0.943
 > > 16	min 	 Fit vs Val (MAc) = 0.695 vs 0.871
 > > 16	min 	 Fit vs Val (MAc) = 0.723 vs 0.800
 > > 16	min 	 Fit vs Val (MAc) = 0.721 vs 0.562
 > > 16	min 	 Fit vs Val (MAc) = 0.723 vs 0.708
 > > 

 > > Test Acc: 0.935
> ('64', 'min')
 > > 64	min 	 Fit vs Val (MAc) = 0.729 vs 0.829
 > > 64	min 	 Fit vs Val (MAc) = 0.731 vs 0.818
 > > 64	min 	 Fit vs Val (MAc) = 0.737 vs 0.634
 > > 64	min 	 Fit vs Val (MAc) = 0.722 vs 0.864
 > > 64	min 	 Fit vs Val (MAc) = 0.733 vs 0.755
 > > 64	min 	 Fit vs Val (MAc) = 0.730 vs 0.753
 > > 64	min 	 Fit vs Val (MAc) = 0.730 vs 0.803
 > > 64	min 	 Fit vs Val (MAc) = 0.708 vs 0.702
 > > 64	min 	 Fit vs Val (MAc) = 0.733 vs 0.690
 > > 64	min 	 Fit vs Val (MAc) = 0.728 vs 0.895
 > > 64	min 	 Fit vs Val (MAc) = 0.789 vs 0.490
 > > 64	min 	 Fit vs Val (MAc) = 0.730 vs nan
 > > 64	min 	 Fit vs Val (MAc) = 0.730 vs 0.810
 > > 64	min 	 Fit vs Val (MAc) = 0.730 vs 0.852
 > > 64	min 	 Fit vs Val (MAc) = 0.726 vs 0.926
 > > 64	min 	 Fit vs Val (MAc) = 0.728 vs 0.925
 > > 64	min 	 Fit vs Val (MAc) = 0.708 vs 0.779
 > > 64	min 	 Fit vs Val (MAc) = 0.731 vs 0.757
 > > 64	min 	 Fit vs Val (MAc) = 0.732 vs 0.711
 > > 64	min 	 Fit vs Val (MAc) = 0.737 vs 0.794
 > > 

 > > 8	mean 	 Fit vs Val (MAc) = 0.851 vs 0.770
 > > 8	mean 	 Fit vs Val (MAc) = 0.851 vs 0.848
 > > 8	mean 	 Fit vs Val (MAc) = 0.848 vs nan
 > > 8	mean 	 Fit vs Val (MAc) = 0.853 vs 0.753
 > > 8	mean 	 Fit vs Val (MAc) = 0.852 vs 0.791
 > > 8	mean 	 Fit vs Val (MAc) = 0.853 vs 0.816
 > > 8	mean 	 Fit vs Val (MAc) = 0.851 vs 0.800
 > > 8	mean 	 Fit vs Val (MAc) = 0.849 vs 0.876
 > > 8	mean 	 Fit vs Val (MAc) = 0.846 vs 0.879
 > > 8	mean 	 Fit vs Val (MAc) = 0.853 vs 0.688
 > > 8	mean 	 Fit vs Val (MAc) = 0.837 vs 0.915
 > > 8	mean 	 Fit vs Val (MAc) = 0.844 vs 0.860
 > > 8	mean 	 Fit vs Val (MAc) = 0.848 vs nan
 > > 8	mean 	 Fit vs Val (MAc) = 0.852 vs 0.842
 > > 8	mean 	 Fit vs Val (MAc) = 0.854 vs 0.840
 > > 8	mean 	 Fit vs Val (MAc) = 0.853 vs 0.628
 > > 8	mean 	 Fit vs Val (MAc) = 0.847 vs 0.882
 > > 8	mean 	 Fit vs Val (MAc) = 0.852 vs 0.819
 > > 8	mean 	 Fit vs Val (MAc) = 0.838 vs 0.927
 > > BEST-TH: 0.370 (0.001)
 > > Test Acc: 0.839
> ('8', 'max')
 > > 8	max 	 Fit vs Val (MAc

 > > Test Acc: 0.854
> ('16', 'max')
 > > 16	max 	 Fit vs Val (MAc) = 0.834 vs 1.000
 > > 16	max 	 Fit vs Val (MAc) = 0.835 vs 0.978
 > > 16	max 	 Fit vs Val (MAc) = 0.840 vs 0.878
 > > 16	max 	 Fit vs Val (MAc) = 0.835 vs 0.945
 > > 16	max 	 Fit vs Val (MAc) = 0.837 vs 0.956
 > > 16	max 	 Fit vs Val (MAc) = 0.836 vs 0.959
 > > 16	max 	 Fit vs Val (MAc) = 0.835 vs 0.983
 > > 16	max 	 Fit vs Val (MAc) = 0.829 vs 0.977
 > > 16	max 	 Fit vs Val (MAc) = 0.836 vs 0.956
 > > 16	max 	 Fit vs Val (MAc) = 0.835 vs 0.984
 > > 16	max 	 Fit vs Val (MAc) = 0.824 vs 0.931
 > > 16	max 	 Fit vs Val (MAc) = 0.836 vs nan
 > > 16	max 	 Fit vs Val (MAc) = 0.834 vs 0.997
 > > 16	max 	 Fit vs Val (MAc) = 0.837 vs 0.949
 > > 16	max 	 Fit vs Val (MAc) = 0.834 vs 1.000
 > > 16	max 	 Fit vs Val (MAc) = 0.837 vs 0.935
 > > 16	max 	 Fit vs Val (MAc) = 0.831 vs 0.955
 > > 16	max 	 Fit vs Val (MAc) = 0.836 vs 0.962
 > > 16	max 	 Fit vs Val (MAc) = 0.842 vs 0.746
 > > 16	max 	 Fit vs Val (MAc) = 0.840 vs 0.737
 > > 

 > > 32	max 	 Fit vs Val (MAc) = 0.847 vs 0.967
 > > 32	max 	 Fit vs Val (MAc) = 0.852 vs 0.785
 > > 32	max 	 Fit vs Val (MAc) = 0.851 vs 0.803
 > > 32	max 	 Fit vs Val (MAc) = 0.847 vs nan
 > > 32	max 	 Fit vs Val (MAc) = 0.852 vs 0.769
 > > 32	max 	 Fit vs Val (MAc) = 0.851 vs 0.793
 > > 32	max 	 Fit vs Val (MAc) = 0.853 vs 0.751
 > > 32	max 	 Fit vs Val (MAc) = 0.852 vs 0.811
 > > 32	max 	 Fit vs Val (MAc) = 0.848 vs 0.837
 > > 32	max 	 Fit vs Val (MAc) = 0.844 vs 0.840
 > > 32	max 	 Fit vs Val (MAc) = 0.852 vs 0.749
 > > 32	max 	 Fit vs Val (MAc) = 0.833 vs 0.948
 > > 32	max 	 Fit vs Val (MAc) = 0.842 vs 0.911
 > > 32	max 	 Fit vs Val (MAc) = 0.847 vs nan
 > > 32	max 	 Fit vs Val (MAc) = 0.850 vs 0.815
 > > 32	max 	 Fit vs Val (MAc) = 0.853 vs 0.765
 > > 32	max 	 Fit vs Val (MAc) = 0.850 vs 0.806
 > > 32	max 	 Fit vs Val (MAc) = 0.844 vs 0.887
 > > 32	max 	 Fit vs Val (MAc) = 0.852 vs 0.832
 > > 32	max 	 Fit vs Val (MAc) = 0.834 vs 0.950
 > > BEST-TH: 0.560 (0.003)
 > > Test Acc: 0

 > > 64	max 	 Fit vs Val (MAc) = 0.848 vs 0.852
 > > 64	max 	 Fit vs Val (MAc) = 0.829 vs 0.973
 > > BEST-TH: 0.453 (0.002)
 > > Test Acc: 0.855
> ('64', 'min')
 > > 64	min 	 Fit vs Val (MAc) = 0.746 vs 0.726
 > > 64	min 	 Fit vs Val (MAc) = 0.747 vs 0.708
 > > 64	min 	 Fit vs Val (MAc) = 0.751 vs 0.504
 > > 64	min 	 Fit vs Val (MAc) = 0.744 vs 0.797
 > > 64	min 	 Fit vs Val (MAc) = 0.748 vs 0.634
 > > 64	min 	 Fit vs Val (MAc) = 0.748 vs 0.639
 > > 64	min 	 Fit vs Val (MAc) = 0.747 vs 0.676
 > > 64	min 	 Fit vs Val (MAc) = 0.742 vs 0.726
 > > 64	min 	 Fit vs Val (MAc) = 0.750 vs 0.547
 > > 64	min 	 Fit vs Val (MAc) = 0.744 vs 0.821
 > > 64	min 	 Fit vs Val (MAc) = 0.759 vs 0.570
 > > 64	min 	 Fit vs Val (MAc) = 0.746 vs nan
 > > 64	min 	 Fit vs Val (MAc) = 0.747 vs 0.692
 > > 64	min 	 Fit vs Val (MAc) = 0.746 vs 0.756
 > > 64	min 	 Fit vs Val (MAc) = 0.744 vs 0.843
 > > 64	min 	 Fit vs Val (MAc) = 0.744 vs 0.861
 > > 64	min 	 Fit vs Val (MAc) = 0.741 vs 0.802
 > > 64	min 	 Fit vs Val 

 > > 128	min 	 Fit vs Val (MAc) = 0.769 vs 0.733
 > > 128	min 	 Fit vs Val (MAc) = 0.769 vs 0.763
 > > 128	min 	 Fit vs Val (MAc) = 0.767 vs 0.846
 > > 128	min 	 Fit vs Val (MAc) = 0.767 vs 0.852
 > > 128	min 	 Fit vs Val (MAc) = 0.763 vs 0.814
 > > 128	min 	 Fit vs Val (MAc) = 0.769 vs 0.703
 > > 128	min 	 Fit vs Val (MAc) = 0.771 vs 0.701
 > > 128	min 	 Fit vs Val (MAc) = 0.771 vs 0.793
 > > 128	min 	 Fit vs Val (MAc) = 0.768 vs nan
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.769
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.781
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.818
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.736
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.797
 > > 128	min 	 Fit vs Val (MAc) = 0.767 vs 0.844
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.737
 > > 128	min 	 Fit vs Val (MAc) = 0.755 vs 0.945
 > > 128	min 	 Fit vs Val (MAc) = 0.764 vs 0.818
 > > 128	min 	 Fit vs Val (MAc) = 0.768 vs nan
 > > 128	min 	 Fit vs Val (MAc) = 0.770 vs 0.828
 > > 128	min 	 Fit vs Va

 > > 8	min 	 Fit vs Val (MAc) = 0.708 vs nan
 > > 8	min 	 Fit vs Val (MAc) = 0.708 vs 0.624
 > > BEST-TH: 0.052 (0.003)
 > > Test Acc: 0.582
> ('8', 'std')
 > > 8	std 	 Fit vs Val (MAc) = 0.770 vs 1.000
 > > 8	std 	 Fit vs Val (MAc) = 0.772 vs 0.978
 > > 8	std 	 Fit vs Val (MAc) = 0.784 vs 0.872
 > > 8	std 	 Fit vs Val (MAc) = 0.772 vs 0.898
 > > 8	std 	 Fit vs Val (MAc) = 0.775 vs 0.956
 > > 8	std 	 Fit vs Val (MAc) = 0.776 vs 0.940
 > > 8	std 	 Fit vs Val (MAc) = 0.772 vs 0.983
 > > 8	std 	 Fit vs Val (MAc) = 0.761 vs 0.746
 > > 8	std 	 Fit vs Val (MAc) = 0.775 vs 0.952
 > > 8	std 	 Fit vs Val (MAc) = 0.772 vs 0.984
 > > 8	std 	 Fit vs Val (MAc) = 0.745 vs 0.764
 > > 8	std 	 Fit vs Val (MAc) = 0.772 vs nan
 > > 8	std 	 Fit vs Val (MAc) = 0.770 vs 0.997
 > > 8	std 	 Fit vs Val (MAc) = 0.786 vs 0.300
 > > 8	std 	 Fit vs Val (MAc) = 0.773 vs 0.625
 > > 8	std 	 Fit vs Val (MAc) = 0.789 vs 0.155
 > > 8	std 	 Fit vs Val (MAc) = 0.763 vs 0.736
 > > 8	std 	 Fit vs Val (MAc) = 0.774 vs 0.962


 > > 32	min 	 Fit vs Val (MAc) = 0.722 vs 0.740
 > > 32	min 	 Fit vs Val (MAc) = 0.722 vs 0.836
 > > 32	min 	 Fit vs Val (MAc) = 0.692 vs 0.807
 > > 32	min 	 Fit vs Val (MAc) = 0.725 vs 0.658
 > > 32	min 	 Fit vs Val (MAc) = 0.718 vs 0.917
 > > 32	min 	 Fit vs Val (MAc) = 0.776 vs 0.457
 > > 32	min 	 Fit vs Val (MAc) = 0.721 vs nan
 > > 32	min 	 Fit vs Val (MAc) = 0.721 vs 0.820
 > > 32	min 	 Fit vs Val (MAc) = 0.721 vs 0.879
 > > 32	min 	 Fit vs Val (MAc) = 0.718 vs 0.938
 > > 32	min 	 Fit vs Val (MAc) = 0.720 vs 0.945
 > > 32	min 	 Fit vs Val (MAc) = 0.695 vs 0.867
 > > 32	min 	 Fit vs Val (MAc) = 0.724 vs 0.752
 > > 32	min 	 Fit vs Val (MAc) = 0.720 vs 0.650
 > > 32	min 	 Fit vs Val (MAc) = 0.726 vs 0.764
 > > 32	min 	 Fit vs Val (MAc) = 0.721 vs nan
 > > 32	min 	 Fit vs Val (MAc) = 0.725 vs 0.792
 > > BEST-TH: 0.116 (0.004)
 > > Test Acc: 0.739
> ('32', 'std')
 > > 32	std 	 Fit vs Val (MAc) = 0.794 vs 0.998
 > > 32	std 	 Fit vs Val (MAc) = 0.796 vs 0.975
 > > 32	std 	 Fit vs Val (M

 > > 128	max 	 Fit vs Val (MAc) = 0.822 vs 0.892
 > > 128	max 	 Fit vs Val (MAc) = 0.825 vs 0.853
 > > 128	max 	 Fit vs Val (MAc) = 0.821 vs 0.918
 > > 128	max 	 Fit vs Val (MAc) = 0.823 vs 0.877
 > > 128	max 	 Fit vs Val (MAc) = 0.805 vs 0.888
 > > 128	max 	 Fit vs Val (MAc) = 0.822 vs 0.899
 > > 128	max 	 Fit vs Val (MAc) = 0.836 vs 0.811
 > > 128	max 	 Fit vs Val (MAc) = 0.836 vs 0.815
 > > 128	max 	 Fit vs Val (MAc) = 0.821 vs nan
 > > 128	max 	 Fit vs Val (MAc) = 0.833 vs 0.809
 > > BEST-TH: 0.380 (0.003)
 > > Test Acc: 0.871
> ('128', 'min')
 > > 128	min 	 Fit vs Val (MAc) = 0.749 vs 0.754
 > > 128	min 	 Fit vs Val (MAc) = 0.750 vs 0.744
 > > 128	min 	 Fit vs Val (MAc) = 0.756 vs 0.565
 > > 128	min 	 Fit vs Val (MAc) = 0.744 vs 0.813
 > > 128	min 	 Fit vs Val (MAc) = 0.752 vs 0.673
 > > 128	min 	 Fit vs Val (MAc) = 0.750 vs 0.692
 > > 128	min 	 Fit vs Val (MAc) = 0.751 vs 0.695
 > > 128	min 	 Fit vs Val (MAc) = 0.733 vs 0.720
 > > 128	min 	 Fit vs Val (MAc) = 0.753 vs 0.612
 > > 

In [10]:
for e, (left, test) in enumerate(experiments):
    print("Experiment", left, test)
    exp_fit = VI_scores[e]
    exp_val = VI_scores_val[e]
    for vi, (r, f) in enumerate(VI_cols):
        ar_fit = pd.Series(exp_fit[vi])
        ar_val = pd.Series(exp_val[vi])
        
        results = (r, f , ar_fit.mean(), ar_fit.std(), ar_val.mean(), ar_val.std(), VI_scores_test[e][vi])
        print(" > > %-3s %s \t Fit vs Val vs Test (MAc) = %.3f (%.3f) vs %.3f (%.3f) vs %.3f" % results)
        if vi % 4 == 3:
            print()
    print()
    
# TODO: show generalization after computing the mean threshold

Experiment [0] [1]
 > > 8   mean 	 Fit vs Val vs Test (MAc) = 0.915 (0.007) vs 0.924 (0.073) vs 0.838
 > > 8   max 	 Fit vs Val vs Test (MAc) = 0.869 (0.018) vs 0.939 (0.065) vs 0.606
 > > 8   min 	 Fit vs Val vs Test (MAc) = 0.713 (0.018) vs 0.693 (0.147) vs 0.445
 > > 8   std 	 Fit vs Val vs Test (MAc) = 0.852 (0.019) vs 0.921 (0.111) vs 0.676

 > > 16  mean 	 Fit vs Val vs Test (MAc) = 0.931 (0.007) vs 0.950 (0.040) vs 0.881
 > > 16  max 	 Fit vs Val vs Test (MAc) = 0.886 (0.013) vs 0.932 (0.067) vs 0.439
 > > 16  min 	 Fit vs Val vs Test (MAc) = 0.725 (0.021) vs 0.779 (0.126) vs 0.559
 > > 16  std 	 Fit vs Val vs Test (MAc) = 0.863 (0.016) vs 0.917 (0.107) vs 0.585

 > > 32  mean 	 Fit vs Val vs Test (MAc) = 0.925 (0.007) vs 0.946 (0.035) vs 0.894
 > > 32  max 	 Fit vs Val vs Test (MAc) = 0.898 (0.012) vs 0.941 (0.051) vs 0.778
 > > 32  min 	 Fit vs Val vs Test (MAc) = 0.734 (0.024) vs 0.812 (0.122) vs 0.682
 > > 32  std 	 Fit vs Val vs Test (MAc) = 0.867 (0.014) vs 0.918 (0.084) v

<hr>
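
In the per-feature loop, the printed `BEST-TH` is the median over folds, while the test accuracy is computed with the `bestTH` left over from the last fold. The TODO in the summary cell points at scoring with an aggregated threshold instead. A minimal sketch of that idea follows; the threshold values, scores and labels are made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical per-fold thresholds, e.g. one entry of THS[e][vi] per LOO fold.
fold_thresholds = [0.36, 0.37, 0.365, 0.50]

# The median is less sensitive to a single outlier fold than the mean.
robust_th = float(np.median(fold_thresholds))

# Toy test scores and labels standing in for data.loc[test_mask, col] and test_y.
test_scores = np.array([0.10, 0.30, 0.40, 0.90])
test_y = np.array([0, 0, 1, 1])

test_pred = (test_scores > robust_th).astype(int)
test_acc = accuracy_score(test_y, test_pred)
print("threshold: %.4f  test acc: %.3f" % (robust_th, test_acc))
```

The ensemble cell further down already follows this pattern by taking `np.median` of the stored thresholds before scoring the test set.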

## Ensemble

In [122]:
def XGBTrain2(X_train, y_train, n_trees, verbose = False, val_imgs = None):

#     n = len(X_train)
#     f = .02
#     ix = X_train.index.values.copy()
#     np.random.shuffle(ix)
#     X_train = X_train.loc[ix[:int(n * f)]]
#     y_train = y_train.loc[ix[:int(n * f)]]
    
    
    ratio = float(np.sum(y_train == 1)) / np.sum(y_train==0)

    clf = xgb.XGBClassifier(
                    max_depth = 2,
                    n_estimators=n_trees,
                    learning_rate=0.1, 
                    nthread=2,
                    subsample=1,
                    colsample_bytree=1,
                    scale_pos_weight = ratio,
                    reg_alpha=0,
                    seed=1301)
    
    if not isinstance(val_imgs, np.ndarray):
        eval_set = [
            (X_train, y_train),
        ]
    else:
        X_val, y_val = data.loc[IMG.isin(val_imgs)], y.loc[IMG.isin(val_imgs)]
        eval_set = [
            (X_train, y_train),
            (X_val, y_val),
        ]
        
    clf.fit(X_train, y_train, early_stopping_rounds=100 , eval_metric="auc",
            eval_set=eval_set,
            verbose = verbose,
           )
    
    return clf
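
`XGBTrain2` sets `scale_pos_weight` to positives divided by negatives. The XGBoost parameter documentation suggests `sum(negative instances) / sum(positive instances)` as a typical value for unbalanced classes. The notebook does not say whether its choice is intentional, so the sketch below only contrasts the two ratios on toy labels.

```python
import numpy as np

# Toy labels: 3 positives and 6 negatives (illustrative values only).
y_train = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0])

n_pos = int(np.sum(y_train == 1))
n_neg = int(np.sum(y_train == 0))

ratio_notebook = n_pos / n_neg  # what XGBTrain2 computes
ratio_typical = n_neg / n_pos   # typical value named in the XGBoost docs

print("notebook ratio: %.2f, typical ratio: %.2f" % (ratio_notebook, ratio_typical))
```

With the notebook's formula, a value above 1 means positives are the majority class and are weighted up further; the second formula instead weights up whichever class is rarer.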

In [30]:
fit_imgs = IMG.loc[solo.isin([0, 1])].unique()
val_imgs = IMG.loc[solo.isin([2])].unique()
XGBTrain2(data, y, 100, verbose = True, val_imgs = val_imgs)

[0]	validation_0-auc:0.875605	validation_1-auc:0.894875
Multiple eval metrics have been passed: 'validation_1-auc' will be used for early stopping.

Will train until validation_1-auc hasn't improved in 100 rounds.
[1]	validation_0-auc:0.894564	validation_1-auc:0.898531
[2]	validation_0-auc:0.895076	validation_1-auc:0.905602
[3]	validation_0-auc:0.898477	validation_1-auc:0.906858
[4]	validation_0-auc:0.898575	validation_1-auc:0.906693
[5]	validation_0-auc:0.898919	validation_1-auc:0.907464
[6]	validation_0-auc:0.906975	validation_1-auc:0.91042
[7]	validation_0-auc:0.906987	validation_1-auc:0.91061
[8]	validation_0-auc:0.908277	validation_1-auc:0.915593
[9]	validation_0-auc:0.912474	validation_1-auc:0.916657
[10]	validation_0-auc:0.912519	validation_1-auc:0.917569
[11]	validation_0-auc:0.913016	validation_1-auc:0.917981
[12]	validation_0-auc:0.914175	validation_1-auc:0.921947
[13]	validation_0-auc:0.914177	validation_1-auc:0.921822
[14]	validation_0-auc:0.914825	validation_1-auc:0.923953

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=2, min_child_weight=1, missing=None, n_estimators=100,
       n_jobs=1, nthread=6, objective='binary:logistic', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=3.2863203482334225,
       seed=1301, silent=True, subsample=1)

In [31]:
data.columns.to_list()

[('8', 'mean'),
 ('8', 'max'),
 ('8', 'min'),
 ('8', 'std'),
 ('16', 'mean'),
 ('16', 'max'),
 ('16', 'min'),
 ('16', 'std'),
 ('32', 'mean'),
 ('32', 'max'),
 ('32', 'min'),
 ('32', 'std'),
 ('64', 'mean'),
 ('64', 'max'),
 ('64', 'min'),
 ('64', 'std'),
 ('128', 'mean'),
 ('128', 'max'),
 ('128', 'min'),
 ('128', 'std'),
 ('512', 'ori')]

In [124]:
subset = data[[
#     ('64', 'mean'),
    ('32', 'mean'),
    ('16', 'mean'),
#     ('8', 'mean'),
    ('32', 'max'),
    ('64', 'max'),
#     ('128', 'min'),
#     ('32', 'std'),
#     ('16', 'std'),
    ('512', 'ori')
]]

In [125]:
# %%time
'''
Here we want to obtain an ensemble of these features
''' 

# tree_list = list(range(30, 60, 10))
tree_list = [35]

VI_scores = [[[] for t in tree_list] for _ in range(4)] # 4 is the number of experiments
VI_scores_val = [[[] for t in tree_list] for _ in range(4)]
VI_scores_test = [[0 for t in tree_list] for _ in range(4)]
THS = [[[] for t in tree_list] for _ in range(4)] # one threshold list per tree count

for e, (left, test) in enumerate(experiments):
    print("Experiment", left, test)
    
    images = np.unique(IMG.loc[solo.isin(left)])
    for t, trees in enumerate(tree_list):

        print('>', trees)
        
        for _in, _out in loo.split(images):
            train_imgs, test_imgs = images[_in], images[_out]
            
            # fit score
            frac = .01
            fit_mask = IMG.isin(train_imgs) & (np.random.rand(len(data)) <= frac)
            fit_y = y.loc[fit_mask]
            clf = XGBTrain2(subset.loc[fit_mask], fit_y, trees)
            prediction = clf.predict_proba(subset.loc[fit_mask])[:,-1]
            _auc, mean_acc, bestTH, fpr, tpr = getPerformance(fit_y, prediction)
            
            # val score
            val_mask = IMG.isin(test_imgs)
            prediction = clf.predict_proba(subset.loc[val_mask])[:,-1]
            val_y = y.loc[val_mask]
            val_mean_acc = accuracy_score(val_y, 1 * (prediction > bestTH))
            
            # save
            VI_scores[e][t].append(mean_acc)
            VI_scores_val[e][t].append(val_mean_acc)
            THS[e][t].append(bestTH)
            print(" > > %2d \t Fit vs Val (MAc) = %.3f vs %.3f" % (test_imgs[0], mean_acc, val_mean_acc))
        
        # test score
        ar = np.array(THS[e][t])
        bestTH = np.median(ar)
        print(" > > BEST-TH: %.3f (%.3f)" % (bestTH, ar.std()))
        
        test_mask = solo.isin(test)
        prediction = clf.predict_proba(subset.loc[test_mask])[:,-1]
        test_y = y.loc[test_mask]
        test_mean_acc = accuracy_score(test_y, 1 * (prediction > bestTH))
        
        VI_scores_test[e][t] = test_mean_acc
        print(" > > TEST Score : %.3f" % test_mean_acc)
        
print("Done.")   

Experiment [0] [1]
> 35
 > >  0 	 Fit vs Val (MAc) = 0.934 vs 0.995
 > >  1 	 Fit vs Val (MAc) = 0.935 vs 0.980
 > >  2 	 Fit vs Val (MAc) = 0.946 vs 0.846
 > >  3 	 Fit vs Val (MAc) = 0.936 vs 0.955
 > >  4 	 Fit vs Val (MAc) = 0.937 vs 0.957
 > >  5 	 Fit vs Val (MAc) = 0.938 vs 0.969
 > >  6 	 Fit vs Val (MAc) = 0.935 vs 0.967
 > >  7 	 Fit vs Val (MAc) = 0.931 vs 0.856
 > >  8 	 Fit vs Val (MAc) = 0.937 vs 0.933
 > >  9 	 Fit vs Val (MAc) = 0.936 vs 0.985
 > > 10 	 Fit vs Val (MAc) = 0.917 vs 0.926
 > > 12 	 Fit vs Val (MAc) = 0.934 vs 0.985
 > > 13 	 Fit vs Val (MAc) = 0.939 vs 0.928
 > > 14 	 Fit vs Val (MAc) = 0.933 vs 1.000
 > > 15 	 Fit vs Val (MAc) = 0.937 vs 0.950
 > > 16 	 Fit vs Val (MAc) = 0.928 vs 0.930
 > > 17 	 Fit vs Val (MAc) = 0.936 vs 0.974
 > > BEST-TH: 0.963 (0.006)
 > > TEST Score : 0.892
Experiment [1] [0]
> 35
 > > 18 	 Fit vs Val (MAc) = 0.871 vs 0.863
 > > 19 	 Fit vs Val (MAc) = 0.873 vs 0.872
 > > 21 	 Fit vs Val (MAc) = 0.874 vs 0.876
 > > 22 	 Fit vs Val

KeyboardInterrupt: 

In [117]:
for e, (left, test) in enumerate(experiments):
    print("Experiment", left, test)
    exp_fit = VI_scores[e]
    exp_val = VI_scores_val[e]
    for t, trees in enumerate(tree_list):
        ar_fit = pd.Series(exp_fit[t])
        ar_val = pd.Series(exp_val[t])
        
        results = (trees, ar_fit.mean(), ar_fit.std(), ar_val.mean(), ar_val.std(), VI_scores_test[e][t])
        print(" > > %d \t Fit vs Val vs Test (MAc) = %.3f (%.3f) vs %.3f (%.3f) vs %.3f" % results)

Experiment [0] [1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.934 (0.007) vs 0.945 (0.050) vs 0.878
Experiment [1] [0]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.869 (0.006) vs 0.889 (0.042) vs 0.935
Experiment [0, 1] [2]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.885 (0.003) vs 0.912 (0.052) vs 0.857
Experiment [2] [0, 1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.870 (0.010) vs 0.838 (0.194) vs 0.915


In [None]:
"""
0.894
0.937
0.851
0.924 ?


#subset = data[[
#     ('64', 'mean'),
    ('32', 'mean'),
    ('16', 'mean'),
#     ('8', 'mean'),
    ('32', 'max'),
    ('64', 'max'),
#     ('128', 'min'),
#     ('32', 'std'),
#     ('8', 'std'),
    ('512', 'ori')
]]
max_depth = 2,
n_estimators=n_trees,
learning_rate=0.1, 
nthread=6,
subsample=1,
colsample_bytree=1,
scale_pos_weight = ratio,
reg_alpha=0,
seed=1301
Experiment [0] [1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.935 (0.005) vs 0.951 (0.040) vs 0.894
Experiment [1] [0]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.868 (0.006) vs 0.889 (0.043) vs 0.945
Experiment [0, 1] [2]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.885 (0.004) vs 0.915 (0.048) vs 0.858
Experiment [2] [0, 1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.865 (0.010) vs 0.835 (0.195) vs 0.917
 

# 5 feat
Experiment [0] [1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.934 (0.006) vs 0.952 (0.041) vs 0.893
Experiment [1] [0]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.868 (0.006) vs 0.889 (0.040) vs 0.937
Experiment [0, 1] [2]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.885 (0.004) vs 0.913 (0.052) vs 0.856
Experiment [2] [0, 1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.867 (0.010) vs 0.834 (0.198) vs 0.914

# below but LOO fixed
Experiment [0] [1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.928 (0.007) vs 0.940 (0.074) vs 0.882
Experiment [1] [0]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.861 (0.006) vs 0.886 (0.041) vs 0.951
Experiment [0, 1] [2]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.881 (0.004) vs 0.912 (0.056) vs 0.858
Experiment [2] [0, 1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.867 (0.009) vs 0.834 (0.196) vs 0.915


# Best of each function + original + no subcolumn in xgb
Experiment [0] [1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.931 (0.007) vs 0.951 (0.042) vs 0.890
Experiment [1] [0]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.902 (0.008) vs 0.930 (0.057) vs 0.943
Experiment [0, 1] [2]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.880 (0.004) vs 0.912 (0.059) vs 0.859
Experiment [2] [0, 1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.894 (0.008) vs 0.927 (0.059) vs 0.909
 

# Best of each function + original
Experiment [0] [1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.929 (0.008) vs 0.952 (0.037) vs 0.875
Experiment [1] [0]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.901 (0.008) vs 0.932 (0.056) vs 0.956
Experiment [0, 1] [2]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.879 (0.004) vs 0.909 (0.061) vs 0.859
Experiment [2] [0, 1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.892 (0.007) vs 0.926 (0.061) vs 0.901
 

Experiment [0] [1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.935 (0.006) vs 0.948 (0.043) vs 0.860
Experiment [1] [0]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.914 (0.006) vs 0.932 (0.062) vs 0.958
Experiment [0, 1] [2]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.891 (0.003) vs 0.910 (0.063) vs 0.853
Experiment [2] [0, 1]
 > > 35 	 Fit vs Val vs Test (MAc) = 0.906 (0.006) vs 0.925 (0.063) vs 0.900


trees = 25

Experiment [0] [1]
 > > 25 	 Fit vs Val vs Test (MAc) = 0.935 (0.007) vs 0.947 (0.047) vs 0.854
Experiment [1] [0]
 > > 25 	 Fit vs Val vs Test (MAc) = 0.911 (0.007) vs 0.923 (0.061) vs 0.947
Experiment [0, 1] [2]
 > > 25 	 Fit vs Val vs Test (MAc) = 0.887 (0.004) vs 0.915 (0.055) vs 0.860
Experiment [2] [0, 1]
 > > 25 	 Fit vs Val vs Test (MAc) = 0.902 (0.006) vs 0.919 (0.064) vs 0.897

"""