# Part 5

### Problem Statement
Choose one of the topics we will cover in week 6. For your chosen topic, experiment with it and document your experiments, findings and code in a Python Notebook. If you choose model interpretability, then include some text near the
beginning discussing your findings and comparing the results of the different methods. If you choose another model type (such as stacked ensembles or auto ML), then fit the models and document your model fitting process in a similar
way to part 4. 

If, for this part, you have chosen to fit another model, then include a screenshot of your competition entry as usual.

Make sure that your notebook is neat and well commented and uses markdown as appropriate to make it easy to follow.

Some suggestions for you to consider are below:

#### vowpal wabbit

- Adjust the code provided in the notebooks to create data for vw in the correct format.
- Train vw with some reasonable default parameters.
- Make a note of the validation loss.
- Create predictions on test and submit on Kaggle. Compare your Kaggle result to your validation loss. It should be similar. If it is not - it may be hard to understand why but you should at least note the difference.
- There are various ways you could do h-p training. You could turn the above process into a loop and do a grid-search over some of the key parameters (the speed of vw should make this possible). Alternatively there is at least one set of code online which claims to use hyperopt - getting this to work would be impressive.
- How does your best vw model compare to your Part 3 model? Which is better? Why is it better? - after all, they are both penalised logistic regression models.

*Disclaimer: The preprocessing is the same as in Part 3, since we are going to compare the two frameworks.

## Set-up

### Import relevant libraries

In [1]:
import os
import re
import csv
import numpy as np
import pandas as pd
import pickle

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from category_encoders import TargetEncoder

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.grid.grid_search import H2OGridSearch

### Set directories and paths

In [2]:
# set directories
print(os.getcwd())
dirRawData = os.path.join('..', 'input')
dirPData   = os.path.join('..', 'PData')
dirPOutput = os.path.join('..', 'POutput')

/home/jovyan/smm284-aml/assignment/PCode


### Load data

In [3]:
# load data: 250k train data
f_name = '01_df_250k.pickle'
f_path = os.path.join(dirPData, f_name)
with open(f_path, "rb") as f:
    PData = pickle.load(f)

# separate data into train/test set
train_set = PData['df_train']
test_set = PData['df_test']

In [4]:
# load variable metadata 
f_name = '01_vars.pickle'
f_path = os.path.join(dirPData, f_name)
with open(f_path, "rb") as f:
    var_meta = pickle.load(f)

# extract lists from metadata
var_idx_num = var_meta['vars_ind_numeric']
var_idx_cat = var_meta['vars_ind_categorical']
var_idx_hccv = var_meta['vars_ind_hccv']
var_idx_id = var_meta['vars_notToUse']
var_idx_response = var_meta['var_dep']

### Clean missing values
#### - Deal appropriately with missings (for all numeric variables, -99 means missing).

In [5]:
# replace -99 with np.nan
train_set = train_set.replace(-99, np.nan)
test_set = test_set.replace(-99, np.nan)

In [6]:
# count number of nulls in each column -- train_set
np.array(train_set.isnull().sum(axis = 0))

array([     0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0, 102984,
        23788,  23788,  23788,   1852,   2090,  23898,  24321,  11673,
        32806,  33764, 197803,   8895,   6151,  55544,  57869,  57136,
            2,      0,      0,      0,      0,      0,      0,   1092,
            0,      0,      0,      0,      0,      0,      0,      1,
            1,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0,      0,
            0])

In [7]:
# count number of nulls in each column -- test_set
np.array(test_set.isnull().sum(axis = 0))

array([     0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0, 283601,
            0,      0,      0,      0,      0,      2,      0,      1,
            0,      0,      0,      0,      0,      1,      0,      9,
          226,  10870,      9,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,     17, 131970,  26261,
        26261,  26261,   3429,   3504,  26298,  26356,  18076,   2261,
         2356,   2756,  10130,   6363,  18816,  22520,  20432,      2,
            0,      0,      0,      0,      0,      0,   3560,      0,
            0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0,      0,
            0,      0,      0,      0,      0,      0,      0,      0])

Strategy:
- remove a feature if its number of null values in the `test_set` exceeds 250
- drop rows that have leftover null values in the `train_set`
- for leftover null values in the `test_set` impute with `mean` for numerical features and `most frequent` for categorical features

Through data exploration, we decided to remove features that have more than 250 null values in the test set, because imputation can bias the data: the true values that are missing cannot be predicted perfectly. However, observations cannot be removed from the test set, since every test row needs a prediction for the competition. To preserve as much information as possible, we impute features with at most 250 null values (less than 0.1% of the test data), using the mean for numeric features and the most frequent value for categorical features; imputing such a small number of observations has little impact on the overall distribution.

For the train set, after removing the features with more than 250 null values in the test set, only a handful of null values remain (4 in total, as shown in the output below). We decided to drop the affected rows, which costs only 3 observations.

In [8]:
# number of null values in the test set exceed 250
to_remove = test_set.columns[np.array(test_set.isnull().sum(axis = 0) > 250)]

In [9]:
# remove the variables
train_set = train_set.drop(columns=to_remove)
test_set = test_set.drop(columns=to_remove)

In [10]:
# display number of leftover missing values in `train_set`
np.array(train_set.isnull().sum(axis = 0))

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [11]:
# drop rows that have leftover null values in the `train_set`
train_set = train_set.dropna(axis=0)

In [12]:
# display number of leftover missing values in `test_set`
np.array(test_set.isnull().sum(axis = 0))

array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   2,   0,   1,   0,   0,   0,
         0,   0,   1,   0,   9, 226,   9,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,  17,   2,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0])

In [13]:
# get columns in the test_set with missing values
impute_col = test_set.columns[np.array(test_set.isnull().sum(axis = 0) > 0)]
impute_col

Index(['c09', 'e03', 'e24', 'e17', 'e18', 'e20', 'f10', 'e02'], dtype='object')

In [14]:
# display data for columns that are going to impute
to_impute = test_set[impute_col]
to_impute.head()

Unnamed: 0,c09,e03,e24,e17,e18,e20,f10,e02
0,B,A,A,920BD,11250,A4893,CKG,6.0
1,B,A,A,F33BC,5614E,D53C5,AIR,15.0
2,B,A,A,F33BC,5614E,D53C5,AEL,15.0
3,A,A,A,861C8,9076B,A4893,CUO,27.0
4,A,A,A,861C8,9076B,A4893,AWV,27.0


In [15]:
# get column indices for numerical and categorical features that need imputation
num_col = to_impute.select_dtypes(include='number').columns
cat_col = to_impute.select_dtypes(include='category').columns
# create imputation pipeline
# -- parameters = List of (name, transformer, columns)
pipeline = ColumnTransformer([
    ('cat_impute', SimpleImputer(strategy='most_frequent'), cat_col),
    ('num_impute', SimpleImputer(strategy='mean'), num_col)
])
# impute missing values
imputed_test = pipeline.fit_transform(test_set)

In [16]:
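# assign values back to the `test_set`
# -- `ColumnTransformer` returns columns in transformer order (categorical, then numeric)
test_set[list(cat_col) + list(num_col)] = imputed_test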

Check again the number of missing values

In [17]:
print(f'Number of missing values in `train_set`: {np.sum(np.sum(train_set.isna()))}')
print(f'Number of missing values in `test_set`: {np.sum(np.sum(test_set.isna()))}')

Number of missing values in `train_set`: 0
Number of missing values in `test_set`: 0


As some features were removed, we adjust the metadata

In [18]:
var_idx_num = list(set(var_idx_num) - set(to_remove))
var_idx_cat = list(set(var_idx_cat) - set(to_remove))
var_idx_hccv = list(set(var_idx_hccv) - set(to_remove))

There is also a feature with only 1 unique value, `'e16'`, which we remove from the data.

In [19]:
# column with only 1 unique value
train_set.columns[(train_set.nunique() == 1)]

Index(['e16'], dtype='object')

In [20]:
# remove from the dataframes
train_set = train_set.drop(columns='e16')
test_set = test_set.drop(columns='e16')

# update metadata
var_idx_num.remove('e16')

### Feature engineering

#### - Deal with numerics - i.e. for at least some try linear splines (or another method of your choice to deal with non-linear effects).
- Apply linear splines to numeric features with more than 20 unique values.
- Compute the spline knots from the percentiles of the train set and apply the same transformation to the test set.

As with other preprocessing steps (e.g. standardisation), the transformation is fitted on the training set and then applied to the test set. The test set should remain unseen (i.e. its distribution is not known during training), so the linear splines are based on the distribution of the train set.
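To make the hinge construction concrete, here is a minimal sketch on toy data. The helper name `hinge_basis` and the toy arrays are illustrative assumptions, not part of the notebook; the point is only that the knots are taken from the training values and reused unchanged for the test values.

```python
import numpy as np

def hinge_basis(x, knots):
    """Return one column max(0, x - k) per knot k (a linear spline basis)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.maximum(0.0, x - k) for k in knots])

# toy data (hypothetical values)
x_train = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
x_test = np.array([5.0, 35.0, 50.0])

# knots come from the training distribution only
knots = np.percentile(x_train, [25, 50, 75])

train_basis = hinge_basis(x_train, knots)
test_basis = hinge_basis(x_test, knots)  # same knots, no refitting on test
print(knots)
print(test_basis)
```

Each basis column is zero below its knot and grows linearly above it, so a linear model fitted on these columns can change slope at every knot.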

In [21]:
# extract numeric features from train/test sets
df_train_num = train_set[var_idx_num]
df_test_num = test_set[var_idx_num]

In [22]:
# select only features with over 20 unique values to spline
var_to_spline = list(df_train_num.columns[df_train_num.nunique().values > 20])
print('var_to_spline:\n',
      var_to_spline)

var_to_spline:
 ['f02', 'e15', 'e04', 'e02', 'e08', 'e05']


In [23]:
# define a spline function that transforms both the train and test sets
def fn_spline_train_test(var, x_train, x_test, n_spline):
    # define percentile step size
    step = 100 // n_spline
    # get the percentiles from the train set
    ptiles = np.percentile(x_train, np.arange(step, 100+step, step))
    # initialise dataframes
    train_ptiles = pd.DataFrame({var: x_train})
    test_ptiles = pd.DataFrame({var: x_test})
    # spline the variable
    for idx, ptile in enumerate(ptiles):
        train_ptiles[f'{var}_{idx}'] = np.maximum(0, x_train - ptile)
        test_ptiles[f'{var}_{idx}'] = np.maximum(0, x_test - ptile)
    return [train_ptiles, test_ptiles]

In [24]:
# applies linear spline to both train/test sets
for var in var_to_spline:
    # generate a dataframe for spline variables
    train_ptiles, test_ptiles = fn_spline_train_test(var=var, 
                                                     x_train=df_train_num[var],
                                                     x_test=df_test_num[var],
                                                     n_spline=5)
    # drop the variable that were transformed
    df_train_num = df_train_num.drop(columns=[var])
    df_test_num = df_test_num.drop(columns=[var])
    # concat the spline variables
    df_train_num = pd.concat([df_train_num, train_ptiles], axis=1, sort=False)
    df_test_num = pd.concat([df_test_num, test_ptiles], axis=1, sort=False)

In [25]:
# display data
df_test_num.head()

Unnamed: 0,f26,f06,e06,f21,f20,f28,f11,f16,f31,f25,...,e08_1,e08_2,e08_3,e08_4,e05,e05_0,e05_1,e05_2,e05_3,e05_4
0,0,60,20,0,0,5,10,0,0,0,...,0.0,0.0,0.0,0.0,58,39.0,19.0,0.0,0.0,0.0
1,0,60,76,0,0,0,15,0,0,0,...,0.0,0.0,0.0,0.0,76,57.0,37.0,17.0,0.0,0.0
2,0,60,76,0,1,5,8,0,-3,0,...,0.0,0.0,0.0,0.0,76,57.0,37.0,17.0,0.0,0.0
3,0,60,76,0,0,5,13,0,0,0,...,0.0,0.0,0.0,0.0,77,58.0,38.0,18.0,0.0,0.0
4,0,60,76,0,0,5,12,0,0,0,...,0.0,0.0,0.0,0.0,77,58.0,38.0,18.0,0.0,0.0


In [26]:
# create metadata for numeric features after spline
var_idx_spline = df_train_num.columns.tolist()

#### - Deal with hccvs (eg using the feature encoding library that we looked at in lecture) (You do not need to deal with low cardinality categorical features since H2O will one-hot them for you).
For this part, we used a target encoder, which replaces each level with the mean of the response variable within that level.

*In our experiments it performed better than the other encoders we tried.
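As a rough illustration of what target encoding computes, the sketch below uses plain pandas on toy data. The column names and values are hypothetical. Note that `TargetEncoder` from `category_encoders` additionally smooths the level means towards the global mean, which this sketch leaves out.

```python
import pandas as pd

# toy data (hypothetical): one high-cardinality feature and a binary target
df = pd.DataFrame({
    "code": ["A", "A", "B", "B", "B", "C"],
    "target": [1, 0, 1, 1, 0, 1],
})

# mean of the target within each level, learned on the training rows
level_means = df.groupby("code")["target"].mean()
global_mean = df["target"].mean()

# encode the training rows
train_encoded = df["code"].map(level_means)

# encode new data; levels never seen in training fall back to the global mean
new_codes = pd.Series(["A", "C", "Z"])
new_encoded = new_codes.map(level_means).fillna(global_mean)
print(new_encoded.tolist())
```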

In [27]:
# extract hccv features from train/test sets
df_train_hccv = train_set[var_idx_hccv]
df_test_hccv = test_set[var_idx_hccv]

In [28]:
# encode hccv with target encoder
enc = TargetEncoder()
# apply target encoder
df_train_hccv = enc.fit_transform(df_train_hccv, train_set[var_idx_response])
df_test_hccv = enc.transform(df_test_hccv)



#### - Try out some interactions.
We decided to try interactions among the top 6 categorical variables, based on the feature importances computed in Part 2. (We used 6 because there is a gap in importance between the 6th and 7th features.)

The variables are as follows: `['e21', 'b04', 'f09', 'e20', 'e11', 'a03']`

#### - Try out some other features (eg division of numerics).
We decided to try the division of numeric features. The features we used for the linear splines are considered the more important ones, with a larger variety of values (i.e. more than 20 unique values).

The variables are as follows: `['e04', 'e05', 'e02', 'e08', 'e15', 'f02']`

From the table below, we noticed that several features, such as `e04` and `e05`, have very similar distributions (mean and standard deviation), as do `f01` and `f02`.
Therefore, we decided to compute the ratio for these pairs:
`[('f01', 'f02'), ('e04', 'e05'), ('e02', 'e15')]`

In [29]:
# display distribution of numeric features
train_set[var_idx_num].describe().T.sort_values(by='mean', ascending=False).head(15)

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
e09,249997.0,50.191974,27.948693,18.0,18.0,55.0,76.0,99.0
e08,249997.0,50.031268,28.215267,0.0,16.0,53.0,75.0,99.0
e04,249997.0,49.824678,28.819172,1.0,24.0,49.0,76.0,99.0
f02,249997.0,49.641864,28.872189,2.0,25.0,49.0,74.0,99.0
f01,249997.0,49.588539,28.139675,13.0,13.0,51.0,75.0,99.0
e02,249997.0,49.431249,28.883891,0.0,23.0,50.0,75.0,99.0
e15,249997.0,49.427393,28.604578,0.0,22.0,51.0,74.0,99.0
e06,249997.0,49.387841,27.694698,20.0,20.0,49.0,76.0,99.0
f06,249997.0,49.367396,20.585375,0.0,60.0,60.0,60.0,60.0
f08,249997.0,49.18751,9.329284,0.0,51.0,51.0,51.0,51.0


In [30]:
# division features
# initialise dataframes
df_train_div = pd.DataFrame()
df_test_div = pd.DataFrame()
# specify pairs
div_pairs = [('f01', 'f02'), ('e04', 'e05'), ('e02', 'e15')]
# iterate
for pair in div_pairs:
    # division feature: added by '1e-6' to avoid `0/0` case
    div_train = np.divide((train_set[pair[0]]+1e-6), (train_set[pair[1]]+1e-6))
    div_test = np.divide((test_set[pair[0]]+1e-6), (test_set[pair[1]]+1e-6))
    # name the series
    div_train = div_train.rename(f'{pair[0]}_{pair[1]}')
    div_test = div_test.rename(f'{pair[0]}_{pair[1]}')
    # append data to the initilised dataframes
    df_train_div = pd.concat([df_train_div, pd.DataFrame(div_train)], axis=1, sort=False)
    df_test_div = pd.concat([df_test_div, pd.DataFrame(div_test)], axis=1, sort=False)

# create a metadata for these features
var_idx_div =  df_train_div.columns.tolist()
print(f'Division features: {var_idx_div}')

Division features: ['f01_f02', 'e04_e05', 'e02_e15']


Interactions among categorical pairs caused an error in our training, which we suspected was due to some interaction levels in the val/test sets not existing in the train/design sets. Therefore, for this assignment, we try interaction terms between the categorical variables (`['e21', 'b04', 'f09', 'e20', 'e11', 'a03']`) and the non-categorical variables that attained the largest normalized coefficients, namely a target-encoded hccv (`'f10'`) and a numeric variable (`'f02'`).
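One way to check this suspicion is to list the interaction levels that occur in the val/test data but never in the training data. The sketch below is a hypothetical helper on toy frames, not code from the notebook; it joins the two columns with `_`, mirroring the naming seen in the h2o output.

```python
import pandas as pd

def unseen_interaction_levels(train, other, col_a, col_b):
    """Return interaction levels 'a_b' present in `other` but not in `train`."""
    train_levels = set(train[col_a].astype(str) + "_" + train[col_b].astype(str))
    other_levels = set(other[col_a].astype(str) + "_" + other[col_b].astype(str))
    return other_levels - train_levels

# toy frames (hypothetical values)
train = pd.DataFrame({"e21": ["C", "O", "N"], "b04": ["F", "B", "E"]})
val = pd.DataFrame({"e21": ["C", "O", "Q"], "b04": ["F", "E", "F"]})

unseen = sorted(unseen_interaction_levels(train, val, "e21", "b04"))
print(unseen)
```

A non-empty result means the model would meet levels at prediction time for which it has no coefficient, which is consistent with the kind of error described above.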

In [31]:
from itertools import combinations

# specify the variables to interact
for_pairs = ['e21', 'b04', 'f09', 'e20', 'e11', 'a03']
# create all unordered pairs, in the same order as a nested i < j loop
interaction_pairs = list(combinations(for_pairs, 2))

In [32]:
print(f'Interaction pairs: {interaction_pairs}')

Interaction pairs: [('e21', 'b04'), ('e21', 'f09'), ('e21', 'e20'), ('e21', 'e11'), ('e21', 'a03'), ('b04', 'f09'), ('b04', 'e20'), ('b04', 'e11'), ('b04', 'a03'), ('f09', 'e20'), ('f09', 'e11'), ('f09', 'a03'), ('e20', 'e11'), ('e20', 'a03'), ('e11', 'a03')]


#### Create interaction terms in h2o
In Part 3 we used the built-in interaction term creation of `H2OGeneralizedLinearEstimator()`; for the preprocessing this time, we have to create the terms explicitly with the `h2o.interaction()` function.

In [33]:
# connect to h2o via localhost
h2o.init(ip="localhost", port=54323)

Checking whether there is an H2O instance running at http://localhost:54323 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_292"; OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10); OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
  Starting server from /opt/conda/lib/python3.9/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp1ovsexpy
  JVM stdout: /tmp/tmp1ovsexpy/h2o_jovyan_started_from_python.out
  JVM stderr: /tmp/tmp1ovsexpy/h2o_jovyan_started_from_python.err
  Server is running at http://127.0.0.1:54323
Connecting to H2O server at http://127.0.0.1:54323 ... successful.


0,1
H2O_cluster_uptime:,02 secs
H2O_cluster_timezone:,Etc/UTC
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.32.1.2
H2O_cluster_version_age:,2 months and 16 days
H2O_cluster_name:,H2O_from_python_jovyan_kpe4fl
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,6.971 Gb
H2O_cluster_total_cores:,8
H2O_cluster_allowed_cores:,8


In [34]:
# load data to h2o java vm
h2o_df_train = h2o.H2OFrame(train_set[for_pairs], destination_frame='df_train')
h2o_df_test = h2o.H2OFrame(test_set[for_pairs], destination_frame='df_test')

Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [35]:
# generate interaction terms for train_set
h2o.interaction(h2o_df_train, 
                pairwise=True,
                factors=for_pairs,
                max_factors=1000,
                min_occurrence=1,
                destination_frame='train_interaction')

# generate interaction terms for test_set
h2o.interaction(h2o_df_test, 
                pairwise=True,
                factors=for_pairs,
                max_factors=1000,
                min_occurrence=1,
                destination_frame='test_interaction')


Interactions progress: |██████████████████████████████████████████████████| 100%
Interactions progress: |██████████████████████████████████████████████████| 100%


e21_b04,e21_f09,e21_e20,e21_e11,e21_a03,b04_f09,b04_e20,b04_e11,b04_a03,f09_e20,f09_e11,f09_a03,e20_e11,e20_a03,e11_a03
C_F,C_B,C_A4893,C_A,C_B,F_B,F_A4893,F_A,F_B,B_A4893,B_A,B_B,A4893_A,A4893_B,A_B
O_B,O_F,O_D53C5,O_B,O_H,B_F,B_D53C5,B_B,B_H,F_D53C5,F_B,F_H,D53C5_B,D53C5_H,B_H
O_B,O_B,O_D53C5,O_B,O_H,B_B,B_D53C5,B_B,B_H,B_D53C5,B_B,B_H,D53C5_B,D53C5_H,B_H
C_E,C_G,C_A4893,C_B,C_B,E_G,E_A4893,E_B,E_B,G_A4893,G_B,G_B,A4893_B,A4893_B,B_B
C_E,C_B,C_A4893,C_B,C_B,E_B,E_A4893,E_B,E_B,B_A4893,B_B,B_B,A4893_B,A4893_B,B_B
N_E,N_B,N_A4893,N_B,N_H,E_B,E_A4893,E_B,E_H,B_A4893,B_B,B_H,A4893_B,A4893_H,B_H
N_E,N_B,N_A4893,N_B,N_H,E_B,E_A4893,E_B,E_H,B_A4893,B_B,B_H,A4893_B,A4893_H,B_H
F_B,F_G,F_D913A,F_A,F_B,B_G,B_D913A,B_A,B_B,G_D913A,G_A,G_B,D913A_A,D913A_B,A_B
Q_F,Q_B,Q_A4893,Q_B,Q_H,F_B,F_A4893,F_B,F_H,B_A4893,B_B,B_H,A4893_B,A4893_H,B_H
Q_F,Q_B,Q_A4893,Q_B,Q_H,F_B,F_A4893,F_B,F_H,B_A4893,B_B,B_H,A4893_B,A4893_H,B_H




In [36]:
# convert h2o frames to pandas DataFrame
train_interaction = h2o.get_frame('train_interaction').as_data_frame()
test_interaction = h2o.get_frame('test_interaction').as_data_frame()

#### Concatenate all the preprocessed dataframes
Prepare dataframes for model training: train/val/test/design sets

Case 4 was used for our final prediction for Part 3

In [37]:
"""
Case 4: Add interaction terms (categorical variables)
-- based on variable importances with tree-based selection in Part 2.
"""
# exclude hccv from cat variables metadata
var_idx_cat_not_hccv = list(set(var_idx_cat) - set(var_idx_hccv))
# components = ['numeric and splines', 'categorical', 'hccv', 'division']
df_design_all = pd.concat([df_train_num, train_set[var_idx_cat_not_hccv], train_interaction, df_train_hccv,
                           df_train_div, train_set[var_idx_response]], axis=1, sort=False)
df_test_all = pd.concat([df_test_num, test_set[var_idx_cat_not_hccv], test_interaction, df_test_hccv,
                        df_test_div], axis=1, sort=False)

# there seem to be two rows with missing values in the design set caused through the interaction term
df_design_all = df_design_all.dropna(axis=0)

# split design into train/val sets
df_train_all, df_val_all = train_test_split(df_design_all, test_size=0.2, random_state=888)

#### Standardise numeric features

- Standardise train/val set for model selection

In [38]:
# get column indices for numerical features that need to standardise
# -- index intersection/union are used so we can apply the same function on all the Cases above
all_num_col = train_set[var_idx_num].columns.union(var_idx_spline).union(var_idx_div)
num_col = df_train_all.columns.intersection(all_num_col)

# create standardise pipeline
# -- parameters = List of (name, transformer, columns)
pipeline = ColumnTransformer([
    ('standardise', StandardScaler(), num_col)
])

# fit standardisation on the train set
train_scale = pipeline.fit_transform(df_train_all)
# apply the standardisation to the val set
val_scale = pipeline.transform(df_val_all)

# replace the values with scaled values
df_train_all[num_col] = train_scale
df_val_all[num_col] = val_scale

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_train_all[num_col] = train_scale
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value[:, i].tolist(), pi)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_val_all[num_col] = val_scale


- Standardise design/test set for test set prediction

In [39]:
# get column indices for numerical features that need to standardise
# -- index intersection/union are used so we can apply the same function on all the Cases above
all_num_col = train_set[var_idx_num].columns.union(var_idx_spline).union(var_idx_div)
num_col = df_design_all.columns.intersection(all_num_col)

# create standardise pipeline
# -- parameters = List of (name, transformer, columns)
pipeline = ColumnTransformer([
    ('standardise', StandardScaler(), num_col)
])

# fit standardisation on the design set
# -- need to remove the response during transform so the column index match with the test set
design_scale = pipeline.fit_transform(df_design_all.iloc[:,:-1])
# apply the standardisation to the test set
test_scale = pipeline.transform(df_test_all)

# replace the values with scaled values
df_design_all[num_col] = design_scale
df_test_all[num_col] = test_scale

### Vowpal Wabbit - Online Learning

- Adjust the code provided in the notebooks to create data for vw in the correct format.

In [54]:
# export preprocessed data to csv
df_train_all.to_csv(os.path.join(dirPData, 'train.csv'))
df_val_all.to_csv(os.path.join(dirPData, 'val.csv'))
df_design_all.to_csv(os.path.join(dirPData, 'design.csv'))
df_test_all.to_csv(os.path.join(dirPData, 'test.csv'))

Use a loop to create the data for vw; each `.csv` file is removed once its `.vw` file has been created, to save space.

In [56]:
# set file names
f_names = ['train', 'val', 'design', 'test']
for f_name in f_names:
    # setup parameters
    
    # label is in the last column
    label_index = 123
    # ignore column 0 which is index numbers
    ignore_columns = [0]
    # header exist
    skip_headers = True
    # True for all files except for `test.csv`
    if f_name == 'test':
        label_present = False
    else:
        label_present = True
    # iterate to create .vw files
    idx_row = 0
    # setup file paths
    csv_path = os.path.join(dirPData, f'{f_name}.csv')
    vw_path = os.path.join(dirPData, f'{f_name}.vw')
    with open(csv_path, 'r') as infile, \
         open(vw_path, 'w') as outfile:
        reader = csv.reader(infile)
        if skip_headers:
            headers = next(reader)
        for line in reader:
            idx_row += 1
            if label_present:
                label = line.pop(label_index)
            else:
                label = 1
            label = int(float(label))
            if label == 0:
                label = -1
            new_line = []
            new_line.append( "{} |n".format(label))
            for idx_col, item in enumerate(line):
                if idx_col in ignore_columns:
                    continue
                else:
                    categorical = False
                    try:
                        item_float = float(item)
                        if item_float == 0.0:
                            continue    # sparse format
                    except ValueError:
                        if item:
                            categorical = True
                        else:
                            continue

                    if categorical:
                        new_item =  "c{}_{}".format(idx_col + 1, item)
                    else:
                        new_item = "{}:{}".format(idx_col + 1, item )
                    new_line.append(new_item)
            new_line = " ".join( new_line )
            new_line += "\n"

            outfile.write(new_line)
    # remove file after finish writing
    os.remove(csv_path)
    # print to verify the removed file
    print(f'{csv_path} has been removed.')

../PData/train.csv has been removed.
../PData/val.csv has been removed.
../PData/design.csv has been removed.
../PData/test.csv has been removed.


Let's check the first line of each `.vw` file 

In the `vw` data format, the positive class is labelled 1 and the negative class -1.

For the test set, where no label exists, the code assigns 1 as a dummy label.
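The line format can be summarised with a small helper. This is a simplified sketch of the conversion above, under the assumption that features are passed as dictionaries; the function name `to_vw_line` and the example values are hypothetical.

```python
def to_vw_line(label, numeric, categorical, namespace="n"):
    """Format one example as a Vowpal Wabbit text line.

    label: 0/1 class (0 is mapped to -1)
    numeric: dict of feature name -> float value
    categorical: dict of feature name -> level
    """
    vw_label = 1 if int(label) == 1 else -1
    tokens = [f"{vw_label} |{namespace}"]
    for name, value in numeric.items():
        if float(value) == 0.0:
            continue  # sparse format: zero-valued features are omitted
        tokens.append(f"{name}:{value}")
    for name, level in categorical.items():
        if level:  # empty levels are skipped
            tokens.append(f"c{name}_{level}")
    return " ".join(tokens)

line = to_vw_line(0, {"2": -0.08, "3": 0.0, "4": 0.27}, {"50": "B"})
print(line)
```

Numeric features are written as `name:value`, while categorical levels become binary indicator tokens with an implicit value of 1.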

In [57]:
for f_name in f_names:
    f_path = os.path.join(dirPData, f'{f_name}.vw')
    with open(f_path) as f:
        print(f_path + ':')
        print('--------------------')
        print(f.readline())

../PData/train.vw:
--------------------
-1 |n 2:-0.07964552000756642 3:0.5155896151647581 4:0.2754280848752945 5:-0.08559508295513285 6:-0.24875170031876934 7:0.18253989794928338 8:0.8291774496356544 9:-0.2506097557315126 10:0.11121335874123105 11:-0.0806597280741413 12:-0.0471813260628615 13:-0.07959543545878676 14:-1.5630780766640213 15:-0.2098414743395872 16:-0.256631940986384 17:0.19499434279462513 18:3.3644558799357234 19:0.36918470723810554 20:-0.10769034674949253 21:-0.19806986035078505 22:0.09929746756195929 23:-0.33187812977814507 24:-0.30517520359007255 25:-0.0844078250023573 26:0.9836573514444525 27:1.0084021975551538 28:1.0227884811276466 29:0.8794636206529151 30:-0.43115732291510916 32:1.524541759091392 33:1.624899154957833 34:1.754898604056127 35:2.1103193631537764 36:2.478390318934751 38:-1.3131986435507443 39:-1.2658449989889278 40:-0.9178095755360637 41:-0.6520502307639899 42:-0.3995208991068253 44:-0.18766456411536775 45:-0.28976869160655744 46:-0.6302357120284311 47:

### Model Training: Elastic Net

#### Hyperparameter tuning: hyperopt
To tune the hyperparameters of Vowpal Wabbit with hyperopt, we used the script at https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/utl/vw-hyperopt.py

We downloaded the file and stored it in the `PCode` directory.
- Hyperparameter tuning ranges (algorithm = "Stochastic Gradient Descent" (SGD)):
  - l2: [1e-8, 1e-4] --- log-uniform, optional (`~LO`)
  - l1: [1e-8, 1e-4] --- log-uniform, optional (`~LO`)
  - learning_rate: [0.01, 10] --- log-uniform (`~L`)
  - passes: [1, 10] --- integer (`~I`), i.e. the number of epochs
- loss function: logistic loss
- link: logistic
- random_seed = 888 --- the hyperopt script does not expose a seed setting; we added the seed to the `vw` command-line options, yet the results still differ between runs.
- bits: 18 (`-b 18`)
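As an alternative to hyperopt, a plain grid search over the same parameters could be set up along the following lines. This is only a sketch: the grid values are illustrative rather than tuned, and the fixed options are copied from the `vw` training commands that appear in the log output. It builds the command strings only; running each one and scoring it on `val.vw` is left out.

```python
from itertools import product

# hypothetical grid; the values are illustrative, not tuned
grid = {
    "--l1": [1e-8, 1e-6],
    "--l2": [1e-8, 1e-6],
    "-l": [0.1, 1.0],
    "--passes": [1, 5],
}

# options kept fixed across all runs
fixed = ("vw -d ../PData/train.vw -f ./current.model --holdout_off -c "
         "--link logistic --loss_function logistic --random_seed 888 -b 18")

# one command per combination of grid values
commands = []
for values in product(*grid.values()):
    options = " ".join(f"{flag} {value}" for flag, value in zip(grid, values))
    commands.append(f"{fixed} {options}")

print(len(commands))
print(commands[0])
```

With two values per parameter this gives 16 runs; the speed of vw is what would make a denser grid feasible.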

In [78]:
# tune the hyperparameter with hyperopt
!python vw-hyperopt.py --train ../PData/train.vw \
--holdout ../PData/val.vw --max_evals 200 --outer_loss_function logistic \
--vw_space ' --algorithms=sgd --l2=1e-8..1e-4~LO --l1=1e-8..1e-4~LO -l=0.01..10~L --passes=1..10~I --loss_function=logistic --link=logistic --random_seed=888 -b=18'

# # expected outcome
# vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 9.977644028429442 

2021-07-16 11:08:31,076 INFO     [root/vw-hyperopt:249]: loading true holdout class labels...
2021-07-16 11:08:31,549 INFO     [root/vw-hyperopt:260]: holdout length: 49999
2021-07-16 11:08:31,550 DEBUG    [root/vw-hyperopt:344]: starting hypersearch...
  0%|                                   | 0/200 [00:00<?, ?trial/s, best loss=?]2021-07-16 11:08:31,567 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.003958 seconds
2021-07-16 11:08:31,567 INFO     [hyperopt.tpe/tpe:909]: TPE using 0 trials
2021-07-16 11:08:31,569 INFO     [root/vw-hyperopt:315]: 

Starting trial no.1
2021-07-16 11:08:31,569 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --l1 1.839252732728083e-05 --l2 1.8328017739283907e-05 --link logistic --loss_function logistic --passes 10 --random_seed 888 -b 18 -l 6.767267767708859 
using l1 regularization = 1.83925e-05
using l2 regularization = 1.8328e-05
final_regressor 

only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.008391 0.008391            1            1.0   1.0000   0.7487      117
0.005359 0.002327            2            2.0   1.0000   0.7404      117
0.315627 0.625894            4            4.0  -1.0000   0.1269      117
0.527892 0.740158            8            8.0   1.0000   0.6310      117
0.659132 0.790372           16           16.0   1.0000   0.2784      117
0.898096 1.137061           32           32.0  -1.0000   0.2054      117
1.080178 1.262260           64           64.0  -1.0000   0.6076      117
0.931571 0.782964          128          128.0  -1.0000   0.3665      117
0.841142 0.750712          256          256.0   1.0000   0

2.370752 2.273486         2048         2048.0  -1.0000   0.0811      117
2.422702 2.474651         4096         4096.0   1.0000   0.9278      117
2.401679 2.380656         8192         8192.0  -1.0000   0.1223      117
2.366365 2.331052        16384        16384.0   1.0000   0.7139      117
2.364677 2.362989        32768        32768.0  -1.0000   0.0486      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.366145
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:09:05,806 INFO     [root/vw-hyperopt:305]: parameter suffix: --l2 2.424204205571114e-08 --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 7.854537526703206 
2021-07-16 11:09:05,807 INFO     [root/vw-hyperopt:306]: loss value: 0.650065
2021-07-16 11:09:05,809 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:03.004346
  2%|▏        | 5/200 [00:34<18

0.616914 0.595761        16384        16384.0   1.0000   0.5268      117
0.597293 0.577672        32768        32768.0   1.0000   0.5352      117
0.577540 0.557787        65536        65536.0   1.0000   0.8106      117
0.556031 0.534523       131072       131072.0  -1.0000   0.4838      117
0.536790 0.517548       262144       262144.0   1.0000   0.8220      117
0.520381 0.503971       524288       524288.0   1.0000   0.7593      117
0.508090 0.495799      1048576      1048576.0  -1.0000   0.6681      117

finished run
number of examples per pass = 199995
passes used = 6
weighted example sum = 1199970.000000
weighted label sum = -20898.000000
average loss = 0.506151
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 140384214
2021-07-16 11:09:16,928 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weig

using l1 regularization = 7.0486e-08
using l2 regularization = 6.63646e-06
final_regressor = ./current.model
Num weight bits = 18
learning rate = 0.0558393
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.610789 0.528430            2            2.0  -1.0000   0.4105      117
0.620477 0.630166            4            4.0  -1.0000   0.4153      117
0.741906 0.863335            8            8.0   1.0000   0.5177      117
0.744583 0.747260           16           16.0   1.0000   0.4471      117
0.711872 0.679161           32           32.0  -1.0000   0.4130      117
0.679860 0.647847           64           64.0  -1.0000   0.3227      117
0.720875 0.761891          128  

2.196665 2.226322         4096         4096.0   1.0000   0.9248      117
2.174819 2.152973         8192         8192.0  -1.0000   0.1192      117
2.144630 2.114441        16384        16384.0   1.0000   0.6798      117
2.138505 2.132381        32768        32768.0  -1.0000   0.0645      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.136444
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:09:34,357 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 5 --random_seed 888 -b 18 -l 3.8140046211248255 
2021-07-16 11:09:34,357 INFO     [root/vw-hyperopt:306]: loss value: 0.652573
2021-07-16 11:09:34,359 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:05.044425
  5%|▍       | 10/200 [01:02<18:23,  5.81s/trial, best loss: 0.6500654619881554]2021-07-16 11:09:34,379 INFO     [hyperopt.tpe/tpe

0.597680 0.579542        65536        65536.0   1.0000   0.7676      117
0.576746 0.555813       131072       131072.0  -1.0000   0.4790      117
0.556117 0.535487       262144       262144.0   1.0000   0.7810      117
0.536629 0.517141       524288       524288.0   1.0000   0.7032      117

finished run
number of examples per pass = 199995
passes used = 4
weighted example sum = 799980.000000
weighted label sum = -13932.000000
average loss = 0.525961
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 93589476
2021-07-16 11:09:40,845 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  cur

0.630958 0.598118         4096         4096.0   1.0000   0.5435      117
0.598528 0.566099         8192         8192.0   1.0000   0.6339      117
0.566455 0.534381        16384        16384.0   1.0000   0.5996      117
0.544245 0.522036        32768        32768.0   1.0000   0.5578      117
0.526403 0.508561        65536        65536.0   1.0000   0.8989      117
0.510020 0.493637       131072       131072.0  -1.0000   0.5218      117
0.499474 0.488929       262144       262144.0   1.0000   0.9164      117
0.491806 0.484138       524288       524288.0   1.0000   0.8999      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.490730
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:09:48,232 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pr

using l1 regularization = 7.04285e-05
using l2 regularization = 2.1093e-06
final_regressor = ./current.model
Num weight bits = 18
learning rate = 6.29917
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.363218 0.033288            2            2.0  -1.0000   0.0327      117
0.359737 0.356256            4            4.0  -1.0000   0.0265      117
1.206136 2.052535            8            8.0   1.0000   0.0865      117
1.394097 1.582059           16           16.0   1.0000   0.9032      117
1.199993 1.005889           32           32.0  -1.0000   0.0246      117
1.054039 0.908085           64           64.0  -1.0000   0.0063      117
0.993319 0.932600          128    

0.722506 0.800947           32           32.0  -1.0000   0.3779      117
0.757731 0.792957           64           64.0  -1.0000   0.5791      117
0.685707 0.613684          128          128.0  -1.0000   0.4398      117
0.653526 0.621344          256          256.0   1.0000   0.5883      117
0.690945 0.728364          512          512.0  -1.0000   0.3569      117
0.716991 0.743037         1024         1024.0   1.0000   0.6206      117
0.690982 0.664974         2048         2048.0  -1.0000   0.4752      117
0.706307 0.721632         4096         4096.0   1.0000   0.7552      117
0.714801 0.723296         8192         8192.0  -1.0000   0.4015      117
0.712890 0.710979        16384        16384.0   1.0000   0.5839      117
0.713075 0.713261        32768        32768.0  -1.0000   0.4074      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 0.714051
best constant = -0.017060
best constant's loss = 0.999709
total

0.662432 0.653290        16384        16384.0   1.0000   0.5000      117
0.651832 0.641232        32768        32768.0   1.0000   0.4985      117
0.639962 0.628092        65536        65536.0   1.0000   0.6612      117
0.625089 0.610215       131072       131072.0  -1.0000   0.4694      117
0.608119 0.591149       262144       262144.0   1.0000   0.6549      117
0.589358 0.570598       524288       524288.0   1.0000   0.5503      117

finished run
number of examples per pass = 199995
passes used = 4
weighted example sum = 799980.000000
weighted label sum = -13932.000000
average loss = 0.577332
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 93589476
2021-07-16 11:10:17,539 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_fi

final_regressor = ./current.model
Num weight bits = 18
learning rate = 3.26553
initial_t = 0
power_t = 0.5
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.372850 0.052553            2            2.0  -1.0000   0.0512      117
0.376980 0.381109            4            4.0  -1.0000   0.0433      117
1.102197 1.827414            8            8.0   1.0000   0.1456      117
1.245377 1.388557           16           16.0   1.0000   0.8569      117
1.096303 0.947230           32           32.0  -1.0000   0.0565      117
0.967792 0.839280           64           64.0  -1.0000   0.0138      117
0.936166 0.904541          128          128.0  -1.0000   0.0685      117
0.898533 0.860900          256          256.0  -1.0000   0.2

2.140728 2.119876         8192         8192.0  -1.0000   0.1172      117
2.115850 2.090972        16384        16384.0   1.0000   0.6559      117
2.109164 2.102479        32768        32768.0  -1.0000   0.0623      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.106797
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:10:29,984 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 2.040748166510708 
2021-07-16 11:10:29,985 INFO     [root/vw-hyperopt:306]: loss value: 0.652168
2021-07-16 11:10:29,986 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:01.620033
 11%|▉       | 22/200 [01:58<11:25,  3.85s/trial, best loss: 0.6500654619881554]2021-07-16 11:10:30,003 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004336 seconds
2021-07-16 11:10:30,0

2.144007 1.947795          256          256.0   1.0000   0.6070      117
2.271635 2.399263          512          512.0  -1.0000   0.0102      117
2.269519 2.267404         1024         1024.0   1.0000   0.6511      117
2.145124 2.020728         2048         2048.0  -1.0000   0.1432      117
2.185832 2.226541         4096         4096.0   1.0000   0.9272      117
2.163488 2.141143         8192         8192.0  -1.0000   0.1173      117
2.136283 2.109078        16384        16384.0   1.0000   0.6755      117
2.129490 2.122697        32768        32768.0  -1.0000   0.0633      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.127525
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:10:34,400 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 2.735706920496881 
2021-07-16 11:10:34,

0.599111 0.558190       131072       131072.0  -1.0000   0.6573      117
0.568330 0.537548       262144       262144.0   1.0000   0.9483      117
0.543843 0.519355       524288       524288.0   1.0000   0.9079      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.539909
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:10:38,462 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  pre

0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.388274 0.083400            2            2.0  -1.0000   0.0800      117
0.399363 0.410452            4            4.0  -1.0000   0.0708      117
1.015510 1.631657            8            8.0   1.0000   0.2276      117
1.091935 1.168360           16           16.0   1.0000   0.7909      117
0.987798 0.883662           32           32.0  -1.0000   0.1208      117
0.880818 0.773838           64           64.0  -1.0000   0.0359      117
0.857830 0.834842          128          128.0  -1.0000   0.1202      117
0.819595 0.781360          256          256.0  -1.0000   0.3399      117
0.798808 0.778021          512          512.0   1.0000   0.5492      117
0.760061 0.721313         1024         1024.0  -1.0000   0.2203      117
0.699678 0.639295         2048         2048.0   1.0000   0.8121      117
0.644642 0.589606         4096         4096.0   1.0000   0.3717      117
0.601937 0.559233         8192         8192.0   1.0

2021-07-16 11:10:45,698 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 5.730319986473719 
2021-07-16 11:10:45,698 INFO     [root/vw-hyperopt:306]: loss value: 0.648898
2021-07-16 11:10:45,700 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:01.566375
 14%|█▏      | 29/200 [02:14<06:37,  2.32s/trial, best loss: 0.6467645901202126]2021-07-16 11:10:45,722 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004311 seconds
2021-07-16 11:10:45,722 INFO     [hyperopt.tpe/tpe:909]: TPE using 29/29 trials with best loss 0.646765
2021-07-16 11:10:45,741 INFO     [root/vw-hyperopt:315]: 

Starting trial no.30
2021-07-16 11:10:45,741 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 0.37937081362968283 
final_regressor 

only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.618035 1.618035            1            1.0   1.0000   0.9065      117
1.430083 1.242131            2            2.0   1.0000   0.8923      117
1.454300 1.478518            4            4.0  -1.0000   0.0664      117
1.317511 1.180721            8            8.0   1.0000   0.6247      117
1.908953 2.500396           16           16.0   1.0000   0.1208      117
2.244157 2.579360           32           32.0  -1.0000   0.1178      117
2.397994 2.551832           64           64.0  -1.0000   0.7267      117
2.426454 2.454914          128          128.0  -1.0000   0.3322      117
2.258437 2.090419          256          256.0   1.0000   0

0.682412 0.625753         8192         8192.0   1.0000   0.9227      117
0.630169 0.577927        16384        16384.0   1.0000   0.7703      117
0.596661 0.563153        32768        32768.0   1.0000   0.5849      117
0.567409 0.538157        65536        65536.0   1.0000   0.9515      117
0.540680 0.513952       131072       131072.0  -1.0000   0.5840      117

finished run
number of examples = 199995
weighted example sum = 199995.000000
weighted label sum = -3483.000000
average loss = 0.528369
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 23397369
2021-07-16 11:11:02,116 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  sin

2021-07-16 11:11:08,152 INFO     [root/vw-hyperopt:315]: 

Starting trial no.35
2021-07-16 11:11:08,153 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 0.22942625546675013 
final_regressor = ./current.model
Num weight bits = 18
learning rate = 0.229426
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.498223 0.303300            2            2.0  -1.0000   0.2616      117
0.515956 0.533688            4            4.0  -1.0000   0.2617      117
0.843973 1.171989            8            8.0   1.0000   

1.554282 1.563664         8192         8192.0  -1.0000   0.1795      117
1.531569 1.508856        16384        16384.0   1.0000   0.6360      117
1.520590 1.509611        32768        32768.0  -1.0000   0.1040      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.518184
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:11:19,206 INFO     [root/vw-hyperopt:305]: parameter suffix: --l1 8.942337427062612e-07 --link logistic --loss_function logistic --passes 7 --random_seed 888 -b 18 -l 2.0932106326049778 
2021-07-16 11:11:19,206 INFO     [root/vw-hyperopt:306]: loss value: 0.657739
2021-07-16 11:11:19,208 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:07.777682
 18%|█▍      | 36/200 [02:47<13:51,  5.07s/trial, best loss: 0.6467645901202126]2021-07-16 11:11:19,222 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.003019 s

2.740823 2.659958          256          256.0   1.0000   0.5929      117
2.738702 2.736580          512          512.0  -1.0000   0.0034      117
2.731244 2.723787         1024         1024.0   1.0000   0.5916      117
2.637452 2.543661         2048         2048.0  -1.0000   0.0722      117
2.715116 2.792779         4096         4096.0   1.0000   0.9349      117
2.694884 2.674653         8192         8192.0  -1.0000   0.1262      117
2.657378 2.619872        16384        16384.0   1.0000   0.7214      117
2.658782 2.660186        32768        32768.0  -1.0000   0.0390      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.661791
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:11:23,245 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 6.900683145448005 
2021-07-16 11:11:23,

0.678349 0.609642         4096         4096.0   1.0000   0.3452      117
0.629505 0.580662         8192         8192.0   1.0000   0.8608      117
0.585539 0.541574        16384        16384.0   1.0000   0.6983      117
0.560205 0.534870        32768        32768.0   1.0000   0.5677      117
0.540668 0.521132        65536        65536.0   1.0000   0.9458      117
0.522904 0.505140       131072       131072.0  -1.0000   0.5702      117
0.511085 0.499265       262144       262144.0   1.0000   0.9294      117
0.501496 0.491908       524288       524288.0   1.0000   0.9301      117

finished run
number of examples per pass = 199995
passes used = 4
weighted example sum = 799980.000000
weighted label sum = -13932.000000
average loss = 0.497135
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 93589476
2021-07-16 11:11:33,179 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pr

2021-07-16 11:11:41,282 INFO     [root/vw-hyperopt:315]: 

Starting trial no.42
2021-07-16 11:11:41,282 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 1.0652610307991914 
final_regressor = ./current.model
Num weight bits = 18
learning rate = 1.06526
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.403751 0.114355            2            2.0  -1.0000   0.1081      117
0.418575 0.433398            4            4.0  -1.0000   0.0987      117
0.967664 1.516753            8            8.0   1.0000   0.

1.077579 1.046474         2048         2048.0  -1.0000   0.1655      117
1.095161 1.112744         4096         4096.0   1.0000   0.9168      117
1.101308 1.107454         8192         8192.0  -1.0000   0.2524      117
1.089251 1.077193        16384        16384.0   1.0000   0.5881      117
1.080221 1.071192        32768        32768.0  -1.0000   0.1604      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.078420
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:11:48,027 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 0.21040243041069798 
2021-07-16 11:11:48,027 INFO     [root/vw-hyperopt:306]: loss value: 0.666392
2021-07-16 11:11:48,028 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:01.687676
 22%|█▋      | 43/200 [03:16<10:53,  4.16s/trial, best l

0.663948 0.655443        16384        16384.0   1.0000   0.5017      117
0.654397 0.644847        32768        32768.0   1.0000   0.4908      117
0.644593 0.634789        65536        65536.0   1.0000   0.6361      117
0.634105 0.623617       131072       131072.0  -1.0000   0.4712      117
0.625806 0.617506       262144       262144.0   1.0000   0.5958      117
0.623012 0.620219       524288       524288.0   1.0000   0.5100      117

finished run
number of examples per pass = 199995
passes used = 5
weighted example sum = 999975.000000
weighted label sum = -17415.000000
average loss = 0.627272
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 116986845
2021-07-16 11:12:02,384 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_f

final_regressor = ./current.model
Num weight bits = 18
learning rate = 4.36732
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.368029 0.042911            2            2.0  -1.0000   0.0420      117
0.368720 0.369410            4            4.0  -1.0000   0.0348      117
1.146234 1.923748            8            8.0   1.0000   0.1166      117
1.313569 1.480905           16           16.0   1.0000   0.8808      117
1.145211 0.976853           32           32.0  -1.0000   0.0388      117
1.008332 0.871453           64           64.0  -1.0000   0.0090      117
0.973869 0.939407          128          128.0  -1.0000   0.0533      117
0.938858 0.903847          256      

1.042552 1.084496         1024         1024.0   1.0000   0.7436      117
1.018855 0.995159         2048         2048.0  -1.0000   0.1622      117
1.034155 1.049454         4096         4096.0   1.0000   0.9117      117
1.043147 1.052139         8192         8192.0  -1.0000   0.2519      117
1.032464 1.021782        16384        16384.0   1.0000   0.5995      117
1.024505 1.016545        32768        32768.0  -1.0000   0.1695      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.022398
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:12:13,778 INFO     [root/vw-hyperopt:305]: parameter suffix: --l1 2.0526853428908174e-07 --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 0.1388809503839132 
2021-07-16 11:12:13,779 INFO     [root/vw-hyperopt:306]: loss value: 0.666719
2021-07-16 11:12:13,784 INFO     [root/vw-hyperopt:323]: ev

0.576638 0.546772        16384        16384.0   1.0000   0.5798      117
0.554210 0.531783        32768        32768.0   1.0000   0.5536      117
0.534943 0.515676        65536        65536.0   1.0000   0.8843      117
0.516603 0.498263       131072       131072.0  -1.0000   0.5103      117

finished run
number of examples = 199995
weighted example sum = 199995.000000
weighted label sum = -3483.000000
average loss = 0.508446
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 23397369
2021-07-16 11:12:22,493 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     las

final_regressor = ./current.model
Num weight bits = 18
learning rate = 2.33914
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.379729 0.066312            2            2.0  -1.0000   0.0642      117
0.387538 0.395347            4            4.0  -1.0000   0.0555      117
1.056074 1.724609            8            8.0   1.0000   0.1843      117
1.167276 1.278478           16           16.0   1.0000   0.8256      117
1.040948 0.914619           32           32.0  -1.0000   0.0845      117
0.922802 0.804657           64           64.0  -1.0000   0.0226      117
0.895022 0.867241          128          128.0  -1.0000   0.0911      117
0.856344 0.817667          256      

2.090663 2.071272         8192         8192.0  -1.0000   0.1302      117
2.065252 2.039840        16384        16384.0   1.0000   0.6429      117
2.055004 2.044757        32768        32768.0  -1.0000   0.0725      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.051983
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:12:37,450 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 1.5259552389911828 
2021-07-16 11:12:37,451 INFO     [root/vw-hyperopt:306]: loss value: 0.653224
2021-07-16 11:12:37,452 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:02.892395
 26%|██      | 53/200 [04:05<10:48,  4.41s/trial, best loss: 0.6467645901202126]2021-07-16 11:12:37,471 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004308 seconds
2021-07-16 11:12:37,

0.507539 0.493625       131072       131072.0  -1.0000   0.5448      117
0.499188 0.490837       262144       262144.0   1.0000   0.9303      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.494766
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:12:45,480 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.546042 1.546042            1            1.0   1.0000   0.9

1.144104 1.919116            8            8.0   1.0000   0.1179      117
1.310289 1.476475           16           16.0   1.0000   0.8798      117
1.142875 0.975461           32           32.0  -1.0000   0.0395      117
1.006392 0.869909           64           64.0  -1.0000   0.0092      117
0.972222 0.938053          128          128.0  -1.0000   0.0540      117
0.936990 0.901757          256          256.0  -1.0000   0.2481      117
0.920894 0.904799          512          512.0   1.0000   0.6373      117
0.886676 0.852457         1024         1024.0  -1.0000   0.0766      117
0.802099 0.717522         2048         2048.0   1.0000   0.7958      117
0.722588 0.643077         4096         4096.0   1.0000   0.3578      117
0.668272 0.613957         8192         8192.0   1.0000   0.9089      117
0.618751 0.569229        16384        16384.0   1.0000   0.7544      117
0.588576 0.558401        32768        32768.0   1.0000   0.5904      117
0.563156 0.537737        65536        65536.0   1.0

2021-07-16 11:12:57,628 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 4 --random_seed 888 -b 18 -l 0.2627398372671927 
2021-07-16 11:12:57,628 INFO     [root/vw-hyperopt:306]: loss value: 0.654586
2021-07-16 11:12:57,629 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:04.139494
 29%|██▎     | 58/200 [04:26<09:11,  3.88s/trial, best loss: 0.6467645901202126]2021-07-16 11:12:57,651 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004492 seconds
2021-07-16 11:12:57,652 INFO     [hyperopt.tpe/tpe:909]: TPE using 58/58 trials with best loss 0.646765
2021-07-16 11:12:57,680 INFO     [root/vw-hyperopt:315]: 

Starting trial no.59
2021-07-16 11:12:57,680 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 7.165056822728783 
final_regressor =

0.755065 0.766678         4096         4096.0   1.0000   0.7342      117
0.758152 0.761240         8192         8192.0  -1.0000   0.3882      117
0.755860 0.753568        16384        16384.0   1.0000   0.5911      117
0.755941 0.756022        32768        32768.0  -1.0000   0.3278      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 0.754649
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:13:01,598 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 0.0285497212864552 
2021-07-16 11:13:01,598 INFO     [root/vw-hyperopt:306]: loss value: 0.696073
2021-07-16 11:13:01,600 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:02.335532
 30%|██▍     | 60/200 [04:30<06:52,  2.95s/trial, best loss: 0.6467645901202126]
2021-07-16 11:13:01,616 INFO     [hyperopt.tpe/tpe

0.539090 0.520141        65536        65536.0   1.0000   0.9458      117
0.521961 0.504833       131072       131072.0  -1.0000   0.5708      117

finished run
number of examples = 199995
weighted example sum = 199995.000000
weighted label sum = -3483.000000
average loss = 0.514716
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 23397369
2021-07-16 11:13:06,455 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.698403 1.698403            1            1.0   1.0000   0.9091      117
1.584133 1.4

0.801552 0.720719           64           64.0  -1.0000   0.0873      117
0.792486 0.783421          128          128.0  -1.0000   0.2098      117
0.758368 0.724249          256          256.0  -1.0000   0.3972      117
0.742040 0.725713          512          512.0   1.0000   0.4953      117
0.709449 0.676858         1024         1024.0  -1.0000   0.2981      117
0.668038 0.626627         2048         2048.0   1.0000   0.7791      117
0.629721 0.591405         4096         4096.0   1.0000   0.4836      117
0.594331 0.558941         8192         8192.0   1.0000   0.6743      117
0.560503 0.526675        16384        16384.0   1.0000   0.6158      117
0.538657 0.516811        32768        32768.0   1.0000   0.5595      117
0.522041 0.505425        65536        65536.0   1.0000   0.9101      117
0.507083 0.492125       131072       131072.0  -1.0000   0.5317      117
0.497944 0.488805       262144       262144.0   1.0000   0.9246      117
0.491163 0.484382       524288       524288.0   1.0


finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 0.766028
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:13:24,137 INFO     [root/vw-hyperopt:305]: parameter suffix: --l1 5.260333411030193e-07 --l2 9.283700385422963e-08 --link logistic --loss_function logistic --passes 5 --random_seed 888 -b 18 -l 0.013398143515849276 
2021-07-16 11:13:24,137 INFO     [root/vw-hyperopt:306]: loss value: 0.702389
2021-07-16 11:13:24,139 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:05.460924
 32%|██▌     | 65/200 [04:52<10:58,  4.88s/trial, best loss: 0.6467645901202126]
2021-07-16 11:13:24,157 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004338 seconds
2021-07-16 11:13:24,158 INFO     [hyperopt.tpe/tpe:909]: TPE using 65/65 trials with best loss 0.646765
2021-07-16 11:13:24,178 INFO     [root/vw-hyperopt:315]: 

Starting trial no.66

2.377892 2.244082          256          256.0   1.0000   0.5936      117
2.446543 2.515194          512          512.0  -1.0000   0.0061      117
2.435966 2.425389         1024         1024.0   1.0000   0.5941      117
2.326154 2.216342         2048         2048.0  -1.0000   0.1076      117
2.384672 2.443190         4096         4096.0   1.0000   0.9302      117
2.361396 2.338120         8192         8192.0  -1.0000   0.1175      117
2.331303 2.301210        16384        16384.0   1.0000   0.6998      117
2.328131 2.324960        32768        32768.0  -1.0000   0.0494      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.328821
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:13:27,360 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 4.140683752314998 
2021-07-16 11:13:27,

0.663757 0.621283        32768        32768.0   1.0000   0.6502      117
0.622554 0.581352        65536        65536.0   1.0000   0.9688      117
0.585578 0.548602       131072       131072.0  -1.0000   0.6470      117
0.558088 0.530597       262144       262144.0   1.0000   0.9450      117
0.536221 0.514355       524288       524288.0   1.0000   0.9145      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.532722
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:13:32,432 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
nu

1.030641 0.889496           64           64.0  -1.0000   0.0071      117
0.995040 0.959439          128          128.0  -1.0000   0.0466      117
0.962079 0.929117          256          256.0  -1.0000   0.2298      117
0.949165 0.936250          512          512.0   1.0000   0.6530      117
0.918040 0.886915         1024         1024.0  -1.0000   0.0585      117
0.829648 0.741257         2048         2048.0   1.0000   0.7872      117
0.745864 0.662079         4096         4096.0   1.0000   0.3710      117
0.689212 0.632560         8192         8192.0   1.0000   0.9250      117
0.637062 0.584912        16384        16384.0   1.0000   0.7848      117
0.604175 0.571287        32768        32768.0   1.0000   0.6028      117
0.575393 0.546612        65536        65536.0   1.0000   0.9583      117
0.549552 0.523710       131072       131072.0  -1.0000   0.6105      117
0.531123 0.512695       262144       262144.0   1.0000   0.9355      117

finished run
number of examples per pass = 199995


2021-07-16 11:13:38,509 INFO     [root/vw-hyperopt:315]: 

Starting trial no.73
2021-07-16 11:13:38,510 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 6.685777132680136 
final_regressor = ./current.model
Num weight bits = 18
learning rate = 6.68578
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.362520 0.031892            2            2.0  -1.0000   0.0314      117
0.357893 0.353266            4            4.0  -1.0000   0.0252      117
1.216224 2.074556            8            8.0   1.0000   0.0

2.016975 1.799017          256          256.0   1.0000   0.6260      117
2.161909 2.306844          512          512.0  -1.0000   0.0131      117
2.181069 2.200228         1024         1024.0   1.0000   0.7622      117
2.061233 1.941398         2048         2048.0  -1.0000   0.1480      117
2.088805 2.116377         4096         4096.0   1.0000   0.9258      117
2.073358 2.057911         8192         8192.0  -1.0000   0.1323      117
2.047283 2.021208        16384        16384.0   1.0000   0.6316      117
2.036889 2.026496        32768        32768.0  -1.0000   0.0744      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.033841
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:13:48,386 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 7 --random_seed 888 -b 18 -l 1.2706083589176138 
2021-07-16 11:13:48

0.526663 0.508758        65536        65536.0   1.0000   0.8985      117
0.510200 0.493737       131072       131072.0  -1.0000   0.5214      117

finished run
number of examples = 199995
weighted example sum = 199995.000000
weighted label sum = -3483.000000
average loss = 0.503398
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 23397369
2021-07-16 11:13:52,039 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.828249 0.828249            1            1.0   1.0000   0.8710      117
0.902857 0.9

final_regressor = ./current.model
Num weight bits = 18
learning rate = 8.27341
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.360321 0.027494            2            2.0  -1.0000   0.0271      117
0.352910 0.345500            4            4.0  -1.0000   0.0215      117
1.253223 2.153536            8            8.0   1.0000   0.0686      117
1.460643 1.668063           16           16.0   1.0000   0.9234      117
1.253087 1.045530           32           32.0  -1.0000   0.0161      117
1.099362 0.945637           64           64.0  -1.0000   0.0035      117
1.060159 1.020957          128          128.0  -1.0000   0.0309      117
1.036118 1.012076          256      

2.534170 2.499395        16384        16384.0   1.0000   0.7169      117
2.533770 2.533370        32768        32768.0  -1.0000   0.0424      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.536112
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:14:00,944 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 5.911659556169528 
2021-07-16 11:14:00,944 INFO     [root/vw-hyperopt:306]: loss value: 0.648771
2021-07-16 11:14:00,945 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:01.515606
 40%|███▏    | 79/200 [05:29<05:14,  2.60s/trial, best loss: 0.6467645901202126]
2021-07-16 11:14:00,969 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.010348 seconds
2021-07-16 11:14:00,970 INFO     [hyperopt.tpe/tpe:909]: TPE using 79/79 trials with best loss

0.489030 0.489030            1            1.0   1.0000   0.5746      117
0.415423 0.341816            2            2.0   1.0000   0.6024      117
0.300827 0.186230            4            4.0  -1.0000   0.2710      117
0.533178 0.765530            8            8.0   1.0000   0.6352      117
0.604316 0.675453           16           16.0   1.0000   0.3464      117
0.738013 0.871709           32           32.0  -1.0000   0.3678      117
0.851577 0.965141           64           64.0  -1.0000   0.6119      117
0.768150 0.684723          128          128.0  -1.0000   0.4759      117
0.689214 0.610278          256          256.0   1.0000   0.6004      117
0.691510 0.693806          512          512.0  -1.0000   0.2968      117
0.726048 0.760586         1024         1024.0   1.0000   0.6437      117
0.708384 0.690720         2048         2048.0  -1.0000   0.2602      117
0.720383 0.732382         4096         4096.0   1.0000   0.7828      117
0.725158 0.729933         8192         8192.0  -1.0

0.857715 0.760220         4096         4096.0   1.0000   0.4588      117
0.793067 0.728419         8192         8192.0   1.0000   0.9659      117
0.730285 0.667502        16384        16384.0   1.0000   0.9145      117
0.683216 0.636147        32768        32768.0   1.0000   0.6717      117
0.636806 0.590396        65536        65536.0   1.0000   0.9685      117
0.595377 0.553949       131072       131072.0  -1.0000   0.6571      117
0.565094 0.534811       262144       262144.0   1.0000   0.9423      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.549635
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:14:11,562 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight 

0.687419 0.671091         1024         1024.0  -1.0000   0.3424      117
0.673618 0.659818         2048         2048.0   1.0000   0.6343      117
0.657835 0.642052         4096         4096.0   1.0000   0.6968      117
0.640884 0.623934         8192         8192.0   1.0000   0.5046      117
0.620603 0.600321        16384        16384.0   1.0000   0.5232      117
0.601429 0.582255        32768        32768.0   1.0000   0.5336      117
0.581835 0.562241        65536        65536.0   1.0000   0.8027      117
0.560101 0.538367       131072       131072.0  -1.0000   0.4824      117
0.540017 0.519933       262144       262144.0   1.0000   0.8176      117
0.522187 0.504357       524288       524288.0   1.0000   0.7592      117
0.507925 0.493662      1048576      1048576.0  -1.0000   0.6697      117

finished run
number of examples per pass = 199995
passes used = 9
weighted example sum = 1799955.000000
weighted label sum = -31347.000000
average loss = 0.499336
best constant = -0.034834
best co

 43%|███▍    | 86/200 [05:56<08:26,  4.44s/trial, best loss: 0.6467645901202126]
2021-07-16 11:14:27,812 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004206 seconds
2021-07-16 11:14:27,813 INFO     [hyperopt.tpe/tpe:909]: TPE using 86/86 trials with best loss 0.646765
2021-07-16 11:14:27,834 INFO     [root/vw-hyperopt:315]: 

Starting trial no.87
2021-07-16 11:14:27,835 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 7.598884501784334 
final_regressor = ./current.model
Num weight bits = 18
learning rate = 7.59888
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict f

2.189777 2.076920         2048         2048.0  -1.0000   0.1109      117
2.214085 2.238394         4096         4096.0   1.0000   0.9230      117
2.192369 2.170652         8192         8192.0  -1.0000   0.1186      117
2.160101 2.127833        16384        16384.0   1.0000   0.6882      117
2.155939 2.151778        32768        32768.0  -1.0000   0.0626      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.153826
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:14:37,597 INFO     [root/vw-hyperopt:305]: parameter suffix: --l2 1.0337232653015542e-08 --link logistic --loss_function logistic --passes 8 --random_seed 888 -b 18 -l 5.623029501648992 
2021-07-16 11:14:37,598 INFO     [root/vw-hyperopt:306]: loss value: 0.652382
2021-07-16 11:14:37,599 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:07.336684
 44%|███▌    | 88/200 [06:06<0

0.549212 0.566206       524288       524288.0   1.0000   0.6730      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.554209
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:14:42,379 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.068871 0.068871            1            1.0   1.0000   0.6765      117
0.082366 0.095861            2            2.0   1.0000   0.

using l2 regularization = 1.07717e-05
final_regressor = ./current.model
Num weight bits = 18
learning rate = 0.202578
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.508590 0.324032            2            2.0  -1.0000   0.2768      117
0.525879 0.543168            4            4.0  -1.0000   0.2779      117
0.833381 1.140884            8            8.0   1.0000   0.4951      117
0.813046 0.792710           16           16.0   1.0000   0.4782      117
0.765385 0.717725           32           32.0  -1.0000   0.3495      117
0.715671 0.665957           64           64.0  -1.0000   0.2184      117
0.734537 0.753402          128          128.0  -1.0000   0.3958      1

0.781760 0.762681         2048         2048.0  -1.0000   0.3033      117
0.792388 0.803017         4096         4096.0   1.0000   0.6547      117
0.790867 0.789345         8192         8192.0  -1.0000   0.4157      117
0.789803 0.788740        16384        16384.0   1.0000   0.5788      117
0.790152 0.790500        32768        32768.0  -1.0000   0.3706      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 0.789121
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:14:52,826 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 0.025143655119843487 
2021-07-16 11:14:52,826 INFO     [root/vw-hyperopt:306]: loss value: 0.704070
2021-07-16 11:14:52,828 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:01.549038
 46%|███▋    | 93/200 [06:21<05:36,  3.14s/trial, best 

0.626312 0.584183        65536        65536.0   1.0000   0.9696      117
0.588479 0.550646       131072       131072.0  -1.0000   0.6493      117
0.560278 0.532076       262144       262144.0   1.0000   0.9457      117
0.537848 0.515419       524288       524288.0   1.0000   0.9131      117

finished run
number of examples per pass = 199995
passes used = 5
weighted example sum = 999975.000000
weighted label sum = -17415.000000
average loss = 0.522539
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 116986845
2021-07-16 11:15:01,977 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  cu

0.834472 0.796402         1024         1024.0  -1.0000   0.1205      117
0.757710 0.680948         2048         2048.0   1.0000   0.8084      117
0.686636 0.615561         4096         4096.0   1.0000   0.3458      117
0.636673 0.586711         8192         8192.0   1.0000   0.8723      117
0.591651 0.546629        16384        16384.0   1.0000   0.7085      117
0.565532 0.539413        32768        32768.0   1.0000   0.5734      117
0.545051 0.524569        65536        65536.0   1.0000   0.9487      117
0.526534 0.508018       131072       131072.0  -1.0000   0.5779      117
0.514018 0.501503       262144       262144.0   1.0000   0.9311      117
0.503736 0.493454       524288       524288.0   1.0000   0.9301      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.502126
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:15:

using l1 regularization = 4.0955e-08
final_regressor = ./current.model
Num weight bits = 18
learning rate = 5.0038
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.366088 0.039028            2            2.0  -1.0000   0.0383      117
0.365111 0.364135            4            4.0  -1.0000   0.0314      117
1.167591 1.970071            8            8.0   1.0000   0.1049      117
1.344788 1.521986           16           16.0   1.0000   0.8912      117
1.167924 0.991060           32           32.0  -1.0000   0.0324      117
1.027432 0.886939           64           64.0  -1.0000   0.0073      117
0.991986 0.956541          128          128.0  -1.0000   0.0475      117


2.403572 2.422705         1024         1024.0   1.0000   0.5943      117
2.298521 2.193469         2048         2048.0  -1.0000   0.0924      117
2.338023 2.377525         4096         4096.0   1.0000   0.9254      117
2.315597 2.293171         8192         8192.0  -1.0000   0.1169      117
2.281448 2.247298        16384        16384.0   1.0000   0.7053      117
2.278796 2.276144        32768        32768.0  -1.0000   0.0533      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.278768
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:15:23,771 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 4 --random_seed 888 -b 18 -l 6.717121903337417 
2021-07-16 11:15:23,772 INFO     [root/vw-hyperopt:306]: loss value: 0.650936
2021-07-16 11:15:23,774 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 

0.552491 0.528577        32768        32768.0   1.0000   0.5654      117
0.534627 0.516763        65536        65536.0   1.0000   0.9430      117
0.518498 0.502370       131072       131072.0  -1.0000   0.5654      117
0.507983 0.497468       262144       262144.0   1.0000   0.9308      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.502225
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:15:29,446 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  curr

using l1 regularization = 3.96463e-07
final_regressor = ./current.model
Num weight bits = 18
learning rate = 0.356302
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.464098 0.235048            2            2.0  -1.0000   0.2095      117
0.482601 0.501104            4            4.0  -1.0000   0.2056      117
0.879726 1.276852            8            8.0   1.0000   0.4469      117
0.848868 0.818009           16           16.0   1.0000   0.5567      117
0.800580 0.752293           32           32.0  -1.0000   0.3119      117
0.742057 0.683533           64           64.0  -1.0000   0.1672      117
0.749472 0.756887          128          128.0  -1.0000   0.3307      1

0.935467 0.913458         2048         2048.0  -1.0000   0.1755      117
0.950642 0.965817         4096         4096.0   1.0000   0.8997      117
0.959490 0.968339         8192         8192.0  -1.0000   0.2722      117
0.950075 0.940660        16384        16384.0   1.0000   0.6060      117
0.943980 0.937885        32768        32768.0  -1.0000   0.1893      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 0.942189
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:15:39,876 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 0.10923973493970691 
2021-07-16 11:15:39,876 INFO     [root/vw-hyperopt:306]: loss value: 0.670284
2021-07-16 11:15:39,878 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:02.476582
 52%|███▋   | 105/200 [07:08<05:31,  3.49s/trial, best l


finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.493745
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:15:43,590 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.217212 1.217212            1            1.0   1.0000   0.8912      117
1.289951 1.362691            2            2.0   1.0000   0.8973      117
1.225660 1.161369            4            4.0  -1.0000   0.0

0.949734 0.904886         1024         1024.0  -1.0000   0.0394      117
0.832297 0.714860         2048         2048.0   1.0000   0.8272      117
0.726968 0.621638         4096         4096.0   1.0000   0.2328      117
0.646082 0.565197         8192         8192.0   1.0000   0.9444      117
0.581588 0.517094        16384        16384.0   1.0000   0.7792      117
0.545548 0.509508        32768        32768.0   1.0000   0.4794      117
0.525399 0.505250        65536        65536.0   1.0000   0.8494      117
0.512896 0.500393       131072       131072.0  -1.0000   0.5244      117
0.510995 0.509094       262144       262144.0   1.0000   0.8574      117
0.513877 0.516759       524288       524288.0   1.0000   0.7993      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.515072
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:15:

2021-07-16 11:15:55,995 INFO     [root/vw-hyperopt:315]: 

Starting trial no.111
2021-07-16 11:15:55,996 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --l2 1.0822644546314917e-05 --link logistic --loss_function logistic --passes 7 --random_seed 888 -b 18 -l 9.775561663468062 
using l2 regularization = 1.08226e-05
final_regressor = ./current.model
Num weight bits = 18
learning rate = 9.77556
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.358814 0.024480            2            2.0  -1.0000   0.0242      117
0.349200 0.339586            4            4.0  -1.0000   0.0189   

1.307624 1.339988         1024         1024.0   1.0000   0.7600      117
1.267370 1.227117         2048         2048.0  -1.0000   0.1472      117
1.285259 1.303147         4096         4096.0   1.0000   0.9278      117
1.291170 1.297081         8192         8192.0  -1.0000   0.2066      117
1.276608 1.262046        16384        16384.0   1.0000   0.6010      117
1.267405 1.258202        32768        32768.0  -1.0000   0.1308      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.264735
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:16:06,587 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 0.17355203481629272 
2021-07-16 11:16:06,588 INFO     [root/vw-hyperopt:306]: loss value: 0.659862
2021-07-16 11:16:06,589 INFO     [root/vw-hyperopt:323]: evaluation time for this step

0.534879 0.527405        32768        32768.0   1.0000   0.4915      117
0.541081 0.547283        65536        65536.0   1.0000   0.7051      117
0.555726 0.570371       131072       131072.0  -1.0000   0.5570      117
0.593021 0.630316       262144       262144.0   1.0000   0.5993      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.620802
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:16:12,645 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  curr

0.708029 0.678051           32           32.0  -1.0000   0.4207      117
0.678162 0.648294           64           64.0  -1.0000   0.3360      117
0.719047 0.759932          128          128.0  -1.0000   0.4325      117
0.709466 0.699885          256          256.0  -1.0000   0.5193      117
0.701784 0.694102          512          512.0   1.0000   0.4893      117
0.692312 0.682840         1024         1024.0  -1.0000   0.4184      117
0.685546 0.678781         2048         2048.0   1.0000   0.5578      117
0.676849 0.668153         4096         4096.0   1.0000   0.6280      117
0.668230 0.659611         8192         8192.0   1.0000   0.4858      117
0.657806 0.647381        16384        16384.0   1.0000   0.5013      117
0.646018 0.634230        32768        32768.0   1.0000   0.5043      117
0.632907 0.619796        65536        65536.0   1.0000   0.6797      117
0.616635 0.600363       131072       131072.0  -1.0000   0.4715      117
0.598499 0.580364       262144       262144.0   1.0

2021-07-16 11:16:25,615 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 3.774580503321054 
2021-07-16 11:16:25,615 INFO     [root/vw-hyperopt:306]: loss value: 0.650409
2021-07-16 11:16:25,617 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:01.979037
 58%|████   | 117/200 [07:54<05:26,  3.93s/trial, best loss: 0.6467645901202126]2021-07-16 11:16:25,637 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004616 seconds
2021-07-16 11:16:25,638 INFO     [hyperopt.tpe/tpe:909]: TPE using 117/117 trials with best loss 0.646765
2021-07-16 11:16:25,665 INFO     [root/vw-hyperopt:315]: 

Starting trial no.118
2021-07-16 11:16:25,666 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 0.01025896964932436 
final_regress

0.894850 0.765729          128          128.0  -1.0000   0.3501      117
0.820933 0.747015          256          256.0   1.0000   0.6616      117
0.911220 1.001508          512          512.0  -1.0000   0.1469      117
0.934182 0.957143         1024         1024.0   1.0000   0.7270      117
0.892915 0.851648         2048         2048.0  -1.0000   0.4027      117
0.922491 0.952067         4096         4096.0   1.0000   0.8606      117
0.930213 0.937935         8192         8192.0  -1.0000   0.2697      117
0.919885 0.909557        16384        16384.0   1.0000   0.6504      117
0.914586 0.909288        32768        32768.0  -1.0000   0.2693      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 0.916088
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:16:31,651 INFO     [root/vw-hyperopt:305]: parameter suffix: --l1 9.769073809890414e-06 --link logistic --

0.626589 0.584391        65536        65536.0   1.0000   0.9696      117
0.588693 0.550796       131072       131072.0  -1.0000   0.6495      117
0.560439 0.532185       262144       262144.0   1.0000   0.9458      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.545782
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:16:35,437 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  pred

final_regressor = ./current.model
Num weight bits = 18
learning rate = 5.4408
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.364981 0.036816            2            2.0  -1.0000   0.0361      117
0.362962 0.360943            4            4.0  -1.0000   0.0295      117
1.181530 2.000098            8            8.0   1.0000   0.0978      117
1.364383 1.547236           16           16.0   1.0000   0.8971      117
1.182119 0.999855           32           32.0  -1.0000   0.0289      117
1.039334 0.896548           64           64.0  -1.0000   0.0065      117
1.003225 0.967117          128          128.0  -1.0000   0.0442      117
0.971213 0.939201          256       

2.021937 1.801582          256          256.0   1.0000   0.6243      117
2.164574 2.307210          512          512.0  -1.0000   0.0130      117
2.190840 2.217107         1024         1024.0   1.0000   0.7599      117
2.073682 1.956525         2048         2048.0  -1.0000   0.1379      117
2.097659 2.121635         4096         4096.0   1.0000   0.9248      117
2.082552 2.067444         8192         8192.0  -1.0000   0.1315      117
2.055405 2.028258        16384        16384.0   1.0000   0.6344      117
2.045556 2.035707        32768        32768.0  -1.0000   0.0739      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.042632
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:16:56,586 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 10 --random_seed 888 -b 18 -l 1.4389371235559327 
2021-07-16 11:16:5

0.521322 0.505024        65536        65536.0   1.0000   0.9133      117
0.506706 0.492090       131072       131072.0  -1.0000   0.5345      117
0.497868 0.489031       262144       262144.0   1.0000   0.9263      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.493420
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:17:00,231 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  pred

final_regressor = ./current.model
Num weight bits = 18
learning rate = 3.35616
initial_t = 0
power_t = 0.5
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.372353 0.051560            2            2.0  -1.0000   0.0503      117
0.376182 0.380010            4            4.0  -1.0000   0.0424      117
1.106151 1.836120            8            8.0   1.0000   0.1427      117
1.251697 1.397243           16           16.0   1.0000   0.8593      117
1.100855 0.950012           32           32.0  -1.0000   0.0546      117
0.971547 0.842240           64           64.0  -1.0000   0.0133      117
0.939725 0.907902          128          128.0  -1.0000   0.0669      117
0.902242 0.864760          256          256.0  -1.0000   0.2

2.594507 2.652142         4096         4096.0   1.0000   0.9302      117
2.573641 2.552775         8192         8192.0  -1.0000   0.1245      117
2.534030 2.494419        16384        16384.0   1.0000   0.7244      117
2.534257 2.534484        32768        32768.0  -1.0000   0.0432      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.536669
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:17:13,044 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 9.895706372436933 
2021-07-16 11:17:13,044 INFO     [root/vw-hyperopt:306]: loss value: 0.649071
2021-07-16 11:17:13,045 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:03.457649
 64%|████▌  | 129/200 [08:41<04:30,  3.82s/trial, best loss: 0.6467645901202126]2021-07-16 11:17:13,166 INFO     [hyperopt.tpe/tpe:

0.615534 0.576091        65536        65536.0   1.0000   0.9673      117
0.580172 0.544811       131072       131072.0  -1.0000   0.6425      117
0.554015 0.527857       262144       262144.0   1.0000   0.9436      117
0.533200 0.512385       524288       524288.0   1.0000   0.9170      117
0.518244 0.503288      1048576      1048576.0  -1.0000   0.8309      117

finished run
number of examples per pass = 199995
passes used = 9
weighted example sum = 1799955.000000
weighted label sum = -31347.000000
average loss = 0.509308
best constant = -0.034834
best constant's loss = 0.692996
total feature number = 210576321
2021-07-16 11:17:27,821 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input

2021-07-16 11:17:31,216 INFO     [root/vw-hyperopt:315]: 

Starting trial no.133
2021-07-16 11:17:31,216 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 0.030289383016527234 
final_regressor = ./current.model
Num weight bits = 18
learning rate = 0.0302894
initial_t = 0
power_t = 0.5
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.643496 0.593846            2            2.0  -1.0000   0.4478      117
0.649676 0.655856            4            4.0  -1.0000   0.4514      117
0.719580 0.789484            8            8.0   1.0000   0.5129      117
0.722

1.934438 1.823592         2048         2048.0  -1.0000   0.1532      117
1.961890 1.989343         4096         4096.0   1.0000   0.9285      117
1.951482 1.941073         8192         8192.0  -1.0000   0.1409      117
1.928538 1.905593        16384        16384.0   1.0000   0.6036      117
1.917620 1.906702        32768        32768.0  -1.0000   0.0789      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.913899
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:17:36,140 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 0.6807565574473159 
2021-07-16 11:17:36,141 INFO     [root/vw-hyperopt:306]: loss value: 0.653009
2021-07-16 11:17:36,142 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:03.011033
 67%|████▋  | 134/200 [09:04<04:06,  3.74s/trial, best lo

0.530807 0.513805        65536        65536.0   1.0000   0.9401      117
0.515490 0.500173       131072       131072.0  -1.0000   0.5606      117
0.505686 0.495883       262144       262144.0   1.0000   0.9309      117
0.497577 0.489467       524288       524288.0   1.0000   0.9270      117
0.492049 0.486521      1048576      1048576.0  -1.0000   0.7606      117

finished run
number of examples per pass = 199995
passes used = 10
weighted example sum = 1999950.000000
weighted label sum = -34830.000000
average loss = 0.488150
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 233973690
2021-07-16 11:17:49,560 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input

2021-07-16 11:17:52,927 INFO     [root/vw-hyperopt:315]: 

Starting trial no.138
2021-07-16 11:17:52,927 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 5.526112748913707 
final_regressor = ./current.model
Num weight bits = 18
learning rate = 5.52611
initial_t = 0
power_t = 0.5
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.364783 0.036419            2            2.0  -1.0000   0.0358      117
0.362575 0.360368            4            4.0  -1.0000   0.0291      117
1.184042 2.005509            8            8.0   1.0000   0.0966      117
1.367999 1

2.165455 2.194679         4096         4096.0   1.0000   0.9255      117
2.144171 2.122886         8192         8192.0  -1.0000   0.1235      117
2.115350 2.086529        16384        16384.0   1.0000   0.6687      117
2.107547 2.099744        32768        32768.0  -1.0000   0.0681      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.105116
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:17:59,772 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 5 --random_seed 888 -b 18 -l 2.9066581787598333 
2021-07-16 11:17:59,772 INFO     [root/vw-hyperopt:306]: loss value: 0.652978
2021-07-16 11:17:59,774 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:05.222650
 70%|████▊  | 139/200 [09:28<04:20,  4.26s/trial, best loss: 0.6467645901202126]2021-07-16 11:17:59,790 INFO     [hyperopt.tpe/tpe

0.648051 0.637564       131072       131072.0  -1.0000   0.4646      117
0.635436 0.622820       262144       262144.0   1.0000   0.5935      117
0.620568 0.605700       524288       524288.0   1.0000   0.5043      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.617350
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:18:07,652 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  pre

0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.476496 0.259846            2            2.0  -1.0000   0.2288      117
0.494852 0.513208            4            4.0  -1.0000   0.2265      117
0.866515 1.238177            8            8.0   1.0000   0.4643      117
0.836836 0.807157           16           16.0   1.0000   0.5295      117
0.788889 0.740943           32           32.0  -1.0000   0.3248      117
0.733341 0.677793           64           64.0  -1.0000   0.1830      117
0.743994 0.754646          128          128.0  -1.0000   0.3529      117
0.719502 0.695011          256          256.0  -1.0000   0.4778      117
0.709723 0.699944          512          512.0   1.0000   0.4750      117
0.687351 0.664979         1024         1024.0  -1.0000   0.3164      117
0.665046 0.642741         2048         2048.0   1.0000   0.6982      117
0.642351 0.619656         4096         4096.0   1.0000   0.6611      117
0.617593 0.592835         8192         8192.0   1.0

1.862771 1.840616        16384        16384.0   1.0000   0.5979      117
1.852057 1.841344        32768        32768.0  -1.0000   0.0819      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.848249
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:18:15,884 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 0.5552740710122246 
2021-07-16 11:18:15,885 INFO     [root/vw-hyperopt:306]: loss value: 0.653143
2021-07-16 11:18:15,887 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:03.196010
 72%|█████  | 144/200 [09:44<03:00,  3.22s/trial, best loss: 0.6467645901202126]2021-07-16 11:18:15,914 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004452 seconds
2021-07-16 11:18:15,916 INFO     [hyperopt.tpe/tpe:909]: TPE using 144/144 trials with best l

only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
2.648876 2.648876            1            1.0   1.0000   0.9326      117
1.814604 0.980332            2            2.0   1.0000   0.8798      117
2.033459 2.252313            4            4.0  -1.0000   0.0423      117
1.695528 1.357598            8            8.0   1.0000   0.5232      117
2.618973 3.542418           16           16.0   1.0000   0.0507      117
2.995191 3.371410           32           32.0  -1.0000   0.1151      117
2.773497 2.551803           64           64.0  -1.0000   0.7465      117
2.843045 2.912593          128          128.0  -1.0000   0.3258      117
2.765553 2.688061          256          256.0   1.0000   0

0.739887 0.678941         8192         8192.0   1.0000   0.9507      117
0.682398 0.624908        16384        16384.0   1.0000   0.8542      117
0.643065 0.603733        32768        32768.0   1.0000   0.6341      117
0.606054 0.569043        65536        65536.0   1.0000   0.9653      117
0.572899 0.539744       131072       131072.0  -1.0000   0.6359      117

finished run
number of examples = 199995
weighted example sum = 199995.000000
weighted label sum = -3483.000000
average loss = 0.557287
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 23397369
2021-07-16 11:18:23,185 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  sin

final_regressor = ./current.model
Num weight bits = 18
learning rate = 3.31189
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.372593 0.052039            2            2.0  -1.0000   0.0507      117
0.376567 0.380540            4            4.0  -1.0000   0.0428      117
1.104217 1.831868            8            8.0   1.0000   0.1441      117
1.248658 1.393099           16           16.0   1.0000   0.8581      117
1.098664 0.948671           32           32.0  -1.0000   0.0555      117
0.969736 0.840807           64           64.0  -1.0000   0.0135      117
0.937999 0.906263          128          128.0  -1.0000   0.0677      117
0.900476 0.862953          256      

2.386619 2.445264         4096         4096.0   1.0000   0.9302      117
2.363352 2.340086         8192         8192.0  -1.0000   0.1176      117
2.333212 2.303072        16384        16384.0   1.0000   0.7000      117
2.330067 2.326922        32768        32768.0  -1.0000   0.0493      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.330779
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:18:30,020 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 4.158502749237367 
2021-07-16 11:18:30,020 INFO     [root/vw-hyperopt:306]: loss value: 0.650088
2021-07-16 11:18:30,025 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:01.707501
 76%|█████▎ | 151/200 [09:58<01:42,  2.09s/trial, best loss: 0.6467645901202126]2021-07-16 11:18:30,055 INFO     [hyperopt.tpe/tpe:

only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
3.248506 3.248506            1            1.0   1.0000   0.9428      117
2.041334 0.834162            2            2.0   1.0000   0.8714      117
2.459496 2.877658            4            4.0  -1.0000   0.0325      117
1.909727 1.359959            8            8.0   1.0000   0.4725      117
2.952299 3.994872           16           16.0   1.0000   0.0355      117
3.472532 3.992765           32           32.0  -1.0000   0.1048      117
3.092107 2.711681           64           64.0  -1.0000   0.7209      117
3.229709 3.367312          128          128.0  -1.0000   0.3019      117
3.207317 3.184925          256          256.0   1.0000   0

0.950507 0.851014         2048         2048.0   1.0000   0.7540      117
0.853751 0.756994         4096         4096.0   1.0000   0.4522      117
0.790080 0.726409         8192         8192.0   1.0000   0.9660      117
0.728694 0.667309        16384        16384.0   1.0000   0.9100      117
0.683451 0.638207        32768        32768.0   1.0000   0.6650      117
0.638407 0.593364        65536        65536.0   1.0000   0.9719      117
0.597849 0.557290       131072       131072.0  -1.0000   0.6564      117
0.567371 0.536894       262144       262144.0   1.0000   0.9480      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.551574
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:18:35,373 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pre

2021-07-16 11:18:36,602 INFO     [root/vw-hyperopt:315]: 

Starting trial no.157
2021-07-16 11:18:36,602 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 6.390634548251761 
final_regressor = ./current.model
Num weight bits = 18
learning rate = 6.39063
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.363029 0.032911            2            2.0  -1.0000   0.0324      117
0.358979 0.354929            4            4.0  -1.0000   0.0261      117
1.208404 2.057830            8            8.0   1.0000   0.

2.095480 2.076067         8192         8192.0  -1.0000   0.1212      117
2.071478 2.047477        16384        16384.0   1.0000   0.6437      117
2.063860 2.056242        32768        32768.0  -1.0000   0.0659      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.060970
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:18:38,807 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 1.6930934522961316 
2021-07-16 11:18:38,807 INFO     [root/vw-hyperopt:306]: loss value: 0.652666
2021-07-16 11:18:38,809 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:00.934304
 79%|█████▌ | 158/200 [10:07<00:50,  1.19s/trial, best loss: 0.6464570960668778]2021-07-16 11:18:38,822 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004078 seconds
2021-07-16 11:18:38,

0.501964 0.492854      1048576      1048576.0  -1.0000   0.7869      117

finished run
number of examples per pass = 199995
passes used = 8
weighted example sum = 1599960.000000
weighted label sum = -27864.000000
average loss = 0.497557
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 187178952
2021-07-16 11:18:43,895 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.705087 1.705087            1            1.0   1.0000   0.9094      117
1.636941 1.568795            2            2.0   1.0000   

0.631661 0.580264        16384        16384.0   1.0000   0.7759      117
0.599575 0.567488        32768        32768.0   1.0000   0.5991      117
0.571789 0.544002        65536        65536.0   1.0000   0.9574      117
0.546820 0.521852       131072       131072.0  -1.0000   0.6070      117
0.529094 0.511367       262144       262144.0   1.0000   0.9349      117
0.514813 0.500532       524288       524288.0   1.0000   0.9286      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.512552
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:18:46,648 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_fi

0.650304 0.596725         8192         8192.0   1.0000   0.8824      117
0.601707 0.553111        16384        16384.0   1.0000   0.7323      117
0.572139 0.542571        32768        32768.0   1.0000   0.5936      117
0.549951 0.527763        65536        65536.0   1.0000   0.9074      117
0.533057 0.516162       131072       131072.0  -1.0000   0.5837      117
0.525441 0.517825       262144       262144.0   1.0000   0.8515      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.523113
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:18:48,998 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_fil

0.422126 0.437462            4            4.0  -1.0000   0.1042      117
0.960694 1.499261            8            8.0   1.0000   0.3049      117
0.981633 1.002573           16           16.0   1.0000   0.7248      117
0.908634 0.835634           32           32.0  -1.0000   0.1924      117
0.820809 0.732983           64           64.0  -1.0000   0.0704      117
0.807605 0.794401          128          128.0  -1.0000   0.1823      117
0.772122 0.736639          256          256.0  -1.0000   0.3821      117
0.754190 0.736257          512          512.0   1.0000   0.5074      117
0.719606 0.685022         1024         1024.0  -1.0000   0.2844      117
0.673329 0.627053         2048         2048.0   1.0000   0.7917      117
0.630851 0.588372         4096         4096.0   1.0000   0.4457      117
0.593504 0.556158         8192         8192.0   1.0000   0.7035      117
0.558491 0.523477        16384        16384.0   1.0000   0.6263      117
0.536837 0.515184        32768        32768.0   1.0

2021-07-16 11:18:53,704 INFO     [root/vw-hyperopt:305]: parameter suffix: --l1 2.2484316962207905e-07 --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 0.08139568311514289 
2021-07-16 11:18:53,704 INFO     [root/vw-hyperopt:306]: loss value: 0.683560
2021-07-16 11:18:53,706 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:00.984142
 84%|█████▊ | 167/200 [10:22<00:46,  1.42s/trial, best loss: 0.6464570960668778]2021-07-16 11:18:53,719 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.003532 seconds
2021-07-16 11:18:53,720 INFO     [hyperopt.tpe/tpe:909]: TPE using 167/167 trials with best loss 0.646457
2021-07-16 11:18:53,741 INFO     [root/vw-hyperopt:315]: 

Starting trial no.168
2021-07-16 11:18:53,741 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 5.95

2.119377 1.996789         2048         2048.0  -1.0000   0.1343      117
2.144487 2.169598         4096         4096.0   1.0000   0.9251      117
2.124588 2.104688         8192         8192.0  -1.0000   0.1288      117
2.096002 2.067417        16384        16384.0   1.0000   0.6561      117
2.087075 2.078147        32768        32768.0  -1.0000   0.0712      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.084420
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:18:58,640 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 7 --random_seed 888 -b 18 -l 2.411805660461879 
2021-07-16 11:18:58,640 INFO     [root/vw-hyperopt:306]: loss value: 0.653189
2021-07-16 11:18:58,641 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:03.489566
 84%|█████▉ | 169/200 [10:27<01:03,  2.05s/trial, best los

0.559602 0.539093        65536        65536.0   1.0000   0.8451      117
0.538273 0.516944       131072       131072.0  -1.0000   0.4913      117

finished run
number of examples = 199995
weighted example sum = 199995.000000
weighted label sum = -3483.000000
average loss = 0.527224
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 23397369
2021-07-16 11:19:00,796 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.040872 0.040872            1            1.0   1.0000   0.7689      117
0.067285 0.0

0.586377 0.536599        16384        16384.0   1.0000   0.7234      117
0.555998 0.525618        32768        32768.0   1.0000   0.5556      117
0.532595 0.509193        65536        65536.0   1.0000   0.9091      117
0.512972 0.493348       131072       131072.0  -1.0000   0.5501      117

finished run
number of examples = 199995
weighted example sum = 199995.000000
weighted label sum = -3483.000000
average loss = 0.505549
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 23397369
2021-07-16 11:19:03,222 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     las

using l2 regularization = 7.33254e-07
final_regressor = ./current.model
Num weight bits = 18
learning rate = 1.88145
initial_t = 0
power_t = 0.5
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.385154 0.077161            2            2.0  -1.0000   0.0743      117
0.395189 0.405224            4            4.0  -1.0000   0.0652      117
1.028780 1.662371            8            8.0   1.0000   0.2122      117
1.117326 1.205871           16           16.0   1.0000   0.8031      117
1.005708 0.894091           32           32.0  -1.0000   0.1075      117
0.894805 0.783902           64           64.0  -1.0000   0.0308      117
0.870031 0.845257          128          128.0  -1.0000   0.1096      117
0.831486 0.792942     

2.117316 2.096745         8192         8192.0  -1.0000   0.1270      117
2.090285 2.063253        16384        16384.0   1.0000   0.6571      117
2.081002 2.071719        32768        32768.0  -1.0000   0.0707      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.078312
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:19:16,958 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 4 --random_seed 888 -b 18 -l 2.105484436957253 
2021-07-16 11:19:16,960 INFO     [root/vw-hyperopt:306]: loss value: 0.653161
2021-07-16 11:19:16,963 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:04.399105
 88%|██████▏| 176/200 [10:45<01:17,  3.23s/trial, best loss: 0.6464570960668778]2021-07-16 11:19:16,983 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.003855 seconds
2021-07-16 11:19:16,9


finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.515095
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:19:23,128 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.818957 1.818957            1            1.0   1.0000   0.9128      117
1.564393 1.309829            2            2.0   1.0000   0.8952      117
1.471540 1.378687            4            4.0  -1.0000   0.

0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.367846 0.042545            2            2.0  -1.0000   0.0417      117
0.368393 0.368940            4            4.0  -1.0000   0.0345      117
1.148112 1.927830            8            8.0   1.0000   0.1156      117
1.316088 1.484065           16           16.0   1.0000   0.8819      117
1.147093 0.978098           32           32.0  -1.0000   0.0382      117
1.009936 0.872779           64           64.0  -1.0000   0.0088      117
0.975596 0.941255          128          128.0  -1.0000   0.0528      117
0.940666 0.905737          256          256.0  -1.0000   0.2454      117
0.924957 0.909248          512          512.0   1.0000   0.6397      117
0.891179 0.857400         1024         1024.0  -1.0000   0.0736      117
0.806034 0.720890         2048         2048.0   1.0000   0.7947      117
0.725883 0.645731         4096         4096.0   1.0000   0.3595      117
0.671224 0.616565         8192         8192.0   1.0

0.817164 0.819358        32768        32768.0  -1.0000   0.2367      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 0.818695
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:19:33,383 INFO     [root/vw-hyperopt:305]: parameter suffix: --l2 2.7974669053162753e-05 --link logistic --loss_function logistic --passes 3 --random_seed 888 -b 18 -l 6.784214638339272 
2021-07-16 11:19:33,384 INFO     [root/vw-hyperopt:306]: loss value: 0.691041
2021-07-16 11:19:33,385 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:04.019437
 90%|██████▎| 181/200 [11:01<01:02,  3.30s/trial, best loss: 0.6464570960668778]2021-07-16 11:19:33,411 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.007284 seconds
2021-07-16 11:19:33,415 INFO     [hyperopt.tpe/tpe:909]: TPE using 181/181 trials with best loss 0.646457
2021-07-16 11:19:33,509 INFO     

2.973371 2.961051         1024         1024.0   1.0000   0.5991      117
2.886179 2.798987         2048         2048.0  -1.0000   0.0588      117
2.975975 3.065770         4096         4096.0   1.0000   0.9370      117
2.960273 2.944571         8192         8192.0  -1.0000   0.1294      117
2.917182 2.874091        16384        16384.0   1.0000   0.7235      117
2.922689 2.928196        32768        32768.0  -1.0000   0.0333      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 2.926774
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:19:37,529 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 8.8535572458518 
2021-07-16 11:19:37,531 INFO     [root/vw-hyperopt:306]: loss value: 0.647001
2021-07-16 11:19:37,533 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:

0.578892 0.560422        65536        65536.0   1.0000   0.7572      117
0.559600 0.540308       131072       131072.0  -1.0000   0.4925      117
0.543696 0.527793       262144       262144.0   1.0000   0.7866      117
0.531341 0.518987       524288       524288.0   1.0000   0.7061      117

finished run
number of examples per pass = 199995
passes used = 5
weighted example sum = 999975.000000
weighted label sum = -17415.000000
average loss = 0.524571
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 116986845
2021-07-16 11:19:45,478 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  cu

1.286897 2.225080            8            8.0   1.0000   0.0582      117
1.503016 1.719134           16           16.0   1.0000   0.9334      117
1.284915 1.066814           32           32.0  -1.0000   0.0123      117
1.126429 0.967942           64           64.0  -1.0000   0.0026      117
1.086252 1.046075          128          128.0  -1.0000   0.0263      117
1.066539 1.046826          256          256.0  -1.0000   0.1604      117
1.072304 1.078070          512          512.0   1.0000   0.7079      117
1.059971 1.047638         1024         1024.0  -1.0000   0.0178      117
0.959868 0.859764         2048         2048.0   1.0000   0.7521      117
0.862398 0.764929         4096         4096.0   1.0000   0.4590      117
0.798398 0.734397         8192         8192.0   1.0000   0.9679      117
0.736494 0.674591        16384        16384.0   1.0000   0.9177      117
0.690333 0.644172        32768        32768.0   1.0000   0.6700      117
0.643984 0.597636        65536        65536.0   1.0

1.900042 1.890066        32768        32768.0  -1.0000   0.0806      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.897546
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:19:57,336 INFO     [root/vw-hyperopt:305]: parameter suffix: --l2 1.9303247480440872e-07 --link logistic --loss_function logistic --passes 4 --random_seed 888 -b 18 -l 1.578905608212171 
2021-07-16 11:19:57,337 INFO     [root/vw-hyperopt:306]: loss value: 0.654353
2021-07-16 11:19:57,338 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:05.347424
 94%|██████▌| 188/200 [11:25<00:47,  3.98s/trial, best loss: 0.6464570960668778]2021-07-16 11:19:57,371 INFO     [hyperopt.tpe/tpe:873]: build_posterior_wrapper took 0.004459 seconds
2021-07-16 11:19:57,372 INFO     [hyperopt.tpe/tpe:909]: TPE using 188/188 trials with best loss 0.646457
2021-07-16 11:19:57,419 INFO     

only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.748287 1.748287            1            1.0   1.0000   0.9107      117
1.581585 1.414884            2            2.0   1.0000   0.8993      117
1.496479 1.411373            4            4.0  -1.0000   0.0716      117
1.381771 1.267063            8            8.0   1.0000   0.6194      117
2.030908 2.680046           16           16.0   1.0000   0.1026      117
2.331082 2.631255           32           32.0  -1.0000   0.1197      117
2.399609 2.468136           64           64.0  -1.0000   0.7274      117
2.406666 2.413723          128          128.0  -1.0000   0.3567      117
2.251510 2.096353          256          256.0   1.0000   0

0.713003 0.654127         8192         8192.0   1.0000   0.9389      117
0.658176 0.603349        16384        16384.0   1.0000   0.8185      117
0.622224 0.586273        32768        32768.0   1.0000   0.6174      117
0.589577 0.556930        65536        65536.0   1.0000   0.9617      117
0.560324 0.531070       131072       131072.0  -1.0000   0.6232      117
0.539145 0.517967       262144       262144.0   1.0000   0.9383      117
0.522211 0.505276       524288       524288.0   1.0000   0.9248      117

finished run
number of examples per pass = 199995
passes used = 3
weighted example sum = 599985.000000
weighted label sum = -10449.000000
average loss = 0.519519
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 70192107
2021-07-16 11:20:07,386 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight

2021-07-16 11:20:11,001 INFO     [root/vw-hyperopt:315]: 

Starting trial no.194
2021-07-16 11:20:11,004 INFO     [root/vw-hyperopt:240]: executing the following command (training): vw  -d ../PData/train.vw -f ./current.model --holdout_off -c  --l2 2.6800455908272362e-08 --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 0.13153577807217737 
using l2 regularization = 2.68005e-08
final_regressor = ./current.model
Num weight bits = 18
learning rate = 0.131536
initial_t = 0
power_t = 0.5
using cache_file = ../PData/train.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0  -1.0000   0.5000      117
0.545414 0.397681            2            2.0  -1.0000   0.3281      117
0.560601 0.575788            4            4.0  -1.0000   0.3319      117
0.797324 1.034

2.144666 2.138511         1024         1024.0   1.0000   0.7555      117
2.019798 1.894931         2048         2048.0  -1.0000   0.1622      117
2.051307 2.082815         4096         4096.0   1.0000   0.9287      117
2.034556 2.017805         8192         8192.0  -1.0000   0.1355      117
2.010780 1.987005        16384        16384.0   1.0000   0.6223      117
2.000038 1.989296        32768        32768.0  -1.0000   0.0747      117

finished run
number of examples = 49999
weighted example sum = 49999.000000
weighted label sum = -853.000000
average loss = 1.996616
best constant = -0.017060
best constant's loss = 0.999709
total feature number = 5849387
2021-07-16 11:20:15,857 INFO     [root/vw-hyperopt:305]: parameter suffix: --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 1.096238550493158 
2021-07-16 11:20:15,858 INFO     [root/vw-hyperopt:306]: loss value: 0.653428
2021-07-16 11:20:15,860 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 

0.616367 0.644529       262144       262144.0   1.0000   0.5618      117

finished run
number of examples per pass = 199995
passes used = 2
weighted example sum = 399990.000000
weighted label sum = -6966.000000
average loss = 0.640335
best constant = -0.034834
best constant's loss = 0.692995
total feature number = 46794738
2021-07-16 11:20:19,993 INFO     [root/vw-hyperopt:245]: executing the following command (validation): vw  -t -d ../PData/val.vw -i ./current.model -p ./holdout.pred --holdout_off -c 
only testing
predictions = ./holdout.pred
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using cache_file = ../PData/val.vw.cache
ignoring text input in favor of cache input
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0   1.0000   0.5000      117
1.000000 1.000000            2            2.0   1.0000   0.5

0.352032 0.344109            4            4.0  -1.0000   0.0208      117
1.259927 2.167822            8            8.0   1.0000   0.0664      117
1.469184 1.678441           16           16.0   1.0000   0.9255      117
1.259514 1.049844           32           32.0  -1.0000   0.0153      117
1.104857 0.950200           64           64.0  -1.0000   0.0033      117
1.065441 1.026025          128          128.0  -1.0000   0.0299      117
1.042281 1.019121          256          256.0  -1.0000   0.1753      117
1.043032 1.043782          512          512.0   1.0000   0.6963      117
1.025670 1.008309         1024         1024.0  -1.0000   0.0237      117
0.927768 0.829867         2048         2048.0   1.0000   0.7591      117
0.832895 0.738021         4096         4096.0   1.0000   0.4354      117
0.770142 0.707390         8192         8192.0   1.0000   0.9607      117
0.710138 0.650133        16384        16384.0   1.0000   0.8898      117
0.667168 0.624199        32768        32768.0   1.0

2021-07-16 11:20:30,985 INFO     [root/vw-hyperopt:305]: parameter suffix: --l2 1.1171604187758875e-05 --link logistic --loss_function logistic --passes 2 --random_seed 888 -b 18 -l 6.779012122831262 
2021-07-16 11:20:30,985 INFO     [root/vw-hyperopt:306]: loss value: 0.673348
2021-07-16 11:20:30,987 INFO     [root/vw-hyperopt:323]: evaluation time for this step: 0:00:02.745868
100%|███████| 200/200 [11:59<00:00,  2.99s/trial, best loss: 0.6464570960668778]100%|███████| 200/200 [11:59<00:00,  3.60s/trial, best loss: 0.6464570960668778]
2021-07-16 11:20:31,004 DEBUG    [root/vw-hyperopt:346]: the best hyperopt parameters: {'algorithm': 0, 'sgd_b': 0, 'sgd_l': 9.977644028429442, 'sgd_l1_outer': 0, 'sgd_l2_outer': 0, 'sgd_link': 0, 'sgd_loss_function': 0, 'sgd_passes': 1.0, 'sgd_random_seed': 0}
2021-07-16 11:20:31,008 INFO     [root/vw-hyperopt:349]: All the trials results are saved at ./trials.json
2021-07-16 11:20:31,009 INFO     [root/vw-hyperopt:353]: 

A full training com

#### Best model Hyperparameters:

Although the hyperopt log shows that a variety of models were fitted during tuning, including L2-penalised (ridge), L1-penalised (lasso) and elastic-net variants, the best model has no penalty term and is therefore a plain logistic regression. The learning rate tuned through hyperopt is `9.977644028429442`, the best model uses `1` pass (epoch) over the data, and the random seed is set to `888`.
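
As a minimal sketch, the refit command can be assembled from the tuned values rather than typed by hand, which avoids copy-and-paste slips in the long learning-rate value. The dictionary layout and the helper name `build_vw_train_command` are illustrative assumptions and are not part of vw-hyperopt; the flags and paths are the ones that appear in the commands in this notebook.

```python
import shlex

# Tuned values copied from the hyperopt log. The dictionary layout is an
# assumption made for this sketch, not something vw-hyperopt produces.
best_params = {
    "--link": "logistic",
    "--loss_function": "logistic",
    "--passes": 1,
    "--random_seed": 888,
    "-b": 18,
    "-l": 9.977644028429442,
}


def build_vw_train_command(data_path, model_path, params):
    """Return a vw training command as a list of arguments (hypothetical helper)."""
    args = ["vw", "-d", data_path, "-f", model_path, "--holdout_off", "-c"]
    for flag, value in params.items():
        args.extend([flag, str(value)])
    return args


command = build_vw_train_command("../PData/design.vw", "../PData/model.vw", best_params)
# Print a shell-ready string; it could also be passed to subprocess.run as a list.
print(shlex.join(command))
```

No `--l1` or `--l2` flag is added, which matches the unpenalised best model.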

In [80]:
# refit the model with the best hyperparameters on the design set
!vw  -d ../PData/design.vw -f ../PData/model.vw --holdout_off -c  --link logistic --loss_function logistic --passes 1 --random_seed 888 -b 18 -l 9.977644028429442 

final_regressor = ../PData/model.vw
Num weight bits = 18
learning rate = 9.97764
initial_t = 0
power_t = 0.5
creating cache_file = ../PData/design.vw.cache
Reading datafile = ../PData/design.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.693147 0.693147            1            1.0   1.0000   0.5000      117
0.350281 0.007414            2            2.0   1.0000   0.9926      117
1.787172 3.224063            4            4.0   1.0000   0.7460      117
1.939960 2.092748            8            8.0  -1.0000   0.7150      117
2.060769 2.181578           16           16.0   1.0000   0.0119      117
1.901272 1.741776           32           32.0   1.0000   0.0526      117
1.643163 1.385053           64           64.0   1.0000   0.8564      117
1.607780 1.572397          128          128.0  -1.0000   0.3009      117
1.558459 1.509137          256          256.0  -1.0000   0.9

#### Make prediction on the test set
Because the test set has no labels, we supplied a dummy label (1) for every test example. The loss reported in the prediction output is computed against that dummy label and is therefore meaningless.

In [81]:
!vw -d ../PData/test.vw -t -i ../PData/model.vw -p ../PData/part5_pred.txt

only testing
predictions = ../PData/part5_pred.txt
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = ../PData/test.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
16.378593 16.378593            1            1.0   1.0000   0.0453      117
10.409762 4.440930            2            2.0   1.0000   0.9572      117
6.542706 2.675650            4            4.0   1.0000   0.5587      117
10.192379 13.842051            8            8.0   1.0000   0.0173      117
7.013160 3.833942           16           16.0   1.0000   0.7781      117
5.098898 3.184635           32           32.0   1.0000   0.1465      117
4.083545 3.068193           64           64.0   1.0000   0.5566      117
4.555026 5.026507          128          128.0   1.0000   0.7111      117
5.593367 6.631709          256          256.0   1.0000   0.5136      117
6.126
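
To illustrate why the loss column above carries no information, the sketch below recomputes the first two logged values from the predictions alone. It assumes that the logged number is the squared error between the dummy label 1 and the raw score (the logit of the predicted probability). This is an observation that fits the figures in the log, not a documented statement about vw internals, and the function name is hypothetical.

```python
import math


def squared_loss_on_raw_score(probability, label=1.0):
    """Squared error between the label and the logit of the predicted probability."""
    raw_score = math.log(probability / (1.0 - probability))
    return (label - raw_score) ** 2


# (predicted probability, logged "since last" loss) for the first two test rows
rows = [(0.0453, 16.378593), (0.9572, 4.440930)]
for predicted, logged in rows:
    recomputed = squared_loss_on_raw_score(predicted)
    print(f"predict={predicted}  recomputed={recomputed:.3f}  logged={logged}")
```

If the assumption holds, the loss depends only on the predictions and the constant dummy label, so it says nothing about how well the model performs on the test set.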

#### Create submission file

In [97]:
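# path to the vw prediction file
f_path = os.path.join('..', 'PData', 'part5_pred.txt')
# read the predictions (one per line) into a single-column DataFrame;
# recent pandas versions reject '\n' as a delimiter, so the default parser is used
model_pred_test = pd.read_csv(f_path, header=None)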

In [105]:
# create submission file
submission = pd.DataFrame({
    'unique_id': test_set['unique_id'],
    'Predicted': model_pred_test[0]
})
# save the submission file
f_name = 'Group_8_part5.csv'
f_path = os.path.join(dirPOutput, f_name)
submission.to_csv(f_path, index=False)
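
Before uploading, it can be worth checking that the predictions pair up one-to-one with the ids and are valid probabilities. The sketch below is one possible check; the function `check_submission` and the toy values are hypothetical and not part of the original notebook.

```python
import math


def check_submission(ids, predictions):
    """Raise ValueError if ids and predictions cannot be paired one-to-one
    or if any prediction is not a probability in [0, 1]."""
    if len(ids) != len(predictions):
        raise ValueError(f"{len(ids)} ids but {len(predictions)} predictions")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate ids found")
    for value in predictions:
        if math.isnan(value) or not 0.0 <= value <= 1.0:
            raise ValueError(f"prediction outside [0, 1]: {value}")
    return True


# Toy values for illustration only
toy_ids = [101, 102, 103]
toy_predictions = [0.0453, 0.9572, 0.5587]
print(check_submission(toy_ids, toy_predictions))
```

With the objects used above, the call would be `check_submission(list(test_set['unique_id']), list(model_pred_test[0]))`.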

### Note on Part 5

#### Kaggle score: 0.83073
<img src="../screenshots/Group_8_part5_screenshot.png">

For Part 5, our solution followed these steps:
- Applied the same preprocessing as in Part 3
- Trained the model with vowpal wabbit and tuned the hyperparameters with hyperopt
- Refitted the final vw model (an unpenalised logistic regression) on the design set
- Predicted on the test set and created the submission file for the Kaggle competition

Comparison with Part 3:

| Kaggle Competition   |        Part 3          |        Part 5         |
|:---------------------|:-----------------------|:----------------------|
|         AUC          |        0.84734         |       0.83073         |

We compare Part 5 with Part 3 because both were set up as penalised logistic regression models. In practice, the best model found by vowpal wabbit was a logistic regression with no penalty term. On the Kaggle test set, online learning (Part 5) scored slightly lower than batch learning (Part 3): an AUC of 0.83073 against 0.84734, a gap of about 0.017. This result is in line with our expectations. Batch learning optimises over the whole training set at once, whereas online learning updates the model from one example at a time, and with a single pass each example is seen only once. As with human decision-making, it is easier to reach a firm conclusion from the whole picture than from one fragment at a time. Nevertheless, an online learning framework like vowpal wabbit offers clear benefits in speed and memory efficiency, which allows training on very large datasets (e.g., billions of rows) that may not fit in machine memory.
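
The "one example at a time" idea can be sketched with a plain stochastic gradient descent update for logistic regression. This is a simplified illustration on synthetic data, not vowpal wabbit's actual update rule (which is more elaborate); the function names, the learning rate and the toy data generator are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def online_logistic_sgd(stream, n_features, learning_rate=0.5):
    """Single pass of SGD: update the weights after each example, then discard it."""
    weights = [0.0] * n_features
    bias = 0.0
    for features, label in stream:  # label is 0 or 1
        z = bias + sum(w * x for w, x in zip(weights, features))
        error = sigmoid(z) - label  # gradient of the log loss with respect to z
        bias -= learning_rate * error
        for j, x in enumerate(features):
            weights[j] -= learning_rate * error * x
    return weights, bias


def toy_stream(n, seed):
    """Yield synthetic examples whose label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 1 if x[0] + x[1] > 0 else 0


weights, bias = online_logistic_sgd(toy_stream(5000, seed=888), n_features=2)
holdout = list(toy_stream(1000, seed=1))
correct = sum(
    (bias + weights[0] * x[0] + weights[1] * x[1] > 0) == bool(y)
    for x, y in holdout
)
accuracy = correct / len(holdout)
print(f"holdout accuracy: {accuracy:.3f}")
```

Only the current example and the weight vector are held in memory at any time, which is why this style of training scales to data that does not fit in RAM.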