# Baseline Model

Here, we're going to establish a baseline for our audio classification model. The baseline, as defined in `models.py`, is a 2-layer neural network with a hidden layer size of 20.

There is a dropout layer with a value of 0.75 just after the hidden layer and before the output softmax layer.
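To make the architecture concrete, here is a minimal NumPy sketch of one forward pass through a network of this shape. It is not the code in `models.py`: the ReLU activation, the random weights, the 63 inputs and 9 classes, and reading 0.75 as the fraction of hidden units dropped are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2, drop_rate=0.75, training=False, rng=None):
    """Hidden layer -> dropout -> softmax output layer."""
    h = np.maximum(0.0, X @ W1 + b1)           # hidden layer (ReLU is an assumption)
    if training:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(h.shape) >= drop_rate # keep roughly 25% of the units
        h = h * keep / (1.0 - drop_rate)        # inverted dropout rescaling
    return softmax(h @ W2 + b2)

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 63, 20, 9    # 63 and 9 are illustrative sizes
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

X = rng.normal(size=(5, n_features))           # 5 fake samples
probs = forward(X, W1, b1, W2, b2, training=True, rng=rng)
```

Each row of `probs` is a probability distribution over the classes. Dropout is only active when `training=True`; at prediction time all hidden units are used.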

We use a standard scaler because our features are on different scales (see data prep for more info), which makes error minimization more efficient and effective.
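As a rough illustration of what standard scaling does, the sketch below standardizes each column to zero mean and unit variance with plain NumPy. The helper name `standardize` and the toy matrix are made up for this example; the notebook itself uses `StandardScaler` from scikit-learn.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: subtract the mean, divide by the standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)        # population standard deviation (ddof=0)
    std[std == 0] = 1.0        # leave constant columns unscaled instead of dividing by 0
    return (X - mean) / std, mean, std

# Two toy features on very different scales.
X = np.array([[0.001, 2000.0],
              [0.002, 4000.0],
              [0.003, 9000.0]])
X_scaled, mean, std = standardize(X)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates the gradient updates purely because of its units.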

Afterwards, we will expand this baseline by adding more hidden layers and the necessary model regularization.

We're going to see how this baseline performs with three feature selection methods (L1, tree-based, and chi-squared) plus PCA dimensionality reduction, so we don't have to try out these permutations later (it's difficult to integrate all of them with sklearn random search). We will simply choose the method that gives the best validation results on the baseline model.

In [1]:
# so we have access to the Google Drive filesystem
#from google.colab import drive
#drive.mount('/content/drive')

In [3]:
# necessary imports
import os
import pandas as pd
import numpy as np

# so we can access local modules within Colab
#os.chdir('/content/drive/My Drive/auto-age-detector-model')

# feature selection defined functions
from feature_selection import lasso_feature_selection
from feature_selection import tree_based_feature_selection
from feature_selection import chi_squared_feature_selection
from feature_selection import pca_feature_selection

# baseline model creation
from models import baseline_model

# for feature scaling
from sklearn.preprocessing import StandardScaler

Using TensorFlow backend.


Here, we import the training data, omitting features unnecessary for our model. Then we split it into our inputs and outputs.

In [5]:
df_train = pd.read_csv('C:\\Users\\gotty\\Desktop\\project\\models\\sample.csv').drop(columns=['Unnamed: 0','path'])
# drop any null values we may have forgotten
df_train = df_train.dropna(how='any',axis=0)
X_train = df_train.drop(columns=['age'],axis=1)
y_train = df_train['age']

We one-hot encode the outputs so we can build a multiclass classification model.

In [8]:
replaced = {'teens':0,'twenties':1,'thirties':2,'fourties':3,'fifties':4,
            'sixties':5,'seventies':6,'eighties':7,'nineties':8}

# https://stackoverflow.com/questions/29831489/convert-array-of-indices-to-1-hot-encoded-numpy-array

# the Keras model needs one-hot encoded targets:
# map age-group labels to integer class ids, then pick rows of the identity matrix
y_train_ohe = np.array(y_train.replace(replaced).astype(int))
y_train_ohe = np.eye(np.max(y_train_ohe)+1)[y_train_ohe]
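The identity-matrix indexing used in the cell above can be seen on a small example. The labels here are illustrative, and `one_hot_fixed` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])         # integer class ids (illustrative)
n_classes = labels.max() + 1             # width inferred from the largest label seen
one_hot = np.eye(n_classes)[labels]      # row i of the identity matrix encodes class i

def one_hot_fixed(labels, n_classes):
    """Same trick, but with the number of classes fixed up front."""
    return np.eye(n_classes)[np.asarray(labels)]

# With a fixed width, the encoding has one column per known class
# even if some classes are missing from the sample.
one_hot_9 = one_hot_fixed(labels, 9)
```

Inferring the width from `np.max(...) + 1` means the number of output columns depends on which labels happen to be present. Passing a fixed count (for example `len(replaced)`) keeps the output layer size stable across samples.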

Now we're going to fit a model with different feature selection methods.

We're going to use a validation split of 0.15, 10 epochs, and a batch size of 32 for these initial baselines, matching the `fit` calls below.
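A small arithmetic sketch of how these settings translate into sample counts, using the values passed to `fit` in the cells below (`validation_split=0.15`, `batch_size=32`). The total of 31 rows is inferred from the training logs in this notebook (26 + 5). The helper `fit_sizes` is hypothetical, and it assumes the split index is truncated to an integer, with the validation rows taken from the end of the data before shuffling.

```python
import math

def fit_sizes(n_samples, validation_split, batch_size):
    # Rows before the split index are used for training; the rest are held out.
    n_train = int(n_samples * (1.0 - validation_split))
    n_val = n_samples - n_train
    # Number of gradient updates per epoch.
    steps_per_epoch = math.ceil(n_train / batch_size)
    return n_train, n_val, steps_per_epoch

print(fit_sizes(31, 0.15, 32))  # prints (26, 5, 1)
```

With only 26 training rows and a batch size of 32, every epoch is a single full-batch update.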

## Baseline with L1 feature selection - Logistic Regression

In [17]:
# feature scaling before to help with convergence
scaler = StandardScaler()
scaler.fit(X_train)
X_train_l1_log_reg = scaler.transform(X_train)

X_train_l1_log_reg,data_transformer = lasso_feature_selection(X_train_l1_log_reg,
                                                              y_train,
                                                              model='logreg')

print(f'Reduced to {X_train_l1_log_reg.shape[1]} features.')


model_l1_log_reg = baseline_model(4)
model_l1_log_reg.fit(X_train_l1_log_reg,y_train_ohe,batch_size=32,
                    validation_split=0.15,epochs=10)

Reduced to 0 features.




AssertionError: 
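The run above stops with an `AssertionError` right after the selector reports zero features. The traceback is not shown, so the exact cause is not confirmed here. As a minimal sketch, a guard like the hypothetical `require_features` below could fail earlier with a more readable message whenever a selection step leaves no columns.

```python
import numpy as np

def require_features(X, step_name):
    """Return X unchanged, or raise a descriptive error if no columns survived."""
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] == 0:
        raise ValueError(f"{step_name} kept no features; got shape {X.shape}")
    return X
```

It would be called between the selection step and model creation, for example `require_features(X_train_l1_log_reg, "L1 selection")`.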

## Baseline with tree-based feature selection

In [16]:
X_train_tree,data_transformer = tree_based_feature_selection(X_train,y_train,
                                                        n_estimators=75)

print(f'Reduced to {X_train_tree.shape[1]} features.')
# feature scaling after
scaler = StandardScaler()
scaler.fit(X_train_tree)
X_train_tree = scaler.transform(X_train_tree)

model_tree = baseline_model(4)
model_tree.fit(X_train_tree,y_train_ohe,batch_size=32,
                    validation_split=0.15,epochs=10)

Reduced to 63 features.

Train on 26 samples, validate on 5 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x1fc23190608>

## Baseline with chi-squared feature selection

Try with 80 best features first.

In [18]:
X_train_chi2_80,data_transformer = chi_squared_feature_selection(X_train,
                                                              y_train)

print(f'Reduced to {X_train_chi2_80.shape[1]} features.')
# feature scaling after
scaler = StandardScaler()
scaler.fit(X_train_chi2_80)
X_train_chi2_80 = scaler.transform(X_train_chi2_80)

model_chi2_80 = baseline_model(4)
model_chi2_80.fit(X_train_chi2_80,y_train_ohe,batch_size=32,
                  validation_split=0.15,epochs=10)

Reduced to 80 features.
Train on 26 samples, validate on 5 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x1fc1039bc88>
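The body of `chi_squared_feature_selection` is not shown in this notebook. The sketch below is one common way to compute chi-squared scores and keep the `k` highest-scoring columns, written in plain NumPy; `chi2_scores` and `select_k_best` are hypothetical names. The statistic is only defined for non-negative features, which may be why scaling is applied after selection in these cells (standardized features can be negative).

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature against the class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if (X < 0).any():
        raise ValueError("chi-squared scoring needs non-negative features")
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)   # one-hot, (n_samples, n_classes)
    observed = Y.T @ X                                    # per-class feature totals
    expected = np.outer(Y.mean(axis=0), X.sum(axis=0))    # totals expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (observed - expected) ** 2 / expected
    # All-zero features give 0/0; this sketch scores them as 0.
    return np.nan_to_num(terms).sum(axis=0)

def select_k_best(X, y, k):
    """Keep the k columns with the highest chi-squared score, in original order."""
    scores = chi2_scores(X, y)
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return np.asarray(X)[:, keep], keep

# Toy data: feature 0 tracks the class, feature 1 is constant.
X_demo = np.array([[1.0, 5.0], [1.0, 5.0], [9.0, 5.0], [9.0, 5.0]])
y_demo = np.array([0, 0, 1, 1])
X_top, kept = select_k_best(X_demo, y_demo, k=1)
```

In the toy data, the feature that varies with the class gets a high score and the constant feature scores zero, so only column 0 is kept.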

Now try with 50 features to see if performance improves.

In [19]:
X_train_chi2_50,data_transformer = chi_squared_feature_selection(X_train,
                                                              y_train,k=50)

print(f'Reduced to {X_train_chi2_50.shape[1]} features.')
# feature scaling after
scaler = StandardScaler()
scaler.fit(X_train_chi2_50)
X_train_chi2_50 = scaler.transform(X_train_chi2_50)

model_chi2_50 = baseline_model(4)
model_chi2_50.fit(X_train_chi2_50,y_train_ohe,batch_size=32,
                  validation_split=0.15,epochs=10)

Reduced to 50 features.
Train on 26 samples, validate on 5 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x1fc117be788>

## PCA Dimensionality Reduction

In [22]:
# feature scaling because PCA is biased toward high-magnitude features
scaler = StandardScaler()
scaler.fit(X_train)
X_train_pca = scaler.transform(X_train)

# reduce dimensionality with PCA; 31 is passed as the second argument
X_train_pca,data_transformer = pca_feature_selection(X_train_pca,31)

model_pca = baseline_model(4)
model_pca.fit(X_train_pca,y_train_ohe,batch_size=32,
                  validation_split=0.15,epochs=10)

Train on 26 samples, validate on 5 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x1fc12d96e88>
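To illustrate why the features are scaled before PCA, here is a minimal NumPy sketch of PCA via the singular value decomposition. `pca_fit_transform` is a hypothetical stand-in, not the notebook's `pca_feature_selection`, and the synthetic data is only there to show the effect of mismatched scales.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project centered data onto its top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                         # PCA works on centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                  # rows are principal directions
    explained_var = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return Xc @ components.T, components, explained_var

rng = np.random.default_rng(0)
# Two independent features; the first has a much larger scale.
X_demo = np.column_stack([rng.normal(scale=1000.0, size=200),
                          rng.normal(scale=1.0, size=200)])
Z, components, explained_var = pca_fit_transform(X_demo, 1)
```

Without scaling, the first principal direction lines up almost entirely with the large-scale feature, regardless of how informative it is. Note also that the SVD returns at most `min(n_samples, n_features)` directions, so a 31-row sample cannot yield more than 31 components.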

We've finished all feature selection methods for our baseline models. Every baseline that trained decreased its training error after each epoch, which suggests that our neural network is fitting the data. The L1 run is the exception: it selected 0 features and stopped with an AssertionError.

Tree-based feature selection seemed to yield the best results, so we will use this type of feature selection going forward.
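The logs in this notebook do not show per-epoch metrics, so the comparison behind this choice is not visible here. As a minimal sketch, assuming the return value of each `fit` call is kept (for example `history_tree = model_tree.fit(...)`) and its `history['val_loss']` list is collected, the hypothetical helper below picks the method with the lowest final validation loss.

```python
def best_by_val_loss(val_losses):
    """val_losses maps a method name to its list of per-epoch validation losses."""
    # Compare methods on the loss from their last epoch; skip empty runs.
    final = {name: losses[-1] for name, losses in val_losses.items() if losses}
    if not final:
        raise ValueError("no validation losses to compare")
    best = min(final, key=final.get)
    return best, final
```

A call might look like `best_by_val_loss({"tree": history_tree.history["val_loss"], "pca": history_pca.history["val_loss"]})`. With only 5 validation samples, small differences between methods should be read cautiously.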