## Agenda

* Training a NN to recognize gender from voice data (Kaggle)
* Predicting new instances with a NN

In [None]:
%pylab inline

import numpy as np
import pandas as pd

# DataFrame.to_numpy() was added in pandas 0.24, so it is unavailable in 0.23.x
print(pd.__version__)

import sklearn
from sklearn.model_selection import train_test_split

# Fetching local copy of the data from Kaggle.
df = pd.read_csv('./voice.csv')
df.head()

# Import the label encoder and the min-max scaler from scikit-learn
# (NB: the features are scaled to [0, 1] in the next cell; other scalers might prove useful in tuning)
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler

In [None]:
# Split the DataFrame into features (all columns but the last)...
x = df.iloc[:, 0:-1]
# ...and the label (last column).
y = df.iloc[:, -1]

# Scale each feature to [0, 1]; for 'voice.csv' this mainly tames the extreme-valued columns, skewness and kurtosis
scaler = MinMaxScaler(feature_range=(0., 1.))
x = scaler.fit_transform(x)

# Encode the labels as integers: LabelEncoder sorts classes alphabetically, so 'female' -> 0 and 'male' -> 1
label = LabelEncoder()
y = label.fit_transform(y)
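
The integer codes produced by `LabelEncoder` follow the sorted order of the class names, which matters later when reading the network's two outputs. The following is a minimal sketch on toy labels (the list `toy_labels` is made up for illustration and is not read from `voice.csv`); it shows how to inspect the mapping and how to turn codes back into strings.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the last column of voice.csv (assumed values).
toy_labels = ['male', 'female', 'female', 'male']

encoder = LabelEncoder()
encoded = encoder.fit_transform(toy_labels)

# classes_ is sorted, so index 0 is 'female' and index 1 is 'male'.
class_names = list(encoder.classes_)
print(class_names, encoded.tolist())

# inverse_transform maps integer codes back to the original strings.
decoded = encoder.inverse_transform([0, 1]).tolist()
print(decoded)
```

In the notebook itself, `label.classes_` should give the same information for the real data.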

In [None]:
type(x), type(y)

In [None]:
# Split the data into a training set and a test (validation) set using scikit-learn.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
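
Note that the scaler in this notebook is fitted on the full dataset before the split, so the test rows influence the scaling statistics. One possible alternative, sketched below on synthetic data (the arrays `x_raw` and `y_raw` are hypothetical stand-ins, not the voice data), is to split first and fit the scaler on the training rows only. Whether this changes the results noticeably here has not been checked.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the voice features (hypothetical data, 100 rows x 3 columns).
rng = np.random.default_rng(0)
x_raw = rng.normal(size=(100, 3))
y_raw = rng.integers(0, 2, size=100)

# Split first, so the test rows play no part in fitting the scaler.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_raw, y_raw, test_size=0.2, random_state=42)

# Learn the per-column min and max from the training rows only.
train_scaler = MinMaxScaler(feature_range=(0., 1.))
x_tr_scaled = train_scaler.fit_transform(x_tr)

# Re-use the training statistics on the test rows;
# test values may fall slightly outside [0, 1].
x_te_scaled = train_scaler.transform(x_te)

print(x_tr_scaled.shape, x_te_scaled.shape)
```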

In [None]:
# Check shapes
x_train.shape, y_train.shape, x_test.shape, y_test.shape

In [None]:
# Check the types of x_train and x_test
type(x_train), type(x_test)

In [None]:
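# The features are already 2-D (samples, features), so no flattening is needed;
# the *_flat names are kept for the cells below.
x_train_flat = x_train
x_test_flat = x_test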

In [None]:
x_train_flat.shape, x_test_flat.shape, y_train.shape, y_test.shape

In [None]:
type(y_train), type(y_test)

## Neural network prediction

![](../Week12/nn.png)

## Predicting gender classes with Keras

In [None]:
from keras import models
from keras import layers


print(x_train_flat.shape[1])

model = models.Sequential([
  layers.Dense(128, activation='relu', input_shape=(x_train_flat.shape[1],)),
  layers.Dropout(0.2),
  layers.Dense(2, activation='softmax')
])

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# Training accuracy gets VERY high (0.99x), but accuracy on the training data is not a reliable measure of generalization.
model.fit(x_train_flat, y_train, epochs=500)

In [None]:
# Accuracy on the held-out test set is less impressive. The gap to the training score may point to overfitting and should be investigated.
model.evaluate(x_test_flat, y_test)
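
To predict new instances, the softmax output has to be turned back into class labels. The sketch below uses a hand-written probability array in place of the real network output; in the notebook it would come from `model.predict(x_test_flat)`. The class order `['female', 'male']` is an assumption based on `LabelEncoder`'s alphabetical sorting.

```python
import numpy as np

# Hypothetical softmax output for three recordings; in the notebook this array
# would come from model.predict(x_test_flat).
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.4, 0.6]])

# Assumed class order, matching LabelEncoder's alphabetical sorting.
class_names = np.array(['female', 'male'])

# The predicted class is the column with the highest probability in each row.
predicted_idx = probs.argmax(axis=1)
predicted_labels = class_names[predicted_idx]

# The highest probability in each row serves as a rough confidence value.
confidence = probs.max(axis=1)

print(predicted_labels.tolist(), confidence.tolist())
```

Comparing `predicted_idx` with `y_test` row by row is one way to see which recordings the model gets wrong.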

## Using Keras

* Define a model of **layers**
* Compile the model with an optimiser and a loss function
* Train the model
* Evaluate the model