# Assignment
For this assignment, we are going to do something like k-fold validation to determine the true expected performance of our model.

This is a little tricky because we would like our test and train samples to always contain different users.   As we noted in the **prep** workbook, every 4 users is about 10% of the sample, so let's use this to divide our data and determine the average performance of our model.

**TASK**:
To make this concrete, I want you to divide the full sample up into 9 folds, like this:
* fold1: test sample is users >=0 and <4; train sample is users>=4
* fold2: test sample is users >=4 and <8; train sample is users<4 or users>=8
...etc...
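
The fold definitions above amount to a boolean mask on the user id. Here is a minimal sketch on a toy frame, assuming the user ids sit in a `user-id` column as in the starter code; the helper name `split_by_user_group` and the toy data are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the real data (hypothetical): 36 users, 3 rows each
toy = pd.DataFrame({"user-id": [u for u in range(36) for _ in range(3)]})
toy["x-axis"] = 0.0

def split_by_user_group(df, fold, group_size=4, user_col="user-id"):
    """Split df into (train, test) for a zero-based fold index.

    Test users satisfy start <= id < end; everybody else is train.
    """
    start = fold * group_size
    end = start + group_size
    in_test = (df[user_col] >= start) & (df[user_col] < end)
    return df[~in_test], df[in_test]

fold_sizes = []
for fold in range(9):
    train_df, test_df = split_by_user_group(toy, fold)
    # The point of the exercise: no user on both sides of the split
    shared = set(train_df["user-id"]) & set(test_df["user-id"])
    assert not shared
    fold_sizes.append((len(train_df), len(test_df)))

print(fold_sizes[0])   # (96, 12)
```

It is worth checking `df['user-id'].unique()` on the real data first: if the ids start at 1 rather than 0, the first group holds only three users and the highest id never lands in a test set.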

Determine the performance of the CNN version of the model (use the model made using the Keras Functional API) for each fold, then average the results.
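
Averaging the per-fold results can be as simple as a mean, and the spread across folds gives a rough uncertainty on that mean. A small sketch; the function name `summarize_folds` is hypothetical, and the accuracies in the usage line are placeholders, not measured results:

```python
import numpy as np

def summarize_folds(accuracies):
    """Return (mean, standard error of the mean) over per-fold accuracies."""
    acc = np.asarray(accuracies, dtype=float)
    mean = acc.mean()
    # ddof=1 gives the sample standard deviation; dividing by sqrt(n)
    # turns it into the standard error of the mean
    sem = acc.std(ddof=1) / np.sqrt(len(acc))
    return mean, sem

# Placeholder numbers purely to show the call, NOT real fold results
mean_acc, sem_acc = summarize_folds([0.80, 0.85, 0.90])
print(f"accuracy = {mean_acc:.3f} +/- {sem_acc:.3f}")
```

Because the training sets of different folds overlap, the folds are not fully independent, so treat the quoted error as approximate.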

**EXTRA**: Incorporate a multi-head model (with at least 3 heads), with each head using a different kernel size.  Do the averaging as above.   Can you come up with hyperparameters that beat the performance of the earlier 2-headed model (though that was measured with just a single fold)?
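
When choosing kernel sizes for the heads, it helps to check that each one still fits the 80-step input once it has passed through the layer stack. The sketch below assumes each head reuses the layout of the starter model (two convolutions, a max pooling of 3, two more convolutions, all with Keras' default `valid` padding and stride 1); the helper `head_output_length` is hypothetical:

```python
def head_output_length(n_steps, kernel_size, pool_size=3):
    """Sequence length after conv, conv, max-pool, conv, conv.

    With 'valid' padding and stride 1 a convolution shortens the
    sequence by kernel_size - 1; max pooling with its default stride
    (equal to pool_size) floor-divides it.
    Returns 0 if the sequence becomes too short for a layer.
    """
    length = n_steps
    for layer in ("conv", "conv", "pool", "conv", "conv"):
        if layer == "conv":
            length = length - kernel_size + 1
        else:
            length = length // pool_size
        if length < 1:
            return 0
    return length

for k in (3, 5, 10, 15):
    print(k, head_output_length(80, k))
```

A kernel size of 10 leaves only 2 steps before the global average pooling, so heads with much larger kernels need fewer layers or `padding='same'`.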

The assignment below has some starter code to help you begin.


# Code to read data in and normalize it

In [None]:
import pandas as pd
import numpy as np
#
# Use this to convert text to floating point
def convert_to_float(x):
    try:
        # np.float was removed in NumPy 1.24; the builtin float is equivalent
        return float(x)
    except (TypeError, ValueError):
        return np.nan

column_names = ['user-id',
                    'activity',
                    'timestamp',
                    'x-axis',
                    'y-axis',
                    'z-axis']
df = pd.read_csv('/fs/scratch/PAS1495/physics6820/WISDM/WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt',
                     header=None,
                     names=column_names)

# Last column has a ";" character which must be removed ...
# (assign the result back instead of using inplace=True on a column selection)
df['z-axis'] = df['z-axis'].replace(regex=True,
                                    to_replace=r';',
                                    value=r'')
# ... and then this column must be transformed to float explicitly.
# This is very important, otherwise the model will not fit and the loss
# will show up as NaN
df['z-axis'] = df['z-axis'].apply(convert_to_float)
#
# Get rid of rows with missing data
df.dropna(axis=0, how='any', inplace=True)

from sklearn import preprocessing
# Define column name of the label vector
LABEL = 'ActivityEncoded'
# Transform the labels from String to Integer via LabelEncoder
le = preprocessing.LabelEncoder()
# Add a new column to the existing DataFrame with the encoded values
df[LABEL] = le.fit_transform(df['activity'].values.ravel())

#
# Normalize the data: to make things simple, just divide all of the data (pre train/test) by 20
max_all = 20.0
df['x-axis'] = df['x-axis'] / max_all
df['y-axis'] = df['y-axis'] / max_all
df['z-axis'] = df['z-axis'] / max_all

print(df.head())

max_x = df['x-axis'].max()
max_y = df['y-axis'].max()
max_z = df['z-axis'].max()

print("max values ", max_x,max_y,max_z)


# Method to create test/train samples
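
The function in the next cell slides a window of `TIME_PERIODS` rows over the data, advancing by the step distance each time, so a step of 40 yields roughly twice as many (half-overlapping) segments as a step of 80. A quick count that mirrors the loop bounds used in `create_segments_and_labels`; the helper `count_segments` and the row count are arbitrary examples:

```python
def count_segments(n_rows, time_steps, step):
    """Number of windows produced by range(0, n_rows - time_steps, step)."""
    return len(range(0, n_rows - time_steps, step))

n_rows = 1000   # arbitrary example size
print(count_segments(n_rows, 80, 40))   # 23
print(count_segments(n_rows, 80, 80))   # 12
```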

In [None]:
from scipy import stats

# Same labels will be reused throughout the program
LABELS = ['Downstairs',
          'Jogging',
          'Sitting',
          'Standing',
          'Upstairs',
          'Walking']
# The number of steps within one time segment
TIME_PERIODS = 80    # at the WISDM sampling rate of 20 Hz (one measurement every 50 ms), this is 4 seconds of data
# The steps to take from one segment to the next; if this value is equal to
# TIME_PERIODS, then there is no overlap between the segments
STEP_DISTANCE_TRAIN = 40
STEP_DISTANCE_TEST = 80

def create_segments_and_labels(df, time_steps, step, label_name):

    # x, y, z acceleration as features
    N_FEATURES = 3
    # Number of steps to advance in each iteration (for me, it should always
    # be equal to the time_steps in order to have no overlap between segments)
    # step = time_steps
    segments = []
    labels = []
    for i in range(0, len(df) - time_steps, step):
        xs = df['x-axis'].values[i: i + time_steps]
        ys = df['y-axis'].values[i: i + time_steps]
        zs = df['z-axis'].values[i: i + time_steps]
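        # Retrieve the most often used label in this segment
        # (np.ravel copes with both the array and the scalar that different SciPy versions return)
        label = np.ravel(stats.mode(df[label_name].values[i: i + time_steps])[0])[0]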
        segments.append([xs, ys, zs])
        labels.append(label)

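    # Bring the segments into shape (n_segments, time_steps, N_FEATURES).
    # The stacked list has shape (n_segments, N_FEATURES, time_steps), so the last two
    # axes are swapped with a transpose; a plain reshape would interleave x, y and z values
    reshaped_segments = np.asarray(segments, dtype=np.float32).reshape(-1, N_FEATURES, time_steps).transpose(0, 2, 1)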
    labels = np.asarray(labels)

    return reshaped_segments, labels


# Method to initialize weights
You must do something like this before fitting your model if you fit it in a loop.  Otherwise the weights learned in one iteration carry over into the next, and each fold does not start from scratch.
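
To see what the permutation inside `shuffle_weights` does, here is the same operation applied to plain NumPy arrays standing in for a model's weight list (the arrays and their shapes are invented for illustration). The values and shapes survive; only their positions change:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for model.get_weights(): a kernel and a bias (made-up shapes)
weights = [rng.normal(size=(10, 3, 100)), np.zeros(100)]

# Same idea as in shuffle_weights: flatten, permute, restore the shape
shuffled = [np.random.permutation(w.ravel()).reshape(w.shape) for w in weights]

for w, s in zip(weights, shuffled):
    # Shapes are unchanged ...
    assert s.shape == w.shape
    # ... and so is the multiset of values
    assert np.allclose(np.sort(s, axis=None), np.sort(w, axis=None))
```

An alternative that restores exactly the same starting point each time is to save `model.get_weights()` right after building the model and pass that saved copy to `model.set_weights(...)` at the top of each loop iteration.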

In [None]:

def shuffle_weights(model, weights=None):
    """Randomly permute the weights in `model`, or the given `weights`.
    This is a fast approximation of re-initializing the weights of a model.
    Assumes weights are distributed independently of the dimensions of the weight tensors
      (i.e., the weights have the same distribution along each dimension).
    :param Model model: Modify the weights of the given model.
    :param list(ndarray) weights: The model's weights will be replaced by a random permutation of these weights.
      If `None`, permute the model's current weights.
    """
    if weights is None:
        weights = model.get_weights()
    weights = [np.random.permutation(w.flat).reshape(w.shape) for w in weights]
    # Faster, but less random: only permutes along the first dimension
    # weights = [np.random.permutation(w) for w in weights]
    model.set_weights(weights)

In [None]:
import keras
from keras.layers import Input,Conv1D, MaxPooling1D,GlobalAveragePooling1D,Dropout,Dense
from keras.models import Model
# 
# Our first layer gets the input from our samples - this is 80 time steps by 3 channels
#model_m.add(Conv1D(100, 10, activation='relu', input_shape=(80,3)))
inputs1 = Input(shape=(80,3))
conv1 = Conv1D(100, 10, activation='relu')(inputs1)
#
# Another convolutional layer
#model_m.add(Conv1D(100, 10, activation='relu'))
conv2 = Conv1D(100, 10, activation='relu')(conv1)
#
# Max pooling 
#model_m.add(MaxPooling1D(3))
pool1 = MaxPooling1D(3)(conv2)
#
# Two more convolutional layers
#model_m.add(Conv1D(160, 10, activation='relu'))
#model_m.add(Conv1D(160, 10, activation='relu'))
conv3 = Conv1D(160, 10, activation='relu')(pool1)
conv4 = Conv1D(160, 10, activation='relu')(conv3)
#
# Global average pooling: use this instead of "Flatten" - it helps reduce overfitting
#model_m.add(GlobalAveragePooling1D())
glob1 = GlobalAveragePooling1D()(conv4)
#
# Dropout, then one softmax output per activity class
num_classes = len(LABELS)
drop1 = Dropout(0.5)(glob1)
outputs = Dense(num_classes, activation='softmax')(drop1)

#
# Now define the model
model_m = Model(inputs=inputs1, outputs=outputs)
# summary() prints the table itself and returns None, so no print() is needed
model_m.summary()
 

# TASK: 

Divide the full sample up into 9 folds, like this:

*  fold1: test sample is users >=0 and <4; train sample is users>=4
*  fold2: test sample is users >=4 and <8; train sample is users<4 or users>=8 
*  ...etc...

Determine the performance of the CNN version of the model (use the model made using the Keras Functional API) for each fold, then average the results.




# Example pseudo-code
We have 36 users.   So let's group them in steps of 4, and use each group of 4 as our test, and the others as our train.

Fill in the rest of the code!
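
One of the steps in the skeleton is turning the integer labels into one-hot vectors. A minimal NumPy sketch, assuming the labels are integers in `0..num_classes-1` as produced by the `LabelEncoder` earlier (the example labels are made up):

```python
import numpy as np

num_classes = 6                     # six activities
y = np.array([0, 5, 2, 2])          # made-up encoded labels

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with y picks one row per label
y_hot = np.eye(num_classes, dtype=np.float32)[y]
print(y_hot.shape)   # (4, 6)
```

Keras' `keras.utils.to_categorical(y, num_classes)` does the same job.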

In [None]:
#
# You can define the model outside of the loop


#
# Loop over the 9 groups of 4 users
user_start = 0
for user_groups in range(9):
    #
    # Define the users who will form the test group.   The train group is everybody else!
    user_start = user_groups*4
    user_end = user_start + 4
    print()
    print("User test group",user_start,user_end)
    #
    # Define the test and train dataframes here (using user_start and user_end)

    #
    # Create the x_train, y_train and x_test, y_test samples from the above dataframes

    #
    # Remember to process the y_train and y_test to make one-hot versions

    #
    # Fit the model with these samples

    #
    # Grab the val_accuracy from the appropriate epoch and store it

#
# When loop is done, average the results to get the overall expected accuracy

# EXTRA
Incorporate a multi-head model (with at least 3 heads), with each head using a different kernel size.  Do the averaging as above.   Can you come up with hyperparameters that beat the performance of the earlier 2-headed model (though that was measured with just a single fold)?
