# Kaggle Competition

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import mltools as ml

np.random.seed(0)
%matplotlib inline

## Loading Data

In [None]:
# Load the training data
X = np.genfromtxt('data/X_train.txt', delimiter=None)
Y = np.genfromtxt('data/Y_train.txt', delimiter=None)

# And the test features
Xte = np.genfromtxt('data/X_test.txt', delimiter=None)

In [None]:
print(X.shape, Y.shape, Xte.shape)

All your work should be done on the training data set. To be able to make educated decisions on which classifier you're going to use, you should split it into train and validation data sets.

In [None]:
X, Y = ml.shuffleData(X, Y)              # Shuffle first so the train/validation split is random
Xtr, Xva, Ytr, Yva = ml.splitData(X, Y)  # Default is 80% training/20% validation

In [None]:
print(Xtr.shape, Xva.shape)
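MLTools does the shuffling and splitting for us. For intuition, here is a minimal NumPy sketch of the same idea, assuming a simple "shuffle once, then cut at 80%" scheme. `shuffle_split` is a hypothetical helper, not part of MLTools, and the `_demo` names keep it from overwriting the real data if you run it in this notebook.

```python
import numpy as np

def shuffle_split(X, Y, train_fraction=0.8, seed=0):
    """Hypothetical helper: shuffle rows of X and Y together, then split them."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])     # one permutation for both arrays keeps rows paired
    X, Y = X[perm], Y[perm]
    n_train = int(round(train_fraction * X.shape[0]))
    return X[:n_train], X[n_train:], Y[:n_train], Y[n_train:]

# Toy data: row i of X_demo is [2i, 2i+1] and its label is i % 2
X_demo = np.arange(20, dtype=float).reshape(10, 2)
Y_demo = np.arange(10) % 2
Xtr_demo, Xva_demo, Ytr_demo, Yva_demo = shuffle_split(X_demo, Y_demo)
print(Xtr_demo.shape, Xva_demo.shape)  # (8, 2) (2, 2)
```

Using a single permutation for both arrays is the important design choice: shuffling `X` and `Y` independently would scramble which label belongs to which data point.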

In [None]:
# Take a subsample of the data so that training is faster. Train on the whole data set for the Kaggle competition.
Xt, Yt = Xtr[:4000], Ytr[:4000]

## World of Classifiers
Time to start classifying! We'll demonstrate a number of classifiers that you may want to use for the final project, some of which may not have been covered in class yet. For now, we'll treat them as black boxes and focus on how to use them with MLTools.

**IMPORTANT NOTE:** For the Kaggle competition, you need to submit probabilities, not just class predictions. You don't need to code this yourself; just use the `predictSoft()` method.

## Decision Tree

Decision trees will be covered in Tuesday's lecture.

In [None]:
# The decision tree classifier has minLeaf and maxDepth parameters. You should know what these mean by next week.
learner = ml.dtree.treeClassify(Xt, Yt, minLeaf=30, maxDepth=20)

# Prediction
probs = learner.predictSoft(Xte)

The `predictSoft` method returns an $M \times C$ array: one row for each of the $M$ data points and one column for each of the $C$ classes, holding the predicted probability of each class.

In [None]:
probs
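Each row of such a table is a probability distribution over the classes. The toy sketch below uses made-up numbers and assumes classes are labeled `0..C-1`, so that column `c` holds $P(Y=c)$. It shows how to sanity-check the rows, recover hard class predictions, and pull out the $P(Y=1)$ column.

```python
import numpy as np

# Toy M x C table (M = 3 points, C = 2 classes); the values are made up
probs_demo = np.array([[0.9, 0.1],
                       [0.3, 0.7],
                       [0.5, 0.5]])

row_sums = probs_demo.sum(axis=1)          # each row should sum to 1
hard_pred = np.argmax(probs_demo, axis=1)  # most likely class per point (ties go to the lower index)
p_class1 = probs_demo[:, 1]                # P(Y=1) for every point
print(row_sums, hard_pred, p_class1)
```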

We can also compute the AUC for both the training and validation data sets. For a refresher on AUC, look at the Bayes [lecture slides](https://canvas.eee.uci.edu/courses/24330/pages/lecture-calendar) or the [Wikipedia article](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).

In [None]:
print("{0:>15}: {1:.4f}".format('Train AUC', learner.auc(Xt, Yt)))
print("{0:>15}: {1:.4f}".format('Validation AUC', learner.auc(Xva, Yva)))
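AUC equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition for binary 0/1 labels. `auc_from_scores` is a hypothetical helper meant only to make the definition concrete; it is not claimed to reproduce `learner.auc` exactly (for example, in how ties are handled).

```python
import numpy as np

def auc_from_scores(scores, labels):
    """AUC for 0/1 labels: the fraction of (positive, negative) pairs in which
    the positive gets the higher score, counting ties as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # All pairwise differences (n_pos x n_neg): fine for small arrays, memory-hungry for large ones
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# The scores would typically be the P(Y=1) column from predictSoft
print(auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A perfect ranking gives 1.0 and random (or constant) scores give about 0.5, which is why AUC is a convenient way to compare classifiers that output probabilities.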

Try different values of `minLeaf` and `maxDepth` to see how the training and validation AUC change.

### Printing a decision tree
The MLTools implementation of the decision tree classifier also provides a printing mechanism. However, it only works up to depth 2, so it is of limited use for the deeper trees we train here.

In [None]:
learner = ml.dtree.treeClassify()
learner.train(Xt, Yt, maxDepth=2)
print(learner)

## Linear Classifier 

In [None]:
# Scaling the data
XtrP, params = ml.rescale(Xt)
XteP, _ = ml.rescale(Xte, params)

print(XtrP.shape, XteP.shape)

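Note that we do **not** need to scale the data for decision trees: each split compares a single feature against a threshold, so rescaling a feature only moves the threshold and leaves the partition of the data unchanged.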
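The sketch below shows standardization in NumPy, under the assumption that `ml.rescale` shifts each feature to zero mean and scales it to unit variance; `fit_standardize` and `apply_standardize` are hypothetical helpers. The key point is that any other data must be transformed with the *training* statistics. The last lines check the claim about trees on toy data: thresholding a raw feature and thresholding its rescaled version select the same rows.

```python
import numpy as np

def fit_standardize(X):
    """Per-feature mean and standard deviation, computed on training data only."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mu, sd

def apply_standardize(X, params):
    mu, sd = params
    return (X - mu) / sd

rng = np.random.default_rng(0)
X_train_demo = rng.normal(5.0, 2.0, size=(100, 3))
X_test_demo = rng.normal(5.0, 2.0, size=(20, 3))

params_demo = fit_standardize(X_train_demo)
X_train_s = apply_standardize(X_train_demo, params_demo)
X_test_s = apply_standardize(X_test_demo, params_demo)   # reuse the training statistics

# A tree split "x_0 <= t" selects the same rows after scaling, with a moved threshold
t = 5.0
mu0, sd0 = params_demo[0][0], params_demo[1][0]
same_split = np.array_equal(X_train_demo[:, 0] <= t, X_train_s[:, 0] <= (t - mu0) / sd0)
print(same_split)  # True
```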

In [None]:
# Training the model
learner = ml.linearC.linearClassify()
learner.train(XtrP, Yt, initStep=0.5, stopTol=1e-6, stopIter=100)

probs = learner.predictSoft(XteP)
print(probs[:5])

And the AUC is:

In [None]:
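# Scale the validation data with the training parameters before evaluating
XvaP, _ = ml.rescale(Xva, params)

print("{0:>15}: {1:.4f}".format('Train AUC', learner.auc(XtrP, Yt)))
print("{0:>15}: {1:.4f}".format('Validation AUC', learner.auc(XvaP, Yva)))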

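This is why we use a validation data set: the training AUC alone can be overly optimistic, while the validation AUC compares models on data they were not trained on. Compare this validation AUC with the decision tree's; for these specific configurations, whichever model scores higher on validation is likely to do better on the test data as well.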

## Neural Network

Neural networks will be covered in Thursday's lecture. We'll use the MLTools implementation in our examples, but you are welcome to use more sophisticated packages if you prefer.

In [None]:
nn = ml.nnet.nnetClassify()

After constructing the classifier, we can set the sizes of its layers and initialize its weights with `nn.init_weights`:

    init_weights(self, sizes, init, X, Y)

From the method description (see `mltools/nnet.py`):
- `sizes = [Ninput, N1, N2, ... , Noutput]`, where `Ninput` is the number of input features and `Noutput` is the number of classes
- `init = {'zeros', 'random'}` : initialize to all zeros or small random values (breaks symmetry)
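To see why identical initial weights are a problem, consider a single hidden layer in NumPy. This is a generic tanh layer built with a hypothetical helper `hidden_layer`, not the MLTools code. If every hidden unit starts with the same weights, every unit computes the same output; when all weights start equal, the units also receive identical gradient updates, so they never become different. Small random values break this symmetry.

```python
import numpy as np

def hidden_layer(X, W, b):
    """tanh hidden layer: column k holds unit k's output for every data point."""
    return np.tanh(X @ W + b)

rng = np.random.default_rng(0)
X_small = rng.normal(size=(6, 4))   # 6 points, 4 features
n_hidden = 3

# All-zero (or any identical) weights: every hidden unit computes the same function
H_zero = hidden_layer(X_small, np.zeros((4, n_hidden)), np.zeros(n_hidden))

# Small random weights: the units differ from the start
H_rand = hidden_layer(X_small, 0.1 * rng.normal(size=(4, n_hidden)), np.zeros(n_hidden))

print(np.allclose(H_zero, H_zero[:, :1]))  # True: identical columns
print(np.allclose(H_rand, H_rand[:, :1]))  # False: columns differ
```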

Training the model using gradient descent, we can track the surrogate loss (here, MSE loss on the output vector, compared to a 1-of-K representation of the class), as well as the 0/1 classification loss (error rate):

In [None]:
nn.init_weights([14, 5, 3], 'random', Xt, Yt)
nn.train(Xt, Yt, stopTol=1e-8, stepsize=.25, stopIter=50)

### WHAT DID WE DO WRONG?

In [None]:
# Specify the correct layer sizes: the input layer must match the number of features, the output layer the number of classes.
nn.init_weights([Xt.shape[1], 5, len(np.unique(Yt))], 'random', Xt, Yt)
nn.train(Xt, Yt, stopTol=1e-8, stepsize=.25, stopIter=50)  # Really small stopIter so it will stop fast :)
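The two losses tracked during training can be made concrete with a small NumPy sketch. The helpers below are hypothetical, and MLTools may normalize the MSE differently (for example, averaging over classes as well as data points), so treat the numbers as illustrative. Labels loaded with `np.genfromtxt` are floats, so they would need `.astype(int)` before being used as indices in `one_hot`.

```python
import numpy as np

def one_hot(y, n_classes):
    """1-of-K encoding: row i has a 1 in column y[i] and 0 elsewhere."""
    T = np.zeros((len(y), n_classes))
    T[np.arange(len(y)), y] = 1.0
    return T

def mse_surrogate(outputs, y):
    """Squared error between network outputs and 1-of-K targets, averaged over points."""
    return np.mean(np.sum((outputs - one_hot(y, outputs.shape[1])) ** 2, axis=1))

def error_rate(outputs, y):
    """0/1 loss: fraction of points whose highest-scoring class is wrong."""
    return np.mean(np.argmax(outputs, axis=1) != y)

outputs = np.array([[0.8, 0.2], [0.4, 0.6], [0.7, 0.3]])  # made-up network outputs
y_demo = np.array([0, 1, 1])                                # integer class labels
print(mse_surrogate(outputs, y_demo), error_rate(outputs, y_demo))
```

The MSE surrogate is smooth in the network weights, which is what gradient descent needs; the 0/1 error is what we actually care about but is piecewise constant, so it is only monitored.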

In [None]:
print("{0:>15}: {1:.4f}".format('Train AUC',nn.auc(Xt, Yt)))
print("{0:>15}: {1:.4f}".format('Validation AUC', nn.auc(Xva, Yva)))

The AUC results are poor because we used a very simple network configuration (one hidden layer of 5 units, trained for only 50 iterations). Neural networks can be tuned almost endlessly, but some choices should make intuitive sense.

One example is the choice of activation function, the nonlinearity applied in the hidden layers. By default, the code uses `tanh`, but the `logistic` (sigmoid) function is also implemented, and you can select it by name.

In [None]:
nn.setActivation('logistic')

nn.train(Xt, Yt, stopTol=1e-8, stepsize=.25, stopIter=100)
print("{0:>15}: {1:.4f}".format('Train AUC',nn.auc(Xt, Yt)))
print("{0:>15}: {1:.4f}".format('Validation AUC', nn.auc(Xva, Yva)))
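For reference, the two built-in activations and their derivatives look like this in NumPy. This is a sketch of the math, not the MLTools source; in particular, check `mltools/nnet.py` to see whether its derivative functions take the pre-activation $z$ (as here) or the activation output. The last line checks that tanh is a shifted and rescaled logistic, $\tanh(z) = 2\sigma(2z) - 1$, which is why the two often behave similarly; the main difference is that tanh outputs are centered at zero.

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2          # derivative with respect to the pre-activation z

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_logistic(z):
    s = logistic(z)
    return s * (1.0 - s)                  # derivative with respect to the pre-activation z

z = np.linspace(-3, 3, 7)
# tanh is a shifted, rescaled logistic: tanh(z) = 2 * logistic(2z) - 1
print(np.allclose(tanh(z), 2 * logistic(2 * z) - 1))  # True
```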

### Writing your own activation function

Not surprisingly, you can also provide a custom activation function. For the output layer you will almost always want the sigmoid, so change only the hidden-layer activations.

The method signature is:

    setActivation(self, method, sig=None, d_sig=None, sig_0=None, d_sig_0=None)
    
You can call it with `method='custom'` and then specify both `sig` and `d_sig` (the `sig_0` and `d_sig_0` ones are for the last layer).

In [None]:
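# A dummy activation: the identity f(z) = z, whose derivative is 1 everywhere.
# (With linear hidden units, the network reduces to a linear model followed by the output-layer activation.)
sig = lambda z: np.atleast_2d(z)
dsig = lambda z: np.ones_like(np.atleast_2d(z))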

In [None]:
nn = ml.nnet.nnetClassify()
nn.init_weights([Xt.shape[1], 5, len(np.unique(Yt))], 'random', Xt, Yt)

nn.setActivation('custom', sig, dsig)

nn.train(Xt, Yt, stopTol=1e-8, stepsize=.25, stopIter=100)
print("{0:>15}: {1:.4f}".format('Train AUC',nn.auc(Xt, Yt)))
print("{0:>15}: {1:.4f}".format('Validation AUC', nn.auc(Xva, Yva)))
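As a less trivial custom choice, a ReLU (rectified linear unit) pair could be written in the same lambda style as the dummy functions and passed as `nn.setActivation('custom', relu, drelu)`. This is a hypothetical example: we have not checked how well it trains with this code's default initialization and step size, and it assumes `d_sig` is evaluated on the hidden-layer pre-activations (see `mltools/nnet.py`).

```python
import numpy as np

# ReLU in the same np.atleast_2d style as the dummy functions
relu = lambda z: np.atleast_2d(np.maximum(z, 0.0))
drelu = lambda z: np.atleast_2d((np.asarray(z) > 0).astype(float))   # derivative taken as 0 at z = 0

z_demo = np.array([-2.0, 0.0, 3.0])
print(relu(z_demo), drelu(z_demo))   # [[0. 0. 3.]] [[0. 0. 1.]]
```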

## Tuning Hyperparameters

We've seen that one way to judge how well different hyperparameter values work is to plot the training and validation errors as a function of that hyperparameter (e.g., $k$ in KNN, or the polynomial degree of the features in linear classification and regression).

Often, though, there is more than one hyperparameter to tune, for example `minLeaf` and `maxDepth` in a decision tree.

With two hyperparameters, a heatmap works well: the $x$ and $y$ axes index the two parameters' values, and the color ("heat") shows the validation (or training) error as a third dimension.

We'll illustrate this with a dummy function. Suppose we have two parameters `p1` and `p2`, and the prediction accuracy is simply `p1 + p2` (purely for illustration; in practice you would use a real evaluation metric such as the validation AUC).

In [None]:
p1 = np.arange(5)
p2 = np.arange(5)

In [None]:
acc = np.zeros([p1.shape[0], p2.shape[0]])
for i in range(p1.shape[0]):
    for j in range(p2.shape[0]):
        # In practice: train with (p1[i], p2[j]) and store the validation metric here
        acc[i, j] = p1[i] + p2[j]

In [None]:
acc

In [None]:
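f, ax = plt.subplots(1, 1, figsize=(8, 5))

# acc[i, j] is drawn at row i (value p1[i]) and column j (value p2[j])
heatmap = ax.matshow(acc)
f.colorbar(heatmap)

# Columns (x axis) correspond to p2; tick labels can be any strings
ax.set_xticks(np.arange(len(p2)))
ax.set_xticklabels(['a', 'b', 'c', 'd', 'e'])
ax.set_xlabel('p2')

# Rows (y axis) correspond to p1
ax.set_yticks(np.arange(len(p1)))
ax.set_yticklabels(['%d' % p for p in p1])
ax.set_ylabel('p1')

plt.show()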
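Besides reading the heatmap by eye, you can find the best combination programmatically. The sketch below rebuilds the dummy table with `np.add.outer` (equivalent to the double loop) under new names, so it doesn't overwrite `acc`, and locates the best cell with `np.argmax` and `np.unravel_index`. With a real metric, you would fill the table with validation AUCs instead.

```python
import numpy as np

grid_p1 = np.arange(5)
grid_p2 = np.arange(5)
# Same table as the double loop: entry [i, j] = grid_p1[i] + grid_p2[j]
scores = np.add.outer(grid_p1, grid_p2).astype(float)

# Flat index of the best entry, converted back to (row, column)
i_best, j_best = np.unravel_index(np.argmax(scores), scores.shape)
print(grid_p1[i_best], grid_p2[j_best])   # 4 4
```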

## Submitting Predictions

Let's assume that the last classifier we trained is the best one (in practice, you would verify this with the validation AUC and plots like the heatmap above). Now let's run it on the test data and create a file that can be submitted.

Each line of the file contains a data point ID and the predicted probability $P(Y=1)$, and the file starts with a header line. Here's how to build it from the `probs` matrix.

In [None]:
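# Predict test-set probabilities with the chosen model (here, the last network we trained)
probs = nn.predictSoft(Xte)
probs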

In [None]:
# Build the submission array: a running index as the ID column, followed by the P(Y=1) column of probs.
Y_sub = np.vstack([np.arange(Xte.shape[0]), probs[:, 1]]).T

# Write the header (ID,Prob1); comments='' stops numpy from prefixing the header with '#'.
# One format per column makes numpy join the columns with the delimiter (',').
np.savetxt('data/Y_sub.txt', Y_sub, fmt=['%d', '%.5f'], header='ID,Prob1', comments='', delimiter=',')
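Before uploading, it's worth checking the file format. This sketch writes a made-up three-row table to an in-memory buffer instead of disk, using one format per column, and prints the result so you can confirm the header line and the `ID,Prob1` rows. Compare the output against the sample submission file provided by the competition, if there is one.

```python
import io
import numpy as np

probs_demo = np.array([[0.9, 0.1], [0.25, 0.75], [0.5, 0.5]])   # made-up M x 2 table
sub_demo = np.column_stack([np.arange(probs_demo.shape[0]), probs_demo[:, 1]])

buf = io.StringIO()   # write to memory instead of a file, just to inspect the output
np.savetxt(buf, sub_demo, fmt=['%d', '%.5f'], delimiter=',', header='ID,Prob1', comments='')
print(buf.getvalue())
```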

Finally, we are ready to upload our predictions to the Kaggle website, and compete to top the leaderboard!