It occurred to me that sklearn's logistic regression is not really appropriate here. Its one-vs-all algorithm does not, I think, capture the correct loss penalty: it treats the classes as unrelated categories, whereas the number of open channels is an ordered, linear scale. I need a model that penalises predictions more the further they are from the actual number of open channels.
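
To make that distinction concrete, here is a minimal sketch with toy numbers (not the competition data). It compares a loss that treats classes as unordered, illustrated here by a simple 0-1 loss rather than the log loss that logistic regression actually optimises, with a squared-error loss that grows with distance from the true value. The values are made up for illustration.

```python
import numpy as np

# Hypothetical example: the true number of open channels is 3.
y_true = 3
candidate_preds = np.array([3, 2, 0])

# 0-1 loss: every wrong class costs the same, however far off it is.
zero_one = (candidate_preds != y_true).astype(int)

# Squared error: the penalty grows with the distance from the true value.
squared = (candidate_preds - y_true) ** 2

print(zero_one)  # [0 1 1]
print(squared)   # [0 1 9]
```

Predicting 0 when the truth is 3 costs the same as predicting 2 under the first loss, but nine times as much under the second, which is the behaviour I want.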

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
plt.rcParams['agg.path.chunksize'] = 10000

In [3]:
df = pd.read_csv('train.csv',index_col=0)

In [4]:
df_terminus = df.index.values[-1]
batches = int(df_terminus/50)
seconds_in_batch = 50
freq = 10000

print(batches)

10


In [5]:
base = np.array([])
for batch in range(batches):
    mini_array = np.ones(50*10000)*batch
    base = np.append(base,mini_array)
print(len(base))

5000000


In [6]:
df['batch'] = base
df['batch'] = df['batch'].astype('int')

In [7]:
batches = int(500/50)
seconds_in_batch = 50
freq = 10000

X = []
y = []

for batch_number in range(batches):
    print('Starting new batch: ', batch_number)
    window_size = 0.001
    batch_df = df[df.batch == batch_number]
    indices = batch_df.index.values
    padd_no = 0
    for index in indices:
        
        if index % 25 == 0:
            print(index)
            
        window_df = batch_df.loc[index-window_size/2:index+window_size/2]
        open_channels = window_df.loc[index].open_channels
        features = window_df.signal.values

        if len(features) <= int(window_size*10000):

            padding = np.ones((int(window_size*10000)-len(features))+1)*features.mean()
            features = np.append(features,padding)
            padd_no += 1
            
        X.append(features)
        y.append(open_channels)
    print('End batch: ', batch_number, ' with padded fraction = ', padd_no/len(indices))
    
    break
X = np.array(X)
y = np.array(y)

Starting new batch:  0
25.0
50.0
End batch:  0  with padded fraction =  0.279776


In [8]:
print(X.shape)

print(y.shape)

(500000, 11)
(500000,)
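
The row-by-row `.loc` loop above is slow. As a rough alternative, and assuming the signal is evenly sampled so that a 0.001 s window at 10 kHz is 11 consecutive rows, the same kind of feature matrix could be built in one vectorised step. This is only a sketch: `make_windows` is a hypothetical helper, and it pads the edges by repeating the first and last values rather than using the window mean as the loop does.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(signal, half_width=5):
    """Return one centred window of 2*half_width + 1 samples per input sample."""
    # Assumption: pad by repeating the edge values, unlike the mean padding above.
    padded = np.pad(signal, half_width, mode="edge")
    return sliding_window_view(padded, 2 * half_width + 1)

toy_signal = np.arange(20, dtype=float)  # stand-in for df.signal.values
windows = make_windows(toy_signal)
print(windows.shape)  # (20, 11)
```

Row `i` of the result is centred on sample `i`, so `windows[:, 5]` reproduces the original signal.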


In [9]:
from sklearn.model_selection import train_test_split

X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.10, random_state=42)

X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.20, random_state=42)

from sklearn.linear_model import LinearRegression

clf = LinearRegression().fit(X_train, y_train)

preds_val = clf.predict(X_val)

preds_val = np.rint(preds_val)

print((preds_val == y_val).mean())

This result is even better than when I used logistic regression (as expected). I can even 'clean up' the preds a little bit to make sure they're all reasonable.

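preds_copy = preds_val.copy()  # real copy; plain assignment would only alias the same array

# A negative number of open channels is impossible, so clip at 0
preds_val = np.clip(preds_val, 0, None)
print((preds_val == y_val).mean())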

'Cleaning up' gives almost no improvement. It looks like almost all the values are within reasonable limits.
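
For reference, rounding and limiting the predictions to a valid range can be done in a single step. This is a small sketch with made-up regression outputs; the upper bound `max_channels` is an assumption for the toy example, not something I have checked against the data.

```python
import numpy as np

raw_preds = np.array([-0.4, 0.2, 0.6, 1.49, 3.7])  # hypothetical regression outputs
max_channels = 3                                     # assumed upper bound for this toy example

# Round to the nearest integer, then force the result into [0, max_channels].
labels = np.clip(np.rint(raw_preds), 0, max_channels).astype(int)
print(labels)  # [0 0 1 1 3]
```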

from sklearn.metrics import f1_score

print('f1 score for vals is ', f1_score(y_val, preds_val, average='macro'))

preds_test = clf.predict(X_test)
preds_test = np.rint(preds_test)
print('f1 score for test is ', f1_score(y_test, preds_test, average='macro'))

The above approach is more successful than logistic regression too. I'm also seeing similar F1 values for val and test (val slightly higher), which is consistent with ML principles and suggests that I'm not overfitting.

print((preds_test == y_test).mean())

Now for the first time, let's throw a neural network at it and see what happens.
https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/301_regression.py



In [35]:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt


class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden_1,n_hidden_2, n_output):
        super(Net, self).__init__()
        self.hidden_1 = torch.nn.Linear(n_feature, n_hidden_1)# hidden layer
        self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2)
        self.predict = torch.nn.Linear(n_hidden_2, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden_1(x))# activation function for hidden layer
        x = F.relu(self.hidden_2(x))
        x = self.predict(x)             # linear output
        return x

net = Net(n_feature=11, n_hidden_1=50, n_hidden_2=20, n_output=1)     # define the network
print(net)  # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

plt.ion()   # turn on matplotlib's interactive mode

Net(
  (hidden_1): Linear(in_features=11, out_features=50, bias=True)
  (hidden_2): Linear(in_features=50, out_features=20, bias=True)
  (predict): Linear(in_features=20, out_features=1, bias=True)
)


To reduce compute time, I'll only take the first 1000 values of the train series.

In [36]:
x = torch.tensor(X_train[:1000])
x = x.to(device='cuda')
y = torch.tensor(y_train[:1000])
y = y.to(device='cuda')

In [37]:
torch.cuda.is_available()

True

In [38]:
net.cuda()

Net(
  (hidden_1): Linear(in_features=11, out_features=50, bias=True)
  (hidden_2): Linear(in_features=50, out_features=20, bias=True)
  (predict): Linear(in_features=20, out_features=1, bias=True)
)

In [39]:
losses = []
for t in range(200):
    
    
    prediction = net(x.float())     # input x and predict based on x

    loss = loss_func(prediction, y.float())     # must be (1. nn output, 2. target)

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

    if t % 5 == 0:
        losses.append(loss)
        print(loss.item())
    # Sanity check
    
plt.ioff()
plt.show()

0.042949117720127106
0.035172492265701294
0.03508143872022629
0.035070765763521194
0.03506045415997505
0.03505045548081398
0.03504081070423126
0.03503149002790451
0.03502243384718895
0.03501362353563309
0.035005081444978714
0.03499665483832359
0.034988366067409515
0.03498020023107529
0.03497215360403061
0.03496414050459862
0.03495592251420021
0.034947093576192856
0.03493800386786461
0.03492914140224457
0.03492065519094467
0.03491266444325447
0.03490583971142769
0.03489989787340164
0.03489459678530693
0.03488977625966072
0.0348852276802063
0.03488088399171829
0.034876711666584015
0.03487265855073929
0.03486871346831322
0.034864895045757294
0.03486116975545883
0.03485756367444992
0.03485405817627907
0.034850653260946274
0.03484736755490303
0.03484416380524635
0.034841034561395645
0.034837983548641205


In [59]:
preds_clean = prediction.detach().cpu().clone().numpy()

y_clean = y.detach().cpu().clone().numpy()

preds_clean = np.concatenate(preds_clean)  # flatten (N, 1) -> (N,)

preds_clean = np.rint(preds_clean)  # round to the nearest integer channel count

(preds_clean == y_clean).mean()

This seems to be working. So what I'll do now is train on the first 10000 values.

In [60]:
x = torch.tensor(X_train[:10000])
x = x.to(device='cuda')
y = torch.tensor(y_train[:10000])
y = y.to(device='cuda')
net = Net(n_feature=11, n_hidden_1=50, n_hidden_2=20, n_output=1)     # define the network
print(net)  # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

plt.ion()   # turn on matplotlib's interactive mode

net.cuda()

losses = []
for t in range(200):
    
    
    prediction = net(x.float())     # input x and predict based on x

    loss = loss_func(prediction, y.float())     # must be (1. nn output, 2. target)

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

    if t % 5 == 0:
        losses.append(loss)
        print(loss.item())
    
    # Sanity check
    
plt.ioff()
plt.show()


Net(
  (hidden_1): Linear(in_features=11, out_features=50, bias=True)
  (hidden_2): Linear(in_features=50, out_features=20, bias=True)
  (predict): Linear(in_features=20, out_features=1, bias=True)
)
0.051225725561380386


  return F.mse_loss(input, target, reduction=self.reduction)


0.03650752082467079
0.031662411987781525
0.031633276492357254
0.031632889062166214
0.0316329263150692
0.03163289651274681
0.03163288161158562
0.03163295239210129
0.0316329263150692
0.031632911413908005
0.03163289651274681
0.03163287416100502
0.03163284435868263
0.03163281828165054
0.03163280338048935
0.03163277357816696
0.031632788479328156
0.03163275867700577
0.03163273632526398
0.031632713973522186
0.03163269907236099
0.0316326841711998
0.031632669270038605
0.03163263946771622
0.031632617115974426
0.03163260594010353
0.03163258731365204
0.03163289651274681
0.03163288161158562
0.031632859259843826
0.03163284435868263
0.03163281828165054
0.031632810831069946
0.03163279592990875
0.03163278102874756
0.031632762402296066
0.03163273632526398
0.031632713973522186
0.03163270279765129


In [62]:
preds_clean = prediction.detach().cpu().clone().numpy()

y_clean = y.detach().cpu().clone().numpy()

preds_clean = np.concatenate(preds_clean)
preds_clean = np.rint(preds_clean)

(preds_clean == y_clean).mean()

0.9673

Now, to use one of the old metrics

In [63]:
from sklearn.metrics import f1_score

print('f1 score for trains is ', f1_score(y_clean, preds_clean, average='macro'))


f1 score for trains is  0.4916891170639964


  'precision', 'predicted', average, warn_for)


Not actually a bad F1, although we definitely have to consider that this may be overfitting. Let's see what kind of preds I get on the validation set with this net as is.

In [65]:
x = torch.tensor(X_val[:10000])
x = x.to(device='cuda')
y = torch.tensor(y_val[:10000])
y = y.to(device='cuda')

prediction = net(x.float())

preds_clean = prediction.detach().cpu().clone().numpy()

y_clean = y.detach().cpu().clone().numpy()

preds_clean = np.concatenate(preds_clean)
preds_clean = np.rint(preds_clean)

(preds_clean == y_clean).mean()

print('f1 score for vals is ', f1_score(y_clean, preds_clean, average='macro'))


f1 score for vals is  0.4923342471316885


The F1 score on the validation set is 0.4923342471316885. That's actually not bad at all! Obviously this neural network isn't better than linear regression for this batch, but it's pretty good. Similar F1 scores on train and val also indicate that I'm not overfitting. According to Andrew Ng, it may also indicate that my neural net is not complicated enough... yet. I wonder if including more layers will help.

In [66]:
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden_1,n_hidden_2, n_hidden_3, n_output):
        super(Net, self).__init__()
        self.hidden_1 = torch.nn.Linear(n_feature, n_hidden_1)# hidden layer
        self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2)
        self.hidden_3 = torch.nn.Linear(n_hidden_2, n_hidden_3)
        self.predict = torch.nn.Linear(n_hidden_3, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden_1(x))# activation function for hidden layer
        x = F.relu(self.hidden_2(x))
        x = F.relu(self.hidden_3(x))
        x = self.predict(x)             # linear output
        return x

net = Net(n_feature=11, n_hidden_1=100, n_hidden_2=50, n_hidden_3=10, n_output=1)     # define the network
print(net)  # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

plt.ion()   # turn on matplotlib's interactive mode

Net(
  (hidden_1): Linear(in_features=11, out_features=100, bias=True)
  (hidden_2): Linear(in_features=100, out_features=50, bias=True)
  (hidden_3): Linear(in_features=50, out_features=10, bias=True)
  (predict): Linear(in_features=10, out_features=1, bias=True)
)


In [68]:
x = torch.tensor(X_train[:10000])
x = x.to(device='cuda')
y = torch.tensor(y_train[:10000])
y = y.to(device='cuda')

print(net)  # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

plt.ion()   # turn on matplotlib's interactive mode

net.cuda()

losses = []
for t in range(200):
    
    
    prediction = net(x.float())     # input x and predict based on x

    loss = loss_func(prediction, y.float())     # must be (1. nn output, 2. target)

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

    if t % 5 == 0:
        losses.append(loss)
        print(loss.item())
    
    # Sanity check
    
plt.ioff()
plt.show()


Net(
  (hidden_1): Linear(in_features=11, out_features=100, bias=True)
  (hidden_2): Linear(in_features=100, out_features=50, bias=True)
  (hidden_3): Linear(in_features=50, out_features=10, bias=True)
  (predict): Linear(in_features=10, out_features=1, bias=True)
)
0.031659338623285294
0.0316479317843914
0.03164779394865036
0.031647659838199615
0.03164752572774887
0.03164739906787872
0.03164726868271828
0.03164714574813843
0.03164701908826828
0.031646888703107834
0.031646765768527985
0.031646646559238434
0.031646523624658585
0.031646400690078735
0.03164628520607948
0.031646162271499634
0.03164605051279068
0.03164593130350113
0.03164581581950188
0.03164571151137352
0.03164559602737427
0.031645480543375015
0.03164537623524666
0.031645264476537704
0.03164515644311905
0.031645048409700394
0.03164494410157204
0.03164483606815338
0.03164473548531532
0.031644634902477264
0.031644534319639206
0.03164443373680115
0.03164433315396309
0.03164423257112503
0.03164413571357727
0.03164403885602951
0

In [69]:
x = torch.tensor(X_val[:10000])
x = x.to(device='cuda')
y = torch.tensor(y_val[:10000])
y = y.to(device='cuda')

prediction = net(x.float())

preds_clean = prediction.detach().cpu().clone().numpy()

y_clean = y.detach().cpu().clone().numpy()

preds_clean = np.concatenate(preds_clean)
preds_clean = np.rint(preds_clean)

(preds_clean == y_clean).mean()

print('f1 score for vals is ', f1_score(y_clean, preds_clean, average='macro'))


f1 score for vals is  0.4923342471316885


The F1 score on the validation set is 0.4923342471316885, which is identical to the validation score of the two-hidden-layer net. So the extra layer has made no difference to the F1 score.

In [72]:
(preds_clean == 0).mean()

1.0

Bugger. The neural net is predicting 0 every time! There aren't enough open_channel values of 1 or more to knock the NN in that direction. This is really bad. I wonder if there's a way to weight the classes. First I'll train on the whole first batch to make sure.
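
One thing that may be worth ruling out before blaming the class imbalance alone: `net(x)` returns a tensor of shape `(N, 1)` while `y` has shape `(N,)`, and the truncated `F.mse_loss` warning in the training output above is consistent with a size mismatch between prediction and target. With mismatched shapes the squared error is broadcast to an `(N, N)` comparison of every prediction against every target, and that loss is minimised by predicting the mean of `y` everywhere. The sketch below shows the effect in NumPy, which follows the same broadcasting rules as PyTorch; the toy targets and the helper `mse` are my own assumptions, not code from this notebook.

```python
import numpy as np

n = 1000
y = np.zeros(n)          # targets, shape (N,)
y[:33] = 1.0             # toy imbalance: 3.3% of samples have one open channel

def mse(pred, target):
    # Plain mean squared error; broadcasting applies exactly as written.
    return np.mean((pred - target) ** 2)

perfect = y.reshape(-1, 1)               # shape (N, 1), like the net's output
constant = np.full((n, 1), y.mean())     # always predict the mean of y

# Mismatched shapes: (N, 1) against (N,) broadcasts to an (N, N) comparison.
print(mse(perfect, y))       # larger than the constant predictor's loss
print(mse(constant, y))      # equals the variance of y

# Matched shapes: drop the trailing dimension first.
print(mse(perfect[:, 0], y)) # 0.0
```

Under the mismatched loss, even perfect predictions score worse than a constant, so the net would be pushed towards a constant output whatever the class balance. The logged plateau of about 0.0316 is close to 0.0327 × 0.9673 ≈ 0.0316, the variance of a target that is 1 about 3.27% of the time, which fits this picture, although it would equally fit a net that has simply learned the mean. In PyTorch, comparing `prediction.squeeze(1)` with `y.float()` would line the shapes up.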

In [76]:
from torch.utils import data

net = Net(n_feature=11, n_hidden_1=100, n_hidden_2=50, n_hidden_3=10, n_output=1)     # define the network
print(net)  # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

plt.ion()   # turn on matplotlib's interactive mode

x = torch.tensor(X_train)
y = torch.tensor(y_train)

net.cuda()

train_dataset = data.TensorDataset(x, y)
train_loader = data.DataLoader(train_dataset, batch_size=1000, pin_memory=True)

Net(
  (hidden_1): Linear(in_features=11, out_features=100, bias=True)
  (hidden_2): Linear(in_features=100, out_features=50, bias=True)
  (hidden_3): Linear(in_features=50, out_features=10, bias=True)
  (predict): Linear(in_features=10, out_features=1, bias=True)
)


In [82]:
losses = []

for t in range(10):
    
    for x_batch, y_batch in train_loader:
        # the dataset "lives" in the CPU, so do our mini-batches
        # therefore, we need to send those mini-batches to the
        # device where the model "lives"
        x_batch = x_batch.to('cuda')
        y_batch = y_batch.to('cuda')
        
        prediction = net(x_batch.float())     # input x and predict based on x

        loss = loss_func(prediction, y_batch.float())     # must be (1. nn output, 2. target)

        optimizer.zero_grad()   # clear gradients for next train
        loss.backward()         # backpropagation, compute gradients
        optimizer.step()        # apply gradients

    if t % 5 == 0:
        losses.append(loss)
        print(loss.item())
    
    # Sanity check
    
plt.ioff()
plt.show()

0.031001614406704903
0.030988197773694992


In [83]:
losses

[tensor(0.0310, device='cuda:0', grad_fn=<MseLossBackward>),
 tensor(0.0310, device='cuda:0', grad_fn=<MseLossBackward>)]

In [85]:

x = torch.tensor(X_val)
x = x.to(device='cuda')
y = torch.tensor(y_val)
y = y.to(device='cuda')

prediction = net(x.float())

preds_clean = prediction.detach().cpu().clone().numpy()

y_clean = y.detach().cpu().clone().numpy()

preds_clean = np.concatenate(preds_clean)
preds_clean = np.rint(preds_clean)

print((preds_clean != 0).mean())

print('f1 score for vals is ', f1_score(y_clean, preds_clean, average='macro'))

0.0
f1 score for vals is  0.4920104532960055


The neural net is still predicting 0 every time. This is really no good. The more I look into this, the more I think that I need to 'weight' the predictions accordingly. Alternatively, I could use 'Method 2: Ordinal target function' to make use of the fact that the open channels target is ordinal: https://stats.stackexchange.com/questions/222073/classification-with-ordered-classes.
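
As a note for later, here is a minimal sketch of one common way to encode an ordinal target: class k becomes k ones followed by zeros, and a prediction is decoded by counting the leading outputs above a cutoff. I am assuming this cumulative encoding is roughly what the linked 'ordinal target function' method refers to; the function names `encode_ordinal` and `decode_ordinal` are hypothetical.

```python
import numpy as np

def encode_ordinal(labels, n_classes):
    """Class k becomes k ones followed by zeros, e.g. 2 -> [1, 1, 0] for 4 classes."""
    thresholds = np.arange(n_classes - 1)
    return (np.asarray(labels)[:, None] > thresholds).astype(float)

def decode_ordinal(outputs, cutoff=0.5):
    """Count the leading outputs above the cutoff to recover the class."""
    above = np.asarray(outputs) > cutoff
    # cumprod stops counting at the first output that falls below the cutoff
    return np.cumprod(above, axis=1).sum(axis=1)

labels = np.array([0, 1, 3, 2])
targets = encode_ordinal(labels, n_classes=4)
print(targets)
print(decode_ordinal(targets))  # [0 1 3 2]
```

With this encoding a network would have `n_classes - 1` outputs, and mistaking class 3 for class 0 gets three target entries wrong instead of one, so distant errors cost more.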

For now I'll dive into trying to 'weight' the classes appropriately.



In [93]:
net = Net(n_feature=11, n_hidden_1=100, n_hidden_2=50, n_hidden_3=10, n_output=1)     # define the network
net.cuda()
print(net)  # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

plt.ion()   # turn on matplotlib's interactive mode

x = torch.tensor(X_train)
y = y_train

class_sample_count = np.array(
    [len(np.where(y == t)[0]) for t in np.unique(y)])

weight = 1. / class_sample_count
samples_weight = np.array([weight[int(t)] for t in y])
samples_weight = torch.from_numpy(samples_weight)
samples_weight = samples_weight.double()
sampler = data.sampler.WeightedRandomSampler(samples_weight, len(samples_weight))

y = torch.tensor(y_train)

train_dataset = data.TensorDataset(x, y)


train_loader = data.DataLoader(train_dataset, batch_size=1000, sampler=sampler, pin_memory=True)

Net(
  (hidden_1): Linear(in_features=11, out_features=100, bias=True)
  (hidden_2): Linear(in_features=100, out_features=50, bias=True)
  (hidden_3): Linear(in_features=50, out_features=10, bias=True)
  (predict): Linear(in_features=10, out_features=1, bias=True)
)


In [96]:
losses = []

for t in range(10):
    
    for x_batch, y_batch in train_loader:
        # the dataset "lives" in the CPU, so do our mini-batches
        # therefore, we need to send those mini-batches to the
        # device where the model "lives"
        x_batch = x_batch.to('cuda')
        y_batch = y_batch.to('cuda')
        
        prediction = net(x_batch.float())     # input x and predict based on x

        loss = loss_func(prediction, y_batch.float())     # must be (1. nn output, 2. target)

        optimizer.zero_grad()   # clear gradients for next train
        loss.backward()         # backpropagation, compute gradients
        optimizer.step()        # apply gradients

    if t % 5 == 0:
        losses.append(loss)
        print(loss.item())
    
    # Sanity check
    
plt.ioff()
plt.show()

0.24984954297542572
0.250304639339447


So now, even though the classes are weighted, the losses are MUCH MUCH higher. Let's see if our f1 score improved.

In [97]:

x = torch.tensor(X_val)
x = x.to(device='cuda')
y = torch.tensor(y_val)
y = y.to(device='cuda')

prediction = net(x.float())

preds_clean = prediction.detach().cpu().clone().numpy()

y_clean = y.detach().cpu().clone().numpy()

preds_clean = np.concatenate(preds_clean)
preds_clean = np.rint(preds_clean)

print((preds_clean != 0).mean())

print('f1 score for vals is ', f1_score(y_clean, preds_clean, average='macro'))

0.9999666666666667
f1 score for vals is  0.030487190106642203


So now the model almost NEVER predicts 0 because it's so underweighted. In addition, our F1 score is almost 0. So weighting the classes in this way has actually hurt our results.
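
A possible reading of the loss of about 0.25 logged above: with inverse-frequency weights the sampler draws both classes equally often, so the targets in each mini-batch average about 0.5. If the net has collapsed to a constant output near 0.5 (an assumption, although the near-total absence of 0 predictions after rounding is consistent with outputs just above 0.5), the expected squared error is 0.25. The sketch below checks that arithmetic on toy labels; the class counts are assumed, chosen only to resemble the imbalance in this batch.

```python
import numpy as np

# Toy labels with roughly the imbalance seen in this batch (assumed counts).
y = np.concatenate([np.zeros(9685), np.ones(315)])

# Inverse-frequency weight per sample, mirroring the sampler set-up above.
class_counts = np.array([(y == 0).sum(), (y == 1).sum()])
sample_weight = (1.0 / class_counts)[y.astype(int)]
draw_prob = sample_weight / sample_weight.sum()

# Each class ends up with half of the total sampling probability.
print(draw_prob[y == 0].sum(), draw_prob[y == 1].sum())

# Expected squared error of a constant 0.5 prediction under that sampling.
expected_mse = np.sum(draw_prob * (0.5 - y) ** 2)
print(expected_mse)  # 0.25 up to floating-point rounding
```

So the higher loss does not by itself mean the weighted model is worse at fitting; it is what a constant predictor gives once the classes are balanced.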

In [102]:
(y_val==0).mean()

0.9685444444444444

0.9685444444444444 is the fraction of the validation set that has 0 channels open. This class imbalance is massively throwing the loss function off. To explore this a bit more, I'm going to go back to more basic models in sklearn and try different strategies for weighting the data.
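
As a sanity check on the F1 numbers in this notebook, here is a sketch of what constant predictors score on labels with this class balance. The helper `macro_f1` is hand-rolled for the example (it is not sklearn's implementation), and the validation size of 90000 is inferred from the split sizes, so treat the counts as an assumption.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the labels present in y_true or y_pred (0 when undefined)."""
    scores = []
    for label in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Synthetic labels with the class-0 share reported above (87169 of 90000).
y_true = np.concatenate([np.zeros(87169, dtype=int), np.ones(2831, dtype=int)])

all_zero_f1 = macro_f1(y_true, np.zeros_like(y_true))
all_one_f1 = macro_f1(y_true, np.ones_like(y_true))
print(all_zero_f1)  # about 0.492
print(all_one_f1)   # about 0.030
```

These match the two validation scores I got from the neural nets (about 0.492 and about 0.030), so a macro F1 of 0.49 here is just the score of always predicting 0, not evidence that the net has learned anything.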