
lstm value error of different shape #61

Closed

vinayakumarr opened this issue May 4, 2016 · 15 comments

Comments
@vinayakumarr

I tried to modify the imdb example for my dataset, a sample of which is given below:
3 3 373 27 9 615 9 16 10 34 0 8 0 199 65917 1319 122 402 319 183
3 3 77 12 4 66 4 3 0 5 0 14 3 50 106 139 38 164 53 109
3 3 86 6 2 6 2 0 0 1 0 25 0 4 284 77888 19 66 11 25
3 3 469 21 7 291 7 43 15 82 0 207 0 181 115646 59073 294 928 112 675
3 3 2090 21 7 4035 7 17 8 40 0 317 10 717 1033 25661 142 2054 1795 1023
3 3 691 18 6 597 6 30 16 61 0 245 18 273 719 2352305 213 1106 324 719
6 6 229 0 8 526 0 11 1 13 0 6 5 101 7246 2082 120 141 288 1570
3 3 1158 9 3 649 3 16 6 17 1 247 38 477 592 987626 82 1305 653 707
4 4 211 0 10 429 0 16 9 20 0 3 0 106 42725 27302 4280 133 477 1567

The first column is the target, which has 9 classes; there are around 1803 features.

from __future__ import print_function
import numpy as np
from sklearn.cross_validation import train_test_split
import tflearn
import pandas as pd
from tflearn.data_utils import to_categorical, pad_sequences

print("Loading")
data = pd.read_csv('Train.csv')

X = data.iloc[:,1:1805]
y = data.iloc[:,0]

X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.2,random_state=42)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

print("Preprocessing")
X_train1 = X_train.values.T.tolist()
X_test1 = X_test.values.tolist()
y_train1 = y_train.values.T.tolist()
y_test1 = y_test.values.tolist()

# Data preprocessing

# Sequence padding

trainX = pad_sequences(X_train1, maxlen=200, value=0.)
testX = pad_sequences(X_test1, maxlen=200, value=0.)

# Converting labels to binary vectors

trainY = to_categorical(y_train, nb_classes=0)
testY = to_categorical(y_test, nb_classes=0)

# Network building

net = tflearn.input_data([None, 200])
net = tflearn.embedding(net, input_dim=20000, output_dim=128)
net = tflearn.lstm(net, 128)
net = tflearn.dropout(net, 0.5)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net, optimizer='adam',loss='categorical_crossentropy')

# Training

model = tflearn.DNN(net, clip_gradients=0., tensorboard_verbose=0)
model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True, batch_size=128)
[screenshot: error traceback]

@aymericdamien
Member

You have to change last fully connected layer to:

net = tflearn.fully_connected(net, 10, activation='softmax')

In your example you seem to have 10 classes, not 2, so your softmax layer needs an output dimension of 10.

@vinayakumarr
Author

vinayakumarr commented May 5, 2016

I modified it, but it is generating the following error:
[screenshot: error traceback]

@aymericdamien
Member

In that line you have to change input_dim to your dictionary size (the total number of different ids):
net = tflearn.embedding(net, input_dim=20000, output_dim=128)

@vinayakumarr
Author

I understood that from the error itself. But what exactly is the dictionary size (total number of different ids) with respect to my dataset?

@aymericdamien
Member

aymericdamien commented May 5, 2016

I guess it should be np.max(trainX)+1
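[Editor's note, not from the thread: np.max(trainX)+1 only equals the dictionary size when the ids are contiguous and start at 0. With raw values like the ones in this dataset it can massively overestimate, which is what leads to the huge embedding below. A quick sketch comparing the two quantities, using a toy array built from values in the sample data:]

```python
import numpy as np

# Toy "trainX" with sparse, non-contiguous values, like the raw data above
trainX = np.array([[3, 373, 65917],
                   [3, 77888, 19]])

vocab_upper_bound = int(np.max(trainX)) + 1  # what the embedding layer would be sized to
distinct_ids = len(np.unique(trainX))        # how many distinct values actually occur

print(vocab_upper_bound)  # 77889
print(distinct_ids)       # 5
```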

@vinayakumarr
Author

It is generating the following error:
[screenshot: error traceback]
Segmentation fault (core dumped)

@aymericdamien
Member

What is np.max(trainX)+1 returning? Maybe your dictionary size is too large.

@vinayakumarr
Author

np.max(trainX)+1 is returning 1930563585. How do I solve this?

@aymericdamien
Member

Can you tell me what these numbers are? I thought they were ids. I think your main issue here is parsing your data.

@vinayakumarr
Author

vinayakumarr commented May 5, 2016

When I use print(np.max(trainX)+1), it returns 1930563585.

The reading and parsing code is given below:

# Reading the csv file
data = pd.read_csv('Train.csv')

X = data.iloc[:,1:1805]  # all feature columns
y = data.iloc[:,0]  # only the first column - the class label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # using scikit-learn train_test_split

# Converting into lists of values
print("Preprocessing")
X_train1 = X_train.values.T.tolist()
X_test1 = X_test.values.tolist()
y_train1 = y_train.values.T.tolist()
y_test1 = y_test.values.tolist()

# Sequence padding

trainX = pad_sequences(X_train1, maxlen=200, value=0.)
testX = pad_sequences(X_test1, maxlen=200, value=0.)

# Converting labels to binary vectors

trainY = to_categorical(y_train, nb_classes=0)
testY = to_categorical(y_test, nb_classes=0)

Then, finally, the model creation.

@vinayakumarr
Author

The above code reads and parses my dataset. Is there any problem with it?

@aymericdamien
Member

Can you please tell me what these data mean? It would make it easier to understand what you are actually trying to do. Are these integers ids (representing words or whatever), or are they real values?

@aymericdamien
Member

I see. First you can normalize your data by assigning an id (from 0 to your total number of events) to every event, then apply the embedding. But note that if your total number of events is too large, it will be very slow, so you can try to find ways to reduce your data dimension (keep only events occurring more than X times, apply a PCA transformation, etc.).
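[Editor's note: a minimal sketch of that normalization step, not from the thread. It assumes the features are categorical event values and uses np.unique with return_inverse=True, which is one common way to map arbitrary integers to contiguous ids in [0, n_distinct):]

```python
import numpy as np

# Raw feature values (sparse, non-contiguous integers, as in the dataset above)
raw = np.array([[3, 373, 65917],
                [3, 77888, 19]])

# np.unique returns the sorted distinct values and, with return_inverse=True,
# each element's index into that sorted array - i.e. a contiguous id per event.
values, ids = np.unique(raw, return_inverse=True)
ids = ids.reshape(raw.shape)  # inverse may come back flattened on older numpy

n_events = len(values)  # this is the dictionary size to use as input_dim
print(ids)              # e.g. [[0 2 3], [0 4 1]]
print(n_events)         # 5
```

With this remapping, the embedding layer's input_dim is bounded by the number of distinct events actually observed, rather than by the largest raw value.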

@vinayakumarr
Author

I have reduced my feature set now: 2000 rows and 20 24 columns. It is still showing the same error.

@aymericdamien
Member

Oh, your problem is not about the number of rows, but about your embedding layer dimensions. You have to first normalize your data and give each event an id (0 to the total number of events). What is your total number of distinct events?
