# Deep Learning and Applications: Joint Faculty Development Programme
# December 9-13, 2019

**Principal Coordinator - IIITDM Jabalpur**

**Co-Principal Coordinator - NIT Warangal**

**Participating Academies - IIITDM Jabalpur, MNIT Jaipur, NIT Patna, NIT Warangal**


## Tutorial 6 - Recurrent Neural Networks and Artificial Neural Network on Text Data




### A. RNN

**A1. Importing the libraries**

In [1]:
import pandas as pd
import keras
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
import numpy as np

Using TensorFlow backend.


**A2. Reading the data**

In [2]:
df = pd.read_csv('chennai_reviews.csv', sep=',', header=0)
df = df[['Review_Text', 'Sentiment']].copy()
df.head()

|   | Review_Text | Sentiment |
|---|---|---|
| 0 | Its really nice place to stay especially for b... | 3 |
| 1 | It seems that hotel does not check the basic a... | 1 |
| 2 | Worst hotel I have ever encountered. I will ne... | 1 |
| 3 | Had a good time in this hotel and the staff Ku... | 3 |
| 4 | good hotel and staff Veg food good non veg bre... | 3 |


**A3. Data Preprocessing**

In [3]:
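# errors='coerce' turns non-numeric entries into NaN. Because the result is assigned
# back to the column, pandas re-aligns it on the index: rows removed by dropna()
# come back as NaN and the column is stored as float (hence 3.0, 2.0, 1.0 below).
df['Sentiment'] = pd.to_numeric(df.Sentiment, errors= 'coerce').dropna().astype(int)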

In [4]:
df['Sentiment'].value_counts()

3.0    3391
2.0     827
1.0     485
Name: Sentiment, dtype: int64

In [5]:
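# Ratings above 2 become 1 (positive); everything else becomes 0 (negative).
# NaN > 2 is False, so rows with a missing rating are also labelled 0.
df['Sentiment'] = [1 if x > 2 else 0 for x in df.Sentiment]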

In [6]:
df['Sentiment'].value_counts()

1    3391
0    1377
Name: Sentiment, dtype: int64
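
The counts above suggest that some rows have no usable rating: `In [4]` reports 3391 + 827 + 485 = 4703 ratings, while the binarized column holds 3391 + 1377 = 4768 labels. A comparison with NaN is always False, so the rule `1 if x > 2 else 0` sends any missing rating to class 0. The following is a minimal standard-library sketch of that behaviour on made-up ratings. The list `ratings` and the helper `binarize` are illustrative names, not part of the notebook, and the stricter variant is only one possible way to handle missing values.

```python
import math

def binarize(rating):
    """Same rule as the notebook: ratings above 2 are positive (1), the rest 0."""
    return 1 if rating > 2 else 0

# Hypothetical ratings; math.nan stands in for an entry that could not be parsed.
ratings = [3.0, 1.0, math.nan, 2.0, 3.0]

# NaN > 2 evaluates to False, so the missing rating is silently labelled 0.
labels_with_nan = [binarize(r) for r in ratings]

# Stricter alternative (an assumption about what may be wanted): drop NaN first.
clean_ratings = [r for r in ratings if not math.isnan(r)]
labels_clean = [binarize(r) for r in clean_ratings]

print(labels_with_nan)  # [1, 0, 0, 0, 1]
print(labels_clean)     # [1, 0, 0, 1]
```

In pandas, a comparable effect could be obtained by dropping the rows whose coerced rating is NaN, for example with `df.dropna(subset=['Sentiment'])`, before binarizing. Doing so would change the sample counts shown in the rest of this tutorial.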

In [7]:
data, labels = (df['Review_Text'].astype(str).values, df['Sentiment'].values)

In [8]:
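# Build the vocabulary from the reviews; lower=True lowercases the text first.
tokenizer = Tokenizer(lower= True)
tokenizer.fit_on_texts(data)

# Map each review to a list of word indices, then pad with zeros at the end
# (padding='post') so that every sequence has exactly 100 entries. Reviews longer
# than 100 tokens are truncated from the front, which is the pad_sequences default.
data_sequence = tokenizer.texts_to_sequences(data)
data_padded = pad_sequences(data_sequence, maxlen= 100, padding='post')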
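
To make the tokenizing and padding step more concrete, here is a simplified pure-Python imitation of what it does. It is a sketch only: the helpers `fit_word_index`, `texts_to_sequences` and `pad_post` are hypothetical names, the two reviews are toy data, and the real Keras `Tokenizer` additionally strips punctuation and offers more options.

```python
from collections import Counter

def fit_word_index(texts):
    """Rank words by frequency; index 0 is left free for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() orders by descending count; ties keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index):
    """Replace each known word by its index; unknown words are skipped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]

def pad_post(sequences, maxlen):
    """Pad with zeros at the end; keep the last maxlen tokens of long sequences."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # truncate from the front, like truncating='pre'
        padded.append(seq + [0] * (maxlen - len(seq)))
    return padded

reviews = ["Good hotel good staff", "Worst hotel"]  # toy data, not from the CSV
word_index = fit_word_index(reviews)
sequences = texts_to_sequences(reviews, word_index)
padded = pad_post(sequences, maxlen=3)

print(word_index)  # {'good': 1, 'hotel': 2, 'staff': 3, 'worst': 4}
print(padded)      # [[2, 1, 3], [4, 2, 0]]
```

The first review has four tokens, so with `maxlen=3` its first token is dropped; the second review has two tokens and receives one trailing zero.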

**A4. Splitting the data into train and test sets**

In [9]:
data_train, data_test, labels_train, labels_test = train_test_split(data_padded, labels, test_size= 0.15, random_state= 1)

In [10]:
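batch_size = 64

# Hold out the first 2 * batch_size = 128 training samples for validation;
# the remaining samples are used to fit the model.
data_train_split = data_train[2*batch_size:]
labels_train_split = labels_train[2*batch_size:]

data_validation_split = data_train[:2*batch_size]
labels_validation_split = labels_train[:2*batch_size]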
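
The sizes of the resulting splits can be checked with a little arithmetic. The sketch below assumes that all 3391 + 1377 = 4768 labelled rows reach `train_test_split`, and that the test share is rounded up with the remainder going to training; the variable names are illustrative.

```python
import math

n_samples = 3391 + 1377   # total labels reported by value_counts
test_size = 0.15
batch_size = 64

# The test share is rounded up; the training set receives the remaining rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# The first 2 * batch_size training rows are held out for validation.
n_validation = 2 * batch_size
n_fit = n_train - n_validation

print(n_test, n_train, n_validation, n_fit)  # 716 4052 128 3924
```

The last two numbers agree with the "Train on 3924 samples, validate on 128 samples" message printed during training.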


**A5. Model Building**

In [15]:
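# +1 because index 0 is reserved for padding and never assigned to a word
vocab_size = len(tokenizer.word_counts.keys())+1
num_words = 100       # sequence length; matches maxlen in pad_sequences
embedding_len = 32    # size of each word vector

model = keras.models.Sequential([
    keras.layers.Embedding(vocab_size, embedding_len, input_length= num_words),
    keras.layers.GRU(64),
    keras.layers.Dense(1, activation= 'sigmoid')
])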
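
As a rough guide to the size of this model, the trainable parameters of each layer can be counted by hand. The sketch below uses hypothetical helper functions and a made-up vocabulary size, since the real `vocab_size` depends on the tokenizer. The GRU count depends on the `reset_after` setting, whose default differs between Keras versions, so both variants are shown; `model.summary()` reports the actual numbers.

```python
def embedding_params(vocab_size, embedding_len):
    """One vector of length embedding_len per vocabulary index."""
    return vocab_size * embedding_len

def gru_params(input_dim, units, reset_after):
    """Three blocks (update gate, reset gate, candidate state), each with
    input weights, recurrent weights and a bias; reset_after=True uses a
    second bias vector per block."""
    biases = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + biases)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units

embedding_len, gru_units = 32, 64
hypothetical_vocab_size = 10_000  # assumption; the real value comes from the tokenizer

print(embedding_params(hypothetical_vocab_size, embedding_len))  # 320000
print(gru_params(embedding_len, gru_units, reset_after=False))   # 18624
print(gru_params(embedding_len, gru_units, reset_after=True))    # 18816
print(dense_params(gru_units, 1))                                # 65
```

Under these assumptions the embedding table holds most of the parameters, while the recurrent and output layers are comparatively small.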

**A6. Model Compiling**

In [16]:
model.compile(
    optimizer= 'sgd',
    loss= 'binary_crossentropy',
    metrics= ['accuracy']
)
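
The `binary_crossentropy` loss used here averages the negative log-likelihood of the true labels under the predicted probabilities. A minimal pure-Python sketch of the formula follows; the function and the toy predictions are illustrative, and the clipping constant `eps` is an assumption added for numerical safety rather than a statement about the Keras internals.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 1]                  # toy labels
confident = [0.9, 0.1, 0.8, 0.7]       # predictions that lean the right way
uninformed = [0.5, 0.5, 0.5, 0.5]      # predictions that carry no information

loss_confident = binary_crossentropy(y_true, confident)
loss_uninformed = binary_crossentropy(y_true, uninformed)

print(round(loss_uninformed, 4))  # 0.6931, i.e. ln(2)
print(loss_confident < loss_uninformed)  # True
```

A loss near ln(2) ≈ 0.693 therefore corresponds to predictions that are no better than 0.5 for every sample, which is a useful reference point when reading the training output.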





**A7. Training the model**

In [17]:
model.fit(
    data_train_split,
    labels_train_split,
    batch_size= batch_size,
    epochs= 2,
    verbose= 1,
    validation_data= (data_validation_split, labels_validation_split)
)


Train on 3924 samples, validate on 128 samples
Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x26526f87788>

**A8. Model Testing**

In [18]:
# evaluate returns [loss, accuracy], following the loss and metrics passed to compile
scores = model.evaluate(data_test, labels_test, verbose= 0)
scores

[0.6250772895759711, 0.6885474863665064]
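
To put these scores in context, it helps to compare them with a constant predictor that ignores the review text. The sketch below computes that baseline from the class counts of the full dataset (3391 positive, 1377 negative). The class balance of the test set itself is not shown in the notebook, so this is only an approximate reference, and the variable names are illustrative.

```python
import math

n_pos, n_neg = 3391, 1377          # class counts from value_counts above
p_pos = n_pos / (n_pos + n_neg)    # share of positive reviews

# Accuracy of always predicting the majority class.
baseline_accuracy = max(p_pos, 1 - p_pos)

# Cross-entropy of always predicting the class prior p_pos.
baseline_loss = -(p_pos * math.log(p_pos) + (1 - p_pos) * math.log(1 - p_pos))

print(round(baseline_accuracy, 3), round(baseline_loss, 3))
```

Both values come out close to the reported test accuracy and loss, so after two epochs of SGD the model may not yet have learned much beyond the class prior. Training for more epochs or trying a different optimizer would be natural things to explore.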

### B. ANN

In [19]:
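from sklearn.utils import compute_class_weight
# Keyword arguments are required by recent scikit-learn releases.
classWeight = compute_class_weight(class_weight='balanced', classes=np.unique(labels_train), y=labels_train)
# Map class index (0, 1) to its weight, the format expected by Keras
classWeight = dict(enumerate(classWeight))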
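
The `'balanced'` setting weights each class by `n_samples / (n_classes * count)`, so the rarer class receives the larger weight. The following pure-Python sketch reproduces that formula on toy labels; the helper `balanced_class_weights` is a hypothetical name, not a scikit-learn function.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight of each class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}

toy_labels = [1] * 7 + [0] * 3     # 7 positive and 3 negative toy samples
weights = balanced_class_weights(toy_labels)

print(weights)  # class 0 gets 10 / (2 * 3), class 1 gets 10 / (2 * 7)
```

With these weights, each class contributes the same total weight to the loss (3 × 10/6 = 7 × 10/14 = 5), which is what makes the setting "balanced".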

In [20]:
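# With the Embedding layer commented out, the Dense layers receive the padded
# word indices directly as 100 numeric features.
model1 = keras.models.Sequential([
    # keras.layers.Embedding(vocab_size, embedding_len, input_length= num_words),
    keras.layers.Dense(256),
    keras.layers.Dense(1, activation= 'sigmoid')
])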

model1.compile(
    optimizer= 'sgd',
    loss= 'binary_crossentropy',
    metrics= ['accuracy']
)

model1.fit(
    data_train_split,
    labels_train_split,
    batch_size= batch_size,
    epochs= 2,
    verbose= 1,
    class_weight= classWeight  # apply the balanced class weights computed in In [19]
)


Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x26597cfc988>