# Transfer Learning MNIST

* Train a simple convnet on the first 5 digits [0..4] of the MNIST dataset.
* Freeze convolutional layers and fine-tune dense layers for the classification of digits [5..9].

## 1. Import necessary libraries for the model

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import pandas as pd
import numpy as np
import scipy as sp
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from textblob import TextBlob, Word
from nltk.stem.snowball import SnowballStemmer
%matplotlib inline

## 2. Import MNIST data and create 2 datasets with one dataset having digits from 0 to 4 and other from 5 to 9 

In [105]:
import tensorflow as tf
(trainX, trainY), (testX, testY) = tf.keras.datasets.mnist.load_data()

In [106]:
aTrainX=trainX[trainY<5]

In [107]:
bTrainX=trainX[trainY>=5]

In [108]:
aTestX=testX[testY<5]

In [109]:
bTestX=testX[testY>=5]

In [110]:
print('Data set 1 shape',aTrainX.shape)
print('Data set 2 shape',bTrainX.shape)
print('Data set 3 shape',aTestX.shape)
print('Data set 4 shape',bTestX.shape)

Data set 1 shape (30596, 28, 28)
Data set 2 shape (29404, 28, 28)
Data set 3 shape (5139, 28, 28)
Data set 4 shape (4861, 28, 28)
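The cells above split only the images. A minimal sketch of splitting images and labels together with one boolean mask follows, so both halves stay aligned. The helper name `split_by_digit` and the tiny synthetic arrays are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

def split_by_digit(images, labels, threshold=5):
    """Return ((low_x, low_y), (high_x, high_y)), split at `threshold`."""
    low = labels < threshold
    return (images[low], labels[low]), (images[~low], labels[~low])

# Tiny synthetic stand-in for MNIST: 6 "images" of shape 2x2.
demo_x = np.arange(24).reshape(6, 2, 2)
demo_y = np.array([5, 0, 4, 9, 1, 7])

(a_x, a_y), (b_x, b_y) = split_by_digit(demo_x, demo_y)
print(a_y, b_y)
```

Applied to the notebook's data, the same call would take `trainX, trainY` (and `testX, testY`) and keep the original label arrays untouched for the later 5 to 9 step.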


## 3. Print x_train, y_train, x_test and y_test for both the datasets

In [111]:
print('Data set 1 ',aTrainX)
print('Data set 2 ',bTrainX)
print('Data set 3 ',aTestX)
print('Data set 4 ',bTestX)

Data set 1  [[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 ...

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]]
Data set 2  [[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0

## 4. Let us take only the dataset (x_train, y_train, x_test, y_test) for integers 0 to 4 in MNIST
## Reshape x_train and x_test to a 4-dimensional array (channel = 1) to pass it into a Conv2D layer

In [112]:
aTrainX = aTrainX.reshape(aTrainX.shape[0], 28, 28,1)
aTestX = aTestX.reshape(aTestX.shape[0], 28, 28,1)
aTrainX = aTrainX.astype('float32')
aTestX = aTestX.astype('float32')

## 5. Normalize x_train and x_test by dividing it by 255

In [113]:
aTrainX /= 255
aTestX /= 255
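The reshape and scaling steps can be combined into one small helper. This is a sketch under the assumption that the input is a uint8 array of 28x28 images; `to_conv_input` and `demo_images` are hypothetical names.

```python
import numpy as np

def to_conv_input(images):
    """Add a trailing channel axis and scale uint8 pixels to [0, 1]."""
    images = images.reshape(images.shape[0], 28, 28, 1)
    return images.astype('float32') / 255

# Placeholder data: two all-white 28x28 images.
demo_images = np.full((2, 28, 28), 255, dtype=np.uint8)
demo_input = to_conv_input(demo_images)
print(demo_input.shape, demo_input.dtype, demo_input.max())
```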

## 6. Use one-hot encoding to convert y_train and y_test into the required number of output classes

In [114]:
trainY

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [117]:
trainY=trainY[trainY<5]

In [119]:
testY=testY[testY<5]

In [120]:
trainYEncoded = tf.keras.utils.to_categorical(trainY)
testYEncoded = tf.keras.utils.to_categorical(testY)
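For intuition, one-hot encoding can be written in a few lines of NumPy. This is an illustrative sketch of the idea, not the Keras implementation; `one_hot` is a hypothetical helper.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i is all zeros except for a 1 at column labels[i]."""
    labels = np.asarray(labels)
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([0, 4, 1], num_classes=5)
print(encoded)
```

Each row sums to 1, and `np.argmax(encoded, axis=1)` recovers the original labels.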

In [121]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Reshape
from tensorflow.keras.layers import Convolution2D, MaxPooling2D

## 7. Build a sequential model with 2 Convolutional layers with 32 kernels of size (3,3) followed by a Max pooling layer of size (2,2) followed by a drop out layer to be trained for classification of digits 0-4  

In [122]:
BATCH_SIZE = 32
EPOCHS = 10

In [123]:
print('Data set 1 shape',aTrainX.shape)
print('Data set 2 shape',trainYEncoded.shape)
print('Data set 3 shape',aTestX.shape)
print('Data set 4 shape',testYEncoded.shape)

Data set 1 shape (30596, 28, 28, 1)
Data set 2 shape (30596, 5)
Data set 3 shape (5139, 28, 28, 1)
Data set 4 shape (5139, 5)


In [187]:

# Define the model
model3 = Sequential()

# 1st conv layer: 32 kernels of size (3, 3).
# Pass the kernel size as a tuple; in tf.keras, Convolution2D(32, 3, 3)
# is read as kernel_size=3, strides=3.
model3.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))
model3.add(Activation('relu'))

# 2nd conv layer
model3.add(Convolution2D(32, (3, 3)))
model3.add(Activation('relu'))

# Max pooling
model3.add(MaxPooling2D(pool_size=(2, 2)))

# Dropout
model3.add(Dropout(0.25))

# Fully connected layer
model3.add(Flatten())
model3.add(Dense(128))
model3.add(Activation('relu'))

# More dropout
model3.add(Dropout(0.5))

# Prediction layer: one output per class (digits 0-4)
model3.add(Dense(5))
model3.add(Activation('softmax'))

# Loss and optimizer
model3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Early stopping on validation accuracy.
# The metric key is 'val_acc' in TF 1.x; TF 2.x names it 'val_accuracy'.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_acc', patience=7, verbose=1, mode='auto')
callback_list = [early_stopping]

# Train the model
model3.fit(aTrainX, trainYEncoded, batch_size=BATCH_SIZE, epochs=EPOCHS,
           validation_data=(aTestX, testYEncoded), callbacks=callback_list)
    


Train on 30596 samples, validate on 5139 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1f9ad083550>

## 8. After that, flatten the data and add 2 Dense layers: one with 128 neurons and activation = 'relu', and one with neurons = output classes and activation = 'softmax'. Add a dropout layer in between if necessary
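As a worked check of what the Flatten layer hands to the first Dense layer, the spatial sizes can be traced by hand. The sketch assumes the architecture described in step 7: two 3x3 convolutions with stride 1 and no padding, 32 kernels each, then 2x2 max pooling. `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 28
size = conv_output_size(size, kernel=3)            # after 1st conv: 26
size = conv_output_size(size, kernel=3)            # after 2nd conv: 24
size = conv_output_size(size, kernel=2, stride=2)  # after 2x2 max pooling: 12

flattened = size * size * 32             # 32 feature maps of size 12x12
dense_params = flattened * 128 + 128     # weights plus biases of Dense(128)
print(size, flattened, dense_params)
```

Under these assumptions, almost all trainable parameters sit in the first Dense layer, which is the part that stays trainable during transfer learning.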

## 9. Print the training and test accuracy
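One way to obtain these numbers is `model.evaluate`, which returns the loss followed by each compiled metric. The sketch below uses a tiny placeholder model and random data so it runs on its own; `demo_model`, `x_demo`, and `y_demo` are assumptions. In the notebook, the analogous calls would be on `model3` with `(aTrainX, trainYEncoded)` and `(aTestX, testYEncoded)`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Random placeholder data standing in for the 0-4 digit images and labels.
rng = np.random.default_rng(0)
x_demo = rng.random((32, 28, 28, 1)).astype('float32')
y_demo = tf.keras.utils.to_categorical(rng.integers(0, 5, size=32), num_classes=5)

demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(5, activation='softmax'),
])
demo_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
demo_model.fit(x_demo, y_demo, epochs=1, batch_size=8, verbose=0)

# evaluate() returns [loss, accuracy] because one metric was passed to compile().
demo_loss, demo_acc = demo_model.evaluate(x_demo, y_demo, verbose=0)
print('Accuracy:', demo_acc)
```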

## 10. Make only the dense layers to be trainable and convolutional layers to be non-trainable
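A minimal sketch of freezing, assuming a `tf.keras` Sequential model: set `trainable = False` on each convolutional layer, then compile so the setting takes effect. The small model built by `build_small_convnet` is a hypothetical stand-in for `model3`, not the notebook's network.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_small_convnet(num_classes=5):
    """Small stand-in network: one conv block followed by two dense layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(8, (3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(16, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])

model = build_small_convnet()

# Freeze every convolutional layer; the dense layers stay trainable.
for layer in model.layers:
    if isinstance(layer, layers.Conv2D):
        layer.trainable = False

# Compile after changing `trainable` so the change is picked up by training.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

for layer in model.layers:
    print(layer.name, layer.trainable)
```

After this, a call to `fit` updates only the dense layers, while the convolutional filters keep the values learned earlier.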

## 11. Use the model trained on 0 to 4 digit classification and train it on the dataset which has digits 5 to 9  (Using Transfer learning keeping only the dense layers to be trainable)
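Because the existing network has 5 output units, the labels 5 to 9 need to be shifted to 0 to 4 before one-hot encoding. The sketch below prepares such a dataset with NumPy only; `prepare_high_digits` and the random placeholder images are assumptions for illustration.

```python
import numpy as np

def prepare_high_digits(images, labels):
    """Keep digits 5-9, scale pixels to [0, 1], and one-hot encode labels as 0-4."""
    keep = labels >= 5
    x = images[keep].reshape(-1, 28, 28, 1).astype('float32') / 255
    y = np.eye(5, dtype='float32')[labels[keep] - 5]  # 5->0, 6->1, ..., 9->4
    return x, y

# Random placeholder images with a handful of labels.
demo_images = np.random.default_rng(0).integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
demo_labels = np.array([5, 0, 4, 9, 1, 7], dtype=np.uint8)

b_x, b_y = prepare_high_digits(demo_images, demo_labels)
print(b_x.shape, b_y.shape)
```

The resulting arrays can then be passed to `fit` on the model whose convolutional layers were frozen in the previous step.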

## 12. Print the accuracy for classification of digits 5 to 9
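If the labels were shifted to 0 to 4 for training, predictions have to be shifted back before comparing them with the true digits. A small sketch with hand-written softmax outputs follows; `accuracy_from_probs` and the demo arrays are hypothetical.

```python
import numpy as np

def accuracy_from_probs(probs, true_digits, offset=5):
    """Map each row's argmax back to a digit (index + offset) and score it."""
    predicted_digits = np.argmax(probs, axis=1) + offset
    return float(np.mean(predicted_digits == true_digits))

demo_probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],  # argmax 0 -> digit 5
    [0.10, 0.10, 0.10, 0.10, 0.60],  # argmax 4 -> digit 9
    [0.20, 0.50, 0.10, 0.10, 0.10],  # argmax 1 -> digit 6
])
demo_truth = np.array([5, 9, 7])
demo_accuracy = accuracy_from_probs(demo_probs, demo_truth)
print(demo_accuracy)
```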

## Sentiment analysis

The objective of the second problem is to perform sentiment analysis on tweets that users posted about various mobile devices.
Based on the text of a tweet, we classify whether the user's sentiment toward a particular mobile device is positive or not.

### 13. Read the dataset (tweets.csv) and drop the NA's while reading the dataset

In [142]:
# read tweets.csv into a DataFrame
import pandas as pd
tweets = pd.read_csv('tweets.csv',encoding = "ISO-8859-1")

In [144]:
tweets = tweets.dropna(axis = 0, how ='any') 

In [146]:
tweets.count

<bound method DataFrame.count of                                              tweet_text  \
0     .@wesley83 I have a 3G iPhone. After 3 hrs twe...   
1     @jessedee Know about @fludapp ? Awesome iPad/i...   
2     @swonderlin Can not wait for #iPad 2 also. The...   
3     @sxsw I hope this year's festival isn't as cra...   
4     @sxtxstate great stuff on Fri #SXSW: Marissa M...   
7     #SXSW is just starting, #CTIA is around the co...   
8     Beautifully smart and simple idea RT @madebyma...   
9     Counting down the days to #sxsw plus strong Ca...   
10    Excited to meet the @samsungmobileus at #sxsw ...   
11    Find &amp; Start Impromptu Parties at #SXSW Wi...   
12    Foursquare ups the game, just in time for #SXS...   
13    Gotta love this #SXSW Google Calendar featurin...   
14    Great #sxsw ipad app from @madebymany: http://...   
15    haha, awesomely rad iPad app by @madebymany ht...   
17    I just noticed DST is coming this weekend. How...   
18    Just added my #SX

### 14. Preprocess the text and add the preprocessed text in a column with name `text` in the dataframe.

In [147]:
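def preprocess(text):
    # Keep the tweet only if it is pure ASCII; otherwise return an empty string.
    # On Python 3 a str has no .decode(), so text.decode('ascii') always raises
    # and every row ends up as ""; encoding is the Python 3 way to test for ASCII.
    try:
        text.encode('ascii')
        return text
    except UnicodeEncodeError:
        return ""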

In [148]:
tweets['text'] = [preprocess(text) for text in tweets.tweet_text]

In [149]:
tweets.head(10)

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product | text |
|---|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion | |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion | |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion | |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion | |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion | |
| 7 | #SXSW is just starting, #CTIA is around the co... | Android | Positive emotion | |
| 8 | Beautifully smart and simple idea RT @madebyma... | iPad or iPhone App | Positive emotion | |
| 9 | Counting down the days to #sxsw plus strong Ca... | Apple | Positive emotion | |
| 10 | Excited to meet the @samsungmobileus at #sxsw ... | Android | Positive emotion | |
| 11 | Find &amp; Start Impromptu Parties at #SXSW Wi... | Android App | Positive emotion | |


### 15. Consider only rows having Positive emotion and Negative emotion and remove other rows from the dataframe.

In [151]:
tweets.dtypes

tweet_text                                            object
emotion_in_tweet_is_directed_at                       object
is_there_an_emotion_directed_at_a_brand_or_product    object
text                                                  object
dtype: object

In [153]:
tweets.is_there_an_emotion_directed_at_a_brand_or_product.unique()

array(['Negative emotion', 'Positive emotion',
       'No emotion toward brand or product', "I can't tell"], dtype=object)

In [155]:
tweets.shape

(3291, 4)

In [199]:
tweet_data=tweets

In [200]:
tweet_data=tweet_data[(tweet_data.is_there_an_emotion_directed_at_a_brand_or_product == 'Positive emotion') | (tweet_data.is_there_an_emotion_directed_at_a_brand_or_product == 'Negative emotion')]

In [201]:
tweet_data.shape

(3191, 4)

In [202]:
tweet_data.head()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product | text |
|---|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion | |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion | |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion | |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion | |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion | |
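The same row filter can be written more compactly with `Series.isin`. The sketch below runs on a four-row placeholder frame; `demo` and `LABEL_COL` are illustrative names, and the label strings are the ones returned by `unique()` above.

```python
import pandas as pd

LABEL_COL = 'is_there_an_emotion_directed_at_a_brand_or_product'

# Placeholder frame with one row per label value seen in the dataset.
demo = pd.DataFrame({
    'tweet_text': ['a', 'b', 'c', 'd'],
    LABEL_COL: ['Positive emotion', 'No emotion toward brand or product',
                'Negative emotion', "I can't tell"],
})

# Keep only rows whose label is one of the two sentiment classes.
keep = demo[LABEL_COL].isin(['Positive emotion', 'Negative emotion'])
filtered = demo[keep]
print(filtered.shape)
```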


### 16. Represent text as numerical data using `CountVectorizer` and get the document term frequency matrix

#### Use `vect` as the variable name for initialising CountVectorizer.

In [203]:
# use CountVectorizer to create document-term matrices 
vect = CountVectorizer()
tweet_data_dtm = vect.fit_transform(tweet_data.tweet_text)

In [204]:
df = pd.DataFrame(tweet_data_dtm.toarray(), columns=vect.get_feature_names())
print(df)

      000  02  03  08  10  100  100s  100tc  101  106  ...    ûïmute  \
0       0   0   0   0   0    0     0      0    0    0  ...         0   
1       0   0   0   0   0    0     0      0    0    0  ...         0   
2       0   0   0   0   0    0     0      0    0    0  ...         0   
3       0   0   0   0   0    0     0      0    0    0  ...         0   
4       0   0   0   0   0    0     0      0    0    0  ...         0   
5       0   0   0   0   0    0     0      0    0    0  ...         0   
6       0   0   0   0   0    0     0      0    0    0  ...         0   
7       0   0   0   0   0    0     0      0    0    0  ...         0   
8       0   0   0   0   0    0     0      0    0    0  ...         0   
9       0   0   0   0   0    0     0      0    0    0  ...         0   
10      0   0   0   0   0    0     0      0    0    0  ...         0   
11      0   0   0   0   0    0     0      0    0    0  ...         0   
12      0   0   0   0   0    0     0      0    0    0  ...      

In [205]:
tweet_data_dtm.shape

(3191, 5648)

In [206]:
# last 50 features
print (vect.get_feature_names()[-50:])

['zazzle', 'zazzlesxsw', 'zazzlsxsw', 'ze', 'zelda', 'zeldman', 'zero', 'zimride', 'zing', 'zip', 'zite', 'zms', 'zombies', 'zomg', 'zone', 'zoom', 'zzzs', '¼¼', 'á¾_î¾ð', 'äá', 'å_', 'åç', 'åçwhat', 'çü', 'èï', 'ðü', 'öý', 'ù_¾', 'û_', 'ûª', 'ûªll', 'ûªm', 'ûªs', 'ûªt', 'ûï', 'ûï35', 'ûïbuttons', 'ûïfoursquare', 'ûïline', 'ûïmore', 'ûïmute', 'ûïspecials', 'ûïthe', 'ûïview', 'ûò', 'ûòand', 'ûó', 'ûójust', 'ûólewis', 'ûóthe']


### 17. Find number of different words in vocabulary

In [207]:
unique_elements, counts_elements = np.unique(vect.get_feature_names(), return_counts=True)

#### Tip: To see all available attributes and methods of an object, use `dir`

In [208]:
print(len(counts_elements))

5648
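A shorter route to the vocabulary size is the fitted vectorizer's `vocabulary_` dictionary, whose length equals the number of columns in the document-term matrix. A sketch on a three-tweet placeholder corpus (`demo_tweets` and `demo_vect` are assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer

demo_tweets = ['great ipad app', 'ipad battery died', 'great great launch']

demo_vect = CountVectorizer()
demo_dtm = demo_vect.fit_transform(demo_tweets)  # sparse document-term matrix

print(demo_dtm.shape)               # (documents, distinct terms)
print(len(demo_vect.vocabulary_))   # number of distinct terms
print(sorted(demo_vect.vocabulary_))
```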


### 18. Find out how many Positive and Negative emotions are there.

Hint: Use value_counts on that column

In [214]:
print(tweet_data['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts())

Positive emotion    2672
Negative emotion     519
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64
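The counts show a strong class imbalance, which matters when reading accuracy scores later. As a rough reference point, the snippet below computes the accuracy of always predicting the majority class on the full filtered dataset (not on the later test split); it comes out at roughly 84%.

```python
# Class counts taken from the value_counts() output above.
positive, negative = 2672, 519

total = positive + negative
majority_baseline = max(positive, negative) / total
print(total, round(majority_baseline, 4))
```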


### 19. Change the labels for Positive and Negative emotions as 1 and 0 respectively and store in a different column in the same dataframe named 'Label'

Hint: use map on that column and give labels

In [218]:
tweet_data_encode = tweet_data.copy()

from sklearn.preprocessing import LabelEncoder

lb_make = LabelEncoder()
tweet_data_encode['label'] = lb_make.fit_transform(tweet_data['is_there_an_emotion_directed_at_a_brand_or_product'])

tweet_data_encode.head() #Results in appending a new column to df

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product | text | label |
|---|---|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion | | 0 |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion | | 1 |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion | | 1 |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion | | 0 |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion | | 1 |
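The hint suggests `map`; a sketch of that variant on a three-row placeholder frame is shown here. It makes the 1/0 assignment explicit instead of relying on the alphabetical ordering that `LabelEncoder` uses. `demo` and `LABEL_COL` are illustrative names.

```python
import pandas as pd

LABEL_COL = 'is_there_an_emotion_directed_at_a_brand_or_product'

demo = pd.DataFrame({
    LABEL_COL: ['Negative emotion', 'Positive emotion', 'Positive emotion'],
})

# Explicit mapping: Positive -> 1, Negative -> 0.
demo['Label'] = demo[LABEL_COL].map({'Positive emotion': 1, 'Negative emotion': 0})
print(demo['Label'].tolist())
```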


### 20. Define the feature set (independent variable or X) to be `text` column and `labels` as target (or dependent variable)  and divide into train and test datasets

In [220]:
X = tweet_data_encode['tweet_text']
Y= tweet_data_encode['label']

In [221]:
# split the new DataFrame into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=1)

In [233]:
X_train.head(5)

8347    tried installing @mention on my iphone but it ...
2381    #iPad2 rocks #SXSW (@mention Apple POP UP Stor...
8703    What's your take on iPad? @mention I really wa...
4152    : Aron Pilhofer from The New York Times just e...
3368    &lt;---- Guess who won an iPad at the #unsix t...
Name: tweet_text, dtype: object

In [234]:
X_test.head(5)

8135    Apple #sxsw pop-store has iPads again. 16gb wi...
367     ÛÏ@mention Best thing I've heard this weekend...
4721    Anybody know whether I can nab white, 3G, 64GB...
7072    Apple to open iPad2 popup shop @mention core o...
7047    So many Google products. isn't it time to  tra...
Name: tweet_text, dtype: object

In [237]:
y_train.head(5)

8347    0
2381    1
8703    1
4152    1
3368    1
Name: label, dtype: int32

In [238]:
y_test.head(5)

8135    1
367     0
4721    1
7072    1
7047    1
Name: label, dtype: int32

## 21. **Predicting the sentiment:**


### Use Naive Bayes and Logistic Regression to predict the sentiment of the given text, and print their accuracy scores

In [242]:
# use default options for CountVectorizer
#vect = CountVectorizer()
#vect = CountVectorizer(ngram_range=(1, 2))

# create document-term matrices
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)

# use Naive Bayes to predict the sentiment label
nb = MultinomialNB()
nb.fit(X_train_dtm, y_train)
y_pred_class = nb.predict(X_test_dtm)

# calculate accuracy
print (metrics.accuracy_score(y_test, y_pred_class))

0.8558897243107769


In [243]:
logreg = LogisticRegression(C=1e9)
logreg.fit(X_train_dtm, y_train)
y_pred_class = logreg.predict(X_test_dtm)
print (metrics.accuracy_score(y_test, y_pred_class))

0.8583959899749374


## 22. Create a function called `tokenize_test` which takes a CountVectorizer object as input and prints the accuracy for x (text) and y (labels)

In [247]:
def tokenize_test(vect):
    X_train_dtm = vect.fit_transform(X_train)
    print('Features: ', X_train_dtm.shape[1])
    X_test_dtm = vect.transform(X_test)
    nb = MultinomialNB()
    nb.fit(X_train_dtm, y_train)
    y_pred_class = nb.predict(X_test_dtm)
    print('Accuracy: ', metrics.accuracy_score(y_test, y_pred_class))
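A variant that takes the data as arguments and returns its results, rather than reading global variables and printing, is easier to reuse and compare across vectorizer settings. The sketch below is one possible shape; `tokenize_score` and the tiny placeholder corpus are assumptions, not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

def tokenize_score(vect, train_text, test_text, train_labels, test_labels):
    """Fit `vect` and a MultinomialNB on the training text; return (n_features, accuracy)."""
    train_dtm = vect.fit_transform(train_text)
    test_dtm = vect.transform(test_text)   # reuse the training vocabulary
    nb = MultinomialNB()
    nb.fit(train_dtm, train_labels)
    accuracy = metrics.accuracy_score(test_labels, nb.predict(test_dtm))
    return train_dtm.shape[1], accuracy

# Placeholder corpus: 1 = positive, 0 = negative.
demo_train = ['love this ipad', 'great app', 'battery died again', 'awful crash']
demo_train_y = [1, 1, 0, 0]
demo_test = ['great ipad', 'awful battery']
demo_test_y = [1, 0]

n_features, acc = tokenize_score(CountVectorizer(), demo_train, demo_test,
                                 demo_train_y, demo_test_y)
print(n_features, acc)
```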

### Create a CountVectorizer with `ngram_range=(1, 2)` and pass it to the `tokenize_test` function to print the accuracy score

In [253]:
vect = CountVectorizer(ngram_range=(1, 2))
tokenize_test(vect)

Features:  24855
Accuracy:  0.8558897243107769
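The jump in feature count comes from the added bigrams. A small sketch on a single placeholder sentence shows which terms `ngram_range=(1, 2)` produces; `demo_vect` is an illustrative name.

```python
from sklearn.feature_extraction.text import CountVectorizer

demo_vect = CountVectorizer(ngram_range=(1, 2))
demo_vect.fit(['not good at all'])

# Unigrams plus every pair of adjacent tokens.
demo_terms = sorted(demo_vect.vocabulary_)
print(demo_terms)
```

Bigrams such as "not good" let the model see a negation together with the word it modifies, at the cost of a much larger and sparser matrix.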


### Create a CountVectorizer with `stop_words='english'` and pass it to the `tokenize_test` function to print the accuracy score

In [254]:
vect = CountVectorizer(stop_words='english')
tokenize_test(vect)

Features:  4681
Accuracy:  0.8533834586466166


### Create a CountVectorizer with `stop_words='english'` and `max_features=300` and pass it to the `tokenize_test` function to print the accuracy score

In [255]:
vect = CountVectorizer(stop_words='english',max_features=300)
tokenize_test(vect)

Features:  300
Accuracy:  0.8107769423558897


### Create a CountVectorizer with `ngram_range=(1, 2)` and `max_features=15000` and pass it to the `tokenize_test` function to print the accuracy score

In [256]:
vect = CountVectorizer(ngram_range=(1, 2),max_features=15000)
tokenize_test(vect)

Features:  15000
Accuracy:  0.8533834586466166


### Create a CountVectorizer with `ngram_range=(1, 2)` that keeps only terms appearing in at least 2 documents (`min_df=2`) and pass it to the `tokenize_test` function to print the accuracy score

In [252]:
vect = CountVectorizer(ngram_range=(1, 2), min_df=2)
tokenize_test(vect)

Features:  7764
Accuracy:  0.8583959899749374
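Note that `min_df` counts documents, not total occurrences: a term is kept only if it appears in at least that many documents. A sketch on a three-document placeholder corpus (unigrams only, to keep the output short; `demo_docs` and `demo_vect` are assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer

demo_docs = ['ipad launch', 'ipad queue', 'android launch']

# Keep only terms that occur in at least 2 of the 3 documents.
demo_vect = CountVectorizer(min_df=2)
demo_vect.fit(demo_docs)

kept_terms = sorted(demo_vect.vocabulary_)
print(kept_terms)
```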
