## Feature embeddings / feature learning

Deciding how to represent your input features is a large part of creating decent models.

## One-hot encoding

![](onehot.png)

With a large vocabulary, one-hot encoding needs one dimension per word, so the vectors become very long and almost entirely zero.
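As a minimal sketch of the idea, the snippet below one-hot encodes words from a toy vocabulary. The vocabulary and the helper name `one_hot` are made up for illustration; the point is only that the vector length equals the vocabulary size.

```python
# Toy vocabulary; the words are illustrative and not taken from any dataset.
vocabulary = ["cat", "dog", "horse", "house"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a vector with one slot per vocabulary word and a single 1."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

encoded = one_hot("horse")
print(encoded)  # [0, 0, 1, 0]
```

With a vocabulary of 10,000 words, each such vector would have 10,000 entries, of which only one is non-zero.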

## Word embeddings

* Word embeddings encode 'concepts' as dense vectors of values.
* Word embeddings are learned
  * For instance by an autoencoder or a dimensionality reduction method
* Normally around 256 - 1024 features

![](embedding.png)
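One way to picture an embedding is as a lookup table with one row per word index. The sketch below uses random numbers in place of learned weights (an assumption for illustration only) and checks that looking up a row gives the same result as multiplying a one-hot vector by the table. The sizes and function names are hypothetical.

```python
import random

random.seed(0)
vocab_size, embedding_dim = 5, 3  # tiny sizes, chosen only for readability

# One row of weights per word index (random here, learned in a real model).
embedding_matrix = [[random.uniform(-1, 1) for _ in range(embedding_dim)]
                    for _ in range(vocab_size)]

def embed(word_index):
    """Embedding lookup: pick the row that belongs to the word index."""
    return embedding_matrix[word_index]

def one_hot_times_matrix(word_index):
    """Same result, computed as a one-hot vector times the matrix."""
    one_hot = [1 if i == word_index else 0 for i in range(vocab_size)]
    return [sum(one_hot[i] * embedding_matrix[i][j] for i in range(vocab_size))
            for j in range(embedding_dim)]

dense_vector = embed(2)
print(len(dense_vector))  # 3
```

The lookup avoids ever building the large one-hot vector, which is why an embedding layer takes word indices as input.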

## Creating word embeddings with Keras

     Word index  ------> Embedding layer  -------> Word vector
   
   
Here we encode a vocabulary of 10,000 words into a 64-dimensional embedding:
```python
from keras.layers import Embedding

embedding_layer = Embedding(10000, 64)
```

In [1]:
from keras.datasets import imdb

# Number of words to consider as features
max_features = 10000

# Load the data as lists of integers.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

Using TensorFlow backend.


In [2]:
x_train[:2]

array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),
       list([1, 194, 1153, 194, 8255, 78, 228,

In [3]:
y_train[:2]

array([1, 0])

In [5]:
from keras import preprocessing

# Cut texts after this number of words 
# (among top max_features most common words)
maxlen = 20

# This turns our lists of integers
# into a 2D integer tensor of shape `(samples, maxlen)`
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)
x_train[:2]

array([[  65,   16,   38, 1334,   88,   12,   16,  283,    5,   16, 4472,
         113,  103,   32,   15,   16, 5345,   19,  178,   32],
       [  23,    4, 1690,   15,   16,    4, 1355,    5,   28,    6,   52,
         154,  462,   33,   89,   78,  285,   16,  145,   95]],
      dtype=int32)
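The output above keeps the *last* 20 word indices of each review. The following pure-Python sketch mimics that behaviour (truncate from the front, pad with zeros at the front), which matches the default settings of `pad_sequences` as far as this example is concerned. The helper `pad_or_truncate` and the sample sequences are made up for illustration and are not part of Keras.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Keep the last `maxlen` items; left-pad shorter sequences with `value`.

    Assumes maxlen > 0.
    """
    truncated = sequence[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated

long_review = list(range(1, 31))  # 30 made-up word indices
short_review = [7, 8, 9]          # shorter than maxlen

padded_long = pad_or_truncate(long_review, maxlen=20)
padded_short = pad_or_truncate(short_review, maxlen=20)

print(padded_long[:3], len(padded_long))     # [11, 12, 13] 20
print(padded_short[-4:], len(padded_short))  # [0, 7, 8, 9] 20
```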

In [6]:
from keras.models import Sequential
from keras.layers import Flatten, Dense, Embedding

model = Sequential()
# We specify the maximum input length to our Embedding layer
# so we can later flatten the embedded inputs
model.add(Embedding(10000, 8, input_length=maxlen))
# After the Embedding layer, 
# our activations have shape `(samples, maxlen, 8)`.

# We flatten the 3D tensor of embeddings 
# into a 2D tensor of shape `(samples, maxlen * 8)`
model.add(Flatten())

# We add the classifier on top
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.summary()

history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 20, 8)             80000     
_________________________________________________________________
flatten_1 (Flatten)          (None, 160)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 161       
Total params: 80,161
Trainable params: 80,161
Non-trainable params: 0
_________________________________________________________________
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
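The parameter counts in the summary above can be checked by hand. This is a small arithmetic sketch using the sizes from the model definition (10,000 words, 8 embedding dimensions, 20 words per review); the variable names are chosen here for illustration.

```python
vocab_size, embedding_dim, maxlen = 10000, 8, 20

embedding_params = vocab_size * embedding_dim  # one 8-dimensional vector per word
flatten_units = maxlen * embedding_dim         # 20 words * 8 values per review
dense_params = flatten_units * 1 + 1           # one weight per input plus one bias
total_params = embedding_params + dense_params

print(embedding_params, flatten_units, dense_params, total_params)
# 80000 160 161 80161
```

Almost all of the parameters sit in the embedding table, not in the classifier on top.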


In [7]:
model.predict(x_train[:1])

array([[0.8316832]], dtype=float32)

## Exercise: Train word embedding on Reuters data

https://keras.io/datasets/ has a dataset of 11,228 newswires from Reuters, labeled over 46 topics.
Instead of classifying an IMDB review as positive or negative, you have to classify each newswire into one of the 46 topics.

1. Create a `Sequential` model that encodes the newswires into an embedding dimension of 256
2. Add layers to your neural network
  * Be creative; which layers do you think could help classify the newswires?
3. Compile, train and test the model

## Transfer learning

Transfer learning takes the **representation learned by a NN layer and reuses it for something else**.

See https://towardsdatascience.com/keras-transfer-learning-for-beginners-6c9b8b7143e

![](cnn.png)

![](cnn2.png)

![](cnn3.png)

Generally, the initial layers process basic features, while the higher layers process more abstract things like leaves, noses, or movements. **Just like in humans**.

## Advantages of transfer learning

1. We can re-use another training set
  * Or reduce the amount of training data needed
2. Less computational power is required
  * We use pre-trained weights and only have to learn the weights of the last few layers.

## Pretrained models

Many pretrained models exist, which have been trained and optimised on huge datasets.

https://keras.io/applications/

## Example: Transfer learning with MobileNet

Example from https://github.com/aditya9898/transfer-learning

In [8]:
!ls train

cats  dogs  horses


In [9]:
!ls train/dogs

'2Q== (1).jpg'	   'images (13).jpg'  'images (21).jpg'  'images (4).jpg'
'2Q== (2).jpg'	   'images (14).jpg'  'images (22).jpg'  'images (5).jpg'
'2Q==.jpg'	   'images (15).jpg'  'images (23).jpg'  'images (6).jpg'
'9k= (1).jpg'	   'images (16).jpg'  'images (24).jpg'  'images (7).jpg'
'9k= (2).jpg'	   'images (17).jpg'  'images (25).jpg'  'images (8).jpg'
'9k=.jpg'	   'images (18).jpg'  'images (26).jpg'  'images (9).jpg'
'images (10).jpg'  'images (19).jpg'  'images (27).jpg'   images.jpg
'images (11).jpg'  'images (1).jpg'   'images (2).jpg'	  Z.jpg
'images (12).jpg'  'images (20).jpg'  'images (3).jpg'


In [1]:
import pandas as pd
import numpy as np
import os
import keras
import matplotlib.pyplot as plt
from keras.layers import Dense,GlobalAveragePooling2D
from keras.applications import MobileNet
from keras.preprocessing import image
from keras.applications.mobilenet import preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.optimizers import Adam

Using TensorFlow backend.


In [2]:
# Load MobileNet with ImageNet weights; include_top=False discards the last 1000 neuron layer
base_model = MobileNet(weights='imagenet', include_top=False)

x = base_model.output
x = GlobalAveragePooling2D()(x)

# add dense layers so that the model can learn more complex functions and classify for better results.
x = Dense(1024, activation='relu')(x)  # dense layer 1
x = Dense(1024, activation='relu')(x)  # dense layer 2
x = Dense(512, activation='relu')(x)   # dense layer 3
preds = Dense(3, activation='softmax')(x)  # final layer with softmax activation, one output per class


In [3]:
model = Model(inputs=base_model.input, outputs=preds)

In [4]:
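# Freeze the first 20 layers so their pre-trained weights are not updated
for layer in model.layers[:20]:
    layer.trainable=False
# Leave all remaining layers trainable
for layer in model.layers[20:]:
    layer.trainable=True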
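The two slices above split the layer list by position. The sketch below replays the same loops on a list of stand-in objects so the split is easy to inspect. `StandInLayer` is a hypothetical class written for this example, not a Keras type, and the count of 25 layers is arbitrary.

```python
class StandInLayer:
    """Hypothetical stand-in for a Keras layer; it only tracks `trainable`."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# 25 layers is an arbitrary number chosen for illustration.
layers = [StandInLayer(f"layer_{i}") for i in range(25)]

for layer in layers[:20]:  # indices 0..19 are frozen
    layer.trainable = False
for layer in layers[20:]:  # index 20 onwards stays trainable
    layer.trainable = True

frozen = [layer.name for layer in layers if not layer.trainable]
trainable = [layer.name for layer in layers if layer.trainable]
print(len(frozen), trainable[0])  # 20 layer_20
```

Because the split is purely positional, every layer from index 20 onwards stays trainable, including any layers of the base model that come after that index.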

In [5]:
# Preprocessor for images
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input) 

# this is where you specify the path to the main data folder
train_generator = train_datagen.flow_from_directory('./train/', target_size=(224,224), color_mode='rgb',
                                                 batch_size=32, class_mode='categorical', shuffle=True)

Found 197 images belonging to 3 classes.


In [6]:
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])
# Adam optimizer
# loss function will be categorical cross entropy
# evaluation metric will be accuracy

In [7]:
step_size_train=train_generator.n//train_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=step_size_train, 
                    epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f3894307c50>
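The value of `steps_per_epoch` follows from the 197 images found by the generator and the batch size of 32. The short sketch below works through that arithmetic and, as an alternative not used in the notebook, shows the rounded-up step count that would also include the final partial batch.

```python
import math

num_images, batch_size = 197, 32  # values reported by the generator above

steps_floor = num_images // batch_size           # integer division, as in the notebook
steps_ceil = math.ceil(num_images / batch_size)  # alternative: count the partial batch too
images_in_full_batches = steps_floor * batch_size

print(steps_floor, steps_ceil, images_in_full_batches)  # 6 7 192
```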

## Recap

* Transfer learning: taking a previously trained model and reusing it for a new task
* Closely related to multitask learning, where the *same model is used to solve different problems*
  * Forces the model to converge on a *common* representation
* This works because neural networks learn *abstractions* of the data they are trained on
  * If abstract enough, you can use these for many other tasks