In [1]:
pwd

'/Users/vanessa/src/foveation/notebooks'

In [2]:
from textblob import TextBlob

In [3]:
wiki = TextBlob("Python is a high-level, general-purpose programming language.")

In [4]:
wiki.tags # tags is the property

[('Python', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('high-level', 'JJ'),
 ('general-purpose', 'JJ'),
 ('programming', 'NN'),
 ('language', 'NN')]

In [5]:
wiki.sentiment

Sentiment(polarity=0.0, subjectivity=0.0)

https://github.com/sloria/TextBlob/blob/dev/textblob/en/sentiments.py

The polarity and subjectivity of each word are taken from this lexicon file:
https://github.com/sloria/TextBlob/blob/dev/textblob/en/en-sentiment.xml

Some words appear several times in the lexicon, once per context. It is not entirely clear how polarity and subjectivity are assigned in this case, but I think the entries are averaged. This seems to happen for the polarity in the following case:

In [None]:
TextBlob('abrupt').sentiment
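
A minimal sketch of the averaging hypothesis, not TextBlob's actual code. The helper `average_entries` is illustrative and the two entries are made-up values for a hypothetical word that is listed twice in the lexicon.

```python
def average_entries(entries):
    """Average the (polarity, subjectivity) pairs of one word over its lexicon entries."""
    n = len(entries)
    polarity = sum(p for p, _ in entries) / n
    subjectivity = sum(s for _, s in entries) / n
    return polarity, subjectivity

# hypothetical entries for a word listed twice in the lexicon (values are made up)
entries = [(-0.5, 0.5), (-0.25, 1.0)]
polarity, subjectivity = average_entries(entries)
print(polarity, subjectivity)
```

If the hypothesis holds, the sentiment returned for a single word should match the average of its entries in `en-sentiment.xml`.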

The same happens with sentences, where polarity and subjectivity are averaged over the tokens. For sentences there is a further mechanism, though: the library takes negations into account by checking the preceding words, so that 'not good' receives a negative polarity even though 'not' alone is neutral and 'good' alone is positive:

In [6]:
print(TextBlob('not good').sentiment)
print('------------------------------------------------------------------------------------')
print([TextBlob(t_).sentiment for t_ in TextBlob('not good').words])

Sentiment(polarity=-0.35, subjectivity=0.6000000000000001)
------------------------------------------------------------------------------------
[Sentiment(polarity=0.0, subjectivity=0.0), Sentiment(polarity=0.7, subjectivity=0.6000000000000001)]
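
The numbers above are consistent with a simple rule: the polarity of the negated word is multiplied by a negative factor, while the subjectivity is left unchanged. A minimal sketch, assuming the factor is the ratio read off this single output (-0.35 / 0.7 = -0.5). The name `negated_polarity` is illustrative, not part of the TextBlob API.

```python
NEGATION_FACTOR = -0.5  # assumption: inferred from the output above, -0.35 / 0.7

def negated_polarity(polarity, factor=NEGATION_FACTOR):
    """Polarity of a word preceded by a negation, under the assumed rule."""
    return factor * polarity

good = 0.7                        # polarity of 'good' in the output above
not_good = negated_polarity(good)
print(not_good)
```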


In [None]:
advertisement = TextBlob("Python is a high-level, beautiful, amaizing, and general-purpose programming language.")

In [None]:
advertisement.sentiment

In [7]:
sentence = TextBlob("Python is a high-level, beautiful, amaizing, terrific, and general-purpose programming language.")
for w_ in sentence.words:
    print(w_, ':', TextBlob(w_).sentiment)

Python : Sentiment(polarity=0.0, subjectivity=0.0)
is : Sentiment(polarity=0.0, subjectivity=0.0)
a : Sentiment(polarity=0.0, subjectivity=0.0)
high-level : Sentiment(polarity=0.0, subjectivity=0.0)
beautiful : Sentiment(polarity=0.85, subjectivity=1.0)
amaizing : Sentiment(polarity=0.0, subjectivity=0.0)
terrific : Sentiment(polarity=0.0, subjectivity=1.0)
and : Sentiment(polarity=0.0, subjectivity=0.0)
general-purpose : Sentiment(polarity=0.0, subjectivity=0.0)
programming : Sentiment(polarity=0.0, subjectivity=0.0)
language : Sentiment(polarity=0.0, subjectivity=0.0)


Some possible first steps:

**Decrease noise - acting gently on the data**: get rid of the most frequent words. Articles and prepositions are typically not useful. This is partially supported by the tf import of the dataset: `skip_top` drops the most frequent words, while the first index, `index_from` (default is 3), only shifts the word indices.

**Decrease noise - with the axes**: here we can use a library such as TextBlob to identify the words with high subjectivity and polarity, and keep only those for the task.

**Get rid of entire sentences**: if the polarity and subjectivity of both the single words and the entire sentence are low, there is intuitively a good chance that this part of the text is not meaningful to the task.

**The dictionary is extremely large; tf allows importing only the $k$ most frequent words.**
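
The second idea can be sketched as a simple filter. The scores are copied from the per-word TextBlob output above; the helper `keep_informative` and its two thresholds are hypothetical choices for illustration only.

```python
# (polarity, subjectivity) per word, copied from the per-word output above
scores = {
    'Python': (0.0, 0.0), 'is': (0.0, 0.0), 'a': (0.0, 0.0),
    'high-level': (0.0, 0.0), 'beautiful': (0.85, 1.0),
    'terrific': (0.0, 1.0), 'and': (0.0, 0.0),
    'programming': (0.0, 0.0), 'language': (0.0, 0.0),
}

def keep_informative(words, scores, min_polarity=0.1, min_subjectivity=0.5):
    """Keep the words whose |polarity| or subjectivity reaches the (hypothetical) thresholds."""
    kept = []
    for w in words:
        polarity, subjectivity = scores.get(w, (0.0, 0.0))  # unknown words count as neutral
        if abs(polarity) >= min_polarity or subjectivity >= min_subjectivity:
            kept.append(w)
    return kept

words = ['Python', 'is', 'a', 'high-level', 'beautiful', 'terrific',
         'and', 'programming', 'language']
kept = keep_informative(words, scores)
print(kept)
```

With these thresholds only 'beautiful' and 'terrific' survive, which shows how aggressive this kind of filtering can be.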

In [None]:
sentiment = (TextBlob('good').sentiment)
print(sentiment)
sentiment.polarity, sentiment.subjectivity

# Word embedding

There is a rich literature on word embeddings. The idea is to download pretrained representations. One possibility is the set of word vectors trained on Google News. This file is huge and hard to fit in memory.

In [None]:
import gensim 
import gensim.models.keyedvectors as word2vec

In [None]:
?? word2vec.KeyedVectors.load_word2vec_format

In [None]:
path_ = '/Users/vanessa/src/GoogleNews-vectors-negative300.bin'
embed_map = word2vec.KeyedVectors.load_word2vec_format(path_, binary=True)

Another possibility is using the glove-wiki-gigaword pretrained (Wikipedia 2014 + Gigaword 5) representations
https://nlp.stanford.edu/projects/glove/

They consist of one embedding vector per word. Pretrained vectors of dimension 50, 100, 200 and 300 are available; the cells below load the 100-dimensional version.

In [None]:
import gensim.downloader as api

# load pre-trained word vectors from gensim-data (downloaded on first use);
# the cells below need word_vectors to be defined
word_vectors = api.load("glove-wiki-gigaword-100")
#result = word_vectors.most_similar(positive=['woman', 'king'], negative=['man'])
#print("{}: {:.4f}".format(*result[0]))

In [None]:
word_vectors['the']

In [None]:
similarity = word_vectors.similarity('woman', 'man')
similarity

In [None]:
woman_embedding = word_vectors['woman']  # numpy vector of a word
man_embedding = word_vectors['man']

In [None]:
import numpy as np
from numpy.linalg import norm

np.dot(woman_embedding, man_embedding) / (norm(woman_embedding) * norm(man_embedding))

GloVe makes use of word counts and co-occurrence statistics. It is a log-bilinear model with a weighted least-squares objective. The main intuition underlying the model is the simple observation that ratios of word-word co-occurrence probabilities have the potential for encoding some form of meaning.
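
For reference, the weighted least-squares objective mentioned above can be written as follows, using the notation of the GloVe paper. Here $X_{ij}$ counts how often word $j$ occurs in the context of word $i$, $V$ is the vocabulary size, $w_i$ and $\tilde{w}_j$ are the word and context vectors, and $b_i$, $\tilde{b}_j$ are biases.

$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$

$f(x) = (x / x_{\max})^{\alpha}$ if $x < x_{\max}$, and $f(x) = 1$ otherwise

The co-occurrence probabilities whose ratios carry the meaning are $P_{ij} = X_{ij} / X_i$, with $X_i = \sum_k X_{ik}$.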

# Building a model

This is what is suggested in this Google guide (MLCC)
https://developers.google.com/machine-learning/guides/text-classification/step-2-5
When the ratio (number of samples) / (number of words per sample) is below 1500, it is common practice to follow the n-gram branch of the diagram, i.e. a bag of n-grams fed to a simple model such as an MLP.
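
A minimal sketch of this rule of thumb. It assumes that "words per sample" means the median number of whitespace-separated words, and the names `samples_to_words_ratio` and `suggested_model` are illustrative; only the threshold of 1500 comes from the guide.

```python
from statistics import median

def samples_to_words_ratio(samples):
    """Number of samples divided by the median number of words per sample."""
    words_per_sample = median(len(s.split()) for s in samples)
    return len(samples) / words_per_sample

def suggested_model(samples, threshold=1500):
    """Return the model family suggested by the ratio."""
    return 'n-gram' if samples_to_words_ratio(samples) < threshold else 'sequence'

# toy reviews, made up for the example
reviews = ['a good movie', 'not good', 'one of the best movies ever made']
ratio = samples_to_words_ratio(reviews)
print(ratio, suggested_model(reviews))
```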

Another thing people do is to use predefined dictionaries https://developers.google.com/machine-learning/guides/text-classification/step-3 to embed the words depending on their meaning. This is similar to what TextBlob does

**Example 1**  
https://keras.io/examples/imdb_cnn/

Embedding  
Dropout  
CNNLayer  
MaxPooling  
Dense  
Dropout + ReLU  
Dense + Sigmoid

**Example 2**  
https://keras.io/examples/imdb_cnn_lstm/  
Embedding  
Dropout  
CNNLayer  
MaxPooling  
LSTM  
Dense + Sigmoid

**Example 3**  
https://keras.io/examples/imdb_fasttext/

**Suggestion from Chollet**

For small $n$, a tf-idf representation followed by a linear algorithm such as logistic regression is suggested. https://developers.google.com/machine-learning/guides/text-classification/step-3

***tf*** stands for term frequency  
***idf*** stands for inverse document frequency

Some words, such as articles and prepositions, are extremely frequent but not very informative. The idf term takes this into account by down-weighting the words that also appear in many of the other documents.

Other representations exist: $N$-grams. The analysis at the word level is considered a unigram model. With $N$-grams we consider sequences of $N$ consecutive words. In my opinion these models are more complex, but they act as a sort of feature extractor (related to redundancy?). Here is how TF-IDF works in scikit-learn:
https://scikit-learn.org/stable/modules/feature_extraction.html

$\text{idf}(t) = \log\left(\frac{1+n}{1+\text{df}(t)}\right) + 1$

where  
$n$ is the number of documents,  
df$(t)$ is the number of documents in the document set that contain term $t$.

The resulting tf-idf vectors are then normalized by the Euclidean norm.
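
The formula can be checked by hand on a toy corpus. This is a plain-Python sketch under the definition above (raw term counts as tf, then Euclidean normalization), not scikit-learn's implementation. The two tokenized documents and the helper names `idf` and `tfidf` are made up for the example.

```python
import math

# toy corpus, already tokenized
docs = [['this', 'is', 'the', 'first', 'document'],
        ['this', 'is', 'the', 'second', 'document']]

def idf(term, docs):
    """idf(t) = log((1 + n) / (1 + df(t))) + 1"""
    n = len(docs)
    df = sum(term in d for d in docs)
    return math.log((1 + n) / (1 + df)) + 1

def tfidf(doc, docs):
    """tf-idf vector of one document, normalized by its Euclidean norm."""
    vocab = sorted({t for d in docs for t in d})
    raw = [doc.count(t) * idf(t, docs) for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return vocab, [v / norm for v in raw]

vocab, weights = tfidf(docs[0], docs)
for term, weight in zip(vocab, weights):
    print(term, round(weight, 3))
```

Words shared by both documents get idf equal to 1, while 'first' gets a larger weight because it appears in only one document.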



In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
corpus = ['This is the first document.',
          'This document is the second document.',
          'And this is the third one.',
          'Is this the first document?']
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())  # scikit-learn >= 1.0; older versions: get_feature_names()

In [None]:
# from scipy.sparse.csr_matrix import todense
denseX = X.todense()
print(denseX.shape)
print(denseX)

In [None]:
np.linalg.norm(denseX, axis=-1)  # the vectors are normalized: each document has unit Euclidean norm,
# so that the representation does not depend on the length of the document

In [None]:
new_text = ['Yes this is a first document again']

vectorizer.transform(new_text).todense()

## Let's see if we can do something similar with a fixed dictionary

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import imdb

In [None]:
max_n_words = 10000
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=max_n_words,
                                                                      index_from=0)

word_index = imdb.get_word_index(path='imdb_word_index.json')

reverse_word_index = dict(
[(value, key) for (key, value) in word_index.items()])

decoded_review = ' '.join(
[reverse_word_index.get(i, '?') for i in train_data[123]])

print(decoded_review)

In [None]:
dictionary_ = dict(pd.Series(word_index).sort_values(ascending=True)[:max_n_words])
dictionary = {k_: i_-1 for (k_, i_) in zip(dictionary_.keys(), dictionary_.values())}
del dictionary_
dictionary

In [None]:
vectorizer = TfidfVectorizer(vocabulary=dictionary)

In [None]:
decoded_review

In [None]:
vectorizer.fit_transform([decoded_review]).todense()

This is one possibility. For each review we compute the feature vector using the tf-idf transform, and we run a classifier on top of it, which can be linear or non-linear.

Another possibility is to use the embedding provided by GloVe and then run a classification algorithm (the network). We must keep in mind that in this case we want invariance to word order. We can embed the single words, but then we need to put them all together independently of their order. One thing I have seen is global pooling. At this point we only have as many features as the dimension of the embedding (100 for the vectors loaded below), and we can run any classification algorithm on top of them.
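
A minimal sketch of why pooling gives order invariance: averaging the word vectors of a sample yields the same features for any permutation of its words. The two-dimensional vectors in `toy_embedding` are made up; real GloVe vectors would take their place.

```python
def mean_pool(vectors):
    """Average a list of equally sized vectors, component by component."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

# made-up embeddings, for illustration only
toy_embedding = {'not': [0.5, -1.0], 'very': [0.25, 2.0], 'good': [1.0, 0.5]}

a = mean_pool([toy_embedding[w] for w in ['not', 'very', 'good']])
b = mean_pool([toy_embedding[w] for w in ['good', 'very', 'not']])
print(a, b)
```

The price of this invariance is that any information carried by the word order is lost.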

There is also the possibility of playing with N-grams, but this is not totally clear to me.

Considering one word at a time can limit the results. Just think of the previous case of sentiment analysis with TextBlob: 'not good' and 'very good' have very different meanings.
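
A short sketch of what word N-grams look like. With bigrams, 'not good' becomes a single feature instead of two unrelated words. The helper `ngrams` and the example sentence are made up for illustration.

```python
def ngrams(words, n):
    """All sequences of n consecutive words, joined by a space."""
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = ['the', 'movie', 'is', 'not', 'good']
bigrams = ngrams(words, 2)
print(bigrams)
```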

In [None]:
class Hyperparameters(object):
    """ Add hyper-parameters in init so when you read a json, it will get updated as your latest code. """
    def __init__(self,
                 learning_rate=5e-2,
                 architecture='FC',
                 nodes=128,
                 epochs=500,
                 batch_size=10,
                 loss='cross_entropy',
                 optimizer='sgd',
                 lr_at_plateau=True,
                 reduction_factor=None,
                 validation_check=True):
        """
        :param learning_rate: float, the initial value for the learning rate
        :param architecture: str, the architecture types
        :param nodes: int, the number of nodes
        :param epochs: int, the number of epochs we want to train
        :param batch_size: int, the dimension of the batch size
        :param loss: str, loss type, cross entropy or square loss
        :param optimizer: str, the optimizer type.
        :param lr_at_plateau: bool, protocol to decrease the learning rate.
        :param reduction_factor: int, the factor which we use to reduce the learning rate.
        :param validation_check: bool, if we want to keep track of validation loss as a stopping criterion.
        """
        self.learning_rate = learning_rate
        self.architecture = architecture
        self.nodes = nodes
        self.epochs = epochs
        self.batch_size = batch_size
        self.loss = loss
        self.optimizer = optimizer
        self.lr_at_plateau = lr_at_plateau
        self.reduction_factor = reduction_factor
        self.validation_check = validation_check


class Dataset:
    """ Here we save the dataset specific related to each experiment. The name of the dataset,
    the scenario, if we modify the original dataset, and the dimensions of the input.
    This is valid for the modified_MNIST_dataset, verify if it is going to be valid next"""
    def __init__(self,
                 removed_words=0,
                 first_index=0,
                 n_training=10):
        """
        :param removed_words: float in [0, 1), fraction of words randomly removed from each sample
        :param first_index: int, words with index below first_index (the most frequent ones) are removed
        :param n_training: int, number of training examples per class
        """
        self.removed_words = removed_words
        self.first_index = first_index
        self.n_training = n_training


class Experiment(object):
    """
    This class represents your experiment.
    It includes all the classes above and some general
    information about the experiment index.
    """
    def __init__(self,
                 id,
                 output_path,
                 train_completed=False,
                 hyper=None,
                 dataset=None):
        """
        :param id: index of output data folder
        :param output_path: output directory
        :param train_completed: bool, it indicates if the experiment has already been trained
        :param hyper: instance of Hyperparameters class
        :param dataset: instance of Dataset class
        """
        if hyper is None:
            hyper = Hyperparameters()
        if dataset is None:
            dataset = Dataset()

        self.id = id
        self.output_path = output_path
        self.train_completed = train_completed
        self.hyper = hyper
        self.dataset = dataset

In [None]:
exp = Experiment(0, output_path='.')

In [None]:
import numpy as np
import gensim.downloader as api
from tensorflow.keras.datasets import imdb

In [None]:
word_vectors = api.load("glove-wiki-gigaword-100")

In [None]:
class DatasetGenerator:
    """ This class is meant to be the generator for different
    types of transformation to the IMDb data. We load the IMDb
    dataset and we apply different transformation, depending on the
    Experiment attributes.
    The training samples are extracted from the exp class
    The validation samples per class are 5000 by default.
    The test samples are the one in the original test set.
    The extraction is such that we get a balanced dataset.
    """
    def __init__(self,
                 exp,
                 n_vl=5000):
        """ Initializer for the class. We pass the object Experiment to
        assess the transformation required.
        :param exp: Experiment object
        :param n_vl: int, number of validation samples per class
        """
        self.exp = exp
        self.n_vl = n_vl

        (tr_data, tr_labels), (ts_data, ts_labels) = imdb.load_data(num_words=5000,
                                                                    index_from=0)
        word_index = imdb.get_word_index(path='imdb_word_index.json')
        self.reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
        self.glove_embedding = word_vectors
        
        # balanced split of the original training set into training and validation indexes
        id_tr, id_vl = self._split_train_validation(tr_labels)
        
        mean_tr, lst_tr = self.output_embedding([tr_data[i] for i in id_tr])
        self.mean_tr = mean_tr
        self.lst_tr = lst_tr
        self.y_tr = tr_labels[id_tr]
        
        mean_vl, lst_vl = self.output_embedding([tr_data[i] for i in id_vl])
        self.mean_vl = mean_vl
        self.lst_vl = lst_vl
        self.y_vl = tr_labels[id_vl]
        
        mean_ts, lst_ts = self.output_embedding([x_ts for x_ts in ts_data])
        self.mean_ts = mean_ts
        self.lst_ts = lst_ts
        self.y_ts = ts_labels
        

    def _split_train_validation(self, y_learning):
        """ Split of the training and validation set.
        We chose randomly n_training elements from the learning set.
        :param y_learning: labels from the training IMDb dataset.
        """
        id_tr, id_vl = np.array([], dtype=int), np.array([], dtype=int)
        n_tr = self.exp.dataset.n_training

        for y_ in np.unique(y_learning):
            id_class_y_ = np.where(y_learning == y_)[0]
            tmp_id_tr = np.random.choice(id_class_y_,
                                         size=n_tr,
                                         replace=False)
            tmp_id_vl = np.random.choice(np.setdiff1d(id_class_y_, tmp_id_tr),
                                         size=self.n_vl,
                                         replace=False)
            id_tr = np.append(id_tr, tmp_id_tr)
            id_vl = np.append(id_vl, tmp_id_vl)
        return id_tr, id_vl

    def output_embedding(self, X):
        """ Dataset, it has dimensions (n, #words in sample i-th)
        :param X: dataset, list of length n (samples), containing lists
        :return mean_set: the mean embedding for each sample
        :return lst_set: list containing the embedding for each word
        """
        mean_set, lst_set = [], []
        for x in X:
            mean_, lst_ = self.preprocessing(x)  # x is a sample (containing n indexes)
            mean_set.append(mean_)
            lst_set.append(lst_)
        return np.array(mean_set), lst_set
        
    def preprocessing(self, x):
        """ Here we call the functions to perform different types of pre-processing.
        This consists in:
            1) excluding the most frequent words of the dictionary
            2) remove a fixed amount of words
            3) transform indexes into words
            4) embed each word and average
        :param x: a sample, which is a list containing different indexes,
        one for each word
        """
        x = self._exclude_most_freq(x)
        x = self._exclude_words(x)
        x = self._index2str(x)
        return self._embedding(x)
    

    def _exclude_most_freq(self, x):
        """ We exclude the most frequent words here.
        :param x: we pass one sample, a list containing indexes
        """
        x = np.array(x)
        return x[x >= self.exp.dataset.first_index]

    def _exclude_words(self, x):
        """ We randomly exclude a fixed fraction of the words here.
        :param x: a sample, it contains the indexes for the sample
        """
        if self.exp.dataset.removed_words == 0:
            return x
        elif self.exp.dataset.removed_words >= 1:
            raise ValueError("Maximum value of removed words must be less than one.")
        n_to_rm = int(self.exp.dataset.removed_words * len(x))
        # sample without replacement, so that exactly n_to_rm distinct words are removed
        rnd_rm = np.random.choice(np.arange(len(x)), size=n_to_rm, replace=False)
        return list(np.delete(np.array(x), rnd_rm))

    def _index2str(self, x):
        """ From indices to words, given a single sample.
        Transform in a list of string values.
         :param x: a data from imdb.load_data(),
         it contains the most used words
         """
        return [self.reverse_word_index[id_] for id_ in x]

    def _embedding(self, words_lst):
        """ Here we generate the embedding for each sample.
        We use the pre-trained word-vectors from gensim-data
        :param words_lst: the list of words in a sample
        :return embedding_mean: the mean value for the embedding
        over the entire sample.
        :return embedding_array: for each sample, this is a matrix,
        each row is the embedding of a word in the dataset
        """
        emb_lst = []
        for w_ in words_lst:
            if w_ in self.glove_embedding:  # membership test works in gensim 3.x and 4.x (.vocab was removed in 4.0)
                emb_lst.append(self.glove_embedding[w_])
        embedding_array = np.array(emb_lst)
        embedding_mean = np.mean(embedding_array, axis=0)
        return embedding_mean, embedding_array

In [None]:
dataset_generator = DatasetGenerator(exp)

In [None]:
X_tr = dataset_generator.lst_tr
y_tr = dataset_generator.y_tr

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Flatten, Dense, Conv2D, GlobalMaxPooling2D, Concatenate
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

In [None]:
np.max(np.array([x_.shape[0] for x_ in X_tr]))

In [None]:
bias_shapes = [[1, 3],[1, 2]]
for b_ in bias_shapes:
    print(b_)
    
# bias = [tf.zeros_initializer(shape=b_) for b_ in bias_shapes]
bias = [tf.Variable(np.zeros((b_[0], b_[1]), dtype=np.float32)) for b_ in bias_shapes]
weights = [tf.Variable(np.ones((b_[0], b_[1]), dtype=np.float32)) for b_ in bias_shapes]
bias[0]

In [None]:
tf.math.maximum(weights[1], 0)

In [None]:
# 9 random samples of 50 words, 100-dimensional embeddings, one channel
dataset = np.random.randn(9, 50, 100, 1)

dataset = np.vstack((dataset, dataset))  # duplicate the samples: (18, 50, 100, 1)
y__ = np.random.randint(low=0, high=2, size=18)
dataset.shape

In [None]:
X_tr_reshape[0].shape

In [None]:
model = tf.keras.Sequential()
model.add(Conv2D(filters=15,
                 kernel_size=(8, 100),
                 activation='relu',
                 input_shape=(None, 100, 1),
                 data_format='channels_last'))
model.add(GlobalMaxPooling2D())
model.add(Dense(2, 
                activation='softmax'))  # class probabilities, as expected by the cross-entropy loss

model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

In [None]:
history = model.fit(x=X_tr_reshape, 
                    y=y_tr, 
                    epochs=5,
                    batch_size=1)

In [None]:
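# the samples have different numbers of words, so we store them in a 1-D object array;
# each element has shape (n_words, embedding, 1)
X_tr_reshape = np.empty(len(X_tr), dtype=object)
for i_, x_ in enumerate(X_tr):
    w_, e_ = x_.shape
    X_tr_reshape[i_] = x_.reshape(w_, e_, 1)

X_tr_reshape.shape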

In [None]:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import concatenate

FILTERS = 15
MAX_LENGTH = 1000
EMBEDDING_SIZE = 100
output_dims = 2

# one convolutional branch per kernel size, all reading the same input
input_x = Input(shape=(None, EMBEDDING_SIZE, 1))
branches = []
for kw in [8, 12]:    # kernel sizes (number of words in the window)
    conv = Conv2D(FILTERS,
                  (kw, EMBEDDING_SIZE),  # the kernel spans the whole embedding
                  padding='valid',
                  activation='relu',
                  strides=1)(input_x)
    branches.append(GlobalMaxPooling2D()(conv))

# concatenate the pooled features of the branches and classify
merged = concatenate(branches)
output = Dense(output_dims, activation='softmax')(merged)
big_model = Model(inputs=input_x, outputs=output)
big_model.summary()

print('Compiling model')
big_model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

# X_data_padded is the zero-padded tensor (n_samples, 1000, 100, 1) built in a later cell
history = big_model.fit(x=X_data_padded,
                        y=y_tr,
                        epochs=5,
                        batch_size=1)

In [None]:
model = tf.keras.Sequential()
model.add(Conv2D(filters=15,
                 kernel_size=(8, 100),
                 activation='relu',
                 input_shape=(None, 100, 1)))
model.add(GlobalMaxPooling2D())
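model.add(Dense(2, 
                activation='softmax'))  # class probabilities, as expected by the cross-entropy loss

model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'] 
              )

# y__ holds the 18 random labels that go with the toy `dataset` defined above
history = model.fit(dataset, y__, epochs=5)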

In [None]:
model.summary()

In [None]:
X_tr_reshape[1].shape

In [None]:
model.fit(X_tr_reshape)

In [None]:
X_data_padded = np.zeros((len(X_tr), 1000, 100, 1))
idx_bef_padding = [x_.shape[0] for x_ in X_tr]
print(idx_bef_padding)
for i_, x_ in enumerate(X_tr):
    n_w_ = x_.shape[0]
    X_data_padded[i_, :n_w_, :, :] = x_.reshape(n_w_, -1, 1)

In [None]:
KERNEL_SIZE=8
EMBEDDING_SIZE=100
FILTERS=15
MAX_LENGTH=1000

# mask over the convolution outputs: 1 where the output comes from real words
np_mask = np.zeros((len(X_tr), MAX_LENGTH-KERNEL_SIZE+1, 1, FILTERS))
for i_, id_ in enumerate(idx_bef_padding):
    np_mask[i_, :id_, 0, :] = 1

# input_x = Input(shape=(MAX_LENGTH, EMBEDDING_SIZE, 1),
#                 name='input_x')
# input_mask = Input(shape=(MAX_LENGTH-KERNEL_SIZE+1, 1, FILTERS),
#                    name='input_m')

model = tf.keras.Sequential()
model.add(Conv2D(filters=FILTERS,
                 kernel_size=(KERNEL_SIZE, EMBEDDING_SIZE),
                 activation='relu',
                 input_shape=(MAX_LENGTH, EMBEDDING_SIZE, 1)))


model.add(GlobalMaxPooling2D())
model.add(Dense(2, 
                activation='softmax'))  # class probabilities, as expected by the cross-entropy loss

sgd = SGD(learning_rate=0.6)
model.compile(optimizer=sgd,  # pass the optimizer object, otherwise the learning rate above is ignored
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# A Sequential model has a single input, so np_mask cannot be passed to it;
# using the mask requires a model with two inputs (see the commented Input layers above).
history = model.fit(X_data_padded,
                    y_tr, 
                    epochs=50)

In [None]:
np_mask.shape

In [None]:
model.summary()

In [1]:
import numpy as np
import tensorflow as tf

In [None]:
k_array = np.random.choice(np.arange(1, 20), size=3)

example = []
for k_ in k_array:
    example.append(np.random.randn(k_, 10))

In [None]:
window_size=5
kernels=2
embedding=10
output_classes=2

def gen_padding_bm(data, window, kernels, max_len):
    """ Here we create the input of the convolutional net. 
    There are two matrices which are passed as input,
        the data, padded in such a way to have the shape of a tensor
        the boolean mask, which is needed to reduce edge effect
    :param data: this is a list, containing elements which are np.arrays
    of dimension (number_of_words, size_of_the_embedding)
    :param window: dimensionality of the kernel, number of words involved in the conv
    :param kernels: number of filters
    :param max_len: maximum dimension of the input
    :returns data_padded: np.array of dimension (n_samples, max_len, emb, 1),
    with 1 corresponding to the number of channels
    :returns bool_mask: np.array of dimension (n_samples, max_len-window+1, 1, kernels),
    1 corresponds to the features we want to keep
    """
    n_samples = len(data)   # number of examples
    _, emb = data[0].shape  # dimensionality of the embedding
    data_padded = np.zeros((n_samples, max_len, emb, 1))  # uniform dim
    bool_mask = np.zeros((n_samples, max_len-window+1, 1, kernels))  # bool mask
    id_before_padding = [x_.shape[0] for x_ in data]
    for i_, (n_w_, x_) in enumerate(zip(id_before_padding, data)):
        data_padded[i_, :n_w_, :, :] = x_.reshape(n_w_, -1, 1)
        # keep only the windows that lie entirely on real words, to reduce edge effects;
        # max(..., 0) avoids a negative slice bound when the sample is shorter than the window
        n_valid_ = max(n_w_ - window + 1, 0)
        bool_mask[i_, :n_valid_, 0, :] = 1
    return data_padded, bool_mask

def get_weight(shape, name):
    """ Weights initializer.
    :param shape: the shape of the model, list of dimensions for the layer
    :param name: str, layer's name
    """
    initializer = tf.initializers.glorot_uniform()
    return tf.Variable(initializer(shape), name=name, trainable=True, dtype=tf.float32)

shapes = [[window_size, embedding, 1, kernels],
          [kernels, output_classes]]
weights = [get_weight(shapes[i], 'weight{}'.format(i)) for i in range(len(shapes))]

weights_init_npy0 = weights[0].numpy()
weights_init_npy1 = weights[1].numpy()

print('before training', weights[1])

def loss(pred, target):
    return tf.losses.categorical_crossentropy(target, pred)
    
def model(x, window=window_size, kernels=kernels, max_len=1000):
    """ Here we generate the shallow CNN for text classification
    as in the work of Yoon Kim
    
                https://arxiv.org/pdf/1408.5882.pdf
                
    We first reduce the dataset to a tensor of fixed dimensions.
    We moreover generate a boolean array so to discard edge effects.
    The window and the number of kernels must match the shape of weights[0].
    The architecture is such that
    
        padding, input tensor (n_samples, max_length, embedding, 1)
        bool_mask, input (n_samples, max_length-window+1, 1, kernels)
        
        o1 = w1 * padding, dim: (n, max_length-window+1, 1, kernels)
        o1 = o1 x bool_mask, dim: (n, max_length-window+1, 1, kernels)
        o1 = max(o1), dim: (n, kernels)
        o1 = relu(w2 x o1), dim: (n, classes)
        o1 = softmax(o1), dim: (n, classes) 
    """
    padded_data, bm = gen_padding_bm(x,
                                     window=window,
                                     kernels=kernels,
                                     max_len=max_len)
    x1 = tf.cast(padded_data, dtype=tf.float32)  
    x2 = tf.cast(bm, dtype=tf.float32) 
    c1 = tf.nn.conv2d(x1,
                      weights[0], 
                      strides=1,
                      padding='VALID')  # no implicit padding
    depad = tf.multiply(c1, x2)
    max_pool_output = tf.math.reduce_max(depad, 
                                         axis=(1,2))
    return tf.nn.softmax(tf.nn.relu(tf.matmul(max_pool_output, 
                                              weights[1])))

optimizer = tf.optimizers.SGD(learning_rate=0.05, momentum=0)

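# almost but not exactly the same:
# train_step differentiates the vector of per-sample losses, and tape.gradient
# sums the gradients over its entries (gradient of the summed loss);
# train_sum_step differentiates the mean loss (despite its name)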
def train_step(model, inputs, outputs):
    with tf.GradientTape() as tape:
        current_loss = loss(model(inputs), outputs)
    grads = tape.gradient(current_loss, weights)
    optimizer.apply_gradients(zip(grads, weights))
    print(tf.reduce_mean(current_loss))
    
def train_sum_step(model, inputs, outputs):
    with tf.GradientTape() as tape:
        current_loss_sum_ = tf.reduce_mean(loss(model(inputs), outputs))
    grads = tape.gradient(current_loss_sum_, weights)
    optimizer.apply_gradients(zip(grads, weights))
    print(current_loss_sum_)
    
num_epochs = 50

for e in range(num_epochs):
    train_step(model, example, tf.one_hot(np.array([0, 1, 1]), depth=2))
print('wo sum', weights[1])

weights[0].assign(weights_init_npy0)  # reset in place, so the optimizer keeps tracking the same variables
weights[1].assign(weights_init_npy1)
print('initialization', weights[1])

for e in range(num_epochs):
    train_sum_step(model, example, tf.one_hot(np.array([0, 1, 1]), depth=2))
print('w sum', weights[1])

weights[0].assign(weights_init_npy0)  # reset in place, so the optimizer keeps tracking the same variables
weights[1].assign(weights_init_npy1)
print('initialization', weights[1])

for e in range(num_epochs):
    train_step(model, example, tf.one_hot(np.array([0, 1, 1]), depth=2))
print('wo sum_ 2nd time', weights[1])
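
The shape bookkeeping and the masked max-pooling described in the docstring can be checked on one filter with plain Python. This is a sketch, not the TensorFlow code above: the helper names and the toy numbers are made up, and it assumes that the mask keeps every window lying entirely on real words.

```python
def n_conv_outputs(length, window):
    """Number of outputs of a 'valid' convolution with stride 1."""
    return max(length - window + 1, 0)

def masked_max(conv_out, n_words, window):
    """Multiply the outputs by a 0/1 mask, then take the max over positions."""
    n_valid = min(n_conv_outputs(n_words, window), len(conv_out))
    mask = [1.0] * n_valid + [0.0] * (len(conv_out) - n_valid)
    return max(o * m for o, m in zip(conv_out, mask))

# toy case: max_len=10, window=5 -> 6 outputs; the sample has 7 real words -> 3 valid outputs
conv_out = [0.2, 1.5, 0.3, 9.0, 9.0, 9.0]  # the last three come from padded positions
pooled = masked_max(conv_out, n_words=7, window=5)
print(n_conv_outputs(10, 5), pooled)
```

Because masked positions contribute zeros, the pooled value can never be negative, which already acts like a ReLU on the pooled feature.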

In [None]:
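def train_step_epoch(n_samples, batch_size, outputs=None):
    """ Generate the index batches for one epoch.
    Each sample is used at most once per epoch; the samples that do not
    fill the last batch are dropped. The call to the training step on
    each batch is still to be added. """
    batches_per_epoch = n_samples // batch_size
    shuffled = np.random.permutation(n_samples)[:batches_per_epoch * batch_size]
    id_batches = shuffled.reshape(batches_per_epoch, batch_size)
    for id_ in id_batches:
        print(id_)
    return id_batches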

In [3]:
lst = [[1,2,3,4,5],[102,4,5,3],[2,4,5]]
idx_ = np.array([1,2])

In [None]:
np.random.choice([0, 1], size=10)

In [None]:
N = 10 
b = 2 

X = np.random.randn(N, 3)
X

In [None]:
id_batches = train_step_epoch(10, 2)

In [None]:
X.shape

In [None]:
X[id_batches[0]]

In [None]:
from tensorflow.keras import Sequential

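class CNN_text:
    """ Shallow CNN for text classification, with a boolean mask on the
    convolution outputs to discard the padded positions. """
    def __init__(self,
                 window_size,
                 embedding,
                 kernels,
                 output_classes,
                 max_length):
        """
        :param window_size: int, number of words involved in the convolution
        :param embedding: int, dimensionality of the word embedding
        :param kernels: int, number of filters
        :param output_classes: int, number of classes
        :param max_length: int, maximum number of words per sample
        """
        self.window_size = window_size
        self.embedding = embedding
        self.kernels = kernels
        self.output_classes = output_classes
        self.max_length = max_length
    
        self.shapes = [[self.window_size, self.embedding, 1, self.kernels],
                       [self.kernels, self.output_classes]]
        
        self.weights = [get_weight(self.shapes[i], 'weight{}'.format(i)) 
                        for i in range(len(self.shapes))]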
        
    def _gen_padding_bm(self, data, window, kernels, max_len):
        """ Here we create the input of the convolutional net. 
        There are two matrices which are passed as input,
            the data, padded in such a way to have the shape of a tensor
            the boolean mask, which is needed to reduce edge effect
        :param data: this is a list, containing elements which are np.arrays
        of dimension (number_of_words, size_of_the_embedding)
        :param window: dimensionality of the kernel, number of words involved in the conv
        :param kernels: number of filters
        :param max_len: maximum dimension of the input
        :returns data_padded: np.array of dimension (n_samples, max_len, emb, 1),
        with 1 corresponding to the number of channels
        :returns bool_mask: np.array of dimension (n_samples, max_len-window+1, 1, kernels),
        1 corresponds to the features we want to keep
        """
        n_samples = len(data)   # number of examples
        _, emb = data[0].shape  # dimensionality of the embedding
        data_padded = np.zeros((n_samples, max_len, emb, 1))  # uniform dim
        bool_mask = np.zeros((n_samples, max_len-window+1, 1, kernels))  # bool mask
        id_before_padding = [x_.shape[0] for x_ in data]
        for i_, (n_w_, x_) in enumerate(zip(id_before_padding, data)):
            data_padded[i_, :n_w_, :, :] = x_.reshape(n_w_, -1, 1)
            # keep only the windows that lie entirely on real words, to reduce edge effects;
            # max(..., 0) avoids a negative slice bound when the sample is shorter than the window
            n_valid_ = max(n_w_ - window + 1, 0)
            bool_mask[i_, :n_valid_, 0, :] = 1
        return data_padded, bool_mask
 
    def generate(self, x):
        """ Here we generate the shallow CNN for text classification
        as in the work of Yoon Kim

                    https://arxiv.org/pdf/1408.5882.pdf

        We first reduce the dataset to a tensor of fixed dimensions.
        We moreover generate a boolean array so as to discard edge effects.
        The architecture is such that

            padding, input tensor (n_samples, max_length, embedding, 1)
            bool_mask, input (n_samples, max_length-window+1, 1, kernels)

            o1 = w1 * padding, dim: (n, max_length-window+1, 1, kernels)
            o1 = o1 x bool_mask, dim: (n, max_length-window+1, 1, kernels)
            o1 = max(o1), dim: (n, kernels)
            o1 = w2 x o1, dim: (n, classes)
            o1 = softmax(o1), dim: (n, classes)
        """
        padded_data, bm = self._gen_padding_bm(x,
                                               window=self.window_size,
                                               kernels=self.kernels,
                                               max_len=self.max_length)
        x1 = tf.cast(padded_data, dtype=tf.float32)
        x2 = tf.cast(bm, dtype=tf.float32)
        # explicit zero padding on every axis, i.e. a 'VALID' convolution
        c1 = tf.nn.conv2d(x1,
                          self.weights[0],
                          strides=1,
                          padding=[[0, 0],
                                   [0, 0],
                                   [0, 0],
                                   [0, 0]])
        depad = tf.multiply(c1, x2)  # zero out positions overlapping the padding
        max_pool_output = tf.math.reduce_max(depad,
                                             axis=(1, 2))
        return tf.nn.softmax(tf.nn.relu(tf.matmul(max_pool_output,
                                                  self.weights[1])))

    def loss(pred, target):
        return tf.losses.categorical_crossentropy(target, pred)
    
    def get_weight(shape, name):
        """ Weights initializer.
        :param shape: the shape of the model, list of dimensions for the layer
        :param name: str, layer's name
        """
        initializer = tf.initializers.glorot_uniform()
        return tf.Variable(initializer(shape), name=name, trainable=True, dtype=tf.float32)

    def train_step(model, inputs, outputs):
        with tf.GradientTape() as tape:
            current_loss = loss(model(inputs), outputs)
        grads = tape.gradient(current_loss, weights)
        optimizer.apply_gradients(zip(grads, weights))
        print(tf.reduce_mean(current_loss))
    
    def train_sum_step(model, inputs, outputs):
        with tf.GradientTape() as tape:
            current_loss_sum_ = tf.reduce_mean(loss(model(inputs), outputs))
        grads = tape.gradient(current_loss_sum_, weights)
        optimizer.apply_gradients(zip(grads, weights))
        print(current_loss_sum_)
        
    def learning_on_plateau():
        pass  # placeholder, not implemented yet

    def early_stopping():
        pass  # placeholder, not implemented yet

    num_epochs = 50
    for e in range(num_epochs):
        train_step(model, example, tf.one_hot(np.array([0, 1, 1]), depth=2))
    print('wo sum', weights[1])

    weights[0] = tf.Variable(weights_init_npy0)
    weights[1] = tf.Variable(weights_init_npy1)
    print('initialization', weights[1])

    for e in range(num_epochs):
        train_sum_step(model, example, tf.one_hot(np.array([0, 1, 1]), depth=2))
    print('w sum', weights[1])
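
The docstrings in the cell above describe the forward pass as: pad, convolve over word windows, discard positions that overlap the padding, then take the maximum over time. A rough NumPy sketch of those steps for a single window size, with no TensorFlow involved. The names `pad_and_mask` and `conv_max_over_time` are hypothetical. Two choices here are assumptions rather than the notebook's behaviour: the mask keeps all `n_words - window + 1` fully valid positions, and masked positions are set to `-inf` instead of being multiplied by zero, so they can never win the max.

```python
import numpy as np

def pad_and_mask(data, window, max_len):
    """Pad variable-length (n_words, emb) arrays and mark valid conv positions."""
    n_samples = len(data)
    emb = data[0].shape[1]
    padded = np.zeros((n_samples, max_len, emb))
    mask = np.zeros((n_samples, max_len - window + 1), dtype=bool)
    for i, x in enumerate(data):
        n_words = x.shape[0]
        padded[i, :n_words] = x
        # windows that lie entirely inside the sentence
        mask[i, :max(n_words - window + 1, 0)] = True
    return padded, mask

def conv_max_over_time(padded, mask, kernel):
    """kernel has shape (window, emb, n_kernels); returns (n_samples, n_kernels)."""
    window = kernel.shape[0]
    n_pos = padded.shape[1] - window + 1
    # feature map with one value per sample, window position and kernel
    fmap = np.stack(
        [np.tensordot(padded[:, t:t + window], kernel, axes=([1, 2], [0, 1]))
         for t in range(n_pos)],
        axis=1)
    # positions overlapping the padding are excluded from the max
    fmap = np.where(mask[:, :, None], fmap, -np.inf)
    return fmap.max(axis=1)

rng = np.random.default_rng(0)
data = [rng.standard_normal((5, 4)), rng.standard_normal((3, 4))]
padded, mask = pad_and_mask(data, window=3, max_len=6)
kernel = rng.standard_normal((3, 4, 2))
features = conv_max_over_time(padded, mask, kernel)
print(features.shape)  # (2, 2)
```

A sentence shorter than the window has no valid position and yields `-inf` features in this sketch, so such inputs would need separate handling.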

In [None]:
import numpy as np
import tensorflow as tf

In [None]:
def get_weight(shape, name):
    """ Weights initializer.

    :param shape: the shape of the model, list of dimensions for the layer
    :param name: str, layer's name

    :returns tf.Variable: a tensor object corresponding to a trainable layer
    """
    initializer = tf.initializers.glorot_uniform()
    return tf.Variable(initializer(shape),
                       name=name,
                       trainable=True,
                       dtype=tf.float32)

def _gen_padding_bm(data):
    """ Here we create the input of the convolutional net.
    The data are zero-padded so that every example has the same number
    of words and the batch can be stored as a single tensor.
    The maximum length is computed over the examples passed in.

    :param data: a list whose elements are np.arrays
    of dimension (number_of_words, size_of_the_embedding)

    :returns data_padded: np.array of dimension (n_samples, max_len, emb, 1),
    with 1 corresponding to the number of channels
    """
    n_samples = len(data)   # number of examples
    _, emb = data[0].shape  # dimensionality of the embedding
    max_length = np.max(np.array([d_.shape[0] for d_ in data]))
    data_padded = np.zeros((n_samples, max_length, emb, 1))  # uniform dim
    id_before_padding = [x_.shape[0] for x_ in data]

    for i_, (n_w_, x_) in enumerate(zip(id_before_padding, data)):
        data_padded[i_, :n_w_, :, :] = x_.reshape(n_w_, -1, 1)
        
    return data_padded  

def train_batch_step(padded_data, outputs):
    """
    Here we train on a single batch

    :param padded_data: np.array of uniform dimensions containing the embedding
    :param outputs: output labels, as integer class ids

    :return: loss value over single batch
    """
    with tf.GradientTape() as tape:
        # categorical_crossentropy expects (y_true, y_pred), in that order;
        # the data are already padded, so the model must not pad them again
        current_loss_sum_ = tf.reduce_mean(
            tf.losses.categorical_crossentropy(tf.one_hot(outputs, depth=2),
                                               model(padded_data, padded=True)))
    grads = tape.gradient(current_loss_sum_, weights + bias)
    optimizer.apply_gradients(zip(grads, weights + bias))
    return current_loss_sum_
    
def train_epoch_step(inputs,
                     outputs,
                     batch_size):
    """ Generate the batches and call the train_batch_step function.

    :param inputs: a list of arrays with a non-uniform number of words
    :param outputs: np.array with the target labels
    :param batch_size: number of examples per batch

    :return [loss, accuracy]: loss and accuracy on the training set
    """
    n_samples = len(inputs)  # this is a list
    batches_per_epoch = int(np.floor(n_samples / batch_size))
    # np.random.choice samples with replacement by default
    id_batches = np.random.choice(np.arange(n_samples),
                                  size=(batches_per_epoch, batch_size))
    print('number of examples', n_samples)
    for id_ in id_batches:
        tmp_in = _gen_padding_bm([inputs[i__] for i__ in id_])
        train_batch_step(tmp_in, outputs[id_])
    # evaluate is not defined in this part of the notebook; it is expected
    # to return the loss and accuracy on the training set
    return evaluate(inputs, outputs)

def model(x, padded=False):
    """ We first reduce the dataset to a tensor of fixed dimensions,
    then apply one convolutional layer per window size.
    The architecture is such that

        padding, input tensor (n_samples, max_length, embedding, 1)

        o_j = w_j * padding, dim: (n, max_length-window_j+1, 1, kernels)
        o_j = relu(o_j + b_j), dim: (n, max_length-window_j+1, 1, kernels)
        o_j = max(o_j), dim: (n, kernels)
        o1 = concat(o_j), dim: (n, kernels * number of windows)
        o1 = w_last x o1, dim: (n, classes)
        o1 = softmax(o1), dim: (n, classes)

    :param x: input data
    :param padded: bool, if True the data have already been padded

    :returns: class probabilities, the output of the model after applying the softmax function
    """
    max_norm = 3  # bound on the l2 norm of the last-layer weight vectors
    bernoulli = 0.5  # dropout rate
    if not padded:
        padded_data = _gen_padding_bm(x)
    else:
        padded_data = x

    x1 = tf.cast(padded_data, dtype=tf.float32)

    # for all the convolutional layers
    c_lst = [tf.nn.conv2d(x1,
                          weights[j_],
                          strides=1,
                          padding=[[0, 0],
                                   [0, 0],
                                   [0, 0],
                                   [0, 0]])
             for j_ in range(len(weights)-1)]
    print([c_.shape for c_ in c_lst])

    depad_lst = [tf.nn.relu(c_ + tf.constant(np.ones((c_.shape[0], c_.shape[1], c_.shape[2], 1),
                                                      dtype=np.float32)) * b_)
                 for (c_, b_) in zip(c_lst, bias[:-1])]
    print([d_.shape for d_ in depad_lst])

    max_pool_output = [tf.math.reduce_max(depad_, axis=(1, 2)) for depad_ in depad_lst]
    print([m_pool.shape for m_pool in max_pool_output])
    stack_output = tf.concat(max_pool_output, axis=-1)
    
    # max-norm constraint: rescale the columns of the last layer whose
    # l2 norm exceeds max_norm, without overwriting the tf.Variable
    norm_factor = max_norm / tf.maximum(max_norm, tf.norm(weights[-1], axis=0))
    weights_ = tf.math.multiply(weights[-1], tf.expand_dims(norm_factor, axis=0))
    lst_layer = tf.matmul(tf.nn.dropout(stack_output, rate=bernoulli), weights_)

    print('shape weights', weights_.shape)

    # use the dropout output computed above instead of recomputing the product
    return tf.nn.softmax(tf.nn.relu(lst_layer + bias[-1]))


window_size = [3, 4, 5]
embedding = 15
kernels = 100       # kernels for each convolutional filter?
output_classes = 2  # task of sentiment analysis
lr = 0.2

shapes_conv_layer = [[w_, embedding, 1, kernels] for w_ in window_size]
shapes_last_layer = [[kernels * len(window_size), output_classes]]
shapes = shapes_conv_layer + shapes_last_layer

weights = [get_weight(shapes[i], 'weight{}'.format(i))
           for i in range(len(shapes))]

bias_shapes = [[1, kernels] for w_ in range(len(window_size))] + [[1, output_classes]]
print(bias_shapes)
bias = [tf.Variable(np.zeros((b_[0], b_[1]), dtype=np.float32))
        for b_ in bias_shapes]

In [4]:
xx = [[1,2,3,4,5], [4,3,2], [39,3,3,2,1]]
[xx[i] for i in np.array([0,2])]

[[1, 2, 3, 4, 5], [39, 3, 3, 2, 1]]

In [None]:
i = 0  # index of the example to inspect (assumed; not set in the original cell)
print(xx[i])
tf.minimum(3, tf.Variable(xx[i]))

In [25]:
# nine random examples of shape (50, 100), with a trailing channel axis
dataset = np.random.randn(9, 50, 100, 1)

In [15]:
x = tf.Variable(3 * np.ones(3, dtype=np.int64))
y = tf.Variable(np.array([[1,2,3],
                          [1,0,0]]))

print(x)
print(y)

out_ = tf.math.multiply(tf.expand_dims(x, axis=0), y)

print(out_)

<tf.Variable 'Variable:0' shape=(3,) dtype=int64, numpy=array([3, 3, 3])>
<tf.Variable 'Variable:0' shape=(2, 3) dtype=int64, numpy=
array([[1, 2, 3],
       [1, 0, 0]])>
tf.Tensor(
[[3 6 9]
 [3 0 0]], shape=(2, 3), dtype=int64)


In [23]:
print(out_[0])

print(tf.norm(tf.cast(out_, dtype=tf.float32), axis=0))

3 / tf.maximum(3, tf.norm(tf.cast(out_, dtype=tf.float32), axis=0))

tf.Tensor([3 6 9], shape=(3,), dtype=int64)
tf.Tensor([4.2426405 6.        9.       ], shape=(3,), dtype=float32)


<tf.Tensor: id=239, shape=(3,), dtype=float32, numpy=array([0.7071068 , 0.5       , 0.33333334], dtype=float32)>

In [None]:
max_norm = 3
# tf.norm needs a floating-point tensor, so cast the int64 row first
norm_weights = max_norm / tf.maximum(max_norm, tf.norm(tf.cast(out_[0], dtype=tf.float32)))

out_
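
The cells above work toward a max-norm constraint: each column of a weight matrix is rescaled so that its L2 norm does not exceed `max_norm`, while columns already inside the bound are left untouched. A minimal NumPy sketch of that rescaling, assuming the norm is taken per column (`axis=0`). The name `clip_column_norms` is illustrative, and the matrix reuses the values of `out_` from the earlier cell.

```python
import numpy as np

def clip_column_norms(w, max_norm):
    """Rescale the columns of w whose L2 norm exceeds max_norm."""
    norms = np.linalg.norm(w, axis=0)
    # the factor is 1 for columns within the bound, max_norm / norm otherwise
    factor = max_norm / np.maximum(max_norm, norms)
    return w * factor[np.newaxis, :]

w = np.array([[3.0, 6.0, 9.0],
              [3.0, 0.0, 0.0]])
clipped = clip_column_norms(w, max_norm=3.0)
print(np.linalg.norm(clipped, axis=0))

w_small = np.array([[1.0], [1.0]])
unchanged = clip_column_norms(w_small, max_norm=3.0)
```

Because the factor is computed with `maximum`, no branching on a boolean mask is needed and the operation stays differentiable when written with TensorFlow ops.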

In [None]:
tf.math.multiply(tf.Variable(np.ones(3)), tf.Variable(np.ones(3)))

(9, 50, 100, 1)

In [68]:
window_size = [3, 4, 5]
embedding = 100
kernels = 10   # kernels for each convolutional filter?
output_classes = 2  # task of sentiment analysis
lr = 0.5
max_norm = 3

def get_weight(shape, name):
    """ Weights initializer.

    :param shape: the shape of the model, list of dimensions for the layer
    :param name: str, layer's name

    :returns tf.Variable: a tensor object corresponding to a trainable layer
    """
    initializer = tf.initializers.glorot_uniform()
    return tf.Variable(initializer(shape),
                       name=name,
                       trainable=True,
                       dtype=tf.float32)
    
shapes_conv_layer = [[w_, embedding, 1, kernels] for w_ in window_size]
shapes_last_layer = [[kernels * len(window_size), output_classes]]
shapes = shapes_conv_layer + shapes_last_layer

# for i in range(len(shapes)):
#     print(get_weight)
    
weights = [get_weight(shapes[i], name='weight{}'.format(i))
           for i in range(len(shapes))]

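# one bias per kernel, shaped (1, kernels) so that it broadcasts over the
# (n, positions, 1, kernels) feature maps without adding an extra axis
bias_shapes = [[1, kernels] for _ in range(len(window_size))] + [[1, output_classes]]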
bias = [tf.Variable(np.ones((b_[0], b_[1]), dtype=np.float32))
        for b_ in bias_shapes]

optimizer = tf.optimizers.SGD(learning_rate=lr, momentum=0.)

padded_data = tf.constant(dataset)
max_norm = 3
bernoulli = 0.5
x1 = tf.cast(padded_data, dtype=tf.float32)
c_lst = [tf.nn.conv2d(x1,
                      weights[j_],
                      strides=1,
                      padding=[[0, 0],
                               [0, 0],
                               [0, 0],
                               [0, 0]])
         for j_ in range(len(weights)-1)]
    
# depad_lst = [tf.nn.relu(c_ + tf.constant(np.ones((c_.shape[0], c_.shape[1], c_.shape[2], 1),
#                                                   dtype=np.float32)) * b_)
#              for (c_, b_) in zip(c_lst, bias[:-1])]

depad_lst = [tf.nn.relu(c_ + b_)
             for (c_, b_) in zip(c_lst, bias[:-1])]

    
max_pool_output = [tf.math.reduce_max(depad_, axis=(1, 2)) for depad_ in depad_lst]
stack_output = tf.concat(max_pool_output, axis=-1)  # max pooling for all filters
norm_factor = max_norm / tf.maximum(max_norm, tf.norm(weights[-1], axis=0))
weights_ = tf.math.multiply(weights[-1], tf.expand_dims(norm_factor, axis=0))

tf.nn.softmax(tf.nn.relu(tf.matmul(tf.nn.dropout(stack_output, rate=bernoulli),
                                   weights_)
                         + bias[-1]))

shape stack:  (9, 30)
stack output tf.Tensor(
[[2.332398  1.8630922 2.1214435 1.8010736 1.4867631 2.0538683 1.9626138
  1.7879187 1.8524877 2.2088437 1.8257742 1.816961  1.4248927 1.8328207
  2.1272378 2.4448812 1.9019835 1.7031076 2.0582237 1.886518  2.1893208
  1.5831802 2.2825003 1.9813423 1.7777984 1.8720455 2.2496622 1.8453422
  1.8104869 2.0029702]
 [1.598913  2.4138987 2.703781  1.7718315 1.8488879 1.6589468 1.5838578
  1.6436236 1.7747483 1.7426593 2.1603308 1.9582219 1.9306952 2.0486352
  1.9146405 1.8834946 2.2700248 1.9075313 1.6279695 1.9739599 1.9781665
  1.9543726 1.6457492 2.1369655 1.9467411 1.5701345 1.9562988 1.9603231
  1.5837873 1.8780863]
 [1.6130258 1.8430867 2.0311074 2.0157812 1.8900421 2.124814  1.7894207
  1.7446111 1.8666575 2.229631  1.9988213 1.7887042 2.0624864 1.8043039
  2.200151  2.2494295 1.6745272 2.0546455 1.8889694 1.7302876 1.9787111
  1.8213413 1.6300561 2.1897554 1.7821853 1.6300544 1.7443308 1.948788
  2.0612752 1.820596 ]
 [2.1405773 1.9990156 

<tf.Tensor: id=3802, shape=(9, 2), dtype=float32, numpy=
array([[9.9999845e-01, 1.5116467e-06],
       [8.3569753e-01, 1.6430253e-01],
       [9.9996018e-01, 3.9857379e-05],
       [9.9985492e-01, 1.4506295e-04],
       [9.9850285e-01, 1.4971243e-03],
       [9.9315095e-01, 6.8490040e-03],
       [9.4597942e-01, 5.4020543e-02],
       [5.0000000e-01, 5.0000000e-01],
       [9.9999130e-01, 8.6581103e-06]], dtype=float32)>
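
`tf.nn.dropout`, used on the pooled features above, zeroes each element with probability `rate` and scales the surviving elements by `1 / (1 - rate)`, so the expected activation is unchanged. A small NumPy sketch of that behaviour, under the assumption that an independent Bernoulli draw is made per element. The function name `inverted_dropout` is hypothetical.

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Zero each element with probability `rate`, scale the rest by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
features = np.ones((9, 30))  # same shape as the stacked max-pool output
dropped = inverted_dropout(features, rate=0.5, rng=rng)
print(np.unique(dropped))
```

Since the mask is random, calling the model twice on the same input gives different probabilities unless dropout is switched off at evaluation time.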

In [57]:
weights[-1]

<tf.Tensor: id=2958, shape=(2,), dtype=float32, numpy=array([-0.34225303,  0.41425046], dtype=float32)>

In [45]:
y = x
print(x)
print(y)

print(x * y)

<tf.Variable 'Variable:0' shape=(3,) dtype=int64, numpy=array([3, 3, 3])>
<tf.Variable 'Variable:0' shape=(3,) dtype=int64, numpy=array([3, 3, 3])>
tf.Tensor([9 9 9], shape=(3,), dtype=int64)


In [65]:
xxx = weights[-1].numpy()
print(xxx)
np.linalg.norm(xxx, axis=0)

[[ 0.00204888  0.03794369]
 [-0.17463446  0.32155117]
 [-0.19833496 -0.43216068]
 [ 0.35364744  0.12675437]
 [-0.35755625 -0.00674838]
 [-0.2035008  -0.05474293]
 [ 0.18332669  0.38334343]
 [ 0.40378729  0.40338716]
 [-0.28325248 -0.41558236]
 [-0.32761064 -0.3344826 ]
 [-0.10421717  0.17281076]
 [-0.07968849 -0.3115092 ]
 [ 0.32004532 -0.0669494 ]
 [ 0.41113195  0.14366165]
 [ 0.23162362 -0.02027196]
 [ 0.38439593 -0.1889917 ]
 [ 0.39039436 -0.3733074 ]
 [ 0.13598052 -0.36755043]
 [ 0.2852355  -0.10046044]
 [-0.01449642 -0.33549652]
 [ 0.13937643 -0.05323243]
 [ 0.31090876  0.09859082]
 [-0.2559731  -0.21248531]
 [-0.31904972  0.40190503]
 [ 0.31203404 -0.11498028]
 [-0.14575353 -0.01189315]
 [-0.03977287  0.06153083]
 [-0.08846799  0.0925059 ]
 [ 0.34201983  0.13616666]
 [-0.42916086  0.42607382]]


array([1.4844205, 1.3911282], dtype=float32)

In [70]:
?? tf.optimizers.Adadelta