<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# NLP Basics

**Prediction of Text (based on Words)**

&copy; Dr. Yves J. Hilpisch

<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:team@tpq.io">team@tpq.io</a>

## Imports

In [None]:
!git clone https://github.com/tpq-classes/natural_language_processing.git
import sys
sys.path.append('natural_language_processing')


In [None]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, SimpleRNN, LSTM
from tensorflow.keras.models import Sequential
# non-legacy Adam; the `legacy` namespace is not available in newer Keras versions
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import StandardScaler, OneHotEncoder

In [None]:
np.set_printoptions(suppress=True)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

In [None]:
import warnings
warnings.simplefilter('ignore')

In [None]:
from pylab import plt
plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'

## The Text

In [None]:
# short toy text; it is overwritten by the longer excerpt in the next cell
text = 'this is a short sentence. this is another one. and yet another one. this is a'

In [None]:
text = '''So much for a blind obedience to a blundering oracle, throwing the
stones over their heads behind them, and not seeing where they fell.
Most men, even in this comparatively free country, through mere
ignorance and mistake, are so occupied with the factitious cares and
superfluously coarse labors of life that its finer fruits cannot be
plucked by them. Their fingers, from excessive toil, are too clumsy
and tremble too much for that. Actually, the laboring man has not
leisure for a true integrity day by day; he cannot afford to sustain
the manliest relations to men; his labor would be depreciated in the
market. He has no time to be anything but a machine. so much for'''

In [None]:
text += ''' I sometimes wonder that we can be so frivolous, I may almost
say, as to attend to the gross but somewhat foreign form of
servitude called Negro Slavery, there are so many keen and subtle
masters that enslave both North and South. It is hard to have a
Southern overseer; it is worse to have a Northern one; but worst of
all when you are the slave-driver of yourself. Talk of a divinity
in man! Look at the teamster on the highway, wending to market by
day or night; does any divinity stir within him? His highest duty
to fodder and water his horses! What is his destiny to him compared
with the shipping interests? Does not he drive for Squire
Make-a-stir? How godlike, how immortal, is he? See how he cowers
and sneaks, how vaguely all the day he fears, not being immortal nor
divine, but the slave and prisoner of his own opinion of himself, a
fame won by his own deeds. Public opinion is a weak tyrant compared
with our own private opinion. What a man thinks of himself, that it
is which determines, or rather indicates, his fate.
Self-emancipation even in the West Indian provinces of the fancy and
imagination -- what Wilberforce is there to bring that about?
Think, also, of the ladies of the land weaving toilet cushions
against the last day, not to betray too green an interest in their
fates! As if you could kill time without injuring eternity.
The mass of men lead lives of quiet desperation. What is called
resignation is confirmed desperation. From the desperate city you
go into the desperate country, and have to console yourself with the
bravery of minks and muskrats. A stereotyped but unconscious
despair is concealed even under what are called the games and
amusements of mankind. There is no play in them, for this comes
after work.'''

In [None]:
text += ''' For many years I was self-appointed inspector of snow-storms and
rain-storms, and did my duty faithfully; surveyor, if not of
highways, then of forest paths and all across-lot routes, keeping
them open, and ravines bridged and passable at all seasons, where
the public heel had testified to their utility.
I have looked after the wild stock of the town, which give a
faithful herdsman a good deal of trouble by leaping fences; and I
have had an eye to the unfrequented nooks and corners of the farm;
though I did not always know whether Jonas or Solomon worked in a
particular field to-day; that was none of my business. I have
watered the red huckleberry, the sand cherry and the nettle-tree,
the red pine and the black ash, the white grape and the yellow
violet, which might have withered else in dry seasons.
In short, I went on thus for a long time (I may say it without
boasting), faithfully minding my business, till it became more and
more evident that my townsmen would not after all admit me into the
list of town officers, nor make my place a sinecure with a moderate
allowance. My accounts, which I can swear to have kept faithfully,
I have, indeed, never got audited, still less accepted, still less
paid and settled. However, I have not set my heart on that.
Not long since, a strolling Indian went to sell baskets at the
house of a well-known lawyer in my neighborhood. "Do you wish to
buy any baskets?" he asked. "No, we do not want any," was the
reply. "What!" exclaimed the Indian as he went out the gate, "do
you mean to starve us?"'''

In [None]:
text = text.lower().replace('\n', ' ')

In [None]:
text

In [None]:
text_ = text.split()

In [None]:
text_[:7]

In [None]:
length = 3  # number of words in each input snippet

In [None]:
snippets = list()
next_words = list()

In [None]:
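# slide a window of `length` words over the text
for i in range(len(text_) - length):
    snippets.append(text_[i:i + length])  # input: `length` consecutive words
    next_words.append(text_[i + length])  # label: the word that follows them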
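
To see what the loop above produces, the following sketch applies the same sliding window to a short toy sentence. The helper `make_windows` and the `toy_*` names are illustrative assumptions and not part of the original notebook.

```python
def make_windows(words, length):
    """Return (snippets, next_words) for a sliding window of `length` words."""
    snippets, next_words = [], []
    for i in range(len(words) - length):
        snippets.append(words[i:i + length])   # the `length` input words
        next_words.append(words[i + length])   # the word that follows them
    return snippets, next_words

toy = 'this is a short sentence and this is another one'.split()
toy_snippets, toy_next = make_windows(toy, 3)
print(toy_snippets[:2])
print(toy_next[:2])
```

A text of `n` words yields `n - length` pairs, since the last `length` words have no successor to predict.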

In [None]:
snippets[:5]

In [None]:
next_words[:5]

In [None]:
tokens = sorted(set(text_))
tokens[:10]

In [None]:
len(tokens)

In [None]:
wti = {c: i for i, c in enumerate(tokens)}

In [None]:
itw = {i: c for i, c in enumerate(tokens)}
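
The two dictionaries are inverses of each other: `wti` maps a word to its index and `itw` maps the index back to the word. A minimal sketch on a toy vocabulary (all `toy_*` names are made up for illustration):

```python
toy_words = 'to be or not to be'.split()
toy_tokens = sorted(set(toy_words))                     # unique words, sorted
toy_wti = {w: i for i, w in enumerate(toy_tokens)}      # word -> index
toy_itw = {i: w for w, i in toy_wti.items()}            # index -> word

encoded = [toy_wti[w] for w in toy_words]
decoded = [toy_itw[i] for i in encoded]
print(encoded)
print(decoded == toy_words)
```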

In [None]:
# translate every word of every snippet into its integer index
X = np.array([[wti[word] for word in s] for s in snippets])

In [None]:
X[:5]

In [None]:
y = np.array([wti[word] for word in next_words])

In [None]:
y[:5]

## RNNs for Classification

In [None]:
encoder = OneHotEncoder(sparse_output=False)

In [None]:
y_ = encoder.fit_transform(y.reshape(-1, 1))

In [None]:
y_.shape
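
As a rough illustration of what the one-hot step produces, the sketch below builds the same kind of matrix with NumPy alone on made-up labels. Note that `OneHotEncoder` creates one column per distinct label it sees in `y`, so its columns line up with the indices in `tokens` only if every word index occurs in `y`. The `np.eye` variant shown here is an alternative under that caveat, not the notebook's method.

```python
import numpy as np

toy_y = np.array([2, 0, 1, 2])          # integer class labels (word indices)
n_classes = int(toy_y.max()) + 1        # assumes labels run from 0 to max
toy_onehot = np.eye(n_classes)[toy_y]   # row i has a single 1 at column toy_y[i]
print(toy_onehot.shape)
print(toy_onehot.argmax(axis=1))        # recovers the original labels
```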

In [None]:
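model = Sequential()
# two stacked LSTM layers; each snippet is read as `length` time steps with one feature
model.add(LSTM(64, activation='relu',
               return_sequences=True, input_shape=(length, 1)))
model.add(LSTM(64, activation='relu'))
# softmax output layer: one probability per word in the vocabulary
model.add(Dense(len(tokens), activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=0.001))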

In [None]:
%time model.fit(X, y_, epochs=750, verbose=False)

In [None]:
model.predict(X)[:1]

In [None]:
p = np.argmax(model.predict(X), axis=1)
p[:10]

In [None]:
tp = [itw[i] for i in p]  # argmax indices are never negative
textp = ' '.join(tp)
textp
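
The decoding step can be followed on a small example. The probability matrix and vocabulary below are hypothetical stand-ins for `model.predict(X)` and `itw`.

```python
import numpy as np

toy_itw = {0: 'be', 1: 'not', 2: 'or', 3: 'to'}   # hypothetical vocabulary
# hypothetical softmax output: one row per sample, one column per word
toy_probs = np.array([[0.1, 0.2, 0.1, 0.6],
                      [0.7, 0.1, 0.1, 0.1],
                      [0.2, 0.2, 0.5, 0.1]])
toy_p = np.argmax(toy_probs, axis=1)              # most likely word index per row
toy_text = ' '.join(toy_itw[int(i)] for i in toy_p)
print(toy_text)  # to be or
```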

In [None]:
# print(textp)

In [None]:
# in-sample accuracy: share of positions where the predicted word equals the actual next word
sum(text_[length:][i] == tp[i] for i in range(len(tp))) / len(tp)
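
The ratio above is an accuracy measured on the same data the model was fitted on, so it says how well the training text was memorized rather than how well the model generalizes. The computation itself can be sketched with NumPy on made-up word lists:

```python
import numpy as np

toy_true = np.array(['a', 'short', 'sentence', 'this'])   # actual next words
toy_pred = np.array(['a', 'long', 'sentence', 'this'])    # predicted next words
toy_acc = np.mean(toy_true == toy_pred)                   # fraction of matches
print(toy_acc)  # 0.75
```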
