<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# NLP Basics

**Prediction of Text**

&copy; Dr. Yves J. Hilpisch

<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:team@tpq.io">team@tpq.io</a>

## Imports

In [None]:
!git clone https://github.com/tpq-classes/natural_language_processing.git
import sys
sys.path.append('natural_language_processing')


In [None]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, SimpleRNN, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import StandardScaler, OneHotEncoder

In [None]:
np.set_printoptions(suppress=True)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

In [None]:
import warnings
warnings.simplefilter('ignore')

In [None]:
from pylab import plt
plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'

## The Text

In [None]:
text = 'this is a short sentence. this is another one. and yet another one.'

In [None]:
text = '''So much for a blind obedience to a blundering oracle, throwing the
stones over their heads behind them, and not seeing where they fell.
Most men, even in this comparatively free country, through mere
ignorance and mistake, are so occupied with the factitious cares and
superfluously coarse labors of life that its finer fruits cannot be
plucked by them. Their fingers, from excessive toil, are too clumsy
and tremble too much for that. Actually, the laboring man has not
leisure for a true integrity day by day; he cannot afford to sustain
the manliest relations to men; his labor would be depreciated in the
market. He has no time to be anything but a machine.'''

In [None]:
text = text.lower().replace('\n', ' ')

In [None]:
text

In [None]:
length = 10

In [None]:
snippets = list()
next_chars = list()

In [None]:
for i in range(len(text) - length):
    snippets.append(text[i:i + length])
    next_chars.append(text[i + length])

In [None]:
snippets[:5]

In [None]:
next_chars[:5]
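
The loop above slides a window of `length` characters over the text and stores the character that follows each window. A minimal sketch of the same idea on a toy string, assuming a hypothetical helper `make_windows` and `toy_` names that are not part of the notebook:

```python
def make_windows(text, length):
    """Return (snippets, next_chars) for a sliding window of `length` characters."""
    n = len(text) - length  # number of windows that still have a following character
    snippets = [text[i:i + length] for i in range(n)]
    next_chars = [text[i + length] for i in range(n)]
    return snippets, next_chars


# toy input, chosen only for illustration
toy_snippets, toy_next = make_windows('hello world', 4)
print(toy_snippets[:3])
print(toy_next[:3])
```

Each snippet becomes one input sample and the matching entry of `toy_next` becomes its label.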

In [None]:
chars = sorted(set(text))
chars[:10]

In [None]:
cti = {c: i for i, c in enumerate(chars)}

In [None]:
itc = {i: c for i, c in enumerate(chars)}
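
The two dictionaries are inverses of each other: `cti` maps characters to integer codes and `itc` maps the codes back. A small round-trip sketch on a toy string (the `toy_` names are assumptions made for illustration only):

```python
toy_text = 'abca'
toy_chars = sorted(set(toy_text))                  # unique characters in sorted order
toy_cti = {c: i for i, c in enumerate(toy_chars)}  # character -> integer code
toy_itc = {i: c for c, i in toy_cti.items()}       # integer code -> character

encoded = [toy_cti[c] for c in toy_text]           # text as a list of integer codes
decoded = ''.join(toy_itc[i] for i in encoded)     # codes mapped back to text
print(encoded, decoded)
```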

In [None]:
X = list()
for s in snippets:
    il = list()
    for c in s:
        il.append(cti[c])
    X.append(il)
X = np.array(X)

In [None]:
X[:5]

In [None]:
y = np.array([cti[c] for c in next_chars])

In [None]:
y[:5]

## RNNs for Estimation

In [None]:
tf.random.set_seed(100)

In [None]:
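model = Sequential()
# the recurrent layer reads each snippet as `length` time steps with one feature
model.add(SimpleRNN(24, activation='relu',
                    input_shape=(length, 1)))
# single linear output: the integer code of the next character as a regression target
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer=Adam())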

In [None]:
%time model.fit(X, y, epochs=2500, verbose=False)

In [None]:
model.predict(X)[:5]

In [None]:
y[:5]

In [None]:
p = model.predict(X).round().flatten()
p[:5]

In [None]:
len(chars)

In [None]:
# clamp the rounded predictions to valid codes and cast to int for the lookup
tp = [itc[int(min(max(i, 0), len(chars) - 1))] for i in p]
textp = ''.join(tp)
textp
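
Because the estimation model returns continuous values, its predictions have to be rounded and clamped to the valid range of character codes before they can be decoded. A vectorized sketch of that step with NumPy, assuming made-up prediction values and a hypothetical vocabulary size `n_chars`:

```python
import numpy as np

raw = np.array([-0.7, 3.4, 11.6, 30.2])  # made-up continuous model outputs
n_chars = 29                             # hypothetical vocabulary size

# round to the nearest integer, then clamp to the valid index range [0, n_chars - 1]
idx = np.clip(np.rint(raw), 0, n_chars - 1).astype(int)
print(idx)
```

Values outside the range are mapped to the first or last character code, which is one reason why a regression on integer codes is a crude way to predict characters.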

## RNNs for Classification

In [None]:
encoder = OneHotEncoder(sparse_output=False)

In [None]:
y_ = encoder.fit_transform(y.reshape(-1, 1))

In [None]:
len(chars)

In [None]:
y_.shape
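
One-hot encoding turns each integer label into a vector with a single 1 at the position of the label, and `argmax` recovers the label from such a vector. A NumPy-only sketch on toy labels (the `toy_` names and `n_classes` are assumptions for illustration; the notebook itself uses scikit-learn's `OneHotEncoder`, which infers the categories from the values present in `y`):

```python
import numpy as np

toy_y = np.array([2, 0, 1, 2])      # toy integer labels
n_classes = 3                       # assumed number of distinct labels

toy_onehot = np.eye(n_classes)[toy_y]    # row i of the identity matrix encodes label i
recovered = toy_onehot.argmax(axis=1)    # position of the 1 gives the label back
print(toy_onehot)
print(recovered)
```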

In [None]:
model = Sequential()
model.add(LSTM(256, activation='relu', input_shape=(length, 1)))
# one softmax output per character: a probability distribution over the next character
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam())

In [None]:
%time model.fit(X, y_, epochs=500, verbose=False)

In [None]:
model.predict(X)[:1]

In [None]:
p = np.argmax(model.predict(X), axis=1)
p[:10]

In [None]:
tp = [itc[max(i, 0)] for i in p]
textp = ''.join(tp)
textp

In [None]:
# in-sample accuracy: share of characters predicted correctly
sum([text[length:][i] == textp[i] for i in range(len(textp))]) / len(textp)
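
The ratio above is the share of positions at which the predicted character equals the actual one. The same calculation as a small reusable function, sketched under the assumption that both strings have equal length (`char_accuracy` is a hypothetical helper, not part of the notebook):

```python
def char_accuracy(actual, predicted):
    """Share of positions at which the two strings agree."""
    if len(actual) != len(predicted):
        raise ValueError('strings must have the same length')
    if not actual:
        return 0.0  # convention chosen here for empty input
    hits = sum(a == b for a, b in zip(actual, predicted))
    return hits / len(actual)


toy_accuracy = char_accuracy('hello', 'hallo')  # 4 of 5 characters match
print(toy_accuracy)
```

Since the model is evaluated on the same snippets it was trained on, such a figure measures memorization of the text and not out-of-sample performance.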
