<a href="https://colab.research.google.com/github/ucheokechukwu/ml_tensorflow_deeplearning/blob/main/08_introduction_to_nlp_in_tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to NLP fundamentals in TensorFlow

NLP (natural language processing) aims to derive information from natural language, which could be sequences of text or speech.

Another common term for NLP problems is sequence-to-sequence problems (seq2seq).

In [1]:
## check for GPU
!nvidia-smi -L

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



In [2]:
# get helper functions
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
from helper_functions import unzip_data, create_tensorboard_callback, plot_loss_curves, compare_historys

--2023-03-13 17:05:30--  https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10246 (10K) [text/plain]
Saving to: ‘helper_functions.py’


2023-03-13 17:05:30 (31.8 MB/s) - ‘helper_functions.py’ saved [10246/10246]



## Get a text dataset
Kaggle's introduction to NLP dataset. Text samples of tweets labelled as disaster or not disaster. 
- binary classification
https://www.kaggle.com/c/nlp-getting-started

In [3]:
!wget https://storage.googleapis.com/ztm_tf_course/nlp_getting_started.zip
unzip_data("nlp_getting_started.zip")


--2023-03-13 17:05:38--  https://storage.googleapis.com/ztm_tf_course/nlp_getting_started.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 108.177.13.128, 172.217.193.128, 173.194.210.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|108.177.13.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 607343 (593K) [application/zip]
Saving to: ‘nlp_getting_started.zip’


2023-03-13 17:05:38 (68.9 MB/s) - ‘nlp_getting_started.zip’ saved [607343/607343]



## Visualizing a text dataset

To visualize our text samples, we first have to read them in. We can do so using pandas.

In [4]:
import pandas as pd
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
train_df.head()

Unnamed: 0,id,keyword,location,text,target
0,1,,,Our Deeds are the Reason of this #earthquake M...,1
1,4,,,Forest fire near La Ronge Sask. Canada,1
2,5,,,All residents asked to 'shelter in place' are ...,1
3,6,,,"13,000 people receive #wildfires evacuation or...",1
4,7,,,Just got sent this photo from Ruby #Alaska as ...,1


In [5]:
train_df["text"][20]

'this is ridiculous....'

In [6]:
train_df_shuffled = train_df.sample(frac=1, random_state=42)

In [7]:
train_df_shuffled.head()

Unnamed: 0,id,keyword,location,text,target
2644,3796,destruction,,So you have a new weapon that can cause un-ima...,1
2227,3185,deluge,,The f$&amp;@ing things I do for #GISHWHES Just...,0
5448,7769,police,UK,DT @georgegalloway: RT @Galloway4Mayor: ¬â√õ√èThe...,1
132,191,aftershock,,Aftershock back to school kick off was great. ...,0
6845,9810,trauma,"Montgomery County, MD",in response to trauma Children of Addicts deve...,0


In [8]:
# what does the test dataframe look like?
test_df.head()

Unnamed: 0,id,keyword,location,text
0,0,,,Just happened a terrible car crash
1,2,,,"Heard about #earthquake is different cities, s..."
2,3,,,"there is a forest fire at spot pond, geese are..."
3,9,,,Apocalypse lighting. #Spokane #wildfires
4,11,,,Typhoon Soudelor kills 28 in China and Taiwan


In [9]:
# how many examples of each class are there?
train_df.target.value_counts()

0    4342
1    3271
Name: target, dtype: int64

In [10]:
# how many total samples
len(train_df), len(test_df)

(7613, 3263)

In [11]:
# let's visualize some random training examples
import random
random_index = random.randint(0,len(train_df)-5)
for row in train_df_shuffled[["text", "target"]][random_index:random_index+5].itertuples():
  _, text, target = row
  print(f"Target: {target}", "(real disaster)" if target > 0 else "(not real disaster)")
  print(f"Text:\n{text}\n")
  print("---\n")

Target: 0 (not real disaster)
Text:
@CIA hey you guy's i stopped a massacre so you   send the cops to my house to make this town permanently hate me wtf?

---

Target: 0 (not real disaster)
Text:
I think this is my plan for retirement. Check out the weapons of mass instruction! #bookmobile #libraries #reading http://t.co/L2NMywrmq2

---

Target: 0 (not real disaster)
Text:
I feel like death

---

Target: 1 (real disaster)
Text:
Tunisia beach massacre linked to March terror attack on museum http://t.co/kuRqLxFiHL

---

Target: 1 (real disaster)
Text:
#flood #disaster Bengal floods: CM Mamata Banerjee blames DVC BJP claims state failed to use ... - Economic T... http://t.co/BOZlwr716Z

---



### Split data into training and validation sets

In [12]:
from sklearn.model_selection import train_test_split
train_sentences, val_sentences, train_labels, val_labels = train_test_split(train_df_shuffled["text"].to_numpy(),
                                                                             train_df_shuffled["target"].to_numpy(),
                                                                             test_size=0.1,
                                                                             random_state=42)
len(train_sentences), len(val_sentences), len(train_labels), len(val_labels)

(6851, 762, 6851, 762)

In [13]:
# Check the first ten examples
train_sentences[:10], train_labels[:10]

(array(['@mogacola @zamtriossu i screamed after hitting tweet',
        'Imagine getting flattened by Kurt Zouma',
        '@Gurmeetramrahim #MSGDoing111WelfareWorks Green S welfare force ke appx 65000 members har time disaster victim ki help ke liye tyar hai....',
        "@shakjn @C7 @Magnums im shaking in fear he's gonna hack the planet",
        'Somehow find you and I collide http://t.co/Ee8RpOahPk',
        '@EvaHanderek @MarleyKnysh great times until the bus driver held us hostage in the mall parking lot lmfao',
        'destroy the free fandom honestly',
        'Weapons stolen from National Guard Armory in New Albany still missing #Gunsense http://t.co/lKNU8902JE',
        '@wfaaweather Pete when will the heat wave pass? Is it really going to be mid month? Frisco Boy Scouts have a canoe trip in Okla.',
        'Patient-reported outcomes in long-term survivors of metastatic colorectal cancer - British Journal of Surgery http://t.co/5Yl4DC1Tqt'],
       dtype=object), array([0, 

## Converting text into numbers

When dealing with a text problem, one of the first things you need to do is numerically encode the text.

Methods:

1. Tokenization - direct mapping of each token (word or character) to a number, or to a one-hot encoded vector.

2. Embedding - creating a matrix of feature vectors, one for each token. The size of each vector can be defined, and this embedding, which is essentially a matrix of weights, can be learned.
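The two methods above can be sketched with plain Python and NumPy. This is only an illustration under stated assumptions: the toy sentences, the vocabulary layout (index 0 for padding, index 1 for unknown words) and the 4-dimensional random embedding matrix are hypothetical, not taken from the dataset.

```python
import numpy as np

# Toy corpus and vocabulary (hypothetical; 0 is reserved for padding, 1 for unknown tokens)
sentences = ["there is a flood", "a fire is near"]
vocab = ["", "[UNK]"] + sorted({word for s in sentences for word in s.split()})
token_to_index = {token: i for i, token in enumerate(vocab)}

def tokenize(sentence):
    """Map each whitespace-separated word to its integer index (1 if unknown)."""
    return [token_to_index.get(word, 1) for word in sentence.split()]

token_ids = tokenize("there is a storm")  # "storm" is not in the vocabulary

# One-hot encoding: one row per token, with a single 1 in the column of its index
one_hot = np.eye(len(vocab), dtype=int)[token_ids]

# Embedding: a (vocab_size, embedding_dim) weight matrix; each token id selects one row.
# Random values stand in for weights that a model would learn.
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(len(vocab), 4))
embedded = embedding_matrix[token_ids]

print(token_ids)
print(one_hot.shape, embedded.shape)
```

Note how the one-hot rows grow with the vocabulary size, while the embedding vectors keep whatever length you choose.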

## Text vectorization (tokenization)

In [14]:
import tensorflow as tf
# from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras.layers import TextVectorization

In [15]:
# Use the default TextVectorization parameters
text_vectorizer = TextVectorization(max_tokens=None,
                                    standardize="lower_and_strip_punctuation",
                                    split="whitespace",
                                    ngrams=None,
                                    output_mode="int",
                                    output_sequence_length=None
                                    )

In [16]:
# find the average number of tokens (words) in the training tweets

In [17]:
len(train_sentences[0].split())


7

In [18]:

round(sum([len(i.split()) for i in train_sentences])/len(train_sentences))

15

In [19]:
# set up text vectorization variables
max_vocab_length = 10000 #max number of words to have in our vocabulary
max_length = 15 # max length our sequences will be (e.g. how many words from a tweet does our model see?)

text_vectorizer = TextVectorization(max_tokens=max_vocab_length,
                                    output_mode="int",
                                    output_sequence_length=max_length)

# fit the text vectorizer to the training text
text_vectorizer.adapt(train_sentences)

In [20]:
sample_sentence="there is a flood in my street!"
text_vectorizer([sample_sentence])

<tf.Tensor: shape=(1, 15), dtype=int64, numpy=
array([[ 74,   9,   3, 232,   4,  13, 698,   0,   0,   0,   0,   0,   0,
          0,   0]])>

* Note that the shape is (1, 15): **1** because we passed in a single sequence and **15** because `max_length` is 15, so shorter sequences are padded with zeros.
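The fixed output length can be pictured with a small helper. This is a rough sketch of the pad-or-truncate behaviour, not the layer's actual implementation; `pad_or_truncate` is a hypothetical name, and the 20 integer ids in the second call are made up.

```python
def pad_or_truncate(token_ids, max_length=15, pad_value=0):
    """Return exactly max_length ids: cut long sequences, right-pad short ones."""
    trimmed = token_ids[:max_length]
    return trimmed + [pad_value] * (max_length - len(trimmed))

short_ids = pad_or_truncate([74, 9, 3, 232, 4, 13, 698])  # ids of the sample sentence above
long_ids = pad_or_truncate(list(range(1, 21)))            # 20 hypothetical ids

print(short_ids)
print(len(short_ids), len(long_ids))
```

Either way the model always receives 15 integers per tweet, which is what lets tweets of different lengths be batched together.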

In [21]:
text_vectorizer(["there is a man in my backyard!"])

<tf.Tensor: shape=(1, 15), dtype=int64, numpy=
array([[  74,    9,    3,   89,    4,   13, 6143,    0,    0,    0,    0,
           0,    0,    0,    0]])>

In [22]:
random_sentence = random.choice(train_sentences)
print(f"Original text: \n{random_sentence}\n\n\nVectorized Version: {text_vectorizer([random_sentence])}")

Original text: 
Beware of your temper and a loose tongue! These two dangerous weapons combined can lead a person to the Hellfire #islam!


Vectorized Version: [[4096    6   33 3346    7    3 1819 1748  222  116 1418  258 2515   71
  1393]]


In [23]:
# get the unique words in the vocabulary
words_in_vocab = text_vectorizer.get_vocabulary() # get all the unique words in our training data
top_10_words = words_in_vocab[:10]
bottom_10_words = words_in_vocab[-10:]
print(f"Number of words in vocab: {len(words_in_vocab)} \n\n10 most common words: \n{top_10_words}\n\n10 least common words: \n{bottom_10_words}")
# [UNK] is the token for unknown words, i.e. words outside the 10,000-word vocabulary

Number of words in vocab: 10000 

10 most common words: 
['', '[UNK]', 'the', 'a', 'in', 'to', 'of', 'and', 'i', 'is']

10 least common words: 
['painthey', 'painful', 'paine', 'paging', 'pageshi', 'pages', 'paeds', 'pads', 'padres', 'paddytomlinson1']


## Text vectorization (embedding)
`tf.keras.layers.Embedding` turns positive integers (token indices) into dense vectors of fixed size.
See https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding

The parameters we care most about for our embedding layer:
* `input_dim` - the size of our vocabulary
* `output_dim` - the size of the output embedding vector e.g. a value of 100 means each token gets represented by a vector of length 100
* `input_length` - the length of sequences passed into the embedding layer (in this case, it's 15)
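As a quick worked example of how these parameters fit together: the embedding layer stores one vector of length `output_dim` for each of the `input_dim` vocabulary entries, so its weight count is simply their product. The variable names below are illustrative; the values are the ones used in this notebook.

```python
max_vocab_length = 10000  # input_dim: number of tokens in the vocabulary
embedding_dim = 128       # output_dim: length of each token's vector
max_length = 15           # input_length: tokens per sequence

# One row of embedding_dim weights per vocabulary entry
embedding_params = max_vocab_length * embedding_dim

# Shape of the layer output for a batch of unknown size (None)
output_shape = (None, max_length, embedding_dim)

print(embedding_params)  # 1280000
print(output_shape)      # (None, 15, 128)
```

The 1,280,000 figure is the parameter count reported for the embedding layer in the model summaries in this notebook.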

In [24]:
from tensorflow.keras import layers
embedding = layers.Embedding(input_dim=max_vocab_length, # size of the vocabulary
                             output_dim=128, # size of each embedding vector (multiples of 8 are a common, hardware-friendly choice)
                             input_length=max_length # how long each input sequence is
)

In [25]:
# test on random sentences from the training set
random_sentence = random.choice(train_sentences)
print(f"Original text: \n{random_sentence}\n\nEmbedded version:")
# embed the random sentence (turn it into dense vectors of fixed size)
sample_embed = embedding(text_vectorizer(random_sentence))
sample_embed

Original text: 
No don't evacuate the students just throw them in the dungeon. That is stupid.

Embedded version:


<tf.Tensor: shape=(15, 128), dtype=float32, numpy=
array([[-0.0458607 ,  0.01816807, -0.02580811, ..., -0.04666785,
         0.0346473 ,  0.02744099],
       [ 0.02099374, -0.03005712,  0.00930461, ..., -0.01458265,
         0.02010695, -0.02697699],
       [ 0.0079918 ,  0.01031456,  0.03410769, ..., -0.00135984,
        -0.02071763, -0.01041263],
       ...,
       [ 0.03802755,  0.00078378,  0.04083348, ..., -0.04772193,
        -0.04600535,  0.04822692],
       [-0.02552068, -0.03836172, -0.01723952, ...,  0.03295627,
        -0.04284518, -0.00408127],
       [ 0.03980645, -0.01484523, -0.02595969, ..., -0.0440552 ,
        -0.04664922, -0.00611497]], dtype=float32)>

In [26]:
sample_embed = tf.expand_dims(sample_embed, axis=0)

In [27]:
# check out a single token's embedding
sample_embed[0][0], sample_embed[0][0].shape, random_sentence

(<tf.Tensor: shape=(128,), dtype=float32, numpy=
 array([-0.0458607 ,  0.01816807, -0.02580811, -0.03211554,  0.01282436,
        -0.01188555, -0.03748599,  0.01447311,  0.00770329, -0.03373734,
        -0.04786849, -0.01033177, -0.03739934, -0.0299038 , -0.03773041,
         0.00764878, -0.03520491,  0.010307  ,  0.01986578,  0.01575235,
        -0.04802128,  0.01620081, -0.02974489, -0.01073744,  0.0077072 ,
        -0.033354  , -0.01995434, -0.00134494, -0.03513313,  0.03414101,
        -0.02182283, -0.04649058,  0.01975454,  0.01777165, -0.02613107,
        -0.00736674, -0.01469647, -0.03670442, -0.00747863, -0.01748114,
        -0.04757627,  0.03621948,  0.01499205, -0.04908527,  0.01224456,
        -0.02071856, -0.03606253, -0.01804631, -0.0420357 ,  0.01277703,
        -0.02817786, -0.03638612, -0.04839441,  0.04215503,  0.01673306,
        -0.04128503, -0.00764592,  0.03699971, -0.01620064,  0.02963973,
         0.03912009, -0.03050065,  0.01027741,  0.03022255,  0.0348619 ,
  

# Modelling our text dataset - running a series of experiments

It's time to start building a series of modelling experiments, starting with a baseline and moving on from there:

* Model 0: Naive Bayes (baseline)
* Model 1: Feed-forward neural network (dense model)
* Model 2: LSTM model (long short-term memory) (RNN)
* Model 3: GRU model (RNN)
* Model 4: Bidirectional LSTM model (RNN)
* Model 5: 1D convolutional neural network
* Model 6: TensorFlow Hub pretrained feature extractor (using transfer learning for NLP)
* Model 7: Same as Model 6, trained on 10% of the dataset

Approach: the standard steps for modelling with TensorFlow:
- prepare data -> build -> compile -> fit -> evaluate -> experiment and improve

## Model 0 - getting a baseline
This will be our baseline model, serving as a benchmark for future experiments to build upon. We're going to use `sklearn`'s Multinomial Naive Bayes, using the TF-IDF formula to convert our words to numbers. 

* 🔑 It's common practice to use non-DL algorithms as a baseline because of their speed, and later use DL to see how to improve upon them.
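To make "TF-IDF" a little more concrete, here is a hand-rolled sketch in pure Python. It assumes a smoothed IDF of `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which to my understanding matches scikit-learn's default settings; the three example documents are hypothetical, and real tokenization (lowercasing, punctuation handling) is skipped.

```python
import math
from collections import Counter

docs = ["forest fire near town", "fire fire everywhere", "lovely sunny day"]  # hypothetical tweets
tokenized = [doc.split() for doc in docs]
vocab = sorted({word for doc in tokenized for word in doc})
n_docs = len(docs)

# Document frequency: in how many documents does each word appear?
doc_freq = {word: sum(word in doc for doc in tokenized) for word in vocab}

# Smoothed inverse document frequency (assumed formula): rarer words get larger weights
idf = {word: math.log((1 + n_docs) / (1 + doc_freq[word])) + 1 for word in vocab}

def tfidf_vector(tokens):
    """Term count times IDF for every vocabulary word, scaled to unit L2 length."""
    counts = Counter(tokens)
    raw = [counts[word] * idf[word] for word in vocab]
    norm = math.sqrt(sum(value * value for value in raw))
    return [value / norm for value in raw]

vectors = [tfidf_vector(doc) for doc in tokenized]
print(vocab)
print([round(value, 3) for value in vectors[0]])
```

Words that appear in many documents ("fire" here) receive a lower IDF than words that appear in only one, so they contribute less to each document's vector.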

In [28]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


In [29]:
# Create tokenization and modelling pipeline
model_0 = Pipeline([
    ("tfidf", TfidfVectorizer()), # convert words to numbers using tfidf
    ("clf", MultinomialNB()) # model the text using this classifier(clf)
])

# fit the pipeline to the training data
model_0.fit(train_sentences, train_labels)

In [30]:
# evaluate our baseline model
baseline_score = model_0.score(val_sentences, val_labels) 
# .score is to sklearn what .evaluate is to TensorFlow; the default evaluation metric for classifiers is accuracy

In [31]:
print(f"Our baseline score achieves an accuracy of {baseline_score*100:.2f}%")

Our baseline score achieves an accuracy of 79.27%


In [32]:
# make predictions
baseline_preds = model_0.predict(val_sentences)
baseline_preds[:20]

array([1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1])

In [33]:
# Creating evaluation function
def evaluation (model, val_sentences, val_labels):
  """Function to return the evaluation metrics of a model 
  given the model and the validation data
  """
  from sklearn.metrics import recall_score, precision_score, classification_report
  accuracy = model.score(val_sentences, val_labels)
  predicted_labels = model.predict(val_sentences)
  precision = precision_score(val_labels, predicted_labels)
  recall = recall_score(val_labels, predicted_labels)
  report = classification_report(val_labels, predicted_labels)

  return accuracy, precision, recall, report

In [34]:
base_evaluation = evaluation(model_0, val_sentences, val_labels)
print(f"Accuracy is: {base_evaluation[0]*100:.2f}%. \nPrecision Score is:{base_evaluation[1]:.2f}\
\nRecall Score is: {base_evaluation[2]:.2f} \
\n\n\nClassification Report is {base_evaluation[3]}")

Accuracy is: 79.27%. 
Precision Score is:0.89
Recall Score is: 0.63 


Classification Report is               precision    recall  f1-score   support

           0       0.75      0.93      0.83       414
           1       0.89      0.63      0.73       348

    accuracy                           0.79       762
   macro avg       0.82      0.78      0.78       762
weighted avg       0.81      0.79      0.79       762



In [35]:
# Create a function that computes evaluation metrics from labels and predictions
def calculate_results(y_true, y_preds):
  """Return accuracy (in percent) and weighted precision, recall and F1 score
  given the true labels and the predicted labels.
  """
  from sklearn.metrics import accuracy_score, precision_recall_fscore_support
  model_accuracy = accuracy_score(y_true, y_preds) * 100
  
  # average="weighted" averages the per-class scores, weighted by class support
  model_precision, model_recall, model_f1, _ = precision_recall_fscore_support(y_true, y_preds,
                                                                               average="weighted")
  model_results = {"accuracy": model_accuracy,
                   "prediction": model_precision, # note: this key holds the weighted precision
                   "recall": model_recall,
                   "f1_score": model_f1}

  return model_results

In [36]:
baseline_results = calculate_results(val_labels, baseline_preds)
baseline_results

{'accuracy': 79.26509186351706,
 'prediction': 0.8111390004213173,
 'recall': 0.7926509186351706,
 'f1_score': 0.7862189758049549}
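For intuition about what the weighted scores mean, the following pure-Python sketch computes per-class precision, recall and F1 and then averages them by class support. It is a simplified illustration rather than scikit-learn's implementation; `class_metrics`, `weighted_metrics` and the eight example labels are hypothetical.

```python
def class_metrics(y_true, y_pred, positive):
    """Precision, recall, F1 and support when `positive` is treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, tp + fn  # support = number of true samples of this class

def weighted_metrics(y_true, y_pred):
    """Average the per-class scores, weighting each class by its support."""
    per_class = [class_metrics(y_true, y_pred, c) for c in sorted(set(y_true))]
    total = sum(m[3] for m in per_class)
    names = ("precision", "recall", "f1_score")
    return {name: sum(m[i] * m[3] for m in per_class) / total for i, name in enumerate(names)}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical labels
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]  # hypothetical predictions
results = weighted_metrics(y_true, y_pred)
print(results)
```

Because the average is weighted by support, the larger class (here, as in the dataset, the balance is close) has proportionally more influence on the final numbers.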

## Model 1: Feedforward neural networks (dense model)


In [37]:
# Create a tensorboard callback
from helper_functions import create_tensorboard_callback
SAVE_DIR = 'model_logs'

In [38]:
# Build model with Functional API
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype=tf.string) # inputs are 1-dimensional strings (dtype="string" also works)
x = text_vectorizer(inputs) # numerically encode the input texts
x = embedding(x) # create an embedding of the numerically encoded tokens
x = layers.GlobalAveragePooling1D()(x) # condense the feature vectors of all tokens into one vector per tweet
# without the pooling layer, the Dense layer would output one prediction per token, shape (None, 15, 1), which does not match the one-label-per-tweet targets
outputs = layers.Dense(1, activation="sigmoid")(x)

model_1 = tf.keras.Model(inputs, outputs, name="model_1_dense")
model_1.summary()

Model: "model_1_dense"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 global_average_pooling1d (G  (None, 128)              0         
 lobalAveragePooling1D)                                          
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
Total params: 1,280,129
Trainable params: 1,280,129
N

In [39]:
# Compile model
model_1.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics=["accuracy"])

In [40]:
# fit the model
history_1 = model_1.fit(x=train_sentences,
                        y=train_labels,
                        epochs=5,
                        validation_data=(val_sentences, val_labels),
                        callbacks=[create_tensorboard_callback(SAVE_DIR,experiment_name="Model_1_Dense")])

Saving TensorBoard log files to: model_logs/Model_1_Dense/20230313-170547
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [41]:
results_1 = model_1.evaluate(val_sentences, val_labels)



In [42]:
baseline_results

{'accuracy': 79.26509186351706,
 'prediction': 0.8111390004213173,
 'recall': 0.7926509186351706,
 'f1_score': 0.7862189758049549}

In [43]:
model_1_preds_probs = model_1.predict(val_sentences)
model_1_preds_probs[:10], model_1_preds_probs.shape



(array([[0.32291383],
        [0.75132406],
        [0.99749315],
        [0.08116034],
        [0.09392211],
        [0.92382145],
        [0.90879637],
        [0.9914016 ],
        [0.95665306],
        [0.21461754]], dtype=float32), (762, 1))

In [44]:
# Convert model prediction probabilities to label format and squeeze out the extra dimension
model_1_preds=tf.round(tf.squeeze(model_1_preds_probs))
model_1_preds[:10]


<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 1., 1., 0., 0., 1., 1., 1., 1., 0.], dtype=float32)>

In [45]:
# Calculate model_1 results
model_1_results = calculate_results(y_true=val_labels,
                                    y_preds=model_1_preds)
model_1_results

{'accuracy': 78.87139107611549,
 'prediction': 0.7964015586347394,
 'recall': 0.7887139107611548,
 'f1_score': 0.7848945056280915}

In [46]:
baseline_results

{'accuracy': 79.26509186351706,
 'prediction': 0.8111390004213173,
 'recall': 0.7926509186351706,
 'f1_score': 0.7862189758049549}

In [47]:
# Compare the results
import numpy as np
np.array(list(model_1_results.values())) > np.array(list(baseline_results.values()))

array([False, False, False, False])

* None of the metrics were greater than the baseline!

## Visualizing learned embeddings

In [48]:
# get the vocabulary from the text vectorization layer
words_in_vocab = text_vectorizer.get_vocabulary()
len(words_in_vocab), words_in_vocab[:10]

(10000, ['', '[UNK]', 'the', 'a', 'in', 'to', 'of', 'and', 'i', 'is'])

In [49]:
model_1.summary()

Model: "model_1_dense"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 global_average_pooling1d (G  (None, 128)              0         
 lobalAveragePooling1D)                                          
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
Total params: 1,280,129
Trainable params: 1,280,129
N

In [50]:
# get the weight matrix of the embedding layer
# these are the numerical representations of each token in our training data, learned over 5 epochs of training

embed_weights = model_1.get_layer("embedding").get_weights()
embed_weights = tf.squeeze(embed_weights)
embed_weights, embed_weights.shape

(<tf.Tensor: shape=(10000, 128), dtype=float32, numpy=
 array([[ 6.03997409e-02,  1.95189205e-03, -4.35978733e-02, ...,
         -6.45836964e-02, -6.67870790e-02, -2.47183759e-02],
        [ 3.43110748e-02, -1.51894167e-02,  3.54718305e-02, ...,
         -1.60110183e-02, -2.66722124e-02,  3.30437683e-02],
        [ 3.59281041e-02,  3.48134302e-02, -2.15147156e-05, ...,
          5.84828202e-03, -6.71400689e-03, -5.46348169e-02],
        ...,
        [ 2.69657113e-02,  3.60074677e-02,  9.95416567e-03, ...,
          3.83692645e-02,  3.19452174e-02,  1.57534964e-02],
        [ 1.43589433e-02,  1.73303168e-02,  2.45586270e-03, ...,
         -5.61228357e-02, -3.17440778e-02, -3.27865444e-02],
        [ 1.00543626e-01,  8.86800811e-02, -9.33860689e-02, ...,
         -6.12838119e-02, -1.00721218e-01, -1.13969699e-01]], dtype=float32)>,
 TensorShape([10000, 128]))

* Every token is represented by a 128-length vector
* Now that we have the embedding matrix our model has learned to represent our tokens, let's visualize it.
* TensorFlow has a tool for this, the Embedding Projector: https://projector.tensorflow.org/
* and a guide on word embeddings - https://www.tensorflow.org/text/guide/word_embeddings

In [51]:
# create embedding files (adapted from the TensorFlow word embeddings documentation)
import io 
# convert to NumPy once; indexing a tf.Tensor element by element inside the loop is very slow
embed_weights_np = embed_weights.numpy()

with io.open('vectors.tsv', 'w', encoding='utf-8') as out_v, \
     io.open('metadata.tsv', 'w', encoding='utf-8') as out_m:
  for index, word in enumerate(words_in_vocab):
    if index == 0:
      continue  # skip 0, it's padding.
    vec = embed_weights_np[index]
    out_v.write('\t'.join([str(x) for x in vec]) + "\n")
    out_m.write(word + "\n")

In [None]:
# download files from Colab to upload to the Embedding Projector
try:
  from google.colab import files
  files.download('vectors.tsv')
  files.download('metadata.tsv')
except Exception:
  pass

## Recurrent Neural Networks (RNNs)

RNNs are useful for sequence data.

The premise of a recurrent neural network is to use the representation of a previous input to aid the representation of a later input.
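That premise can be sketched in a few lines of NumPy. This is a minimal simple-RNN forward pass, not an LSTM or GRU and not Keras' implementation; the weights are random stand-ins for values a model would learn, and the input is a hypothetical embedded tweet of 15 tokens with 128 features each.

```python
import numpy as np

rng = np.random.default_rng(0)
sequence_length, embedding_dim, hidden_units = 15, 128, 64

# Hypothetical embedded tweet: one 128-dimensional vector per token
sequence = rng.normal(size=(sequence_length, embedding_dim))

# Randomly initialized weights (a trained layer would learn these)
W_input = rng.normal(scale=0.1, size=(embedding_dim, hidden_units))
W_hidden = rng.normal(scale=0.1, size=(hidden_units, hidden_units))
bias = np.zeros(hidden_units)

hidden_state = np.zeros(hidden_units)
for token_vector in sequence:
    # The new state mixes the current token with the state carried over from earlier tokens
    hidden_state = np.tanh(token_vector @ W_input + hidden_state @ W_hidden + bias)

print(hidden_state.shape)  # (64,)
```

After the loop, the final hidden state is a single vector summarizing the whole sequence, which is what a dense output layer can then classify.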


📖 Resources: the following give good overviews of RNNs:
* MIT's sequence modelling lecture
* Chris Olah's intro to LSTMs - https://colah.github.io/posts/2015-08-Understanding-LSTMs/
* Andrej Karpathy's "The Unreasonable Effectiveness of Recurrent Neural Networks" - https://karpathy.github.io/2015/05/21/rnn-effectiveness/

## Model 2: LSTM
LSTM stands for long short-term memory.

The structure of an RNN model typically looks like this:

``` 
input(text) -> tokenize -> embedding -> layers (RNN/dense) -> output (label probability)
```

In [52]:
# create an LSTM model
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = embedding(x)
# x = layers.LSTM(units=64, return_sequences=True)(x)
# when stacking RNN layers, every layer except the last needs return_sequences=True
x = layers.LSTM(64)(x)
# x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_2 = tf.keras.Model(inputs, outputs, name="model_2_LSTM")
model_2.summary()

Model: "model_2_LSTM"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 lstm (LSTM)                 (None, 64)                49408     
                                                                 
 dense_1 (Dense)             (None, 1)                 65        
                                                                 
Total params: 1,329,473
Trainable params: 1,329,473
Non-trainable params: 0
____________________________________________

In [53]:
# compile and fit
model_2.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics=["accuracy"])
history_2 = model_2.fit(train_sentences, train_labels,
                        validation_data=(val_sentences, val_labels),
                        epochs=5,
                        callbacks=[create_tensorboard_callback(SAVE_DIR, experiment_name="model_2_LSTM")])

Saving TensorBoard log files to: model_logs/model_2_LSTM/20230313-171051
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [54]:
# make predictions with LSTM model
model_2_pred_probs = model_2.predict(val_sentences)
model_2_pred_probs[:10]



array([[0.01208567],
       [0.759593  ],
       [0.9995072 ],
       [0.13979708],
       [0.00111495],
       [0.9867425 ],
       [0.58826125],
       [0.99946874],
       [0.9991367 ],
       [0.55757886]], dtype=float32)

In [55]:
model_2_preds = tf.round(tf.squeeze(model_2_pred_probs))

In [56]:
model_2_results = calculate_results(y_true = val_labels, y_preds= model_2_preds)
model_2_results, baseline_results

({'accuracy': 77.03412073490814,
  'prediction': 0.7715893693867238,
  'recall': 0.7703412073490814,
  'f1_score': 0.7684486602580174},
 {'accuracy': 79.26509186351706,
  'prediction': 0.8111390004213173,
  'recall': 0.7926509186351706,
  'f1_score': 0.7862189758049549})

In [57]:
np.array(list(model_2_results.values()))>np.array(list(baseline_results.values()))

array([False, False, False, False])

## Model 3: GRU

A GRU (gated recurrent unit) cell has similar features to an LSTM cell but fewer parameters.
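The difference in parameter count can be checked with a short calculation. This sketch assumes an LSTM has four weight blocks and a GRU three, each with input weights, recurrent weights and biases, and that the GRU keeps two bias vectors per block (which, as far as I know, is the `reset_after=True` default in Keras). The helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 blocks (3 gates + candidate), each with input weights, recurrent weights and one bias
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units):
    # 3 blocks (2 gates + candidate); assumes two bias vectors per block
    return 3 * (units * (input_dim + units) + 2 * units)

# 128-dimensional embeddings feeding 64 recurrent units, as in this notebook
print(lstm_params(128, 64))  # 49408
print(gru_params(128, 64))   # 37248
```

Both numbers agree with the `lstm` and `gru` rows of the model summaries in this notebook.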

In [58]:
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = embedding(x)
x = layers.GRU(64)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_3 = tf.keras.Model(inputs, outputs, name="model_3_GRU")
model_3.summary()

Model: "model_3_GRU"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 gru (GRU)                   (None, 64)                37248     
                                                                 
 dense_2 (Dense)             (None, 1)                 65        
                                                                 
Total params: 1,317,313
Trainable params: 1,317,313
Non-trainable params: 0
_____________________________________________

In [59]:
model_3.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics=["accuracy"])
history_3 = model_3.fit(train_sentences, train_labels,
                        validation_data=(val_sentences, val_labels),
                        epochs=5,
                        callbacks=[create_tensorboard_callback(SAVE_DIR, experiment_name="model_3_GRU")])

Saving TensorBoard log files to: model_logs/model_3_GRU/20230313-171134
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [60]:
# evaluate the data
model_3_preds_probs = model_3.predict(val_sentences)
model_3_preds = tf.round(tf.squeeze(model_3_preds_probs))
model_3_results = calculate_results(y_true=val_labels, y_preds=model_3_preds)
model_3_results



{'accuracy': 77.69028871391076,
 'prediction': 0.7836526472838238,
 'recall': 0.7769028871391076,
 'f1_score': 0.7729557843731072}

In [61]:
np.array(list(model_3_results.values())) > np.array(list(baseline_results.values()))

In [None]:
def calculate_predictions_and_results(model, val_sentences=val_sentences, val_labels=val_labels):
  model_pred_probs = model.predict(val_sentences)
  model_preds = tf.squeeze(tf.round(model_pred_probs))
  model_results = calculate_results(val_labels, model_preds)
  print(np.array(list(model_results.values()))>np.array(list(baseline_results.values())))
  
  return model_results

In [None]:
model_3_results = calculate_predictions_and_results(model_3)

## Model 4: Bidirectional RNN

* A normal RNN processes a sequence in one direction (left to right, for English text for example).
* A bidirectional RNN processes it from right to left as well as left to right.

In [None]:
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = embedding(x)
# x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
# x = layers.Bidirectional(layers.GRU(64))(x)
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_4 = tf.keras.Model(inputs, outputs, name="model_4_bidirectional")
model_4.summary()

* Note: the output of the bidirectional layer is twice the size of the layer it wraps, i.e. 64 units become 128, because the forward and backward outputs are concatenated.
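
To see where the doubling comes from, here is a rough NumPy sketch of the idea. It uses a plain tanh RNN as a stand-in for the LSTM (an assumption made to keep it short) with random weights, so only the shapes are meaningful. One pass reads the sequence forwards, one reads it backwards, and the two final states are concatenated, which is the default `merge_mode="concat"` behaviour of `layers.Bidirectional`.

```python
import numpy as np

def simple_rnn_last_state(sequence, w_x, w_h):
    """Run a tanh RNN over `sequence` (timesteps, features) and return the final hidden state."""
    hidden = np.zeros(w_h.shape[0])
    for step in sequence:
        hidden = np.tanh(step @ w_x + hidden @ w_h)
    return hidden

rng = np.random.default_rng(42)
timesteps, features, units = 15, 128, 64  # same sizes as the embedding output and the 64-unit layer
sequence = rng.normal(size=(timesteps, features))

# each direction gets its own weights
w_x_fwd, w_h_fwd = rng.normal(size=(features, units)), rng.normal(size=(units, units))
w_x_bwd, w_h_bwd = rng.normal(size=(features, units)), rng.normal(size=(units, units))

forward_state = simple_rnn_last_state(sequence, w_x_fwd, w_h_fwd)         # left to right
backward_state = simple_rnn_last_state(sequence[::-1], w_x_bwd, w_h_bwd)  # right to left
bidirectional_output = np.concatenate([forward_state, backward_state])

print(forward_state.shape, backward_state.shape, bidirectional_output.shape)  # (64,) (64,) (128,)
```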

In [None]:
model_4.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics="accuracy")
history_4 = model_4.fit(train_sentences, train_labels,
                        validation_data=(val_sentences, val_labels),
                        epochs=5,
                        callbacks=[create_tensorboard_callback(SAVE_DIR, "model_4_bidirectional")])

In [None]:
model_4_results = calculate_predictions_and_results(model_4)
model_4_results

## Model 5: Conv1D

Convolutional Neural Networks for texts (and other types of sequences)

We've used CNNs for images, which are usually 2D, whereas text data is 1D (a sequence of tokens).
Previously we used `Conv2D` for image data; now we'll use `Conv1D`.

```
inputs (text) -> tokenization -> embedding -> layers (Conv1D & pooling) -> output layer
```

In [93]:
# test out our embedding layer, conv1D layer and max pooling
embedding_test = embedding(text_vectorizer(["this is a test sentence"]))
conv_1d = layers.Conv1D(filters=64,
                        kernel_size=5,
                        activation="relu",
                        padding="valid")
conv_1d_output = conv_1d(embedding_test)
max_pool = layers.GlobalMaxPool1D()
max_pool_output = max_pool(conv_1d_output) # get the most important feature

embedding_test.shape, conv_1d_output.shape, max_pool_output.shape

(TensorShape([1, 15, 128]), TensorShape([1, 11, 64]), TensorShape([1, 64]))
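
The shapes above (and the `41024` parameters in the model summary) can be checked by hand. A small sketch, assuming `padding="valid"` and a stride of 1 as in the cell above; the two helper functions are hypothetical names used only for this check.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Output length of a 1D convolution with padding="valid"."""
    return (input_length - kernel_size) // stride + 1

def conv1d_param_count(in_channels, filters, kernel_size):
    """Kernel weights (kernel_size * in_channels per filter) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

sequence_length, embedding_dim = 15, 128  # output shape of the embedding layer
filters, kernel_size = 64, 5              # Conv1D settings used above

output_length = conv1d_output_length(sequence_length, kernel_size)
param_count = conv1d_param_count(embedding_dim, filters, kernel_size)
print(output_length)  # 11
print(param_count)    # 41024
```

`GlobalMaxPool1D` then keeps the maximum over the 11 positions for each of the 64 filters, which is where the final `[1, 64]` shape comes from.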

In [94]:
# building the model
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = embedding(x)
x = conv_1d(x)
x = max_pool(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_5 = tf.keras.Model(inputs, outputs, name="model_5_cnn")
model_5.summary()

Model: "model_5_cnn"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_24 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  multiple                 0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       multiple                  1280000   
                                                                 
 conv1d (Conv1D)             (None, 11, 64)            41024     
                                                                 
 global_max_pooling1d (Globa  (None, 64)               0         
 lMaxPooling1D)                                                  
                                                                 
 dense_28 (Dense)            (None, 1)                 

In [None]:
model_5.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics="accuracy")
history_5 = model_5.fit(train_sentences, train_labels,
                        validation_data=(val_sentences, val_labels),
                        epochs=5,
                        callbacks=[create_tensorboard_callback(SAVE_DIR,"model_5_cnn")])

In [None]:
model_5_results = calculate_predictions_and_results(model_5)
model_5_results

## Model 6: Using a feature extractor
Now that we've built some of our own models, let's try using transfer learning for NLP.
We'll use the Universal Sentence Encoder (USE) as a feature extractor:
https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder

The input is variable-length English text and the output is a 512-dimensional vector.

In [None]:
sample_sentence

In [None]:
import tensorflow_hub as hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
embed_samples = embed([sample_sentence,
                       "when you call the universal sentence encoder on a sentence, it turns it into numbers"])
print(embed_samples[0][:50])

In [None]:
embed_samples.shape

In [None]:
# how to build the feature extractor:
# create a Keras layer from the pretrained USE model
encoder_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                              input_shape=[],
                              dtype="string",
                              trainable=False,
                              name="USE")

In [None]:
model_6 = tf.keras.Sequential([
    encoder_layer,
    layers.Dense(64, activation="relu"),
    layers.Dense(1,activation="sigmoid")
    ],
    name="model_6_USE_feature_extractor"
)
model_6.summary()

In [None]:
model_6.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics="accuracy")
history_6 = model_6.fit(train_sentences, train_labels,
                        validation_data=(val_sentences, val_labels),
                        epochs=5,
                        callbacks=[create_tensorboard_callback(SAVE_DIR, "model_6_use_feature_extractor")]
)

In [None]:
model_6_results = calculate_predictions_and_results(model_6)
model_6_results

## Model 7: TF Hub Pretrained USE with 10% of the training data

Replicating model 6 but trained on only 10% of the data. Transfer Learning really helps when you don't have a large dataset.

In [None]:
train_df_shuffled.head()

In [None]:
train_10_percent = train_df_shuffled[["text", "target"]].sample(frac=0.1, random_state=42)

In [None]:
train_10_percent.head()

In [None]:
train_sentences_10_percent=train_10_percent.text.to_list()
train_labels_10_percent=train_10_percent.target.to_list()
len(train_sentences_10_percent), len(train_labels_10_percent)

In [None]:
# check the number of targets in our subset of data
train_10_percent.target.value_counts()

In [None]:
train_df_shuffled.target.value_counts()

### Cloning models

To recreate the same model architecture, you can clone it with `tf.keras.models.clone_model`.
The clone gets freshly initialized weights; the trained weights of the original model are not copied.

In [None]:
# let's build the same model_6
model_7 = tf.keras.models.clone_model(model_6)
model_7.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics="accuracy")
model_7.summary()

In [None]:
# The clone keeps model_6's name, so we build the same architecture again under a new name
model_7 = tf.keras.Sequential([
    encoder_layer,
    layers.Dense(64, activation="relu"),
    layers.Dense(1,activation="sigmoid")
    ],
    name="model_7_USE_feature_extractor"
)
model_7.summary()
model_7.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics="accuracy")

In [None]:
history_7 = model_7.fit(train_sentences_10_percent, train_labels_10_percent,
                        validation_data=(val_sentences, val_labels),
                        epochs=5,
                        callbacks=[create_tensorboard_callback(SAVE_DIR,"model_7")])

In [None]:
calculate_predictions_and_results(model_7)

In [None]:
calculate_predictions_and_results(model_6)

* 🤔 Why are the results for 10% of the data better than the results on the full data?

Answer: **Data leakage**
* `train_sentences_10_percent` was sampled from the whole of `train_df_shuffled`, which is also where the validation split (`val_sentences`) comes from, so some validation samples have probably ended up in the training subset. Evaluating on samples the model has already trained on inflates the scores, which is a big no-no in machine learning.
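
The overlap is easy to reproduce with plain row indices. A minimal sketch, assuming a 90/10 train/validation split of one shuffled dataset; the 1000 integer "rows" are a stand-in for the real DataFrame, not the actual data.

```python
import random

random.seed(42)
all_indices = list(range(1000))  # stand-in for the rows of the shuffled DataFrame
random.shuffle(all_indices)

split = int(0.9 * len(all_indices))
train_indices, val_indices = all_indices[:split], all_indices[split:]

# Leaky: sample 10% from the whole dataset, which still contains the validation rows
leaky_subset = random.sample(all_indices, k=len(all_indices) // 10)
# Safe: take 10% of the training split only
safe_subset = train_indices[: len(train_indices) // 10]

leaky_overlap = set(leaky_subset) & set(val_indices)
safe_overlap = set(safe_subset) & set(val_indices)
print(f"leaky overlap: {len(leaky_overlap)} rows, safe overlap: {len(safe_overlap)} rows")
```

Slicing the subset out of the training split guarantees an overlap of zero, whereas sampling from the full dataset will almost always pull in some validation rows.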

In [None]:
# take the first 10% of the training split only, so no validation samples can leak in
train_10_percent_split = int(0.1 * len(train_sentences))
train_10_percent_sentences = train_sentences[:train_10_percent_split]
train_10_percent_labels = train_labels[:train_10_percent_split]
train_10_percent_sentences[:10]

In [None]:
len(train_10_percent_sentences), len(train_10_percent_labels)

In [None]:
pd.Series(train_10_percent_labels).value_counts()

In [None]:
# Same architecture as model_6 again, this time for the leak-free 10% split
model_8 = tf.keras.Sequential([
    encoder_layer,
    layers.Dense(64, activation="relu"),
    layers.Dense(1,activation="sigmoid")
    ],
    name="model_7_USE_feature_extractor_correct_split"
)
model_8.summary()
model_8.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics="accuracy")

In [None]:
history_8 = model_8.fit(train_10_percent_sentences, train_10_percent_labels,
                        validation_data=(val_sentences, val_labels),
                        epochs=5,
                        callbacks=[create_tensorboard_callback(SAVE_DIR,"model_8")])

In [None]:
model_7_results = calculate_predictions_and_results(model_8)
model_7_results

In [None]:
calculate_predictions_and_results(model_6)

# Comparing the performance of each of our models


In [None]:
# Combine model results into DataFrame
all_model_results = pd.DataFrame({"0_baseline": baseline_results,
                                  "1_simple_dense": model_1_results,
                                  "2_lstm": model_2_results,
                                  "3_gru": model_3_results,
                                  "4_bidirectional": model_4_results,
                                  "5_conv1d": model_5_results,
                                  "6_tf_hub_USE_encoder": model_6_results,
                                  "7_tf_hub_USE_encoder_10_percent": model_7_results})
all_model_results = all_model_results.transpose()
all_model_results["accuracy"] = all_model_results["accuracy"]/100
all_model_results

In [None]:
# plot and compare all of the model results
all_model_results.plot(kind="bar",
                       figsize=(10, 7)).legend(bbox_to_anchor=(1.0, 1.0));

In [None]:
# sort model results by f1 score
all_model_results.sort_values("f1_score", ascending=False)["f1_score"].plot(kind="bar", figsize=(10,7));
# meaning: sort the model results according to f1_score in descending order, then extract the f1 column and plot a bar graph

In [None]:
! tensorboard dev

## Uploading our model training logs to TensorBoard.dev
We can further inspect our models' performance using TensorBoard.dev.

In [None]:
# upload tensorboard dev records

!tensorboard dev upload --logdir /content/model_logs \
--name "NLP Modelling Experiments ZTM TF Course" \
--description "Comparing multiple different types of model architectures on the Kaggle tweets text classification dataset" \
--one_shot # exit the upload once uploading is finished

## Saving and loading a trained model
There are two formats for saving a model in TensorFlow:
1. The HDF5 format
2. The `SavedModel` format (the default)

In [None]:
# Save model to HDF5 format
model_6.save("model_6.h5")

In [None]:
import tensorflow_hub as hub
loaded_model_6 = tf.keras.models.load_model("model_6.h5",
                                            custom_objects={"KerasLayer":hub.KerasLayer})

In [None]:
model_6_results

In [None]:
# calculate_predictions_and_results(loaded_model_6)

In [None]:
# Save TF Hub encoder to SavedModel format (default)
# model_6.save("model_6_Savedmodel_format")

In [None]:
loaded_model6_saved_model_format = tf.keras.models.load_model("model_6_Savedmodel_format")

In [None]:
calculate_predictions_and_results(loaded_model6_saved_model_format)

In [None]:


from google.colab import files

In [None]:
files.download("/content/model_6_Savedmodel_format")

## Finding the most wrong examples

* If our best model isn't perfect, which examples is it getting wrong?

* And of those wrong examples, which ones is it getting *most* wrong?

For example, a sample has a label of 0 but our model predicts a probability of 0.9999 (really close to 1), or vice versa.
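
The idea can be sketched without a model at all: keep only the wrong predictions, then sort them by predicted probability. The records below are made-up placeholders (not real validation samples), and a threshold of 0.5 is assumed for turning probabilities into labels.

```python
# Hypothetical records: (text, true label, predicted probability of class 1)
records = [
    ("sample a", 0, 0.97),
    ("sample b", 1, 0.03),
    ("sample c", 1, 0.88),
    ("sample d", 0, 0.12),
    ("sample e", 0, 0.61),
    ("sample f", 1, 0.45),
]

# keep only the samples whose predicted label (probability > 0.5) differs from the target
wrong = [(text, target, prob) for text, target, prob in records if int(prob > 0.5) != target]

# highest probabilities first: confident false positives at the top,
# confident false negatives at the bottom
most_wrong = sorted(wrong, key=lambda record: record[2], reverse=True)

for text, target, prob in most_wrong:
    print(f"Target: {target}, Prob: {prob}, Text: {text}")
```

The top of the sorted list holds the false positives the model was most sure about, and the bottom holds the false negatives it was most sure about.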

In [None]:
# Download a pretrained model from Google storage

In [None]:
!wget https://storage.googleapis.com/ztm_tf_course/08_model_6_USE_feature_extractor.zip
unzip_data("08_model_6_USE_feature_extractor.zip")

In [None]:
model_6_pretrained = tf.keras.models.load_model("/content/08_model_6_USE_feature_extractor")

In [None]:
calculate_predictions_and_results(model_6_pretrained)

In [None]:
model_6_pretrained_preds_probs = tf.squeeze(model_6_pretrained.predict(val_sentences))

model_6_pretrained_preds=tf.round(model_6_pretrained_preds_probs)


In [None]:
# Create DataFrame with validation sentences
val_df = pd.DataFrame({"text": val_sentences,
                       "target": val_labels,
                       "pred": model_6_pretrained_preds,
                       "pred_prob": model_6_pretrained_preds_probs})
val_df.head()

In [None]:
len(val_df)

In [None]:
# find the wrong predictions and sort by prediction probabilities
most_wrong = val_df[val_df.pred != val_df.target].sort_values("pred_prob", ascending=False)
most_wrong.head(20) # false positives

In [None]:
most_wrong.tail(20) # false negatives

In [None]:
# Checking the false positives...
for row in most_wrong[:10].itertuples():
  _, text, target, pred, pred_prob = row
  print(f"Target: {target}, Pred: {pred}, Prob: {pred_prob}")
  print(f"Text:\n{text}\n--------\n")

In [None]:
# Checking the false negatives...

for row in most_wrong[-10:].itertuples():
  _, text, target, pred, pred_prob = row
  print(f"Target: {target}, Pred: {pred}, Prob: {pred_prob}")
  print(f"Text:\n{text}\n--------\n")

## Making predictions on the Test Dataset

In [None]:
test_df

In [None]:
type(val_sentences)

In [None]:
test_sentences = test_df["text"].to_numpy()
test_sentences[:10]

In [None]:
test_pred_probs = model_6_pretrained.predict(test_sentences)

In [None]:
test_pred_probs.shape

In [None]:
test_pred_probs[:5]

In [None]:
test_pred_probs=tf.squeeze(test_pred_probs)
test_preds = tf.round(test_pred_probs)
test_pd = pd.DataFrame({"text":test_sentences,
                        "target_predicted": test_preds,
                        "predict_probabilities": test_pred_probs})

In [None]:
test_pd.head()


In [None]:
test_pd[test_pd["target_predicted"]==1]

### Visualizing predictions (mrdbourke)

In [None]:
# Visualizing predictions 
sample_test = random.sample(range(len(test_pd)),5)
for sample in sample_test:
  print(f"\nPred:  {test_pd.loc[sample].target_predicted} \t Probability: {test_pd.loc[sample].predict_probabilities} \n")
  print(f"Text: {test_pd.loc[sample].text} \n\n------\n")

In [None]:
test_pd.iloc[2].text

In [None]:
model_6_pretrained.predict(["911"])

# The speed/score tradeoff

In [None]:
# let's make a function to measure the time of prediction
import time
def pred_timer(model, samples):
  """
  Times how long a model takes to make predictions on samples
  """
  start_time = time.perf_counter()
  model.predict(samples)
  end_time = time.perf_counter()
  total_time = end_time - start_time
  time_per_predictions = total_time/len(samples)
  return total_time, time_per_predictions 
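
A single timed call can be noisy (the first call to a model often includes one-off setup work). One possible variation, sketched here under assumptions, is to do an untimed warm-up call and average over several repeats. `DummyModel` and `average_pred_time` are hypothetical names used only for this illustration; any object with a `predict()` method would work.

```python
import time

class DummyModel:
    """Hypothetical stand-in for a trained model; it only implements predict()."""
    def predict(self, samples):
        return [len(sample) % 2 for sample in samples]

def average_pred_time(model, samples, repeats=5):
    """Average time per sample over several predict() calls, after one warm-up call."""
    model.predict(samples)  # warm-up call, not timed
    start_time = time.perf_counter()
    for _ in range(repeats):
        model.predict(samples)
    total_time = time.perf_counter() - start_time
    return total_time / (repeats * len(samples))

samples = ["a short tweet", "another, slightly longer tweet"] * 100
time_per_pred = average_pred_time(DummyModel(), samples)
print(f"{time_per_pred:.2e} seconds per sample")
```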

In [None]:
# Calculate TF Hub Sentence encoder time per pred
model_6_total_pred_time, model_6_time_per_pred = pred_timer(model_6_pretrained, samples=val_sentences)

In [None]:
model_6_total_pred_time, model_6_time_per_pred

In [None]:
# Calculate our baseline model times per pred
baseline_total_pred_time, baseline_time_per_pred = pred_timer(model_0, samples=val_sentences)
baseline_total_pred_time, baseline_time_per_pred

In [None]:
model_6_pretrained_results = calculate_predictions_and_results(model_6_pretrained)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.figure(figsize=(10,7))
plt.scatter(baseline_time_per_pred, baseline_results["f1_score"], label="baseline")
plt.scatter(model_6_time_per_pred, model_6_pretrained_results["f1_score"], label="tf_hub_encoder")
plt.legend()
plt.title("F1-score versus time per prediction")
plt.xlabel("Time per prediction")
plt.ylabel("F1 Score")
plt.ylim(0.7,0.9)

* Question: was the improvement in performance worth the extra prediction time?

# Exercises

## 1. Build models 1, 2 and 5 with the Sequential API

In [90]:
model_1_seq = tf.keras.Sequential([
    layers.Input(shape=(1,),dtype="string"),
    text_vectorizer,
    embedding,
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid")
], name="model_1_seq")

model_1_seq.compile(loss="binary_crossentropy", optimizer="Adam", metrics="accuracy")
model_1_seq.summary()

Model: "model_1_seq"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 text_vectorization_1 (TextV  multiple                 0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       multiple                  1280000   
                                                                 
 global_average_pooling1d_9   (None, 128)              0         
 (GlobalAveragePooling1D)                                        
                                                                 
 dense_25 (Dense)            (None, 1)                 129       
                                                                 
Total params: 1,280,129
Trainable params: 1,280,129
Non-trainable params: 0
_________________________________________________________________


In [67]:
model_1.summary()

Model: "model_1_dense"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 global_average_pooling1d (G  (None, 128)              0         
 lobalAveragePooling1D)                                          
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
Total params: 1,280,129
Trainable params: 1,280,129
N

In [92]:
model_2_seq = tf.keras.Sequential([
    layers.Input(shape=(1,),dtype="string"),
    text_vectorizer,
    embedding,
    layers.LSTM(64),
    layers.Dense(1,activation="sigmoid")
], name="model_2_seq")

model_2_seq.compile(loss="binary_crossentropy", optimizer="Adam", metrics="accuracy")
model_2_seq.summary(), model_2.summary()

Model: "model_2_seq"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 text_vectorization_1 (TextV  multiple                 0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       multiple                  1280000   
                                                                 
 lstm_14 (LSTM)              (None, 64)                49408     
                                                                 
 dense_27 (Dense)            (None, 1)                 65        
                                                                 
Total params: 1,329,473
Trainable params: 1,329,473
Non-trainable params: 0
_________________________________________________________________
Model: "model_2_LSTM"
_________________________________________________________________
 Layer (type)          

(None, None)

In [None]:
# building the model
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = embedding(x)
x = conv_1d(x)
x = max_pool(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model_5 = tf.keras.Model(inputs, outputs, name="model_5_cnn")
model_5.summary()

In [104]:
model_5_seq = tf.keras.Sequential([
    layers.Input(shape=(1,), dtype="string"),
    text_vectorizer,
    embedding,
    layers.Conv1D(filters=64,
                  kernel_size=5,
                  activation="relu",
                  padding="valid"),
    layers.MaxPooling1D(),
    layers.Dense(1, activation="sigmoid")
])
model_5_seq.compile(loss="binary_crossentropy", metrics="accuracy", optimizer="Adam")
model_5_seq.summary(), model_5.summary()

Model: "sequential_25"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 text_vectorization_1 (TextV  multiple                 0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       multiple                  1280000   
                                                                 
 conv1d_7 (Conv1D)           (None, 11, 64)            41024     
                                                                 
 max_pooling1d_6 (MaxPooling  (None, 5, 64)            0         
 1D)                                                             
                                                                 
 dense_35 (Dense)            (None, 5, 1)              65        
                                                                 
Total params: 1,321,089
Trainable params: 1,321,089
N

(None, None)
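
Note that the final `Dense` layer of `model_5_seq` outputs `(None, 5, 1)` rather than `(None, 1)`. The Sequential version uses `layers.MaxPooling1D()`, which (with its default `pool_size=2`) only halves the sequence, whereas `model_5` uses `GlobalMaxPool1D`, which collapses the whole sequence. Swapping in `layers.GlobalMaxPool1D()` should make the Sequential model match `model_5`. The difference in shapes can be sketched in NumPy (random values, only the shapes matter):

```python
import numpy as np

# stand-in for the Conv1D output: (batch, steps, filters)
conv_output = np.random.default_rng(0).normal(size=(1, 11, 64))

# Local max pooling with pool_size=2, strides=2, padding="valid":
# the leftover 11th step is dropped and each pair of steps is reduced to its maximum
steps = (conv_output.shape[1] // 2) * 2
local_pooled = conv_output[:, :steps].reshape(1, steps // 2, 2, 64).max(axis=2)

# Global max pooling: one maximum per filter over all steps
global_pooled = conv_output.max(axis=1)

print(local_pooled.shape, global_pooled.shape)  # (1, 5, 64) (1, 64)
```

A `Dense(1)` layer acts on the last axis only, so the `(1, 5, 64)` input gives one output per remaining step (`(None, 5, 1)` in the summary), while the `(1, 64)` input gives a single prediction per sample.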