# Light Sentiment Analysis Demo (RNN, LSTM, GRU, Transformer, BERT)
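This notebook demonstrates toy sentiment analysis with several model families: SimpleRNN, LSTM, GRU, a small Transformer encoder, and a pretrained DistilBERT classifier.
The trainable models are fit on six labeled sentences for 50 epochs so the demo runs quickly.
Results are illustrative, not production-quality.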

In [17]:
!pip install -q transformers sentencepiece tensorflow --upgrade
print("Install complete")

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.12.0 requires tensorflow==2.19.0, but you have tensorflow 2.20.0 which is incompatible.
tensorflow-text 2.19.0 requires tensorflow<2.20,>=2.19.0, but you have tensorflow 2.20.0 which is incompatible.
tf-keras 2.19.0 requires tensorflow<2.20,>=2.19, but you have tensorflow 2.20.0 which is incompatible.
Install complete


## 1) Tiny Sentiment Dataset

In [2]:

import random

data = [
    ("I love this movie", "positive"),
    ("This is the worst book I have ever read", "negative"),
    ("The food was okay, nothing special", "neutral"),
    ("Amazing performance by the actors", "positive"),
    ("I hate getting up early", "negative"),
    ("It is just another day", "neutral")
]
random.shuffle(data)

for s, l in data:
    print(f"Text: {s} | Label: {l}")


Text: It is just another day | Label: neutral
Text: This is the worst book I have ever read | Label: negative
Text: Amazing performance by the actors | Label: positive
Text: I love this movie | Label: positive
Text: The food was okay, nothing special | Label: neutral
Text: I hate getting up early | Label: negative


## 2) Tokenization & Label Encoding

In [3]:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

texts = [s for s, l in data]
labels = [l for s, l in data]

label_map = {'negative':0, 'neutral':1, 'positive':2}
y = np.array([label_map[l] for l in labels])

# Tokenizer
tokenizer = Tokenizer(oov_token="<oov>")
tokenizer.fit_on_texts(texts)
vocab_size = len(tokenizer.word_index)+1

max_len = max(len(t.split()) for t in texts)
X = tokenizer.texts_to_sequences(texts)
X = pad_sequences(X, maxlen=max_len, padding='post')

print("Vocabulary size:", vocab_size, "Max sequence length:", max_len)


Vocabulary size: 30 Max sequence length: 9
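To make the two steps above concrete, here is a rough pure-Python approximation of what `Tokenizer` and `pad_sequences` do for this dataset. It is a sketch, not the Keras implementation: the helper names `simple_tokenize`, `build_word_index`, and `encode_and_pad` are hypothetical, and the regex only approximates the default Keras punctuation filter. Index 0 is reserved for padding and index 1 for the out-of-vocabulary token, which is why the vocabulary size is the number of distinct words plus two.

```python
import re
from collections import Counter

sentences = [
    "I love this movie",
    "This is the worst book I have ever read",
    "The food was okay, nothing special",
    "Amazing performance by the actors",
    "I hate getting up early",
    "It is just another day",
]

def simple_tokenize(text):
    """Lowercase, then keep runs of letters, digits, and apostrophes (approximate filter)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_word_index(corpus, oov_token="<oov>"):
    """Assign integer ids: 1 is the OOV token, more frequent words get smaller ids."""
    counts = Counter(w for text in corpus for w in simple_tokenize(text))
    word_index = {oov_token: 1}
    for word, _ in counts.most_common():
        word_index[word] = len(word_index) + 1
    return word_index

def encode_and_pad(text, word_index, max_len, oov_token="<oov>"):
    """Map words to ids, keep the last max_len ids, and right-pad with zeros."""
    ids = [word_index.get(w, word_index[oov_token]) for w in simple_tokenize(text)]
    ids = ids[-max_len:]
    return ids + [0] * (max_len - len(ids))

word_index = build_word_index(sentences)
vocab_size = len(word_index) + 1          # +1 for the padding id 0
max_len = max(len(simple_tokenize(s)) for s in sentences)

print("Vocabulary size:", vocab_size, "Max sequence length:", max_len)
print(encode_and_pad("I love this movie", word_index, max_len))
print(encode_and_pad("I love this soundtrack", word_index, max_len))  # unseen word -> id 1
```

On these six sentences the sketch reproduces the vocabulary size of 30 and the maximum length of 9 reported above. Any word not seen during fitting, such as "soundtrack", is mapped to id 1.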


## 3) Build Seq Models (RNN / LSTM / GRU)

In [4]:

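from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, LSTM, GRU, Dense, GlobalAveragePooling1D

def build_seq_model(cell_type='rnn', embedding_dim=32, latent_dim=32):
    """Embedding -> one recurrent layer -> 3-way softmax classifier."""
    recurrent_layers = {'rnn': SimpleRNN, 'lstm': LSTM, 'gru': GRU}
    if cell_type not in recurrent_layers:
        # Fail loudly instead of silently building a model with no recurrent layer.
        raise ValueError(f"Unknown cell_type: {cell_type!r}")
    model = Sequential()
    # Declare the input shape explicitly; recent Keras versions deprecate Embedding's input_length argument.
    model.add(Input(shape=(max_len,)))
    model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))
    model.add(recurrent_layers[cell_type](latent_dim))
    # One output unit per class: negative, neutral, positive.
    model.add(Dense(3, activation='softmax'))
    # Sparse loss because y holds integer class ids rather than one-hot vectors.
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model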
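For intuition about the final `Dense(3, activation='softmax')` layer and the `sparse_categorical_crossentropy` loss used above, the following pure-Python sketch computes both for a single example. The logits are made-up numbers, and the function names are illustrative; Keras additionally clips probabilities for numerical safety, which this sketch omits.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, true_class):
    """Negative log-probability of the true class, where the class is an integer id."""
    return -math.log(probs[true_class])

# Hypothetical scores for the classes negative (0), neutral (1), positive (2).
logits = [0.5, 0.1, 2.0]
probs = softmax(logits)

# argmax over the probabilities, as np.argmax does on the model output
predicted_class = max(range(len(probs)), key=probs.__getitem__)

loss_if_positive = sparse_categorical_crossentropy(probs, 2)
loss_if_negative = sparse_categorical_crossentropy(probs, 0)

print("probabilities:", [round(p, 3) for p in probs])
print("predicted class id:", predicted_class)
print("loss if gold is positive:", round(loss_if_positive, 3))
print("loss if gold is negative:", round(loss_if_negative, 3))
```

The loss is small when the model puts high probability on the gold class and grows as that probability shrinks, which is what training minimizes.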


## 4) Train & Demo Seq Models

In [5]:

inv_label_map = {v: k for k, v in label_map.items()}  # class id -> label name

models = {}
for cell in ['rnn','lstm','gru']:
    print('\nTraining', cell.upper())
    m = build_seq_model(cell_type=cell, embedding_dim=32, latent_dim=32)
    m.fit(X, y, epochs=50, batch_size=2, verbose=0)
    models[cell] = m
    # Predictions are made on the training sentences themselves, so they show fit, not generalization.
    print("Predictions:")
    preds = np.argmax(m.predict(X, verbose=0), axis=1)
    for t, l, p in zip(texts, labels, preds):
        print(f"Text: {t} | GOLD: {l} | PRED: {inv_label_map[int(p)]}")



Training RNN
Predictions:
Text: It is just another day | GOLD: neutral | PRED: neutral
Text: This is the worst book I have ever read | GOLD: negative | PRED: negative
Text: Amazing performance by the actors | GOLD: positive | PRED: positive
Text: I love this movie | GOLD: positive | PRED: positive
Text: The food was okay, nothing special | GOLD: neutral | PRED: neutral
Text: I hate getting up early | GOLD: negative | PRED: negative

Training LSTM
Predictions:
Text: It is just another day | GOLD: neutral | PRED: neutral
Text: This is the worst book I have ever read | GOLD: negative | PRED: negative
Text: Amazing performance by the actors | GOLD: positive | PRED: positive
Text: I love this movie | GOLD: positive | PRED: positive
Text: The food was okay, nothing special | GOLD: neutral | PRED: neutral
Text: I hate getting up early | GOLD: negative | PRED: negative

Training GRU
Predictions:
Text: It is just another day | GOLD: neutral | PRED: neutral
Text: This is the worst book I have ever read | GOL
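The GOLD/PRED listings can be condensed into a single accuracy figure. The sketch below does this in plain Python, using the RNN predictions printed above as input; the `accuracy` helper is a hypothetical name, not part of Keras.

```python
label_map = {'negative': 0, 'neutral': 1, 'positive': 2}
inv_label_map = {idx: name for name, idx in label_map.items()}

def accuracy(gold_ids, pred_ids):
    """Fraction of positions where the predicted class id equals the gold class id."""
    if len(gold_ids) != len(pred_ids):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_ids:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(g == p for g, p in zip(gold_ids, pred_ids)) / len(gold_ids)

# Gold and predicted labels for the RNN, in the order printed above.
gold = ['neutral', 'negative', 'positive', 'positive', 'neutral', 'negative']
pred = ['neutral', 'negative', 'positive', 'positive', 'neutral', 'negative']

gold_ids = [label_map[name] for name in gold]
pred_ids = [label_map[name] for name in pred]

rnn_accuracy = accuracy(gold_ids, pred_ids)
print(f"Training-set accuracy: {rnn_accuracy:.2f}")
print("Class id 1 maps back to:", inv_label_map[1])
```

Because the model is scored on the sentences it was trained on, this number measures how well it memorized them.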

## 5) Tiny Transformer (Educational)

In [6]:

from tensorflow.keras.layers import MultiHeadAttention, LayerNormalization, Dropout, GlobalAveragePooling1D

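def build_tiny_transformer(vocab_size, seq_len, embedding_dim=32, num_heads=2, ff_dim=64):
    """One encoder block (self-attention + feed-forward), then mean pooling and a softmax classifier."""
    inp = Input(shape=(seq_len,))
    # Token embeddings only: no positional encoding is added, so word order is not modeled.
    emb = Embedding(vocab_size, embedding_dim)(inp)
    # Self-attention (query and value are both emb). No padding mask is passed, so pad tokens are attended to.
    attn = MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)(emb, emb)
    # Residual connection followed by layer normalization.
    x = LayerNormalization(epsilon=1e-6)(attn + emb)
    # Position-wise feed-forward network, projected back to embedding_dim.
    ff = Dense(ff_dim, activation='relu')(x)
    ff = Dense(embedding_dim)(ff)
    x = LayerNormalization(epsilon=1e-6)(ff + x)
    # Average over the sequence axis to get one vector per sentence.
    x = GlobalAveragePooling1D()(x)
    out = Dense(3, activation='softmax')(x)
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model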

trans_model = build_tiny_transformer(vocab_size, max_len)
trans_model.fit(X, y, epochs=50, batch_size=2, verbose=0)
preds = np.argmax(trans_model.predict(X, verbose=0), axis=1)
print("Predictions (Tiny Transformer):")
for t, l, p in zip(texts, labels, preds):
    pred_label = [k for k,v in label_map.items() if v==p][0]
    print(f"Text: {t} | GOLD: {l} | PRED: {pred_label}")


Predictions (Tiny Transformer):
Text: It is just another day | GOLD: neutral | PRED: neutral
Text: This is the worst book I have ever read | GOLD: negative | PRED: negative
Text: Amazing performance by the actors | GOLD: positive | PRED: positive
Text: I love this movie | GOLD: positive | PRED: positive
Text: The food was okay, nothing special | GOLD: neutral | PRED: neutral
Text: I hate getting up early | GOLD: negative | PRED: negative
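The core operation inside `MultiHeadAttention` is scaled dot-product attention. The sketch below shows a single head in pure Python, with the simplifying assumption that queries, keys, and values are the raw token vectors (the Keras layer applies learned projections first and runs several heads in parallel). The toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention over x (seq_len x dim), no learned projections."""
    dim = len(x[0])
    outputs, all_weights = [], []
    for query in x:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        mixed = [sum(w * value[j] for w, value in zip(weights, x)) for j in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each output vector is a mixture of all token vectors, weighted by similarity to the query token. Nothing in this computation depends on token position, which is why Transformer models normally add positional information to the embeddings.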


## 6) BERT Sentiment (Pretrained, Inference)

In [18]:
from transformers import pipeline

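# Use pipeline with model name (no manual TF load needed).
# This checkpoint is DistilBERT fine-tuned on SST-2, which has only two labels: POSITIVE and NEGATIVE.
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Separate name so the `texts` list from section 2 is not overwritten.
bert_texts = [
    "I love this movie",
    "This is the worst book I have ever read",
    "The food was okay, nothing special"
]

for t in bert_texts:
    res = classifier(t)[0]
    # There is no neutral class, so neutral sentences are forced into one of the two labels.
    pred_label = 'negative' if res['label'] == 'NEGATIVE' else 'positive'
    print(f"Text: {t} | PRED: {pred_label} | Score: {res['score']:.2f}")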


Device set to use cpu


Text: I love this movie | PRED: positive | Score: 1.00
Text: This is the worst book I have ever read | PRED: negative | Score: 1.00
Text: The food was okay, nothing special | PRED: negative | Score: 0.98
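The third prediction above illustrates a limitation of a binary classifier on three-way data. The sketch below compares the printed predictions with the gold labels from section 1, without calling the model: the dictionaries mimic the `{'label': ..., 'score': ...}` shape returned by `classifier(t)[0]`, with scores rounded as printed, and `to_label` is a hypothetical helper.

```python
# (text, gold label from section 1, pipeline-style result copied from the output above)
results = [
    ("I love this movie", "positive", {"label": "POSITIVE", "score": 1.00}),
    ("This is the worst book I have ever read", "negative", {"label": "NEGATIVE", "score": 1.00}),
    ("The food was okay, nothing special", "neutral", {"label": "NEGATIVE", "score": 0.98}),
]

def to_label(res):
    """Normalize the pipeline label ('POSITIVE'/'NEGATIVE') to the lowercase names used in this notebook."""
    return res["label"].lower()

# Labels the binary model can ever output.
reachable_labels = {"positive", "negative"}

mismatches = [
    (text, gold, to_label(res))
    for text, gold, res in results
    if to_label(res) != gold
]

for text, gold, pred in mismatches:
    reason = "gold label not in model's label set" if gold not in reachable_labels else "wrong prediction"
    print(f"Text: {text} | GOLD: {gold} | PRED: {pred} | {reason}")
```

The only mismatch is the neutral sentence, which the model cannot label correctly because "neutral" is not one of its output classes. A high score here reflects confidence between the two available labels, not evidence that the sentence is negative.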


## Notes
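- The RNN, LSTM, GRU, and Transformer models are trained on six sentences for 50 epochs and evaluated on those same sentences, so matching predictions show memorization rather than generalization.
- The pretrained DistilBERT SST-2 model is binary (positive/negative); it labeled the neutral sentence "The food was okay, nothing special" as negative with a score of 0.98.
- For real applications, use larger datasets with a held-out test split and more epochs as needed, and consider BERT fine-tuning.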
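One way to measure generalization rather than memorization is to hold out part of the labeled data before training. The following is a minimal sketch of a stratified split in pure Python, applied to the six demo sentences; `stratified_split` and its parameters are hypothetical names, and with a real dataset a library routine would usually be preferable.

```python
import random
from collections import defaultdict

data = [
    ("I love this movie", "positive"),
    ("This is the worst book I have ever read", "negative"),
    ("The food was okay, nothing special", "neutral"),
    ("Amazing performance by the actors", "positive"),
    ("I hate getting up early", "negative"),
    ("It is just another day", "neutral"),
]

def stratified_split(examples, test_fraction=0.5, seed=0):
    """Split (text, label) pairs so each label appears in both train and test where possible."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        # At least one test example per class, but always leave one for training.
        n_test = min(len(items) - 1, max(1, round(len(items) * test_fraction)))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

train_set, test_set = stratified_split(data)
print("train:", train_set)
print("test: ", test_set)
```

A model would then be fit on `train_set` only and scored on `test_set`. With three training sentences the scores would be close to meaningless, which is the practical argument for a larger dataset.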