(**Click the icon below to open this notebook in Colab**)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiangshiyin/machine-learning-for-actuarial-science/blob/main/2025-spring/week15/notebook/demo.ipynb)

# Introduction to NLP

## Preprocessing

In [1]:
import nltk

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/xiangshiyin/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/xiangshiyin/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/xiangshiyin/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [2]:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords


data = "This is a simple example to demonstrate removing stopwords using NLTK."
stopWords = set(stopwords.words('english'))

In [3]:
len(stopWords)

198

In [4]:
stopwords.words('english')[:10]

['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an']

In [5]:
tokenized_data = word_tokenize(data)

In [6]:
print(f"Original text: {data}")
print(f"Tokenized text: {'|'.join(tokenized_data)}")

Original text: This is a simple example to demonstrate removing stopwords using NLTK.
Tokenized text: This|is|a|simple|example|to|demonstrate|removing|stopwords|using|NLTK|.


In [7]:
filtered_tokenized_data = [
    w
    for w in tokenized_data
    if w not in stopWords
]
print(f"After removing stopwords: {filtered_tokenized_data}")

After removing stopwords: ['This', 'simple', 'example', 'demonstrate', 'removing', 'stopwords', 'using', 'NLTK', '.']


In [8]:
print(f"Original text: {data}")
print(f"Tokenized text: {'|'.join(tokenized_data)}")
print(f"After removing stopwords: {'|'.join(filtered_tokenized_data)}")

Original text: This is a simple example to demonstrate removing stopwords using NLTK.
Tokenized text: This|is|a|simple|example|to|demonstrate|removing|stopwords|using|NLTK|.
After removing stopwords: This|simple|example|demonstrate|removing|stopwords|using|NLTK|.
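Two tokens survive the filter above: "This", because the NLTK list is all lowercase and the membership test is case-sensitive, and the final ".", because punctuation is not a stopword. Below is a minimal sketch that lowercases each token before the lookup and drops punctuation during tokenization. The small `STOPWORDS` set and the `remove_stopwords` helper are hypothetical stand-ins chosen so the cell runs without NLTK data; in the notebook you would pass `stopWords` instead.

```python
import re

# Hand-picked subset of English stopwords (an assumption for this sketch);
# in the notebook, use set(stopwords.words('english')) instead.
STOPWORDS = {"this", "is", "a", "to", "the", "of", "and"}

def remove_stopwords(text, stop_words=STOPWORDS):
    # Keep runs of word characters only, so punctuation such as "." is dropped
    tokens = re.findall(r"\w+", text)
    # Compare the lowercased token so "This" matches the lowercase entry "this"
    return [t for t in tokens if t.lower() not in stop_words]

data = "This is a simple example to demonstrate removing stopwords using NLTK."
print(remove_stopwords(data))
```

Lowercasing only inside the comparison keeps the original casing (e.g. "NLTK") in the returned tokens.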


## Feature Extraction

### Bag of Words

In [9]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Sample dataset
texts = [
    "I love this product",         # positive
    "This is amazing",             # positive
    "Very happy with the result",  # positive
    "I hate this",                 # negative
    "Worst experience ever",       # negative
    "Not satisfied at all"         # negative
]

labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# 2. Convert text to bag-of-words vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# 3. Show feature names
print("Feature Names (Vocabulary):")
print(vectorizer.get_feature_names_out())


Feature Names (Vocabulary):
['all' 'amazing' 'at' 'ever' 'experience' 'happy' 'hate' 'is' 'love' 'not'
 'product' 'result' 'satisfied' 'the' 'this' 'very' 'with' 'worst']


In [10]:
X.toarray()

array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]])

In [11]:
X.shape

(6, 18)
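`X.shape` is (number of documents, vocabulary size). Below is a hand-rolled sketch of what `CountVectorizer` does with its defaults: lowercase the text, keep tokens of two or more word characters (the default `token_pattern` is `r"(?u)\b\w\w+\b"`, which is why the single-letter "I" is absent from the vocabulary above), sort the vocabulary, and count occurrences per document. The helper names are hypothetical, and scikit-learn's implementation additionally supports n-grams, stopword lists, and sparse output.

```python
import re
from collections import Counter

texts = [
    "I love this product",
    "This is amazing",
    "Very happy with the result",
    "I hate this",
    "Worst experience ever",
    "Not satisfied at all",
]

# Same pattern as CountVectorizer's default: tokens of 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())

def fit_vocabulary(docs):
    # Alphabetically sorted terms; a term's position is its column index
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}

def transform(docs, vocab):
    rows = []
    for doc in docs:
        # Words not seen during fitting are simply ignored
        counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
        row = [0] * len(vocab)
        for tok, count in counts.items():
            row[vocab[tok]] = count
        rows.append(row)
    return rows

vocab = fit_vocabulary(texts)
matrix = transform(texts, vocab)
print(len(matrix), len(vocab))
print(transform(["I love this amazing thing"], vocab))
```

The last line shows how an unseen sentence is encoded: "love", "this", and "amazing" get counts, while the out-of-vocabulary "thing" and the single-letter "I" are dropped.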

### TF-IDF

In [12]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score
import nltk

# Download NLTK movie_reviews data
nltk.download('movie_reviews')
from nltk.corpus import movie_reviews

# Prepare dataset
docs = []
labels = []

for fileid in movie_reviews.fileids():
    docs.append(movie_reviews.raw(fileid))
    labels.append(movie_reviews.categories(fileid)[0])  # 'pos' or 'neg'

# Convert labels to binary format
y = [1 if label == 'pos' else 0 for label in labels]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(docs, y, test_size=0.2, random_state=42)

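# Vectorize using TF-IDF: drop English stopwords and ignore terms that
# appear in more than 70% of the training documents (max_df=0.7)
vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)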
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train classifier
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_tfidf, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test_tfidf)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/xiangshiyin/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


Accuracy: 0.8275
Classification Report:
               precision    recall  f1-score   support

           0       0.84      0.81      0.82       199
           1       0.82      0.84      0.83       201

    accuracy                           0.83       400
   macro avg       0.83      0.83      0.83       400
weighted avg       0.83      0.83      0.83       400



In [13]:
feature_names = vectorizer.get_feature_names_out()
feature_names[:20]

array(['00', '000', '0009f', '007', '00s', '03', '04', '05', '05425',
       '10', '100', '1000', '10000', '100m', '101', '102', '103', '104',
       '105', '106'], dtype=object)

In [14]:
import pandas as pd

# Choose a sample document from the test set
sample_idx = 0
sample_vector = X_test_tfidf[sample_idx]

# Convert sparse vector to dense and create DataFrame
df_features = pd.DataFrame(
    data=sample_vector.toarray()[0],
    index=feature_names,
    columns=["tfidf"]
)

# Filter non-zero features and sort
df_nonzero = df_features[df_features.tfidf > 0].sort_values(by="tfidf", ascending=False)

# Show top 15 features by TF-IDF weight
print("\nTop TF-IDF features in sample test document:")
print(df_nonzero.head(15))


Top TF-IDF features in sample test document:
                 tfidf
bates         0.401204
annie         0.253364
caan          0.169454
kathy         0.148969
sledgehammer  0.127790
realises      0.117562
french        0.114832
nerve         0.107711
cake          0.104894
misery        0.102454
masterful     0.100301
basically     0.093828
prisoner      0.092226
arm           0.088148
tense         0.086170


### Word2Vec

#### Hand-crafted implementation

In [15]:
import numpy as np
import re
import random

# Sample corpus
corpus = "The quick brown fox jumps over the lazy dog"

# Preprocessing: Tokenization and vocabulary building
tokens = re.findall(r'\b\w+\b', corpus.lower())
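# Note: set iteration order changes between Python sessions, so the indices below may differ per run
vocab = set(tokens)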
word_to_idx = {word: idx for idx, word in enumerate(vocab)}
idx_to_word = {idx: word for word, idx in word_to_idx.items()}
vocab_size = len(vocab)

In [16]:
word_to_idx

{'quick': 0,
 'the': 1,
 'brown': 2,
 'dog': 3,
 'fox': 4,
 'over': 5,
 'jumps': 6,
 'lazy': 7}

In [17]:
idx_to_word

{0: 'quick',
 1: 'the',
 2: 'brown',
 3: 'dog',
 4: 'fox',
 5: 'over',
 6: 'jumps',
 7: 'lazy'}

In [18]:
tokens

['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [19]:
# Generate (target, context) index pairs: every word within `window_size` positions of the target
def generate_training_data(tokens, window_size):
    training_data = []
    for idx, target_word in enumerate(tokens):
        target_idx = word_to_idx[target_word]
        context_range = list(range(max(0, idx - window_size), idx)) + \
                        list(range(idx + 1, min(len(tokens), idx + window_size + 1)))
        for context_idx in context_range:
            context_word = tokens[context_idx]
            context_word_idx = word_to_idx[context_word]
            training_data.append((target_idx, context_word_idx))
    return training_data

window_size = 2
training_data = generate_training_data(tokens, window_size)


In [20]:
# Inspect the training data
print(f"Corpus: {corpus}")
print([
    (idx_to_word[t[0]], idx_to_word[t[1]])
    for t in training_data
])

Corpus: The quick brown fox jumps over the lazy dog
[('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox'), ('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ('brown', 'jumps'), ('fox', 'quick'), ('fox', 'brown'), ('fox', 'jumps'), ('fox', 'over'), ('jumps', 'brown'), ('jumps', 'fox'), ('jumps', 'over'), ('jumps', 'the'), ('over', 'fox'), ('over', 'jumps'), ('over', 'the'), ('over', 'lazy'), ('the', 'jumps'), ('the', 'over'), ('the', 'lazy'), ('the', 'dog'), ('lazy', 'over'), ('lazy', 'the'), ('lazy', 'dog'), ('dog', 'the'), ('dog', 'lazy')]


In [21]:
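# Initialize parameters: rows of W1 are the target-word embeddings,
# columns of W2 are the context ("output") vectors.
# No random seed is set, so the numbers below change from run to run.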
embedding_dim = 10
W1 = np.random.randn(vocab_size, embedding_dim)
W2 = np.random.randn(embedding_dim, vocab_size)

# Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Training parameters
epochs = 1000
learning_rate = 0.01
num_negative_samples = 2

# Training loop
for epoch in range(epochs):
    loss = 0
    for target_idx, context_idx in training_data:
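        # Positive sample: push sigmoid(h . v_context) toward 1
        h = W1[target_idx]  # a view into W1, so it reflects the in-place update below
        u = np.dot(h, W2[:, context_idx])
        pred = sigmoid(u)
        error = pred - 1  # derivative of -log(sigmoid(u)) with respect to u
        loss += -np.log(pred + 1e-7)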
        # Gradients
        grad_W2 = error * h
        grad_W1 = error * W2[:, context_idx]
        # Update weights
        W2[:, context_idx] -= learning_rate * grad_W2
        W1[target_idx] -= learning_rate * grad_W1

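        # Negative sampling: draw words uniformly at random (excluding the true context)
        # and push sigmoid(h . v_negative) toward 0
        negative_samples = random.sample(
            [i for i in range(vocab_size) if i != context_idx], num_negative_samples
        )
        for neg_idx in negative_samples:
            u_neg = np.dot(h, W2[:, neg_idx])
            pred_neg = sigmoid(u_neg)
            error_neg = pred_neg  # derivative of -log(1 - sigmoid(u)) with respect to u
            loss += -np.log(1 - pred_neg + 1e-7)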
            # Gradients
            grad_W2_neg = error_neg * h
            grad_W1_neg = error_neg * W2[:, neg_idx]
            # Update weights
            W2[:, neg_idx] -= learning_rate * grad_W2_neg
            W1[target_idx] -= learning_rate * grad_W1_neg
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch + 1}, Loss: {loss:.4f}")

Epoch 100, Loss: 43.5765
Epoch 200, Loss: 39.8485
Epoch 300, Loss: 37.8640
Epoch 400, Loss: 39.2224
Epoch 500, Loss: 41.6311
Epoch 600, Loss: 39.8883
Epoch 700, Loss: 40.8639
Epoch 800, Loss: 41.8159
Epoch 900, Loss: 41.0794
Epoch 1000, Loss: 35.5038
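The updates in the training loop are plain stochastic gradient descent on the skip-gram negative-sampling loss. Writing $\mathbf{h}$ for `W1[target_idx]`, $\mathbf{v}_c$ for `W2[:, context_idx]`, $\mathbf{v}_{n_1},\dots,\mathbf{v}_{n_k}$ for the $k$ = `num_negative_samples` sampled columns, and $\sigma$ for the sigmoid, the loss for one (target, context) pair and its gradients are:

$$
\mathcal{L} = -\log\sigma(\mathbf{h}^\top\mathbf{v}_c) - \sum_{i=1}^{k}\log\bigl(1-\sigma(\mathbf{h}^\top\mathbf{v}_{n_i})\bigr)
$$

$$
\frac{\partial\mathcal{L}}{\partial\mathbf{v}_c} = \bigl(\sigma(\mathbf{h}^\top\mathbf{v}_c)-1\bigr)\,\mathbf{h},\qquad
\frac{\partial\mathcal{L}}{\partial\mathbf{v}_{n_i}} = \sigma(\mathbf{h}^\top\mathbf{v}_{n_i})\,\mathbf{h},\qquad
\frac{\partial\mathcal{L}}{\partial\mathbf{h}} = \bigl(\sigma(\mathbf{h}^\top\mathbf{v}_c)-1\bigr)\,\mathbf{v}_c + \sum_{i=1}^{k}\sigma(\mathbf{h}^\top\mathbf{v}_{n_i})\,\mathbf{v}_{n_i}
$$

In the code, `error` is $\sigma(\mathbf{h}^\top\mathbf{v}_c)-1$ and `error_neg` is $\sigma(\mathbf{h}^\top\mathbf{v}_{n_i})$; the loop applies the $\mathbf{h}$ gradient one term at a time instead of summing it first. The original word2vec draws negatives from the unigram distribution raised to the 3/4 power, whereas this loop samples them uniformly; because new negatives are drawn every epoch, the printed loss is noisy rather than steadily decreasing.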


In [22]:
# Retrieve word embeddings
word_embeddings = W1

# Example: Find similar words
def find_similar(word, top_n=3):
    if word not in word_to_idx:
        print(f"'{word}' not in vocabulary.")
        return
    idx = word_to_idx[word]
    vec = word_embeddings[idx]
    similarities = []
    for i in range(vocab_size):
        if i == idx:
            continue
        sim = np.dot(vec, word_embeddings[i]) / (np.linalg.norm(vec) * np.linalg.norm(word_embeddings[i]))
        similarities.append((idx_to_word[i], sim))
    similarities.sort(key=lambda x: x[1], reverse=True)
    for other_word, sim in similarities[:top_n]:
        print(f"{other_word}: {sim:.4f}")

# Test the model
print("\nWords similar to 'fox':")
find_similar('fox')


Words similar to 'fox':
jumps: 0.4163
brown: 0.3140
quick: 0.3060


#### With `Gensim`

In [23]:
import gensim
from gensim.models import Word2Vec

# Sample corpus
sentences = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["i", "love", "natural", "language", "processing"],
    ["word2vec", "is", "a", "technique", "for", "natural", "language", "processing"],
    ["the", "dog", "is", "lazy", "but", "the", "brown", "fox", "is", "quick"]
]


In [24]:
# Initialize and train the model
model = Word2Vec(
    sentences,
    vector_size=100,  # Dimensionality of the word vectors
    window=5,         # Maximum distance between the current and predicted word
    min_count=1,      # Ignores all words with total frequency lower than this
    workers=4,        # Number of worker threads used for training
    sg=1              # 1 for Skip-gram; 0 for CBOW
)

In [25]:
# Find most similar words
similar_words = model.wv.most_similar("fox", topn=3)
print(similar_words)

# Compute similarity between two words
similarity = model.wv.similarity("dog", "fox")
print(f"Similarity between 'dog' and 'fox': {similarity:.4f}")

[('over', 0.16699621081352234), ('brown', 0.1388736069202423), ('quick', 0.13150502741336823)]
Similarity between 'dog' and 'fox': -0.1052


In [26]:
sample = """
Mr. Dursley was the director of a firm called Grunnings, which made
drills. He was a big, beefy man with hardly any neck, although he did
have a very large mustache. Mrs. Dursley was thin and blonde and had
nearly twice the usual amount of neck, which came in very useful as she
spent so much of her time craning over garden fences, spying on the
neighbors. The Dursleys had a small son called Dudley and in their
opinion there was no finer boy anywhere.


The Dursleys had everything they wanted, but they also had a secret, and
their greatest fear was that somebody would discover it. They didn't
think they could bear it if anyone found out about the Potters. Mrs.
Potter was Mrs. Dursley's sister, but they hadn't met for several years;
in fact, Mrs. Dursley pretended she didn't have a sister, because her
sister and her good-for-nothing husband were as unDursleyish as it was
possible to be. The Dursleys shuddered to think what the neighbors would
say if the Potters arrived in the street. The Dursleys knew that the
Potters had a small son, too, but they had never even seen him. This boy
was another good reason for keeping the Potters away; they didn't want
Dudley mixing with a child like that.


When Mr. and Mrs. Dursley woke up on the dull, gray Tuesday our story
starts, there was nothing about the cloudy sky outside to suggest that
strange and mysterious things would soon be happening all over the
country. Mr. Dursley hummed as he picked out his most boring tie for
work, and Mrs. Dursley gossiped away happily as she wrestled a screaming
Dudley into his high chair.
"""

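# Split the text into paragraphs (separated by blank lines) and tokenize each one;
# simple_preprocess lowercases, strips punctuation, and drops tokens shorter than 2 or longer than 15 characters
sentences = [
    gensim.utils.simple_preprocess(paragraph)
    for paragraph in sample.split("\n\n")
]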

In [27]:
model = Word2Vec(
    sentences,
    vector_size=100,  # Dimensionality of the word vectors
    window=5,         # Maximum distance between the current and predicted word
    min_count=1,      # Ignores all words with total frequency lower than this
    workers=4,        # Number of worker threads used for training
    sg=1              # 1 for Skip-gram; 0 for CBOW
)

In [28]:
# Find most similar words
similar_words = model.wv.most_similar("potter", topn=3)
print(similar_words)

[('dursley', 0.2640940845012665), ('on', 0.2523151636123657), ('starts', 0.24110931158065796)]


### GloVe

- https://nlp.stanford.edu/projects/glove/
- https://radimrehurek.com/gensim/models/word2vec.html#pretrained-models

In [29]:
import gensim.downloader

In [30]:
# List all available models in gensim-data
# (use model_name so the Word2Vec `model` defined above is not overwritten)
for model_name in gensim.downloader.info()['models'].keys():
    print(model_name)

fasttext-wiki-news-subwords-300
conceptnet-numberbatch-17-06-300
word2vec-ruscorpora-300
word2vec-google-news-300
glove-wiki-gigaword-50
glove-wiki-gigaword-100
glove-wiki-gigaword-200
glove-wiki-gigaword-300
glove-twitter-25
glove-twitter-50
glove-twitter-100
glove-twitter-200
__testing_word2vec-matrix-synopsis


In [31]:
glove_vectors = gensim.downloader.load('glove-twitter-25')

In [32]:
glove_vectors.most_similar('twitter', topn=20)

[('facebook', 0.948005199432373),
 ('tweet', 0.9403423070907593),
 ('fb', 0.9342358708381653),
 ('instagram', 0.9104824066162109),
 ('chat', 0.8964964747428894),
 ('hashtag', 0.8885937333106995),
 ('tweets', 0.8878158330917358),
 ('tl', 0.8778461217880249),
 ('link', 0.8778210878372192),
 ('internet', 0.8753897547721863),
 ('bio', 0.8740679621696472),
 ('skype', 0.8711126446723938),
 ('youtube', 0.8707534074783325),
 ('spam', 0.8684024214744568),
 ('tumblr', 0.8668119311332703),
 ('ex', 0.8645952939987183),
 ('ask', 0.8644779920578003),
 ('dm', 0.8439710736274719),
 ('insta', 0.8426101207733154),
 ('post', 0.8411487340927124)]

In [33]:
glove_vectors.most_similar('president', topn=20)

[('barack', 0.9471943378448486),
 ('obama', 0.9400959014892578),
 ('clinton', 0.9378828406333923),
 ('former', 0.9294927716255188),
 ('minister', 0.9137527346611023),
 ('romney', 0.9051568508148193),
 ('pope', 0.9035295248031616),
 ('senator', 0.8977937698364258),
 ('kerry', 0.8958768844604492),
 ('hillary', 0.893998920917511),
 ('potus', 0.8929537534713745),
 ('bill', 0.8894913196563721),
 ('says', 0.8860845565795898),
 ('candidate', 0.8827615976333618),
 ('justice', 0.882584273815155),
 ('gov', 0.8798654675483704),
 ('leader', 0.8764994144439697),
 ('labour', 0.8748924732208252),
 ('claims', 0.8715708255767822),
 ('reagan', 0.8713157176971436)]

In [34]:
glove_vectors.most_similar('usa', topn=20)

[('china', 0.8639861941337585),
 ('local', 0.8453366756439209),
 ('capital', 0.8419684767723083),
 ('base', 0.8405494689941406),
 ('a', 0.8330598473548889),
 ('fox', 0.8293148279190063),
 ('sub', 0.8233862519264221),
 ('america', 0.8221979737281799),
 ('union', 0.8131723403930664),
 ('media', 0.8116157650947571),
 ('club', 0.8025693297386169),
 ('pro', 0.8019703030586243),
 ('central', 0.8011414408683777),
 ('uk', 0.7979711294174194),
 ('red', 0.794592559337616),
 ('dc', 0.7931137681007385),
 ('top', 0.791972279548645),
 ('rock', 0.7895259261131287),
 ('no', 0.786898672580719),
 ('york', 0.7861546874046326)]

In [35]:
glove_vectors.get_vector('king')

array([-0.74501 , -0.11992 ,  0.37329 ,  0.36847 , -0.4472  , -0.2288  ,
        0.70118 ,  0.82872 ,  0.39486 , -0.58347 ,  0.41488 ,  0.37074 ,
       -3.6906  , -0.20101 ,  0.11472 , -0.34661 ,  0.36208 ,  0.095679,
       -0.01765 ,  0.68498 , -0.049013,  0.54049 , -0.21005 , -0.65397 ,
        0.64556 ], dtype=float32)

In [36]:
king = glove_vectors.get_vector('king')
queen = glove_vectors.get_vector('queen')
man = glove_vectors.get_vector('man')
woman = glove_vectors.get_vector('woman')

res = king - man + woman

In [37]:
# calculate the cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(res.reshape(1, -1), queen.reshape(1, -1))
print(f"Similarity between queen and res: {similarity[0][0]}")

Similarity between queen and res: 0.7530912756919861


In [38]:
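# Cosine similarity of two vectors from the formula cos(v1, v2) = v1 . v2 / (||v1|| ||v2||).
# Named cosine_similarity_np so it does not shadow sklearn's cosine_similarity imported above.
import numpy as np

def cosine_similarity_np(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

cosine_similarity_np(res, queen)  # should agree with the sklearn value above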
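A high similarity between `king - man + woman` and `queen` does not by itself show that `queen` is the *nearest* word to that vector. The usual analogy test ranks every vocabulary word against the combined vector and excludes the three query words, which would otherwise often rank first. A minimal sketch; the `analogy` helper and the 3-D toy vectors are hypothetical. On the real GloVe vectors, gensim's `glove_vectors.most_similar(positive=['king', 'woman'], negative=['man'])` performs a comparable search.

```python
import numpy as np

def analogy(a, b, c, vectors, top_n=1):
    """Solve 'a is to b as c is to ?' by ranking words against vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    target = target / np.linalg.norm(target)
    scores = []
    for word, vec in vectors.items():
        if word in (a, b, c):  # exclude the query words themselves
            continue
        scores.append((word, float(np.dot(vec, target) / np.linalg.norm(vec))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Made-up vectors with dimensions [royalty, maleness, other]
toy_vectors = {
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man": np.array([0.0, 1.0, 0.1]),
    "woman": np.array([0.0, -1.0, 0.1]),
    "apple": np.array([0.0, 0.0, 1.0]),
}
print(analogy("man", "king", "woman", toy_vectors))
```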


# Prompt Engineering

## Quick Example

In [1]:
import openai
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv(), override=True) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

python-dotenv could not parse statement starting at line 1


In [2]:
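def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    # openai>=1.0 interface; openai.ChatCompletion was removed in version 1.0
    response = openai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # reduce randomness so reruns give similar output
    )
    return response.choices[0].message.content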

In [3]:
text = f"""
Cooking ma po tofu is easy. First, you need to buy some tofu. Then you need to heat some oil in a pan.
After that, you need to add the tofu to the pan. Then you need to cook the tofu. After that, you need 
to add some seasoning to the tofu. Some people might first cook some ground beef and then add the tofu.
And that's it! You have cooked some delicious tofu. Enjoy!
"""

prompt = f"""
You will be provided with text delimited by triple quotes. If the content contains a sequence of instructions,
re-write those instructions in the following format:

Step 1 - ...
Step 2 - ...
...
Step N - ...
If the content does not contain a sequence of instructions, then simply write \"No steps provided.\"
\"\"\"{text}\"\"\"
"""

response = get_completion(prompt)
print("Completion for Text-to-Step transformation:")
print(response)


Completion for Text-to-Step transformation:

Step 1 - Buy some tofu.
Step 2 - Heat some oil in a pan.
Step 3 - Add the tofu to the pan.
Step 4 - Cook the tofu.
Step 5 - Add seasoning to the tofu.
Step 6 - Optionally, cook some ground beef before adding the tofu.
Step 7 - Enjoy your delicious tofu.
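Because the prompt pins the output to the `Step N - ...` format, the completion can be turned back into structured data with a regular expression. A minimal sketch; `parse_steps` is a hypothetical helper, and the sample string is copied from the completion above. Text without matching lines, such as "No steps provided.", yields an empty list.

```python
import re

STEP_LINE = re.compile(r"^Step\s+(\d+)\s*-\s*(.+)$")

def parse_steps(completion):
    """Return (step_number, instruction) pairs from 'Step N - ...' lines; [] if none match."""
    steps = []
    for line in completion.splitlines():
        match = STEP_LINE.match(line.strip())
        if match:
            steps.append((int(match.group(1)), match.group(2).strip()))
    return steps

completion = """
Step 1 - Buy some tofu.
Step 2 - Heat some oil in a pan.
Step 3 - Add the tofu to the pan.
Step 4 - Cook the tofu.
Step 5 - Add seasoning to the tofu.
Step 6 - Optionally, cook some ground beef before adding the tofu.
Step 7 - Enjoy your delicious tofu.
"""
for number, instruction in parse_steps(completion):
    print(number, instruction)
print(parse_steps("No steps provided."))  # no matching lines, so an empty list
```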


## Tokens

In [7]:
import tiktoken

tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [8]:
tokenizer.encode('tiktoken is great!')

[83, 1609, 5963, 374, 2294, 0]

In [12]:
def num_tokens_from_string(string: str, model_name: str = "gpt-3.5-turbo") -> int:
    """Returns the number of tokens in a text string."""
    tokenizer = tiktoken.encoding_for_model(model_name)
    num_tokens = len(tokenizer.encode(string))
    return num_tokens

In [13]:
num_tokens_from_string(prompt)

157

In [None]:
num_tokens_from_string(response)

73

In [15]:
# turn tokens into text
tokenizer.decode([83, 1609, 5963, 374, 2294, 0])

'tiktoken is great!'
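tiktoken's encodings are byte-pair encoding (BPE) tokenizers: their vocabularies come from repeatedly merging the most frequent adjacent pair of symbols in a training corpus. Below is a toy, character-level sketch of that merge-learning loop. The corpus and function names are hypothetical, and real BPE tokenizers work on bytes, pre-split text with a regular expression, and are trained on far larger corpora, so this is illustrative only.

```python
from collections import Counter

def pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = Counter()
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] += freq
    return merged

def learn_bpe(corpus_words, num_merges):
    # Start from single characters: each word is a tuple of symbols
    words = Counter(tuple(word) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break
        # Most frequent pair; ties broken alphabetically so the result is deterministic
        best = min(counts, key=lambda p: (-counts[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

corpus = ["low", "low", "low", "lower", "lowest", "newest", "newest"]
merges, words = learn_bpe(corpus, num_merges=4)
print(merges)
```

After four merges, frequent fragments such as "low" and "est" have become single symbols, which is the same effect that lets common words map to a single token ID in tiktoken.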

## More Examples

In [16]:
text = f"""
Cooking ma po tofu is easy. First, you need to buy some tofu. Then you need to heat some oil in a pan.
After that, you need to add the tofu to the pan. Then you need to cook the tofu. After that, you need 
to add some seasoning to the tofu. Some people might first cook some ground beef and then add the tofu.
And that's it! You have cooked some delicious tofu. Enjoy!
"""

prompt = f"""
You will be provided with text delimited by triple quotes. If the content contains a sequence of instructions,
re-write those instructions in the following format:

Step 1 - ...
Step 2 - ...
...
Step N - ...
If the content does not contain a sequence of instructions, then simply write \"No steps provided.\"
Please provide the response in JSON format with the following keys:
step_numbers, steps
\"\"\"{text}\"\"\"
"""

response = get_completion(prompt)
print("Completion for Text-to-Step transformation:")
print(response)

Completion for Text-to-Step transformation:
{
    "step_numbers": "1, 2, 3, 4, 5, 6",
    "steps": [
        "Step 1 - Buy some tofu.",
        "Step 2 - Heat some oil in a pan.",
        "Step 3 - Add the tofu to the pan.",
        "Step 4 - Cook the tofu.",
        "Step 5 - Add some seasoning to the tofu.",
        "Step 6 - Optionally, cook some ground beef before adding the tofu."
    ]
}
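The completion above is valid JSON, but `step_numbers` came back as a single string ("1, 2, 3, 4, 5, 6") rather than a list, and model output is not guaranteed to be valid JSON at all. A minimal sketch of defensive parsing; `parse_steps_json` is a hypothetical helper, and the sample string is copied from the completion above.

```python
import json

def parse_steps_json(completion):
    """Parse the JSON completion and return (step_number, step_text) pairs."""
    data = json.loads(completion)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"step_numbers", "steps"} - set(data)
    if missing:
        raise ValueError(f"completion is missing keys: {sorted(missing)}")
    numbers = data["step_numbers"]
    # Accept either a list of numbers or a comma-separated string such as "1, 2, 3"
    if isinstance(numbers, str):
        numbers = [int(n) for n in numbers.split(",")]
    steps = data["steps"]
    if len(numbers) != len(steps):
        raise ValueError("step_numbers and steps have different lengths")
    return list(zip(numbers, steps))

completion = """{
    "step_numbers": "1, 2, 3, 4, 5, 6",
    "steps": [
        "Step 1 - Buy some tofu.",
        "Step 2 - Heat some oil in a pan.",
        "Step 3 - Add the tofu to the pan.",
        "Step 4 - Cook the tofu.",
        "Step 5 - Add some seasoning to the tofu.",
        "Step 6 - Optionally, cook some ground beef before adding the tofu."
    ]
}"""
for number, step in parse_steps_json(completion):
    print(number, step)
```

Normalizing both shapes of `step_numbers` is a pragmatic choice; a stricter alternative is to spell out the exact schema (for example, "step_numbers as a list of integers") in the prompt itself.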
