## PHASE 2 - N-grams and TF-IDF

## **Data Preparation**

*In this section, we will download the IMDb movie reviews dataset, extract its contents, and explore the structure of the data to understand how the files are organized.*


In [None]:
!wget -q --show-progress https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

!tar -xzf aclImdb_v1.tar.gz




In [None]:
# Directory structure
!ls -R aclImdb | head -n 30



aclImdb:
imdbEr.txt
imdb.vocab
README
test
train

aclImdb/test:
labeledBow.feat
neg
pos
urls_neg.txt
urls_pos.txt

aclImdb/test/neg:
0_2.txt
10000_4.txt
10001_1.txt
10002_3.txt
10003_3.txt
1000_3.txt
10004_2.txt
10005_2.txt
10006_2.txt
10007_4.txt
10008_4.txt
10009_3.txt
10010_2.txt
10011_1.txt
10012_1.txt


In [None]:
!echo "Contents of 'train' directory:"
!ls -R aclImdb/train | head -n 20

!echo "Contents of 'train/neg' directory:"
!ls aclImdb/train/neg | head -n 10

!echo "Contents of 'train/pos' directory:"
!ls aclImdb/train/pos | head -n 10

!echo "Contents of 'train/unsup' directory:"
!ls aclImdb/train/unsup | head -n 10


Contents of 'train' directory:
aclImdb/train:
labeledBow.feat
neg
pos
unsup
unsupBow.feat
urls_neg.txt
urls_pos.txt
urls_unsup.txt

aclImdb/train/neg:
0_3.txt
10000_4.txt
10001_4.txt
10002_1.txt
10003_1.txt
10004_3.txt
1000_4.txt
10005_3.txt
10006_4.txt
Contents of 'train/neg' directory:
0_3.txt
10000_4.txt
10001_4.txt
10002_1.txt
10003_1.txt
10004_3.txt
1000_4.txt
10005_3.txt
10006_4.txt
10007_1.txt
Contents of 'train/pos' directory:
0_9.txt
10000_8.txt
10001_10.txt
10002_7.txt
10003_8.txt
10004_8.txt
10005_7.txt
10006_7.txt
10007_7.txt
10008_7.txt
Contents of 'train/unsup' directory:
0_0.txt
10000_0.txt
1000_0.txt
10001_0.txt
10002_0.txt
10003_0.txt
10004_0.txt
10005_0.txt
10006_0.txt
10007_0.txt


# **Data Exploration and Loading**

*In this section, we will read and display the content of some files to understand the data better. We will also load the reviews into a DataFrame along with their corresponding sentiments.*

## **Reading and Displaying File Contents**

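*We define a function that prints the first 10 lines of the first `.txt` file (in sorted filename order) in a directory, wrapped to 80 characters for readability. Each review file holds a single long line of text, so in practice this prints the whole review.*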

In [None]:
import os
import textwrap

# Function to read and display the top 10 lines of the first .txt file with wrapping
def display_first_file_with_wrapping(directory):
    files = sorted(os.listdir(directory))
    with open(os.path.join(directory, files[0]), 'r', encoding='utf-8') as file:
        print(f"Top 10 lines of {files[0]} from {directory} directory:\n")
        for i, line in enumerate(file):
            if i < 10:
                wrapped_line = textwrap.fill(line.strip(), width=80)
                print(wrapped_line)
            else:
                break
        print("\n")

# Display the first file in the train/neg directory
display_first_file_with_wrapping('aclImdb/train/neg')

# Display the first file in the train/pos directory
display_first_file_with_wrapping('aclImdb/train/pos')

# Display the first file in the train/unsup directory
display_first_file_with_wrapping('aclImdb/train/unsup')


Top 10 lines of 0_3.txt from aclImdb/train/neg directory:

Story of a man who has unnatural feelings for a pig. Starts out with a opening
scene that is a terrific example of absurd comedy. A formal orchestra audience
is turned into an insane, violent mob by the crazy chantings of it's singers.
Unfortunately it stays absurd the WHOLE time with no general narrative
eventually making it just too off putting. Even those from the era should be
turned off. The cryptic dialogue would make Shakespeare seem easy to a third
grader. On a technical level it's better than you might think with some good
cinematography by future great Vilmos Zsigmond. Future stars Sally Kirkland and
Frederic Forrest can be seen briefly.


Top 10 lines of 0_9.txt from aclImdb/train/pos directory:

Bromwell High is a cartoon comedy. It ran at the same time as some other
programs about school life, such as "Teachers". My 35 years in the teaching
profession lead me to believe that Bromwell High's satire is much closer to

In [None]:
import pandas as pd


# Empty lists to hold the reviews and their sentiment labels
reviews = []
sentiments = []

# Label positive reviews 1, negative reviews 0, and unlabeled (unsup) reviews None
label_by_directory = {
    'aclImdb/train/pos': 1,
    'aclImdb/train/neg': 0,
    'aclImdb/train/unsup': None,
}

for directory, label in label_by_directory.items():
    for review_file in os.listdir(directory):
        with open(os.path.join(directory, review_file), 'r', encoding='utf-8') as file:
            reviews.append(file.read())
            sentiments.append(label)




In [None]:
train_data = pd.DataFrame({'review': reviews, 'sentiment': sentiments})
train_data['sentiment'] = train_data['sentiment'].astype('Int64')

In [None]:
train_data.head()

|   | review | sentiment |
|---|---|---|
| 0 | Although she is little known today, Deanna Dur... | 1 |
| 1 | I didn't expect much from this, but I have to ... | 1 |
| 2 | The first of the official Ghibli films, Laputa... | 1 |
| 3 | A well-made and imaginative production, refres... | 1 |
| 4 | The first time I came upon Delirious, I only h... | 1 |


In [None]:
unique_sentiments = train_data['sentiment'].unique()
unique_sentiments

<IntegerArray>
[1, 0, <NA>]
Length: 3, dtype: Int64

# **Text Preprocessing**

*In this section, we preprocess the text data to clean and normalize it for further analysis. The steps are lowercasing; removing HTML tags, URLs, bracketed text, punctuation, numbers, and stop words; tokenization; and lemmatization. Stemming is also set up but left disabled in the code.*

## **Preprocessing Steps**

*We will use the NLTK library for various preprocessing tasks. Let's start by importing the necessary libraries and downloading the required resources.*

In [None]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # required by word_tokenize in newer NLTK releases
nltk.download('wordnet')

# Initialize NLTK tools
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

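# Preprocessing texts
def preprocess_text(text):
    # Lowercasing
    text = text.lower()

    # Removing HTML tags such as <br />; replace them with a space so the
    # words on either side do not get glued together
    text = re.sub(r'<.*?>', ' ', text)

    # Removing URLs
    text = re.sub(r'http\S+|www\S+', '', text)

    # Removing text in square brackets, then punctuation and special characters
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'[^\w\s]', '', text)

    # Removing numbers
    text = re.sub(r'\d+', '', text)

    # Tokenization
    tokens = word_tokenize(text)

    # Removing stop words (contractions such as "didn't" became "didnt" above,
    # so they no longer match the stop-word list and are kept)
    tokens = [word for word in tokens if word not in stop_words]

    # Lemmatization (lemmatize() treats every word as a noun unless a POS tag is given).
    # Stemming is disabled; uncomment the stemmer line to stem before lemmatizing.
    # tokens = [stemmer.stem(word) for word in tokens]
    tokens = [lemmatizer.lemmatize(word) for word in tokens]

    # Removing non-alphabetic tokens (e.g. tokens containing underscores)
    tokens = [word for word in tokens if word.isalpha()]

    # Re-joining tokens with single spaces (this also collapses extra whitespace)
    text = ' '.join(tokens)

    return text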

# Apply preprocessing to the dataframe
train_data['processed_text'] = train_data['review'].apply(preprocess_text)



[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
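
Raw IMDb reviews use `<br />` tags as line breaks. Deleting a tag outright glues the neighbouring words together once punctuation is stripped, while replacing it with a space keeps them apart. A minimal standalone illustration on a made-up review snippet, using the same tag and punctuation regexes as `preprocess_text`:

```python
import re

# Hypothetical snippet in the style of the raw IMDb reviews
raw = "a great film.<br /><br />highly recommended"

def strip_punct(text):
    # Same punctuation rule as preprocess_text: drop anything that is not a word character or whitespace
    return re.sub(r'[^\w\s]', '', text)

deleted = strip_punct(re.sub(r'<.*?>', '', raw))   # tags removed with an empty string
spaced = strip_punct(re.sub(r'<.*?>', ' ', raw))   # tags replaced with a space

print(deleted.split())  # ['a', 'great', 'filmhighly', 'recommended']
print(spaced.split())   # ['a', 'great', 'film', 'highly', 'recommended']
```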


In [None]:
train_data.head()

|   | review | sentiment | processed_text |
|---|---|---|---|
| 0 | Although she is little known today, Deanna Dur... | 1 | although little known today deanna durbin one ... |
| 1 | I didn't expect much from this, but I have to ... | 1 | didnt expect much admit rolling ground laughin... |
| 2 | The first of the official Ghibli films, Laputa... | 1 | first official ghibli film laputa similar pred... |
| 3 | A well-made and imaginative production, refres... | 1 | wellmade imaginative production refreshingly f... |
| 4 | The first time I came upon Delirious, I only h... | 1 | first time came upon delirious heard listened ... |


# **TF-IDF Vectorization**

*In this section, we use `TfidfVectorizer` from `sklearn` to convert the preprocessed text into TF-IDF features. Each word in a review gets a weight that is high when the word is frequent in that review but rare across the corpus.*

## **What is TF-IDF?**

*TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure of how important a word is to a document relative to a collection of documents (corpus). The weight increases with the number of times the word appears in the document but is offset by the number of documents in the corpus that contain the word, so words that occur in almost every document receive low weights.*

### **Formula**

- **TF (Term Frequency)**:
  TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

- **IDF (Inverse Document Frequency)**:
  IDF(t) = log(Total number of documents / Number of documents with term t in it)

- **TF-IDF**:
  TF-IDF(t) = TF(t) * IDF(t)
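
To make the formulas concrete, here is a small hand-rolled sketch on a made-up three-document corpus. It follows the textbook definitions above literally and assumes the natural logarithm with no smoothing, so it will not reproduce `TfidfVectorizer`'s numbers exactly.

```python
import math

# Toy corpus (hypothetical), tokenized by whitespace
docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]
tokenized = [d.split() for d in docs]

def tf(term, doc_tokens):
    # Count of the term divided by the number of terms in the document
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    # Natural log of (number of documents / number of documents containing the term)
    containing = sum(1 for doc_tokens in corpus if term in doc_tokens)
    return math.log(len(corpus) / containing)

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# "cat" is in 2 of 3 documents, "mat" in only 1, so "mat" gets the higher weight
print(round(tf_idf("cat", tokenized[0], tokenized), 4))  # (1/6) * ln(3/2) ≈ 0.0676
print(round(tf_idf("mat", tokenized[0], tokenized), 4))  # (1/6) * ln(3/1) ≈ 0.1831
```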

## **Using TfidfVectorizer**

*We use the `TfidfVectorizer` to transform the processed text into a TF-IDF matrix, which we then convert into a DataFrame.*

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

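# Initialize TF-IDF, keeping only the 5,000 terms with the highest frequency across the corpus
tfidf_vectorizer = TfidfVectorizer(max_features=5000)

X_tfidf = tfidf_vectorizer.fit_transform(train_data['processed_text'])

# Convert the sparse TF-IDF matrix to a dense DataFrame.
# Note: with all 75,000 training reviews (including unsup) this is about 3 GB of float64;
# pd.DataFrame.sparse.from_spmatrix keeps the matrix sparse if memory is tight.
tfidf_df = pd.DataFrame(X_tfidf.toarray(), columns=tfidf_vectorizer.get_feature_names_out())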

tfidf_df['sentiment'] = train_data['sentiment'].values



In [None]:
tfidf_df.head()

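|   | abandon | abandoned | abc | ability | able | abrupt | absence | absent | absolute | absolutely | ... | youd | youll | young | younger | youre | youth | youve | zero | zombie | zone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.072763 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.056166 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.104796 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.147764 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |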


# **Manual TF-IDF Calculation**

*In this section, we will manually calculate Term Frequency (TF), Inverse Document Frequency (IDF), and TF-IDF scores for each term in our corpus. This process will help us understand the underlying calculations and how TF-IDF values are derived.*

## **Calculate Term Frequency (TF)**

*The term frequency (TF) is the number of times a term appears in a document, normalized by the total number of terms in the document.*

In [None]:
# Calculate Term Frequency (TF)
def calculate_tf(documents):
    """
    Calculate term frequency (TF) for each document in the corpus.

    :param documents: List of preprocessed text documents.
    :return: List of dictionaries containing term frequency for each document.
    """
    tf_dicts = []
    for document in documents:
        words = document.split()
        tf_dict = {}
        total_words = len(words)
        for word in words:
            tf_dict[word] = tf_dict.get(word, 0) + 1
        for word in tf_dict:
            tf_dict[word] = tf_dict[word] / total_words
        tf_dicts.append(tf_dict)
    return tf_dicts



## **Calculate Inverse Document Frequency (IDF)**

*The inverse document frequency (IDF) measures how much information a term provides: terms that appear in only a few documents get a high IDF, while terms that appear in most documents get an IDF close to zero.*

In [None]:
# Calculate Inverse Document Frequency (IDF)
import math
from collections import Counter

def calculate_idf(documents):
    """
    Calculate inverse document frequency (IDF) for each term in the corpus.

    Note: this computes log(N / (1 + df)), where N is the number of documents and
    df is the number of documents containing the term. This variant is lower than
    the log(N / df) formula above, and it is negative for a term that appears in
    every document.

    :param documents: List of preprocessed text documents.
    :return: Dictionary containing IDF for each term.
    """
    total_documents = len(documents)

    # Document frequency: count each term at most once per document
    doc_freq = Counter()
    for document in documents:
        doc_freq.update(set(document.split()))

    idf_dict = {word: math.log(total_documents / (1 + df)) for word, df in doc_freq.items()}

    return idf_dict
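
`calculate_idf` divides by 1 + df rather than df, so it computes log(N / (1 + df)) instead of the log(N / df) given in the formula section. A quick comparison for a 30-document corpus (the size of `train_data_chunk` below) shows the effect; `idf_variants` is a hypothetical helper written only for this comparison. Every IDF shrinks, a term found in 29 of the 30 reviews gets exactly 0, and a term found in all 30 gets a small negative weight. The 2.70805 values in `idf_df` correspond to terms that occur in a single review, since ln(30 / 2) ≈ 2.70805.

```python
import math

def idf_variants(n_docs, df):
    # Returns (textbook IDF, IDF as computed by calculate_idf) for a term found in df of n_docs documents
    return math.log(n_docs / df), math.log(n_docs / (1 + df))

N = 30  # size of train_data_chunk
for df in (1, 2, 15, 29, 30):
    textbook, shifted = idf_variants(N, df)
    print(f"df={df:2d}  log(N/df)={textbook:.4f}  log(N/(1+df))={shifted:.4f}")
```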


## **Calculate TF-IDF**

*The TF-IDF value is a combination of the term frequency (TF) and the inverse document frequency (IDF) for each term in a document. It is calculated as:*

*TF-IDF(t)=TF(t)×IDF(t)*

In [None]:
# Calculate TF-IDF
def calculate_tf_idf(tf_dicts, idf_dict):
    """
    Calculate TF-IDF for each term in each document.

    :param tf_dicts: List of dictionaries containing term frequency for each document.
    :param idf_dict: Dictionary containing IDF for each term.
    :return: List of dictionaries containing TF-IDF for each term in each document.
    """
    tf_idf_dicts = []
    for tf_dict in tf_dicts:
        tf_idf_dict = {}
        for word, tf_value in tf_dict.items():
            tf_idf_dict[word] = tf_value * idf_dict.get(word, 0)
        tf_idf_dicts.append(tf_idf_dict)
    return tf_idf_dicts

In [None]:

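# The manual implementation is slow, so work on a small sample: the first 30 reviews
# (all positive, because positive reviews were loaded first)
train_data_chunk = train_data.head(30)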


In [None]:
import math

documents = train_data_chunk['processed_text'].tolist()

# Calculate TF, IDF, and TF-IDF
tf_dicts = calculate_tf(documents)
idf_dict = calculate_idf(documents)
tf_idf_dicts = calculate_tf_idf(tf_dicts, idf_dict)

# Convert the TF-IDF dictionaries to a DataFrame
tf_idf_df = pd.DataFrame(tf_idf_dicts).fillna(0)

tf_idf_df['sentiment'] = train_data_chunk['sentiment'].values


In [None]:
tf_df = pd.DataFrame(tf_dicts).fillna(0)
idf_df = pd.DataFrame(list(idf_dict.items()), columns=['term', 'idf']).fillna(0)


## **Term Frequency**

In [None]:
tf_df.head(10)

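|   | although | little | known | today | deanna | durbin | one | popular | star | pretty | ... | treatment | ubiqutous | appearance | diverse | super | trooper | sexegenarian | jude | lawshort | haul |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.022727 | 0.011364 | 0.011364 | 0.011364 | 0.011364 | 0.022727 | 0.011364 | 0.011364 | 0.011364 | 0.011364 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.004032 | 0.0 | 0.0 | 0.0 | 0.0 | 0.004032 | 0.0 | 0.0 | 0.004032 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.017544 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.032258 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.004 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.004 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.007547 | 0.0 | 0.003774 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.071429 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |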


## **Inverse Document Frequency**

In [None]:
idf_df.head(10)

|   | term | idf |
|---|---|---|
| 0 | amazon | 2.70805 |
| 1 | embrace | 2.70805 |
| 2 | poked | 2.70805 |
| 3 | cox | 2.70805 |
| 4 | sitting | 2.70805 |
| 5 | matter | 1.791759 |
| 6 | powerful | 2.70805 |
| 7 | feel | 2.302585 |
| 8 | solution | 2.70805 |
| 9 | lorna | 2.70805 |


## **TF-IDF**

In [None]:
tf_idf_df.head(10)

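|   | although | little | known | today | deanna | durbin | one | popular | star | pretty | ... | ubiqutous | appearance | diverse | super | trooper | sexegenarian | jude | lawshort | haul | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.040722 | 0.01502 | 0.030773 | 0.022897 | 0.030773 | 0.061547 | 0.007143 | 0.030773 | 0.026166 | 0.022897 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 2 | 0.0 | 0.00533 | 0.0 | 0.0 | 0.0 | 0.0 | 0.002535 | 0.0 | 0.0 | 0.008125 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.035349 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.020278 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 7 | 0.007167 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.002514 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.004744 | 0.0 | 0.008689 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.044901 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |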


# **N-grams**

## **Build the Unigram Model**
We build a unigram model, the simplest form of language model. It treats each word as independent of the words around it and simply counts how many times each word appears.

We start with a list of sentences, split each one into words, and count the occurrences of each word with Python's `collections.Counter`.

In [None]:
from collections import Counter
import random

# Example data: List of sentences (after preprocessing)
texts = ["the quick brown fox jumps over the lazy dog",
         "the quick blue fox",
         "the lazy dog sleeps",
         "the quick red fox jumps over"]

# Build the unigram model as a frequency distribution
def build_unigram_model(texts):
    word_counts = Counter()
    for text in texts:
        words = text.split()
        word_counts.update(words)
    return word_counts

unigram_model = build_unigram_model(texts)
print("Unigram Model:", unigram_model)


Unigram Model: Counter({'the': 5, 'quick': 3, 'fox': 3, 'jumps': 2, 'over': 2, 'lazy': 2, 'dog': 2, 'brown': 1, 'blue': 1, 'sleeps': 1, 'red': 1})


## **Calculate Word Probabilities**
We calculate the probability of each word from the unigram counts: a word's probability is its count divided by the total number of words (23 in this example). Words that appear more often therefore get higher probabilities.

In [None]:
# Calculate the probability of each word
def calculate_probabilities(unigram_model):
    total_words = sum(unigram_model.values())
    probabilities = {word: count / total_words for word, count in unigram_model.items()}
    return probabilities

word_probabilities = calculate_probabilities(unigram_model)
print("Word Probabilities:", word_probabilities)


Word Probabilities: {'the': 0.21739130434782608, 'quick': 0.13043478260869565, 'brown': 0.043478260869565216, 'fox': 0.13043478260869565, 'jumps': 0.08695652173913043, 'over': 0.08695652173913043, 'lazy': 0.08695652173913043, 'dog': 0.08695652173913043, 'blue': 0.043478260869565216, 'sleeps': 0.043478260869565216, 'red': 0.043478260869565216}


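## **Predict the Next Word**
Because a unigram model ignores context, predicting the "next" word amounts to sampling from the word probability distribution: the word is chosen at random, but words with higher probabilities are more likely to be picked. This mimics how common words appear more often in natural language, but the prediction does not depend on the preceding words, and without a fixed random seed the output varies between runs.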

In [None]:
# Function to predict a word based on the unigram model
def predict_word(word_probabilities):
    words, probs = zip(*word_probabilities.items())  # unzip words and their probabilities
    predicted_word = random.choices(words, weights=probs, k=1)[0]  # pick one word based on the probability distribution
    return predicted_word

predicted_word = predict_word(word_probabilities)
print("Predicted Word:", predicted_word)


Predicted Word: quick
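
Since the unigram model ignores context, a natural next step is a bigram model, which conditions each word on the one before it: P(next | prev) = count(prev, next) / count(prev followed by any word). A minimal sketch on the same toy sentences; `build_bigram_model` and `next_word_probabilities` are hypothetical names introduced here for illustration.

```python
from collections import Counter, defaultdict

# Same toy sentences as the unigram example
texts = ["the quick brown fox jumps over the lazy dog",
         "the quick blue fox",
         "the lazy dog sleeps",
         "the quick red fox jumps over"]

def build_bigram_model(texts):
    # Map each word to a Counter of the words that immediately follow it
    bigram_counts = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for prev_word, next_word in zip(words, words[1:]):
            bigram_counts[prev_word][next_word] += 1
    return bigram_counts

def next_word_probabilities(bigram_counts, prev_word):
    # Normalize the follower counts of prev_word into a probability distribution
    followers = bigram_counts.get(prev_word)
    if not followers:
        return {}  # prev_word was never followed by another word
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}

bigram_model = build_bigram_model(texts)
print(next_word_probabilities(bigram_model, "the"))  # {'quick': 0.6, 'lazy': 0.4}
print(next_word_probabilities(bigram_model, "fox"))  # {'jumps': 1.0}
```

Unlike the unigram prediction above, the distribution now depends on the previous word. The 4-gram model below applies the same idea with three words of context.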


## **Using the NLTK Library for N-grams**

Preprocess: Tokenize the text and remove unwanted characters.


In [None]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams
from nltk import FreqDist
import pandas as pd

# Download necessary NLTK resources ('punkt_tab' is required by word_tokenize in newer NLTK releases)
nltk.download('punkt')
nltk.download('punkt_tab')

# Sample size to avoid memory issues in Colab
N = 200

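# Pre-process and tokenize the reviews.
# Note: this redefines preprocess_text from the Text Preprocessing section. This version
# keeps stop words (useful for next-word prediction) and returns a list of tokens.
def preprocess_text(text):
    tokens = word_tokenize(text.lower())
    # Keep alphabetic tokens only; 'br' is left over from <br /> tags in the raw reviews
    filtered_tokens = [word for word in tokens if word.isalpha() and word != 'br']
    return filtered_tokens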




[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


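Generate 4-grams: Create sequences of 4 consecutive words from each review, padded with `<s>` at the start and `</s>` at the end.

Calculate Frequency: Flatten the 4-grams into a single list and count how often each 4-gram appears.

Predict: Given three words, return the fourth word of the most frequent 4-gram that starts with them.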
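
To see what the padding arguments in the cell below do, here is a pure-Python sketch of padded 4-gram generation for one short review. `padded_ngrams` is a hypothetical helper meant to mirror `ngrams(tokens, 4, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>')`; it is not part of NLTK.

```python
def padded_ngrams(tokens, n, left_pad="<s>", right_pad="</s>"):
    # Add n - 1 padding symbols on each side, then slide a window of size n
    padded = [left_pad] * (n - 1) + list(tokens) + [right_pad] * (n - 1)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

grams = padded_ngrams(["the", "movie", "was", "great"], 4)
for gram in grams:
    print(gram)
# ('<s>', '<s>', '<s>', 'the')
# ('<s>', '<s>', 'the', 'movie')
# ('<s>', 'the', 'movie', 'was')
# ('the', 'movie', 'was', 'great')
# ('movie', 'was', 'great', '</s>')
# ('was', 'great', '</s>', '</s>')
# ('great', '</s>', '</s>', '</s>')
```

With padding, a review of k tokens yields k + 3 four-grams, so the opening words of a review can be predicted from `<s>` context and `</s>` can be predicted as the end of a review.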

In [None]:
# Function to predict the next word
def predict_next_word(last_three_words, fdist):
    candidates = {ngram: freq for ngram, freq in fdist.items() if ngram[:3] == tuple(last_three_words)}
    if candidates:
        return max(candidates, key=candidates.get)[-1]  # Return the most frequent fourth word
    else:
        return "No prediction available"

train_data_sample = train_data.head(N).copy()

# Apply preprocessing to each review (assigning with .loc avoids pandas' SettingWithCopyWarning)
train_data_sample.loc[:, 'tokens'] = train_data_sample['review'].apply(preprocess_text)

# Generate 4-grams for each tokenized review
train_data_sample.loc[:, 'ngrams'] = train_data_sample['tokens'].apply(
    lambda x: list(ngrams(x, 4, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
)

# Flatten the list of all 4-grams from all reviews
all_ngrams = [ngram for review in train_data_sample['ngrams'] for ngram in review]

# Get the frequency distribution of 4-grams
fdist = FreqDist(all_ngrams)


sentences = [
    ['this', 'is', 'a'],
    ['the', 'movie', 'was']
]

# Predict the next word for each sentence
for i, sentence in enumerate(sentences, 1):
    predicted_word = predict_next_word(sentence, fdist)
    print(f"Predicted next word for sentence {i} ({' '.join(sentence)}): {predicted_word}")

Predicted next word for sentence 1 (this is a): really
Predicted next word for sentence 2 (the movie was): simply
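
`predict_next_word` scans every 4-gram in `fdist` on each call, which gets slow as the sample size N grows. A minimal sketch of an alternative: group the 4-gram counts by their three-word context once, then answer each query with a single dictionary lookup. `build_context_index` and `predict_next_word_indexed` are hypothetical helpers, not NLTK functions, and the toy 4-grams below stand in for `all_ngrams`.

```python
from collections import Counter, defaultdict

def build_context_index(fourgrams):
    # Group 4-gram counts by their first three words: context -> Counter of fourth words
    index = defaultdict(Counter)
    for gram in fourgrams:
        index[tuple(gram[:3])][gram[3]] += 1
    return index

def predict_next_word_indexed(index, last_three_words):
    followers = index.get(tuple(last_three_words))
    if not followers:
        return "No prediction available"
    # most_common(1) returns [(word, count)] for the highest count
    return followers.most_common(1)[0][0]

# Toy 4-grams (hypothetical) standing in for all_ngrams
toy_ngrams = [
    ("the", "movie", "was", "great"),
    ("the", "movie", "was", "boring"),
    ("the", "movie", "was", "great"),
    ("this", "is", "a", "classic"),
]
index = build_context_index(toy_ngrams)
print(predict_next_word_indexed(index, ["the", "movie", "was"]))  # great
print(predict_next_word_indexed(index, ["a", "b", "c"]))          # No prediction available
```

On the real data you would build the index once with `build_context_index(all_ngrams)`; the trade-off is some extra memory for the index in exchange for not rescanning `fdist` on every query.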


# Thank you
