## Q1 NLP Preprocessing Pipeline

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Download the required NLTK resources (only needed once per environment)
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('punkt_tab')  # recent NLTK versions load word_tokenize's model from 'punkt_tab'

def preprocess_nlp(sentence):
    # 1. Tokenize the sentence into individual words (punctuation becomes its own token)
    tokens = word_tokenize(sentence)
    print("Original Tokens:")
    print(tokens)

    # 2. Remove common English stopwords (case-insensitive comparison)
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    print("\nTokens Without Stopwords:")
    print(filtered_tokens)

    # 3. Apply Porter stemming to reduce each word to its stem
    #    (NLTK's PorterStemmer also lowercases by default)
    stemmer = PorterStemmer()
    stemmed_words = [stemmer.stem(word) for word in filtered_tokens]
    print("\nStemmed Words:")
    print(stemmed_words)

# Test the function
sentence = "NLP techniques are used in virtual assistants like Alexa and Siri."
preprocess_nlp(sentence)
```

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.

Original Tokens:
['NLP', 'techniques', 'are', 'used', 'in', 'virtual', 'assistants', 'like', 'Alexa', 'and', 'Siri', '.']

Tokens Without Stopwords:
['NLP', 'techniques', 'used', 'virtual', 'assistants', 'like', 'Alexa', 'Siri', '.']

Stemmed Words:
['nlp', 'techniqu', 'use', 'virtual', 'assist', 'like', 'alexa', 'siri', '.']
```
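In the output above, the `'.'` token survives all three steps. If punctuation should not count as a word (an assumption; the pipeline above deliberately keeps it), a minimal extra filter based on Python's `string.punctuation` could look like this. `drop_punctuation` is a hypothetical helper name.

```python
import string

def drop_punctuation(tokens):
    """Remove tokens made up entirely of punctuation characters (hypothetical helper)."""
    return [t for t in tokens if not all(ch in string.punctuation for ch in t)]

# Stemmed tokens copied from the output above
stemmed = ['nlp', 'techniqu', 'use', 'virtual', 'assist', 'like', 'alexa', 'siri', '.']
print(drop_punctuation(stemmed))

# Tokens that merely contain punctuation, such as "U.S.", are kept
print(drop_punctuation(["U.S.", "...", "won"]))
```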


### 1. What is the difference between stemming and lemmatization? Provide examples with the word “running.”
Stemming is a crude, rule-based method that chops off word endings to reduce a word to its "stem". The result is not always a valid word.

Example:

running → run (Porter stemmer)

running → runn (a naive stemmer that simply strips the "-ing" suffix)

techniques → techniqu (Porter stemmer, as in the output above)

Lemmatization uses a vocabulary (dictionary) and morphological analysis to return the dictionary base form (lemma) of a word.

Example:

running → run (when tagged as a verb)

running → running (when tagged as a noun, as in "Running is good exercise")

Lemmatization depends on the part of speech (POS), so it needs the correct POS tag to be accurate. It also handles irregular forms that suffix stripping cannot, e.g. ran → run.

Key difference: lemmatization is more accurate and returns real dictionary words; stemming is faster but more error-prone.
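The contrast can be made concrete with a toy sketch. Both functions below are hypothetical illustrations, not NLTK's `PorterStemmer` or WordNet: `naive_stem` strips suffixes without checking the result, and `toy_lemmatize` looks words up in a tiny hand-written lexicon keyed by (word, POS) that stands in for a real dictionary.

```python
def naive_stem(word: str) -> str:
    """Strip a few common suffixes without checking that the result is a real word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-lexicon keyed by (word, POS): "v" = verb, "n" = noun.
LEMMAS = {
    ("running", "v"): "run",
    ("running", "n"): "running",
    ("ran", "v"): "run",
    ("techniques", "n"): "technique",
}

def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form for (word, pos); fall back to the word if unknown."""
    return LEMMAS.get((word, pos), word)

for word, pos in [("running", "v"), ("running", "n"), ("ran", "v"), ("techniques", "n")]:
    print(f"{word:<11} stem={naive_stem(word):<10} lemma({pos})={toy_lemmatize(word, pos)}")
```

The naive stemmer produces "runn" and cannot map "ran" to anything, while the lookup depends on the POS tag. With NLTK and its `wordnet` corpus installed, `WordNetLemmatizer().lemmatize("running", pos="v")` plays the role of the lookup.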

### 2. Why might removing stop words be useful in some NLP tasks, and when might it actually be harmful?
Useful:
Removing stop words drops very frequent, low-information words (e.g., “the”, “is”, “in”) that contribute little in tasks such as:

Text classification

Topic modeling

Search/retrieval

This reduces noise and the size of the vocabulary (feature dimensionality).

Harmful:
In tasks where context and sentence structure matter:

Sentiment analysis: in “I am not happy”, removing “not” (which is on NLTK's English stop word list) leaves only “happy”, flipping the meaning.

Machine translation: function words are needed to produce grammatically correct output.

Text summarization or question answering: important details might be lost, e.g. the difference between “flights to Boston” and “flights from Boston”.
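A minimal sketch of the sentiment case and one common mitigation, keeping negation words while removing the other stop words. It assumes a small hand-copied subset of NLTK's English stop word list so it runs without downloads; `KEEP_FOR_SENTIMENT` and `remove_stop_words` are hypothetical names.

```python
# Small subset of NLTK's English stop word list (copied by hand for this example)
STOP_WORDS = {"i", "am", "is", "the", "in", "not", "no", "nor", "to", "from"}

# Hypothetical whitelist of stop words that matter for sentiment
KEEP_FOR_SENTIMENT = {"not", "no", "nor"}

def remove_stop_words(tokens, keep=frozenset()):
    """Drop stop words (case-insensitive), except those listed in `keep`."""
    return [t for t in tokens if t.lower() not in STOP_WORDS or t.lower() in keep]

tokens = ["I", "am", "not", "happy"]
print(remove_stop_words(tokens))                           # ['happy']
print(remove_stop_words(tokens, keep=KEEP_FOR_SENTIMENT))  # ['not', 'happy']
```

With the full NLTK list, the same idea is `set(stopwords.words('english')) - {"not", "no", "nor"}`.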

## Q2 Named Entity Recognition with spaCy

```python
import spacy

# Load the small English pipeline
# (install once with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Input sentence
sentence = "Barack Obama served as the 44th President of the United States and won the Nobel Peace Prize in 2009."

# Process sentence
doc = nlp(sentence)

# Extract named entities with their labels and character offsets
print("Named Entities:")
for ent in doc.ents:
    print(f"Text: {ent.text}, Label: {ent.label_}, Start: {ent.start_char}, End: {ent.end_char}")
```


```
Named Entities:
Text: Barack Obama, Label: PERSON, Start: 0, End: 12
Text: 44th, Label: ORDINAL, Start: 27, End: 31
Text: the United States, Label: GPE, Start: 45, End: 62
Text: the Nobel Peace Prize, Label: WORK_OF_ART, Start: 71, End: 92
Text: 2009, Label: DATE, Start: 96, End: 100
```


### 1. How does NER differ from POS tagging in NLP?
NER (Named Entity Recognition) finds spans of text, often several tokens long, that refer to real-world entities such as people, locations, organizations, and dates, and labels each span with an entity type. Tokens outside any entity get no entity label.

Example: "Barack Obama" → PERSON, "2009" → DATE

POS (Part-of-Speech) tagging assigns a grammatical category, such as noun, verb, or adjective, to every token in the sentence.

Example: "Obama" → NNP (proper noun, singular), "served" → VBD (verb, past tense)
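The two tasks produce differently shaped output: POS tagging gives exactly one tag per token, while NER gives spans. A common way to treat NER as a token-level task is BIO encoding (B = beginning of an entity, I = inside, O = outside). A minimal sketch, assuming hand-assigned Penn Treebank tags for the first seven tokens of the Q2 sentence (a real tagger may differ) and a hypothetical `spans_to_bio` helper:

```python
tokens = ["Barack", "Obama", "served", "as", "the", "44th", "President"]
pos_tags = ["NNP", "NNP", "VBD", "IN", "DT", "JJ", "NNP"]  # hand-assigned, illustrative

# NER output as token spans: (start, end_exclusive, label)
entity_spans = [(0, 2, "PERSON"), (5, 6, "ORDINAL")]

def spans_to_bio(n_tokens, spans):
    """Convert entity spans into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

ner_tags = spans_to_bio(len(tokens), entity_spans)
for tok, pos, ner in zip(tokens, pos_tags, ner_tags):
    print(f"{tok:<10} {pos:<4} {ner}")
```

In spaCy, the same two views are available per token as `token.tag_` (fine-grained POS tag) and `token.ent_iob_` / `token.ent_type_` (entity position and type).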

### 2. Describe two applications that use NER in the real world
Financial News Monitoring

Detects companies, stock symbols, and market events.

E.g., Bloomberg and Reuters extract named entities from news articles to identify market movers.

Search Engines

Improves query understanding and result relevance.

E.g., for the query “Apple CEO”, Google recognizes “Apple” as a company rather than the fruit.



## Q3 Scaled Dot-Product Attention

```python
import numpy as np
from scipy.special import softmax

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]  # key dimension (queries must have the same size)

    # Step 1: Dot product Q × Kᵀ gives one score per (query, key) pair
    scores = Q @ K.T

    # Step 2: Scale the scores by sqrt(d_k)
    scaled_scores = scores / np.sqrt(d_k)

    # Step 3: Softmax over the keys (last axis) to get attention weights
    attention_weights = softmax(scaled_scores, axis=-1)

    # Step 4: Weighted sum of the value vectors
    output = attention_weights @ V

    print("Attention Weights (after softmax):")
    print(attention_weights)

    print("\nFinal Output Matrix:")
    print(output)

    return output, attention_weights

# Test inputs
Q = np.array([[1, 0, 1, 0], [0, 1, 0, 1]])
K = np.array([[1, 0, 1, 0], [0, 1, 0, 1]])
V = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Run function
output, weights = scaled_dot_product_attention(Q, K, V)
```


```
Attention Weights (after softmax):
[[0.73105858 0.26894142]
 [0.26894142 0.73105858]]

Final Output Matrix:
[[2.07576569 3.07576569 4.07576569 5.07576569]
 [3.92423431 4.92423431 5.92423431 6.92423431]]
```


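### 1. Why do we divide the attention score by √d in the scaled dot-product attention formula?
When the dimensionality d is large, dot products tend to grow large in magnitude. If the components of a query q and a key k are independent with mean 0 and variance 1, then q·k = Σᵢ qᵢkᵢ (i = 1, …, d) is a sum of d terms that each have variance 1, so its variance is d and its standard deviation is √d.

Large scores push the softmax into saturation: nearly all the weight goes to a single key, and the gradients with respect to the other scores become extremely small, which makes learning harder.

Dividing by √d brings the scores back to unit variance, which keeps the softmax out of its saturated region and helps training stay stable.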
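The variance argument and the saturation effect can be checked numerically. A minimal NumPy sketch, assuming random vectors with independent standard-normal components and an arbitrarily chosen d = 512; `softmax` here is a small local helper rather than the SciPy function used above.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 512  # assumed query/key dimension for this experiment

# 1) Spread of raw vs. scaled dot products for unit-variance vectors
q = rng.standard_normal((10_000, d))
k = rng.standard_normal((10_000, d))
dots = np.sum(q * k, axis=1)
print(f"std of q.k:          {dots.std():.2f}  (sqrt(d) = {np.sqrt(d):.2f})")
print(f"std of q.k / sqrt(d): {(dots / np.sqrt(d)).std():.2f}")

# 2) Saturation: one query against 10 keys, averaged over 1,000 trials
queries = rng.standard_normal((1_000, 1, d))
keys = rng.standard_normal((1_000, 10, d))
scores = (queries @ keys.transpose(0, 2, 1))[:, 0, :]  # shape (1000, 10)
max_raw = softmax(scores).max(axis=-1).mean()
max_scaled = softmax(scores / np.sqrt(d)).max(axis=-1).mean()
print(f"mean largest weight, unscaled: {max_raw:.3f}")
print(f"mean largest weight, scaled:   {max_scaled:.3f}")
```

The raw dot products have a standard deviation close to √d, and without scaling the largest softmax weight is usually close to 1, which is the saturated regime described above.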



### 2. How does self-attention help the model understand relationships between words in a sentence?
Self-attention lets each word attend to every other word in the sentence, allowing the model to:

Capture contextual dependencies regardless of the distance between the words.

Learn relationships such as subject-verb agreement and coreference (e.g., "he" refers to "John").

The attention weights are computed from the input itself, so they change from sentence to sentence: the words most relevant to a given position contribute most to its new representation.
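In the Q3 code, Q, K and V were given directly. In self-attention they are all projections of the same token embeddings. A minimal single-head sketch, assuming random stand-in embeddings and random projection matrices `W_q`, `W_k`, `W_v` (in a trained model these are learned):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention: queries, keys and values all come from X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 5, 8, 4
X = rng.standard_normal((n_tokens, d_model))  # stand-in token embeddings
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))  # untrained

out, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))  # row i: how much token i attends to each token
```

Every row of `weights` covers all five tokens, so the first token is connected to the last one in a single step; nothing in the computation depends on how far apart they are.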

## Q4 Sentiment Analysis using Hugging Face Transformers

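```python
# Notebook shell command: install the library quietly
!pip install transformers --quiet

from transformers import pipeline

# Load the sentiment-analysis pipeline. With no model argument it falls back to
# distilbert/distilbert-base-uncased-finetuned-sst-2-english (see the log below);
# pass model=... explicitly to pin it.
sentiment_pipeline = pipeline("sentiment-analysis")

# Input sentence
sentence = "Despite the high price, the performance of the new MacBook is outstanding."

# Run prediction (the pipeline returns a list with one dict per input)
result = sentiment_pipeline(sentence)[0]

# Print results
print(f"Sentiment: {result['label']}")
print(f"Confidence Score: {round(result['score'], 4)}")
```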


```
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu

Sentiment: POSITIVE
Confidence Score: 0.9998
```



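### 1. What is the main architectural difference between BERT and GPT?

BERT uses only the encoder stack of the Transformer. Its self-attention is bidirectional: every token can attend to tokens on both its left and its right. It is pre-trained with masked language modeling, i.e. predicting hidden tokens from the surrounding context.

GPT uses only the decoder stack of the Transformer (without the cross-attention that attends to an encoder). Its self-attention is causal (masked): each token can attend only to itself and earlier tokens, so text is processed left to right, matching its pre-training objective of predicting the next token.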
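The bidirectional vs. left-to-right difference comes down to the attention mask. A minimal NumPy sketch, assuming equal raw scores so that only the mask shapes the weights; `attention_weights` is a hypothetical helper, not code from either model.

```python
import numpy as np

def attention_weights(scores, causal=False):
    """Softmax over keys; with causal=True, position i may only attend to positions <= i."""
    if causal:
        n = scores.shape[-1]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True strictly above the diagonal
        scores = np.where(future, -np.inf, scores)          # block attention to later tokens
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))                          # equal raw scores for 4 tokens
bert_like = attention_weights(scores)              # encoder-style, bidirectional
gpt_like = attention_weights(scores, causal=True)  # decoder-style, left to right
print(bert_like.round(2))
print(gpt_like.round(2))
```

Every row of `bert_like` spreads weight over all four tokens, while row i of `gpt_like` spreads it only over tokens 0 to i, so the first token attends only to itself.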

### 2. Why use pre-trained models like BERT or GPT?

Pre-trained models save time and compute by reusing knowledge learned during large-scale training on general text.

They usually improve accuracy, especially when labeled data for the target task is limited.

Fine-tuning a pre-trained model is easier and far cheaper than training a model from scratch.