# Useful NLP Libraries & Networks

# Question 1: Compare and contrast NLTK and spaCy in terms of features, ease of use, and performance.

### Answer:

| Feature | NLTK | spaCy |
|----------|--------|--------|
| Purpose | Academic & Research | Production & Industry |
| Speed | Slower | Very Fast |
| Ease of Use | Beginner Friendly | Cleaner & Efficient API |
| Pretrained Models | Limited | Strong Built-in Models |
| Best For | Learning NLP Concepts | Real-world Applications |

NLTK is mainly used for learning and research purposes.  
spaCy is optimized for production systems and large-scale NLP tasks.

# Question 2: What is TextBlob and how does it simplify common NLP tasks like sentiment analysis and translation?

### Answer:

TextBlob is a simple NLP library built on top of NLTK and Pattern.

It simplifies:
- Sentiment Analysis
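- Translation (deprecated and later removed in recent TextBlob releases, which point users to the official Google Translate API instead)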
- POS Tagging
- Noun Phrase Extraction

It allows performing NLP tasks using very few lines of code.

# Question 3: Explain the role of Stanford NLP in academic and industry NLP projects.

### Answer:

Stanford NLP (CoreNLP) is widely used in academic research and enterprise systems.

Academic Use:
- Linguistic research
- Deep learning experimentation

Industry Use:
- Named Entity Recognition
- Sentiment Analysis
- Dependency Parsing
- Coreference Resolution

# Question 4: Describe the architecture and functioning of a Recurrent Neural Network (RNN).

### Answer:

RNN (Recurrent Neural Network) is designed for sequential data.

Architecture:
- Input Layer
- Hidden Layer (with memory loop)
- Output Layer

RNN remembers previous inputs using hidden states, making it suitable for text processing.
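
The hidden-state loop described above can be sketched in a few lines of plain Python. This is a minimal illustration of a simple (Elman-style) RNN forward pass, not a trainable implementation; the function name `rnn_forward`, the weight names `W_xh`, `W_hh`, `b_h`, and all numeric values are assumptions chosen for the example.

```python
import math

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a single-layer simple RNN over a sequence and return every hidden state.

    At each step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size          # initial hidden state (the "memory")
    states = []
    for x in inputs:
        new_h = []
        for j in range(hidden_size):
            total = b_h[j]
            total += sum(W_xh[j][i] * x[i] for i in range(len(x)))        # current input
            total += sum(W_hh[j][k] * h[k] for k in range(hidden_size))   # previous state
            new_h.append(math.tanh(total))
        h = new_h
        states.append(h)
    return states

# Illustrative (made-up) weights: 2 input features, 2 hidden units.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
states = rnn_forward(sequence, W_xh, W_hh, b_h)
for t, h in enumerate(states):
    print(t, [round(v, 4) for v in h])
```

The first and third inputs are identical, yet their hidden states differ because the third step also sees the state carried over from the earlier steps. That carried-over state is the "memory" the answer refers to.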

# Question 5: What is the key difference between LSTM and GRU networks in NLP applications?

### Answer:

| Feature | LSTM | GRU |
|----------|--------|--------|
| Gates | 3 (Input, Forget, Output) | 2 (Update, Reset) |
| Complexity | More | Less |
| Speed | Slower | Faster |
| Performance | Often stronger on long-range dependencies | Often comparable, with fewer parameters to train |
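
To make the "Complexity" row concrete, the sketch below counts trainable parameters for one LSTM layer and one GRU layer using the textbook formulas. The function names are hypothetical. The `reset_after` flag is an assumption modelled on the Keras GRU default, which keeps two bias vectors per block, so Keras reports a slightly larger GRU count than the textbook formula.

```python
def lstm_params(input_dim, units):
    """4 weight blocks: input, forget, and output gates plus the candidate cell state."""
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units, reset_after=False):
    """3 weight blocks: update and reset gates plus the candidate hidden state."""
    biases_per_block = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + biases_per_block * units)

# Same sizes as the models later in this notebook: 8-dim embeddings, 16 units.
print("LSTM:", lstm_params(8, 16))                         # 1600
print("GRU :", gru_params(8, 16))                          # 1200
print("GRU (two bias vectors):", gru_params(8, 16, True))  # 1248
```

With identical input and hidden sizes the GRU has roughly three quarters of the LSTM's parameters, which is where its speed advantage comes from.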

# Question 6: Write a Python program using TextBlob to perform sentiment analysis on the following paragraph of text:
“I had a great experience using the new mobile banking app. The interface is intuitive,
and customer support was quick to resolve my issue. However, the app did crash once
during a transaction, which was frustrating.”
Your program should print out the polarity and subjectivity scores.

In [1]:
!pip install textblob

from textblob import TextBlob

text = """I had a great experience using the new mobile banking app.
The interface is intuitive, and customer support was quick to resolve my issue.
However, the app did crash once during a transaction, which was frustrating."""

blob = TextBlob(text)

print("Polarity:", blob.sentiment.polarity)
print("Subjectivity:", blob.sentiment.subjectivity)

Polarity: 0.21742424242424244
Subjectivity: 0.6511363636363636


# Question 7: Given the sample paragraph below, perform string tokenization and frequency distribution using Python and NLTK:
“Natural Language Processing (NLP) is a fascinating field that combines linguistics,
computer science, and artificial intelligence. It enables machines to understand,
interpret, and generate human language. Applications of NLP include chatbots,
sentiment analysis, and machine translation. As technology advances, the role of NLP
in modern solutions is becoming increasingly critical.”

In [2]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

nltk.download('punkt')
nltk.download('punkt_tab')

text = """Natural Language Processing (NLP) is a fascinating field that combines
linguistics, computer science, and artificial intelligence. It enables machines
to understand, interpret, and generate human language. Applications of NLP
include chatbots, sentiment analysis, and machine translation. As technology
advances, the role of NLP in modern solutions is becoming increasingly critical."""

tokens = word_tokenize(text.lower())

fdist = FreqDist(tokens)

print("Top 10 Most Common Words:")
print(fdist.most_common(10))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


Top 10 Most Common Words:
[(',', 7), ('.', 4), ('nlp', 3), ('and', 3), ('language', 2), ('is', 2), ('of', 2), ('natural', 1), ('processing', 1), ('(', 1)]
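
The top entries above are punctuation marks, which usually say little about the content. As a hedged follow-up, the sketch below repeats the count with punctuation and a few common function words filtered out. It uses only the standard library (`re` and `collections.Counter`) as a stand-in for `word_tokenize` and `FreqDist`, and the small `STOPWORDS` set is an assumption made for this example, not NLTK's stop-word list.

```python
import re
from collections import Counter

text = """Natural Language Processing (NLP) is a fascinating field that combines
linguistics, computer science, and artificial intelligence. It enables machines
to understand, interpret, and generate human language. Applications of NLP
include chatbots, sentiment analysis, and machine translation. As technology
advances, the role of NLP in modern solutions is becoming increasingly critical."""

# Hypothetical, hand-picked stop-word set for this paragraph only.
STOPWORDS = {"is", "a", "that", "and", "it", "to", "of", "as", "the", "in"}

# Keep alphabetic runs only, which drops punctuation and brackets.
words = re.findall(r"[a-z]+", text.lower())
content_words = [w for w in words if w not in STOPWORDS]

counts = Counter(content_words)
print(counts.most_common(2))  # [('nlp', 3), ('language', 2)]
```

In NLTK the same idea is usually written by filtering tokens with `str.isalpha()` and `nltk.corpus.stopwords` before building the `FreqDist`.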


# Question 8: Implement a basic LSTM model in Keras for a text classification task using the following dummy dataset. Your model should classify sentences as either positive (1) or negative (0).

    # Dataset
    texts = [
        "I love this project",            # Positive
        "This is an amazing experience",  # Positive
        "I hate waiting in line",         # Negative
        "This is the worst service",      # Negative
        "Absolutely fantastic!"           # Positive
    ]
    labels = [1, 1, 0, 0, 1]

Preprocess the text, tokenize it, pad sequences, and build an LSTM model to train on this data. You may use Keras with TensorFlow backend.

In [3]:
!pip install tensorflow

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Dataset
texts = [
"I love this project",
"This is an amazing experience",
"I hate waiting in line",
"This is the worst service",
"Absolutely fantastic!"
]

labels = [1, 1, 0, 0, 1]

# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

max_length = max(len(seq) for seq in sequences)
padded = pad_sequences(sequences, maxlen=max_length)

labels = np.array(labels)

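# Build Model
# Size the embedding from the fitted vocabulary (+1 because index 0 is reserved for padding)
vocab_size = len(tokenizer.word_index) + 1
model = Sequential()
# input_length is deprecated in Keras 3, so it is omitted; the sequence length comes from the data
model.add(Embedding(input_dim=vocab_size, output_dim=8))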
model.add(LSTM(16))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(padded, labels, epochs=20, verbose=1)

Epoch 1/20




[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step - accuracy: 0.2000 - loss: 0.6942
Epoch 2/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 73ms/step - accuracy: 0.6000 - loss: 0.6930
Epoch 3/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 141ms/step - accuracy: 0.6000 - loss: 0.6918
Epoch 4/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 55ms/step - accuracy: 0.6000 - loss: 0.6906
Epoch 5/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 57ms/step - accuracy: 0.6000 - loss: 0.6895
Epoch 6/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step - accuracy: 0.6000 - loss: 0.6883
Epoch 7/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 45ms/step - accuracy: 0.6000 - loss: 0.6872
Epoch 8/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 43ms/step - accuracy: 0.6000 - loss: 0.6860
Epoch 9/20
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 48m

<keras.src.callbacks.history.History at 0x7b00aa381f10>
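
To show what the tokenize-and-pad step produces for this dataset, here is a pure-Python approximation. It is a sketch that imitates the default behaviour of `Tokenizer` (lowercasing, punctuation stripping, indices assigned by descending frequency starting at 1) and `pad_sequences` (zeros added at the front). The helper names `build_word_index`, `encode`, and `pad_pre` are hypothetical and not part of Keras.

```python
import re
from collections import Counter

texts = [
    "I love this project",
    "This is an amazing experience",
    "I hate waiting in line",
    "This is the worst service",
    "Absolutely fantastic!",
]

def split_words(text):
    # Lowercase and keep letter runs, dropping punctuation such as "!".
    return re.findall(r"[a-z']+", text.lower())

def build_word_index(corpus):
    # Most frequent word gets index 1; index 0 stays free for padding.
    counts = Counter(w for text in corpus for w in split_words(text))
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(text, word_index):
    return [word_index[w] for w in split_words(text) if w in word_index]

def pad_pre(seq, length):
    # Left-pad with zeros so every sequence has the same length.
    return [0] * (length - len(seq)) + seq

word_index = build_word_index(texts)
sequences = [encode(t, word_index) for t in texts]
max_length = max(len(s) for s in sequences)
padded = [pad_pre(s, max_length) for s in sequences]
vocab_size = len(word_index) + 1

for row in padded:
    print(row)
print("vocab_size:", vocab_size)
```

The first row comes out as `[0, 2, 4, 1, 5]`: "this" is the most frequent word and gets index 1, and the four-word sentence is left-padded to the longest length of 5. The vocabulary size of 18 (17 words plus the padding index) is the value an `Embedding` layer needs as `input_dim`.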

# Question 9: Using spaCy, build a simple NLP pipeline that includes tokenization, lemmatization, and entity recognition. Use the following paragraph as your dataset:
“Homi Jehangir Bhaba was an Indian nuclear physicist who played a key role in the
development of India’s atomic energy program. He was the founding director of the Tata
Institute of Fundamental Research (TIFR) and was instrumental in establishing the
Atomic Energy Commission of India.”

Write a Python program that processes this text using spaCy, then prints tokens, their
lemmas, and any named entities found.

In [4]:
!pip install spacy
!python -m spacy download en_core_web_sm

import spacy

nlp = spacy.load("en_core_web_sm")

text = """Homi Jehangir Bhaba was an Indian nuclear physicist who played a key role
in the development of India’s atomic energy program. He was the founding director
of the Tata Institute of Fundamental Research (TIFR) and was instrumental in
establishing the Atomic Energy Commission of India."""

doc = nlp(text)

print("Tokens and Lemmas:")
for token in doc:
    print(token.text, "->", token.lemma_)

print("\nNamed Entities:")
for ent in doc.ents:
    print(ent.text, "-", ent.label_)

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
Tokens and Lemmas:
Homi -> Homi
Jehangir -> Jehangir
Bhaba -> Bhaba
was -> be
an -> an
Indian -> indian
nuclear -> nuclear
physicist -> physicist
who -> who
played -> play
a -> a
key -> key
role -> role

 -> 

in -> in
the -> the
development -> development
of -> of
India -> India
’s -> ’s
atomic -> atomic
energy -> energy
program -> 

# Question 10: You are working on a chatbot for a mental health platform. Explain how you would leverage LSTM or GRU networks along with libraries like spaCy or Stanford NLP to understand and respond to user input effectively. Detail your architecture, data preprocessing pipeline, and any ethical considerations.

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

# Define model
model = Sequential()
# input_length is deprecated in Keras 3; the sequence length (10) is set by model.build() below
model.add(Embedding(input_dim=100, output_dim=8))
model.add(GRU(16))
model.add(Dense(1, activation='sigmoid'))

# Build the model
model.build(input_shape=(None, 10))

# Show summary
model.summary()
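
The question also asks about preprocessing and ethics, which the model definition alone does not cover. The sketch below shows one way the surrounding pipeline could be arranged, under stated assumptions: personal details are masked before anything is stored, and messages containing crisis language are routed to a human before any model is consulted. Every name here (`CRISIS_PATTERNS`, `redact_pii`, `is_crisis`, `route`) is hypothetical, the patterns are illustrative only, and the `classify` argument stands in for a trained GRU or LSTM classifier.

```python
import re

# Hypothetical trigger phrases; a real list would be written and reviewed by clinicians.
CRISIS_PATTERNS = [r"\bsuicid\w*", r"\bkill myself\b", r"\bself[- ]harm\b"]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def redact_pii(text):
    """Mask e-mail addresses and phone-like numbers before logging or training."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)

def is_crisis(text):
    """Rule-based safety check that runs before any learned model."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in CRISIS_PATTERNS)

def route(text, classify):
    """Return (route_name, cleaned_text) for one user message."""
    cleaned = redact_pii(text)
    if is_crisis(cleaned):
        # Never rely on the model alone for high-risk messages.
        return "escalate_to_human", cleaned
    label = classify(cleaned)
    return ("supportive_reply" if label == "distress" else "general_reply"), cleaned

# Stand-in classifier for the demo; a real system would call the trained network.
def dummy_classifier(text):
    return "distress" if "anxious" in text.lower() else "neutral"

print(route("I feel anxious, call 98765 43210 now", dummy_classifier))
print(route("I want to kill myself", dummy_classifier))
```

Keeping the crisis check rule-based and ahead of the network is a deliberate design choice in this sketch: it stays auditable and still triggers when the learned classifier is wrong.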

