<a href="https://colab.research.google.com/github/theabhinav0231/NLP-Assignment/blob/main/NLP_Assignment_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Install Necessary Libraries**

In [1]:
!pip install nltk



In [6]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

In [7]:
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

# **Ques 1: Take a custom paragraph, run the entire pipeline, and print the results at each step.**

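Pipeline: Tokenization → Stopword Removal → Stemming → Lemmatization. Stemming and lemmatization are each applied to the stopword-filtered tokens, so their outputs can be compared directly.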

In [4]:
custom_paragraph = "So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her."
print(f"Custom Paragraph:\n'{custom_paragraph}'\n")

Custom Paragraph:
'So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.'



### **Tokenization**

In [8]:
tokens = word_tokenize(custom_paragraph)
print("\nTokens:\n", tokens)


Tokens:
 ['So', 'she', 'was', 'considering', 'in', 'her', 'own', 'mind', '(', 'as', 'well', 'as', 'she', 'could', ',', 'for', 'the', 'hot', 'day', 'made', 'her', 'feel', 'very', 'sleepy', 'and', 'stupid', ')', ',', 'whether', 'the', 'pleasure', 'of', 'making', 'a', 'daisy-chain', 'would', 'be', 'worth', 'the', 'trouble', 'of', 'getting', 'up', 'and', 'picking', 'the', 'daisies', ',', 'when', 'suddenly', 'a', 'White', 'Rabbit', 'with', 'pink', 'eyes', 'ran', 'close', 'by', 'her', '.']


### **Stopword Removal**

In [9]:
stop_words = set(stopwords.words('english'))
# w.isalpha() also drops punctuation and hyphenated tokens such as 'daisy-chain'
filtered_tokens = [w for w in tokens if w.lower() not in stop_words and w.isalpha()]
print("\nAfter Stopword Removal:\n", filtered_tokens)


After Stopword Removal:
 ['considering', 'mind', 'well', 'could', 'hot', 'day', 'made', 'feel', 'sleepy', 'stupid', 'whether', 'pleasure', 'making', 'would', 'worth', 'trouble', 'getting', 'picking', 'daisies', 'suddenly', 'White', 'Rabbit', 'pink', 'eyes', 'ran', 'close']
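
The `w.isalpha()` check rejects any token containing a non-letter, which is why 'daisy-chain' is missing from the output above. Below is a minimal sketch of a more permissive filter that keeps hyphenated words. It assumes a small hand-picked stopword set (`SAMPLE_STOPWORDS`) in place of NLTK's English list, and `is_word` is a hypothetical helper, not part of NLTK.

```python
import re

# Hand-picked stand-in for stopwords.words('english'); an assumption for illustration only.
SAMPLE_STOPWORDS = {"a", "of", "the", "be"}

# Letters, optionally joined by single hyphens (e.g. "daisy-chain").
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def is_word(token):
    """Hypothetical helper: True for alphabetic tokens, including hyphenated ones."""
    return WORD_RE.fullmatch(token) is not None


# A slice of the tokens printed in the tokenization step.
sample_tokens = ['making', 'a', 'daisy-chain', 'would', 'be', 'worth', 'the', 'trouble', ',']

# Same rule as the notebook: stopword check plus str.isalpha().
strict = [t for t in sample_tokens if t.lower() not in SAMPLE_STOPWORDS and t.isalpha()]

# Relaxed rule: stopword check plus the hyphen-aware pattern.
relaxed = [t for t in sample_tokens if t.lower() not in SAMPLE_STOPWORDS and is_word(t)]

print(strict)   # ['making', 'would', 'worth', 'trouble']
print(relaxed)  # ['making', 'daisy-chain', 'would', 'worth', 'trouble']
```

Whether hyphenated compounds should survive depends on the downstream task, so the stricter rule used in the notebook is also a reasonable choice.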


### **Stemming**

In [10]:
ps = PorterStemmer()
stemmed = [ps.stem(w) for w in filtered_tokens]
print("\nAfter Stemming:\n", stemmed)


After Stemming:
 ['consid', 'mind', 'well', 'could', 'hot', 'day', 'made', 'feel', 'sleepi', 'stupid', 'whether', 'pleasur', 'make', 'would', 'worth', 'troubl', 'get', 'pick', 'daisi', 'suddenli', 'white', 'rabbit', 'pink', 'eye', 'ran', 'close']


### **Lemmatization**

In [11]:
lemmatizer = WordNetLemmatizer()
# lemmatize() defaults to pos='n' (noun), so verb forms such as 'considering' are left unchanged
lemmatized = [lemmatizer.lemmatize(w) for w in filtered_tokens]
print("\nAfter Lemmatization:\n", lemmatized)


After Lemmatization:
 ['considering', 'mind', 'well', 'could', 'hot', 'day', 'made', 'feel', 'sleepy', 'stupid', 'whether', 'pleasure', 'making', 'would', 'worth', 'trouble', 'getting', 'picking', 'daisy', 'suddenly', 'White', 'Rabbit', 'pink', 'eye', 'ran', 'close']
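
The stemming and lemmatization outputs can be compared token by token. The sketch below copies the three lists printed in this notebook verbatim, so it runs without NLTK, and reports which tokens each step changed. The names `changed_by_stemmer` and `changed_by_lemmatizer` are introduced here for illustration.

```python
# Lists copied from the notebook outputs above.
filtered_tokens = ['considering', 'mind', 'well', 'could', 'hot', 'day', 'made', 'feel',
                   'sleepy', 'stupid', 'whether', 'pleasure', 'making', 'would', 'worth',
                   'trouble', 'getting', 'picking', 'daisies', 'suddenly', 'White',
                   'Rabbit', 'pink', 'eyes', 'ran', 'close']
stemmed = ['consid', 'mind', 'well', 'could', 'hot', 'day', 'made', 'feel', 'sleepi',
           'stupid', 'whether', 'pleasur', 'make', 'would', 'worth', 'troubl', 'get',
           'pick', 'daisi', 'suddenli', 'white', 'rabbit', 'pink', 'eye', 'ran', 'close']
lemmatized = ['considering', 'mind', 'well', 'could', 'hot', 'day', 'made', 'feel',
              'sleepy', 'stupid', 'whether', 'pleasure', 'making', 'would', 'worth',
              'trouble', 'getting', 'picking', 'daisy', 'suddenly', 'White', 'Rabbit',
              'pink', 'eye', 'ran', 'close']

# Print only the rows where at least one of the two steps altered the token.
for original, stem, lemma in zip(filtered_tokens, stemmed, lemmatized):
    if stem != original or lemma != original:
        print(f"{original:<12} stem={stem:<10} lemma={lemma}")

changed_by_stemmer = [t for t, s in zip(filtered_tokens, stemmed) if t != s]
changed_by_lemmatizer = [t for t, l in zip(filtered_tokens, lemmatized) if t != l]

print(len(changed_by_stemmer), "tokens changed by stemming")
print(changed_by_lemmatizer, "changed by lemmatization")
```

In this run the stemmer alters 12 of the 26 tokens, lowercasing them and often producing non-words such as 'consid' and 'sleepi'. The lemmatizer changes only 'daisies' and 'eyes', and both results are dictionary words.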


# **Ques 2: Define NLP and describe its real-time applications in specific domains.**

**Natural Language Processing (NLP):**
NLP is a branch of Artificial Intelligence (AI) that enables computers to understand, process, and generate human language.  
It bridges the gap between **human communication** (natural languages such as English and Hindi) and **machine understanding** (structured data and logic).

NLP involves:
- **Syntax** (structure of language: grammar, sentence formation)  
- **Semantics** (meaning of words and sentences)  
- **Pragmatics** (context and intent behind words)


#### **Core Capabilities of NLP:**
- Text preprocessing (tokenization, stemming, lemmatization)  
- Sentiment analysis (positive, negative, neutral opinions)  
- Named Entity Recognition (NER) (extracting names, places, organizations, etc.)  
- Machine translation (English → French, etc.)  
- Speech recognition and synthesis  
- Summarization and Question Answering  


#### **Real-Time Applications of NLP in Different Domains**

1. **Healthcare**  
   - Extracts key details from clinical notes and medical histories.  
   - Supports chatbots that answer patient queries.  
   - Helps identify potential drug interactions by mining research papers.  
   - Example: IBM Watson Health analyzing cancer research and patient data to assist doctors.

2. **Finance**  
   - Analyzes financial news to gauge market sentiment as an input to trend forecasting.  
   - Automates compliance checks in banking.  
   - Flags potential fraud by spotting unusual language in transaction descriptions and communications.  
   - Example: AI chatbots in banks explaining credit card charges to customers.

3. **Education**  
   - Scores essays automatically using grammar, coherence, and content analysis.  
   - Personalized learning assistants that answer students’ questions.  
   - Language learning apps that correct grammar and pronunciation.  
   - Example: Duolingo using NLP for adaptive language learning.

4. **Customer Support**  
   - Powers virtual assistants (chatbots) that resolve user complaints.  
   - Identifies intent in customer emails.  
   - Example: ChatGPT-style bots used on e-commerce websites.

5. **Law / Legal Industry**  
   - Summarizes lengthy legal documents.  
   - Extracts relevant cases and judgments.  
   - Example: AI legal assistants that provide case references based on user queries.

# **Ques 3: What are NLU and NLG?**

**Natural Language Understanding (NLU):**  
- NLU is a subfield of NLP focused on helping machines *understand* the meaning, context, and intent behind human language.  
- It answers: **“What did the user mean?”**  
- It deals with ambiguity (the same sentence can mean different things in different contexts).  

**Tasks in NLU:**
- Intent Recognition (e.g., "Book a cab" → intent = booking)  
- Named Entity Recognition (NER) (extracting names, dates, places, etc.)  
- Sentiment Analysis (positive/negative/neutral opinions)  
- Context understanding (e.g., “He went there” → who is *he*, where is *there*?)  

*Example:*  
User says → **“Remind me to call mom at 7 pm.”**  
NLU extracts:  
- Intent = reminder  
- Action = call  
- Person = mom  
- Time = 7 pm  
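
The extraction above can be sketched with a single regular expression. This is a toy, rule-based illustration only. Production NLU systems typically use trained models, and the pattern `REMINDER_RE` and function `parse_reminder` are hypothetical names that handle only utterances shaped like this one.

```python
import re

# Hypothetical pattern for "Remind me to <action> <person> at <time>".
REMINDER_RE = re.compile(
    r"remind me to (?P<action>\w+) (?P<person>\w+) "
    r"at (?P<time>\d{1,2}(?::\d{2})?\s*(?:am|pm))",
    re.IGNORECASE,
)


def parse_reminder(utterance):
    """Return a dict with the intent and its slots, or None if nothing matches."""
    match = REMINDER_RE.search(utterance)
    if match is None:
        return None
    return {
        "intent": "reminder",
        "action": match.group("action"),
        "person": match.group("person"),
        "time": match.group("time"),
    }


result = parse_reminder("Remind me to call mom at 7 pm.")
print(result)
# {'intent': 'reminder', 'action': 'call', 'person': 'mom', 'time': '7 pm'}
```

A pattern like this breaks as soon as the phrasing changes (for example, "Don't let me forget to ring mom"). Handling that kind of variation is the problem NLU models are meant to solve.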


**Natural Language Generation (NLG):**  
- NLG is a subfield of NLP that enables machines to *produce* human-like language from structured data or representations.  
- It answers: **“How can the system express this information in natural language?”**  

**Tasks in NLG:**
- Text Summarization (long articles → short summary)  
- Report Generation (data → sentences)  
- Dialogue generation (chatbots responding to queries)  
- Storytelling and creative writing  

*Example:*  
Input Data → {city: "Delhi", temperature: 35°C, condition: "Sunny"}  
NLG Output → **“Today in Delhi, it’s sunny with a temperature of 35°C.”**
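
A template is the simplest way to turn a structured record like the one above into a sentence. The sketch below assumes the record is a Python dict with a numeric `temperature_c` field. The function name `describe_weather` and the field names are illustrative, not part of any library.

```python
def describe_weather(record):
    """Render a weather record as one English sentence using a fixed template."""
    return (
        f"Today in {record['city']}, it's {record['condition'].lower()} "
        f"with a temperature of {record['temperature_c']}°C."
    )


# Assumed dict form of the input data shown above.
weather = {"city": "Delhi", "temperature_c": 35, "condition": "Sunny"}

sentence = describe_weather(weather)
print(sentence)  # Today in Delhi, it's sunny with a temperature of 35°C.
```

Template-based generation is predictable but rigid. Neural NLG models trade some of that control for more varied, fluent output.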


#### **Key Differences Between NLU and NLG**
| Feature | NLU | NLG |
|---------|-----|-----|
| Focus   | Understanding language | Generating language |
| Input   | Human text/speech | Structured data / machine representation |
| Output  | Machine-readable meaning (intents, entities, context) | Human-readable sentences or speech |
| Example | “Book me a flight to Goa” → intent: book flight, destination: Goa | Generates sentence: “Your flight to Goa has been booked successfully.” |
