<a href="https://colab.research.google.com/github/trapti-singh/portfolio1/blob/main/nlp%20trial%202.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# --- NLTK Tokenization and POS Tagging ---
import nltk
from nltk.tokenize import word_tokenize

# Download necessary NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('punkt_tab')


sentence = "I love you , you love me, we are one big family"
words = word_tokenize(sentence)
print("Tokenized words:", words)
print("POS tags:", nltk.pos_tag(words))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


Tokenized words: ['I', 'love', 'you', ',', 'you', 'love', 'me', ',', 'we', 'are', 'one', 'big', 'family']
POS tags: [('I', 'PRP'), ('love', 'VBP'), ('you', 'PRP'), (',', ','), ('you', 'PRP'), ('love', 'VBP'), ('me', 'PRP'), (',', ','), ('we', 'PRP'), ('are', 'VBP'), ('one', 'CD'), ('big', 'JJ'), ('family', 'NN')]


In [2]:
# --- spaCy Named Entity Recognition ---
import spacy

# Try to load spaCy model, download if not present
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    import os
    os.system("python -m spacy download en_core_web_sm")
    nlp = spacy.load("en_core_web_sm")

text = "hi this is me your mom, r u joining us for dinner at hyatt mumbai, working at apple, as a ceo"

doc = nlp(text)
print("\nEntities found (all):")
for ent in doc.ents:
    print(ent.text, ent.label_)

print("\nEntities found (filtered):")



Entities found (all):
mumbai GPE

Entities found (filtered):


In [3]:
for ent in doc.ents:
    if ent.label_ in ["GPE", "ORG", "PERSON"]:
        print(ent.text, ent.label_)

# --- Transformers: Sentiment Analysis ---
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print("\nSentiment analysis:", sentiment("it was wow movie ever watched"))





mumbai GPE


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Device set to use cpu



Sentiment analysis: [{'label': 'POSITIVE', 'score': 0.999757707118988}]


In [4]:
# --- Transformers: Translation ---
# English to French
translator_fr = pipeline("translation_en_to_fr")
print("\nFrench translation:", translator_fr("I love you , you love me, we are one big family"))

# English to Hindi
translator_hi = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")
text = "I love you, you love me, we are one big family"
result_hi = translator_hi(text)
print(f"\nOriginal: {text}")
print(f"Hindi: {result_hi[0]['translation_text']}")

# English to Urdu
translator_ur = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ur")
result_ur = translator_ur(text)
print(f"\nOriginal: {text}")
print(f"Urdu: {result_ur[0]['translation_text']}")

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu



French translation: [{'translation_text': 'Je vous aime , vous me aimez, nous sommes une grande famille'}]


Device set to use cpu



Original: I love you, you love me, we are one big family
Hindi: मैं तुमसे प्यार करता हूँ, तुम मुझसे प्यार करते हो, हम एक बड़ा परिवार हैं


Device set to use cpu



Original: I love you, you love me, we are one big family
Urdu: میں تم سے محبت کرتا ہوں، تم مجھ سے محبت کرتے ہو، ہم ایک بڑا خاندان ہیں


In [5]:
# --- Transformers: Text Generation (GPT-2) ---
generator = pipeline("text-generation", model="gpt2")
print("\nText generation:", generator("once upon a time in mumbai", truncation=True))



# --- Note on pyttsx3 ---
# If you intend to use pyttsx3 for text-to-speech, ensure eSpeak or eSpeak-ng is installed in your system.
# It is not supported in Google Colab or environments without system-level TTS engines.
# Remove/comment the pyttsx3 part if running in restricted environments like Colab.

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Text generation: [{'generated_text': 'once upon a time in mumbai. We had to spend most of our time there.\n\n"We always used to go out, go out in the morning and do things in the afternoon. When I was 15, I went to Pune and got a job. At that time, the government and the police didn\'t give a crap about us.\n\n"But after 20 years of this, we got a job and got a job at a railway station. It\'s a good life. When we had a job, we didn\'t want to go out. We went to a temple and went to the Jagan temple and got a job. We moved to Delhi and got a job at a hotel.\n\n"After 20 years of this, we got a job at a railway station.\n\n"We were staying at a restaurant on the outskirts of Delhi, and we got a job at a railway station. When we got a job at a railway station, we stayed in Delhi. We were staying in a hotel at Jahnpur. We got a job at a restaurant. When we got a job at a railway station, we stayed in Delhi."\n\nHe said he had not received any further updates on his story that he had been 

In [2]:
# --- Conversational Chatbot with DialoGPT ---
from transformers import pipeline, set_seed
import torch

# Set seed for reproducibility
set_seed(42)
torch.manual_seed(42)

# Use text-generation pipeline with the DialoGPT model
# We will manage conversation history manually
chatbot = pipeline("text-generation", model="microsoft/DialoGPT-medium")

# Initialize conversation history
conversation_history = ""

while True:
    question = input("You: ")
    if question.lower() in ['exit', 'quit', 'bye']:
        print("Robot: Bye! See you soon 🤖")
        break

    # Append user input to the conversation history
    # DialoGPT expects input in a specific format, often ending with the model's turn indicator
    # A common format is "user input<|endoftext|>model response<|endoftext|>user input..."
    # For the first turn, just add the user input. For subsequent turns, add the model response first.
    if conversation_history:
        conversation_history += chatbot.tokenizer.eos_token + question
    else:
        conversation_history = question

    # Generate response based on the conversation history
    # We need to encode the conversation history and generate text
    inputs = chatbot.tokenizer.encode(conversation_history + chatbot.tokenizer.eos_token, return_tensors='pt')
    outputs = chatbot.model.generate(inputs, max_length=1000, pad_token_id=chatbot.tokenizer.eos_token_id)
    generated_text = chatbot.tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract the model's response
    # The model's response is the part of the generated text after the last user input
    try:
        # Find the position after the last user input
        # This assumes the model's output includes the input
        last_user_input_start = generated_text.rfind(question)
        if last_user_input_start != -1:
             # Adjust the start position to be after the user's last input
            response_start = last_user_input_start + len(question)
            model_response = generated_text[response_start:].strip()
        else:
             # If the user input is not found in the generated text (unlikely but a fallback)
            model_response = generated_text.strip()

        # Append the model's response to the conversation history for the next turn
        conversation_history = generated_text # Update history with full turn

        print("\nRobot:", model_response)

    except Exception as e:
        print(f"Error processing response: {e}")
        print("\nRobot:", generated_text) # Print full generated text as fallback

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


pytorch_model.bin:   0%|          | 0.00/863M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/863M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

Device set to use cpu


You: what is AI


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.



Robot: Artificial Intelligence
You: how many types of AI we have

Robot: I think we have a few
You: name them

Robot: I think we have a few
You: annoying

Robot: I think you mean AnnoyingAI


ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipython-input-2-227564753.py", line 17, in <cell line: 0>
    question = input("You: ")
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 1177, in raw_input
    return self._input_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 1219, in _input_request
    raise KeyboardInterrupt("Interrupted by user") from None
KeyboardInterrupt: Interrupted by user

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 2099, in showtraceback
    stb = value._render_traceback_()
          ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Keyb

TypeError: object of type 'NoneType' has no len()