
**Tauha Imran** | _Buildables AI Fellowship – Week 2_  

[LinkedIn](https://www.linkedin.com/in/tauha-imran-6185b3280/) · [GitHub](https://github.com/tauhaimran) · [Portfolio](https://tauhaimran.github.io/)  

---

### Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.

Encoder-only models read the whole input at once, with every token attending to both its left and right context, and produce contextual embeddings (good for classification or sentiment analysis).
Decoder-only models generate text autoregressively: each token attends only to the tokens before it, and the model predicts the next token (good for text completion or chatbots).
Encoder-decoder models first encode the input into a representation, then a decoder generates the output while attending to that representation (good for translation or summarization).

Example usage:

Encoder-only: BERT → Sentiment analysis

Decoder-only: GPT-3 → Story generation

Encoder-decoder: T5 → Text summarization
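
One way to make the difference concrete is the attention mask each family uses. The sketch below is a simplified illustration, not code from any of the models named above; the helper `attention_mask` is a hypothetical name. It builds a full (bidirectional) mask, as in an encoder, and a causal mask, as in a decoder. An encoder-decoder model would use the full mask on the input side and the causal mask on the output side.

```python
from typing import List

def attention_mask(seq_len: int, causal: bool) -> List[List[int]]:
    """Return a seq_len x seq_len mask where 1 means position j is visible to position i."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

tokens = ["the", "cat", "sat"]
encoder_mask = attention_mask(len(tokens), causal=False)  # every token sees every token
decoder_mask = attention_mask(len(tokens), causal=True)   # each token sees only itself and earlier tokens

for name, mask in [("encoder-only", encoder_mask), ("decoder-only", decoder_mask)]:
    print(name)
    for tok, row in zip(tokens, mask):
        print(f"  {tok:>3} sees {row}")
```

The triangular shape of the causal mask is what lets a decoder be trained on next-token prediction without seeing the answer.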

---


### Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:
  * What it does.
  * One possible application.

STT Model: DeepSpeech (Mozilla)

What it does: Converts spoken audio into written text using an RNN-based acoustic model trained end to end, optionally combined with an n-gram language model scorer during decoding.

Application: Transcribing lectures or meetings into text.

TTS Model: Tacotron 2 (Google)

What it does: Generates natural-sounding speech from text by combining a sequence-to-sequence network that predicts mel spectrograms with a neural vocoder (WaveNet in the original paper) that turns them into audio.

Application: Creating realistic voice assistants or audiobook narration.

---

### Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

# basic chatbot with memory
import os
from collections import deque
from dotenv import load_dotenv
from groq import Groq

# load environment variables from .env
load_dotenv()

# read and verify the API key
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
if not GROQ_API_KEY:
    raise RuntimeError("GROQ_API_KEY not found; please check the .env configuration")

#loading model and prompt
MODEL = "llama-3.3-70b-versatile"
SYSTEM_PROMPT = "You are a concise, helpful assistant."

def chatbot():
    client = Groq()  # create a Groq client (reads GROQ_API_KEY from the environment)
    memory = deque(maxlen=5)  # rolling window: keeps only the last 5 messages


    print("Type \"quit\" to exit.\n")

    while True:
        user_input = input("Your message: ").strip()

        if user_input.lower() == "quit":
            print("you : 'quit' ")
            print("groq: bye bye")
            print("-------------------------------------------\n\n")
            break

        # append the user's message to memory
        memory.append({"role": "user", "content": user_input})
        # construct context: system prompt + rolling memory
        messages = [{"role": "system", "content": SYSTEM_PROMPT}] + list(memory)

        try:
            resp = client.chat.completions.create(model=MODEL, messages=messages)
            assistant_text = (resp.choices[0].message.content or "").strip()
        except Exception as e:
            # report the error and drop the failed turn so it does not pollute the context
            print(f"(Error calling Groq API: {e})\n")
            memory.pop()
            continue

        print(f"You: {user_input}\n")
        print(f"Bot: {assistant_text}\n")
        print("-------------------------------------------\n\n")

        # push assistant reply into memory
        memory.append({"role": "assistant", "content": assistant_text})
    
        


#TESTING THIS CHATBOT
chatbot()

Type "quit" to exit.
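
The rolling memory can be checked without calling the API. The sketch below is a minimal offline illustration: `fake_reply` is a hypothetical stand-in for the Groq call, and the conversation turns are made up. It shows how `deque(maxlen=5)` silently discards the oldest messages once the window is full.

```python
from collections import deque

memory = deque(maxlen=5)  # same rolling window size as the chatbot

def fake_reply(text: str) -> str:
    """Hypothetical stand-in for the Groq call so the sketch runs offline."""
    return f"echo: {text}"

# four turns produce eight messages; only the last five survive
for turn in ["hi", "how are you", "tell me a joke", "bye"]:
    memory.append({"role": "user", "content": turn})
    memory.append({"role": "assistant", "content": fake_reply(turn)})

for msg in memory:
    print(msg["role"], "->", msg["content"])
```

Because the window holds an odd number of messages, it can start with an assistant reply whose matching user message has already been dropped. An even `maxlen` would keep user and assistant messages paired, if that matters for the application.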



---

### Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`

# file: preprocess_basic.py
import re

def simple_clean(text: str) -> str:
    # 1) lowercase
    text = text.lower()
    # 2) remove punctuation (keep letters, digits, whitespace; \w also matches "_", so drop it explicitly)
    text = re.sub(r"[^\w\s]|_", " ", text)
    # 3) collapse multiple spaces + strip ends
    text = re.sub(r"\s+", " ", text).strip()
    return text

if __name__ == "__main__":
    s = "  HELLo!!!  How ARE you?? "
    print(simple_clean(s))  # -> "hello how are you"


hello how are you
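
The same three steps can also be written without regular expressions. This is an alternative sketch, not the function above; the name `simple_clean_translate` is hypothetical. It uses `str.translate` with `string.punctuation`, so it only handles ASCII punctuation.

```python
import string

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def simple_clean_translate(text: str) -> str:
    """Lowercase, replace ASCII punctuation with spaces, collapse whitespace."""
    text = text.lower().translate(_PUNCT_TO_SPACE)
    # split() with no arguments drops leading, trailing, and repeated whitespace
    return " ".join(text.split())

print(simple_clean_translate("  HELLo!!!  How ARE you?? "))  # -> hello how are you
print(simple_clean_translate("don't stop"))                  # -> don t stop
```

Replacing punctuation with a space rather than deleting it keeps words such as "well-known" from fusing together, at the cost of splitting contractions as the second call shows.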


---

### Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

pip install spacy
python -m spacy download en_core_web_sm

# file: preprocess_advanced.py
from typing import List
import spacy
from spacy.lang.en.stop_words import STOP_WORDS

# load once (small English model provides POS + lemmas)
_nlp = spacy.load("en_core_web_sm")

_ALLOWED_POS = {"NOUN", "VERB", "ADJ"}

def preprocess_text_spacy(text: str) -> List[str]:
    """
    Returns a list of cleaned tokens after:
      - lowercasing
      - removing punctuation & numbers
      - removing stopwords
      - lemmatizing
      - removing tokens shorter than 3 chars
      - keeping only nouns, verbs, adjectives
    """
    # lowercase early, as the assignment asks (note: this may change how the tagger labels some tokens, e.g. proper nouns)
    doc = _nlp(text.lower())

    cleaned = []
    for tok in doc:
        # keep only alphabetic tokens (no punctuation/numbers)
        if not tok.is_alpha:
            continue

        # pos filter
        if tok.pos_ not in _ALLOWED_POS:
            continue

        lemma = tok.lemma_.lower()

        # remove stopwords
        if lemma in STOP_WORDS:
            continue

        # min length
        if len(lemma) < 3:
            continue

        cleaned.append(lemma)

    return cleaned

#--------------------------------------

#TESTING
sample = "AI-driven systems are transforming 2025 industries rapidly!!!"
print(preprocess_text_spacy(sample))
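# Likely output (exact tags and lemmas depend on the model version): ['drive', 'system', 'transform', 'industry']
# 'ai' is always dropped by the 3-character minimum; 'rapidly' is normally tagged as an adverb and dropped by the POS filter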
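
The filtering logic can be exercised without downloading a spaCy model by feeding it hand-tagged tokens. The sketch below is an illustration under that assumption: `filter_tagged`, the small `STOPWORDS` set, and the tags in `tagged` are all hypothetical, and a real tagger may label the same tokens differently.

```python
from typing import List, Tuple

# Hypothetical mini stopword list and allowed tag set, for illustration only.
STOPWORDS = {"the", "is", "and", "be"}
ALLOWED_POS = {"NOUN", "VERB", "ADJ"}

def filter_tagged(tagged: List[Tuple[str, str, str]]) -> List[str]:
    """Apply the filtering steps to (token, pos, lemma) triples and return kept lemmas."""
    kept = []
    for token, pos, lemma in tagged:
        lemma = lemma.lower()
        if not token.isalpha():        # drops punctuation and numbers
            continue
        if pos not in ALLOWED_POS:     # keeps only nouns, verbs, adjectives
            continue
        if lemma in STOPWORDS:         # drops stopwords
            continue
        if len(lemma) < 3:             # drops lemmas shorter than 3 characters
            continue
        kept.append(lemma)
    return kept

# Hand-written tags for the sample sentence; not actual model output.
tagged = [
    ("ai", "NOUN", "ai"),
    ("-", "PUNCT", "-"),
    ("driven", "VERB", "drive"),
    ("systems", "NOUN", "system"),
    ("are", "AUX", "be"),
    ("transforming", "VERB", "transform"),
    ("2025", "NUM", "2025"),
    ("industries", "NOUN", "industry"),
    ("rapidly", "ADV", "rapidly"),
    ("!", "PUNCT", "!"),
]
print(filter_tagged(tagged))  # -> ['drive', 'system', 'transform', 'industry']
```

Separating the filter from the tagger this way makes each rule easy to test on its own, for example confirming that a two-letter token such as "ai" never survives the length check.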