# **This notebook documents the step-by-step process and Python code used in vibe coding.**
**Goal: Teach the computer to detect emotions in text**


## Step 1: The setup

We’ll need a library called TextBlob. It’s like a tiny language helper that can “feel” the mood of words.

In [2]:
# install the library (you run this once)
# pip install textblob

from textblob import TextBlob

## Step 2: Give it some sentences to analyze


In [5]:
sentences = [
    "I’m feeling amazing today!",
    "I hate waking up early.",
    "What a beautiful morning!",
    "I’m so tired and bored.",
    "This is the best day ever!"
]

## Step 3: Let the AI check the vibe


In [4]:
for text in sentences:
    blob = TextBlob(text)
    mood_score = blob.sentiment.polarity  # ranges from -1 (negative) to +1 (positive)

    if mood_score > 0:
        vibe = "😊 Positive Vibe"
    elif mood_score < 0:
        vibe = "😢 Negative Vibe"
    else:
        vibe = "😐 Neutral Vibe"

    print(f"{text} --> {vibe} (score: {mood_score})")

I’m feeling amazing today! --> 😊 Positive Vibe (score: 0.7500000000000001)
I hate waking up early. --> 😢 Negative Vibe (score: -0.35000000000000003)
What a beautiful morning! --> 😊 Positive Vibe (score: 1.0)
I’m so tired and bored. --> 😢 Negative Vibe (score: -0.45)
This is the best day ever! --> 😊 Positive Vibe (score: 1.0)


What’s happening behind the scenes:

- TextBlob looks at the words and how they’re used.
- It knows that “amazing,” “beautiful,” and “best” are positive words.
- It also knows “hate,” “tired,” and “bored” usually mean negative feelings.
- Then it gives each sentence a “vibe score.”

That’s vibe coding in baby form: detecting emotional tone from words.
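
To make the word-list idea concrete, here is a toy sketch of a lexicon-based scorer. It is not TextBlob’s actual code: the word list, the scores in `TOY_LEXICON`, and the plain averaging are all made up for illustration, and a real analyzer also handles things like negation and intensifiers.

```python
# Toy lexicon-based scorer. The word list and its scores are invented
# for illustration; they are NOT TextBlob's real lexicon.
import re

TOY_LEXICON = {
    "amazing": 0.8,
    "beautiful": 0.9,
    "best": 1.0,
    "hate": -0.8,
    "tired": -0.4,
    "bored": -0.5,
}

def toy_polarity(text):
    """Average the scores of the known words; 0.0 if none are known."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

for sentence in ["What a beautiful morning!", "I'm so tired and bored.", "See you at noon."]:
    print(f"{sentence} --> {toy_polarity(sentence):+.2f}")
```

A sentence with no known words scores 0.0, which is also why a word-list approach has nothing to say about sarcasm: it only ever sees the words.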

## Making the vibe detector a little smarter: sarcasm and mixed emotions

Now we’re moving from a baby vibe detector to a teenage vibe detector, one that aims to spot sarcasm, mixed emotions, and stronger mood swings.


### Step 1: The Problem

Basic vibe coding (like TextBlob) only looks at positive or negative words,
but humans are trickier.

Examples:

“Oh great, another Monday.”
“Yeah, I totally love doing homework.”

Those sound positive on the surface (“great,” “love”)…
but the vibe is clearly sarcastic 😒

So we need an AI that understands context, not just words.

### Step 2: Bring in a Smarter Brain (Transformers 🦾)

Modern AI uses transformer models like BERT, RoBERTa, or DistilBERT. These are like super brains trained on millions of examples of real human text: tweets, reviews, and so on.

Depending on what they were fine-tuned for, they can pick up on tone, emotional complexity, and sometimes sarcasm.

We can use a ready-made one from a library called Hugging Face Transformers.



In [7]:
# install first if needed
# pip install transformers torch

from transformers import pipeline

# load a pre-trained emotion detection model
# (by default the pipeline returns only the top-scoring label, so the
# deprecated return_all_scores=False argument is not needed)
emotion_analyzer = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

# try it out!
sentences = [
    "Oh great, another Monday.",
    "I can’t believe I failed again 😔",
    "I’m so proud of myself today!",
    "Sure, because everything always works perfectly (eye roll).",
    "That movie was sad but kind of beautiful."
]

for text in sentences:
    result = emotion_analyzer(text)[0]
    print(f"{text} → {result['label']} ({result['score']:.2f})")

Device set to use cpu


Oh great, another Monday. → joy (0.92)
I can’t believe I failed again 😔 → sadness (0.74)
I’m so proud of myself today! → joy (0.85)
Sure, because everything always works perfectly (eye roll). → neutral (0.82)
That movie was sad but kind of beautiful. → sadness (0.97)


The results aren’t exactly what we expected because the model we used, j-hartmann/emotion-english-distilroberta-base, is trained mainly on explicit emotions (like joy, sadness, anger, fear, disgust, etc.), not sarcasm.

So when it sees:

“Oh great, another Monday.”

it notices the word “great,” which is usually positive, and doesn’t catch the sarcastic tone, because plain text carries no vocal or situational clues.

Hence:

→ joy (0.92)

The model isn’t “wrong” (it’s doing what it was trained for), but it’s not vibe-aware enough yet.

To make our results more vibe-aware, we can do any of the following:
1. Use a model trained specifically for sarcasm
2. Use a multi-label emotion model (some models can report multiple vibes at once, like "sad, but hopeful")
3. Combine models (vibe stacking)
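
As a rough sketch of what “multiple vibes at once” could look like in code: instead of keeping only the top label, keep every label whose score clears a threshold. The scores in `example_scores`, the label names, and the `0.30` threshold are hypothetical, hand-written values; no model is run here. The list-of-dicts shape mimics what a text-classification pipeline can return when asked for all labels.

```python
# Hypothetical per-label scores, hand-written for illustration.
# No model is run here.
example_scores = [
    {"label": "sadness", "score": 0.62},
    {"label": "optimism", "score": 0.41},
    {"label": "anger", "score": 0.03},
]

def pick_vibes(scores, threshold=0.30):
    """Return every label whose score clears the threshold, strongest first.

    Falls back to the single top label if nothing clears the threshold.
    """
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    chosen = [item["label"] for item in ranked if item["score"] >= threshold]
    return chosen or [ranked[0]["label"]]

print(" + ".join(pick_vibes(example_scores)))
```

This works best with models whose labels are scored independently; when scores are forced to sum to 1, a second emotion rarely gets far above a fixed threshold.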

Real-world systems (think Spotify or TikTok) likely combine several signals, for example:
- Emotion model (text emotion)
- Sarcasm model
- Sentiment model
- Context model (user history or audio tone)

Together, that fusion gives a much stronger “vibe sense.”
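
Here is one minimal way such stacking could work, assuming we already have an emotion label and a sarcasm probability from two separate models. The function name `stack_vibes`, the `0.5` threshold, and the choice of which emotions count as “positive” are assumptions for illustration, and the inputs below are hand-picked values rather than real model outputs.

```python
# Rule-based "vibe stacking" sketch. The inputs are plain values that
# would come from separate models; nothing is loaded or predicted here.
POSITIVE_EMOTIONS = {"joy"}  # assumption: which labels count as positive

def stack_vibes(emotion_label, sarcasm_prob, sarcasm_threshold=0.5):
    """Combine an emotion label with a sarcasm probability.

    If the text looks sarcastic and the surface emotion is positive,
    report it as sarcastic instead of trusting the emotion label.
    """
    if sarcasm_prob >= sarcasm_threshold and emotion_label in POSITIVE_EMOTIONS:
        return f"sarcastic (surface emotion: {emotion_label})"
    return emotion_label

# Hand-picked example values, not real model outputs
print(stack_vibes("joy", sarcasm_prob=0.9))
print(stack_vibes("joy", sarcasm_prob=0.1))
print(stack_vibes("sadness", sarcasm_prob=0.9))
```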

For this exercise, we’ll try option 1: a model trained specifically for sarcasm.

In [9]:
# install first if needed
# pip install transformers torch

from transformers import pipeline

# load a sarcasm model
# NOTE: this checkpoint is a T5 model; loading it as "text-classification"
# adds a fresh, untrained classification head (see the warning in the output)
sarcasm_detector = pipeline("text-classification", model="mrm8488/t5-base-finetuned-sarcasm-twitter")
sarcasm_detector("Oh great, another Monday.")

# try it out!
sentences = [
    "Oh great, another Monday.",
    "I can’t believe I failed again 😔",
    "I’m so proud of myself today!",
    "Sure, because everything always works perfectly (eye roll).",
    "That movie was sad but kind of beautiful."
]

# NOTE: this loop still calls emotion_analyzer, not sarcasm_detector
for text in sentences:
    result = emotion_analyzer(text)[0]
    print(f"{text} → {result['label']} ({result['score']:.2f})")

Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at mrm8488/t5-base-finetuned-sarcasm-twitter and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Device set to use cpu


Oh great, another Monday. → joy (0.99)
I can’t believe I failed again 😔 → sadness (1.00)
I’m so proud of myself today! → joy (1.00)
Sure, because everything always works perfectly (eye roll). → joy (1.00)
That movie was sad but kind of beautiful. → sadness (0.98)

These are still emotion labels (joy, sadness) rather than sarcasm predictions, because the loop calls `emotion_analyzer`, not `sarcasm_detector`. The warning above matters too: the checkpoint’s classification head was “newly initialized,” so even a direct call to `sarcasm_detector` would give essentially arbitrary labels. T5 models are text-to-text, so this one would need to be loaded in that form to be useful.


### Why sarcasm is hard for AI

Sarcasm needs context, tone, or history: things text alone can’t fully show.

For example:

“Wow, you’re so early…”

If the person actually came late, the vibe is sarcastic. But unless the AI knows that context, it assumes “wow” + “so” + “early” = positive.

That’s why sarcasm detection is an advanced branch of vibe coding. It often needs:
- More context (previous sentences, speaker style)
- Tone of voice (audio input)
- Or models trained on sarcastic datasets (like “Twitter Sarcasm Corpus”)
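
The “more context” idea can be sketched very simply: prepend the last few turns of the conversation to the sentence before it goes to a classifier. The helper name `with_context`, the `[SEP]` separator, and the `max_turns` default are arbitrary choices for illustration, not something a particular model requires, and whether the extra context helps depends on how the model was trained.

```python
# Sketch: give a classifier more context by prepending earlier turns.
# The separator and the max_turns default are arbitrary choices.
def with_context(history, utterance, max_turns=2, sep=" [SEP] "):
    """Join the last few turns of a conversation with the new utterance."""
    recent = history[-max_turns:] if max_turns > 0 else []
    return sep.join(recent + [utterance])

history = [
    "The meeting started at 9.",
    "Sam walked in at 9:45.",
]
print(with_context(history, "Wow, you're so early..."))
```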

### **Next, let’s take the same idea and build a music or video vibe detector, so the AI can feel the mood of a song or clip, not just text.**

### **Step 1: The New Problem (Feeling the Mood of Music & Video)**


*Imagine these:*

🎵 *A slow piano ballad → feels melancholy, even with no lyrics.*

🎥 *A fast-cut action trailer with deep bass → feels intense or thrilling.*

🎶 *A pop song with major chords and upbeat tempo → feels joyful.*

🧍 *A TikTok clip where someone is dancing but the music is ironic → mixed vibes.*

Unlike text, here the “vibe” lives in sound + visuals + lyrics.
So, our vibe detector must understand:

🔊 Audio features → tempo, pitch, energy, melody

📝 Lyrics (if present) → using the text vibe detector we built

👁 Visual cues → colors, brightness, facial expressions, motion

### **Step 2: Bring in Audio/Video Brains**

| Modality        | Common Tools / Models                                                                             |
| --------------- | ------------------------------------------------------------------------------------------------- |
| **Audio**       | OpenL3 (audio embeddings), YAMNet (sound classification), Musicnn (genre/mood), Wav2Vec2 (speech) |
| **Lyrics/Text** | Transformers (like before)                                                                        |
| **Visuals**     | CLIP (image+text), OpenCV (basic), or ViT (Vision Transformer) for emotion scenes                 |


We can combine these into a multimodal vibe pipeline.

### **Step 3: A Mini Music Vibe Detector (Demo Code)**

Below is a simplified Python example that uses librosa to extract audio features, plus a simple rule-based mood guess.
(You could later swap the “mood detector” for a real model like musicnn.)

📝 This focuses on instrumental audio mood, not lyrics.

In [11]:
# 🎧 Mini Music Vibe Detector (Beginner Demo)
# pip install librosa numpy

import librosa
import numpy as np

def detect_music_vibe(audio_path):
    # 1️⃣ Load the audio file
    y, sr = librosa.load(audio_path, duration=60)  # load first 60 seconds

    # 2️⃣ Extract basic features
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    # newer librosa versions return tempo as a 1-element array, so unwrap it
    tempo = float(np.atleast_1d(tempo)[0])
    energy = float(np.mean(librosa.feature.rms(y=y)))
    brightness = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))

    # 3️⃣ Simple vibe classification rules (thresholds are rough guesses, not tuned values)
    if tempo > 120 and energy > 0.05:
        vibe = "🎉 Upbeat / Energetic"
    elif brightness < 2000 and energy < 0.03:
        vibe = "🌧 Sad / Calm / Reflective"
    else:
        vibe = "🎶 Mixed / Neutral Mood"

    return {
        "tempo": round(tempo, 2),
        "energy": energy,
        "brightness": brightness,
        "detected_vibe": vibe
    }


This little demo extracts tempo, energy, and brightness: super simple audio mood cues.
Later, you can plug in models like:

- musicnn → pre-trained for music mood/genre.
- OpenL3 → to embed audio → classify moods with a custom model.
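
To get a feel for what “energy” and “brightness” actually measure, here is a small NumPy-only sketch on two synthetic tones. It is a whole-signal simplification (librosa computes both features per short frame and the demo above averages them), and the helper names `rms_energy` and `spectral_centroid` are just illustrative.

```python
# What "energy" and "brightness" measure, shown on synthetic tones.
# Whole-signal simplification: librosa computes both per short frame.
import numpy as np

def rms_energy(y):
    """Root-mean-square amplitude: louder signals give larger values."""
    return float(np.sqrt(np.mean(y ** 2)))

def spectral_centroid(y, sr):
    """Magnitude-weighted mean frequency in Hz (the spectrum's center of mass)."""
    magnitudes = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

sr = 22050                      # samples per second
t = np.arange(sr) / sr          # one second of time stamps
low_tone = 0.5 * np.sin(2 * np.pi * 220 * t)    # darker sound
high_tone = 0.5 * np.sin(2 * np.pi * 3520 * t)  # same loudness, brighter sound

for name, tone in [("220 Hz", low_tone), ("3520 Hz", high_tone)]:
    print(name, round(rms_energy(tone), 3), round(spectral_centroid(tone, sr)))
```

Both tones have the same energy because they have the same amplitude, while the centroid follows the pitch. That is the sense in which a higher centroid reads as a “brighter” sound.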

### **Step 4: Video Vibe Detector (Concept)**

For videos, you combine:

Audio track → use the music detector above

Transcript (speech) → use a speech-to-text model like Whisper, then feed the text to our teenage vibe detector 🧠

Frames → sample 1 frame per second, then run them through a Vision Transformer (ViT) or CLIP to detect scene emotion (e.g., bright colors, facial expressions, action).

Mini demo for video frames (concept):

In [12]:
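# pip install opencv-python
import cv2

def extract_frames(video_path, frame_rate=1):
    """Return roughly `frame_rate` frames per second of video as BGR arrays."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    # keep every Nth frame; max(1, ...) avoids a zero interval (and a
    # ZeroDivisionError below) when fps is unknown or frame_rate exceeds fps
    frame_interval = max(1, int(fps / frame_rate))
    frames = []
    i = 0
    while True:
        ret, frame = cap.read()
        if not ret:  # end of video, or the file could not be opened
            break
        if i % frame_interval == 0:
            frames.append(frame)
        i += 1
    cap.release()
    return frames

frames = extract_frames("clip.mp4")
print(f"Extracted {len(frames)} frames for vibe analysis 🖼")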


Extracted 0 frames for vibe analysis 🖼


Zero frames most likely means `clip.mp4` wasn’t found or couldn’t be opened, so point the path at a real video file. You’d then send the frames into a CLIP model or emotion classifier (e.g., “happy scene,” “tense,” “romantic,” “dark”), and blend that with the audio+text analysis.
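
Before reaching for CLIP, two of the visual cues mentioned earlier (brightness and color) can be measured directly from pixel values. The sketch below uses synthetic frames so it runs without a video file; `frame_cues` is an illustrative helper and a crude stand-in for a real scene-emotion model, not a replacement for one.

```python
# Crude visual cues from frames: brightness and color spread.
# Synthetic frames stand in for extracted video frames; real OpenCV
# frames have the same shape (height, width, 3) in BGR channel order.
import numpy as np

def frame_cues(frame):
    """Return (brightness, color_spread), both scaled to the 0..1 range."""
    pixels = frame.astype(np.float32) / 255.0
    brightness = float(pixels.mean())
    # per-pixel gap between the strongest and weakest channel:
    # 0 for gray pixels, 1 for fully saturated ones
    color_spread = float((pixels.max(axis=2) - pixels.min(axis=2)).mean())
    return brightness, color_spread

dark_gray = np.full((4, 4, 3), 40, dtype=np.uint8)
vivid_red = np.zeros((4, 4, 3), dtype=np.uint8)
vivid_red[:, :, 2] = 255  # channel 2 is red in BGR order

for name, frame in [("dark gray", dark_gray), ("vivid red", vivid_red)]:
    brightness, color_spread = frame_cues(frame)
    print(f"{name}: brightness={brightness:.2f}, color_spread={color_spread:.2f}")
```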

### **Step 5: Fuse All the Vibes**

Finally, we combine the scores from:

🎶 Audio mood

📝 Lyrics / Speech vibe

👁 Visual scene vibe

Example (pseudo):

In [14]:
# final_vibe = weighted_average([audio_vibe, text_vibe, visual_vibe])

# This is a conceptual representation of combining different vibe scores.
# In a real implementation, you would define how to combine the scores
# from audio, text, and visual analysis (e.g., using numerical scores and weights).
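
One way to turn the weighted-average idea into runnable code is sketched here, assuming each modality produces a score between -1 (gloomy) and +1 (upbeat). The function name `fuse_vibes`, the weights, and the example numbers are hypothetical placeholders, not measured values. Modalities with no score (for example, an instrumental clip with no lyrics) are skipped and the remaining weights are renormalized.

```python
# Weighted fusion sketch. Scores are assumed to lie in [-1, 1];
# the weights and example numbers are hypothetical placeholders.
def fuse_vibes(scores, weights):
    """Weighted average over the modalities that actually produced a score."""
    available = {name: s for name, s in scores.items() if s is not None}
    if not available:
        raise ValueError("no modality produced a score")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * s for name, s in available.items()) / total_weight

weights = {"audio": 0.4, "text": 0.4, "visual": 0.2}

# an instrumental clip: no lyrics or speech, so the text score is missing
scores = {"audio": 0.5, "text": None, "visual": -0.1}
print(round(fuse_vibes(scores, weights), 3))
```

Renormalizing over the available modalities keeps the result on the same -1 to +1 scale whether or not every detector had something to say.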

There are already some multimodal models you can experiment with:

🧠 CLAP (Contrastive Language-Audio Pretraining) → audio + text mood understanding

🧠 VideoCLIP / ViViT → video mood/scene analysis

🧠 Audio Spectrogram Transformer (AST) → detect music genre, mood, or emotion directly.

## **In summary:**

Text → Teenage vibe detector 🧠

Music → Adds emotional rhythm and sonic energy 🎧

Video → Adds visual emotion and atmosphere 👁

➕ Fusion = A full sensory vibe intelligence 🌈🤖