# Song Sentiment Analysis - Introduction

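This notebook introduces the song lyrics sentiment analysis project. It walks through the full pipeline on a small sample dataset: loading data, preprocessing lyrics, scoring sentiment, and visualizing the results.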

## Overview

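This project scores the sentiment of song lyrics using natural language processing techniques. This notebook uses VADER, a lexicon- and rule-based sentiment model, through the project's `SentimentAnalyzer` class.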

## Setup

First, let's import the necessary libraries and our custom modules.

In [None]:
import sys
import os

# Add the src directory to the import path (resolved relative to the notebook's working directory)
sys.path.append(os.path.abspath('../src'))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sentiment_analysis.data_utils import create_sample_data
from sentiment_analysis.preprocessing import clean_text, preprocess_lyrics_dataframe
from sentiment_analysis.analyzer import SentimentAnalyzer
from sentiment_analysis.visualization import (
    plot_sentiment_distribution,
    plot_sentiment_scores,
    generate_wordcloud
)

%matplotlib inline
print("Libraries imported successfully!")

## Load Sample Data

Let's create a small sample dataset with `create_sample_data()` to demonstrate the functionality. The later cells rely on its `song_title`, `artist`, and `lyrics` columns.

In [None]:
# Create sample data
df = create_sample_data()
print(f"Loaded {len(df)} songs")
df.head()

## Data Preprocessing

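Clean and preprocess the lyrics with `preprocess_lyrics_dataframe`. The cell below expects the result to contain `cleaned_lyrics` and `word_count` columns alongside the original `lyrics`.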

In [None]:
# Preprocess the lyrics
df = preprocess_lyrics_dataframe(df, text_column='lyrics')
df[['song_title', 'lyrics', 'cleaned_lyrics', 'word_count']].head()
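
To make the cleaning step concrete, here is a minimal sketch of the kind of normalization a function like `clean_text` might perform. The function name `clean_lyrics_sketch` and every rule in it are assumptions for illustration; the project's actual implementation may differ.

```python
import re
import string


def clean_lyrics_sketch(text: str) -> str:
    """Hypothetical stand-in for clean_text; the project's real rules may differ."""
    # Drop bracketed section markers such as "[Chorus]" or "[Verse 1]".
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Lowercase so "Love" and "love" count as the same word.
    text = text.lower()
    # Remove ASCII punctuation, but keep apostrophes so contractions survive.
    punctuation = string.punctuation.replace("'", "")
    text = text.translate(str.maketrans("", "", punctuation))
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


sample = "[Chorus]\nI LOVE this song, don't you?!"
cleaned = clean_lyrics_sketch(sample)
word_count = len(cleaned.split())  # whitespace-delimited token count
print(cleaned, word_count)
```

A word count computed this way is simply the number of whitespace-separated tokens in the cleaned text.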

## Sentiment Analysis

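Analyze sentiment using VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER reports the proportions of positive, negative, and neutral text (`pos`, `neg`, `neu`) plus a normalized `compound` score between -1 and +1. Note that the analyzer is run on the raw `lyrics` column; VADER uses capitalization and punctuation as intensity cues, which cleaning may strip.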

In [None]:
# Initialize sentiment analyzer
analyzer = SentimentAnalyzer(method='vader')

# Analyze sentiment
df = analyzer.analyze_dataframe(df, text_column='lyrics')
df[['song_title', 'artist', 'sentiment', 'compound', 'pos', 'neg', 'neu']].head()
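
The `sentiment` label is presumably derived from the `compound` score. A minimal sketch of that mapping, assuming the conventional VADER cutoffs of ±0.05, is shown here. The function name `label_from_compound` and the label strings are assumptions; `SentimentAnalyzer` may use different thresholds or labels.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label.

    The 0.05 cutoff is the commonly used VADER convention; the
    project's SentimentAnalyzer may apply different rules.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.62, 0.0, -0.31):
    print(score, label_from_compound(score))
```

Scores close to zero fall into the neutral band, so widening `threshold` trades fewer weakly polarized labels for more neutral ones.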

## Visualization

Visualize the sentiment analysis results.

In [None]:
# Plot sentiment distribution
fig = plot_sentiment_distribution(df)
plt.show()

In [None]:
# Plot sentiment scores
fig = plot_sentiment_scores(df, score_column='compound')
plt.show()

In [None]:
# Generate word cloud from all lyrics (skip missing values so the join cannot fail)
all_lyrics = ' '.join(df['cleaned_lyrics'].dropna().astype(str))
fig = generate_wordcloud(all_lyrics, title='Most Common Words in Lyrics')
plt.show()
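
A word cloud sizes each word by how often it occurs. The sketch below computes those underlying frequencies with the standard library, which is a quick way to sanity-check the picture. The helper name `top_words` is an assumption, and `generate_wordcloud` may additionally filter stop words, which this sketch does not.

```python
from collections import Counter


def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent whitespace-separated words with their counts."""
    return Counter(text.split()).most_common(n)


sample_lyrics = "love love love me do you know i love you"
print(top_words(sample_lyrics, n=2))
```

On the notebook's data the same helper could be called as `top_words(all_lyrics, n=20)`.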

## Summary Statistics

In [None]:
# Display summary statistics
print("Sentiment Distribution:")
print(df['sentiment'].value_counts())
print("\nSentiment Score Summary Statistics:")
print(df[['compound', 'pos', 'neg', 'neu']].describe())

## Next Steps

1. Load your own song lyrics data
2. Explore advanced sentiment analysis techniques
3. Build machine learning models for classification
4. Analyze sentiment trends over time or across genres