# Sentiment Analysis for Songs

In this assignment you're going to try assigning sentiments to songs from the songs dataset.


## Loading the Song Lyrics Dataset

```python
from google.colab import drive
import pandas as pd

drive.mount('/content/gdrive')
df = pd.read_csv('/content/gdrive/My Drive/datasets/songs.csv')
```

## Getting the terms out of every song

```python
all_terms = df['Lyrics'].str.cat(sep=" ").split()
```

## Getting the terms out of a specific song

```python
taylor_df = df[df["Artist"] == "Taylor Swift"]
# Filter on taylor_df's own Title column so the boolean mask matches its index
lover_song_lyrics = taylor_df[taylor_df["Title"] == "Lover"].iloc[0]["Lyrics"]
lover_song_terms = lover_song_lyrics.split()
```

## The Sentiment Analysis Example from our Slides

```python
import spacy
from textblob import TextBlob

nlp = spacy.load("en_core_web_sm")
sentence = "I am very unhappy with this product."
doc = nlp(sentence)

blob = TextBlob(doc.text)

# Note, positive polarity means positive sentiment, negative means negative sentiment.
print(f"Sentiment Polarity: {blob.sentiment.polarity}")
print(f"Sentiment Subjectivity: {blob.sentiment.subjectivity}")
print(f"Assessments: {blob.sentiment_assessments.assessments}")
```
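
To turn a polarity score into a sentiment label for a song, one option is a simple threshold. The sketch below is only an illustration: the `polarity_to_label` helper and the 0.05 cutoff are assumptions, not something from the slides or from TextBlob, so the cutoff is worth tuning against songs you know.

```python
def polarity_to_label(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    Hypothetical helper: the threshold is an assumed cutoff, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


print(polarity_to_label(0.31))
print(polarity_to_label(-0.30))
print(polarity_to_label(0.0))
```

Scores close to zero land in the "neutral" bucket, which keeps songs with almost no scored words from being labeled either way.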

## Applying a Function to Every Row to Create a new Series

You may want to treat sentiment as a new column in your dataset. You can do this with pandas using the `.apply` method!

Here's an example that calculates a simple word count column for the songs dataframe.

```python
def lyrics_count(row):
    words = row["Lyrics"].split()
    num_words = len(words)
    return num_words

df["Lyrics Word Count"] = df.apply(lyrics_count, axis=1)
```
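
When the function only needs one column, calling `.apply` on that column (a Series) is a slightly shorter alternative to `df.apply(..., axis=1)`. This is a minimal sketch on a made-up two-row dataframe; `toy_df`, `count_words`, and the lyrics are placeholders, not rows from songs.csv.

```python
import pandas as pd

# Made-up stand-in for the songs dataframe (not real rows from songs.csv)
toy_df = pd.DataFrame({
    "Title": ["Song A", "Song B"],
    "Lyrics": ["la la la", "oh\nyeah"],
})


def count_words(lyrics):
    # str.split() with no arguments splits on any whitespace, including newlines
    return len(lyrics.split())


# Series.apply passes each value of the column (a string), not the whole row
toy_df["Lyrics Word Count"] = toy_df["Lyrics"].apply(count_words)
print(toy_df)
```

The row-wise form with `axis=1` is still the one to reach for when the function needs more than one column.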

## Preprocessing

Copied and pasted from our last assignment.

```python
import spacy
from nltk.stem import PorterStemmer

nlp = spacy.load("en_core_web_sm")
stemmer = PorterStemmer()

def preprocess(doc_str, with_stemming=False, with_lemmatization=False):
    """preprocess takes a string, doc_str, and returns the string preprocessed.

    By default, preprocessing means lowercasing, removing punctuation, and
    removing stop words.
    
    Optionally, you may stem or lemmatize as well by passing with_stemming=True
    or with_lemmatization=True.
    """
    # Lowercase
    doc_str = doc_str.lower()
    doc = nlp(doc_str)  # Initialize as a spaCy object (list of tokens)
    words = []
    for token in doc:
        # Skip punctuation and stop words
        if not token.is_punct and not token.is_stop:
            text = token.text
            if with_lemmatization:
                text = token.lemma_
            if with_stemming:
                text = stemmer.stem(text)
            words.append(text)

    # Turn them back into one string
    doc_str = " ".join(words)
    return doc_str
```

```python
from google.colab import drive
import pandas as pd

drive.mount('/content/gdrive')
df = pd.read_csv('/content/gdrive/My Drive/datasets/songs.csv')
df
```



| Unnamed: 0 | Artist | Title | Lyrics |
| --- | --- | --- | --- |
| 0 | Taylor Swift | cardigan | Vintage tee, brand new phone\nHigh heels on co... |
| 1 | Taylor Swift | exile | I can see you standing, honey\nWith his arms a... |
| 2 | Taylor Swift | Lover | We could leave the Christmas lights up 'til Ja... |
| 3 | Taylor Swift | the 1 | I'm doing good, I'm on some new shit\nBeen say... |
| 4 | Taylor Swift | Look What You Made Me Do | I don't like your little games\nDon't like you... |
| ... | ... | ... | ... |
| 740 | George Michael | The First Time Ever I Saw Your Face | The first time ever I saw your face\nI thought... |
| 741 | George Michael | Waiting For That Day/You Can’t Always Get What... | Now every day I see you in some other face\nTh... |
| 742 | George Michael | Shoot the Dog | GTI, Hot Shot\nHe parks it there, just to piss... |
| 743 | George Michael | Star People | Maybe your mama gave you up, boy\nMaybe your d... |


```python
def get_lyrics(songs_df, artist, title):
    """Given the songs.csv dataframe, pulls out the lyrics for a particular artist and song."""
    # Build both masks from songs_df (not the global df) so they share its index
    matches = songs_df[(songs_df["Artist"] == artist) & (songs_df["Title"] == title)]
    return matches.iloc[0]["Lyrics"]

get_lyrics(df, "Taylor Swift", "Lover")
```



"We could leave the Christmas lights up 'til January\nAnd this is our place, we make the rules\nAnd there's a dazzling haze, a mysterious way about you, dear\nHave I known you twenty seconds or twenty years?\n\nCan I go where you go?\nCan we always be this close?\nForever and ever, ah\nTake me out, and take me home\nYou're my, my, my, my lover\n\nWe could let our friends crash in the living room\nThis is our place, we make the call\nAnd I'm highly suspicious that everyone who sees you wants you\nI've loved you three summers now, honey, but I want 'em all\n\nCan I go where you go?\nCan we always be this close?\nForever and ever, ah\nTake me out, and take me home (Forever and ever)\nYou're my, my, my, my lover\nLadies and gentlemen, will you please stand?\nWith every guitar string scar on my hand\nI take this magnetic force of a man to be my lover\nMy heart's been borrowed and yours has been blue\nAll's well that ends well to end up with you\nSwear to be overdramatic and true to my lover

```python
# !spacy download en_core_web_lg

import spacy
from textblob import TextBlob

nlp = spacy.load("en_core_web_lg")
song_lyrics = get_lyrics(df, "Taylor Swift", "Lover")
doc = nlp(song_lyrics)
blob = TextBlob(doc.text)

# Note, positive polarity means positive sentiment, negative means negative sentiment.
print(f"Sentiment Polarity: {blob.sentiment.polarity}")
print(f"Sentiment Subjectivity: {blob.sentiment.subjectivity}")
print(f"Assessments: {blob.sentiment_assessments.assessments}")

def get_polarity(row):
    """Return the TextBlob polarity of one row's lyrics."""
    blob = TextBlob(row["Lyrics"])
    return blob.sentiment.polarity

# .copy() makes taylor_df an independent dataframe, so adding a column
# does not trigger pandas' SettingWithCopyWarning.
taylor_df = df[df["Artist"] == "Taylor Swift"].copy()
taylor_df["Polarity"] = taylor_df.apply(get_polarity, axis=1)

# Ascending sort: the most negative songs come first.
taylor_df.sort_values("Polarity")

# assessments_df = pd.DataFrame(blob.sentiment_assessments.assessments, columns=["words", "polarity", "subjectivity", "unknown"])
# assessments_df
```




Sentiment Polarity: 0.3085714285714286
Sentiment Subjectivity: 0.5985714285714286
Assessments: [(['dazzling'], 0.75, 1.0, None), (['mysterious'], 0.0, 1.0, None), (['highly'], 0.16, 0.5399999999999999, None), (['wants'], 0.2, 0.1, None), (['loved'], 0.7, 0.8, None), (['blue'], 0.0, 0.1, None), (['true'], 0.35, 0.65, None)]
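
The overall scores above appear to be plain averages of the per-word assessments. The sketch below recomputes them from the printed assessments list using only the standard library; it is an arithmetic check on this output, not a reimplementation of TextBlob's analyzer.

```python
# Assessments copied from the output above: (words, polarity, subjectivity, fourth field)
assessments = [
    (['dazzling'], 0.75, 1.0, None),
    (['mysterious'], 0.0, 1.0, None),
    (['highly'], 0.16, 0.5399999999999999, None),
    (['wants'], 0.2, 0.1, None),
    (['loved'], 0.7, 0.8, None),
    (['blue'], 0.0, 0.1, None),
    (['true'], 0.35, 0.65, None),
]

polarities = [polarity for _, polarity, _, _ in assessments]
subjectivities = [subjectivity for _, _, subjectivity, _ in assessments]

# Average each score over the assessed words
mean_polarity = sum(polarities) / len(polarities)
mean_subjectivity = sum(subjectivities) / len(subjectivities)

print(f"Mean polarity: {mean_polarity:.4f}")
print(f"Mean subjectivity: {mean_subjectivity:.4f}")
```

Both averages match the printed polarity and subjectivity to four decimal places. Only seven words of the whole song show up in the assessments, so the score rests on a small part of the lyrics.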




| Unnamed: 0 | Artist | Title | Lyrics | Polarity |
| --- | --- | --- | --- | --- |
| 29 | Taylor Swift | mad woman | What did you think I'd say to that?\nDoes a sc... | -0.299194 |
| 47 | Taylor Swift | Bad Blood | ’Cause baby, now we've got bad blood\nYou know... | -0.258543 |
| 23 | Taylor Swift | this is me trying | I've been having a hard time adjusting\nI had ... | -0.21037 |
| 32 | Taylor Swift | The Man | I would be complex, I would be cool\nThey'd sa... | -0.148677 |
| 5 | Taylor Swift | betty | Betty, I won't make assumptions\nAbout why you... | -0.140934 |
| 42 | Taylor Swift | epiphany | Keep your helmet, keep your life, son\nJust a ... | -0.118681 |
| 16 | Taylor Swift | Cruel Summer | (Yeah, yeah, yeah, yeah)\n\nFever dream high i... | -0.104993 |
| 48 | Taylor Swift | Cornelia Street | We were in the backseat\nDrunk on something st... | -0.088902 |
| 7 | Taylor Swift | End Game | I wanna be your end game\nI wanna be your firs... | -0.086001 |
| 22 | Taylor Swift | illicit affairs | Make sure nobody sees you leave\nHood over you... | -0.083847 |


```python
# !spacy download en_core_web_lg

import spacy
import numpy as np

nlp = spacy.load("en_core_web_lg")  # Load spaCy model
rome = nlp("Rome").vector
italy = nlp("Italy").vector
france = nlp("France").vector

guess_paris = rome - italy + france
actual_paris = nlp("Paris").vector

print(f"Distance is {np.linalg.norm(guess_paris - actual_paris)}")
```


Distance is 43.222904205322266
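
A raw Euclidean distance like this is hard to judge without something to compare it to. Cosine similarity, which compares direction and ignores vector length, is a common alternative for word vectors. The sketch below uses small made-up vectors so it runs without the spaCy model; `cosine_similarity` is a hypothetical helper, and with the real vectors you would pass `guess_paris` and `actual_paris` instead.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up 3-dimensional vectors standing in for real word vectors
guess = np.array([1.0, 2.0, 3.0])
same_direction = np.array([2.0, 4.0, 6.0])  # same direction, twice the length
orthogonal = np.array([3.0, 0.0, -1.0])     # dot product with guess is 0

print(cosine_similarity(guess, same_direction))
print(cosine_similarity(guess, orthogonal))
print(np.linalg.norm(guess - same_direction))
```

In this toy case the Euclidean distance between `guess` and `same_direction` is well above zero even though they point the same way, which is the kind of situation where cosine similarity is easier to read.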
