# Text Analytics for Taylor Swift Songs

### Try some of these resources for extra help as you work:

* On October 21, 2022 at midnight, Taylor Swift released her new album, Midnights. We wanted to know if she had dropped any hints about it before it went live, so we analyzed all of her lyrics. In this project, you will explore her references to ‘midnight’ leading up to this point, then you’ll expand your search to track her references to day, night, and time in general.

* Next, you will use sentiment analysis to study her tone and vibe over the course of her career, then put it all together to determine if she writes more favorably about day or night. Throughout the whole project, you’ll get to practice your NLP and text analysis skills on a real corpus.


[The original dataset was curated by Jan Llenzl Dagohoy and published on Kaggle](https://www.kaggle.com/datasets/thespacefreak/taylor-swift-song-lyrics-all-albums)


In [1]:
%matplotlib inline

import pandas as pd
import string
import seaborn as sns
import matplotlib.pyplot as plt
import collections
import nltk
from nltk.tokenize import word_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer


# 1. Load the dataset

After running the first cell to load all necessary libraries, we need to load our dataset. 
* Using pandas, load the dataset `TaylorR2.xlsx` and save it as `song`.

In [2]:
# load dataset
song = pd.read_excel("TaylorR2.xlsx")

In [3]:
song.head()

Unnamed: 0,album_name,track_title,track_n,lyric,line
0,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,There's something 'bout the way,1
1,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,The street looks when it's just rained,2
2,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"There's a glow off the pavement, you walk me t...",3
3,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,And you know I wanna ask you to dance right there,4
4,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"In the middle of the parking lot, yeah",5


In [4]:
#get info about the DataFrame
song.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8358 entries, 0 to 8357
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   album_name   8358 non-null   object
 1   track_title  8358 non-null   object
 2   track_n      8358 non-null   int64 
 3   lyric        8358 non-null   object
 4   line         8358 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 326.6+ KB


In [5]:
song.tail()

Unnamed: 0,album_name,track_title,track_n,lyric,line
8353,Red (Deluxe Edition),State of Grace (Acoustic Version),22,And I’ll never be the same,32
8354,Red (Deluxe Edition),State of Grace (Acoustic Version),22,This is a state of grace,33
8355,Red (Deluxe Edition),State of Grace (Acoustic Version),22,This is a worthwhile fight,34
8356,Red (Deluxe Edition),State of Grace (Acoustic Version),22,Love is a ruthless game,35
8357,Red (Deluxe Edition),State of Grace (Acoustic Version),22,Unless you play it good and right,36



* The Taylor Swift lyrics dataset contains line-by-line lyrics from nine of her albums. There are no missing values. Each row is one lyric line, and the columns give the album name, the track title and number, the lyric text, and the line number within the track. The dataset does not include release years.


### 2. Add essential data

We know that we want to explore her use of terms over years, but this dataset doesn't have years. We can either merge a dataset or manually create a new column. We have created a function for you that fills in the year based on the album name. 

* Apply this function to the `song` DataFrame to create a new column.
* Inspect the first few rows of the DataFrame

In [6]:
# get a list of all the albums in this collection
print(song.album_name.unique())

['Fearless (Taylor’s Version)' 'Taylor Swift' 'Speak Now (Deluxe)'
 '1989 (Deluxe)' 'reputation' 'Lover' 'folklore (deluxe version)'
 'evermore (deluxe version)' 'Red (Deluxe Edition)']


In [7]:
# this is a function to map the name of the album to the year it was released
def album_release(row):  
    if row['album_name'] == 'Taylor Swift':
        return '2006'
    elif row['album_name'] == 'Fearless (Taylor’s Version)':
        return '2008'
    elif row['album_name'] == 'Speak Now (Deluxe)':
        return '2010'
    elif row['album_name'] == 'Red (Deluxe Edition)':
        return '2012'
    elif row['album_name'] == '1989 (Deluxe)':
        return '2014'
    elif row['album_name'] == 'reputation':
        return '2017'
    elif row['album_name'] == 'Lover':
        return '2019'
    elif row['album_name'] == 'folklore (deluxe version)':
        return '2020'
    #ok, we know evermore was actually released in Dec 2020, but this will make our analysis easier
    elif row['album_name'] == 'evermore (deluxe version)':
        return '2021'
    #this is slightly differently formatted because the album name is recorded two ways.
    elif 'midnights' in row['album_name']:
        return '2022'
    
    return 'No Date'


# apply the function to the album
song['album_year'] = song.apply(album_release, axis=1)

# inspect the first few rows of the DataFrame
song.head()

Unnamed: 0,album_name,track_title,track_n,lyric,line,album_year
0,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,There's something 'bout the way,1,2008
1,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,The street looks when it's just rained,2,2008
2,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"There's a glow off the pavement, you walk me t...",3,2008
3,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,And you know I wanna ask you to dance right there,4,2008
4,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"In the middle of the parking lot, yeah",5,2008
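
As an alternative to the `if`/`elif` chain, the same lookup can be sketched with a dictionary and `.map()`. The mini-DataFrame `demo` and the `release_years` dictionary below are illustrative assumptions, not part of the project files, and only a few albums are listed. Note that this version does exact matching only, so the partial match on 'midnights' would still need separate handling.

```python
import pandas as pd

# Hypothetical mini-DataFrame standing in for `song`
demo = pd.DataFrame({
    "album_name": ["Taylor Swift", "reputation", "Lover", "Some Unlisted Album"],
})

# Album-to-year lookup (years kept as strings, like the function above)
release_years = {
    "Taylor Swift": "2006",
    "reputation": "2017",
    "Lover": "2019",
}

# .map() looks each album up in the dictionary; albums that are not found
# become NaN, which .fillna() turns into the same 'No Date' fallback
demo["album_year"] = demo["album_name"].map(release_years).fillna("No Date")
print(demo)
```

A dictionary keeps the album list in one place, which makes it easier to add a new album later.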


### 3. Clean the lyric text

To accurately count keyword mentions, we need to make everything lowercase, remove punctuation, and exclude stop words. 

* Change everything to lower case and save the result in a new column called `clean_lyric_song`.
* Remove punctuation and save the result to the existing `clean_lyric_song` column.
* Run the code we provided to remove stopwords.
* Check to be sure you have what you expect by viewing the first few rows.

In [8]:
#lowercase
song['clean_lyric_song'] = song['lyric'].str.lower()

In [9]:
#remove punctuation
#replace every character that is not a word character (letter, digit, or underscore) or whitespace
#with an empty string, which strips the punctuation from each lyric
song['clean_lyric_song'] = song['clean_lyric_song'].str.replace(r'[^\w\s]', '', regex=True)
song.head()

Unnamed: 0,album_name,track_title,track_n,lyric,line,album_year,clean_lyric_song
0,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,There's something 'bout the way,1,2008,theres something bout the way
1,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,The street looks when it's just rained,2,2008,the street looks when its just rained
2,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"There's a glow off the pavement, you walk me t...",3,2008,theres a glow off the pavement you walk me to ...
3,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,And you know I wanna ask you to dance right there,4,2008,and you know i wanna ask you to dance right there
4,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"In the middle of the parking lot, yeah",5,2008,in the middle of the parking lot yeah
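
To see what the lowercase and punctuation steps do without pandas, here is a minimal sketch using Python's built-in `re` module on two lyric lines from the table above. The variable names `lines` and `cleaned` are just for illustration.

```python
import re

lines = [
    "There's something 'bout the way",
    "In the middle of the parking lot, yeah",
]

# Lowercase first, then drop anything that is not a word character or whitespace
cleaned = [re.sub(r"[^\w\s]", "", line.lower()) for line in lines]

for original, result in zip(lines, cleaned):
    print(repr(original), "->", repr(result))
```

Apostrophes are removed along with commas, which is why "There's" becomes "theres".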


In [10]:
#remove stopwords (see the next cell for illustration)
#create a small list of English stop words, feel free to edit this list
stop = ['the', 'a', 'this', 'that', 'to', 'is', 'am', 'was', 'were', 'be', 'being', 'been']

In [11]:
#there are three steps in one here - explained below
#we make a list of words with `.split()`
#then we remove all the words in our list
#then we join the words back together into a string
song['clean_lyric_song'] = song['clean_lyric_song'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))
song.head()

Unnamed: 0,album_name,track_title,track_n,lyric,line,album_year,clean_lyric_song
0,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,There's something 'bout the way,1,2008,theres something bout way
1,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,The street looks when it's just rained,2,2008,street looks when its just rained
2,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"There's a glow off the pavement, you walk me t...",3,2008,theres glow off pavement you walk me car
3,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,And you know I wanna ask you to dance right there,4,2008,and you know i wanna ask you dance right there
4,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"In the middle of the parking lot, yeah",5,2008,in middle of parking lot yeah
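
The split, filter, and join steps can also be tried on a single string. This sketch uses the same stop word list, stored as a set (an optional choice that makes membership checks fast); the helper name `remove_stopwords` is hypothetical.

```python
# Same stop words as above, stored as a set
stop = {'the', 'a', 'this', 'that', 'to', 'is', 'am', 'was', 'were', 'be', 'being', 'been'}

def remove_stopwords(text):
    """Split on whitespace, drop stop words, and join the rest back together."""
    return ' '.join(word for word in text.split() if word not in stop)

print(remove_stopwords("the street looks when its just rained"))
print(remove_stopwords("in the middle of the parking lot yeah"))
```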


In [12]:
#see what `.split()` does
song['clean_lyric_list_song'] = song['clean_lyric_song'].apply(lambda x: x.split())
print(song.head())

                    album_name                  track_title  track_n  \
0  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
1  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
2  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
3  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
4  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   

                                               lyric  line album_year  \
0                    There's something 'bout the way     1       2008   
1             The street looks when it's just rained     2       2008   
2  There's a glow off the pavement, you walk me t...     3       2008   
3  And you know I wanna ask you to dance right there     4       2008   
4             In the middle of the parking lot, yeah     5       2008   

                                 clean_lyric_song  \
0                       theres something bout way   
1               street

In [13]:
#see what `.join()` does
song['clean_lyric_list_rejoined'] = song['clean_lyric_list_song'].apply(lambda x: ' '.join(x))
print(song.head())

                    album_name                  track_title  track_n  \
0  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
1  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
2  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
3  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
4  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   

                                               lyric  line album_year  \
0                    There's something 'bout the way     1       2008   
1             The street looks when it's just rained     2       2008   
2  There's a glow off the pavement, you walk me t...     3       2008   
3  And you know I wanna ask you to dance right there     4       2008   
4             In the middle of the parking lot, yeah     5       2008   

                                 clean_lyric_song  \
0                       theres something bout way   
1               street

In [14]:
#remove those illustration columns
song.drop(['clean_lyric_list_song', 'clean_lyric_list_rejoined'], axis=1, inplace=True)
print(song.head())

                    album_name                  track_title  track_n  \
0  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
1  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
2  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
3  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   
4  Fearless (Taylor’s Version)  Fearless (Taylor’s Version)        1   

                                               lyric  line album_year  \
0                    There's something 'bout the way     1       2008   
1             The street looks when it's just rained     2       2008   
2  There's a glow off the pavement, you walk me t...     3       2008   
3  And you know I wanna ask you to dance right there     4       2008   
4             In the middle of the parking lot, yeah     5       2008   

                                 clean_lyric_song  
0                       theres something bout way  
1               street l

In [15]:
#there are many pre-built lists of stopwords, including one from sklearn.
#Most exclude too many words to be appropriate for song lyric analysis.
from sklearn.feature_extraction import text
skl_stop = text.ENGLISH_STOP_WORDS
print(skl_stop)

frozenset({'wherever', 'all', 'so', 'thereupon', 'twenty', 'up', 'anyhow', 'top', 'system', 'there', 'which', 'often', 'other', 'ever', 'now', 'might', 'name', 'one', 'become', 'thereafter', 'part', 'per', 'otherwise', 'found', 'most', 'on', 'upon', 'own', 'those', 'serious', 'either', 'whereas', 'something', 'thru', 'whether', 'would', 'made', 'twelve', 'who', 'towards', 'its', 'fifty', 'perhaps', 'nor', 'whose', 'see', 'beyond', 'becoming', 'else', 'have', 'out', 'yours', 'themselves', 'elsewhere', 'among', 'through', 're', 'your', 'her', 'six', 'himself', 'as', 'someone', 'noone', 'must', 'further', 'except', 'always', 'within', 'much', 'eleven', 'cant', 'take', 'being', 'yourself', 'our', 'until', 'whence', 'though', 'eg', 'show', 'done', 'may', 'between', 'meanwhile', 'because', 'or', 'nevertheless', 'hereupon', 'across', 'here', 'first', 'interest', 'onto', 'some', 'be', 'amongst', 'even', 'hereby', 'nobody', 'seem', 'back', 'seemed', 'was', 'several', 'bottom', 'also', 'beforeha

### 4. Find keyword mentions

Because we are most interested in "midnight", we will first count how many times midnight occurs in Taylor Swift's lyrics. 

* Create a new column to indicate if a lyric has "midnight" in it.
* Check how many times midnight occurs

In [16]:
#create a new column to reflect if the lyrics contain midnight
song['midnight'] = song['clean_lyric_song'].str.contains('midnight')
sum(song['midnight'])

9

* Midnight is not very common -- it appears in only 9 lyric lines! 

### 5. Expand the keyword list

Midnight might not be the only way that Taylor Swift has talked about night. We need to expand our list. We've made a list of night words and day words. Feel free to add or remove words to make the list your own. 

* Join the lists into a regular expression string using the `.join()` function and the `|` to indicate "or"
* Create a new column for each word category (day, night, time) that evaluates the clean lyrics for the presence of the words in your regular expression.
* Count how many times the words appeared and print the result to the screen
* Inspect the first few rows of the lyrics DataFrame to be sure it looks as you expect

In [17]:
#night, day, and other time-related words
night = ['night','midnight', 'dawn', 'dusk', 'evening', 'late', 'dark', '1am', '2am', '3am', '4am']
day = ['day', 'morning', 'light', 'sun', 'dawn', 'noon', 'golden', 'bright']
time = ['today', 'tomorrow', 'yesterday']

In [18]:
#create a regular expression string for each list of words
night_regex = '|'.join(night)
day_regex = '|'.join(day)
time_regex = '|'.join(time)

In [19]:
#create a new column for each category of words
song['night'] = song['clean_lyric_song'].str.contains(night_regex)
song['day'] = song['clean_lyric_song'].str.contains(day_regex)
song['time'] = song['clean_lyric_song'].str.contains(time_regex)

In [20]:
#count the number of times each category of word appears in the lyrics
night_count = sum(song['night'])
day_count = sum(song['day'])
time_count = sum(song['time'])


In [21]:
#print the count of each word category
print("night words: ", night_count)
print("day words: ", day_count)
print("time words: ", time_count)

night words:  240
day words:  363
time words:  35


In [22]:
song.head()

Unnamed: 0,album_name,track_title,track_n,lyric,line,album_year,clean_lyric_song,midnight,night,day,time
0,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,There's something 'bout the way,1,2008,theres something bout way,False,False,False,False
1,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,The street looks when it's just rained,2,2008,street looks when its just rained,False,False,False,False
2,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"There's a glow off the pavement, you walk me t...",3,2008,theres glow off pavement you walk me car,False,False,False,False
3,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,And you know I wanna ask you to dance right there,4,2008,and you know i wanna ask you dance right there,False,False,False,False
4,Fearless (Taylor’s Version),Fearless (Taylor’s Version),1,"In the middle of the parking lot, yeah",5,2008,in middle of parking lot yeah,False,False,False,False


* Taylor Swift has referenced time a lot, and there are enough mentions for us to potentially spot some trends. Also, she has mentioned day far more than night. This might be because of the search terms, so we might consider revisiting the list. Or it could be because she is actually more focused on day and light than she is on night and dark. 
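
One reason the search terms may inflate the counts is that a plain pattern such as `late` also matches inside longer words like "later". The sketch below compares the joined pattern with a stricter version that uses `\b` word boundaries. The word list is a small subset and the sample lines are made up for illustration.

```python
import re

night_words = ['night', 'late', 'dark']  # illustrative subset of the night list

# Loose pattern: matches the words anywhere, even inside other words
loose = re.compile('|'.join(night_words))

# Strict pattern: \b requires a word boundary on both sides of the match
strict = re.compile(r'\b(?:' + '|'.join(re.escape(w) for w in night_words) + r')\b')

samples = ["ill see you later", "stay up late with me", "tonight we dance"]

loose_hits = [bool(loose.search(s)) for s in samples]
strict_hits = [bool(strict.search(s)) for s in samples]

print(loose_hits)   # [True, True, True]
print(strict_hits)  # [False, True, False]
```

The strict version also stops counting "tonight" as a night word, so words like that would need to be added to the list explicitly if you want them. Either pattern string can be passed to `.str.contains()`, which treats its argument as a regular expression by default.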


# Task Group 2

### 6. Visualize how Taylor Swift's mentions of time have changed over time.

Now that we know that she has referenced time, we can see if that has changed, and if she has been dropping any hints about her upcoming album. 
* Create a new dataframe called `yearly_mentions` that groups her mentions by year, and takes the sum of the other columns. Reset the index on this DataFrame to make it easier to use with matplotlib
* Create a line chart in matplotlib showing the night mentions over time.

In [23]:
#create a new dataframe for yearly mentions that groups mentions by year
#sum only the True/False mention columns; summing every column, including the text ones,
#is what raised "TypeError: can only concatenate str (not "int") to str"
mention_cols = ['midnight', 'night', 'day', 'time']
yearly_mentions = song.groupby('album_year')[mention_cols].sum().reset_index()
yearly_mentions
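
Grouping and summing works cleanly when the sum is limited to the True/False mention columns, because each `True` counts as 1. Here is a small sketch on made-up data; the DataFrame `demo` and its values are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows standing in for the lyric table
demo = pd.DataFrame({
    "album_year": ["2008", "2008", "2012", "2012", "2012"],
    "lyric": ["a", "b", "c", "d", "e"],
    "night": [True, False, True, True, False],
    "day": [False, False, True, False, False],
})

# Select only the boolean columns before summing so the text column is left out
yearly = demo.groupby("album_year")[["night", "day"]].sum().reset_index()
print(yearly)
```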

In [None]:
#plot the mentions of night over years
plt.plot(yearly_mentions['album_year'], yearly_mentions['night'])
plt.title("Night")
plt.show()

In [None]:
#plot the mentions of midnight over years
plt.plot(yearly_mentions['album_year'], yearly_mentions['midnight'])
plt.title("Midnight")
plt.show()

In [None]:
#plot the mentions of day over years
plt.plot(yearly_mentions['album_year'], yearly_mentions['day'])
plt.title("Day")
plt.show()

In [None]:
#plot the mentions of time over years
plt.plot(yearly_mentions['album_year'], yearly_mentions['time'])
plt.title("Time")
plt.show()

<i>

It looks like there was a peak in her mentions of night around 2012. Those mentions seem to have become less prevalent over time, with an actual dip in 2019! We had expected midnight to appear more often around then, based on rumors that she plans everything 3 years out. What about the other categories, day and time?
</i>

### 7. What albums are the most night/day focused?
Now that we have a table of counts, it's very easy to figure out which albums have the most mentions of day or night. But our DataFrame only has the years, not the album names. We need to add those back in.
* Run the code that reinstates the album name for this dataframe.
* Use `.sort_values()` to order the yearly_mentions table and find which year has the most mentions of night.
* Sort the day column as well.

In [None]:
#reinstate the album name
#read the album-year lookup table from albumR2.xlsx
year_name = pd.read_excel('albumR2.xlsx')

In [None]:
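#sort both dataframes by year and renumber the rows from 0,
#so that the column assignment in the next cell lines up row by row
yearly_mentions.sort_values(by='album_year', ascending=True, inplace=True, ignore_index=True)
year_name.sort_values(by='album_year', ascending=True, inplace=True, ignore_index=True)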

In [None]:
#add the new column for album name
yearly_mentions['album_name'] = year_name['album_name']
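
Assigning a column from another DataFrame matches rows by index label, not by position, so a merge on the year is a more robust way to attach album names. The sketch below uses made-up tables; it assumes the lookup file has `album_year` and `album_name` columns and that its years load as integers, which is why they are converted to strings first.

```python
import pandas as pd

# Made-up yearly counts (years as strings, like album_year in this project)
yearly = pd.DataFrame({"album_year": ["2008", "2012"], "night": [10, 30]})

# Made-up lookup table, deliberately in a different row order
names = pd.DataFrame({
    "album_year": [2012, 2008],
    "album_name": ["Red (Deluxe Edition)", "Fearless (Taylor’s Version)"],
})

# Make the key columns the same type, then match rows on the year itself
names["album_year"] = names["album_year"].astype(str)
merged = yearly.merge(names, on="album_year", how="left")
print(merged)
```

Because the match is made on `album_year`, the row order of either table no longer matters.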

In [None]:
#sort the lyrics by the night column to find the albums with the most night references
yearly_mentions.sort_values(by='night', ascending=False)


* Her most <b>"nightly"</b> album was Red (Deluxe Edition), released in 2012, whereas her least nightly album was the one right before <b>Midnights</b>. Has she been saving all her night-related lyrics for the past 3 years?


In [None]:
#sort the lyrics by the day column to find the albums with the most day references
yearly_mentions.sort_values(by='day', ascending=False)

It seems like her older work focused on day much more than her more recent work. There was a period of time (2008-2012) when she was really focused on day, but it seems to have become less central to her work more recently. 

### 8. Compare day to night mentions
* Create another line chart with both the night mentions and the day mentions

In [None]:
#create a plot with one line each for the number of night, day,
#and time references by year

plt.plot(yearly_mentions['album_year'], yearly_mentions['night'], label = 'night')
plt.plot(yearly_mentions['album_year'], yearly_mentions['day'], label = 'day')
plt.plot(yearly_mentions['album_year'], yearly_mentions['time'], label = 'time')
plt.legend()
plt.show()


Well, this is interesting. We see a similar dip in references to time around 2017, but a sharp increase in references to day in both 2019 and 2020. 


### 9. Investigate position of day vs. night mentions within albums
Maybe her references to time aren't always about the year that she released the album, but the ebb and flow of the album itself. Let's plot where, within each album, references to day and night occur.

* Create a variable that indicates the position of a lyric within an album. Call this 'position'.
* Create a new dataframe called positional_mentions that groups her mentions by album position, and takes the sum of the other columns. Reset the index on this DataFrame to make it easier to use with matplotlib.
* Create a line chart in matplotlib showing the night and day mentions as a position within the album.

*Remember that you can always print the head of the DataFrame to check your work*

In [None]:
#create a position variable that includes both the track number and line number
song['position'] = song['track_n'] + (song['line']/1000)
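
To make the position formula concrete, here is a small sketch on made-up track and line numbers. Dividing the line number by 1000 keeps it after the decimal point, so the whole number is still the track. This assumes no track has 1000 or more lines.

```python
import pandas as pd

# Made-up track and line numbers
demo = pd.DataFrame({"track_n": [1, 1, 2], "line": [1, 36, 1]})

# Track 1, line 36 becomes roughly 1.036; track 2, line 1 becomes roughly 2.001
demo["position"] = demo["track_n"] + demo["line"] / 1000
print(demo)
```

Since the value does not include the album, grouping by `position` pools the same track and line slot across every album.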

In [None]:
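#create a new DataFrame that is grouped by position
#sum only the mention columns; the text columns cannot be summed
positional_mentions = song.groupby('position')[['midnight', 'night', 'day', 'time']].sum().reset_index()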

In [None]:
#increase the size of the plot 
fig = plt.gcf()
fig.set_size_inches(25,10)

#create a plot with two lines to show frequency of day vs. night references by position in the album
plt.plot(positional_mentions['position'], positional_mentions['night'], label = 'night')
plt.plot(positional_mentions['position'], positional_mentions['day'], label = 'day')
plt.legend()

plt.show()

Honestly, not much: we can't see any strong trends here. Best to move on. 

# Task Group 3

### 10. Tokenize the Lyrics
It's great to know how much she has focused on day and night, but we would also like to do a little more sophisticated analysis. Before we can work with our cleaned lyrics, we will have to tokenize them. Tokenization is a special way of breaking up words that is a little more nuanced than just using white space. The output is a list of words that we can then perform text analysis on.

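NLTK (the Natural Language ToolKit) provides the `word_tokenize` function for this. Because our lyrics are already lowercased and stripped of punctuation, the cell below takes the simpler route and splits each cleaned lyric into words with `.str.split()`.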

* Run the cell to tokenize the cleaned lyrics.
* Inspect the first few rows of the lyrics DataFrame
* Combine all the tokens in the lyrics_tok column into one list
* Use the Counter function from the collections package to count the number of times each word appears
* Sort the resulting dictionary


In [None]:
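#run this cell to tokenize the words in the clean_lyric_song column
#splitting on any whitespace (no argument) avoids empty-string tokens for lyrics that are empty after cleaning
song['lyrics_tok'] = song['clean_lyric_song'].str.split()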
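
The way a string is split matters for lyrics that end up empty after cleaning, for example a line made only of stop words. This sketch uses plain Python strings to show the difference between splitting on a single space and splitting on any whitespace; the sample list is made up.

```python
# One normal cleaned lyric and one that is empty after stop word removal
cleaned_lines = ["street looks when its just rained", ""]

space_tokens = [line.split(' ') for line in cleaned_lines]
whitespace_tokens = [line.split() for line in cleaned_lines]

print(space_tokens[1])       # [''] - an empty-string "word"
print(whitespace_tokens[1])  # []   - no tokens at all
```

With `split(' ')`, every empty lyric adds an empty string to the word list, and that empty string then shows up in the word counts.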

In [None]:
#inspect the first few lines
song.head()

In [None]:
#determine what words overall are the most frequently used words
#create a list of all the words in the lyrics_tok column
word_list = [word for list_ in song['lyrics_tok'] for word in list_]

In [None]:
word_list

In [None]:
#use the counter function to count the number of times each word appears
word_frequency = collections.Counter(word_list)
print(word_frequency)

In [None]:
#sort the word frequencies to find out the most common words she's used. 
word_frequency = sorted(word_frequency.items(), key=lambda x: x[1], reverse=True)

In [None]:
word_frequency

* She mostly talks about "you" and then herself. That's not surprising, but it also tells us that she's mostly writing about her relationships with others. 
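
`Counter` also has a built-in shortcut for the sorting step. Here is a minimal sketch on a made-up token list showing `.most_common()`, which returns the words ordered from most to least frequent.

```python
from collections import Counter

# Made-up tokens standing in for the full word list
tokens = ["you", "i", "you", "love", "i", "you"]

freq = Counter(tokens)

# The two most frequent words with their counts
top_two = freq.most_common(2)
print(top_two)  # [('you', 3), ('i', 2)]
```

Calling `.most_common()` with no argument returns every word in sorted order.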


### 11. Analyze Lyric Sentiment

To better understand how she thinks and feels about day and night, we want to know which one she writes about more favorably. 

We are going to use the pre-built sentiment analyzer that comes with NLTK, called VADER. It is a lexicon- and rule-based model tuned for social media text, so it works best on short text. It returns 4 values: positive, negative, neutral, and compound. Compound will be of special interest to us. It is the sum of the word-level sentiment scores, normalized to fall between -1 (most negative) and +1 (most positive). 

* Run the sample sentiment analyzer cell to see how the SentimentIntensityAnalyzer works.
* Apply the sia sentiment analyzer to the `clean_lyric_song` column of the DataFrame with a lambda expression
* Run the cell that transforms the dictionary into columns of the DataFrame
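
To get a feel for how a compound score stays between -1 and +1, here is a rough sketch of the kind of normalization VADER is understood to apply to the summed word scores. The formula and the default `alpha=15` are stated as assumptions; check the NLTK source before relying on the details.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded sentiment sum into the range (-1, 1).

    Assumption: this mirrors VADER's normalization step, score / sqrt(score^2 + alpha).
    """
    return score / math.sqrt(score * score + alpha)

# Larger raw sums move toward +1 or -1 without ever reaching them
for raw in [-8, -2, 0, 2, 8]:
    print(raw, round(normalize(raw), 3))
```

This shape explains why repeating a positive word raises the compound score by less each time.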

In [None]:
#run this cell to add a package from NLTK for our sentiment analyzer.
nltk.download('vader_lexicon')

In [None]:
#run this cell to see how the sentiment analyzer works
sia = SentimentIntensityAnalyzer()
sia.polarity_scores("I love Taylor Swift!")

In [None]:
#run this cell to see how the sentiment analyzer works
sia = SentimentIntensityAnalyzer()
sia.polarity_scores("I love love love Taylor Swift!")

In [None]:
#create a new column called polarity and apply the sia method to the clean_lyric_song column with a lambda expression
song['polarity'] = song['clean_lyric_song'].apply(lambda x: sia.polarity_scores(x))
song.head()

In [None]:
#run this cell to transform the polarity dictionary into columns of the DataFrame
song[['neg', 'neu', 'pos', 'compound']] = song['polarity'].apply(pd.Series)

In [None]:
song.head()

In [None]:
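#drop the dictionary column now that its values have their own columns
#reassign the result so the change is kept
song = song.drop('polarity', axis=1)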
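
Another way to turn a column of dictionaries into separate columns is to build a DataFrame from the list of dictionaries and join it back. The sketch below uses made-up scores, not real analyzer output. On a large table this approach is usually faster than `.apply(pd.Series)`.

```python
import pandas as pd

# Made-up polarity dictionaries with the same four keys the analyzer returns
demo = pd.DataFrame({
    "polarity": [
        {"neg": 0.0, "neu": 0.2, "pos": 0.8, "compound": 0.6},
        {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.4},
    ]
})

# Build one column per dictionary key, keeping the original row labels
scores = pd.DataFrame(demo["polarity"].tolist(), index=demo.index)

# Attach the new columns and keep the result without the dictionary column
demo = demo.join(scores).drop(columns="polarity")
print(demo)
```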

### 12. Corpus Sentiment Analysis

Now that we have sentiment for all of her lyrics, let's learn more about her overall sentiment, as well as how that has changed throughout her career. 
* Use the pos, neg, and compound columns to calculate the overall sentiment of her entire collection, and print the result.
* Use the groupby function and matplotlib to visualize the overall sentiment of her albums over time 

In [None]:
#calculate overall sentiment for pos, neg, compound
pos = sum(song['pos'])
neg = sum(song['neg'])
compound = sum(song['compound'])

In [None]:
#print the overall sentiments
print("positive: ", pos)
print("negative: ", neg)
print("compound: ", compound)

In [None]:
#create a new DataFrame using the groupby method for the album_year
#sum only the numeric sentiment columns; the text columns cannot be summed
yearly_sentiment = song.groupby('album_year')[['neg', 'neu', 'pos', 'compound']].sum().reset_index()

#visualize sentiment over time 
plt.plot(yearly_sentiment['album_year'], yearly_sentiment['compound'])
plt.title("Sentiment over the years")
plt.show()

### 13. Day or Night? Positive or Negative
Finally the big reveal. Does Taylor Swift write more positively about day or night? 
* Create two DataFrames: one for all the lyrics that mention night and one for all the lyrics that mention day.
* Check that each DataFrame has the right number of values with the `len()` function
* Calculate the sentiment for both night and day using the compound value and print them.

In [None]:
#create a DataFrame filtered for only night mentions
night = song[song['night']==True]
night

In [None]:
#create a DataFrame filtered for only day mentions
day = song[song['day']==True]
day

In [None]:
#print the length of the night and day DataFrames
print("night: ",len(night))
print("day: ",len(day))

In [None]:
#calculate the sentiment of each day and night DataFrame from the compound values
night_sentiment = night['compound'].sum()
day_sentiment = day['compound'].sum()

In [None]:
#print the results
print("night sentiment: ", night_sentiment)
print("day_sentiment: ", day_sentiment)

She's still overall more positive in her sentiment about day than night! 
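
One caveat worth checking: a sum grows with the number of lines, and there are more day lines than night lines, so comparing averages can give a fairer picture. The sketch below uses made-up scores to show how the sum and the mean can point in different directions. It says nothing about the real lyrics.

```python
import pandas as pd

# Made-up data: two night lines and three day lines
demo = pd.DataFrame({
    "night": [True, True, False, False, False],
    "day": [False, False, True, True, True],
    "compound": [0.5, 0.3, 0.3, 0.3, 0.3],
})

night_scores = demo.loc[demo["night"], "compound"]
day_scores = demo.loc[demo["day"], "compound"]

# The sum rewards the larger group; the mean compares a typical line
night_sum, day_sum = night_scores.sum(), day_scores.sum()
night_mean, day_mean = night_scores.mean(), day_scores.mean()

print("sum:  night", round(night_sum, 2), "day", round(day_sum, 2))
print("mean: night", round(night_mean, 2), "day", round(day_mean, 2))
```

In this toy example day wins on the sum while night wins on the mean, so it is worth running both on the real `night` and `day` DataFrames.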
    
You have the tools to explore if that sentiment has changed over time, or if her feelings towards night have grown more positive or negative over time. And of course, we would love to see your analysis of her new album, Midnights. Have her feelings changed?
