# Web Scraping

First, we import requests (to download web pages) and BeautifulSoup (to parse their HTML).

In [108]:
import requests
from bs4 import BeautifulSoup

Next, we request the album page on Genius, keep its raw HTML in html_string, and parse that string with BeautifulSoup.

In [109]:
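# Download the album page and stop early if the request failed
response = requests.get("https://genius.com/albums/Adele/25-target-exclusive")
response.raise_for_status()
html_string = response.text
# Parse the HTML so it can be searched by tag and class
document = BeautifulSoup(html_string, "html.parser")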

The song titles sit inside <h3> tags, so we collect every <h3> on the page with find_all().

In [110]:
song_title_tags = document.find_all("h3")
song_title_tags

[<h3 class="chart_row-content-title">
               Hello
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>,
 <h3 class="chart_row-content-title">
               Send My Love (To Your New Lover)
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>,
 <h3 class="chart_row-content-title">
               I Miss You
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>,
 <h3 class="chart_row-content-title">
               When We Were Young
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>,
 <h3 class="chart_row-content-title">
               Remedy
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>,
 <h3 class="chart_row-content-title">
               Water Under the Bridge
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>,
 <h3 class="chart_row-content-title">
               River Lea
               <

To keep only the text inside each tag, we read its .text attribute.

In [111]:
titles_list = [song.text for song in song_title_tags]
titles_list

['\n              Hello\n              Lyrics\n',
 '\n              Send My Love (To Your New Lover)\n              Lyrics\n',
 '\n              I Miss You\n              Lyrics\n',
 '\n              When We Were Young\n              Lyrics\n',
 '\n              Remedy\n              Lyrics\n',
 '\n              Water Under the Bridge\n              Lyrics\n',
 '\n              River Lea\n              Lyrics\n',
 '\n              Love in the Dark\n              Lyrics\n',
 '\n              Million Years Ago\n              Lyrics\n',
 '\n              All I Ask\n              Lyrics\n',
 '\n              Sweetest Devotion\n              Lyrics\n',
 "\n              Can't Let Go\n              Lyrics\n",
 '\n              Lay Me Down\n              Lyrics\n',
 '\n              Why Do You Love Me\n              Lyrics\n',
 'Translations',
 '30',
 '30 (Target Exclusive)']

The last three entries ('Translations', '30', '30 (Target Exclusive)') come from other <h3> elements on the page, not from songs. Unlike the song titles, they do not begin with a newline, so we keep only the items that start with '\n' using startswith().

In [112]:
raw_title=[item for item in titles_list if item.startswith('\n')]
raw_title

['\n              Hello\n              Lyrics\n',
 '\n              Send My Love (To Your New Lover)\n              Lyrics\n',
 '\n              I Miss You\n              Lyrics\n',
 '\n              When We Were Young\n              Lyrics\n',
 '\n              Remedy\n              Lyrics\n',
 '\n              Water Under the Bridge\n              Lyrics\n',
 '\n              River Lea\n              Lyrics\n',
 '\n              Love in the Dark\n              Lyrics\n',
 '\n              Million Years Ago\n              Lyrics\n',
 '\n              All I Ask\n              Lyrics\n',
 '\n              Sweetest Devotion\n              Lyrics\n',
 "\n              Can't Let Go\n              Lyrics\n",
 '\n              Lay Me Down\n              Lyrics\n',
 '\n              Why Do You Love Me\n              Lyrics\n']
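The startswith() filter works because of how the page happens to be formatted. A sketch of a more targeted alternative, assuming (as the output above suggests) that song titles are the <h3> elements with class chart_row-content-title: restrict find_all() to that class and read only the text node directly inside each <h3>, which also skips the nested "Lyrics" span. extract_titles and sample_html are hypothetical names; sample_html is a hand-made stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modeled on the output above
sample_html = """
<h3 class="chart_row-content-title">
  Hello
  <span class="chart_row-content-title-subtitle">Lyrics</span>
</h3>
<h3 class="chart_row-content-title">
  Remedy
  <span class="chart_row-content-title-subtitle">Lyrics</span>
</h3>
<h3>Translations</h3>
"""

def extract_titles(html):
    """Hypothetical helper: titles from <h3 class="chart_row-content-title"> only."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for h3 in soup.find_all("h3", class_="chart_row-content-title"):
        # recursive=False looks only at text directly inside <h3>,
        # so the nested <span>Lyrics</span> is skipped
        title_node = h3.find(string=True, recursive=False)
        if title_node is not None:
            titles.append(title_node.strip())
    return titles

print(extract_titles(sample_html))
```

On the real page this would be called as extract_titles(html_string); it replaces both the startswith() filter and the replace()/strip() cleanup in one pass.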

The remaining strings still contain the word "Lyrics" and surrounding whitespace (spaces and newlines), so we remove the former with replace() and the latter with strip().

In [113]:
song_titles=[item.replace('Lyrics', '').strip() for item in raw_title]
song_titles

['Hello',
 'Send My Love (To Your New Lover)',
 'I Miss You',
 'When We Were Young',
 'Remedy',
 'Water Under the Bridge',
 'River Lea',
 'Love in the Dark',
 'Million Years Ago',
 'All I Ask',
 'Sweetest Devotion',
 "Can't Let Go",
 'Lay Me Down',
 'Why Do You Love Me']
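replace('Lyrics', '') removes the word wherever it appears, so a title that itself contained "Lyrics" would be damaged. None of the titles on this album do, but a sketch that removes only a trailing "Lyrics" label is a small change. clean_title is a hypothetical helper, and "Lyrics of Love" below is an invented title used only to show the difference.

```python
import re

# Matches "Lyrics" only when it is the last word, plus the whitespace around it
TRAILING_LYRICS = re.compile(r"\s*Lyrics\s*$")

def clean_title(raw):
    """Hypothetical helper: drop a trailing 'Lyrics' label, then trim whitespace."""
    return TRAILING_LYRICS.sub("", raw).strip()

print(clean_title('\n              Hello\n              Lyrics\n'))
# An invented title that starts with "Lyrics" keeps that word
print(clean_title('Lyrics of Love\n  Lyrics\n'))
```

In the notebook this would read song_titles = [clean_title(item) for item in raw_title].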

With the titles in hand, we now extract each song's link. The approach mirrors the one above: each song sits in a <div class="chart_row-content"> that wraps an <a> tag, so we first collect those divs.

In [114]:
links = document.find_all("div", class_="chart_row-content")
links

[<div class="chart_row-content">
 <a class="u-display_block" href="https://genius.com/Adele-hello-lyrics">
 <h3 class="chart_row-content-title">
               Hello
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>
 </a>
 </div>,
 <div class="chart_row-content">
 <a class="u-display_block" href="https://genius.com/Adele-send-my-love-to-your-new-lover-lyrics">
 <h3 class="chart_row-content-title">
               Send My Love (To Your New Lover)
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>
 </a>
 </div>,
 <div class="chart_row-content">
 <a class="u-display_block" href="https://genius.com/Adele-i-miss-you-lyrics">
 <h3 class="chart_row-content-title">
               I Miss You
               <span class="chart_row-content-title-subtitle">Lyrics</span>
 </h3>
 </a>
 </div>,
 <div class="chart_row-content">
 <a class="u-display_block" href="https://genius.com/Adele-when-we-were-young-lyrics">
 <h3 class="chart_row-cont

From each div we take the href attribute of the <a class="u-display_block"> tag, which gives one link per song.

In [115]:
song_links=[link.find('a', class_="u-display_block")['href'] for link in links]
song_links

['https://genius.com/Adele-hello-lyrics',
 'https://genius.com/Adele-send-my-love-to-your-new-lover-lyrics',
 'https://genius.com/Adele-i-miss-you-lyrics',
 'https://genius.com/Adele-when-we-were-young-lyrics',
 'https://genius.com/Adele-remedy-lyrics',
 'https://genius.com/Adele-water-under-the-bridge-lyrics',
 'https://genius.com/Adele-river-lea-lyrics',
 'https://genius.com/Adele-love-in-the-dark-lyrics',
 'https://genius.com/Adele-million-years-ago-lyrics',
 'https://genius.com/Adele-all-i-ask-lyrics',
 'https://genius.com/Adele-sweetest-devotion-lyrics',
 'https://genius.com/Adele-cant-let-go-lyrics',
 'https://genius.com/Adele-lay-me-down-lyrics',
 'https://genius.com/Adele-why-do-you-love-me-lyrics']
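song_titles and song_links come from two separate searches (<h3> tags and chart-row <div>s), so they only line up if both searches return the songs in the same order with nothing extra in either list. A sketch that reads each title and its link from the same chart-row <div> removes that assumption. extract_songs and sample_html are hypothetical; the markup is modeled on the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two song rows and one row without a link
sample_html = """
<div class="chart_row-content">
  <a class="u-display_block" href="https://genius.com/Adele-hello-lyrics">
    <h3 class="chart_row-content-title">
      Hello
      <span class="chart_row-content-title-subtitle">Lyrics</span>
    </h3>
  </a>
</div>
<div class="chart_row-content">
  <a class="u-display_block" href="https://genius.com/Adele-remedy-lyrics">
    <h3 class="chart_row-content-title">
      Remedy
      <span class="chart_row-content-title-subtitle">Lyrics</span>
    </h3>
  </a>
</div>
<div class="chart_row-content"><h3>Not a song</h3></div>
"""

def extract_songs(html):
    """Hypothetical helper: (title, link) pairs read from the same chart row."""
    soup = BeautifulSoup(html, "html.parser")
    songs = []
    for row in soup.find_all("div", class_="chart_row-content"):
        anchor = row.find("a", class_="u-display_block")
        h3 = row.find("h3", class_="chart_row-content-title")
        # Skip rows that lack the expected link or title tag
        if anchor is None or h3 is None:
            continue
        # The first text node directly inside <h3> is the title
        title_node = h3.find(string=True, recursive=False)
        if title_node is None:
            continue
        songs.append((title_node.strip(), anchor["href"]))
    return songs

print(extract_songs(sample_html))
```

The pairs can be passed straight to pd.DataFrame(pairs, columns=['Filename', 'Link']).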

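With two lists of equal length, we can build a DataFrame that pairs each title with its link. The title column is named Filename so that it can later be merged with the lyrics data.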

In [116]:
import pandas as pd
songs_df1=pd.DataFrame({'Filename':song_titles, 'Link':song_links})
songs_df1

,Filename,Link
0,Hello,https://genius.com/Adele-hello-lyrics
1,Send My Love (To Your New Lover),https://genius.com/Adele-send-my-love-to-your-...
2,I Miss You,https://genius.com/Adele-i-miss-you-lyrics
3,When We Were Young,https://genius.com/Adele-when-we-were-young-ly...
4,Remedy,https://genius.com/Adele-remedy-lyrics
5,Water Under the Bridge,https://genius.com/Adele-water-under-the-bridg...
6,River Lea,https://genius.com/Adele-river-lea-lyrics
7,Love in the Dark,https://genius.com/Adele-love-in-the-dark-lyrics
8,Million Years Ago,https://genius.com/Adele-million-years-ago-lyrics
9,All I Ask,https://genius.com/Adele-all-i-ask-lyrics


I had planned to use these links to collect the lyrics, add them to the DataFrame, and save each song's lyrics as a separate text file. However, the HTML of the lyrics pages was much harder to parse than the album page above (it looks quite different, and parts of it may be generated by JavaScript). I therefore gave up on this step and copied the lyrics into text files manually.

Since the lyrics could not be scraped, I continued by preprocessing and annotating the manually collected text files; at the end, the resulting DataFrame is merged with songs_df1.
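For anyone who wants to try the automatic route, here is a hedged sketch. It rests on an assumption that is not verified in this notebook: that a Genius song page places its lyrics in <div data-lyrics-container="true"> elements, with <br> tags between lines. Inspect a live page before relying on it, since the markup can change. extract_lyrics and sample_page are hypothetical.

```python
from bs4 import BeautifulSoup

def extract_lyrics(html):
    """Hypothetical helper: lyrics on a song page as newline-separated lines.

    ASSUMPTION: lyrics live in <div data-lyrics-container="true"> elements
    with <br> tags between lines; check this against the live page first.
    """
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for container in soup.find_all("div", attrs={"data-lyrics-container": "true"}):
        # Turn each <br> into a newline so line breaks survive get_text()
        for br in container.find_all("br"):
            br.replace_with("\n")
        for line in container.get_text().split("\n"):
            if line.strip():
                lines.append(line.strip())
    return "\n".join(lines)

# Hypothetical page fragment: inline tags such as <a> stay on their line
sample_page = """<div data-lyrics-container="true">Hello, it's me<br/>I was <a href="#">wondering</a> if</div>
<div data-lyrics-container="true">After all these years<br/></div>"""
print(extract_lyrics(sample_page))
```

In the notebook this would be called as extract_lyrics(requests.get(link).text) for each link in song_links. If the lyrics are injected by JavaScript, the downloaded HTML will not contain them and the function returns an empty string.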

# Installing, Importing and Preprocessing

In [117]:
# Install spaCy and plotly, and pin nbformat to version 5.1.2
!pip install spacy
!pip install plotly
!pip install nbformat==5.1.2



In [118]:
# Import spacy
import spacy

# Install English language model
!spacy download en_core_web_sm

# Import os to list the lyrics files in a folder
import os

# Load spaCy visualizer
from spacy import displacy

# Import pandas DataFrame packages
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

# Import graphing package
import plotly.graph_objects as go
import plotly.express as px

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)

In [119]:
# Folder holding one lyrics file per song
folder = 'Adele_Album 25_Songs'

# Create empty lists for file names and contents
texts = []
file_names = []

# Iterate through the files in the folder (sorted so the order is reproducible)
for _file_name in sorted(os.listdir(folder)):
    # Only read plain-text files
    if _file_name.endswith('.txt'):
        # Read the file's contents; the with-block closes the file afterwards
        with open(os.path.join(folder, _file_name), 'r', encoding='utf-8') as f:
            texts.append(f.read())
        # Record the file name alongside its text
        file_names.append(_file_name)
file_names

['All I Ask.txt',
 "Can't Let Go.txt",
 'Hello.txt',
 'I Miss You.txt',
 'Lay Me Down.txt',
 'Love in the Dark.txt',
 'Million Years Ago.txt',
 'Remedy.txt',
 'River Lea.txt',
 'Send My Love (To Your New Lover).txt',
 'Sweetest Devotion.txt',
 'Water Under the Bridge.txt',
 'When We Were Young.txt',
 'Why Do You Love Me.txt']

In [120]:
# Create a dataframe associating each file name with its text
text_df = pd.DataFrame({'Filename':file_names,'Document':texts})
text_df

,Filename,Document
0,All I Ask.txt,I will leave my heart at the door\nI won't say...
1,Can't Let Go.txt,When did it go wrong? I will never know\nI hav...
2,Hello.txt,"Hello, it's me\nI was wondering if, after all ..."
3,I Miss You.txt,I want every single piece of you\nI want your ...
4,Lay Me Down.txt,I would never lie to you unless you tell me to...
5,Love in the Dark.txt,Take your eyes off of me so I can leave\nI'm f...
6,Million Years Ago.txt,"\nI only wanted to have fun\nLearning to fly, ..."
7,Remedy.txt,I remember all of the things that I thought I ...
8,River Lea.txt,Everybody tells me it's 'bout time that I move...
9,Send My Love (To Your New Lover).txt,"This was all you, none of it me\nYou put your ..."


In [121]:
# Collapse newlines and runs of whitespace into single spaces, then trim the ends
text_df['Text'] = text_df['Document'].str.replace(r'\s+', ' ', regex=True).str.strip()
text_df

,Filename,Document,Text
0,All I Ask.txt,I will leave my heart at the door\nI won't say...,I will leave my heart at the door I won't say ...
1,Can't Let Go.txt,When did it go wrong? I will never know\nI hav...,When did it go wrong? I will never know I have...
2,Hello.txt,"Hello, it's me\nI was wondering if, after all ...","Hello, it's me I was wondering if, after all t..."
3,I Miss You.txt,I want every single piece of you\nI want your ...,I want every single piece of you I want your h...
4,Lay Me Down.txt,I would never lie to you unless you tell me to...,I would never lie to you unless you tell me to...
5,Love in the Dark.txt,Take your eyes off of me so I can leave\nI'm f...,Take your eyes off of me so I can leave I'm fa...
6,Million Years Ago.txt,"\nI only wanted to have fun\nLearning to fly, ...","I only wanted to have fun Learning to fly, lea..."
7,Remedy.txt,I remember all of the things that I thought I ...,I remember all of the things that I thought I ...
8,River Lea.txt,Everybody tells me it's 'bout time that I move...,Everybody tells me it's 'bout time that I move...
9,Send My Love (To Your New Lover).txt,"This was all you, none of it me\nYou put your ...","This was all you, none of it me You put your h..."


Now we load the metadata (each song's title, track number, and producer) and merge it with text_df into a new DataFrame.

In [122]:
# Load metadata
metadata_df = pd.read_csv('C:/Users/14359/Desktop/metadata.csv')

# Remove the .txt extension from each file name
# ('\.' matches a literal dot and '$' anchors the match to the end of the name)
text_df['Filename'] = text_df['Filename'].str.replace(r'\.txt$', '', regex=True)

# Rename the metadata column Title to Filename so both DataFrames share a key
metadata_df.rename(columns={"Title": "Filename"}, inplace=True)

# Merge metadata and lyrics into a new DataFrame
# (the default inner merge keeps only songs present in both)
songs_df2 = metadata_df.merge(text_df, on='Filename')
songs_df2

,Filename,Track,Producer,Document,Text
0,Hello,1,Greg Kurstin,"Hello, it's me\nI was wondering if, after all ...","Hello, it's me I was wondering if, after all t..."
1,Send My Love (To Your New Lover),2,Shellback & Max Martin,"This was all you, none of it me\nYou put your ...","This was all you, none of it me You put your h..."
2,I Miss You,3,Paul Epworth,I want every single piece of you\nI want your ...,I want every single piece of you I want your h...
3,When We Were Young,4,Ariel Rechtshaid,Everybody loves the things you do\nFrom the wa...,Everybody loves the things you do From the way...
4,Remedy,5,Ryan Tedder,I remember all of the things that I thought I ...,I remember all of the things that I thought I ...
5,Water Under the Bridge,6,Greg Kurstin,If you're not the one for me\nThen how come I ...,If you're not the one for me Then how come I c...
6,River Lea,7,Danger Mouse,Everybody tells me it's 'bout time that I move...,Everybody tells me it's 'bout time that I move...
7,Love in the Dark,8,Samuel Dixon,Take your eyes off of me so I can leave\nI'm f...,Take your eyes off of me so I can leave I'm fa...
8,Million Years Ago,9,Greg Kurstin,"\nI only wanted to have fun\nLearning to fly, ...","I only wanted to have fun Learning to fly, lea..."
9,All I Ask,10,The Smeezingtons,I will leave my heart at the door\nI won't say...,I will leave my heart at the door I won't say ...
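Because the merge is an inner join, a song whose file name differs even slightly from its metadata title is dropped without any warning. A sketch of a quick check, using pandas' outer merge with indicator=True, which adds a _merge column marking each key as 'left_only', 'right_only', or 'both'. report_unmatched and the toy DataFrames are hypothetical.

```python
import pandas as pd

def report_unmatched(metadata_df, text_df, key="Filename"):
    """Hypothetical helper: keys that appear in only one of the two DataFrames."""
    merged = metadata_df.merge(text_df, on=key, how="outer", indicator=True)
    # _merge is 'left_only', 'right_only', or 'both'
    return merged.loc[merged["_merge"] != "both", [key, "_merge"]]

# Hypothetical example: one title is misspelled in the lyrics file name
meta = pd.DataFrame({"Filename": ["Hello", "Remedy"], "Track": [1, 5]})
lyrics = pd.DataFrame({"Filename": ["Hello", "Remdy"], "Text": ["...", "..."]})
unmatched = report_unmatched(meta, lyrics)
print(unmatched)
```

An empty result for report_unmatched(metadata_df, text_df) means every song survived the merge. merge() also accepts validate='one_to_one', which raises an error if a title appears more than once on either side.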


# Text Enrichment with spaCy

With the lyrics cleaned, we can add columns for tokens, lemmas, and parts of speech. First, we load spaCy's small English pipeline and check which components it runs.

In [123]:
# Load nlp pipeline
nlp = spacy.load('en_core_web_sm')

# Check what functions it performs
print(nlp.pipe_names)

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']


In [124]:
# Define a function that runs the nlp pipeline on any given input text
def process_text(text):
    return nlp(text)

Now we apply the function to the Text column (the whitespace-normalized lyrics) and store each resulting spaCy Doc in a new column called Doc.

In [125]:
# Apply the function to the "Text" column, so that the nlp pipeline is called on each song
songs_df2['Doc'] = songs_df2['Text'].apply(process_text)
songs_df2

,Filename,Track,Producer,Document,Text,Doc
0,Hello,1,Greg Kurstin,"Hello, it's me\nI was wondering if, after all ...","Hello, it's me I was wondering if, after all t...","(Hello, ,, it, 's, me, I, was, wondering, if, ..."
1,Send My Love (To Your New Lover),2,Shellback & Max Martin,"This was all you, none of it me\nYou put your ...","This was all you, none of it me You put your h...","(This, was, all, you, ,, none, of, it, me, You..."
2,I Miss You,3,Paul Epworth,I want every single piece of you\nI want your ...,I want every single piece of you I want your h...,"(I, want, every, single, piece, of, you, I, wa..."
3,When We Were Young,4,Ariel Rechtshaid,Everybody loves the things you do\nFrom the wa...,Everybody loves the things you do From the way...,"(Everybody, loves, the, things, you, do, From,..."
4,Remedy,5,Ryan Tedder,I remember all of the things that I thought I ...,I remember all of the things that I thought I ...,"(I, remember, all, of, the, things, that, I, t..."
5,Water Under the Bridge,6,Greg Kurstin,If you're not the one for me\nThen how come I ...,If you're not the one for me Then how come I c...,"(If, you, 're, not, the, one, for, me, Then, h..."
6,River Lea,7,Danger Mouse,Everybody tells me it's 'bout time that I move...,Everybody tells me it's 'bout time that I move...,"(Everybody, tells, me, it, 's, 'bout, time, th..."
7,Love in the Dark,8,Samuel Dixon,Take your eyes off of me so I can leave\nI'm f...,Take your eyes off of me so I can leave I'm fa...,"(Take, your, eyes, off, of, me, so, I, can, le..."
8,Million Years Ago,9,Greg Kurstin,"\nI only wanted to have fun\nLearning to fly, ...","I only wanted to have fun Learning to fly, lea...","(I, only, wanted, to, have, fun, Learning, to,..."
9,All I Ask,10,The Smeezingtons,I will leave my heart at the door\nI won't say...,I will leave my heart at the door I won't say ...,"(I, will, leave, my, heart, at, the, door, I, ..."
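apply() calls the pipeline once per song, which is fine for 14 lyrics files. For larger collections, spaCy's nlp.pipe() processes texts as a stream in batches and yields Docs in the input order. A sketch; docs_from_texts is a hypothetical helper, and the demo uses spacy.blank("en") (a tokenizer-only pipeline) only so that it runs without downloading a model.

```python
import spacy

def docs_from_texts(texts, nlp, batch_size=50):
    """Hypothetical helper: run the pipeline over many texts with nlp.pipe."""
    return list(nlp.pipe(texts, batch_size=batch_size))

# Demo with a blank English pipeline (tokenizer only); in the notebook,
# pass the nlp object loaded from en_core_web_sm instead
demo_nlp = spacy.blank("en")
demo_docs = docs_from_texts(["Hello, it's me", "I miss you"], demo_nlp)
print([[t.text for t in d] for d in demo_docs])
```

In the notebook this would read songs_df2['Doc'] = docs_from_texts(songs_df2['Text'], nlp).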


Based on Doc, we can get tokens, lemmas, and parts of speech by using different functions.

#### Tokenization

In [126]:
# Define a function to retrieve tokens from a Doc object
def get_token(doc):
    return [(token.text) for token in doc]
    
# Run the token retrieval function on the doc objects in the dataframe
songs_df2['Tokens'] = songs_df2['Doc'].apply(get_token)
songs_df2

,Filename,Track,Producer,Document,Text,Doc,Tokens
0,Hello,1,Greg Kurstin,"Hello, it's me\nI was wondering if, after all ...","Hello, it's me I was wondering if, after all t...","(Hello, ,, it, 's, me, I, was, wondering, if, ...","[Hello, ,, it, 's, me, I, was, wondering, if, ..."
1,Send My Love (To Your New Lover),2,Shellback & Max Martin,"This was all you, none of it me\nYou put your ...","This was all you, none of it me You put your h...","(This, was, all, you, ,, none, of, it, me, You...","[This, was, all, you, ,, none, of, it, me, You..."
2,I Miss You,3,Paul Epworth,I want every single piece of you\nI want your ...,I want every single piece of you I want your h...,"(I, want, every, single, piece, of, you, I, wa...","[I, want, every, single, piece, of, you, I, wa..."
3,When We Were Young,4,Ariel Rechtshaid,Everybody loves the things you do\nFrom the wa...,Everybody loves the things you do From the way...,"(Everybody, loves, the, things, you, do, From,...","[Everybody, loves, the, things, you, do, From,..."
4,Remedy,5,Ryan Tedder,I remember all of the things that I thought I ...,I remember all of the things that I thought I ...,"(I, remember, all, of, the, things, that, I, t...","[I, remember, all, of, the, things, that, I, t..."
5,Water Under the Bridge,6,Greg Kurstin,If you're not the one for me\nThen how come I ...,If you're not the one for me Then how come I c...,"(If, you, 're, not, the, one, for, me, Then, h...","[If, you, 're, not, the, one, for, me, Then, h..."
6,River Lea,7,Danger Mouse,Everybody tells me it's 'bout time that I move...,Everybody tells me it's 'bout time that I move...,"(Everybody, tells, me, it, 's, 'bout, time, th...","[Everybody, tells, me, it, 's, 'bout, time, th..."
7,Love in the Dark,8,Samuel Dixon,Take your eyes off of me so I can leave\nI'm f...,Take your eyes off of me so I can leave I'm fa...,"(Take, your, eyes, off, of, me, so, I, can, le...","[Take, your, eyes, off, of, me, so, I, can, le..."
8,Million Years Ago,9,Greg Kurstin,"\nI only wanted to have fun\nLearning to fly, ...","I only wanted to have fun Learning to fly, lea...","(I, only, wanted, to, have, fun, Learning, to,...","[I, only, wanted, to, have, fun, Learning, to,..."
9,All I Ask,10,The Smeezingtons,I will leave my heart at the door\nI won't say...,I will leave my heart at the door I won't say ...,"(I, will, leave, my, heart, at, the, door, I, ...","[I, will, leave, my, heart, at, the, door, I, ..."
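The Tokens column keeps punctuation such as "," as separate tokens. When counting words, a sketch like the following skips punctuation and whitespace tokens using spaCy's token.is_punct and token.is_space flags and lower-cases with token.lower_. word_counts is a hypothetical helper; the demo again uses a blank pipeline so that no model is needed.

```python
from collections import Counter

import spacy

def word_counts(doc):
    """Hypothetical helper: count lower-cased tokens, skipping punctuation/whitespace."""
    return Counter(t.lower_ for t in doc if not (t.is_punct or t.is_space))

demo_nlp = spacy.blank("en")
counts = word_counts(demo_nlp("Hello, hello, it's me"))
print(counts.most_common(2))
```

Applied to the DataFrame, songs_df2['Doc'].apply(word_counts) gives one Counter per song.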


#### Lemmatization


In [127]:
# Define a function to retrieve lemmas from a doc object
def get_lemma(doc):
    return [(token.lemma_) for token in doc]

# Run the lemma retrieval function on the doc objects in the dataframe
songs_df2['Lemmas'] = songs_df2['Doc'].apply(get_lemma)
songs_df2

,Filename,Track,Producer,Document,Text,Doc,Tokens,Lemmas
0,Hello,1,Greg Kurstin,"Hello, it's me\nI was wondering if, after all ...","Hello, it's me I was wondering if, after all t...","(Hello, ,, it, 's, me, I, was, wondering, if, ...","[Hello, ,, it, 's, me, I, was, wondering, if, ...","[hello, ,, it, be, I, I, be, wonder, if, ,, af..."
1,Send My Love (To Your New Lover),2,Shellback & Max Martin,"This was all you, none of it me\nYou put your ...","This was all you, none of it me You put your h...","(This, was, all, you, ,, none, of, it, me, You...","[This, was, all, you, ,, none, of, it, me, You...","[this, be, all, you, ,, none, of, it, I, you, ..."
2,I Miss You,3,Paul Epworth,I want every single piece of you\nI want your ...,I want every single piece of you I want your h...,"(I, want, every, single, piece, of, you, I, wa...","[I, want, every, single, piece, of, you, I, wa...","[I, want, every, single, piece, of, you, I, wa..."
3,When We Were Young,4,Ariel Rechtshaid,Everybody loves the things you do\nFrom the wa...,Everybody loves the things you do From the way...,"(Everybody, loves, the, things, you, do, From,...","[Everybody, loves, the, things, you, do, From,...","[everybody, love, the, thing, you, do, from, t..."
4,Remedy,5,Ryan Tedder,I remember all of the things that I thought I ...,I remember all of the things that I thought I ...,"(I, remember, all, of, the, things, that, I, t...","[I, remember, all, of, the, things, that, I, t...","[I, remember, all, of, the, thing, that, I, th..."
5,Water Under the Bridge,6,Greg Kurstin,If you're not the one for me\nThen how come I ...,If you're not the one for me Then how come I c...,"(If, you, 're, not, the, one, for, me, Then, h...","[If, you, 're, not, the, one, for, me, Then, h...","[if, you, be, not, the, one, for, I, then, how..."
6,River Lea,7,Danger Mouse,Everybody tells me it's 'bout time that I move...,Everybody tells me it's 'bout time that I move...,"(Everybody, tells, me, it, 's, 'bout, time, th...","[Everybody, tells, me, it, 's, 'bout, time, th...","[everybody, tell, I, it, be, 'bout, time, that..."
7,Love in the Dark,8,Samuel Dixon,Take your eyes off of me so I can leave\nI'm f...,Take your eyes off of me so I can leave I'm fa...,"(Take, your, eyes, off, of, me, so, I, can, le...","[Take, your, eyes, off, of, me, so, I, can, le...","[take, your, eye, off, of, I, so, I, can, leav..."
8,Million Years Ago,9,Greg Kurstin,"\nI only wanted to have fun\nLearning to fly, ...","I only wanted to have fun Learning to fly, lea...","(I, only, wanted, to, have, fun, Learning, to,...","[I, only, wanted, to, have, fun, Learning, to,...","[I, only, want, to, have, fun, Learning, to, f..."
9,All I Ask,10,The Smeezingtons,I will leave my heart at the door\nI won't say...,I will leave my heart at the door I won't say ...,"(I, will, leave, my, heart, at, the, door, I, ...","[I, will, leave, my, heart, at, the, door, I, ...","[I, will, leave, my, heart, at, the, door, I, ..."
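Lemmas make frequency counts more meaningful, because inflected forms collapse together: the output above shows "was" becoming "be", "loves" becoming "love", and even "me" becoming "I". A sketch of counting lemmas across all songs with pandas' Series.explode(), which turns each list element into its own row. lemma_frequencies and the toy DataFrame are hypothetical.

```python
import pandas as pd

def lemma_frequencies(df, column="Lemmas"):
    """Hypothetical helper: count every lemma across a column of lemma lists."""
    return df[column].explode().value_counts()

# Toy data shaped like the Lemmas column above
demo = pd.DataFrame({"Lemmas": [["hello", ",", "it", "be", "I", "I"], ["miss", "you"]]})
freq = lemma_frequencies(demo)
print(freq.head(3))
```

On the real data this would be lemma_frequencies(songs_df2); punctuation lemmas such as "," are counted too unless filtered out first.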


#### Part of Speech Tagging

In [128]:
# Define a function to retrieve parts of speech from a doc object
def get_pos(doc):
    #Return the coarse- and fine-grained part of speech text for each token in the doc
    return [(token.pos_, token.tag_) for token in doc]

songs_df2['POS'] = songs_df2['Doc'].apply(get_pos)
songs_df2

,Filename,Track,Producer,Document,Text,Doc,Tokens,Lemmas,POS
0,Hello,1,Greg Kurstin,"Hello, it's me\nI was wondering if, after all ...","Hello, it's me I was wondering if, after all t...","(Hello, ,, it, 's, me, I, was, wondering, if, ...","[Hello, ,, it, 's, me, I, was, wondering, if, ...","[hello, ,, it, be, I, I, be, wonder, if, ,, af...","[(INTJ, UH), (PUNCT, ,), (PRON, PRP), (AUX, VB..."
1,Send My Love (To Your New Lover),2,Shellback & Max Martin,"This was all you, none of it me\nYou put your ...","This was all you, none of it me You put your h...","(This, was, all, you, ,, none, of, it, me, You...","[This, was, all, you, ,, none, of, it, me, You...","[this, be, all, you, ,, none, of, it, I, you, ...","[(PRON, DT), (AUX, VBD), (PRON, DT), (PRON, PR..."
2,I Miss You,3,Paul Epworth,I want every single piece of you\nI want your ...,I want every single piece of you I want your h...,"(I, want, every, single, piece, of, you, I, wa...","[I, want, every, single, piece, of, you, I, wa...","[I, want, every, single, piece, of, you, I, wa...","[(PRON, PRP), (VERB, VBP), (DET, DT), (ADJ, JJ..."
3,When We Were Young,4,Ariel Rechtshaid,Everybody loves the things you do\nFrom the wa...,Everybody loves the things you do From the way...,"(Everybody, loves, the, things, you, do, From,...","[Everybody, loves, the, things, you, do, From,...","[everybody, love, the, thing, you, do, from, t...","[(PRON, NN), (VERB, VBZ), (DET, DT), (NOUN, NN..."
4,Remedy,5,Ryan Tedder,I remember all of the things that I thought I ...,I remember all of the things that I thought I ...,"(I, remember, all, of, the, things, that, I, t...","[I, remember, all, of, the, things, that, I, t...","[I, remember, all, of, the, thing, that, I, th...","[(PRON, PRP), (VERB, VBP), (PRON, DT), (ADP, I..."
5,Water Under the Bridge,6,Greg Kurstin,If you're not the one for me\nThen how come I ...,If you're not the one for me Then how come I c...,"(If, you, 're, not, the, one, for, me, Then, h...","[If, you, 're, not, the, one, for, me, Then, h...","[if, you, be, not, the, one, for, I, then, how...","[(SCONJ, IN), (PRON, PRP), (AUX, VBP), (PART, ..."
6,River Lea,7,Danger Mouse,Everybody tells me it's 'bout time that I move...,Everybody tells me it's 'bout time that I move...,"(Everybody, tells, me, it, 's, 'bout, time, th...","[Everybody, tells, me, it, 's, 'bout, time, th...","[everybody, tell, I, it, be, 'bout, time, that...","[(PRON, NN), (VERB, VBZ), (PRON, PRP), (PRON, ..."
7,Love in the Dark,8,Samuel Dixon,Take your eyes off of me so I can leave\nI'm f...,Take your eyes off of me so I can leave I'm fa...,"(Take, your, eyes, off, of, me, so, I, can, le...","[Take, your, eyes, off, of, me, so, I, can, le...","[take, your, eye, off, of, I, so, I, can, leav...","[(VERB, VB), (PRON, PRP$), (NOUN, NNS), (ADP, ..."
8,Million Years Ago,9,Greg Kurstin,"\nI only wanted to have fun\nLearning to fly, ...","I only wanted to have fun Learning to fly, lea...","(I, only, wanted, to, have, fun, Learning, to,...","[I, only, wanted, to, have, fun, Learning, to,...","[I, only, want, to, have, fun, Learning, to, f...","[(PRON, PRP), (ADV, RB), (VERB, VBD), (PART, T..."
9,All I Ask,10,The Smeezingtons,I will leave my heart at the door\nI won't say...,I will leave my heart at the door I won't say ...,"(I, will, leave, my, heart, at, the, door, I, ...","[I, will, leave, my, heart, at, the, door, I, ...","[I, will, leave, my, heart, at, the, door, I, ...","[(PRON, PRP), (AUX, MD), (VERB, VB), (PRON, PR..."
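Each entry in the POS column is a list of (coarse, fine-grained) tag pairs. A sketch for summarizing one song by its coarse tags with collections.Counter; coarse_pos_counts is a hypothetical helper, and the sample pairs are illustrative, modeled on the first row above.

```python
from collections import Counter

def coarse_pos_counts(pos_pairs):
    """Hypothetical helper: count coarse tags in a list of (pos_, tag_) tuples."""
    return Counter(pos for pos, _tag in pos_pairs)

# Illustrative pairs for "Hello, it's me"
sample = [("INTJ", "UH"), ("PUNCT", ","), ("PRON", "PRP"), ("AUX", "VBZ"), ("PRON", "PRP")]
counts = coarse_pos_counts(sample)
print(counts)
```

pd.DataFrame(songs_df2['POS'].apply(coarse_pos_counts).tolist()).fillna(0) would then give one row per song and one column per coarse tag.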


If we are unsure what a fine-grained tag means, spacy.explain() returns a short description:

In [129]:
spacy.explain('UH')

'interjection'
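To build a small glossary of every tag that occurs, rather than looking tags up one at a time, spacy.explain() can be mapped over the unique tags. explain_tags is a hypothetical helper; spacy.explain() needs only spaCy itself, not a downloaded model.

```python
import spacy

def explain_tags(tags):
    """Hypothetical helper: map each unique tag to spaCy's description of it."""
    return {tag: spacy.explain(tag) for tag in sorted(set(tags))}

print(explain_tags(["UH", "UH"]))
```

For the whole album: explain_tags(tag for pairs in songs_df2['POS'] for _pos, tag in pairs).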

#### Named Entity Recognition

In [130]:
# Define function to extract named entities from doc objects
def extract_named_entities(doc):
    return [ent.label_ for ent in doc.ents]

# Apply function to Doc column and store resulting named entities in new column
songs_df2['Named Entities'] = songs_df2['Doc'].apply(extract_named_entities)
songs_df2['Named Entities']

0     [GPE, QUANTITY, CARDINAL, CARDINAL, CARDINAL, ...
1                                                    []
2     [PRODUCT, PRODUCT, PERSON, PRODUCT, PRODUCT, P...
3                                           [TIME, GPE]
4                                         [WORK_OF_ART]
5                            [WORK_OF_ART, WORK_OF_ART]
6             [LOC, LOC, DATE, LOC, LOC, LOC, LOC, LOC]
7                                            [ORG, ORG]
8                              [DATE, DATE, DATE, DATE]
9                                                    []
10                                      [LOC, LOC, LOC]
11                                                [ORG]
12                                   [TIME, TIME, TIME]
13                                            [ORDINAL]
Name: Named Entities, dtype: object
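Labels alone make it hard to judge whether the recognizer is right; lyrics are not the news-style text the model was trained on, so tags like PRODUCT for "I Miss You" are worth checking. A sketch that keeps each entity's text next to its label. extract_entities_with_text is a hypothetical helper; because a blank pipeline has no NER component, the demo sets one entity by hand with spacy.tokens.Span.

```python
import spacy
from spacy.tokens import Span

def extract_entities_with_text(doc):
    """Hypothetical helper: (entity text, entity label) pairs for a Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]

# Demo: a blank pipeline has no NER, so the entity is assigned manually
demo_nlp = spacy.blank("en")
demo_doc = demo_nlp("Hello from London")
demo_doc.ents = [Span(demo_doc, 2, 3, label="GPE")]
print(extract_entities_with_text(demo_doc))
```

In the notebook, songs_df2['Doc'].apply(extract_entities_with_text) would produce a column of such pairs.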

Let's look at the named entities in "I Miss You", the third song (row 2) in songs_df2.

In [131]:
# Select the Doc for "I Miss You" (row 2)
doc = songs_df2['Doc'][2]

# Visualize named entity tagging in a single song
displacy.render(doc, style='ent', jupyter=True)

# Exporting the Enriched Dataset

Finally, we merge songs_df1 (from web scraping) with songs_df2 on the Filename column and save the result as a CSV file.

In [133]:
final_songs_df=songs_df1.merge(songs_df2, on="Filename")
final_songs_df.to_csv('Adele_Album 25_Songs.csv')
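Two details of to_csv() matter here. By default it writes the DataFrame index as an unnamed first column, which pandas reads back as "Unnamed: 0". Object columns are written as their string form, so the Doc column only repeats the lyrics text, and the Tokens, Lemmas, and POS lists become strings. A hedged sketch that drops Doc and skips the index, assuming those are not needed in the file; to_csv_without_objects is a hypothetical helper.

```python
import io

import pandas as pd

def to_csv_without_objects(df, path_or_buf, drop=("Doc",)):
    """Hypothetical helper: write df to CSV without the index or the listed columns."""
    df.drop(columns=[c for c in drop if c in df.columns]).to_csv(path_or_buf, index=False)

# Toy DataFrame with a placeholder object column standing in for Doc
demo = pd.DataFrame({"Filename": ["Hello"], "Track": [1], "Doc": [object()]})
buffer = io.StringIO()
to_csv_without_objects(demo, buffer)
print(buffer.getvalue())
```

In the notebook this would be to_csv_without_objects(final_songs_df, 'Adele_Album 25_Songs.csv').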