# GNOD week 6

## LAB | Web Scraping Single Page (GNOD part 1)

- Check the case_study_gnod.md file.
- Make sure you've understood the big picture of your project:
    - the goal of the company (Gnod),
    - their current product (Gnoosic),
    - their strategy, and
    - how your project fits into this context.
- Re-read the business case and the e-mail from the CTO.

**Instructions - Scraping popular songs** <br>
Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will also enjoy a recommendation of another currently popular song.

You have to find data on the internet about currently popular songs. Popvortex maintains a weekly Top 100 of "hot" songs here: http://www.popvortex.com/music/charts/top-100-songs.php.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [1]:
# import libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

import random
from time import sleep

In [2]:
# find url and store it in variable
url = "https://www.popvortex.com/music/charts/top-100-songs.php"

In [3]:
# download html with GET req and check status code
response = requests.get(url)
response.status_code


200

In [4]:
# create the soup
soup = BeautifulSoup(response.content, "html.parser")
# soup

In [5]:
# check that everything is okay
# print(soup.prettify())

In [6]:
# retrieve desired info
# for song in soup.select("body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper > div.feed-item.music-chart.flex-row"):
#     print(song.cite.get_text(), song.em.get_text())

In [7]:
# init empty lists
songs = []
artists = []

# save copied selector into a var
path = "body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper > div.feed-item.music-chart.flex-row"

# grab necessary items and append it to respective list
for i in soup.select(path):
    songs.append(i.cite.get_text())
    artists.append(i.em.get_text())
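
The two parallel lists above can drift out of step if one chart entry lacks a `cite` or `em` tag. A minimal sketch of an alternative, assuming the same tag layout as the selector above (title in `cite`, artist in `em`), collects one record per entry and skips incomplete ones. The HTML snippet and the `parse_chart` helper are made up for illustration and are not copied from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that mimics the chart markup (not taken from the live page).
SAMPLE_HTML = """
<div class="chart-wrapper">
  <div class="feed-item"><cite class="title">Song A</cite><em class="artist">Artist A</em></div>
  <div class="feed-item"><cite class="title">Song B</cite></div>
  <div class="feed-item"><cite class="title">Song C</cite><em class="artist">Artist C</em></div>
</div>
"""

def parse_chart(html, selector="div.chart-wrapper > div.feed-item"):
    """Return one {'song', 'artist'} dict per entry that has both tags."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select(selector):
        title, artist = item.find("cite"), item.find("em")
        if title is None or artist is None:
            continue  # skip incomplete entries so the columns stay aligned
        rows.append({"song": title.get_text(strip=True),
                     "artist": artist.get_text(strip=True)})
    return rows

rows = parse_chart(SAMPLE_HTML)
print(rows)
```

A list of dicts like this can be passed straight to `pd.DataFrame(rows)`.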

In [40]:
# create the df with organised info
top_100 = pd.DataFrame({"song": songs,
                        "artist": artists})

top_100

Unnamed: 0,song,artist
0,Margaritaville,Jimmy Buffett
1,Come Monday,Jimmy Buffett
2,Rich Men North of Richmond,Oliver Anthony Music
3,Cheeseburger In Paradise,Jimmy Buffett
4,"Changes In Latitudes, Changes In Attitudes",Jimmy Buffett
...,...,...
95,Can't Get Enough of You Baby,Smash Mouth
96,Spirit In the Sky,Norman Greenbaum
97,bad idea right?,Olivia Rodrigo
98,Whiteboyz,Tom MacDonald & Adam Calhoun


## LAB | Web Scraping Multiple Pages

**Expand the project** <br>
If you're done, you can try to expand the project on your own. 
- Chosen option: expand by using Eurovision songs in the last 20 years.

In [9]:
# find url and store it in var
url = "https://en.wikipedia.org/wiki/List_of_Eurovision_Song_Contest_winners"

In [10]:
# download html with GET request
response = requests.get(url)
response.status_code

200

In [11]:
## make good soup
soup = BeautifulSoup(response.content, "html.parser")

In [12]:
euwinners = soup.select("table")[0]

In [13]:
# find all `a` tags inside the second `td` of each row (nth-of-type is 1-indexed) and keep their links
eusongs = []

for tag in euwinners.select("td:nth-of-type(2) a"):
    eusongs.append(tag["href"])

In [14]:
# # send request with full song link
# url = "https://en.wikipedia.org" + eusongs[0]
# response = requests.get(url)
# print(response.status_code)

# # make soup
# soup = BeautifulSoup(response.content, "html.parser")
# soup.select("table.infobox")


200


[<table class="infobox"><tbody><tr><th class="infobox-above" colspan="2" style="background: #BFDFFF;"><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Switzerland" title="Switzerland"><img alt="Switzerland" class="mw-file-element" data-file-height="512" data-file-width="512" decoding="async" height="16" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/08/Flag_of_Switzerland_%28Pantone%29.svg/16px-Flag_of_Switzerland_%28Pantone%29.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/08/Flag_of_Switzerland_%28Pantone%29.svg/24px-Flag_of_Switzerland_%28Pantone%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/08/Flag_of_Switzerland_%28Pantone%29.svg/32px-Flag_of_Switzerland_%28Pantone%29.svg.png 2x" width="16"/></a></span></span> "Refrain"</th></tr><tr><td class="infobox-image" colspan="2"><span typeof="mw:File"><a class="mw-file-description" href="/wiki/File:Lys_Assia_-_Refrain.jpg"><img class="mw-file-element" data-fi

In [65]:
# # 2. find url and store it in a variable

# eusongs_soups = []

# # if it stops due to a timeout error, use the index of the leftovers to fill it in
# for song in eusongs:
#     # send request
#     url = "https://en.wikipedia.org" + song
#     response = requests.get(url)
#     print(song, response.status_code)

#     # parse & store html
#     soup = BeautifulSoup(response.content, "html.parser")
#     eusongs_soups.append(soup.select("table.infobox"))

#     # respectful nap:
#     wait_time = random.randint(1,4)
#     print("I will sleep for " + str(wait_time) + " second/s.")
#     sleep(wait_time)
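
The comment in the loop above mentions restarting from the leftovers after a timeout. One way to sketch that, under the assumption that each page can be keyed by its path, is to remember what has already been fetched and to record failures instead of stopping. `fetch_all` and `flaky_fetch` are hypothetical names, and the demo uses a fake fetcher so nothing touches the network.

```python
import random
from time import sleep

def fetch_all(paths, fetch, pause=(1, 4), done=None):
    """Fetch every path once, remembering results so a rerun can resume.

    `fetch` is any callable that takes a path and returns the page content;
    `done` is the dict returned by a previous, interrupted run.
    """
    done = {} if done is None else done
    failed = []
    for path in paths:
        if path in done:
            continue  # already fetched in an earlier run
        try:
            done[path] = fetch(path)
        except Exception as err:  # broad on purpose: note the failure, keep going
            failed.append((path, repr(err)))
            continue
        sleep(random.uniform(*pause))  # respectful nap between requests
    return done, failed

# Demo with a fake fetcher that times out on its first attempt at one page.
calls = {"n": 0}

def flaky_fetch(path):
    calls["n"] += 1
    if path == "/wiki/B" and calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "<html>" + path + "</html>"

pages, failed = fetch_all(["/wiki/A", "/wiki/B"], flaky_fetch, pause=(0, 0))
# Second pass: only the leftover path is requested again.
pages, failed_again = fetch_all(["/wiki/A", "/wiki/B"], flaky_fetch, pause=(0, 0), done=pages)
```

In the notebook, `fetch` could wrap `requests.get` plus the `BeautifulSoup` parsing step.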

In [16]:
### get song, country and artist from infocards ready for extraction
## standard approach:   
# eusongs_soups[5][0].find("th", string = "Country").parent.select("a")[0].get_text()
# eusongs_soups[5][0].find("th", string = "Artist(s)").parent.select("a")[0].get_text()
# eusongs_soups[5][0].find("th", string = "Artist(s)").parent.select("div")[0].get_text()

# ## if main card is extended and some details are not in standard order - usually in hits:
# eusongs_soups[57][0].select("th.infobox-above")[0].get_text()                           # single title
# eusongs_soups[57][0].select("th.infobox-header")[0].select("a")[1].get_text()           # artist name

# ## if details are in 2nd infocard:
# eusongs_soups[57][1].find("th", string = "Country").parent.select("a")[0].get_text()    # country in 2nd card

# ## if they have pseudonyms or groups and individuals are both shown:
# eusongs_soups[0][0].find("th", string = "As").parent.select("a")[0].get_text()          # group name after members list


In [41]:
# extract song name, country and artists
eusongs = []
euartists = []
eucountries = []

# a missing tag raises AttributeError (find() returned None) or IndexError (empty select() result),
# so we catch only those two instead of using a bare except
for song in eusongs_soups:
    try:
        eusongs.append(song[0].select("th.infobox-above")[0].get_text())
    except (AttributeError, IndexError):
        eusongs.append("NA")
    try:
        euartists.append(song[0].find("th", string = "Artist(s)").parent.select("a")[0].get_text())
    except (AttributeError, IndexError):
        try:
            euartists.append(song[0].select("th.infobox-header")[0].select("a")[1].get_text())
        except (AttributeError, IndexError):
            try:
                euartists.append(song[0].find("th", string = "As").parent.select("a")[0].get_text())
            except (AttributeError, IndexError):
                try:
                    euartists.append(song[0].find("th", string = "Artist(s)").parent.select("div")[0].get_text())
                except (AttributeError, IndexError):
                    euartists.append("NA")
    try:
        eucountries.append(song[0].find("th", string = "Country").parent.select("a")[0].get_text())
    except (AttributeError, IndexError):
        try:
            eucountries.append(song[1].find("th", string = "Country").parent.select("a")[0].get_text())
        except (AttributeError, IndexError):
            eucountries.append("NA")

euwinners_df = pd.DataFrame({"song":eusongs,
                             "artist":euartists,
                             "country":eucountries})


In [None]:
# drop the leftover row for the COVID-cancelled contest and the country column, so the columns match top_100 for the concat
euwinners_df = euwinners_df.drop(index = 67, columns = 'country').reset_index(drop=True)
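
The nested `try`/`except` chain in the extraction cell above tries one page layout after another. A minimal sketch of the same idea as a flat helper is shown here; `first_match` is a hypothetical name, and a plain dict stands in for the parsed infobox so the example runs without any scraping.

```python
def first_match(extractors, default="NA"):
    """Return the first extractor result that neither fails nor comes back empty."""
    for extract in extractors:
        try:
            value = extract()
        except (AttributeError, IndexError, KeyError, TypeError):
            continue  # this layout did not match; try the next one
        if value:
            return value
    return default

# Toy stand-in for a parsed infobox: a dict instead of BeautifulSoup tags.
card = {"Artist(s)": [], "As": ["ABBA"]}

artist = first_match([
    lambda: card["Artist(s)"][0],   # standard layout (IndexError here)
    lambda: card["Performer"][0],   # hypothetical alternative key (KeyError here)
    lambda: card["As"][0],          # group name listed under "As"
])
country = first_match([lambda: card["Country"][0]])
print(artist, country)
```

Each lambda would hold one of the `find(...)`/`select(...)` expressions, ordered from most to least common layout.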

In [None]:
# remove quotation marks and surrounding spaces from the Eurovision titles for consistency
# (language-specific special characters are left untouched here)
euwinners_df['song'] = euwinners_df['song'].str.replace('"', '', regex=False).str.strip()
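
Accented characters in titles and artist names can also get in the way of matching user input. A minimal sketch of one way to build a search key, using only the standard library, folds accents and case; `normalise_title` is a hypothetical helper and is not part of the pipeline above.

```python
import unicodedata

def normalise_title(title):
    """Strip quotes and whitespace, fold accents, and lowercase for matching."""
    cleaned = title.replace('"', "").strip()
    # NFKD splits accented letters into a base letter plus combining marks ...
    decomposed = unicodedata.normalize("NFKD", cleaned)
    # ... so dropping the combining marks leaves the plain letters.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()

print(normalise_title('"Après toi"'))
print(normalise_title(" Måneskin "))
```

Characters that have no decomposed form (for example `ø`) pass through unchanged, so this is a partial fix rather than full transliteration. Keeping the key in a separate column leaves the original spelling available for display.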

In [47]:
# concat the two dfs
hot_songs = pd.concat([top_100, euwinners_df]).reset_index(drop=True)
hot_songs

Unnamed: 0,song,artist
0,Margaritaville,Jimmy Buffett
1,Come Monday,Jimmy Buffett
2,Rich Men North of Richmond,Oliver Anthony Music
3,Cheeseburger In Paradise,Jimmy Buffett
4,"Changes In Latitudes, Changes In Attitudes",Jimmy Buffett
...,...,...
165,Toy,Netta
166,Arcade,Duncan Laurence
167,Zitti e buoni,Måneskin
168,Stefania,Kalush Orchestra


In [66]:
# save the data to csv so it stays the same no matter which day we rerun the notebook
hot_songs.to_csv('hot_songs.csv', index=False)  # index=False avoids an extra "Unnamed: 0" column on reload

## Song recommendation

In [1]:
# import libraries and data from here on to not run the scraping again
from bs4 import BeautifulSoup
import requests
import pandas as pd

import random

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from random import randint
from time import sleep

from pandas import json_normalize
pd.set_option("display.max_columns", None)  # None = no limit on the number of columns displayed

hot_songs = pd.read_csv('hot_songs.csv')
playlist = pd.read_pickle('playlist_clustered.pkl')

### 1st iteration

In [88]:
## user input to search song
search = input("Want a recommendation?").capitalize()
print(search)

Tattoo


In [89]:
# receive a song and recommend another random one from the df
# (compare in lowercase: capitalize() would turn "Come Monday" into "Come monday")
if search.lower() in hot_songs.song.str.lower().values:
    random_idx = random.randint(0, len(hot_songs)-1)   # randomise each time we recommend
    print("-Paris Hilton voice- That's hot! Here's another recommendation for you:", hot_songs['song'].iloc[random_idx], 
        "by", 
        hot_songs['artist'].iloc[random_idx])
else:
    print("Get better taste, babes.")
    # get a random song from that cluster

-Paris Hilton voice- That's hot! Here's another recommendation for you: Everyway That I Can by Sertab Erener


In [None]:
def recommender(query, spotsearch, df1, df2):
    if query.lower() in df1.song.str.lower().values:   # case-insensitive match
        random_idx = random.randint(0, len(df1)-1)   # randomise each time we recommend
        print("-Paris Hilton voice- That's hot! Here's another recommendation for you:", df1['song'].iloc[random_idx], 
        "by", 
        df1['artist'].iloc[random_idx])
    else:
        spotsearch(query, df2)
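
Two details of this kind of lookup are worth a sketch. `str.capitalize()` lowercases everything after the first character, so a case-insensitive comparison is more forgiving for multi-word titles, and a purely random pick can return the very song that was typed in. The helper below is a hypothetical, list-based illustration (`recommend_hot` and the sample rows are assumptions, not the notebook's function).

```python
import random

def recommend_hot(query, songs, rng=random):
    """Return a different (title, artist) pair if `query` is a hot song, else None.

    `songs` is a list of (title, artist) tuples; `rng` is anything with a `choice` method.
    """
    wanted = query.strip().casefold()
    titles = {title.casefold() for title, _ in songs}
    if wanted not in titles:
        return None  # caller can fall back to another strategy
    # Exclude the queried song so the recommendation is always a different one.
    others = [pair for pair in songs if pair[0].casefold() != wanted]
    return rng.choice(others) if others else None

# Sample rows for the demo only.
hot = [("Tattoo", "Loreen"), ("Come Monday", "Jimmy Buffett"), ("Arcade", "Duncan Laurence")]
pick = recommend_hot("come monday", hot, rng=random.Random(0))
print(pick)
```

Passing a seeded `random.Random` makes the pick reproducible in tests, while the default uses the module-level generator.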

***

### 2nd iteration

##### Spotify API Auth

In [2]:
# read the credentials from a text file with one "key: value" pair per line
with open('secrets.txt', 'r') as secrets_file:
    string = secrets_file.read()

# build a dictionary from the non-empty lines
secrets_dict = {}
for line in string.split('\n'):
    if len(line) > 0:
        # split only on the first ':' so values containing ':' stay intact
        key, value = line.split(':', 1)
        secrets_dict[key.strip()] = value.strip()
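
The parsing step can be tried out without a real credentials file. The sketch below assumes the same `key: value` line format as the cell above; `parse_secrets` is a hypothetical helper and the values are placeholders.

```python
def parse_secrets(text):
    """Parse 'key: value' lines into a dict, ignoring blank and malformed lines."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split once so values may contain ':'
        secrets[key.strip()] = value.strip()
    return secrets

# Placeholder values only; real credentials stay out of the notebook.
example = "clientid: abc123\n\nclientsecret: s3cr:et\n"
parsed = parse_secrets(example)
print(sorted(parsed))
```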

In [3]:
# spotipy init with user credentials
sp = spotipy.Spotify(auth_manager = SpotifyClientCredentials(client_id = secrets_dict['clientid'],
                                                            client_secret = secrets_dict['clientsecret']))

##### Queries and extraction

TASK: You need to create a collection of songs with their audio features, as large as possible! You could start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. Or you could grab other playlists from Spotify. Up to you!

In [7]:
# for the most diverse options, we will use a playlist of the top 10k songs of all time
playlist = sp.user_playlist_tracks('spotify', '1G8IpkZKobrIlXcVPoSIuf')  # returns only the first page (100 items here)

{'href': 'https://api.spotify.com/v1/playlists/1G8IpkZKobrIlXcVPoSIuf/tracks?offset=0&limit=100&additional_types=track',
 'items': [{'added_at': '2020-11-29T15:02:07Z',
   'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/acclaimedmusic'},
    'href': 'https://api.spotify.com/v1/users/acclaimedmusic',
    'id': 'acclaimedmusic',
    'type': 'user',
    'uri': 'spotify:user:acclaimedmusic'},
   'is_local': False,
   'primary_color': None,
   'track': {'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/74ASZWbe4lXaubB36ztrGX'},
       'href': 'https://api.spotify.com/v1/artists/74ASZWbe4lXaubB36ztrGX',
       'id': '74ASZWbe4lXaubB36ztrGX',
       'name': 'Bob Dylan',
       'type': 'artist',
       'uri': 'spotify:artist:74ASZWbe4lXaubB36ztrGX'}],
     'available_markets': ['AR',
      'AU',
      'AT',
      'BE',
      'BO',
      'BR',
      'BG',
      'CA',
      'CL',
      'CO',
      'CR',
     

In [8]:
# assess structure
playlist.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [16]:
# we want tracks inside of items
playlist['items'][0].keys()
# playlist['items'][0]['track']['artists'][0]['name'] # to get the first artist
# playlist['items'][0]['track']['name'] # to get the song


dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])

In [14]:
len(playlist['items']) #page limit for reading

100

In [17]:
# get all 10k songs
def get_playlist_tracks(playlist_id):
    results = sp.user_playlist_tracks("spotify",playlist_id)
    tracks = results['items']
    while results['next'] is not None:  # keep going while there is another page
        results = sp.next(results)
        tracks = tracks + results['items']  # concatenate only the items of each page
        sleep(randint(1,3000)/1000)  # respectful nap, since APIs often rate-limit calls
    return tracks
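
The paging logic can be checked offline. This sketch follows the same `items`/`next` structure as the response shown earlier, but runs against a made-up three-page dict; `collect_pages` and `PAGES` are hypothetical names.

```python
def collect_pages(first_page, get_next):
    """Follow 'next' links until they run out and return all items."""
    items = list(first_page["items"])
    page = first_page
    while page["next"] is not None:
        page = get_next(page)
        items.extend(page["items"])  # extend in place instead of rebuilding the list
    return items

# Fake three-page response with the same 'items'/'next' keys.
PAGES = {
    "p1": {"items": [1, 2], "next": "p2"},
    "p2": {"items": [3, 4], "next": "p3"},
    "p3": {"items": [5], "next": None},
}
all_items = collect_pages(PAGES["p1"], lambda page: PAGES[page["next"]])
print(all_items)
```

With spotipy, `get_next` would be `sp.next`, as in the function above.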

In [18]:
playlist_tracks = get_playlist_tracks('1G8IpkZKobrIlXcVPoSIuf')
len(playlist_tracks)

10000

In [20]:
tracks_norm = json_normalize(playlist_tracks)

In [32]:
# some songs are missing their name and artist info - we drop them instead of filling them in, since a recommender needs that info straight from the source
tracks_norm.isna().sum()
tracks_norm.dropna(subset = ['track.name'], inplace=True)

In [33]:
# transform the artists list of each track into a dictionary of dictionaries for easier iteration - dictioception
def list_to_dict(x):
    return {i: x[i] for i in range(len(x))}

tracks_norm['artist_dict'] = tracks_norm['track.artists'].apply(list_to_dict)

In [34]:
# expand the dictioception to add a col connecting the artist collaborating and their profiles
def expand_list(row):
    df = pd.DataFrame.from_dict(row['artist_dict'], orient='index')
    df['song_id'] = row['track.id']
    return df

tracks_norm['artists_dfs'] = tracks_norm.apply(expand_list, axis=1)

In [35]:
# we create a new df with all these artists + keep an eye on uri
# (a single concat over the whole list avoids re-copying the frame on every iteration)
artists = pd.concat(tracks_norm['artists_dfs'].tolist(), axis=0)
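
The artist expansion above (list to dict, dict to small frames, then concat) can also be sketched with `pd.json_normalize` and its `record_path`/`meta` arguments, which produce one row per track-artist pair directly. The sample `items` below are invented and only mimic the `track`/`artists` nesting of the playlist response; entries without a track are filtered out first.

```python
import pandas as pd

# Invented sample shaped like playlist items (not real API output).
items = [
    {"track": {"id": "t1", "name": "Song A",
               "artists": [{"id": "a1", "name": "Artist 1"},
                           {"id": "a2", "name": "Artist 2"}]}},
    {"track": {"id": "t2", "name": "Song B",
               "artists": [{"id": "a3", "name": "Artist 3"}]}},
    {"track": None},  # unavailable track, as can happen in real playlists
]

# Keep only items that actually carry track data.
valid = [item for item in items if item.get("track")]

# One row per artist, with the parent track's id and name repeated alongside.
artists_long = pd.json_normalize(
    valid,
    record_path=["track", "artists"],
    meta=[["track", "id"], ["track", "name"]],
    record_prefix="artist.",
)
print(artists_long)
```

The `track.id` column plays the role of `song_id` here, so no separate merge is needed to link artists back to songs.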

In [36]:
# we merge the two dfs
merged = pd.merge(left = tracks_norm,
                right = artists,
                how = 'inner',
                left_on = 'track.id',
                right_on = 'song_id')

In [37]:
playlist_merged = merged[['track.name', 'name', 'song_id']].copy()  # copy so later edits do not act on a slice of merged

Unnamed: 0,track.name,name,song_id
0,Like a Rolling Stone,Bob Dylan,3AhXZa8sUQht0UEdBJgpGc
1,Smells Like Teen Spirit,Nirvana,3oTlkzk1OtrhH8wBAduVEi
2,A Day In The Life - Remastered,The Beatles,3ZFBeIyP41HhnALjxWy1pR
3,Good Vibrations (Mono),The Beach Boys,5Qt4Cc66g24QWwGP3YYV9y
4,Johnny B Goode,Chuck Berry,7MH2ZclofPlTrZOkPzZKhK
...,...,...,...
13526,Into The Valley,Skids,2QSD3K3b3BJ8DPhGhQfDPW
13527,Tonight's Da Night,Redman,49XnDVsYOHgV4gFZeCojKj
13528,Figure 8,FKA twigs,5Y9IIH8Xmo1nuk0gfFjc4Q
13529,Like An Angel,The Mighty Lemon Drops,0ya0JYEFoXNviB8RMeHDtW


In [42]:
# we have 2601 nans in song_id - we will drop them
playlist_merged.isna().sum()
playlist_merged = playlist_merged.dropna(subset = ['song_id'])  # reassign instead of inplace to avoid SettingWithCopyWarning


In [44]:
playlist_merged

Unnamed: 0,track.name,name,song_id
0,Like a Rolling Stone,Bob Dylan,3AhXZa8sUQht0UEdBJgpGc
1,Smells Like Teen Spirit,Nirvana,3oTlkzk1OtrhH8wBAduVEi
2,A Day In The Life - Remastered,The Beatles,3ZFBeIyP41HhnALjxWy1pR
3,Good Vibrations (Mono),The Beach Boys,5Qt4Cc66g24QWwGP3YYV9y
4,Johnny B Goode,Chuck Berry,7MH2ZclofPlTrZOkPzZKhK
...,...,...,...
13526,Into The Valley,Skids,2QSD3K3b3BJ8DPhGhQfDPW
13527,Tonight's Da Night,Redman,49XnDVsYOHgV4gFZeCojKj
13528,Figure 8,FKA twigs,5Y9IIH8Xmo1nuk0gfFjc4Q
13529,Like An Angel,The Mighty Lemon Drops,0ya0JYEFoXNviB8RMeHDtW


In [6]:
# get audio feats for each of the songs using the song_id, 100 ids per request
chunks = [(i, i+100) for i in range(0, len(playlist_merged), 100)]
audio_featslist = []

for start, stop in chunks:
    id_100 = playlist_merged['song_id'].iloc[start:stop].tolist()  # plain list of ids for this batch
    audio_featslist = audio_featslist + sp.audio_features(id_100)
    sleep(randint(1,3000)/1000)  # respectful nap between calls
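
The batching used above, fixed-size slices with a shorter final slice, can be isolated in a small generator and checked on made-up IDs. `chunked` is a hypothetical helper name.

```python
def chunked(values, size=100):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield values[start:start + size]

ids = ["id%03d" % n for n in range(250)]  # made-up IDs for the demo
batches = list(chunked(ids, size=100))
sizes = [len(batch) for batch in batches]
print(sizes)
```

Each batch would then be passed to the API call in turn, with the pause between calls kept as in the loop above.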


In [46]:
len(audio_featslist)

10930

In [47]:
audio_feats_norm = json_normalize(audio_featslist)

In [48]:
audio_feats_norm

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.482,0.721,0,-6.839,1,0.0321,0.731000,0.000000,0.1890,0.557,95.263,audio_features,3AhXZa8sUQht0UEdBJgpGc,spotify:track:3AhXZa8sUQht0UEdBJgpGc,https://api.spotify.com/v1/tracks/3AhXZa8sUQht...,https://api.spotify.com/v1/audio-analysis/3AhX...,369600,4
1,0.485,0.863,1,-9.027,1,0.0495,0.000012,0.016200,0.1380,0.767,116.835,audio_features,3oTlkzk1OtrhH8wBAduVEi,spotify:track:3oTlkzk1OtrhH8wBAduVEi,https://api.spotify.com/v1/tracks/3oTlkzk1Otrh...,https://api.spotify.com/v1/audio-analysis/3oTl...,300977,4
2,0.364,0.457,4,-14.162,0,0.0675,0.290000,0.000106,0.9220,0.175,163.219,audio_features,3ZFBeIyP41HhnALjxWy1pR,spotify:track:3ZFBeIyP41HhnALjxWy1pR,https://api.spotify.com/v1/tracks/3ZFBeIyP41Hh...,https://api.spotify.com/v1/audio-analysis/3ZFB...,337413,4
3,0.398,0.413,1,-10.934,1,0.0388,0.082200,0.000025,0.0891,0.331,133.574,audio_features,5Qt4Cc66g24QWwGP3YYV9y,spotify:track:5Qt4Cc66g24QWwGP3YYV9y,https://api.spotify.com/v1/tracks/5Qt4Cc66g24Q...,https://api.spotify.com/v1/audio-analysis/5Qt4...,219147,4
4,0.518,0.756,10,-10.851,1,0.0915,0.735000,0.000062,0.3170,0.968,166.429,audio_features,7MH2ZclofPlTrZOkPzZKhK,spotify:track:7MH2ZclofPlTrZOkPzZKhK,https://api.spotify.com/v1/tracks/7MH2ZclofPlT...,https://api.spotify.com/v1/audio-analysis/7MH2...,160893,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10925,0.332,0.707,7,-12.698,1,0.0361,0.000012,0.006120,0.1100,0.652,144.815,audio_features,2QSD3K3b3BJ8DPhGhQfDPW,spotify:track:2QSD3K3b3BJ8DPhGhQfDPW,https://api.spotify.com/v1/tracks/2QSD3K3b3BJ8...,https://api.spotify.com/v1/audio-analysis/2QSD...,199467,4
10926,0.464,0.749,6,-8.564,1,0.4800,0.224000,0.000046,0.3510,0.879,181.121,audio_features,49XnDVsYOHgV4gFZeCojKj,spotify:track:49XnDVsYOHgV4gFZeCojKj,https://api.spotify.com/v1/tracks/49XnDVsYOHgV...,https://api.spotify.com/v1/audio-analysis/49Xn...,201800,4
10927,0.694,0.710,2,-9.793,1,0.3400,0.527000,0.001350,0.0697,0.415,119.964,audio_features,5Y9IIH8Xmo1nuk0gfFjc4Q,spotify:track:5Y9IIH8Xmo1nuk0gfFjc4Q,https://api.spotify.com/v1/tracks/5Y9IIH8Xmo1n...,https://api.spotify.com/v1/audio-analysis/5Y9I...,183040,4
10928,0.332,0.800,1,-9.746,1,0.0326,0.000368,0.001600,0.0850,0.832,149.240,audio_features,0ya0JYEFoXNviB8RMeHDtW,spotify:track:0ya0JYEFoXNviB8RMeHDtW,https://api.spotify.com/v1/tracks/0ya0JYEFoXNv...,https://api.spotify.com/v1/audio-analysis/0ya0...,222160,4


In [49]:
# prep for merging
# reset index of first df before merging
playlist_merged.reset_index(drop=True, inplace=True)

# drop duplicates from audio feats
audio_feats_norm.drop_duplicates(inplace=True)


In [50]:
playlist_merged

Unnamed: 0,track.name,name,song_id
0,Like a Rolling Stone,Bob Dylan,3AhXZa8sUQht0UEdBJgpGc
1,Smells Like Teen Spirit,Nirvana,3oTlkzk1OtrhH8wBAduVEi
2,A Day In The Life - Remastered,The Beatles,3ZFBeIyP41HhnALjxWy1pR
3,Good Vibrations (Mono),The Beach Boys,5Qt4Cc66g24QWwGP3YYV9y
4,Johnny B Goode,Chuck Berry,7MH2ZclofPlTrZOkPzZKhK
...,...,...,...
10925,Into The Valley,Skids,2QSD3K3b3BJ8DPhGhQfDPW
10926,Tonight's Da Night,Redman,49XnDVsYOHgV4gFZeCojKj
10927,Figure 8,FKA twigs,5Y9IIH8Xmo1nuk0gfFjc4Q
10928,Like An Angel,The Mighty Lemon Drops,0ya0JYEFoXNviB8RMeHDtW


In [52]:
# merge 2 dfs to have final product
playlist_complete = pd.merge(left=playlist_merged,
                        right=audio_feats_norm,
                        how='inner',
                        left_on='song_id', #df1
                        right_on='id') #df2

In [53]:
playlist_complete

Unnamed: 0,track.name,name,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,Like a Rolling Stone,Bob Dylan,3AhXZa8sUQht0UEdBJgpGc,0.482,0.721,0,-6.839,1,0.0321,0.731000,0.000000,0.1890,0.557,95.263,audio_features,3AhXZa8sUQht0UEdBJgpGc,spotify:track:3AhXZa8sUQht0UEdBJgpGc,https://api.spotify.com/v1/tracks/3AhXZa8sUQht...,https://api.spotify.com/v1/audio-analysis/3AhX...,369600,4
1,Smells Like Teen Spirit,Nirvana,3oTlkzk1OtrhH8wBAduVEi,0.485,0.863,1,-9.027,1,0.0495,0.000012,0.016200,0.1380,0.767,116.835,audio_features,3oTlkzk1OtrhH8wBAduVEi,spotify:track:3oTlkzk1OtrhH8wBAduVEi,https://api.spotify.com/v1/tracks/3oTlkzk1Otrh...,https://api.spotify.com/v1/audio-analysis/3oTl...,300977,4
2,A Day In The Life - Remastered,The Beatles,3ZFBeIyP41HhnALjxWy1pR,0.364,0.457,4,-14.162,0,0.0675,0.290000,0.000106,0.9220,0.175,163.219,audio_features,3ZFBeIyP41HhnALjxWy1pR,spotify:track:3ZFBeIyP41HhnALjxWy1pR,https://api.spotify.com/v1/tracks/3ZFBeIyP41Hh...,https://api.spotify.com/v1/audio-analysis/3ZFB...,337413,4
3,Good Vibrations (Mono),The Beach Boys,5Qt4Cc66g24QWwGP3YYV9y,0.398,0.413,1,-10.934,1,0.0388,0.082200,0.000025,0.0891,0.331,133.574,audio_features,5Qt4Cc66g24QWwGP3YYV9y,spotify:track:5Qt4Cc66g24QWwGP3YYV9y,https://api.spotify.com/v1/tracks/5Qt4Cc66g24Q...,https://api.spotify.com/v1/audio-analysis/5Qt4...,219147,4
4,Johnny B Goode,Chuck Berry,7MH2ZclofPlTrZOkPzZKhK,0.518,0.756,10,-10.851,1,0.0915,0.735000,0.000062,0.3170,0.968,166.429,audio_features,7MH2ZclofPlTrZOkPzZKhK,spotify:track:7MH2ZclofPlTrZOkPzZKhK,https://api.spotify.com/v1/tracks/7MH2ZclofPlT...,https://api.spotify.com/v1/audio-analysis/7MH2...,160893,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10925,Into The Valley,Skids,2QSD3K3b3BJ8DPhGhQfDPW,0.332,0.707,7,-12.698,1,0.0361,0.000012,0.006120,0.1100,0.652,144.815,audio_features,2QSD3K3b3BJ8DPhGhQfDPW,spotify:track:2QSD3K3b3BJ8DPhGhQfDPW,https://api.spotify.com/v1/tracks/2QSD3K3b3BJ8...,https://api.spotify.com/v1/audio-analysis/2QSD...,199467,4
10926,Tonight's Da Night,Redman,49XnDVsYOHgV4gFZeCojKj,0.464,0.749,6,-8.564,1,0.4800,0.224000,0.000046,0.3510,0.879,181.121,audio_features,49XnDVsYOHgV4gFZeCojKj,spotify:track:49XnDVsYOHgV4gFZeCojKj,https://api.spotify.com/v1/tracks/49XnDVsYOHgV...,https://api.spotify.com/v1/audio-analysis/49Xn...,201800,4
10927,Figure 8,FKA twigs,5Y9IIH8Xmo1nuk0gfFjc4Q,0.694,0.710,2,-9.793,1,0.3400,0.527000,0.001350,0.0697,0.415,119.964,audio_features,5Y9IIH8Xmo1nuk0gfFjc4Q,spotify:track:5Y9IIH8Xmo1nuk0gfFjc4Q,https://api.spotify.com/v1/tracks/5Y9IIH8Xmo1n...,https://api.spotify.com/v1/audio-analysis/5Y9I...,183040,4
10928,Like An Angel,The Mighty Lemon Drops,0ya0JYEFoXNviB8RMeHDtW,0.332,0.800,1,-9.746,1,0.0326,0.000368,0.001600,0.0850,0.832,149.240,audio_features,0ya0JYEFoXNviB8RMeHDtW,spotify:track:0ya0JYEFoXNviB8RMeHDtW,https://api.spotify.com/v1/tracks/0ya0JYEFoXNv...,https://api.spotify.com/v1/audio-analysis/0ya0...,222160,4


In [54]:
# pickling our final file for future use
# playlist_complete.to_pickle('playlist.pkl')

In [56]:
# practice: to read pickles
# playlist_ex = pd.read_pickle('playlist.pkl')
# playlist_ex