# GNOD week 6

## LAB | Web Scraping Single Page (GNOD part 1)

- Check the `case_study_gnod.md` file.
- Make sure you've understood the big picture of your project:
    - the goal of the company (Gnod),
    - their current product (Gnoosic),
    - their strategy, and
    - how your project fits into this context.
- Re-read the business case and the e-mail from the CTO.

**Instructions - Scraping popular songs** <br>
Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song. However, the CTO thinks that if the input song is currently on the top charts, the user will also enjoy a recommendation of another song that is popular right now.
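
The rule described above can be sketched in a few lines. This is only an illustration, assuming the chart is available as a list of `(title, artist)` pairs. The function name `recommend_hot_song` and the three-song `sample_chart` are hypothetical and exist only for this sketch.

```python
import random

def recommend_hot_song(user_song, chart, rng=random):
    """Return another charting (title, artist) pair if user_song is on the chart, else None."""
    wanted = user_song.strip().lower()
    titles = {title.lower() for title, _ in chart}
    if wanted not in titles:
        # Not a hot song: a similarity-based recommender would take over here.
        return None
    # Exclude the input song itself so the recommendation is a different song.
    candidates = [(t, a) for t, a in chart if t.lower() != wanted]
    return rng.choice(candidates) if candidates else None

# Hypothetical miniature chart, used only to exercise the function.
sample_chart = [
    ("Margaritaville", "Jimmy Buffett"),
    ("Paint The Town Red", "Doja Cat"),
    ("Fast Car", "Luke Combs"),
]

print(recommend_hot_song("fast car", sample_chart))
print(recommend_hot_song("Unknown Song", sample_chart))
```

Matching on a lowercased, stripped title is a simplification; real user input would likely need fuzzier matching.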

You need to find data on the internet about currently popular songs. PopVortex maintains a Top 100 chart of "hot" songs at http://www.popvortex.com/music/charts/top-100-songs.php (the page describes itself as an iTunes chart that is updated daily).

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas DataFrame.

In [1]:
# import libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
# find url and store it in variable
url = "https://www.popvortex.com/music/charts/top-100-songs.php"

In [3]:
# download the HTML with a GET request and check the status code
response = requests.get(url)
response.status_code


200
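
Checking `status_code` by eye works in a notebook, but a script may want to fail loudly instead. The sketch below is one possible way to wrap the download. The helper name `fetch_html`, the 10-second timeout, and the `User-Agent` string are assumptions chosen for illustration, not values from the lab.

```python
import requests

def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text, raising on HTTP errors."""
    # Placeholder User-Agent value; identify your own scraper as appropriate.
    headers = {"User-Agent": "gnod-lab-scraper/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of continuing silently.
    response.raise_for_status()
    return response.text
```

With this helper, `html = fetch_html(url)` would replace the separate request and status check, and the result can be passed to `BeautifulSoup(html, "html.parser")`.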

In [4]:
# create the soup
soup = BeautifulSoup(response.content, "html.parser")
soup

<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8"/><title>iTunes Top 100 Songs Chart 2023</title><meta content="width=device-width, initial-scale=1" name="viewport"/><meta content="iTunes top 100 songs chart list. The most popular hit music and trending songs of 2023. Chart of today's current iTunes top 100 songs is updated daily." name="description"/><meta content="iTunes Top 100 Songs Chart 2023" property="og:title"><meta content="Chart of the top 100 songs on iTunes. Chart list of the top 100 song downloads of 2023 is updated daily." property="og:description"><meta content="article" property="og:type"><meta content="https://www.popvortex.com/images/logo-facebook.png" property="og:image"/><meta content="PopVortex" property="og:site_name"/><meta content="https://www.popvortex.com/music/charts/top-100-songs.php" property="og:url"/><meta content="100000239962942" property="fb:admins"/><meta content="178831188827052" property="fb:app_id"/><link href="/favicon.png" rel="shortcut

In [5]:
# check that everything is okay
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   iTunes Top 100 Songs Chart 2023
  </title>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <meta content="iTunes top 100 songs chart list. The most popular hit music and trending songs of 2023. Chart of today's current iTunes top 100 songs is updated daily." name="description"/>
  <meta content="iTunes Top 100 Songs Chart 2023" property="og:title">
   <meta content="Chart of the top 100 songs on iTunes. Chart list of the top 100 song downloads of 2023 is updated daily." property="og:description">
    <meta content="article" property="og:type">
     <meta content="https://www.popvortex.com/images/logo-facebook.png" property="og:image"/>
     <meta content="PopVortex" property="og:site_name"/>
     <meta content="https://www.popvortex.com/music/charts/top-100-songs.php" property="og:url"/>
     <meta content="100000239962942" property="fb:admins"/>
     <meta content="178831188827052

In [23]:
# retrieve desired info
for song in soup.select("body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper > div.feed-item.music-chart.flex-row"):
    print(song.cite.get_text(), song.em.get_text())

Margaritaville Jimmy Buffett
Come Monday Jimmy Buffett
Rich Men North of Richmond Oliver Anthony Music
Cheeseburger In Paradise Jimmy Buffett
Changes In Latitudes, Changes In Attitudes Jimmy Buffett
A Pirate Looks at Forty Jimmy Buffett
It's Five O'Clock Somewhere (Live) Jimmy Buffett
Paint The Town Red Doja Cat
Last Time I Saw You Nicki Minaj
I Remember Everything (feat. Kacey Musgraves) Zach Bryan
Son of a Son of a Sailor Jimmy Buffett
Fast Car Luke Combs
Lil Boo Thang Paul Russell
Used To Be Young Miley Cyrus
I Want to go Home Oliver Anthony Music
Fins Jimmy Buffett
Cruel Summer Taylor Swift
Last Night Morgan Wallen
Need A Favor Jelly Roll
Southern Cross (Live) Jimmy Buffett
Try That In A Small Town Jason Aldean
Keep Going Up Timbaland, Nelly Furtado & Justin Timberlake
White Horse Chris Stapleton
Brown Eyed Girl (Live) Jimmy Buffett
Why Don't We Get Drunk Jimmy Buffett
Dance The Night Dua Lipa
It's Five O'Clock Somewhere (with Jimmy Buffett) Alan Jackson
90 some Chevy Oliver Anthon
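
The copied selector depends on the exact page layout (for example `div:nth-child(4)`), and `song.cite.get_text()` raises an `AttributeError` if an item has no `<cite>` tag. The sketch below shows a more defensive variant on a small, made-up HTML fragment. The fragment only imitates the class names seen in the selector above; the real page's markup may differ, so treat the shorter selector as an assumption to verify against the live page.

```python
from bs4 import BeautifulSoup

# Made-up fragment that mimics the class names used in the selector above.
sample_html = """
<div class="chart-wrapper">
  <div class="feed-item music-chart flex-row">
    <cite>Margaritaville</cite><em>Jimmy Buffett</em>
  </div>
  <div class="feed-item music-chart flex-row">
    <cite>Paint The Town Red</cite><em>Doja Cat</em>
  </div>
  <div class="feed-item music-chart flex-row"><p>item without song data</p></div>
</div>
"""

soup_sample = BeautifulSoup(sample_html, "html.parser")

rows = []
# Shorter selector: anchor on the chart wrapper instead of the full path from <body>.
for item in soup_sample.select("div.chart-wrapper div.feed-item.music-chart"):
    title_tag = item.find("cite")
    artist_tag = item.find("em")
    # Skip items that lack either tag instead of crashing on None.
    if title_tag is None or artist_tag is None:
        continue
    rows.append((title_tag.get_text(strip=True), artist_tag.get_text(strip=True)))

print(rows)
```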

In [24]:
# init empty lists
songs = []
artists = []

# save copied selector into a var
path = "body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper > div.feed-item.music-chart.flex-row"

# grab each song title and artist and append them to their respective lists
for i in soup.select(path):
    songs.append(i.cite.get_text())
    artists.append(i.em.get_text())

print(songs)
print(artists)

['Margaritaville', 'Come Monday', 'Rich Men North of Richmond', 'Cheeseburger In Paradise', 'Changes In Latitudes, Changes In Attitudes', 'A Pirate Looks at Forty', "It's Five O'Clock Somewhere (Live)", 'Paint The Town Red', 'Last Time I Saw You', 'I Remember Everything (feat. Kacey Musgraves)', 'Son of a Son of a Sailor', 'Fast Car', 'Lil Boo Thang', 'Used To Be Young', 'I Want to go Home', 'Fins', 'Cruel Summer', 'Last Night', 'Need A Favor', 'Southern Cross (Live)', 'Try That In A Small Town', 'Keep Going Up', 'White Horse', 'Brown Eyed Girl (Live)', "Why Don't We Get Drunk", 'Dance The Night', "It's Five O'Clock Somewhere (with Jimmy Buffett)", '90 some Chevy', 'Save Me (with Lainey Wilson)', 'Aint Gotta Dollar', 'Watermelon Moonshine', 'Indiana Jones', 'Volcano', 'Rockstar', 'Flowers', 'Thinkin’ Bout Me', 'Religiously', 'Lose Control', 'Dreams', 'Used To Be Young', 'Southern Cross', 'Calm Down', 'vampire', 'Trip Around the Sun', 'How You Remind Me', 'Son Of A Sinner', 'Beat You Th

In [25]:
# create the df with organised info
top_100 = pd.DataFrame({"song_title": songs,
                        "artist": artists})

top_100

| | song_title | artist |
|---|---|---|
| 0 | Margaritaville | Jimmy Buffett |
| 1 | Come Monday | Jimmy Buffett |
| 2 | Rich Men North of Richmond | Oliver Anthony Music |
| 3 | Cheeseburger In Paradise | Jimmy Buffett |
| 4 | Changes In Latitudes, Changes In Attitudes | Jimmy Buffett |
| ... | ... | ... |
| 95 | Bring Me to Life | Evanescence |
| 96 | Demons | Doja Cat |
| 97 | Edge of Seventeen | Stevie Nicks |
| 98 | Bad Moon Rising | Creedence Clearwater Revival |
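
Before using the scraped table, it can help to run a quick sanity check, since the printed list of titles contains at least one repeated title ('Used To Be Young'). The sketch below counts rows, missing values, and duplicated song/artist pairs. The helper name `summarize_chart` and the sample data are hypothetical; the column names match the DataFrame built above.

```python
import pandas as pd

def summarize_chart(df):
    """Return basic quality counts for a chart DataFrame with song_title and artist columns."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        # Rows whose (song_title, artist) pair already appeared earlier in the frame.
        "duplicate_rows": int(df.duplicated(subset=["song_title", "artist"]).sum()),
    }

# Hypothetical sample with one repeated pair, used only to exercise the helper.
sample = pd.DataFrame({
    "song_title": ["Song A", "Song B", "Song B"],
    "artist": ["Artist X", "Artist Y", "Artist Y"],
})

summary = summarize_chart(sample)
print(summary)
```

The same title can legitimately chart under different artists, which is why the check compares the pair rather than the title alone.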
