# Web Scraping with Beautiful Soup - Lab

## Introduction

Now that you've read and seen some documentation regarding the use of Beautiful Soup, it's time to practice and put that to work! In this lab, you'll formalize some of our example code into functions and scrape the lyrics from an artist of your choice.

## Objectives
You will be able to:
* Scrape static webpages
* Select specific elements from the DOM

## Link Scraping

Write a function to collect the links to each of the song pages from a given artist page.

In [39]:
#Starter Code

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.azlyrics.com/f/falloutboy.html'


html_page = requests.get(url) #Make a get request to retrieve the page
soup = BeautifulSoup(html_page.content, 'html.parser') #Pass the page contents to beautiful soup for parsing
albums = soup.find_all('div', attrs={'class':'album'})

In [43]:
allsongs = albums[0].find_next_siblings('a') #All song links that follow the first album header

In [97]:
def scrapesongs(url):
    """Return a list of (title, link, album, lyrics) tuples for every song on an artist page."""
    html_page = requests.get(url) #Make a get request to retrieve the page
    soup = BeautifulSoup(html_page.content, 'html.parser') #Pass the page contents to beautiful soup for parsing
    albums = soup.find_all('div', attrs={'class':'album'})

    data = []

    for album_n, cur_album in enumerate(albums):
        saca = cur_album.find_next_siblings('a') #songs after current album
        if album_n == len(albums)-1:
            album_songs = saca #the last album owns every remaining song link
        else:
            next_album = albums[album_n+1]
            sbna = next_album.find_previous_siblings('a') #songs before next album
            album_songs = [song for song in saca if song in sbna] #album songs are those listed after the current album but before the next one!
        for song in album_songs:
            page = song.get('href')
            title = song.text
            album = cur_album.text
            lyrics = getlyrics(page) #getlyrics is defined in the Text Scraping section
            print(title)
            data.append((title, page, album, lyrics)) #every row has the same four fields
    return data
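
The sibling comparison above can also be done in a single pass over the page. The following is a minimal sketch, not the lab's reference solution: `SAMPLE_HTML` is hypothetical, simplified markup that only mimics the assumed layout (album `div`s and song `a` tags as siblings), and `group_links_by_album` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: album <div>s and song <a>s are siblings,
# which is the layout the functions in this lab assume.
SAMPLE_HTML = """
<div id="listAlbum">
<div class="album">album: <b>"First Album"</b> (2003)</div>
<a href="../lyrics/artist/songone.html">Song One</a><br>
<a href="../lyrics/artist/songtwo.html">Song Two</a><br>
<div class="album">album: <b>"Second Album"</b> (2005)</div>
<a href="../lyrics/artist/songthree.html">Song Three</a><br>
</div>
"""

def group_links_by_album(html):
    """Return (title, link, album) tuples, walking the siblings once."""
    soup = BeautifulSoup(html, 'html.parser')
    first_album = soup.find('div', attrs={'class': 'album'})
    if first_album is None:
        return []

    rows = []
    current_album = first_album.text
    # Visit every later <div> or <a> sibling in document order.
    for node in first_album.find_next_siblings(['div', 'a']):
        if node.name == 'div' and 'album' in node.get('class', []):
            current_album = node.text  # a new album header starts here
        elif node.name == 'a' and node.get('href'):
            rows.append((node.text, node.get('href'), current_album))
    return rows

rows = group_links_by_album(SAMPLE_HTML)
for row in rows:
    print(row)
```

Tracking the current album while walking forward avoids comparing every link against the links before the next album, which may matter on pages with many songs.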

In [98]:
df = pd.DataFrame(scrapesongs('https://www.azlyrics.com/f/falloutboy.html'),columns=('name', 'link', 'album', 'lyrics'))

In [99]:
df

| | name | link | album | lyrics |
|---|---|---|---|---|
| 0 | Honorable Mention | ../lyrics/falloutboy/honorablemention.html | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nI served out my detention\r\nAnd in the ... |
| 1 | Calm Before The Storm | ../lyrics/falloutboy/calmbeforethestorm96412.html | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nI sat outside my front window...this sto... |
| 2 | Switchblades And Infidelity | ../lyrics/falloutboy/switchbladesandinfidelity... | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nWalking out on the show is walking out o... |
| 3 | Pretty In Punk | ../lyrics/falloutboy/prettyinpunk.html | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nWalking off that stage tonight\r\nI know... |
| 4 | Growing Up | ../lyrics/falloutboy/growingup.html | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nI dried my eyes, now I crust them with s... |
| 5 | The World's Not Waiting (For Five Tired Boys I... | ../lyrics/falloutboy/theworldsnotwaitingforfiv... | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nThis might just be a waste of time\r\nth... |
| 6 | Short, Fast And Loud | ../lyrics/falloutboy/shortfastandloud.html | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nShe's shallow like the shoreline during ... |
| 7 | Moving Pictures | ../lyrics/falloutboy/movingpictures.html | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nLast night I saw a movie\r\nand I though... |
| 8 | Parker Lewis Can't Lose (But I'm Gonna Give It... | ../lyrics/falloutboy/parkerlewiscantlosebutimg... | album: "Fall Out Boy's Evening Out With Your G... | \n\r\nYou laughed off my affections\r\nWhile I... |
| 9 | Tell That Mick He Just Made My List Of Things ... | ../lyrics/falloutboy/tellthatmickhejustmademyl... | album: "Take This To Your Grave" (2003) | \n\r\nLight that smoke, that one for giving up... |


## Text Scraping
Write a second function that scrapes the lyrics from a single song page.

In [93]:
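#Scrape the lyrics from a single song page
def getlyrics(url):
    extension = url[2:] #Drop the leading '..' from relative links such as '../lyrics/...'
    baseurl = 'https://www.azlyrics.com'
    fullurl = baseurl + extension

    html_page = requests.get(fullurl)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    #Take the text of the first div sibling that follows the div with class 'ringtone'
    lyrics = soup.find('div', attrs={'class':'ringtone'}).find_next_sibling('div').text
    return lyrics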
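
Slicing off the first two characters works only when every link starts with `..`. As an alternative sketch, the standard library's `urllib.parse.urljoin` can resolve a relative `href` against the page it came from. `ARTIST_URL` and `resolve_link` are illustrative names introduced here, not part of the lab's code.

```python
from urllib.parse import urljoin

# The artist page used in this lab; relative links are resolved against it.
ARTIST_URL = 'https://www.azlyrics.com/f/falloutboy.html'

def resolve_link(href, page_url=ARTIST_URL):
    """Turn an href found on page_url into an absolute URL."""
    return urljoin(page_url, href)

full_url = resolve_link('../lyrics/falloutboy/honorablemention.html')
print(full_url)
```

Because `urljoin` leaves absolute URLs unchanged, the same helper would also cope with pages that mix relative and absolute links.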

In [94]:
getlyrics('../lyrics/falloutboy/honorablemention.html')

"\n\r\nI served out my detention\r\nAnd in the end I got an honorable mention\n\r\nIn the movie of my life\r\nStarring you\r\nInstead of me\n\r\nWhen the moonlight\r\nHits your bright eyes I go blind\r\nAnd maybe next time\r\nI'll remember not to tell you something stupid like I'll never leave your side\n\r\nLike the oldest movie I ever saw was the one we wrote together\r\nI said I hate you but I'd never change a thing\r\nI can be your John Cusack\n\r\nI burnt out\r\nMy defensive\r\nNow everything I say is taken as offensive\n\r\nIn the movie of my life (movie of my life.. yeah)\r\nStarring you\r\nInstead of me\n\r\nWhen the moonlight\r\nHits your bright eyes I go blind\r\nAnd maybe next time\r\nI'll remember not to tell you something stupid like I'll never leave your side\n\r\nLike the oldest movie I ever saw was the one we wrote together\r\nI said I hate you but I'd never change a thing\r\nI can be your John Cusack\n\r\nOldest movie I ever saw was the one we wrote together\r\nI said 

## Synthesizing
Create a script using your two functions above to scrape all of the song lyrics for a given artist.


In [None]:
#Use this block for your code!
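
One possible shape for such a script is sketched below. It assumes you have a list of `(title, link, album)` tuples and passes the lyrics function in as a parameter, so the loop can be tried offline. `collect_lyrics`, `fake_fetch`, and the one-second default delay are illustrative choices, not requirements of the lab.

```python
import time

def collect_lyrics(songs, fetch, delay=1.0):
    """Fetch lyrics for each (title, link, album) tuple.

    `fetch` is any callable that maps a link to lyrics text, e.g. getlyrics.
    Songs whose page cannot be fetched or parsed get lyrics=None.
    """
    records = []
    for i, (title, link, album) in enumerate(songs):
        if i > 0:
            time.sleep(delay)  # pause between requests to avoid hammering the site
        try:
            lyrics = fetch(link)
        except Exception as err:  # e.g. network errors or an unexpected page layout
            print(f'Skipping {title!r}: {err}')
            lyrics = None
        records.append({'name': title, 'link': link, 'album': album, 'lyrics': lyrics})
    return records

# Offline demonstration with a stand-in fetcher (no network access needed).
def fake_fetch(link):
    if 'missing' in link:
        raise ValueError('no lyrics div found')
    return 'la la la'

demo_songs = [
    ('Song One', '../lyrics/artist/songone.html', 'Album A'),
    ('Song Two', '../lyrics/artist/missing.html', 'Album A'),
]
demo_records = collect_lyrics(demo_songs, fake_fetch, delay=0)
print(demo_records)
```

In the notebook, passing `getlyrics` as `fetch` and wrapping the result in `pd.DataFrame(...)` should give a table with the same four columns as above.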


## Visualizing
Generate two bar graphs to compare lyrical changes for the artist of your choice. For example, the two bar charts could compare the lyrics for two different songs or two different albums.

In [None]:
#Use this block for your code!
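
As a starting point, the sketch below counts words with the standard library and plots the most common ones side by side. It assumes `matplotlib` is installed for the plotting step; `word_counts`, `plot_top_words`, and the two toy strings are placeholders to swap for your scraped lyrics.

```python
import re
from collections import Counter

def word_counts(lyrics):
    """Count lowercase words in a lyrics string, ignoring punctuation."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(words)

def plot_top_words(counts_a, counts_b, labels=('A', 'B'), n=10):
    """Draw two side-by-side bar charts of the n most common words.

    Both counters are assumed to be non-empty.
    """
    import matplotlib.pyplot as plt  # imported here so counting works without matplotlib

    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    for ax, counts, label in zip(axes, (counts_a, counts_b), labels):
        words, freqs = zip(*counts.most_common(n))
        ax.bar(words, freqs)
        ax.set_title(label)
        ax.tick_params(axis='x', rotation=45)
    fig.tight_layout()
    return fig

# Toy strings standing in for two scraped songs.
song_a = word_counts("la la la hey hey\r\nla la la")
song_b = word_counts("oh oh yeah\r\nyeah yeah yeah")
print(song_a.most_common(2))
print(song_b.most_common(2))
```

Calling `plot_top_words(song_a, song_b, labels=('Song A', 'Song B'))` in the notebook would then draw the two charts.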

## Level - Up

Think about how you structured the data from your web scraper. Did you scrape the entire song lyrics verbatim? Did you simply store the words and their frequency counts, or did you do something else entirely? List out a few different options for how you could have stored this data. What are advantages and disadvantages of each? Be specific and think about what sort of analyses each representation would lend itself to.

In [None]:
#Use this block for your code!
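
To make the options concrete, here is a small sketch that stores one lyric fragment (taken from the sample output above) in four ways. The variable names are illustrative, and the tokenizing regular expression is just one simple choice.

```python
import re
from collections import Counter

raw = "When the moonlight\r\nHits your bright eyes I go blind"

# Option 1: the verbatim text. Nothing is lost, but every analysis must re-parse it.
verbatim = raw

# Option 2: a list of lines. Keeps line structure, useful for per-line analyses.
lines = [line.strip() for line in raw.splitlines() if line.strip()]

# Option 3: an ordered list of tokens. Keeps word order, drops line breaks and punctuation.
tokens = re.findall(r"[a-z']+", raw.lower())

# Option 4: word frequency counts. Compact and ready for bar charts, but word order is gone.
counts = Counter(tokens)

print(lines)
print(tokens)
print(counts.most_common(3))
```

Each later option can be rebuilt from the earlier ones but not the reverse, so storing the verbatim text keeps the most analyses open at the cost of extra processing.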

## Summary

Congratulations! You've now practiced scraping web pages with Beautiful Soup!