# Analysis of the popular music of the last 65 years

### Web Scraping

The objective is to get the Billboard Year-End Hot 100 singles for each year from 1960 to 2010 inclusive.

This data is available on Wikipedia, so the first task is to extract it from the Wikipedia pages and write it to CSV files (one per year).

The relevant Wikipedia URLs follow this pattern, where {year} is the four-digit year:
https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_{year}
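
As a quick illustration of this URL pattern, the sketch below builds the page address for a given year and prepares (but does not send) a request with a browser-like User-Agent. It assumes Python 3 and `urllib.request`, whereas the notebook cells use Python 2's `urllib2`. The names `BASE_URL`, `billboard_url`, and `build_request` are hypothetical helpers introduced only for this example.

```python
from urllib.request import Request

# URL pattern quoted above; {year} is replaced by a four-digit year.
BASE_URL = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_{year}"


def billboard_url(year):
    """Return the Wikipedia URL of the year-end chart for `year` (hypothetical helper)."""
    return BASE_URL.format(year=year)


def build_request(year):
    """Prepare, without sending, a request that carries a browser-like User-Agent."""
    return Request(billboard_url(year), headers={"User-Agent": "Mozilla/5.0"})
```

Calling `billboard_url(1960)` yields the address of the 1960 page. Passing the result of `build_request(1960)` to `urllib.request.urlopen` would then download it.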

##### Imports - API settings - Constants definition

In [1]:
from bs4 import BeautifulSoup # This package is used for scraping the data from Wikipedia
import urllib2
import csv

In [2]:
# Define the starting and ending years 
start_year = 2015
end_year = 2015

##### Getting the data from Wikipedia

In [3]:
# List of all the years we are interested in (end_year included)
years = list(range(start_year, end_year + 1))

In [4]:
# For each year, load the Wikipedia page and store the page contents in the soup variable
for year in years:
    wiki = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_" + str(year)
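    header = {'User-Agent': 'Mozilla/5.0'}  # Needed to prevent a 403 error on Wikipedia
    req = urllib2.Request(wiki, headers=header)
    page = urllib2.urlopen(req)
    # Name the parser explicitly so the result does not depend on which parsers are installed
    soup = BeautifulSoup(page, "html.parser")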
    
    # The rankings are stored in an html table
    table = soup.find("table", { "class" : "wikitable sortable" })
    
    # This table will be written in a CSV file
    
    # Open CSV file (binary mode, as recommended for Python 2's csv module)
    f = open('CSV_data/Billboard-charts/Billboard_Year-End_Hot_100_singles_of_' + str(year) + '.csv', 'wb')

    try:
        # CSV writer
        writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

        # Table header
        header_cells = table.findAll("th", {"scope" : "col"})
        header_row_string = []
        for index, header_cell in enumerate(header_cells):
            if index == 0:
                header_row_string.append("Num")
            else:
                header_row_string.append(header_cell.find(text=True))
        writer.writerow(header_row_string)

        # Table contents
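        for row in table.findAll("tr"):
            row_string = []
            header_row_cells = row.find("th", {"scope" : "row"})
            if header_row_cells is not None:
                row_string.append(header_row_cells.find(text=True))

            cells = row.findAll("td")
            for cell in cells:
                text = "".join(cell.findAll(text=True))
                # Strip the double quotes around song titles; the length check avoids an IndexError on empty cells
                if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
                    text = text[1:-1]
                row_string.append(text.encode('utf-8'))
            # Skip rows without data cells (e.g. the column header row, already written above)
            if row_string:
                writer.writerow(row_string)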

    finally:
        # Close file
        f.close()
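
The cell-cleaning and CSV-writing steps above can also be tried without touching the network. The following is a minimal Python 3 sketch under stated assumptions, not the notebook's own code. `strip_outer_quotes` and `rows_to_csv` are hypothetical helpers, and the sample row holds made-up placeholder values rather than real chart entries.

```python
import csv
import io


def strip_outer_quotes(text):
    """Remove one pair of surrounding double quotes, if present."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return text


def rows_to_csv(header, rows):
    """Serialize a header and rows to CSV text, quoting every non-numeric field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([strip_outer_quotes(cell) for cell in row])
    return buffer.getvalue()


# Placeholder values, not real chart data.
sample = rows_to_csv(["Num", "Title", "Artist(s)"], [["1", '"Song A"', "Artist A"]])
print(sample)
```

Writing to an in-memory `io.StringIO` buffer keeps the example free of file-system side effects. Replacing the buffer with a file opened via `open(path, "w", newline="")` would produce an on-disk CSV in Python 3.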
