# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">


## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.


## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

In [79]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    events_page = requests.get(events_page_url)
    events_soup = BeautifulSoup(events_page.content, 'html.parser')
    events_listing = events_soup.find('div', id="event-listing")
    headings = events_listing.find_all('h1')
    articles = events_listing.find_all('article')

    # Event names: the text of the first link inside each <h1>
    event_titles = [h1.find('a').string for h1 in headings]

    # Event venues: the second child of the <span> inside each <h1>
    event_venues = [h1.find('span').contents[1].string for h1 in headings]

    # Event dates: each <time> tag's datetime attribute, reformatted as DD-MM-YYYY
    event_dates = [parse(article.find('time').attrs['datetime']).strftime('%d-%m-%Y')
                   for article in articles]

    # Attendee counts: default to 0 when an event has no "attending" paragraph
    event_attendees = []
    for article in articles:
        attending = article.find('p', class_="attending")
        if attending:
            event_attendees.append(int(attending.contents[0].string))
        else:
            event_attendees.append(0)

    df = pd.DataFrame([event_titles, event_venues, event_dates, event_attendees]).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df
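
To see what page structure these selectors rely on without hitting the network, here is a minimal offline sketch. The HTML in `SAMPLE_HTML` is hypothetical, inferred only from the selectors used in `scrape_events`, and `parse_events` is an illustrative helper that assumes each `<article>` contains its own `<h1>`. It also uses the standard library's `datetime.fromisoformat` in place of `dateutil`, which is enough for the ISO-style timestamps assumed here.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical markup, inferred from the selectors used in scrape_events
SAMPLE_HTML = """
<div id="event-listing">
  <article>
    <time datetime="2020-08-19T00:00"></time>
    <h1><a href="/events/1">Sample Night</a> <span>at <a href="/club/1">Sample Venue</a></span></h1>
    <p class="attending"><span>5</span> Attending</p>
  </article>
  <article>
    <time datetime="2020-08-20T00:00"></time>
    <h1><a href="/events/2">Quiet Night</a> <span>at <a href="/club/2">Other Venue</a></span></h1>
  </article>
</div>
"""


def parse_events(html):
    """Return one dict per <article>, mirroring the lookups in scrape_events."""
    listing = BeautifulSoup(html, "html.parser").find("div", id="event-listing")
    rows = []
    for article in listing.find_all("article"):
        heading = article.find("h1")
        attending = article.find("p", class_="attending")
        timestamp = article.find("time")["datetime"]
        rows.append({
            "Event_Name": heading.find("a").string,
            "Venue": heading.find("span").contents[1].string,
            "Event_Date": datetime.fromisoformat(timestamp).strftime("%d-%m-%Y"),
            # Events without an "attending" paragraph count as 0 attendees
            "Number_of_Attendees": int(attending.contents[0].string) if attending else 0,
        })
    return rows


for row in parse_events(SAMPLE_HTML):
    print(row)
```

Building one row per `<article>` keeps each event's fields together, so a missing element in one event cannot shift the values of the events after it.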

## Write a Function to Retrieve the URL for the Next Page

In [80]:
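def next_page(url):
    """Return the absolute URL that the "Next" link on the page at `url` points to."""
    events_page = requests.get(url)
    events_soup = BeautifulSoup(events_page.content, 'html.parser')

    # The "Next" link holds a relative href, so prepend the scheme and domain
    parsed_url = urlparse(url)
    domain_url = parsed_url.scheme + "://" + parsed_url.netloc
    next_page_rel_url = events_soup.find('a', attrs={'ga-event-action': "Next "}).attrs['href']
    next_page_url = domain_url + next_page_rel_url

    return next_page_url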
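
As an alternative to concatenating the scheme and domain by hand, the standard library's `urllib.parse.urljoin` can resolve a link's `href` against the page it came from. The sketch below is one possible variant, not part of the lab's solution: `build_next_url` is a hypothetical helper, and the example path is made up rather than taken from the site.

```python
from urllib.parse import urljoin


def build_next_url(current_url, next_href):
    """Resolve the "Next" link's href against the page it was found on."""
    if next_href is None:  # no "Next" link on the page: nothing left to visit
        return None
    return urljoin(current_url, next_href)


# The href below is a made-up example, not a real path on the site
print(build_next_url("https://www.residentadvisor.net/events", "/events/page/2"))
print(build_next_url("https://www.residentadvisor.net/events", None))
```

Returning `None` when there is no link gives the calling loop a clear signal to stop instead of raising an `AttributeError`.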

## Scrape the Next 500 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
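
Once the events are in a DataFrame, the sort described here could look like the sketch below. It assumes "sorted by the number of attendees" means largest first, and it uses a few made-up rows in place of scraped data. Because `Event_Date` is stored as a `DD-MM-YYYY` string, it is converted to a real datetime first so that ties are broken chronologically rather than alphabetically.

```python
import pandas as pd

# Made-up rows in the same shape as the scraped data (Venue omitted for brevity)
events = pd.DataFrame({
    "Event_Name": ["Event A", "Event B", "Event C", "Event D"],
    "Event_Date": ["19-08-2020", "20-08-2020", "20-08-2020", "19-08-2020"],
    "Number_of_Attendees": [5, 20, 1, 1],
})

# Convert the DD-MM-YYYY strings so ties sort chronologically
events["Event_Date"] = pd.to_datetime(events["Event_Date"], format="%d-%m-%Y")
# Make sure the counts are numeric, since scraped columns may be stored as objects
events["Number_of_Attendees"] = pd.to_numeric(events["Number_of_Attendees"])

# Most-attended events first; earlier dates win ties
sorted_events = events.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
).reset_index(drop=True)

print(sorted_events)
```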

In [87]:
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
import time
from datetime import datetime
from dateutil.parser import parse
from urllib.parse import urlparse

all_events = []
total_rows = 0
events_url = "https://www.residentadvisor.net/events"
# Keep requesting pages until at least 500 events have been collected
while total_rows < 500:
    page_events = scrape_events(events_url)
    all_events.append(page_events)
    total_rows += len(page_events)
    events_url = next_page(events_url)
    time.sleep(.2)  # pause briefly between requests
df = pd.concat(all_events, ignore_index=True)
df = df.iloc[:500]  # trim any overshoot from the last page
print(len(df))
df.head()

500


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Percolate presents Horse Meat Disco | The Brixton Courtyard | 19-08-2020 | 5 |
| 1 | DJ Tutorials (The London DJ School) | Egg London | 19-08-2020 | 2 |
| 2 | Venue MOT: Hot Desk | Venue MOT Unit 18 | 19-08-2020 | 1 |
| 3 | Engage Audio presents Liquid Sessions at Costa... | The Cause | 20-08-2020 | 20 |
| 4 | Venue MOT: Hot Desk | Venue MOT Unit 18 | 20-08-2020 | 1 |
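
The scraping loop in this lab assumes every request succeeds and every page has a "Next" link. One way to make the pipeline more forgiving is sketched below. `collect_rows` is a hypothetical helper, not part of the lab's solution: it takes the page-scraping and next-page functions as arguments, works with plain lists of rows instead of DataFrames to stay dependency-free, and stops cleanly when it runs out of pages or hits an error. The broad `except Exception` is a simplification for the sketch; real code would likely catch narrower exceptions.

```python
import time


def collect_rows(start_url, scrape_page, get_next_url, limit=500, pause=0.2):
    """Follow next-page links until `limit` rows are collected or pages run out."""
    rows = []
    url = start_url
    while url is not None and len(rows) < limit:
        try:
            rows.extend(scrape_page(url))
            url = get_next_url(url)
        except Exception as error:  # e.g. a failed request or a missing element
            print(f"Stopping at {url}: {error}")
            break
        time.sleep(pause)  # pause between requests to go easy on the server
    return rows[:limit]


# Stand-in "site": each fake URL maps to (rows on that page, next URL or None)
fake_pages = {
    "page-1": ([1, 2, 3], "page-2"),
    "page-2": ([4, 5, 6], "page-3"),
    "page-3": ([7], None),
}

first_five = collect_rows(
    "page-1",
    scrape_page=lambda url: fake_pages[url][0],
    get_next_url=lambda url: fake_pages[url][1],
    limit=5,
    pause=0,
)
print(first_five)
```

Passing the two functions in as arguments also makes the loop easy to test against stand-in data like `fake_pages` before pointing it at the live site.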


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!