# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to again practice those skills on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.
## Objectives

You will be able to:
* Create a full scraping pipeline that involves traversing over many pages of a website, dealing with errors and storing data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape all of the Events on the Given Page Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

In [23]:
html_page = requests.get("https://www.residentadvisor.net/events/us/texas/week/2020-01-31")
soup = BeautifulSoup(html_page.content, 'html.parser')
items = soup.find('div', id='event-listing')
#event_name = [a.text.strip() for a in items.findAll('h1', class_='event-title')]
#date = [p.text.rstrip('/') for p in items.findAll('p', class_='eventDate date')]
attend = [float(p.text[0]) for p in items.findAll('p', class_='attending')]
print(attend)

[3.0, 2.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]


In [24]:
def scrape_events(events_page_url):
    #Your code here
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    items = soup.find('div', id='event-listing')
    event_name = [a.text.strip() for a in items.findAll('h1', class_='event-title')]
    venues = [a.text.split(' at ') for a in items.findAll('h1', class_='event-title')]
    venue = []
    for v in venues:
        venue.append(v[1])
    event_date = [p.text.rstrip('/') for p in items.findAll('p', class_='eventDate date')]
    attendees = [float(p.text[0]) for p in items.findAll('p', class_='attending')]
    df = pd.DataFrame([event_name, venue, event_date, attendees]).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [25]:
scrape_events("https://www.residentadvisor.net/events/us/texas/week/2020-01-31")

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,Debbie Does Disco: Disco Dive Sessions PT. 6 a...,"Tradewinds Social Club, Dallas/Fort Worth","Fri, 31 Jan 2020",3.0
1,All We Have Is Now presents Will Clarke (Open ...,"Bauhaus, Houston","Sat, 01 Feb 2020",2.0
2,Sequence Process x [ Technorotica ] at It'll D...,"It'll Do, Dallas/Fort Worth","Wed, 05 Feb 2020",1.0
3,"Midwest Sessions with Jonene at Plush, Austin","Plush, Austin","Thu, 06 Feb 2020",1.0
4,Cirque Noir presents Chaim at The Pershing Hou...,"The Pershing House Gallery, Austin",,2.0
5,Massive Frequencies at Love's Marina,Love's Marina,,1.0
6,"Avision at Chily at Club Here I Love You, El Paso",Chily,,2.0
7,"Chaka Khan in Concert at Arena Theatre, Houston","Arena Theatre, Houston",,1.0
8,"Deep End at Bauhaus, Houston","Bauhaus, Houston",,1.0
9,"In Praise at Coconut Club, Austin","Coconut Club, Austin",,


## Write a Function to Retrieve the URL for the Next Page

In [26]:
def next_page(url):
    #Your code here
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    ext = soup.find('a', attrs={'ga-event-action':"Next "}).attrs['href']
    next_page = "https://www.residentadvisor.net" + ext
    return next_page

In [27]:
scrape_events(next_page("https://www.residentadvisor.net/events/us/texas/week/2020-01-31"))

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,"DJ Tennis at It'll Do, Dallas/Fort Worth","It'll Do, Dallas/Fort Worth","Fri, 07 Feb 2020",5.0
1,ubiyu x Night with JT Donaldson & Brett Johnso...,TBA - Austin,"Sat, 08 Feb 2020",2.0
2,"A Very James Reed - The Dive at The Dive, Houston","The Dive, Houston","Sun, 09 Feb 2020",1.0
3,Feb 7th - Resurgence Party - Will Konitzer 2 H...,"Plush ATX, Austin","Mon, 10 Feb 2020",1.0
4,"Abstractions at Bauhaus, Houston","Bauhaus, Houston","Wed, 12 Feb 2020",1.0
5,Ubiyu x Night with JT Donaldson & Brett Johnso...,"Secret Location, Austin","Thu, 13 Feb 2020",1.0
6,The Airport Session Electro-Sax Jams with Noah...,The San Antonio International Airport,,5.0
7,"Telefon Tel Aviv at Barracuda, Austin","Barracuda, Austin",,2.0
8,Neumatic Music and AV Expression present Mr.C ...,"AV Expression, San Antonio",,1.0
9,Luna L'Amour Full Moon DnB Party at TBA - Aust...,"TBA - Austin, Austin",,1.0


## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [28]:
#Your code here
thousand_events = []
df_length = 0
html_page = "https://www.residentadvisor.net/events/us/texas/week/2020-01-31"
while df_length <= 1000:
    df = scrape_events(html_page)
    thousand_events.append(df)
    df_length = df_length + len(thousand_events)
    html_page = next_page(html_page)
events_df = pd.concat(thousand_events)
events_df.head()

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
0,Debbie Does Disco: Disco Dive Sessions PT. 6 a...,"Tradewinds Social Club, Dallas/Fort Worth","Fri, 31 Jan 2020",3
1,All We Have Is Now presents Will Clarke (Open ...,"Bauhaus, Houston","Sat, 01 Feb 2020",2
2,Sequence Process x [ Technorotica ] at It'll D...,"It'll Do, Dallas/Fort Worth","Wed, 05 Feb 2020",1
3,"Midwest Sessions with Jonene at Plush, Austin","Plush, Austin","Thu, 06 Feb 2020",1
4,Cirque Noir presents Chaim at The Pershing Hou...,"The Pershing House Gallery, Austin",,2


In [31]:
sort_events_df = events_df.sort_values(by=['Number_of_Attendees'], ascending=False)
sort_events_df.head(15)

Unnamed: 0,Event_Name,Venue,Event_Date,Number_of_Attendees
9,The Fairmount Hotel at Rules & Regs @ The Fair...,"Rules & Regs @ The Fairmount Hotel, Austin",,8
1,"Carl Cox Invites at Stereo Live, Houston","Stereo Live, Houston","Sat, 14 Mar 2020",8
7,Range Emotion x Karmina (Feat. Word of Command...,"807 Hutchins Road, Dallas/Fort Worth",,8
8,This Is How We Do It (90s/2000s Valentines Dan...,"East Austin Hotel, Austin",,6
17,Cancelled - Denied Music Afterhours (Austin Mu...,"The Oven, Austin",,6
0,Nightmares on Wax Smokers Delight 25th Anniver...,"Mohawk, Austin","Fri, 17 Apr 2020",6
5,PY1 Nights (Closing Weekend) - Underworld with...,"PY1 Pyramid, Dallas/Fort Worth",,5
10,"Jacques Greene (dj set) at It'll Do, Dallas/Fo...","It'll Do, Dallas/Fort Worth",,5
4,Midwest Sessions presents Nathanael Stewart at...,"Plush, Austin","Wed, 18 Mar 2020",5
6,The Airport Session Electro-Sax Jams with Noah...,The San Antonio International Airport,,5


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!