# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to again practice those skills on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.
## Objectives

You will be able to:
* Create a full scraping pipeline that involves traversing over many pages of a website, dealing with errors and storing data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Load the https://www.residentadvisor.net/events page in your browser.
from bs4 import BeautifulSoup
import requests
import pandas as pd

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape all of the Events on the Given Page Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [4]:
request = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(request.content, 'html.parser')

In [44]:
soup.find_all('h1', attrs={'class':'event-title'})[2].find('span', {'class':'grey'}).text

'TBA - Brooklyn'

In [35]:
soup.find_all('p', attrs={'class':'eventDate date'})

[<p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=2&amp;yr=2020&amp;dy=9"><span>Sun, 09 Feb 2020 /</span></a></p>,
 <p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=2&amp;yr=2020&amp;dy=10"><span>Mon, 10 Feb 2020 /</span></a></p>,
 <p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=2&amp;yr=2020&amp;dy=11"><span>Tue, 11 Feb 2020 /</span></a></p>,
 <p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=2&amp;yr=2020&amp;dy=12"><span>Wed, 12 Feb 2020 /</span></a></p>,
 <p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=2&amp;yr=2020&amp;dy=13"><span>Thu, 13 Feb 2020 /</span></a></p>,
 <p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=2&amp;yr=2020&amp;dy=14"><span>Fri, 14 Feb 2020 /</span></a></p>,
 <p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=2&amp;yr=2020&amp;dy=15"><span>Sat, 15 Feb 2020 /</span></a></p>]

In [66]:
def scrape_events(events_page_url):
    #Your code here
    request = requests.get(events_page_url)
    soup = BeautifulSoup(request.content, 'html.parser')
    Event_Name = []
    Venue = []
    Event_Date = []
    Number_of_Attendees = []
    fullevents = soup.find_all('h1', attrs={'class':'event-title'})
    for n,event in enumerate(fullevents):
        Event_Name.append(event.find_all('a')[0].text)
        print(Event_Name[n])
        try:
            Venue.append(event.find_all('a')[1].text)
        except:
            Venue.append(event.find('span', {'class':'grey'}))
        try:
            Number_of_Attendees.append(event.nextSibling.nextSibling.text)
        except:
            Number_of_Attendees.append(0)
    print(len(Event_Name))
    print(len(Venue))
    print(len(Number_of_Attendees))
    d = {'Event_Name':Event_Name, 'Venue':Venue, 'Number_of_Attendees':Number_of_Attendees}
    df = pd.DataFrame.from_dict(d)
    return df

In [67]:
df = scrape_events('https://www.residentadvisor.net/events')

Weird Science n.18 with Antenes, DJ Voices, Amourette
dweller: Kfeelz, Bryn Barnett and Titonton Duvante
Night-Walkers *Saeed Younan
Dweller
dweller: Make Techno Black Again presents Black Sound
dweller: Black Unicorn with Adam R, DJ Wawa, Getty
Sunday Soiree: Chad Andrew, Hunter Vita, Julius
Rave Scout Cookies x Dancesafe #001: NYC
Paradisco by Occupy The Disco
Public Records presents Unity & Rythm
Rare Frequency Transmissions Happy Hour
Birdcage with Steve Travolta
Dxrk Mxttr with Marcus Logan
Ultramaroon - House Music Sundays
Public Records presents Unity & Rythm
Party Party Party
Dan Demerit // Brälle
The Office presents: Shhh Music By: Alex Ander
MASHT NYC - Spin D
Tempo with Shawn Dub, Napoleon, Pablo Romero
Small Rave - Uklon Edition
Bklyn Buttr with Jadalareign, DJ 9AM, Yung Moisture + Special Guests
After Work-Salsa, Classics, Disco, Dance
Feel Loud In NYC
Cool Runnings with Carlos Sanchez Movement
House of Vogue with MikeQ & Qween Beat
Emotion Detected No.1 David Berrie
Open 

In [68]:
df.head()

Unnamed: 0,Event_Name,Venue,Number_of_Attendees
0,"Weird Science n.18 with Antenes, DJ Voices, Am...",Magick City,68 Attending
1,"dweller: Kfeelz, Bryn Barnett and Titonton Duv...",Nowadays,20 Attending
2,Night-Walkers *Saeed Younan,[TBA - Brooklyn],2 Attending
3,Dweller,[TBA - Brooklyn],60 Attending
4,dweller: Make Techno Black Again presents Blac...,Bossa Nova Civic Club,0


## Write a Function to Retrieve the URL for the Next Page

In [None]:
def next_page(url):
    #Your code here
    return next_page_url

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [None]:
#Your code here

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!