# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site.
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

Start by navigating to the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date, and Number_of_Attendees.

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import re
import time

In [2]:
events_page_url = "https://www.residentadvisor.net/events/us/newyork/week/2019-07-18"
html_page = requests.get(events_page_url)
soup = BeautifulSoup(html_page.content, "html.parser")

event_listings = soup.find('div', id="event-listing")
event_listings

<div class="fl col4" id="event-listing">
<ul class="list" id="items">
<li><p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=7&amp;yr=2019&amp;dy=18"><span>Thu, 18 Jul 2019 /</span></a></p></li><li class=""><article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event"><span style="display:none;"><time datetime="2019-07-18T00:00" itemprop="startDate">2019-07-18T00:00</time></span><a href="/events/1292215"><img height="76" src="/images/events/flyer/2019/7/us-0718-1292215-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1292215" itemprop="url" title="Event details of Caché presents Mariana: JKriv, DGRO, Gian-Paul, Kate, CGC">Caché presents Mariana: JKriv, DGRO, Gian-Paul, Kate, CGC</a> <span>at <a href="/club.aspx?id=170696">The William Vale</a></span></h1><div class="grey event-lineup">JKriv, DGRO, Gian-Paul, Kate Garvey, CGC</div><p class="attending"><span>24</span> Attending</p></

In [3]:
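def scrape_events(events_page_url):
    """Return a DataFrame of the events listed on one events page."""
    columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    results = []
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, "html.parser")

    # Look up the listing container in this page's soup instead of relying
    # on the global event_listings variable from the previous cell.
    event_listings = soup.find('div', id="event-listing")
    if event_listings is None:
        return pd.DataFrame(columns=columns)

    cur_date = None
    for entry in event_listings.find_all('li'):
        date = entry.find('p', class_="eventDate date")
        event = entry.find('h1', class_="event-title")
        if date:
            # A date row applies to every event row that follows it.
            cur_date = pd.to_datetime(date.text.split("/")[0].strip())
        elif event:
            # The title text reads "<name> at <venue>"; split on the last
            # " at " so that names containing "at" stay intact.
            title = event.text.strip()
            name, sep, venue = title.rpartition(" at ")
            if not sep:
                name, venue = title, None
            try:
                n_attendees = int(entry.find('p', class_="attending").text.split(" ")[0])
            except (AttributeError, ValueError):
                # No attendance element, or its text is not a number.
                n_attendees = 0

            results.append([name.strip(), venue, cur_date, n_attendees])

    return pd.DataFrame(results, columns=columns)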
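
The string handling inside `scrape_events` can be checked without touching the network. The sketch below pulls the three parsing steps into small helpers that use only the standard library. The helper names are hypothetical (they are not part of the lab), and the inputs are assumed to look like the text in the HTML shown earlier, such as `Thu, 18 Jul 2019 /` and `24 Attending`.

```python
import re
from datetime import datetime


def split_title_and_venue(title_text):
    """Split 'Name at Venue' on the last ' at ' (hypothetical helper)."""
    title = title_text.strip()
    name, sep, venue = title.rpartition(" at ")
    if not sep:
        # No ' at ' separator found: treat the whole text as the name.
        return title, None
    return name.strip(), venue.strip()


def parse_attendees(attending_text):
    """Return the first integer in text such as '24 Attending', else 0."""
    match = re.search(r"\d+", attending_text or "")
    return int(match.group()) if match else 0


def parse_event_date(date_text):
    """Parse text such as 'Thu, 18 Jul 2019 /' into a datetime."""
    cleaned = date_text.split("/")[0].strip()
    # Assumes English day and month abbreviations, as on the page above.
    return datetime.strptime(cleaned, "%a, %d %b %Y")


print(split_title_and_venue("DJ Set: Galcher Lustwerk at Sister City"))
print(parse_attendees("24 Attending"))
print(parse_event_date("Thu, 18 Jul 2019 /"))
```

Splitting on the last ` at ` is a heuristic: it keeps event names such as "Beat Street" intact, but it would misplace the boundary for a venue whose own name contains ` at `.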

In [4]:
scrape_events(events_page_url).head()

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Caché presents Mariana: JKriv, DGRO, Gian-Paul... | The William Vale | 2019-07-18 | 24 |
| 1 | Nano Mutek presents: Akufen [All Night Long] | public records | 2019-07-18 | 21 |
| 2 | DJ Set: Galcher Lustwerk | Sister City | 2019-07-18 | 19 |
| 3 | Seemingly Normal People with Hunter Vita, Rob ... | TBA Brooklyn | 2019-07-18 | 8 |
| 4 | Co Lab 001 | Secret Room NYC | 2019-07-18 | 4 |


## Write a Function to Retrieve the URL for the Next Page

In [7]:
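def next_page(url):
    """Return the absolute URL of the page after `url`, or None if there is none."""
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, "html.parser")

    # The value "Next " is matched exactly, trailing space included.
    next_link = soup.find('a', attrs={'ga-event-action': "Next "})
    if next_link is None:
        return None
    # The href is site-relative, so prepend the site root.
    next_page_url = "https://www.residentadvisor.net" + next_link.attrs['href']
    return next_page_url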
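
Concatenating the site root and the `href` works as long as the link is site-relative. As a hedged alternative, the standard library's `urljoin` also copes with links that are already absolute. In this sketch `absolute_url` and `BASE_URL` are illustrative names, and the sample href is the event link from the HTML shown earlier.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.residentadvisor.net"  # site root used in this lab


def absolute_url(href, base=BASE_URL):
    """Resolve an href against the site root (hypothetical helper)."""
    if href is None:
        # No link was found on the page, so there is nothing to resolve.
        return None
    return urljoin(base, href)


print(absolute_url("/events/1292215"))
```

Inside `next_page`, the same call could replace the string concatenation once the `href` has been read from the link.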
 

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
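
The ordering described here is a two-key sort. The sketch below shows it on a few made-up rows with the standard library. It assumes "sorted by the number of attendees" means largest first, with the earlier date winning a tie; the lab text does not state a direction.

```python
from datetime import date

# Made-up sample rows, for illustration only.
events = [
    {"Event_Name": "Event A", "Event_Date": date(2019, 7, 20), "Number_of_Attendees": 21},
    {"Event_Name": "Event B", "Event_Date": date(2019, 7, 18), "Number_of_Attendees": 24},
    {"Event_Name": "Event C", "Event_Date": date(2019, 7, 19), "Number_of_Attendees": 21},
]

# Negating the count sorts it in descending order, while the date
# stays ascending and only decides between rows with equal counts.
ranked = sorted(events, key=lambda e: (-e["Number_of_Attendees"], e["Event_Date"]))

for event in ranked:
    print(event["Event_Name"], event["Number_of_Attendees"], event["Event_Date"])
```

On a DataFrame with the columns used in this lab, `df.sort_values(["Number_of_Attendees", "Event_Date"], ascending=[False, True])` expresses the same ordering.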

In [8]:
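#Your code here
url = 'https://www.residentadvisor.net/events/us/newyork'
dfs = []
rows = 0

# Keep following "next page" links until at least 1000 events are collected.
while url and rows < 1000:
    df = scrape_events(url)
    dfs.append(df)
    rows += len(df)

    url = next_page(url)
    time.sleep(.2)  # short pause between requests to be polite to the server

# Combine the per-page frames and keep only the first 1000 rows.
df = pd.concat(dfs, ignore_index=True)
df = df.iloc[:1000]
print(rows)
print(len(df))
df.head()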

1078
1000


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Caché presents Mariana: JKriv, DGRO, Gian-Paul... | The William Vale | 2019-07-18 | 24 |
| 1 | Nano Mutek presents: Akufen [All Night Long] | public records | 2019-07-18 | 21 |
| 2 | DJ Set: Galcher Lustwerk | Sister City | 2019-07-18 | 19 |
| 3 | Seemingly Normal People with Hunter Vita, Rob ... | TBA Brooklyn | 2019-07-18 | 8 |
| 4 | Co Lab 001 | Secret Room NYC | 2019-07-18 | 4 |
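
A collection loop like the one in this section runs until it has enough rows, so it can spin for a long time if pages come back empty or the "next" link disappears. The sketch below shows one way to bound it. `collect_rows`, `max_pages`, and the fake pages are hypothetical names introduced for illustration; in the lab, `scrape_page` would wrap `scrape_events` and `get_next_url` would be `next_page`.

```python
import time


def collect_rows(start_url, scrape_page, get_next_url,
                 target=1000, max_pages=200, delay=0.2):
    """Follow next-page links until `target` rows, no next URL, or `max_pages`."""
    rows = []
    url = start_url
    pages = 0
    while url and len(rows) < target and pages < max_pages:
        rows.extend(scrape_page(url))
        pages += 1
        url = get_next_url(url)
        if url:
            time.sleep(delay)  # pause between requests
    return rows[:target]


# Fake three-page site: each key maps to (rows on the page, next page or None).
fake_pages = {
    "p1": ([1, 2, 3], "p2"),
    "p2": ([4, 5], "p3"),
    "p3": ([6], None),
}

rows = collect_rows(
    "p1",
    scrape_page=lambda u: fake_pages[u][0],
    get_next_url=lambda u: fake_pages[u][1],
    target=4,
    delay=0,
)
print(rows)  # [1, 2, 3, 4]
```

Passing the two page functions in as arguments keeps the loop testable without network access, which is why the example can run against a dictionary of fake pages.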


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!