<a href="https://colab.research.google.com/github/tinkvu/Data-Acquisition-Dublin.ie/blob/main/DublinIE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Acquisition of Dublin.ie

## Introduction

Welcome to the **Dublin.ie** Data Acquisition Project!
This initiative is dedicated to capturing the heartbeat of Dublin through its vibrant and diverse events. The focus of this project is to extract information about upcoming events in Dublin from the "https://dublin.ie/whats-on/" page. By leveraging web scraping and Beautiful Soup within a Jupyter Notebook on Google Colab, I aim to provide a comprehensive overview of the exciting events that contribute to Dublin's dynamic cultural scene.


**Project Objective**

Dublin.ie serves as the primary source for our data acquisition project, specifically targeting the "What's On" section. This is where Dublin's upcoming events are showcased, ranging from cultural festivals and entertainment to educational seminars.
This project collects the data available on the website on the date the notebook is run and stores it in a convenient tabular form, ready for different kinds of processing and analysis.

**The Essence of Dublin Events**

Dublin is a city that thrives on its cultural richness, and events play a pivotal role in shaping its character. Each event contributes to the diverse and dynamic tapestry of Dublin's cultural landscape. This project aspires to capture the essence of Dublin through the lens of its upcoming events, providing a glimpse into the vibrant and multifaceted activities that define the city.

**How It Works**

The project is organized into sections within a Jupyter Notebook, with a dedicated focus on extracting and processing event information. Users are encouraged to explore the notebook to uncover details about specific events, venues, dates, and other relevant information. The project's value lies in providing a snapshot, as of the date the notebook is run, of the events that make Dublin an exciting and lively city.

**Join the Exploration**

Whether you are planning your visit to Dublin, a local seeking new experiences, or a researcher interested in Dublin's cultural scene, this project invites you to join the exploration of upcoming events. As we delve into the data, let's uncover the richness of Dublin's events and celebrate the city's cultural diversity.

Let the event exploration begin!

## Data Acquisition Code

Let's start grabbing the data.

First, we import the requests library to connect to the website.

We will use the BeautifulSoup package to parse the HTML in Python.

Before getting started, we have to visit the target website and check how the page structures and passes its data; only after that inspection can we decide which scraping method to use.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

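url = "https://dublin.ie/whats-on/"  # URL of the page to scrape
resp = requests.get(url, timeout=30)  # Timeout so the cell cannot hang indefinitely
resp.raise_for_status()  # Stop early if the server returns an HTTP error
soup = BeautifulSoup(resp.text, 'html.parser')  # 'soup' holds the parsed HTML of the page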
soup

In [None]:
events = []  # List that will hold one dictionary per scraped event

# The next two lines locate the part of the page that holds the event cards
events_container = soup.find('div', class_='articles events-listing autofit')
all_events = events_container.find_all('article', class_='event card')

# Loop over every event card and pull out the fields we need
for event in all_events:

    event_link = event.find('a', class_='overlay')['href']
    image_url = event.find('img')['src']
    event_name = event.find('h2').text
    location = event.find('p', class_='location').text.strip()
    date_range = event.find('time').text.strip()
    summary = event.find('div', class_='summary').find('p').text.strip()
    category = event['data-categories']

    # Append the collected fields to the list we created
    events.append({
        'Event Name': event_name,
        'Category': category,
        'Location': location,
        'Date Range': date_range,
        'Description': summary,
        'Image URL': image_url,
        'Event Link': event_link
    })

# Turn the list into a DataFrame
events = pd.DataFrame(events)
print(events.shape)
events.head()

(114, 7)


Unnamed: 0,Event Name,Category,Location,Date Range,Description,Image URL,Event Link
0,Describe the Night,art-and-theatre,Glass Mask Theatre,Mon 20th Nov - Sat 9th Dec,A theatrical odyssey spanning 90 years. A desp...,https://cdn.dublin.ie/wp-content/uploads/WO-Gl...,https://dublin.ie/whats-on/listings/describe-t...
1,Really Good Time,music-and-comedy,The Sound House,Sat 9th Dec,"Really Good Time is a four-piece, apres-garde,...",https://cdn.dublin.ie/wp-content/uploads/WO-So...,https://dublin.ie/whats-on/listings/really-goo...
2,Gifted – The Contemporary Craft & Design Fair,"art-and-theatre,christmas,family-friendly,food...",RDS,Wed 6th Dec - Sun 10th Dec,"The Iconic Christmas Craft Fair is Back, Featu...",https://cdn.dublin.ie/wp-content/uploads/WO-Gi...,https://dublin.ie/whats-on/listings/gifted-the...
3,Sustainable Christmas Craft Market,christmas,National Botanic Gardens,Sat 9th Dec - Sun 10th Dec,The traditional Christmas Craft Market returns...,https://cdn.dublin.ie/wp-content/uploads/WO-Bo...,https://dublin.ie/whats-on/listings/christmas-...
4,A Festive Evening with Sing Ireland,"christmas,family-friendly,music-and-comedy",EPIC The Irish Emigration Museum,Sun 19th Nov - Sun 10th Dec,"This Christmas, EPIC Museum are glad to partne...",https://cdn.dublin.ie/wp-content/uploads/WO-EP...,https://dublin.ie/whats-on/listings/a-festive-...


Wuhoooo!

We now have data for 114 events from the website, with all the key fields such as name, category, location and dates!
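
The scraping loop above indexes straight into the result of each `find` call, so a single card without, say, a location would raise an error and stop the whole run. The following is a minimal sketch of a more forgiving version, not the notebook's own code. The sample HTML is hypothetical and modelled only on the selectors used above, and `text_or_none`, `attr_or_none` and `parse_events` are illustrative helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled only on the selectors used in the scraping cell.
SAMPLE_HTML = """
<div class="articles events-listing autofit">
  <article class="event card" data-categories="christmas,free-events">
    <a class="overlay" href="https://example.com/a"></a>
    <h2>Winter Market</h2>
    <p class="location">Some Venue</p>
    <time>Sat 9th Dec - Sun 10th Dec</time>
  </article>
  <article class="event card" data-categories="learning">
    <h2>Talk Without Venue</h2>
    <time>Sat 9th Dec</time>
  </article>
</div>
"""


def text_or_none(parent, name, **kwargs):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, **kwargs)
    return tag.get_text(strip=True) if tag else None


def attr_or_none(parent, name, attr, **kwargs):
    """Return an attribute of the first matching tag, or None if absent."""
    tag = parent.find(name, **kwargs)
    return tag.get(attr) if tag else None


def parse_events(html):
    """Collect one dictionary per event card, tolerating missing fields."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for event in soup.find_all('article', class_='event card'):
        rows.append({
            'Event Name': text_or_none(event, 'h2'),
            'Category': event.get('data-categories', ''),
            'Location': text_or_none(event, 'p', class_='location'),
            'Date Range': text_or_none(event, 'time'),
            'Event Link': attr_or_none(event, 'a', 'href', class_='overlay'),
        })
    return rows


rows = parse_events(SAMPLE_HTML)
```

Here the second card has no location or link, so those fields come back as `None` instead of interrupting the loop; the missing values can then be handled later in pandas.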

**Data Preprocessing:**

The data we collected needs some processing before it is in a form suitable for analysis, so here the preprocessing steps begin.

Let us convert the Date Range into DateTime objects.

In [None]:
from dateutil import parser

# Split a date range such as 'Mon 20th Nov - Sat 9th Dec' into a start and an end date.
# Note: the strings carry no year, so dateutil fills in the current year; a range that
# crosses New Year can therefore end up with an end date earlier than its start date.
def clean_date(date_range):
    date_list = date_range.split('-')
    sDate = parser.parse(date_list[0].strip())
    eDate = parser.parse(date_list[-1].strip()) if len(date_list) > 1 else sDate
    return pd.Series({'sDate': sDate, 'eDate': eDate})

# Apply the function to the 'Date Range' column
events = pd.concat([events, events['Date Range'].apply(clean_date)], axis=1)
events.drop('Date Range', axis=1, inplace=True)
df = events

# The Date Range column is now replaced by two columns, sDate and eDate
df[["sDate", "eDate"]] = df[["sDate", "eDate"]].apply(pd.to_datetime)
df.head()

Unnamed: 0,Event Name,Category,Location,Description,Image URL,Event Link,sDate,eDate
0,Describe the Night,art-and-theatre,Glass Mask Theatre,A theatrical odyssey spanning 90 years. A desp...,https://cdn.dublin.ie/wp-content/uploads/WO-Gl...,https://dublin.ie/whats-on/listings/describe-t...,2023-11-20,2023-12-09
1,Really Good Time,music-and-comedy,The Sound House,"Really Good Time is a four-piece, apres-garde,...",https://cdn.dublin.ie/wp-content/uploads/WO-So...,https://dublin.ie/whats-on/listings/really-goo...,2023-12-09,2023-12-09
2,Gifted – The Contemporary Craft & Design Fair,"art-and-theatre,christmas,family-friendly,food...",RDS,"The Iconic Christmas Craft Fair is Back, Featu...",https://cdn.dublin.ie/wp-content/uploads/WO-Gi...,https://dublin.ie/whats-on/listings/gifted-the...,2023-12-06,2023-12-10
3,Sustainable Christmas Craft Market,christmas,National Botanic Gardens,The traditional Christmas Craft Market returns...,https://cdn.dublin.ie/wp-content/uploads/WO-Bo...,https://dublin.ie/whats-on/listings/christmas-...,2023-12-09,2023-12-10
4,A Festive Evening with Sing Ireland,"christmas,family-friendly,music-and-comedy",EPIC The Irish Emigration Museum,"This Christmas, EPIC Museum are glad to partne...",https://cdn.dublin.ie/wp-content/uploads/WO-EP...,https://dublin.ie/whats-on/listings/a-festive-...,2023-11-19,2023-12-10
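
The date strings on the page carry no year, and `dateutil` fills a missing year from the current date. A range that runs from December into January therefore gets an end date in the wrong year, which is why some rows in the later outputs show an `eDate` such as 2023-01-07 next to an `sDate` of 2023-12-07. One possible way around this is sketched below. It assumes that the start date belongs to the year in which the data was scraped and that no event lasts longer than a year; `parse_date_range` and its `scrape_date` parameter are illustrative names, not part of the notebook.

```python
from datetime import datetime

from dateutil import parser
from dateutil.relativedelta import relativedelta


def parse_date_range(date_range, scrape_date):
    """Parse a range like 'Thu 7th Dec - Sun 7th Jan' into (start, end).

    Assumption: the start date falls in the year of scrape_date. If the end
    date would land before the start, it is moved into the following year.
    """
    # Fix the year explicitly instead of relying on the current date.
    default = datetime(scrape_date.year, 1, 1)
    parts = [part.strip() for part in date_range.split('-')]
    start = parser.parse(parts[0], default=default)
    end = parser.parse(parts[-1], default=default) if len(parts) > 1 else start
    if end < start:
        end = end + relativedelta(years=1)
    return start, end


start, end = parse_date_range('Thu 7th Dec - Sun 7th Jan', datetime(2023, 12, 9))
single_start, single_end = parse_date_range('Sat 9th Dec', datetime(2023, 12, 9))
```

With this rule the example range ends on 7 January 2024 rather than 7 January 2023, so date filters no longer treat it as an event that finished before it began.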


Now, let us restructure the data in the **Category column**. The column contains several different categories, and some rows list more than one. Converting each entry into a dictionary that holds a list of categories will help us in the analysis.

In [None]:
def convert_to_dict(category):
    categories = category.split(',')
    return {'category': categories}

# Apply the function to the 'Category' column
df['Category'] = df['Category'].apply(convert_to_dict)

In [None]:
df.head()

Unnamed: 0,Event Name,Category,Location,Description,Image URL,Event Link,sDate,eDate
0,Describe the Night,{'category': ['art-and-theatre']},Glass Mask Theatre,A theatrical odyssey spanning 90 years. A desp...,https://cdn.dublin.ie/wp-content/uploads/WO-Gl...,https://dublin.ie/whats-on/listings/describe-t...,2023-11-20,2023-12-09
1,Really Good Time,{'category': ['music-and-comedy']},The Sound House,"Really Good Time is a four-piece, apres-garde,...",https://cdn.dublin.ie/wp-content/uploads/WO-So...,https://dublin.ie/whats-on/listings/really-goo...,2023-12-09,2023-12-09
2,Gifted – The Contemporary Craft & Design Fair,"{'category': ['art-and-theatre', 'christmas', ...",RDS,"The Iconic Christmas Craft Fair is Back, Featu...",https://cdn.dublin.ie/wp-content/uploads/WO-Gi...,https://dublin.ie/whats-on/listings/gifted-the...,2023-12-06,2023-12-10
3,Sustainable Christmas Craft Market,{'category': ['christmas']},National Botanic Gardens,The traditional Christmas Craft Market returns...,https://cdn.dublin.ie/wp-content/uploads/WO-Bo...,https://dublin.ie/whats-on/listings/christmas-...,2023-12-09,2023-12-10
4,A Festive Evening with Sing Ireland,"{'category': ['christmas', 'family-friendly', ...",EPIC The Irish Emigration Museum,"This Christmas, EPIC Museum are glad to partne...",https://cdn.dublin.ie/wp-content/uploads/WO-EP...,https://dublin.ie/whats-on/listings/a-festive-...,2023-11-19,2023-12-10


Now let us export the data to a comma-separated values (CSV) file.

In [None]:
df.to_csv('Dublin.ie Events.csv')
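
One thing to keep in mind when the CSV is loaded again: a CSV file stores only text, so the Category dictionaries and the dates come back as plain strings unless they are converted on the way in. The sketch below shows one way to round-trip the data. It uses a small made-up DataFrame and an in-memory buffer in place of the real file, and `index=False` is an optional choice that avoids an extra unnamed index column on reload.

```python
import ast
import io

import pandas as pd

# Small made-up frame with the same column types as the notebook's df.
df = pd.DataFrame({
    'Event Name': ['Sample Event'],
    'Category': [{'category': ['christmas', 'free-events']}],
    'sDate': pd.to_datetime(['2023-12-09']),
    'eDate': pd.to_datetime(['2023-12-10']),
})

# Write to an in-memory buffer instead of a file for this illustration.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Rebuild the dictionaries and the datetime columns while reading.
restored = pd.read_csv(
    buffer,
    parse_dates=['sDate', 'eDate'],
    converters={'Category': ast.literal_eval},
)
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for turning the stored text back into a dictionary.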

## Filtering events by date

Now let's do some filtering.

What if I want to know about the events between two specific dates?

In [None]:
start_date = pd.to_datetime('2023-12-01')  # Any date in the format YYYY-MM-DD
end_date = pd.to_datetime('2023-12-10')  # The end date, in the same format
# Keep the events that start on or after start_date and end on or before end_date
filtDf = df[(df['sDate'] >= start_date) & (df['eDate'] <= end_date)]
filtDf

Unnamed: 0,Event Name,Category,Location,Description,Image URL,Event Link,sDate,eDate
1,Really Good Time,{'category': ['music-and-comedy']},The Sound House,"Really Good Time is a four-piece, apres-garde,...",https://cdn.dublin.ie/wp-content/uploads/WO-So...,https://dublin.ie/whats-on/listings/really-goo...,2023-12-09,2023-12-09
2,Gifted – The Contemporary Craft & Design Fair,"{'category': ['art-and-theatre', 'christmas', ...",RDS,"The Iconic Christmas Craft Fair is Back, Featu...",https://cdn.dublin.ie/wp-content/uploads/WO-Gi...,https://dublin.ie/whats-on/listings/gifted-the...,2023-12-06,2023-12-10
3,Sustainable Christmas Craft Market,{'category': ['christmas']},National Botanic Gardens,The traditional Christmas Craft Market returns...,https://cdn.dublin.ie/wp-content/uploads/WO-Bo...,https://dublin.ie/whats-on/listings/christmas-...,2023-12-09,2023-12-10
5,Stillgarden Christmas Market,"{'category': ['business-and-tech', 'christmas'...",Stillgarden Distillery,A Christmas Market in the heart of Dublin 8 sh...,https://cdn.dublin.ie/wp-content/uploads/WO-St...,https://dublin.ie/whats-on/listings/stillgarde...,2023-12-09,2023-12-10
7,The Fumbally Christmas Market,"{'category': ['business-and-tech', 'christmas']}",The Fumbally,This year's market is a phenomenal cross secti...,https://cdn.dublin.ie/wp-content/uploads/WO-Th...,https://dublin.ie/whats-on/listings/the-fumbal...,2023-12-08,2023-12-10
8,IAFF – Ibero-American Film Festival Dublin,"{'category': ['festivals', 'film-and-literatur...",UCD Cinema,"From December 6 to 10, 2023, Dublin is gearing...",https://cdn.dublin.ie/wp-content/uploads/WO-UC...,https://dublin.ie/whats-on/listings/iaff-ibero...,2023-12-06,2023-12-10
9,The Irish National Youth Ballet perform Cinder...,"{'category': ['art-and-theatre', 'christmas']}",Samuel Beckett Theatre,The Irish National Youth Ballet perform Cinder...,https://cdn.dublin.ie/wp-content/uploads/WO-Sa...,https://dublin.ie/whats-on/listings/the-irish-...,2023-12-08,2023-12-10
37,Sleeping Beauty,"{'category': ['art-and-theatre', 'christmas', ...",Mill Theatre,"“Sleep my Beauty, that’s what I decree. No pre...",https://cdn.dublin.ie/wp-content/uploads/WO-Mi...,https://dublin.ie/whats-on/listings/sleeping-b...,2023-12-07,2023-01-07
40,Cinderella- The Civic Panto 2023,"{'category': ['art-and-theatre', 'christmas', ...",Civic Theatre,The Fairy Godmother Of All Pantos!\nThe team w...,https://cdn.dublin.ie/wp-content/uploads/WO-Ci...,https://dublin.ie/whats-on/listings/cinderella...,2023-12-06,2023-01-07
41,Charlie and the Chocolate Factory – The Musical,"{'category': ['christmas', 'family-friendly', ...",Bord Gáis Energy Theatre,Escape to a world of pure imagination with ROA...,https://cdn.dublin.ie/wp-content/uploads/WO-BG...,https://dublin.ie/whats-on/listings/charlie-an...,2023-12-05,2023-01-07
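
The filter above keeps only events that lie entirely inside the chosen window. An event that started earlier but is still running during the window is left out. If the question is instead "what is on at any point between these two dates?", an overlap test may be a better fit. The sketch below compares both conditions on a small made-up table; the event names and dates are invented for illustration.

```python
import pandas as pd

# Made-up events to illustrate the two filter conditions.
events = pd.DataFrame({
    'Event Name': ['Long run', 'Inside window', 'Already over', 'Not yet started'],
    'sDate': pd.to_datetime(['2023-11-20', '2023-12-06', '2023-11-01', '2023-12-15']),
    'eDate': pd.to_datetime(['2023-12-09', '2023-12-10', '2023-11-30', '2023-12-20']),
})

start_date = pd.Timestamp('2023-12-01')
end_date = pd.Timestamp('2023-12-10')

# Fully contained: starts and ends inside the window (the notebook's condition).
contained = events[(events['sDate'] >= start_date) & (events['eDate'] <= end_date)]

# Overlapping: starts before the window closes and ends after it opens.
overlapping = events[(events['sDate'] <= end_date) & (events['eDate'] >= start_date)]
```

In this example `contained` holds only "Inside window", while `overlapping` also includes "Long run", which began in November but is still on during the first week of December.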


## Filtering by categories

Now what if I want to filter this by a specific category?

Let's see which categories appear in the column:

In [None]:
categories_list = df['Category'].apply(lambda x: x.get('category', []))
categories_flat = [category for sublist in categories_list for category in sublist]
unique_categories = set(categories_flat)
unique_categories

{'art-and-theatre',
 'business-and-tech',
 'christmas',
 'family-friendly',
 'festivals',
 'film-and-literature',
 'food-and-drink',
 'free-events',
 'learning',
 'music-and-comedy',
 'online',
 'sports'}

There are 12 different categories in the data.

Now comes the question: how do we select only the **free events**?

In [None]:
desired_categories = ['free-events']
filtDf = df[df['Category'].apply(lambda x: any(category in x['category'] for category in desired_categories))]
filtDf

Unnamed: 0,Event Name,Category,Location,Description,Image URL,Event Link,sDate,eDate
5,Stillgarden Christmas Market,"{'category': ['business-and-tech', 'christmas'...",Stillgarden Distillery,A Christmas Market in the heart of Dublin 8 sh...,https://cdn.dublin.ie/wp-content/uploads/WO-St...,https://dublin.ie/whats-on/listings/stillgarde...,2023-12-09,2023-12-10
6,The Enchanted Garden,"{'category': ['christmas', 'family-friendly', ...",Shackleton Garden,Illuminate your nights with the enchanting Fes...,https://cdn.dublin.ie/wp-content/uploads/WO-Sh...,https://dublin.ie/whats-on/listings/the-enchan...,2023-11-20,2023-12-10
15,Christmas at Farmleigh,"{'category': ['christmas', 'family-friendly', ...",Farmleigh House & Estate,Welcome to the Farmleigh Christmas Programme f...,https://cdn.dublin.ie/wp-content/uploads/WO-Xm...,https://dublin.ie/whats-on/listings/christmas-...,2023-12-02,2023-12-17
16,Bremore Castle Christmas Market,"{'category': ['christmas', 'family-friendly', ...",Bremore Castle,The perfect excuse to get into the festive spi...,https://cdn.dublin.ie/wp-content/uploads/WO-Br...,https://dublin.ie/whats-on/listings/balbriggan...,2023-11-26,2023-12-17
19,Christmas at the Castle,"{'category': ['christmas', 'family-friendly', ...",Dublin Castle,Come and join us this festive season to experi...,https://cdn.dublin.ie/wp-content/uploads/WO-Ch...,https://dublin.ie/whats-on/listings/christmas-...,2023-12-06,2023-12-19
30,Live Animal Crib at the Mansion House,"{'category': ['christmas', 'family-friendly', ...",The Mansion House,The Lord Mayor of Dublin Daithí de Róiste woul...,https://cdn.dublin.ie/wp-content/uploads/WO-Li...,https://dublin.ie/whats-on/listings/live-anima...,2023-12-06,2023-12-24
32,Dublin City Council Dublin Winter Lights,"{'category': ['christmas', 'family-friendly', ...",Dublin,Dublin City Council is delighted to announce t...,https://cdn.dublin.ie/wp-content/uploads/WO-Du...,https://dublin.ie/whats-on/listings/winter-lig...,2023-12-01,2023-12-31
38,The Moving Crib,"{'category': ['christmas', 'family-friendly', ...",St Martin Apostolate,This year The Moving Crib will be opening it’s...,https://cdn.dublin.ie/wp-content/uploads/WO-Th...,https://dublin.ie/whats-on/listings/the-moving...,2023-11-28,2023-01-07
51,Solidarity – The Dockers of Dublin Port,"{'category': ['art-and-theatre', 'film-and-lit...",The Substation,Dublin Port Company presents Solidarity - The ...,https://cdn.dublin.ie/wp-content/uploads/WO-So...,https://dublin.ie/whats-on/listings/solidarity...,2023-11-23,2023-02-04
52,‘Medici Lion’ by Siobhán Hapaska,"{'category': ['art-and-theatre', 'free-events']}",The Douglas Hyde Gallery,The Douglas Hyde is delighted to present a maj...,https://cdn.dublin.ie/wp-content/uploads/WO-Do...,https://dublin.ie/whats-on/listings/medici-lio...,2023-12-01,2023-03-10


All the events above are free! You can search for any other category by changing the value of desired_categories in the cell above.

## Filtering by location

In [None]:
df['Location'].unique()

array(['Glass Mask Theatre', 'The Sound House', 'RDS',
       'National Botanic Gardens', 'EPIC The Irish Emigration Museum',
       'Stillgarden Distillery', 'Shackleton Garden', 'The Fumbally',
       'UCD Cinema', 'Samuel Beckett Theatre', 'Vicar Street',
       'Central Bank of Ireland', 'Richmond Barracks',
       'Goldenbridge Cemetery', 'Temple Bar Gallery + Studios',
       'Farmleigh House & Estate', 'Bremore Castle', 'Central Plaza',
       'The Casino Model Railway Museum', 'Dublin Castle',
       'Dalkey Castle & Heritage Centre', 'Smock Alley Theatre',
       'Croke Park', 'Merrion Square Park', 'Airfield Estate',
       'Dún Laoghaire', 'Malahide Castle', 'Leaves from Essex Street',
       "Bewley's Cafe Theatre", 'The Mart Gallery', 'The Mansion House',
       'The Lark Concert Hall', 'Dublin', 'The Ark', 'Millbank Theatre',
       'Dublin Zoo', 'Mill Theatre', 'St Martin Apostolate',
       'Gaiety Theatre', 'Civic Theatre', 'Bord Gáis Energy Theatre',
       'Gate Thea

Let's say I want to see all the events taking place at the 'National Concert Hall' between 25 December 2023 and 2 January 2024.



In [None]:
start_date = pd.to_datetime('2023-12-25')
end_date = pd.to_datetime('2024-01-02')
filtDf = df[(df['Location'] == 'National Concert Hall') & (df['sDate'] >= start_date) & (df['eDate'] <= end_date)]
filtDf


Unnamed: 0,Event Name,Category,Location,Description,Image URL,Event Link,sDate,eDate
83,The Sound of Music,"{'category': ['art-and-theatre', 'christmas', ...",National Concert Hall,"After a sensational sell-out run last year, th...",https://cdn.dublin.ie/wp-content/uploads/WO-NC...,https://dublin.ie/whats-on/listings/the-sound-...,2023-12-27,2023-01-02
86,New Year’s Eve Extravaganza: Gatsby & Beyond,{'category': ['music-and-comedy']},National Concert Hall,Get your jazz hands at the ready and ring in 2...,https://cdn.dublin.ie/wp-content/uploads/WO-NC...,https://dublin.ie/whats-on/listings/new-years-...,2023-12-31,2023-12-31


So, that's all we did in this project.

Thank you so much for being part of this journey.

Feel free to contact me (rinshaadc@gmail.com) with any queries or concerns about this project.

In [None]:
# Recommendation: try multi-hot encoding for the Category column
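
As a rough idea of what that recommendation could look like, the sketch below turns the Category dictionaries into one 0/1 column per category. It is only one possible approach, shown on a made-up three-row frame; the variable names `labels`, `multi_hot` and `encoded` are illustrative.

```python
import pandas as pd

# Made-up rows using the same Category structure as the notebook's df.
df = pd.DataFrame({
    'Event Name': ['A', 'B', 'C'],
    'Category': [
        {'category': ['christmas', 'free-events']},
        {'category': ['music-and-comedy']},
        {'category': ['christmas']},
    ],
})

# One row per (event, category) pair; the original index is repeated.
labels = df['Category'].apply(lambda d: d['category']).explode()

# One indicator column per category, collapsed back to one row per event.
multi_hot = pd.get_dummies(labels).groupby(level=0).max().astype(int)

# Attach the indicator columns to the remaining event fields.
encoded = df.drop(columns='Category').join(multi_hot)

# Filtering by category becomes a simple column comparison.
free = encoded[encoded['free-events'] == 1]
```

With this layout, counting events per category is a column sum (`multi_hot.sum()`), and combining categories is a matter of chaining boolean conditions on the new columns.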
