# Event Dataset Preprocessing

This notebook uses two datasets from the city of Aarhus:
- Cultural events from the municipality of Aarhus (June 2012 to July 2014)
- Events hosted by libraries in Denmark (October 2013 to June 2015)

In [1]:
import tarfile
import pandas as pd
import os
import json
from pathlib import Path
import zipfile

## Cultural events

Edits applied to the raw data:
- named the columns based on the information from the links
- removed 3 columns that contained only HTML code or static values
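
The edits listed above could be sketched roughly as follows. This is a minimal illustration on a toy frame, not the code that produced the CSV: the column names (`Markup`, `Country`), the sample values, and the helpers `is_static` and `looks_like_html` are all hypothetical, and it assumes the HTML and static columns were dropped.

```python
import pandas as pd

# Toy frame standing in for the raw export; names and values are hypothetical.
raw = pd.DataFrame({
    0: ["Aarhus C", "Aarhus N"],
    1: ["KAMMERKONCERT", "H.P. Lange"],
    2: ["<div class='event'>a</div>", "<div class='event'>b</div>"],
    3: ["DK", "DK"],
})

# Step 1: give the positional columns descriptive names.
named = raw.rename(columns={0: "City", 1: "Name", 2: "Markup", 3: "Country"})


def is_static(col: pd.Series) -> bool:
    """True when every row holds the same value."""
    return col.nunique(dropna=False) <= 1


def looks_like_html(col: pd.Series) -> bool:
    """True when every non-null value contains an HTML-style tag."""
    text = col.dropna().astype(str)
    return len(text) > 0 and bool(text.str.contains(r"<[^>]+>", regex=True).all())


# Step 2: drop columns that carry no usable information.
to_drop = [c for c in named.columns if is_static(named[c]) or looks_like_html(named[c])]
cleaned = named.drop(columns=to_drop)
print(to_drop)
print(cleaned)
```

Detecting such columns by rule rather than by position keeps the step reusable if the export layout changes, at the cost of possibly flagging a legitimately constant column.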


In [13]:
cultural_df = pd.read_csv('../data/cultural_events_aarhus.csv', sep=';')
cultural_df.head()

,Unnamed: 0,City,Name,Ticket,Price,Unnamed: 5,ID,Room,Date,Website,Unnamed: 10,Type,Logo,Genre
0,1,Aarhus C,KAMMERKONCERT,http://www.billetlugen.dk/referer/?r=266abe1b7...,85.00 - 115.00 DKK,1403593223,46773,Symfonisk Sal,2014-09-21T15:00:00,http://www.musikhusetaarhus.dk/kalender/29048/,901,Musik,http://static.billetlugen.dk/images/events/b/2...,Klassisk
1,0,Aarhus C,Sanne Salomonsen Hjem 2014,http://www.billetlugen.dk/referer/?r=266abe1b7...,295.00 DKK,1398409215,42105,Store Sal,2014-04-24T20:00:00,http://www.musikhusetaarhus.dk/kalender/31483/,230,Musik,http://static.billetlugen.dk/images/events/b/3...,"Rock,Pop"
2,0,Aarhus C,Jane & Shane og John Sheahan,http://www.billetlugen.dk/referer/?r=266abe1b7...,250.00 DKK,1398495614,42979,Lille Sal,2014-04-25T19:30:00,http://www.musikhusetaarhus.dk/kalender/31803/,292,Musik,http://static.billetlugen.dk/images/events/b/3...,Folk
3,0,Aarhus C,Scandinavian Saxophone Festival-Times Again,http://www.billetlugen.dk/referer/?r=266abe1b7...,Gratis,1397718013,46785,Kammermusiksalen,2014-04-16T19:30:00,http://www.musikhusetaarhus.dk/kalender/29210/,554,Musik,http://static.billetlugen.dk/images/events/b/2...,"Klassisk,Gratis"
4,0,Aarhus C,H.P. Lange,http://www.billetlugen.dk/referer/?r=266abe1b7...,Gratis,1397804412,45099,Caféen,2014-04-17T13:00:00,http://www.musikhusetaarhus.dk/kalender/29289/,413,Musik,http://static.billetlugen.dk/images/events/b/2...,"Koncert,Gratis"
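
The `Price` column in the preview mixes ranges (`85.00 - 115.00 DKK`), single amounts (`295.00 DKK`), and the Danish word `Gratis` (free), so it cannot be used numerically as is. One possible way to normalise it is sketched below; the helper `parse_price` and the output column names `price_min` and `price_max` are assumptions introduced for illustration.

```python
import re

import pandas as pd

# Sample values copied from the preview above.
prices = pd.Series(["85.00 - 115.00 DKK", "295.00 DKK", "Gratis"], name="Price")


def parse_price(text: str) -> tuple:
    """Return (lowest, highest) price in DKK; 'Gratis' (free) maps to 0."""
    if text.strip().lower() == "gratis":
        return (0.0, 0.0)
    amounts = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", text)]
    if not amounts:
        # Unrecognised format: keep it visible as missing rather than guessing.
        return (float("nan"), float("nan"))
    return (min(amounts), max(amounts))


parsed = pd.DataFrame(
    prices.map(parse_price).tolist(),
    columns=["price_min", "price_max"],
)
print(parsed)
```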


In [14]:
print('Cultural events info')
cultural_df.describe()

Cultural events info


,Unnamed: 0,Unnamed: 5,ID,Unnamed: 10
count,100.0,100.0,100.0,100.0
mean,0.58,1400963000.0,46812.8,615.29
std,0.669388,2934420.0,2483.089046,186.379557
min,0.0,1397200000.0,37691.0,230.0
25%,0.0,1398496000.0,46743.75,571.25
50%,1.0,1399921000.0,47376.0,638.5
75%,1.0,1403334000.0,47918.5,677.5
max,4.0,1407308000.0,51709.0,985.0
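
The values in `Unnamed: 5` fall in the range of Unix timestamps from 2014, although the source does not say what the column represents. Assuming they are seconds since 1970-01-01 UTC, they could be converted for inspection like this:

```python
import pandas as pd

# Values copied from the `Unnamed: 5` column of the preview above.
stamps = pd.Series([1403593223, 1398409215, 1398495614], name="Unnamed: 5")

# Assumption: the integers are seconds since the Unix epoch (UTC).
as_datetime = pd.to_datetime(stamps, unit="s", utc=True)
print(as_datetime)
```

If the converted values line up with plausible record or update times, the column can be renamed accordingly; otherwise it is better left as is.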


## Library events

This dataset includes location information for each event: street, zip code, city, latitude, and longitude.


In [10]:
 
library_df = pd.read_csv('../data/aarhus_libraryEvents.csv', sep=',')
library_df.head()

,lid,city,endtime,title,url,price,changed,content,zipcode,library,imageurl,teaser,street,status,longitude,starttime,latitude,_id,id,streamtime
0,1,Aarhus,2013-11-05 17:00:00,Lav din egen 'bogæder'\t\t,https://www.aakb.dk/node/8989,0,2013-11-04T11:53:11,"\n<p><img title="""" src=""https://www.aakb.dk/fi...",8000,Hovedbiblioteket,https://www.aakb.dk/files/list_images/bogmaerk...,"Kom og vær med, når vi tirsdag eftermiddag lav...",Møllegade 1,1,10.200179,2013-11-05 15:00:00,56.156617,258,8989,2014-12-08 17:44:00
1,42,Viby J,2013-12-30 00:00:00,Keramiker Susanne Bidstrup udstiller på Viby B...,https://www.aakb.dk/node/9235,0,2013-12-02T15:05:32,\n<p>Susanne Bidstrup arbejder både med drejed...,8260,Viby Bibliotek,https://www.aakb.dk/files/list_images/susanne_...,Inspirerende udstilling,Skanderborgvej 170,2,10.164431,2013-12-02 00:00:00,56.130402,379,9235,2014-12-08 17:44:00
2,6,Beder,2014-02-27 21:00:00,Ølsmagning,https://www.aakb.dk/node/9227,70,2014-01-24T11:01:24,"\n<p><img src=""https://www.aakb.dk/files/conte...",8330,Beder-Malling Bibliotek,https://www.aakb.dk/files/list_images/oelsmagn...,Kom til ølsmagning med en ægte ølentusiast!,Kirkebakken 41,2,10.216045,2014-02-27 19:00:00,56.060321,371,9227,2014-12-08 17:44:00
3,55,Åby,2013-10-31 00:00:00,Susanne Butcher udstiller på Åby Bibliotek i o...,https://www.aakb.dk/node/8920,0,2013-10-28T14:30:25,"\n<p><img src=""https://www.aakb.dk/files/conte...",8230,Åby Bibliotek,https://www.aakb.dk/files/list_images/susanneb...,"Susanne Butcher er født i 1966, og er autodida...",Ludvig Feilbergs Vej 7,1,10.162515,2013-10-01 00:00:00,56.156438,233,8920,2014-12-08 17:44:00
4,1,Aarhus,2014-04-18 11:30:00,International Playgroup,https://www.aakb.dk/node/9325,0,2013-12-10T16:45:18,"\n<p><img title="""" src=""https://www.aakb.dk/fi...",8000,Hovedbiblioteket,https://www.aakb.dk/files/list_images/colourbo...,Aarhus Main Library in cooperation with UIC in...,Møllegade 1,1,10.200179,2014-04-18 10:30:00,56.156617,416,9325,2014-12-08 17:44:00
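
As one example of what the `latitude` and `longitude` columns make possible, the sketch below computes the great-circle distance between two libraries from the preview using the haversine formula. The function name `haversine_km` and the mean Earth radius of 6371 km are choices made for this illustration.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius; an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Coordinates copied from the preview rows above.
hovedbiblioteket = (56.156617, 10.200179)
viby_bibliotek = (56.130402, 10.164431)

distance_km = haversine_km(*hovedbiblioteket, *viby_bibliotek)
print(round(distance_km, 2))
```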


In [15]:
print('Library events info')
library_df.describe()

Library events info


Unnamed: 0,lid,price,zipcode,status,longitude,latitude,_id,id
count,1548.0,1548.0,1548.0,1548.0,1548.0,1548.0,1548.0,1548.0
mean,20.781008,3.989018,8191.196382,1.501938,10.178265,56.161029,787.46124,10083.644703
std,18.232371,16.32121,161.665272,0.529049,0.053129,0.037195,448.632288,1031.650942
min,1.0,0.0,8000.0,0.0,9.99869,56.042783,1.0,7827.0
25%,1.0,0.0,8000.0,1.0,10.162515,56.155665,401.75,9298.75
50%,15.0,0.0,8230.0,2.0,10.200179,56.156617,788.5,10245.5
75%,34.0,0.0,8260.0,2.0,10.200179,56.179458,1175.25,10949.0
max,55.0,150.0,8541.0,2.0,10.308101,56.271071,1562.0,11753.0
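
`describe()` summarises only the numeric columns by default, so `starttime` and `endtime`, which are read as strings, do not appear above. A minimal sketch of parsing them and deriving an event duration follows; the sample rows are copied from the preview, and the `duration_hours` column name is an assumption.

```python
import pandas as pd

# Rows copied from the library preview above.
events = pd.DataFrame({
    "title": ["Lav din egen 'bogæder'", "Ølsmagning", "International Playgroup"],
    "starttime": ["2013-11-05 15:00:00", "2014-02-27 19:00:00", "2014-04-18 10:30:00"],
    "endtime": ["2013-11-05 17:00:00", "2014-02-27 21:00:00", "2014-04-18 11:30:00"],
})

# Parse the timestamp strings with an explicit format.
for col in ("starttime", "endtime"):
    events[col] = pd.to_datetime(events[col], format="%Y-%m-%d %H:%M:%S")

# Duration in hours as the difference between end and start.
events["duration_hours"] = (
    (events["endtime"] - events["starttime"]).dt.total_seconds() / 3600
)
print(events[["title", "duration_hours"]])
```

Note that some rows in the dataset, such as exhibitions, use midnight for both start and end, so their durations span days rather than hours.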
