# NOTE: This notebook has already been implemented in GetBarclayEvent.ipynb

# Data Management: Barclays Center Events Dataset

## This notebook:
1. Parses the event start times and subsets events beginning at or after 17:00
2. Writes the subset to a csv ('events_2016_subset.csv')

In [1]:
import pandas as pd
import matplotlib.pylab as plt

In [2]:
# read event data
EV = pd.read_csv('events_2016.csv')
EV.head()

Unnamed: 0.1,Unnamed: 0,EventID,Name,Start_Time
0,0,907621312715743,WWE Live Holiday Tour,2016-12-28T19:30:00-0500
1,1,1116120618427406,New York Islanders V Washington Capitals,2016-12-27T19:00:00-0500
2,2,1185458834828646,Brooklyn Nets v. Charlotte Hornets (Winter Sca...,2016-12-26T19:30:00-0500
3,3,298254823904188,Long Island Nets vs. Westchester Knicks,2016-12-26T13:30:00-0500
4,4,1580711665559981,New York Islanders V Buffalo Sabres,2016-12-23T19:00:00-0500


In [3]:
# remove the trailing UTC offset (e.g. '-0500') from each timestamp string
EV['Start_Time'] = EV['Start_Time'].astype(str).str[:-5]
EV.head()

Unnamed: 0.1,Unnamed: 0,EventID,Name,Start_Time
0,0,907621312715743,WWE Live Holiday Tour,2016-12-28T19:30:00
1,1,1116120618427406,New York Islanders V Washington Capitals,2016-12-27T19:00:00
2,2,1185458834828646,Brooklyn Nets v. Charlotte Hornets (Winter Sca...,2016-12-26T19:30:00
3,3,298254823904188,Long Island Nets vs. Westchester Knicks,2016-12-26T13:30:00
4,4,1580711665559981,New York Islanders V Buffalo Sabres,2016-12-23T19:00:00
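
Slicing off the last five characters assumes every timestamp ends in a `±HHMM` offset. As an alternative sketch (not what this notebook runs), the offset can be parsed with `%z`, converted to New York local time, and then made timezone-naive. The sample strings below are illustrative, including a hypothetical summer timestamp with a `-0400` offset, and the `America/New_York` zone is an assumption based on the venue's location.

```python
import pandas as pd

# Illustrative timestamps; the last one is a hypothetical summer event (-0400).
raw = pd.Series([
    '2016-12-28T19:30:00-0500',
    '2016-12-26T13:30:00-0500',
    '2016-07-15T20:00:00-0400',
])

# Parse the offset, normalising to UTC so mixed offsets share one dtype.
parsed = pd.to_datetime(raw, format='%Y-%m-%dT%H:%M:%S%z', utc=True)

# Convert to local wall-clock time (assumed zone), then drop the tz info.
local = parsed.dt.tz_convert('America/New_York').dt.tz_localize(None)

start_hour = local.dt.hour
```

With this approach the local start hour stays correct even if the file mixes standard-time and daylight-time offsets.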


In [4]:
# extract event start hour
EV['Start_Time'] = pd.to_datetime(EV['Start_Time'], format='%Y-%m-%dT%H:%M:%S')
EV['Start_Hour'] = EV['Start_Time'].dt.hour
EV.head()

Unnamed: 0.1,Unnamed: 0,EventID,Name,Start_Time,Start_Hour
0,0,907621312715743,WWE Live Holiday Tour,2016-12-28 19:30:00,19
1,1,1116120618427406,New York Islanders V Washington Capitals,2016-12-27 19:00:00,19
2,2,1185458834828646,Brooklyn Nets v. Charlotte Hornets (Winter Sca...,2016-12-26 19:30:00,19
3,3,298254823904188,Long Island Nets vs. Westchester Knicks,2016-12-26 13:30:00,13
4,4,1580711665559981,New York Islanders V Buffalo Sabres,2016-12-23 19:00:00,19


In [5]:
# examine late start times (21:00 or later)
EV[EV['Start_Hour']>20.0]

Unnamed: 0.1,Unnamed: 0,EventID,Name,Start_Time,Start_Hour


In [6]:
# examine early start times (before 17:00)
EV[EV['Start_Hour']<17.0]

Unnamed: 0.1,Unnamed: 0,EventID,Name,Start_Time,Start_Hour
3,3,298254823904188,Long Island Nets vs. Westchester Knicks,2016-12-26 13:30:00,13
11,11,341845896147886,BROOKLYN HOOPS Winter Festival: Kentucky V. Ho...,2016-12-11 00:00:00,0
23,23,311179602593857,Long Island Nets v. Grand Rapids Drive,2016-11-27 12:00:00,12
25,25,1088717457889689,Brooklyn Hoops Holiday Invitational: Syracuse ...,2016-11-26 00:00:00,0
27,27,1615186512125129,2016 NIT Season Tip-off Day 2,2016-11-25 00:00:00,0
28,28,1748091105429902,2016 NIT Season Tip-off Day 1,2016-11-24 00:00:00,0
30,30,1786130644999210,Long Island Nets v. Canton Charge,2016-11-23 13:30:00,13
31,31,656637964487596,Legends Classic Basketball Doubleheader,2016-11-22 15:30:00,15
32,32,1011256238991384,Legends Classic Basketball Doubleheader,2016-11-22 00:00:00,0
34,34,1333179830043934,Brooklyn Nets v Portland Trail Blazers (Knit H...,2016-11-20 15:30:00,15


### Event start time correction
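Several rows above are listed at 00:00, which is a placeholder rather than a real start time. Row index; actual start time:
- 11; 12:00
- 25; 14:30
- 27; 12:30
- 28; 12:30
- 32; duplicated event (same name and date as row 31)

All corrected times fall before 17:00, so these rows are excluded from the evening subset either way.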
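
The corrections listed above could be applied in code along the following lines. This is a minimal sketch on a small stand-in frame: `demo`, `corrections`, and `duplicates` are hypothetical names, and only the row indices, dates, and times come from the tables and list above.

```python
import pandas as pd

# Stand-in for the five affected rows (all listed at 00:00 in the data).
demo = pd.DataFrame(
    {'Start_Time': pd.to_datetime(['2016-12-11', '2016-11-26', '2016-11-25',
                                   '2016-11-24', '2016-11-22'])},
    index=[11, 25, 27, 28, 32],
)

# Row index -> actual start time (HH:MM), taken from the correction list.
corrections = {11: '12:00', 25: '14:30', 27: '12:30', 28: '12:30'}
duplicates = [32]  # duplicated event to drop

for idx, hhmm in corrections.items():
    day = demo.loc[idx, 'Start_Time'].normalize()  # midnight of the event date
    demo.loc[idx, 'Start_Time'] = day + pd.to_timedelta(hhmm + ':00')

demo = demo.drop(index=duplicates)

# Recompute the hour after correcting the timestamps.
demo['Start_Hour'] = demo['Start_Time'].dt.hour
```

Keeping the corrections in a dictionary keyed by row index makes the manual fixes easy to audit against the list.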

In [7]:
# keep events starting at 17:00 or later
EV_evening = EV[EV['Start_Hour']>=17.0]
EV_evening.head()

Unnamed: 0.1,Unnamed: 0,EventID,Name,Start_Time,Start_Hour
0,0,907621312715743,WWE Live Holiday Tour,2016-12-28 19:30:00,19
1,1,1116120618427406,New York Islanders V Washington Capitals,2016-12-27 19:00:00,19
2,2,1185458834828646,Brooklyn Nets v. Charlotte Hornets (Winter Sca...,2016-12-26 19:30:00,19
4,4,1580711665559981,New York Islanders V Buffalo Sabres,2016-12-23 19:00:00,19
5,5,1180267432015879,Brooklyn Nets v. Golden State Warriors,2016-12-22 19:30:00,19


In [8]:
# flag weekday events: dt.weekday is 0 (Monday) to 6 (Sunday), so 5 and 6 are the weekend
EV_evening['weekday'] = EV_evening['Start_Time'].dt.weekday < 5
EV_evening

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0.1,Unnamed: 0,EventID,Name,Start_Time,Start_Hour,weekday
0,0,907621312715743,WWE Live Holiday Tour,2016-12-28 19:30:00,19,True
1,1,1116120618427406,New York Islanders V Washington Capitals,2016-12-27 19:00:00,19,True
2,2,1185458834828646,Brooklyn Nets v. Charlotte Hornets (Winter Sca...,2016-12-26 19:30:00,19,True
4,4,1580711665559981,New York Islanders V Buffalo Sabres,2016-12-23 19:00:00,19,True
5,5,1180267432015879,Brooklyn Nets v. Golden State Warriors,2016-12-22 19:30:00,19,True
6,6,1196970867053827,LIU Brooklyn Blackbirds Men's Basketball vs. N...,2016-12-21 17:00:00,17,True
7,7,238989339819391,New York Islanders V Ottawa Senators (Ugly Hol...,2016-12-18 19:00:00,19,False
8,8,1226899760668214,New York Islanders V Chicago Blackhawks,2016-12-15 19:00:00,19,True
9,9,1252195981457288,Brooklyn Nets v. Los Angeles Lakers,2016-12-14 19:30:00,19,True
10,10,973791196075229,New York Islanders v. Washington Capitals (Sta...,2016-12-13 19:00:00,19,True


In [9]:
EV_evening.drop('Unnamed: 0', axis=1, inplace=True)
EV_evening.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,EventID,Name,Start_Time,Start_Hour,weekday
0,907621312715743,WWE Live Holiday Tour,2016-12-28 19:30:00,19,True
1,1116120618427406,New York Islanders V Washington Capitals,2016-12-27 19:00:00,19,True
2,1185458834828646,Brooklyn Nets v. Charlotte Hornets (Winter Sca...,2016-12-26 19:30:00,19,True
4,1580711665559981,New York Islanders V Buffalo Sabres,2016-12-23 19:00:00,19,True
5,1180267432015879,Brooklyn Nets v. Golden State Warriors,2016-12-22 19:30:00,19,True


In [10]:
EV_evening.to_csv('events_2016_subset.csv')
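
By default `to_csv` also writes the row index as an unnamed first column. When the file is read back with `pd.read_csv`, that column shows up as `Unnamed: 0`, which is likely how the column of that name ended up in `events_2016.csv`. If the index is not needed downstream, passing `index=False` avoids this. The sketch below uses an in-memory buffer and a one-row stand-in frame (`demo` is a hypothetical name) rather than the real file.

```python
import io
import pandas as pd

# One-row stand-in for the evening subset.
demo = pd.DataFrame({'EventID': [907621312715743], 'Start_Hour': [19]})

# Default behaviour: the index is written and comes back as 'Unnamed: 0'.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)
```

Note that dropping the index also discards the original row numbers, so keep it if later steps refer to rows by those numbers.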