# Intro To Handling Dates In Pandas
Properly handling dates in Pandas can be very useful.  For example, let's say you read in a csv of temperature data with dates and you'd like to find the monthly mean temperature.  If the date column is already recognized, by pandas, as a date type, then extracting the month from the column will be very easy.  If the date column is, instead, read in as a string type, then it will be more difficult to extract the month.

## Notebook Outline
* <a href='#parsedates'>Using the parse dates .read_csv() to automatically read in dateformats</a>
* <a href='#todatetime'>Using .to_datetime() to convert a column to a pandas datetime format</a>
* <a href='#dtattribute'>Using .dt on datetime columns</a>
* <a href='#minmaxsum'>Introduction the .min(), .max() and .sum() methods</a>
* <a href=#timezone>Setting and changing timezones</a>
* <a href=#date_range>Creating ranges of dates</a>

In [1]:
import pandas as pd
import os

<a name=parsedates></a>

# Automatically Read in Dates as Datetime Types with Pandas

When reading in a file, we can specify which columns, or which combinations of columns, we would like read in as a datetime type. We do this by defining the `parse_dates` argument when using the `read_csv()` method.

## Example 1: Quick Service Restaurant Data

For this example, I am going to introduce a new data file 'LaborSheetData.csv'. This file contains real data from a very popular fast food store. Every hour, the shift manager must enter some key data in this file, like drive through times and sales for the past hour. This will be a good dataset for us to explore in some of our lectures.

First we will load the data. Remember that you will need to change the path to point to where you place the file on your computer, after you download it.

In [2]:
filepath = os.path.join(os.getcwd(), 'data', 'ShiftManagerApp_LaborSheet.csv')
laborSheetData = pd.read_csv(filepath)

### First, let's look at the data using the `.head()` and `.info()` methods.

Exercise: complete the two cells below to use the `.head()` and `.info()` methods.

In [3]:
laborSheetData.head()

Unnamed: 0,Store_ID,Manager,Date,Ending_Hour,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,4462,JillianA,2017-01-23,08:00:00,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14,,
1,4462,ZoeyD,2017-02-05,06:00:00,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48,,
2,4462,JessicaB,2017-02-05,07:00:00,173.0,182.0,106.0,,81.0,,,,,,2017-02-05 11:35:48,,
3,4462,JessicaB,2017-02-05,08:00:00,333.0,311.0,102.0,,55.0,,,,,,2017-02-05 11:52:05,,
4,4462,JessicaB,2017-02-05,09:00:00,594.0,598.0,155.0,,106.0,,,,,,2017-02-05 11:59:35,,


In [4]:
laborSheetData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25471 entries, 0 to 25470
Data columns (total 17 columns):
Store_ID                 25471 non-null int64
Manager                  25471 non-null object
Date                     25471 non-null object
Ending_Hour              25471 non-null object
Projected_Sales          25316 non-null float64
Sales                    23678 non-null float64
DT_TTL                   23647 non-null float64
Car_Count                14529 non-null float64
KVS_Total                23608 non-null float64
Scheduled_People         16175 non-null float64
Actual_People            16068 non-null float64
Reason_for_Labor_Diff    553 non-null object
Reason_for_High_TTLs     247 non-null object
Manager_Entering_Data    11057 non-null object
Timestamp                25471 non-null object
OEPE                     0 non-null float64
Park_Percentage          0 non-null float64
dtypes: float64(9), int64(1), object(7)
memory usage: 3.3+ MB


### Note that the "Date" and "Timestamp" columns are read in as strings by default
The data types are object, which means they were read in as strings. Let's double check that by grabbing a value from the data column using the .loc() method and finding its type with the type function.

In [5]:
type(laborSheetData.loc[0, 'Date'])

str

### Now, we will use the `parse_dates` argument to automatically read theses columns in as datetime columns
All we need to do is pass a list of the columns we would like pandas to try and automatically decipher as date or time objects to the parse_dates argument.  If you look above at the output from .info, you will see that the date or datetime columns are the columns 'Date' and 'Timestamp' (remember that counting starts at 0).

Note that this will cause the read_csv() method to take a little bit longer to complete.

In [6]:
# add the parse_date attribute
laborSheetData = pd.read_csv(filepath, parse_dates=["Date", "Timestamp"])

In [7]:
laborSheetData.head(2)

Unnamed: 0,Store_ID,Manager,Date,Ending_Hour,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,4462,JillianA,2017-01-23,08:00:00,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14,,
1,4462,ZoeyD,2017-02-05,06:00:00,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48,,


### Let's use `.info()` to check the column datatypes
Note that the 'Date' and 'TimeStamp' columns are now datetime types!

In [8]:
laborSheetData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25471 entries, 0 to 25470
Data columns (total 17 columns):
Store_ID                 25471 non-null int64
Manager                  25471 non-null object
Date                     25471 non-null datetime64[ns]
Ending_Hour              25471 non-null object
Projected_Sales          25316 non-null float64
Sales                    23678 non-null float64
DT_TTL                   23647 non-null float64
Car_Count                14529 non-null float64
KVS_Total                23608 non-null float64
Scheduled_People         16175 non-null float64
Actual_People            16068 non-null float64
Reason_for_Labor_Diff    553 non-null object
Reason_for_High_TTLs     247 non-null object
Manager_Entering_Data    11057 non-null object
Timestamp                25471 non-null datetime64[ns]
OEPE                     0 non-null float64
Park_Percentage          0 non-null float64
dtypes: datetime64[ns](2), float64(9), int64(1), object(5)
memory usage: 3.3+ MB

### Now let's use the `parse_dates` argument to _combine_ two (or more) columns into one datetime column. 
For this, I need to give you some information about this data. The 'TimeStamp' column let's us know when the data was entered, but the 'Date' + 'Hour' column let us know what hour the data is for.  So it would be great if we could combine the Date and Hour columns into one datetime column!

We can do this using the parse_dates argument. All we need to do is include a list, of the columns we want to combine, as one of the items in the list that we we pass to the parse_dates argument.  

Note that pandas will create a _new_ column called 'Date_Hour' that combines the 'Date' and 'Hour' columns two columns.

In [9]:
# add parse_dates
laborSheetData = pd.read_csv(filepath, parse_dates=[["Date", "Ending_Hour"], "Timestamp"])

In [10]:
laborSheetData.head(2)

Unnamed: 0,Date_Ending_Hour,Store_ID,Manager,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,2017-01-23 08:00:00,4462,JillianA,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14,,
1,2017-02-05 06:00:00,4462,ZoeyD,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48,,


In [11]:
laborSheetData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25471 entries, 0 to 25470
Data columns (total 16 columns):
Date_Ending_Hour         25471 non-null datetime64[ns]
Store_ID                 25471 non-null int64
Manager                  25471 non-null object
Projected_Sales          25316 non-null float64
Sales                    23678 non-null float64
DT_TTL                   23647 non-null float64
Car_Count                14529 non-null float64
KVS_Total                23608 non-null float64
Scheduled_People         16175 non-null float64
Actual_People            16068 non-null float64
Reason_for_Labor_Diff    553 non-null object
Reason_for_High_TTLs     247 non-null object
Manager_Entering_Data    11057 non-null object
Timestamp                25471 non-null datetime64[ns]
OEPE                     0 non-null float64
Park_Percentage          0 non-null float64
dtypes: datetime64[ns](2), float64(9), int64(1), object(4)
memory usage: 3.1+ MB


# Example 2: Weather Data

Firs we will load the weather data and use .head() to look at the data.  Note that the first four columns are Year, Month, Day, and Hour.

In [12]:
filepath = os.path.join(os.getcwd(), 'data', 'Philadelphia_Pennsylvania_USA', '724080-13739-2001')

headers = ['Year', 'Month', 'Day', 'Hour', 'Air Temp', 'Dew Point Temp', 'Sea Level Pressure',
           'Wind Direction', 'Wind Speed Rate',
           'Sky Condition Total Coverage Code',
           'Liquid Precipitation Depth Dimension - 1Hr Duration',
           'Liquid Precipitation Depth Dimension - Six Hour Duration']

weatherData = pd.read_csv(filepath, delim_whitespace=True, names=headers)

In [13]:
weatherData.head(1)

Unnamed: 0,Year,Month,Day,Hour,Air Temp,Dew Point Temp,Sea Level Pressure,Wind Direction,Wind Speed Rate,Sky Condition Total Coverage Code,Liquid Precipitation Depth Dimension - 1Hr Duration,Liquid Precipitation Depth Dimension - Six Hour Duration
0,2001,1,1,0,-6,-94,10146,280,57,2,0,-9999


### In Class Exercise:
#### Use parse_dates to reload the data and combine the first four columns into one datetime column

In [17]:
weatherData = pd.read_csv(filepath, delim_whitespace=True, names=headers, parse_dates=[["Year", "Month", "Day", "Hour"]]) # fill in this line

weatherData.head()

Unnamed: 0,Year_Month_Day_Hour,Air Temp,Dew Point Temp,Sea Level Pressure,Wind Direction,Wind Speed Rate,Sky Condition Total Coverage Code,Liquid Precipitation Depth Dimension - 1Hr Duration,Liquid Precipitation Depth Dimension - Six Hour Duration
0,2001-01-01 00:00:00,-6,-94,10146,280,57,2,0,-9999
1,2001-01-01 01:00:00,-11,-94,10153,280,57,4,0,-9999
2,2001-01-01 02:00:00,-17,-106,10161,290,62,2,0,-9999
3,2001-01-01 03:00:00,-28,-100,10169,260,57,0,0,-9999
4,2001-01-01 04:00:00,-28,-100,10177,260,52,0,0,-9999


<a name=todatetime></a>
# Convert a String Column to Datetimes After It Has Already Been Read in 

Sometimes, we need to convert a column after we read in the data. Maybe we have created the column during data processing. We can do this with the `to_datetime()` function in the pandas package.

I am going to read in the labor sheet data again, without using parse_dates.

In [18]:
filepath = os.path.join(os.getcwd(), 'data', 'ShiftManagerApp_LaborSheet.csv')
laborSheetData = pd.read_csv(filepath)
laborSheetData.head()

Unnamed: 0,Store_ID,Manager,Date,Ending_Hour,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,4462,JillianA,2017-01-23,08:00:00,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14,,
1,4462,ZoeyD,2017-02-05,06:00:00,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48,,
2,4462,JessicaB,2017-02-05,07:00:00,173.0,182.0,106.0,,81.0,,,,,,2017-02-05 11:35:48,,
3,4462,JessicaB,2017-02-05,08:00:00,333.0,311.0,102.0,,55.0,,,,,,2017-02-05 11:52:05,,
4,4462,JessicaB,2017-02-05,09:00:00,594.0,598.0,155.0,,106.0,,,,,,2017-02-05 11:59:35,,


In [19]:
laborSheetData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25471 entries, 0 to 25470
Data columns (total 17 columns):
Store_ID                 25471 non-null int64
Manager                  25471 non-null object
Date                     25471 non-null object
Ending_Hour              25471 non-null object
Projected_Sales          25316 non-null float64
Sales                    23678 non-null float64
DT_TTL                   23647 non-null float64
Car_Count                14529 non-null float64
KVS_Total                23608 non-null float64
Scheduled_People         16175 non-null float64
Actual_People            16068 non-null float64
Reason_for_Labor_Diff    553 non-null object
Reason_for_High_TTLs     247 non-null object
Manager_Entering_Data    11057 non-null object
Timestamp                25471 non-null object
OEPE                     0 non-null float64
Park_Percentage          0 non-null float64
dtypes: float64(9), int64(1), object(7)
memory usage: 3.3+ MB


## Use `to_datetime()` to Convert the "TimeStamp" Column to a Column of Datetime Objects

In [20]:
laborSheetData.loc[:, 'Timestamp'] = pd.to_datetime(laborSheetData['Timestamp'])
laborSheetData.head()

Unnamed: 0,Store_ID,Manager,Date,Ending_Hour,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,4462,JillianA,2017-01-23,08:00:00,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14,,
1,4462,ZoeyD,2017-02-05,06:00:00,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48,,
2,4462,JessicaB,2017-02-05,07:00:00,173.0,182.0,106.0,,81.0,,,,,,2017-02-05 11:35:48,,
3,4462,JessicaB,2017-02-05,08:00:00,333.0,311.0,102.0,,55.0,,,,,,2017-02-05 11:52:05,,
4,4462,JessicaB,2017-02-05,09:00:00,594.0,598.0,155.0,,106.0,,,,,,2017-02-05 11:59:35,,


In [21]:
laborSheetData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25471 entries, 0 to 25470
Data columns (total 17 columns):
Store_ID                 25471 non-null int64
Manager                  25471 non-null object
Date                     25471 non-null object
Ending_Hour              25471 non-null object
Projected_Sales          25316 non-null float64
Sales                    23678 non-null float64
DT_TTL                   23647 non-null float64
Car_Count                14529 non-null float64
KVS_Total                23608 non-null float64
Scheduled_People         16175 non-null float64
Actual_People            16068 non-null float64
Reason_for_Labor_Diff    553 non-null object
Reason_for_High_TTLs     247 non-null object
Manager_Entering_Data    11057 non-null object
Timestamp                25471 non-null datetime64[ns]
OEPE                     0 non-null float64
Park_Percentage          0 non-null float64
dtypes: datetime64[ns](1), float64(9), int64(1), object(6)
memory usage: 3.3+ MB


### We can combine columns as well, but we have to do this by combining the columns as strings and _then_ passing the result to the to_datetim() method

In [22]:
laborSheetData.loc[:, 'Date_Ending_Hour'] = pd.to_datetime(laborSheetData['Date'] + ' ' + laborSheetData['Ending_Hour'])
laborSheetData.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25471 entries, 0 to 25470
Data columns (total 18 columns):
Store_ID                 25471 non-null int64
Manager                  25471 non-null object
Date                     25471 non-null object
Ending_Hour              25471 non-null object
Projected_Sales          25316 non-null float64
Sales                    23678 non-null float64
DT_TTL                   23647 non-null float64
Car_Count                14529 non-null float64
KVS_Total                23608 non-null float64
Scheduled_People         16175 non-null float64
Actual_People            16068 non-null float64
Reason_for_Labor_Diff    553 non-null object
Reason_for_High_TTLs     247 non-null object
Manager_Entering_Data    11057 non-null object
Timestamp                25471 non-null datetime64[ns]
OEPE                     0 non-null float64
Park_Percentage          0 non-null float64
Date_Ending_Hour         25471 non-null datetime64[ns]
dtypes: datetime64[ns](2), float6

<a name=dtattribute></a>
# Using `.dt` to Operate on Datetime Columns
I am sure you are wondering why we went to the trouble to convert the columns to the datetime data type. Well it lets us manipulate the datetimes much more easily. Let's see some basics below:

First I read the data back in, correctly use the `parse_dates` argument.

In [23]:
laborSheetData = pd.read_csv(filepath, parse_dates=[["Date", "Ending_Hour"], "Timestamp"])
laborSheetData.head(2)

Unnamed: 0,Date_Ending_Hour,Store_ID,Manager,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,2017-01-23 08:00:00,4462,JillianA,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14,,
1,2017-02-05 06:00:00,4462,ZoeyD,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48,,


## Accessing the Month, Day, and Year of Datetime Values

### Use `.dt `to access the datetime properties, and then use `.month` to get the month of each value in the column

In [25]:
# add .dt.month.head(2)
laborSheetData['Date_Ending_Hour'].dt.month.head(2)

0    1
1    2
Name: Date_Ending_Hour, dtype: int64

### Use `.dt` and `.year` to get the year of each columns

In [26]:
# add .dt.year.head(2)
laborSheetData['Date_Ending_Hour'].dt.year.head(2)

0    2017
1    2017
Name: Date_Ending_Hour, dtype: int64

### Use `.dt `and `.day` to get the day of each columns

In [27]:
# add .dt.day.head(2)
laborSheetData['Date_Ending_Hour'].dt.day.head(2)

0    23
1     5
Name: Date_Ending_Hour, dtype: int64

### Use `.dt` and `.hour` to get the hour of each columns

In [28]:
# add .dt.hour.head(2)
laborSheetData['Date_Ending_Hour'].dt.hour.head()

0    8
1    6
2    7
3    8
4    9
Name: Date_Ending_Hour, dtype: int64

### Use `.dt` and `.hour` and `.value_counts()` to get the row counts for each hour; do some hours get recorded less than others?

In [29]:
# add .dt.hour.value_counts()
laborSheetData['Date_Ending_Hour'].dt.hour.value_counts()

8     1615
9     1597
10    1555
11    1526
12    1502
13    1464
14    1409
15    1385
7     1349
16    1342
18    1333
6     1324
17    1317
19    1288
20    1235
21    1209
22    1147
23    1042
0      606
1      226
Name: Date_Ending_Hour, dtype: int64

### Introducing `.sort_index()`.  The `.sort_index()` method will sort the index of a dataframe (while reordering the rows appropriately)

Note that the index of the value_counts() results is made up of the values that you are counting. Since these are hours, let's use `.sort_index()` to sort  the hours in order.

In [30]:
laborSheetData['Date_Ending_Hour'].dt.hour.value_counts().sort_index() # add .sort_index() to sort the values

0      606
1      226
6     1324
7     1349
8     1615
9     1597
10    1555
11    1526
12    1502
13    1464
14    1409
15    1385
16    1342
17    1317
18    1333
19    1288
20    1235
21    1209
22    1147
23    1042
Name: Date_Ending_Hour, dtype: int64

<a name=minmaxsum></a>
# Introducing the `.min()`, `.max()` and `.sum()` Methods.
You can use these methods on any column, or dataframe to get the min, max, and sum of all numerical columns. You can also use the `.min()` and `.max()` methods to get the min and max of datetime columns

### Use `.min()` and `.max()` to find the earliest and latest year in the data.

In [31]:
print(laborSheetData['Date_Ending_Hour'].min())
print(laborSheetData['Date_Ending_Hour'].max())

2017-01-23 08:00:00
2018-07-29 01:00:00


### Combine the use of `.loc`, `.dt`, and `.sum()` to find the total sales In August across all stores.

In [35]:
# Fill in this cell as an exercise
laborSheetData.loc[laborSheetData["Date_Ending_Hour"].dt.month == 8, 'Sales'].sum()

206288.23

## Selecting Rows Greater Than (or Less Than) a Date.
Getting rows greater than, or less than (or some combination of logic tests) is fairly easy.  You use your normal logic tests: >, <, ==, etc...  and pass the value you'd like to test against as a string.  See the example below, note how we are able to use the string '2017-08-01' to get all rows with a value in the 'TimeStamp' column greater than '2017-08-01'. Easy!

In [36]:
laborSheetData.loc[laborSheetData['Timestamp'] <= '2017-08-01 00:00:00', :].head()

Unnamed: 0,Date_Ending_Hour,Store_ID,Manager,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,2017-01-23 08:00:00,4462,JillianA,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14,,
1,2017-02-05 06:00:00,4462,ZoeyD,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48,,
2,2017-02-05 07:00:00,4462,JessicaB,173.0,182.0,106.0,,81.0,,,,,,2017-02-05 11:35:48,,
3,2017-02-05 08:00:00,4462,JessicaB,333.0,311.0,102.0,,55.0,,,,,,2017-02-05 11:52:05,,
4,2017-02-05 09:00:00,4462,JessicaB,594.0,598.0,155.0,,106.0,,,,,,2017-02-05 11:59:35,,


## In Class Exercise:
Create a cell below and answer the following questions:

* What were the total sales in store 18065 on 2017-07-01?
* What were the total sales in store 4462 for the month of July?
* What was the difference between the sales in store 4462 and store 18065 in October-2017?



In [55]:
problem_1 = laborSheetData.loc[(laborSheetData['Date_Ending_Hour'] >= "2017-07-01") &
                   (laborSheetData['Date_Ending_Hour'] < "2017-07-02") &
                   (laborSheetData['Store_ID'] == 18065), "Sales"].sum()

problem_2 = laborSheetData.loc[(laborSheetData['Date_Ending_Hour'].dt.month == 7) &
                   (laborSheetData['Store_ID'] == 4462), "Sales"].sum()

store_4462 = laborSheetData.loc[(laborSheetData['Date_Ending_Hour'].dt.month == 10) &
                                (laborSheetData['Date_Ending_Hour'].dt.year == 2017) &
                                (laborSheetData['Store_ID'] == 4462), "Sales"].sum()

store_18056 = laborSheetData.loc[(laborSheetData['Date_Ending_Hour'].dt.month == 10) &
                                (laborSheetData['Date_Ending_Hour'].dt.year == 2017) &
                                (laborSheetData['Store_ID'] == 18056), "Sales"].sum()

print(problem_1)
print(problem_2)
print(store_4462 - store_18056)

327.0
330213.02
4944.0


<a name=timezone></a>
# Changing the Time Zone of a Datetime Column


## First - Localize a Datetime Column (Assign it a Timezone)

Pandas includes some easy functionality to convert the timezone of a datetime column.  The first thing we need to do is 'localize' the column - this means defining what timezone the original data is in.  They data we have been using happens to be from fast food location on the west coast, so we will localize the timestamp column to the 'US/Pacific' timezone by using the `tz_localize()` method.

You can find a list of all accepted timezones in the top answer to this stackovrflow question: 
https://stackoverflow.com/questions/13866926/python-pytz-list-of-timezones

In [56]:
laborSheetData.loc[:, 'Timestamp'] = laborSheetData['Timestamp'].dt.tz_localize('US/Pacific')

In [57]:
laborSheetData.head()

Unnamed: 0,Date_Ending_Hour,Store_ID,Manager,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage
0,2017-01-23 08:00:00,4462,JillianA,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14-08:00,,
1,2017-02-05 06:00:00,4462,ZoeyD,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48-08:00,,
2,2017-02-05 07:00:00,4462,JessicaB,173.0,182.0,106.0,,81.0,,,,,,2017-02-05 11:35:48-08:00,,
3,2017-02-05 08:00:00,4462,JessicaB,333.0,311.0,102.0,,55.0,,,,,,2017-02-05 11:52:05-08:00,,
4,2017-02-05 09:00:00,4462,JessicaB,594.0,598.0,155.0,,106.0,,,,,,2017-02-05 11:59:35-08:00,,


## Then - Convert the Column To A Different Timezone Using `tz_convert`

In [58]:
laborSheetData.loc[:, 'Timestamp_UTC'] = laborSheetData['Timestamp'].dt.tz_convert('UTC')

	To accept the future behavior, pass 'dtype=object'.
	To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  self.obj[key] = _infer_fill_value(value)


In [59]:
laborSheetData.head()

Unnamed: 0,Date_Ending_Hour,Store_ID,Manager,Projected_Sales,Sales,DT_TTL,Car_Count,KVS_Total,Scheduled_People,Actual_People,Reason_for_Labor_Diff,Reason_for_High_TTLs,Manager_Entering_Data,Timestamp,OEPE,Park_Percentage,Timestamp_UTC
0,2017-01-23 08:00:00,4462,JillianA,540.0,420.0,170.0,,100.0,,,,,,2017-01-23 09:52:14-08:00,,,2017-01-23 17:52:14+00:00
1,2017-02-05 06:00:00,4462,ZoeyD,90.0,155.0,114.0,,78.0,,,,,,2017-02-05 11:30:48-08:00,,,2017-02-05 19:30:48+00:00
2,2017-02-05 07:00:00,4462,JessicaB,173.0,182.0,106.0,,81.0,,,,,,2017-02-05 11:35:48-08:00,,,2017-02-05 19:35:48+00:00
3,2017-02-05 08:00:00,4462,JessicaB,333.0,311.0,102.0,,55.0,,,,,,2017-02-05 11:52:05-08:00,,,2017-02-05 19:52:05+00:00
4,2017-02-05 09:00:00,4462,JessicaB,594.0,598.0,155.0,,106.0,,,,,,2017-02-05 11:59:35-08:00,,,2017-02-05 19:59:35+00:00


# Extra Material Below

Below is material we usual don't cover in the lecture, but you may want to review it after the lecture.

<a name=date_range></a>
# Using .date_range() to create a Datetime index

We can easily create a Datetime index by using the .date_range() method. We just need to pass a start date, and end date (or a number of periods) and a frequency (the amount of time between each value in the series of values).  Possible values for freq are:
* s (for seconds)
* min (for minutes)
* H (for hours)
* D (for days)
* A (for annual, ending on the end of a year)
* AS (for annual, ending on the start of a year)
* You can also do multiples, i.e. 3H for a step of 3 hours.
* You can also use something called a DateOffset object for more custom steps, but that is beyond the scope of this course.

Note that, once you create an index, you can use it as a row index or column in a dataframe.

The docs for this method are here: <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html>

Let's look at some examples:

#### Create a DatetimeIndex with a start date of 2001-01-01, an end date of 2002-01-01 and a step of 3 hours.

In [None]:
pd.date_range(start='2001-01-01', end='2002-01-01', freq='3H')

#### Create a DatetimeIndex with a start date of 2001-01-01 and a  frequency of 1 day that continues for 100 periods.

In [None]:
pd.date_range(start='2001-01-01', periods=100, freq='D')

#### Create a DatetimeIndex with a start date of 1970-01-01, that continues for 10 periods, and has an annual frequency - on the first day of each year.

In [None]:
pd.date_range(start='1970-01-01', periods=10, freq='AS')

#### Create a DatetimeIndex with a start date of 1970-01-01, that continues for 10 periods, and has an annual frequency - on the last day of each year.

In [None]:
pd.date_range(start='1969-12-31', periods=10, freq='A')

#### Create a DatetimeIndex with a start date of 2001-01-01 and a step size of 1 day that continues for 100 steps.

In [None]:
pd.date_range(start='1969-12-31', periods=10, freq='A-AUG')

### In Class Exercise:
Please add a cell below and create three DatetimeInidices of your choosing.

## Question or Comments About This Notebook?
Feel free to contact me via my LinkedIn: https://www.linkedin.com/in/william-j-henry <br>
You can also email me at will@henryanalytics.com <br>