# 1-02 Working With Dates and Times

In [1]:
# Options for all cells 
import pandas as pd
# change display setting of pandas
pd.set_option('display.width', 76)
pd.set_option('display.max_colwidth', 10)
import warnings
warnings.filterwarnings('ignore', category=Warning)

* Pandas has a well-developed set of tools for dealing with dates and times.
* These are built on top of NumPy.
* Python also has its own datetime library. Pandas works well with it, and the two are easily convertible.
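As a minimal sketch of that interoperability, the cell below converts a standard-library `datetime` into a pandas `Timestamp` and back again. The variable names are made up for illustration.

```python
from datetime import datetime

import pandas as pd

# A plain standard-library datetime (hypothetical example value)
py_dt = datetime(1999, 1, 12, 22, 1)

# Wrap it in a pandas Timestamp
ts = pd.Timestamp(py_dt)

# Convert back to a standard-library datetime
back = ts.to_pydatetime()

print(ts)
print(type(back))
```

A `Timestamp` is itself a subclass of `datetime`, so it can often be passed directly to code that expects the standard-library type.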

## The Timestamp Object

* We'll start with the basics - a Timestamp object
* We create one by passing a date string to the pd.to_datetime function, which recognises many common formats

In [2]:
pd.to_datetime("1999/01/12 22:01:00")

Timestamp('1999-01-12 22:01:00')

In [3]:
pd.to_datetime("22:01 19990112")

Timestamp('1999-01-12 22:01:00')

In [4]:
pd.to_datetime("199901122201")

Timestamp('1999-01-12 22:01:00')

In [5]:
pd.to_datetime("12th Jan 1999 10:01pm")

Timestamp('1999-01-12 22:01:00')

* However, we have to be careful, as pandas can make incorrect assumptions about our date format

In [6]:
pd.to_datetime("12/01/1999 22:01")

Timestamp('1999-12-01 22:01:00')

* We meant 12 January, but pandas has read the string as 1 December, swapping the day and month
* For ambiguous strings like this it defaults to the US month-first format
* We can get around this by passing an explicit format argument

In [7]:
pd.to_datetime("12/01/1999 22:01", format="%d/%m/%Y %H:%M")

Timestamp('1999-01-12 22:01:00')

* http://strftime.org/ is an excellent resource for format syntax and well worth bookmarking
* The same format codes apply to Python's own datetime library, not just pandas
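The sketch below puts the three ways of reading the same ambiguous string side by side: the default, the `dayfirst=True` shortcut, and an explicit `format`. It then uses the same format codes in the other direction with `strftime`. The variable names are illustrative only.

```python
import pandas as pd

ambiguous = "12/01/1999 22:01"

# Default parsing treats the first number as the month
us_style = pd.to_datetime(ambiguous)

# dayfirst=True tells pandas to prefer day/month order
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# An explicit format leaves nothing to guesswork
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y %H:%M")

# The same codes turn a Timestamp back into text
label = explicit.strftime("%d/%m/%Y %H:%M")

print(us_style, day_first, explicit, label, sep="\n")
```

An explicit `format` is usually the safer choice for a whole column, since `dayfirst` is a preference rather than a strict rule.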

## Extracting information from Timestamp objects

* Once we've created a Timestamp object we can also extract information from it
* We do this by accessing an attribute or calling a method:

In [8]:
my_day = pd.to_datetime("12th Jan 1999 10:01pm")

In [9]:
my_day.day_name()

'Tuesday'

* Older versions of pandas exposed this as the attribute weekday_name; it was deprecated in pandas 0.23 and removed in pandas 1.0 in favour of the day_name() method
* Attributes differ from methods in that they just return a piece of data associated with an object, rather than running a function
* A method call has brackets after it (my_day.day_name()); an attribute access doesn't (my_day.year)
* You can get a full list of the attributes and methods of Timestamp objects here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html
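To make the attribute/method distinction concrete, here is a small sketch that reads several attributes (no brackets) and calls two methods (with brackets) on the same Timestamp. The `parts` dictionary is only a convenient way to show several values at once.

```python
import pandas as pd

my_day = pd.to_datetime("1999-01-12 22:01:00")

# Attributes: plain data stored on the object, no brackets
parts = {
    "year": my_day.year,
    "month": my_day.month,
    "day": my_day.day,
    "hour": my_day.hour,
    "dayofweek": my_day.dayofweek,  # Monday is 0, so Tuesday is 1
}

# Methods: functions that compute a result, called with brackets
weekday = my_day.day_name()
month = my_day.month_name()

print(parts)
print(weekday, month)
```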

## The date_range function

* In the 'real world' you'll likely not be working with a single date but a range 
* We can create a range of dates to work with using the date_range function

In [10]:
my_range = pd.date_range(start="1999-12-01", end="1999-12-09")

In [11]:
my_range

DatetimeIndex(['1999-12-01', '1999-12-02', '1999-12-03', '1999-12-04',
               '1999-12-05', '1999-12-06', '1999-12-07', '1999-12-08',
               '1999-12-09'],
              dtype='datetime64[ns]', freq='D')

* This creates a DatetimeIndex object.
* It behaves like a sequence of pandas Timestamp objects
* We can use standard Python indexing syntax to access the items:

In [12]:
my_range[0]

Timestamp('1999-12-01 00:00:00', freq='D')

In [13]:
my_range[-1]

Timestamp('1999-12-09 00:00:00', freq='D')

In [14]:
my_range[0:3]

DatetimeIndex(['1999-12-01', '1999-12-02', '1999-12-03'], dtype='datetime64[ns]', freq='D')

* date_range defaults to a daily sequence but we can change this using either the freq argument:

In [15]:
res = pd.date_range("19991201", "20000101", freq="4D")
res

DatetimeIndex(['1999-12-01', '1999-12-05', '1999-12-09', '1999-12-13',
               '1999-12-17', '1999-12-21', '1999-12-25', '1999-12-29'],
              dtype='datetime64[ns]', freq='4D')

* Or the periods argument:

In [16]:
pd.date_range(start="20000101", periods=4)

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='D')
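The freq and periods arguments can also be combined, and freq accepts aliases other than a number of days. The sketch below assumes two commonly used aliases, `"B"` (business days) and `"W-MON"` (weekly, anchored on Mondays); the dates are chosen only for illustration.

```python
import pandas as pd

# Five business days starting on Wednesday 1 December 1999
business_days = pd.date_range(start="1999-12-01", periods=5, freq="B")

# Every Monday in December 1999
mondays = pd.date_range(start="1999-12-01", end="1999-12-31", freq="W-MON")

print(business_days)
print(mondays)
```

Note that the business-day range skips the weekend of 4 and 5 December.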

## Date arithmetic

* Dates are stored in pandas as numbers (by default, nanoseconds counted from 1 January 1970), so you can perform calculations on them

In [17]:
time_difference = pd.to_datetime("2018-01-31") - pd.to_datetime("2018-01-01")
time_difference

Timedelta('30 days 00:00:00')

* This creates a Timedelta object
* A Timedelta stores a duration rather than a point in time
* Like a Timestamp, it exposes its components as attributes

In [18]:
time_difference.days

30

In [19]:
time_difference.seconds

0
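One thing to watch: `seconds` holds only the leftover seconds within the final day, not the whole duration, which is why it is 0 above. A short sketch with a made-up duration shows the difference from the `total_seconds()` method.

```python
import pandas as pd

# Hypothetical duration: 1 day, 2 hours and 30 seconds
delta = pd.Timedelta("1 days 02:00:30")

whole_days = delta.days              # complete days only
leftover_seconds = delta.seconds     # 2 hours + 30 seconds, excluding the full day
all_seconds = delta.total_seconds()  # the entire duration in seconds

print(whole_days, leftover_seconds, all_seconds)
```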

* We can also create our own Timedelta object and use it with a Timestamp object as follows:

In [20]:
newYearEve = pd.to_datetime("2017-12-31 23:59:59")
newYearEve + pd.Timedelta("2 seconds")

Timestamp('2018-01-01 00:00:01')

In [21]:
newYearEve + pd.Timedelta(3, unit='M')

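Timestamp('2018-04-02 07:27:17')

* Note the odd result: unit='M' treated a month as a fixed average length (about 30.44 days) rather than a calendar month
* This unit was deprecated in pandas 0.25 and is rejected by later versions, so the cell above only runs on older installs
* Pandas also has a set of tools for offsetting dates, which are calendar-aware
* These are useful for transforming dates to something specific
* (e.g. you have some messy dates and you want them all to fall on a year end)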

In [22]:
from pandas.tseries.offsets import YearEnd

In [23]:
toms_birthday = pd.to_datetime('2019-02-14 00:00:00')
toms_birthday - YearEnd(1)

Timestamp('2018-12-31 00:00:00')

In [24]:
toms_birthday + YearEnd(1)

Timestamp('2019-12-31 00:00:00')

In [25]:
YearEnd(1)

<YearEnd: month=12>
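A few more offsets, as a hedged sketch rather than a full tour: `pd.DateOffset(months=3)` moves by calendar months, `MonthEnd` works like `YearEnd` but for months, and an offset of 0 rolls a date forward only if it is not already on the anchor. The variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd, YearEnd

# Calendar-aware "three months later"
new_year_eve = pd.Timestamp("2017-12-31 23:59:59")
three_months_on = new_year_eve + pd.DateOffset(months=3)

# Snap a mid-month date to the end of its month or year
birthday = pd.Timestamp("2019-02-14")
month_end = birthday + MonthEnd(1)
year_end = birthday + YearEnd(0)

# A date already on a year end: 0 leaves it alone, 1 jumps to the next one
already_year_end = pd.Timestamp("2018-12-31")
stays = already_year_end + YearEnd(0)
jumps = already_year_end + YearEnd(1)

print(three_months_on, month_end, year_end, stays, jumps, sep="\n")
```

The `YearEnd(0)` form is handy for the messy-dates case, because dates that already sit on a year end are left untouched.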

## Exercise

I've shown you the basics; now you have to apply them to a dataframe. Hopefully you'll be able to work it out, but if not, feel free to ask for help.

1. Load in dji.csv, reformat the Date column, and store the weekday of that date in an additional column.
2. How many times does Monday occur in the data?
3. Load in air_passengers.csv. The Time column is in the format Month-Year. Reformat the column to a Day-Month-Year format.

#### 1. Load in `dji.csv`, reformat the Date column, and store the weekday of that date in an additional column.

In [26]:
dji = pd.read_csv('dji.csv')
dji['Date'] = pd.to_datetime(dji['Date'])
dji['Weekday'] = dji['Date'].dt.day_name()

#### 2. How many times does Monday occur in the data?

In [27]:
dji['Weekday'].value_counts()

Tuesday      52
Wednesday    52
Friday       50
Thursday     50
Monday       48
Name: Weekday, dtype: int64
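The table shows Monday occurring 48 times. To pull out a single count directly, one option is to compare the column against 'Monday' and sum the result. Since dji.csv isn't reproduced here, the sketch below uses a small made-up frame (`demo`) with the same column names.

```python
import pandas as pd

# Made-up stand-in for dji.csv: 14 consecutive days starting Monday 1 January 2018
demo = pd.DataFrame(
    {"Date": pd.date_range("2018-01-01", periods=14, freq="D").strftime("%Y-%m-%d")}
)

# Same steps as the solution above
demo["Date"] = pd.to_datetime(demo["Date"])
demo["Weekday"] = demo["Date"].dt.day_name()

# True counts as 1, so summing the comparison counts the Mondays
monday_count = int((demo["Weekday"] == "Monday").sum())

# Equivalent lookup from the value_counts table
monday_from_table = int(demo["Weekday"].value_counts()["Monday"])

print(monday_count, monday_from_table)
```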

#### 3. Load in `air_passengers.csv`. The Time column is in the format Month-Year. Reformat the column to a Day-Month-Year format. 

In [28]:
ap = pd.read_csv('air_passengers.csv')

In [30]:
ap.head(5)

Unnamed: 0,index,value
0,1949 Jan,112
1,1949 Feb,118
2,1949 Mar,132
3,1949 Apr,129
4,1949 May,121


In [35]:
type(ap['index'])

pandas.core.series.Series

In [36]:
ap['index'] = pd.to_datetime(ap['index'], format='%Y %b')

In [37]:
ap.head(5)

Unnamed: 0,index,value
0,1949-01-01,112
1,1949-02-01,118
2,1949-03-01,132
3,1949-04-01,129
4,1949-05-01,121


In [38]:
type(ap['index'])

pandas.core.series.Series
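Two follow-ups may help here. `type()` reports Series both before and after the conversion, so checking `.dtype` is a clearer way to confirm the change. Also, a datetime column always displays as Year-Month-Day; to get Day-Month-Year text you can format it with `dt.strftime`. The sketch uses a made-up frame (`ap_demo`) mirroring the first rows shown above.

```python
import pandas as pd

# Made-up stand-in for the first rows of air_passengers.csv
ap_demo = pd.DataFrame(
    {"index": ["1949 Jan", "1949 Feb", "1949 Mar"], "value": [112, 118, 132]}
)

# Parse the Year-Month strings; the day defaults to the 1st
ap_demo["index"] = pd.to_datetime(ap_demo["index"], format="%Y %b")

# The dtype, unlike type(), shows that the column now holds datetimes
dtype_after = ap_demo["index"].dtype

# Format as Day-Month-Year text (this column holds strings, not datetimes)
ap_demo["day_month_year"] = ap_demo["index"].dt.strftime("%d-%m-%Y")

print(dtype_after)
print(ap_demo)
```

Keeping the datetime column alongside the formatted one means date arithmetic still works, while the string column is only for display.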