Manipulating dates and times is quite flexible in pandas, which lets us conduct further analysis such as time series analysis, which we will talk about soon. In fact, pandas was originally created by Wes McKinney to handle date and time data when he worked as a consultant for hedge funds.

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

In [1]:
import pandas as pd
import numpy as np

## Timestamp

In [2]:
pd.Timestamp('2021/6/8 12:13pm')  # year first: YYYY/MM/DD

Timestamp('2021-06-08 12:13:00')

In [3]:
pd.Timestamp('06/08/2021 12:13pm')  # month first: MM/DD/YYYY

Timestamp('2021-06-08 12:13:00')

In [4]:
# We can also create a timestamp by passing multiple parameters such as year, month, day, hour,
# and minute separately
pd.Timestamp(2021, 6, 8, 12, 13)

Timestamp('2021-06-08 12:13:00')

In [5]:
# Timestamp also has some useful methods, such as isoweekday(), which returns the weekday of the timestamp.
# Note that 1 represents Monday and 7 represents Sunday
pd.Timestamp(2019, 12, 20, 0, 0).isoweekday()

5
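
To make the numbering convention concrete, here is a small sketch comparing three ways of asking a Timestamp for its weekday. It reuses the date from the cell above; the variable names are only for illustration.

```python
import pandas as pd

# 20 December 2019 fell on a Friday.
ts = pd.Timestamp(2019, 12, 20)

iso_day = ts.isoweekday()   # ISO numbering: Monday=1 ... Sunday=7
zero_day = ts.weekday()     # zero-based numbering: Monday=0 ... Sunday=6
name = ts.day_name()        # English day name as a string

print(iso_day, zero_day, name)
```

The two numeric methods always differ by one, so it is worth checking which one a piece of code expects before comparing weekday numbers.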

In [6]:
# You can extract the specific year, month, day, hour, minute, or second from a timestamp
pd.Timestamp(2019, 12, 20, 5, 2,23).second

23

In [7]:
pd.Timestamp(2021, 6, 8, 12, 21, 34).hour

12

In [8]:
pd.Timestamp(2021, 6, 8, 12, 21, 34).year

2021

In [9]:
pd.Timestamp.min

Timestamp('1677-09-21 00:12:43.145225')

In [10]:
pd.Timestamp.max

Timestamp('2262-04-11 23:47:16.854775807')

## Period

In [11]:
# Suppose we weren't interested in a specific point in time and instead wanted a span of time. This is where
# the Period class comes into play. Period represents a single time span, such as a specific day or month.

# Here we are creating a period that is January 2016.
pd.Period('2016-1')

Period('2016-01', 'M')

In [12]:
# You'll notice when we print that out that the granularity of the period is M for month, since that was the
# finest grained piece we provided. Here's an example of a period that is January 5th, 2016.
pd.Period('2016-1-5')

Period('2016-01-05', 'D')

In [13]:
# Period objects represent the full timespan that you specify. Arithmetic on periods is very easy and
# intuitive. For instance, if we want to find out 5 months after January 2016, we simply add 5
pd.Period('2016/1')+5

Period('2016-06', 'M')

In [14]:
# From the result, you can see we get June 2016. If we want to find out two days before March 5th 2016, we
# simply subtract 2
pd.Period('2016/3/5')-2

Period('2016-03-03', 'D')

In [15]:
# The key here is that the period object encapsulates the granularity for arithmetic
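
As a minimal sketch of what "encapsulating the granularity" means, the example below sets the frequency explicitly with the freq argument and then inspects the span each period covers. The variable names are illustrative only.

```python
import pandas as pd

month = pd.Period('2016-01', freq='M')
day = pd.Period('2016-03-05', freq='D')

# Adding an integer moves by that many units of the period's own frequency.
later_month = month + 5      # five months later
earlier_day = day - 2        # two days earlier

# A period covers a span of time, so it has a start and an end timestamp.
span_start = month.start_time
span_end = month.end_time

print(later_month, earlier_day)
print(span_start, span_end)
```

The same `+ 5` means five months for one object and would mean five days for the other, because the unit comes from the period itself rather than from the number.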

## DatetimeIndex and PeriodIndex

In [16]:
# A Series or DataFrame indexed by Timestamps has a DatetimeIndex. Let's look at a quick example. First, let's
# create our example series t1, using the Timestamps of September 1st, 2nd and 3rd of 2021. When we look at the
# series, each Timestamp is the index and has a value associated with it, in this case, a, b and c.

t1 = pd.Series(list('abc'), index=[pd.Timestamp('2021/9/1'), pd.Timestamp('2021/9/2'), pd.Timestamp('2021/9/3')])

In [17]:
t1

2021-09-01    a
2021-09-02    b
2021-09-03    c
dtype: object

In [18]:
# Looking at the type of our series index, we see that it's DatetimeIndex.
type(t1.index)

pandas.core.indexes.datetimes.DatetimeIndex

In [19]:
t1.index

DatetimeIndex(['2021-09-01', '2021-09-02', '2021-09-03'], dtype='datetime64[ns]', freq=None)

In [20]:
t2 = pd.Series(list('def'), index=[pd.Period('2021-1'), pd.Period('2021-2'), pd.Period('2021-3')])

In [21]:
t2

2021-01    d
2021-02    e
2021-03    f
Freq: M, dtype: object

In [22]:
# Looking at the type of the t2.index, we can see that it's PeriodIndex. 
type(t2.index)

pandas.core.indexes.period.PeriodIndex
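
The two index types can be converted into each other. The sketch below is one way to do it, using to_period and to_timestamp; the sample values mirror t1 and t2 above, but the variable names are new.

```python
import pandas as pd

dt_index = pd.DatetimeIndex(['2021-09-01', '2021-09-02', '2021-09-03'])
s = pd.Series(list('abc'), index=dt_index)

# DatetimeIndex -> PeriodIndex: each timestamp becomes the day-long period containing it.
s_periods = s.to_period('D')

# PeriodIndex -> DatetimeIndex: by default each period becomes the timestamp at its start.
p_index = pd.period_range('2021-01', periods=3, freq='M')
as_timestamps = p_index.to_timestamp()

print(type(s_periods.index).__name__)
print(as_timestamps)
```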

## Converting to Datetime

In [23]:
# Now, let's look into how to convert to Datetime. Suppose we have a list of dates as strings and we want to
# create a new dataframe

# I'm going to try a bunch of different date formats
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']

# And just some random data
ts3 = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1, 
                   columns=list('ab'))
ts3

               a   b
2 June 2013   56  77
Aug 29, 2014  95  80
2015-06-26    66  29
7/12/16       17  34


In [24]:
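# pd.to_datetime tries to parse each of these strings and convert it to a datetime in a standard format.
# Note: pandas 2.0 and later infer a single format from the first string, so a list of mixed formats
# like this one may need pd.to_datetime(ts3.index, format='mixed').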

ts3.index = pd.to_datetime(ts3.index)
ts3

             a   b
2013-06-02  56  77
2014-08-29  95  80
2015-06-26  66  29
2016-07-12  17  34


In [25]:
ts3 = ts3.sort_index()

In [26]:
ts3

             a   b
2013-06-02  56  77
2014-08-29  95  80
2015-06-26  66  29
2016-07-12  17  34


In [27]:
# to_datetime() also has options to change the date parse order. For example, we
# can pass in the argument dayfirst=True to parse the date in the European, day-first order.

pd.to_datetime('4.7.12', dayfirst=True)

Timestamp('2012-07-04 00:00:00')

In [28]:
pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M")

Timestamp('2010-11-12 00:00:00')

In [29]:
pd.to_datetime(["14-01-2012", "01-14-2012"], dayfirst=True)

DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)

In [30]:
# The example above shows that dayfirst isn't strict: if a date can't be parsed with the day
# first, it is parsed as if dayfirst were False.
pd.to_datetime(["14-01-2012", "01-12-2012"], dayfirst=True)
# After changing 14 to 12 in the second date, 01 is now taken as the day.

DatetimeIndex(['2012-01-14', '2012-12-01'], dtype='datetime64[ns]', freq=None)
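
Because dayfirst is only a hint, an explicit format is the safer choice when we know the layout of the strings. The sketch below assumes every input should be day-month-year and uses errors='coerce' so that anything that does not fit becomes NaT instead of being silently reinterpreted. The list raw is made up for illustration.

```python
import pandas as pd

raw = ['14-01-2012', '01-12-2012', '01-14-2012']

# With an explicit format every string is read strictly as day-month-year.
# errors='coerce' turns strings that do not match (month 14 here) into NaT.
parsed = pd.to_datetime(raw, format='%d-%m-%Y', errors='coerce')

print(parsed)
```

Leaving out errors='coerce' would raise an exception on the third string, which can be preferable when bad input should stop the analysis.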

In [31]:
pd.to_datetime('aug-06-12')

Timestamp('2012-08-06 00:00:00')

In [32]:
pd.to_datetime('06-Aug-2021')

Timestamp('2021-08-06 00:00:00')

In [33]:
pd.to_datetime('06-Aug,2021')

Timestamp('2021-08-06 00:00:00')

In [34]:
pd.to_datetime('06 Aug,2021')

Timestamp('2021-08-06 00:00:00')

In [35]:
df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]})
pd.to_datetime(df)

0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

## Timedelta

In [36]:
# Timedeltas are differences in times. This is not the same as a period, but conceptually similar. For
# instance, if we want to take the difference between September 3rd and September 1st, we get a Timedelta of
# two days.
pd.Timestamp('9/3/2016')-pd.Timestamp('9/1/2016')

Timedelta('2 days 00:00:00')

In [37]:
pd.Timestamp('2021-06-8') - pd.Timestamp('2020-03-8')

Timedelta('457 days 00:00:00')

In [38]:
# We can also do something like find what the date and time is for 12 days and three hours past September 2nd,
# at 8:10 AM.
pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta('12D 3H')

Timestamp('2016-09-14 11:10:00')
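
A Timedelta can also be built from keyword arguments, which avoids relying on unit abbreviations inside a string. The sketch below repeats the calculation above in that style; the variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp('2016-09-02 08:10')

# Keyword arguments spell out the units explicitly.
delta = pd.Timedelta(days=12, hours=3)
result = start + delta

print(result)
print(delta.days, delta.components.hours, delta.total_seconds())
```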

In [39]:
pd.Timestamp('9/3/2016') + pd.Timedelta('9 1 2021')

Timestamp('2016-09-03 00:00:00.000912021')

In [40]:
idx = pd.date_range("2021-06-08", periods=3, freq="D")
ts = pd.Series(idx.day_name(), index=idx)
print(ts)

2021-06-08      Tuesday
2021-06-09    Wednesday
2021-06-10     Thursday
Freq: D, dtype: object


In [41]:
idx = pd.date_range("2021-06-08", periods=3, freq="D")
ts = pd.Series(['Tuesday', 'Wednesday', 'Thursday'], index=idx)
ts

2021-06-08      Tuesday
2021-06-09    Wednesday
2021-06-10     Thursday
Freq: D, dtype: object

In [42]:
idx = pd.date_range("2021-06-08", periods=3, freq="D")
ts = pd.Series((x for x in idx.day_name()), index=idx)
ts

2021-06-08      Tuesday
2021-06-09    Wednesday
2021-06-10     Thursday
Freq: D, dtype: object

In [43]:
type(idx.day_name())

pandas.core.indexes.base.Index

In [44]:
friday = pd.Timestamp("2018-01-05")
saturday = friday + pd.Timedelta("1 day")
saturday.day_name()

'Saturday'

## Offset

In [45]:
# Add 1 business day (Friday --> Monday)
monday = friday + pd.offsets.BDay()
monday.day_name()

'Monday'

In [46]:
# Offset is similar to timedelta, but it follows specific calendar duration rules. Offset allows flexibility
# in terms of types of time intervals. Besides hour, day, week, month, etc it also has business day, end of
# month, semi month begin etc

# Let's create a timestamp and see what day of the week it is. weekday() counts from Monday=0,
# so 6 means Sunday
pd.Timestamp('9/4/2016').weekday()

6

In [47]:
# Now we can add a week to the timestamp
pd.Timestamp('9/4/2016') + pd.offsets.Week()

Timestamp('2016-09-11 00:00:00')

In [48]:
# Now let's try the month end offset, which gives us the last day of September
pd.Timestamp('9/4/2016') + pd.offsets.MonthEnd()

Timestamp('2016-09-30 00:00:00')
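
To see how calendar rules make an offset differ from a fixed Timedelta, the sketch below puts a few cases side by side. The dates come from the cells above, apart from the two month-end dates, which were picked to show the edge cases.

```python
import pandas as pd

friday = pd.Timestamp('2018-01-05')

plus_one_day = friday + pd.Timedelta(days=1)   # fixed 24 hours -> Saturday
plus_one_bday = friday + pd.offsets.BDay()     # next business day -> Monday

# MonthEnd moves forward to the next month end. Starting from a date that is
# already a month end, it jumps to the end of the following month.
mid_month = pd.Timestamp('2016-09-04') + pd.offsets.MonthEnd()
from_month_end = pd.Timestamp('2016-09-30') + pd.offsets.MonthEnd()

# DateOffset(months=1) shifts the calendar month and clips to the last valid day.
clipped = pd.Timestamp('2016-01-31') + pd.DateOffset(months=1)

print(plus_one_day.day_name(), plus_one_bday.day_name())
print(mid_month, from_month_end, clipped)
```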

## Working with Dates in DataFrame

In [49]:
# Next, let's look at a few tricks for working with dates in a DataFrame. Suppose we want to look at nine
# measurements, taken biweekly, every Sunday, starting in October 2016. Using date_range, we can create this
# DatetimeIndex. In date_range, we have to specify either the start or the end date. A date passed as the
# first positional argument is treated as the start date. Then we have to specify the number of periods and
# a frequency. Here, we set it to "2W-SUN", which means biweekly on Sunday

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
dates

DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')

In [50]:
# There are many other frequencies that you can specify. For example, you can do business day
pd.date_range('10-01-2016', periods=9, freq='B')

DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
               '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
               '2016-10-13'],
              dtype='datetime64[ns]', freq='B')

In [51]:
# Or you can do quarterly, with the quarter start in June
pd.date_range('04-01-2016', periods=12, freq='QS-JUN')

DatetimeIndex(['2016-06-01', '2016-09-01', '2016-12-01', '2017-03-01',
               '2017-06-01', '2017-09-01', '2017-12-01', '2018-03-01',
               '2018-06-01', '2018-09-01', '2018-12-01', '2019-03-01'],
              dtype='datetime64[ns]', freq='QS-JUN')
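
Instead of a number of periods, date_range also accepts both a start and an end date and fills in whatever matches the frequency in between. A small sketch, assuming we want every Sunday in October 2016:

```python
import pandas as pd

# With start and end given, the number of dates follows from the frequency.
sundays = pd.date_range(start='2016-10-01', end='2016-10-31', freq='W-SUN')

print(sundays)
print(len(sundays))
```

Only dates that fall on the frequency are returned, so the Saturday start date itself is not part of the result.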

In [52]:
# Now, let's go back to our biweekly Sunday example and create a DataFrame using these dates, and some random
# data, and see what we can do with it.
# NumPy's cumsum() computes the cumulative sum of array elements over a given axis.
dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                  'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
df

            Count 1  Count 2
2016-10-02      105      127
2016-10-16      102      122
2016-10-30      106      121
2016-11-13      107      121
2016-11-27      111      127
2016-12-11      110      116
2016-12-25      111      124
2017-01-08      115      119
2017-01-22      122      128


In [53]:
# First, we can check what day of the week a specific date is. For example, here we can see that all the dates
# in our index are on a Sunday, which matches the frequency that we set
df.index.day_name()

Index(['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday',
       'Sunday', 'Sunday'],
      dtype='object')

In [54]:
# We can also use diff() to find the difference between each date's value.
df.diff()

            Count 1  Count 2
2016-10-02      NaN      NaN
2016-10-16     -3.0     -5.0
2016-10-30      4.0     -1.0
2016-11-13      1.0      0.0
2016-11-27      4.0      6.0
2016-12-11     -1.0    -11.0
2016-12-25      1.0      8.0
2017-01-08      4.0     -5.0
2017-01-22      7.0      9.0
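
It can help to see what diff() computes under the hood. The sketch below uses the first four Count 1 values from the table above as fixed data and checks that diff() is the same as subtracting a shifted copy of the series; the periods argument widens the gap between the rows being compared.

```python
import pandas as pd

dates = pd.date_range('10-01-2016', periods=4, freq='2W-SUN')
counts = pd.Series([105, 102, 106, 107], index=dates)

step_change = counts.diff()            # each value minus the previous value
manual = counts - counts.shift(1)      # the same calculation written out
two_step = counts.diff(periods=2)      # each value minus the value two rows earlier

print(step_change)
print(two_step)
```

The first row is NaN because there is no earlier value to subtract, which is also why the result is stored as floats.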


In [55]:
# Suppose we want to know what the mean count is for each month in our DataFrame. We can do this using
# resample. Converting from a higher frequency to a lower frequency is called downsampling (we'll talk about
# this in a moment)
df.resample('M').mean()

               Count 1     Count 2
2016-10-31  104.333333  123.333333
2016-11-30  109.000000  124.000000
2016-12-31  110.500000  120.000000
2017-01-31  118.500000  123.500000
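
Here is a reproducible sketch of the same downsampling step, using the Count 1 values from the table above as fixed data rather than random numbers. It uses the 'MS' (month start) alias, so each bin is labelled by the first day of its month instead of the last. That choice is an assumption made for portability: as far as we know, recent pandas releases warn about the 'M' alias and prefer 'ME' for month end.

```python
import pandas as pd

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
counts = pd.Series([105, 102, 106, 107, 111, 110, 111, 115, 122], index=dates)

# Group the biweekly observations into calendar months and average each group.
# 'MS' labels every monthly bin with its first day.
monthly_mean = counts.resample('MS').mean()

print(monthly_mean)
```

The averages are the same as in the month-end version; only the index labels differ.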


In [56]:
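# Now let's talk about datetime indexing and slicing, which is a wonderful feature of the pandas DataFrame.
# For instance, we can use partial string indexing to find values from a particular year.
# Note: selecting rows with df['2017'] is deprecated in newer pandas versions; df.loc['2017'] is the
# preferred form.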
df['2017']


            Count 1  Count 2
2017-01-08      115      119
2017-01-22      122      128


In [57]:
df.loc['2017']

            Count 1  Count 2
2017-01-08      115      119
2017-01-22      122      128


In [58]:
# Or we can do it from a particular month
df.loc['2016-12']

            Count 1  Count 2
2016-12-11      110      116
2016-12-25      111      124


In [59]:
# Or we can even slice on a range of dates. For example, here we only want the values from December 2016
# onwards.
df.loc['2016-12':]

            Count 1  Count 2
2016-12-11      110      116
2016-12-25      111      124
2017-01-08      115      119
2017-01-22      122      128


In [60]:
df.loc['2016']

            Count 1  Count 2
2016-10-02      105      127
2016-10-16      102      122
2016-10-30      106      121
2016-11-13      107      121
2016-11-27      111      127
2016-12-11      110      116
2016-12-25      111      124
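
Partial strings can also be used on both ends of a slice. The sketch below rebuilds a one-column version of the DataFrame with the fixed Count 1 values from above, selects November through December 2016, and shows a boolean mask on the index as an alternative way to pick a year. The variable names are illustrative.

```python
import pandas as pd

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
df = pd.DataFrame({'Count 1': [105, 102, 106, 107, 111, 110, 111, 115, 122]},
                  index=dates)

# Both endpoints of a partial-string slice are inclusive, so all of December is kept.
nov_to_dec = df.loc['2016-11':'2016-12']

# A boolean mask built from the index gives the same kind of selection.
jan_2017 = df[df.index.year == 2017]

print(nov_to_dec)
print(jan_2017)
```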
