# 01. Datetime and Timedelta

### Objectives

+ A Pandas **`Timestamp`** is based on NumPy's **`datetime64`**, which offers nanosecond precision and array operations that Python's **`datetime`** lacks
+ **`pd.to_datetime`** is flexible and intelligent. Converts both strings and numbers to Timestamps
+ Timestamp and datetime refer to the same thing in Pandas
+ **`pd.to_timedelta`** converts strings and numbers to Timedeltas

### Resource
+ [Time series pandas documentation][1]
+ [Timedelta pandas documentation][2]

## Introduction
This notebook covers two distinct concepts, datetimes and timedeltas, and how they are created and used in Pandas. A datetime represents a specific moment in time, while a timedelta represents an amount of time.

[1]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html
[2]: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html

# Date vs Time vs Datetime
The terms **date**, **time**, and **datetime** are distinct. Each one means something different.

* **date** - Only the month, day, and year. So 2016-01-01 would represent January 1, 2016 and be considered a **date**.
* **time** - Only the hours, minutes, seconds and parts of a second (milli/micro/nano). 5 hours, 45 minutes and 6.74234 seconds for example would be considered a **time**.
* **datetime** - A combination of the above two. Has both date (Year, Month, Day) and time (Hour, Minute, Second) components. January 1, 2016 at 5:45 p.m. would be an example of a **datetime**.
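The three terms above can be illustrated with a short sketch. It uses a Pandas `Timestamp` (introduced later in this notebook) to hold a datetime and then splits it into its date and time parts. The variable names are just for illustration.

```python
import pandas as pd

# A datetime: January 1, 2016 at 5:45 p.m.
ts = pd.Timestamp('2016-01-01 17:45')

# The date part only (year, month, day)
date_part = ts.date()

# The time part only (hour, minute, second)
time_part = ts.time()

print(date_part)  # 2016-01-01
print(time_part)  # 17:45:00
```
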

The Python standard library contains the [datetime module][1]. It is a popular and important module but will not be covered here, since Pandas builds its own datetime and timedelta objects that are much more powerful. However, some notes on the standard library's datetime module are available in the **extras** directory.

[1]: https://docs.python.org/3.5/library/datetime.html

# Datetimes in NumPy
NumPy has a [datetime64 data type][1]. It stores datetimes compactly in arrays, supports vectorized operations, and allows finer precision than the microseconds of core Python's datetime module.

# Pandas Timestamp builds on top of NumPy's datetime64
Pandas goes one step further and has created a **`Timestamp`** data type that adds more functionality to NumPy's **`datetime64`**.
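As a rough sketch of how the two types relate, the example below creates a NumPy `datetime64`, wraps it in a Pandas `Timestamp`, and converts it back. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A plain NumPy datetime64 value
np_dt = np.datetime64('2016-01-10T05:34:43')

# Wrap it in a Pandas Timestamp to gain extra attributes and methods
ts = pd.Timestamp(np_dt)
print(ts)             # 2016-01-10 05:34:43
print(ts.day_name())  # Sunday

# Convert back to a NumPy datetime64
back = ts.to_datetime64()
print(type(back).__name__)  # datetime64
```
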

[1]: https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html

# Creating a single Timestamp with function `to_datetime`

The **`pd.to_datetime`** function converts both strings and numbers to Timestamps as well as entire arrays or Series. It is intelligent and can detect a wide variety of strings. 

In [1]:
import pandas as pd
import numpy as np

### Convert several different types of strings to Timestamps with `to_datetime`

In [2]:
pd.to_datetime('2016/1/10')

Timestamp('2016-01-10 00:00:00')

In [3]:
pd.to_datetime('2016-1-10')

Timestamp('2016-01-10 00:00:00')

In [4]:
pd.to_datetime('Jan 3, 2019 20:45.56')

Timestamp('2019-01-03 20:45:33')

In [5]:
pd.to_datetime('2016-01-05T05:34:43.123456789')

Timestamp('2016-01-05 05:34:43.123456789')
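The same function also accepts a whole Series of strings. The sketch below uses a small made-up Series and additionally shows the `format` parameter, which removes ambiguity when a string such as `10/01/2016` could be read as either day-first or month-first.

```python
import pandas as pd

# A small, made-up Series of date strings
dates = pd.Series(['2016-01-10', '2016-02-15', '2016-03-20'])
converted = pd.to_datetime(dates)
print(converted.dtype)  # a datetime64 dtype

# Tell pandas exactly how to read an ambiguous string (day/month/year here)
ts = pd.to_datetime('10/01/2016', format='%d/%m/%Y')
print(ts)  # 2016-01-10 00:00:00
```
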

### Epoch
Pandas uses the UNIX epoch of Jan 1, 1970 (midnight UTC). That moment is represented by the integer 0, meaning 0 **nanoseconds** from Jan 1, 1970. The number 100 would represent 100 nanoseconds after Jan 1, 1970.
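A quick way to check this is to build a Timestamp from the integer 0 and to look at the `value` attribute, which holds the number of nanoseconds since the epoch. This is only a small sketch with illustrative variable names.

```python
import pandas as pd

# The integer 0 is the epoch itself
epoch = pd.Timestamp(0)
print(epoch)  # 1970-01-01 00:00:00

# One second after the epoch is one billion nanoseconds
one_second_later = pd.Timestamp('1970-01-01 00:00:01')
print(one_second_later.value)  # 1000000000
```
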

### Converting numbers to Timestamps
You can pass numbers to the **`to_datetime`** function and it will convert them to Timestamps. By default, it assumes you are passing in the number of nanoseconds after the epoch.

In [6]:
pd.to_datetime(100)

Timestamp('1970-01-01 00:00:00.000000100')

### Specify unit
The default unit is nanoseconds, but you can specify a different one with the **`unit`** parameter.

In [7]:
# 100 seconds after epoch
pd.to_datetime(100, unit='s')

Timestamp('1970-01-01 00:01:40')

In [8]:
# 20,000 days after epoch
pd.to_datetime(20000, unit='d')

Timestamp('2024-10-04 00:00:00')
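Epoch values from other systems are often stored in seconds or milliseconds rather than nanoseconds. The sketch below, using made-up variable names, shows that the same moment can be recovered from either as long as the matching `unit` is given.

```python
import pandas as pd

# 1.5 billion seconds after the epoch
from_seconds = pd.to_datetime(1_500_000_000, unit='s')

# The same moment expressed in milliseconds
from_millis = pd.to_datetime(1_500_000_000_000, unit='ms')

print(from_seconds)                 # 2017-07-14 02:40:00
print(from_seconds == from_millis)  # True
```
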

# Why is `to_datetime` returning a `Timestamp` object?
It may look a bit odd to see a Timestamp object being returned from the **`to_datetime`** function. The docstring for **`to_datetime`** even says:

> **Convert argument to datetime.**

Technically, the object is definitely a Pandas Timestamp object. We can verify this with the **`type`** function:

In [9]:
dt = pd.to_datetime(20000, unit='d')
type(dt)

pandas._libs.tslibs.timestamps.Timestamp

### Datetime is common throughout many languages
The term **datetime** is common throughout many programming languages and this is what the pandas documentation is referring to. The technical name of the object is indeed Timestamp, but the common name for what it represents is a datetime.

"datetime" is also the term used for the column data type in a Pandas DataFrame. The following reads in the employee dataset with two columns as **datetime**.

In [10]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date', 'job_date'])
emp.dtypes

title                object
dept                 object
salary              float64
race                 object
gender               object
hire_date    datetime64[ns]
job_date     datetime64[ns]
dtype: object

### Each individual value in the datetime columns is a Timestamp
If we extract the **`hire_date`** column as a Series and print out the first few rows, you will see that the data type (at the bottom of the output) is still written with the word **datetime**.

In [11]:
hire_date = emp['hire_date']
hire_date.head()

0   2006-06-12
1   2000-07-19
2   2015-02-03
3   1982-02-08
4   1989-06-19
Name: hire_date, dtype: datetime64[ns]

If we select the first value in the Series, we get a Timestamp.

In [12]:
hire_date.loc[0]

Timestamp('2006-06-12 00:00:00')

## datetime and Timestamp refer to the same thing
The terms **datetime** and **Timestamp** refer to the exact same concept in pandas. Technically, each value is a Pandas **`Timestamp`** object but the term **datetime** is used to refer to it as well. Yes, that is extremely confusing, but hopefully now it is clear.
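The sketch below makes the relationship concrete with a tiny made-up Series: the Series as a whole reports a datetime64 data type, each element is a `Timestamp`, and a `Timestamp` also passes as a standard library `datetime`.

```python
import datetime
import pandas as pd

# A tiny, made-up Series of datetimes
s = pd.Series(pd.to_datetime(['2006-06-12', '2000-07-19']))

# The Series has a datetime64 data type
print(pd.api.types.is_datetime64_any_dtype(s))  # True

# Each individual value is a Timestamp
first = s.iloc[0]
print(type(first).__name__)  # Timestamp

# Timestamp is a subclass of Python's datetime
print(isinstance(first, datetime.datetime))  # True
```
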

### Timestamp attributes
These Timestamp objects have all the same attributes and methods that we saw with the **`dt`** Series accessor in a previous notebook. Let's see some of these again.

In [13]:
dt = pd.to_datetime('Jan 3, 2019 20:45.56')

In [14]:
dt.day

3

In [15]:
dt.day_name()

'Thursday'

In [16]:
dt.minute

45
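A few more attributes and methods are sketched below. The Timestamp here is created with the `pd.Timestamp` constructor and an unambiguous string, which is an alternative to `to_datetime` for a single value.

```python
import pandas as pd

ts = pd.Timestamp('2019-01-03 20:45:56')

print(ts.year)          # 2019
print(ts.month_name())  # January
print(ts.dayofyear)     # 3
print(ts.quarter)       # 1

# Format the Timestamp as a string
formatted = ts.strftime('%Y-%m-%d %H:%M')
print(formatted)        # 2019-01-03 20:45
```
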

## A Timedelta - an amount of time
A timedelta is a specific amount of time such as 20 seconds, or 13 days 5 minutes and 10 seconds. Use the **`to_timedelta`** function to create a Timedelta object. It works analogously to the **`to_datetime`** function.

### Converting strings to a Timedelta with `to_timedelta`
A wide variety of strings can be converted to Timedeltas. [See the docs][1] for more info.

[1]: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html#to-timedelta

In [17]:
pd.to_timedelta('5 days 03:12:45.123')

Timedelta('5 days 03:12:45.123000')

In [18]:
# 10 hours and 13 milliseconds
pd.to_timedelta('10h 13ms')

Timedelta('0 days 10:00:00.013000')
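Two more string forms are sketched below: a fully spelled-out string, and a list of strings, which is converted all at once. The variable names are illustrative.

```python
import pandas as pd

# Units can be spelled out in full
verbose = pd.to_timedelta('2 days 2 hours 15 minutes 30 seconds')
print(verbose)  # 2 days 02:15:30

# A list of strings is converted to a TimedeltaIndex
several = pd.to_timedelta(['1 days', '12:30:00'])
print(several)
```
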

### Converting numbers to Timedeltas with `to_timedelta`
As with **`to_datetime`**, a number passed to **`to_timedelta`** is treated by default as a number of nanoseconds. Use the **`unit`** parameter to change the time unit.

In [19]:
# 123,000 nanoseconds
pd.to_timedelta(123000)

Timedelta('0 days 00:00:00.000123')

In [20]:
# 500 days
pd.to_timedelta(500, unit='d')

Timedelta('500 days 00:00:00')

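Since a year is not a fixed length of time, the largest unit returned is days. You can still use 'y' to represent years. The output is simply converted to days, using an average year of 365.2425 days. Newer pandas releases have deprecated the 'y' unit because of this ambiguity, so the next cell may raise an error depending on your version.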

In [21]:
# 23 years
pd.to_timedelta(23, unit='y')

Timedelta('8400 days 13:51:36')
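The odd-looking hours and minutes in that output come from the average year length of 365.2425 days, which is 31,556,952 seconds. The sketch below reproduces the same Timedelta from seconds, without relying on the 'y' unit. It also shows `pd.DateOffset`, which is not covered in this notebook and is included only as an assumption about what you might want when you mean calendar years rather than a fixed duration.

```python
import pandas as pd

# Average year: 365.2425 days * 86,400 seconds per day
SECONDS_PER_AVG_YEAR = 31_556_952

# 23 average years expressed as a fixed amount of time
td = pd.to_timedelta(23 * SECONDS_PER_AVG_YEAR, unit='s')
print(td)  # 8400 days 13:51:36

# Calendar years: same month and day, 23 years later
start = pd.Timestamp('2000-01-01')
later = start + pd.DateOffset(years=23)
print(later)  # 2023-01-01 00:00:00
```
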

In [22]:
# 10 hours
pd.to_timedelta(10, 'h')

Timedelta('0 days 10:00:00')

### No name confusion with Timedelta
The Timedelta data type is unique to pandas, just like the Timestamp object. Pandas Timedelta is built upon NumPy's timedelta64 data type, which is superior to pure Python's timedelta. Fortunately, the Pandas developers chose the name **Timedelta**, which matches the naming used by both NumPy and Python.

There is no name confusion here, unlike with **Timestamp/Datetime**.

In [23]:
td = pd.to_timedelta(3, 'y')
type(td)

pandas._libs.tslibs.timedeltas.Timedelta

### Timedelta attributes and methods
There are many attributes and methods available to Timedelta objects. Let's see some below:

In [24]:
td

Timedelta('1095 days 17:27:36')

In [25]:
td.days

1095

In [26]:
td.seconds

62856

In [27]:
td.components

Components(days=1095, hours=17, minutes=27, seconds=36, milliseconds=0, microseconds=0, nanoseconds=0)
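One detail worth noting: the `seconds` attribute holds only the leftover seconds after whole days are removed, not the full duration. The sketch below rebuilds the same amount of time with the `pd.Timedelta` constructor and compares `seconds` with the `total_seconds` method.

```python
import pandas as pd

# Same amount of time as above, built from keyword arguments
td = pd.Timedelta(days=1095, hours=17, minutes=27, seconds=36)

# Leftover seconds after removing whole days
print(td.seconds)          # 62856

# The entire duration expressed in seconds
total = td.total_seconds()
print(total)               # 94670856.0
```
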

## Creating Timedeltas by subtracting Datetimes
It is possible to create a Timedelta object by subtracting two Datetimes.

In [28]:
df1 = pd.to_datetime('2012-12-21 5:30')
df2 = pd.to_datetime('2016-1-1 12:45:12')

In [29]:
df1

Timestamp('2012-12-21 05:30:00')

In [30]:
df2

Timestamp('2016-01-01 12:45:12')

Subtraction:

In [31]:
df2 - df1

Timedelta('1106 days 07:15:12')

### Negative Timedeltas
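A negative amount of time is possible, just as a negative number is. Pandas displays it as a negative number of days plus a positive time, so the result below reads as -1107 days plus 16 hours, 44 minutes, and 48 seconds, which is the same as -(1106 days 07:15:12).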

In [32]:
df1 - df2

Timedelta('-1107 days +16:44:48')
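The sketch below inspects the pieces of a negative Timedelta and uses the built-in `abs` function to recover the positive amount of time. The variable names are illustrative.

```python
import pandas as pd

earlier = pd.Timestamp('2012-12-21 05:30')
later = pd.Timestamp('2016-01-01 12:45:12')

neg = earlier - later

# Days are negative, the remaining seconds are positive
print(neg.days)     # -1107
print(neg.seconds)  # 60288

# The absolute value is the positive amount of time
positive = abs(neg)
print(positive)     # 1106 days 07:15:12
```
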

### Math with Timedeltas
You can do many different math operations with Timedeltas.

In [33]:
td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')

In [34]:
td1 - td2

Timedelta('0 days 05:20:50')

In [35]:
td2 + 5 * td2

Timedelta('0 days 00:14:00')

In [36]:
td1 / td2

138.5
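Dividing two Timedeltas returns a plain number, which also gives a handy way to convert a Timedelta to any unit. The sketch below shows this along with floor division and the remainder operator.

```python
import pandas as pd

td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')

# How many whole times td2 fits into td1
whole = td1 // td2
print(whole)      # 138

# What is left over after those whole pieces
remainder = td1 % td2
print(remainder)  # 0 days 00:01:10

# Convert to hours by dividing by a one-hour Timedelta
hours = td1 / pd.Timedelta(hours=1)
print(round(hours, 3))  # 5.386
```
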

## Creating Timedeltas in a DataFrame by subtracting two Datetime columns
The employee dataset has two datetime columns, **`hire_date`** and **`job_date`**. The hire date is when the employee was first hired to work for the city of Houston. The job date is the first day of their latest position.

In [37]:
emp.head()

|   | title | dept | salary | race | gender | hire_date | job_date |
|---|---|---|---|---|---|---|---|
| 0 | ASSISTANT DIRECTOR (EX LVL) | Municipal Courts Department | 121862.0 | Hispanic | Female | 2006-06-12 | 2012-10-13 |
| 1 | LIBRARY ASSISTANT | Library | 26125.0 | Hispanic | Female | 2000-07-19 | 2010-09-18 |
| 2 | POLICE OFFICER | Houston Police Department-HPD | 45279.0 | White | Male | 2015-02-03 | 2015-02-03 |
| 3 | ENGINEER/OPERATOR | Houston Fire Department (HFD) | 63166.0 | White | Male | 1982-02-08 | 1991-05-25 |
| 4 | ELECTRICIAN | General Services Department | 56347.0 | White | Male | 1989-06-19 | 1994-10-22 |


Let's find the amount of time that elapsed between the hire date and job date.

In [38]:
# Time between being hired and starting the latest position
time_until_latest_position = emp['job_date'] - emp['hire_date']
time_until_latest_position.head()

0   2315 days
1   3713 days
2      0 days
3   3393 days
4   1951 days
dtype: timedelta64[ns]

Since both hire and job date are datetime columns, subtracting them results in a timedelta column. Days are the largest unit of time that a timedelta displays.
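If plain integers are easier to work with than a timedelta column, the **`dt`** accessor offers a `days` attribute. The sketch below uses a small made-up DataFrame with two rows copied from the employee data rather than reading the file.

```python
import pandas as pd

# Two rows copied from the employee data shown above
sample = pd.DataFrame({
    'hire_date': pd.to_datetime(['2006-06-12', '2015-02-03']),
    'job_date': pd.to_datetime(['2012-10-13', '2015-02-03']),
})

# Subtracting datetime columns gives a timedelta column
elapsed = sample['job_date'] - sample['hire_date']

# Extract the whole number of days as integers
elapsed_days = elapsed.dt.days
print(elapsed_days.tolist())  # [2315, 0]
```
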

# Exercises

## Problem 1
<span  style="color:green; font-size:16px">What day of the week was Jan 15, 1997?</span>

## Problem 2
<span  style="color:green; font-size:16px">Was 1925 a leap year?</span>

## Problem 3
<span  style="color:green; font-size:16px">What year will it be 1 million hours after the UNIX epoch?</span>

## Problem 4
<span  style="color:green; font-size:16px">Create the datetime July 20, 1969 at 2:56 a.m. and 15 seconds.</span>

## Problem 5
<span  style="color:green; font-size:16px">Neil Armstrong stepped on the moon at the time in the last problem. How many days have passed since that happened? Use the string 'today' when creating your datetime.</span>

## Problem 6
<span  style="color:green; font-size:16px">Which is larger - 35 days or 700 hours?</span>

## Problem 7
<span  style="color:green; font-size:16px">In a previous notebook, we were told that the employee data was retrieved on Dec 1, 2016. We used the simple calculation `2016 - emp['hire_date'].dt.year` to determine the years of experience. Can you improve upon this method to get the exact number of years of experience and assign it to a new column named `experience`?</span>