# Datetime Series Methods

## Overview

In this chapter we will focus on methods that work for Series that contain datetime data. Just as pandas has the `str` accessor to give us access to string-only methods, it also has the `dt` accessor to give us access to datetime-only methods. Let's read in the bikes dataset, which has two datetime columns, `starttime` and `stoptime`.

In [None]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head()

### pandas datetime columns usually have nanosecond precision
Before version 2.0, pandas forced all datetime columns to have nanosecond precision. It relies on numpy's datetime64 data type as the foundation for this data type. numpy allows other precisions such as microsecond or millisecond, and pandas 2.0 and later can also store second, millisecond, and microsecond precision, so the precision you see may depend on your pandas version. Let's take a look at the data types of each column with the `dtypes` attribute to verify that we have two datetime columns.

In [None]:
bikes.dtypes
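
The precision of a datetime column is visible in its dtype string. The following is a minimal sketch that assumes a small hand-made Series (the values are hypothetical and not read from the bikes file); the unit printed inside the brackets may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample values, not taken from the bikes dataset
s = pd.Series(pd.to_datetime(['2013-06-28 19:01:00', '2013-06-28 22:53:00']))

# The unit appears inside the brackets of the dtype, e.g. datetime64[ns]
print(s.dtype)

# True for a datetime64 column regardless of its unit
is_dt = pd.api.types.is_datetime64_any_dtype(s)
print(is_dt)
```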

## The `dt` accessor
The primary focus of this chapter will be the methods that follow the `dt` accessor. [Visit the API][1] to view all the possible datetime attributes and methods that are available.

### Embed API in the notebook
Instead of visiting the API, you can embed the page directly in this notebook as an **iframe**, which is a web page embedded inside of another web page. We do this with the help of the IPython display module.

[1]: http://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-dt

In [None]:
from IPython.display import IFrame
url = 'http://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-dt'
IFrame(url, 900, 400)

### Only available for Series
The `dt` and `str` accessors are only available to Series objects and not DataFrames. You will have to select a single Series first in order to use them. Let's begin by selecting the `starttime` column as a Series.

In [None]:
start = bikes['starttime']
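
As a rough illustration of this restriction, the sketch below builds a small hypothetical DataFrame (the values are made up for the example) and checks where the `dt` accessor exists. When the same attribute is needed from several datetime columns, `apply` is one option because it hands each column to the function as a Series.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as the bikes data
df = pd.DataFrame({
    'starttime': pd.to_datetime(['2013-06-28 19:01:00', '2013-06-28 22:53:00']),
    'stoptime': pd.to_datetime(['2013-06-28 19:17:00', '2013-06-28 23:03:00']),
})

has_dt_on_frame = hasattr(df, 'dt')                 # DataFrames have no dt accessor
has_dt_on_series = hasattr(df['starttime'], 'dt')   # a datetime Series does

# apply passes each column as a Series, so dt is usable inside the function
hours = df.apply(lambda col: col.dt.hour)
print(hours)
```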

### Datetime attributes and methods are simpler than strings
Almost all the attributes and methods available to datetime Series are simple and straightforward. Let's begin by outputting the head of the Series so that we can visually verify the results of the attributes and methods.

In [None]:
start.head()

There are many attributes that return a particular part of the datetime, such as `year`, `month`, `day`, `hour`, `minute`, and `second`.

In [None]:
start.dt.year.head()

In [None]:
start.dt.month.head()

In [None]:
start.dt.minute.head()

In [None]:
# Monday is 0 and Sunday is 6
start.dt.dayofweek.head()

In [None]:
# ISO week of year; the older `dt.week` attribute was removed in pandas 2.0
start.dt.isocalendar().week.head()
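
To check these attributes against known values, here is a small sketch that uses two hypothetical timestamps instead of the bikes data. It assumes pandas 1.1 or later, where `dt.isocalendar()` is available.

```python
import pandas as pd

# Hypothetical timestamps: a Wednesday and a Saturday in 2020
s = pd.Series(pd.to_datetime(['2020-01-01 08:15:30', '2020-02-29 23:59:59']))

# Collect several datetime parts side by side for comparison
parts = pd.DataFrame({
    'year': s.dt.year,
    'month': s.dt.month,
    'day': s.dt.day,
    'hour': s.dt.hour,
    'dayofweek': s.dt.dayofweek,       # Monday is 0
    'week': s.dt.isocalendar().week,   # ISO 8601 week number
})
print(parts)
```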

## Datetime methods
There are only a few methods that are available to the `dt` accessor with the most useful being `ceil`, `round`, `floor`, `strftime`, and `to_period`. To use these methods you will need to be familiar with the [offset aliases][1], which are short strings, usually one character, that represent a unit of time. Below are a few of the offset aliases.

- `D` - day
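- `h` - hour (written `H` before pandas 2.2, where the uppercase form was deprecated)
- `min` - minute (`T` is an older alias, deprecated in pandas 2.2)
- `s` - second (written `S` before pandas 2.2)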

### Display the offset aliases in the notebook

Let's display all of the offset aliases directly in the notebook.

[1]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

In [None]:
url = 'http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases'
IFrame(url, 900, 400)

### Use offset aliases with datetime methods

In [None]:
start.head()

### `ceil` rounds up to the nearest unit

Round up to nearest hour.

In [None]:
start.dt.ceil('h').head()

Round up to nearest day.

In [None]:
start.dt.ceil('D').head()

### `floor` rounds down to the nearest unit

Round down to nearest minute.

In [None]:
start.dt.floor('min').head()

### `round` rounds to nearest whole unit
Round to nearest hour.

In [None]:
start.dt.round('h').head()
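
The three rounding methods are easiest to compare on values where the answer is known in advance. The sketch below assumes two hypothetical timestamps and uses the lowercase `h` alias, which is accepted by both older and newer pandas versions.

```python
import pandas as pd

# Hypothetical timestamps on either side of the half hour
s = pd.Series(pd.to_datetime(['2020-01-01 08:15:30', '2020-01-01 08:45:10']))

ceil_h = s.dt.ceil('h')         # always moves up to the next whole hour
floor_min = s.dt.floor('min')   # drops the seconds
round_h = s.dt.round('h')       # moves to whichever whole hour is closer

print(pd.DataFrame({'original': s, 'ceil_h': ceil_h,
                    'floor_min': floor_min, 'round_h': round_h}))
```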

## Format time as a string with `strftime`
The `strftime` method stands for **string format time**. It converts each datetime value into a string object. You will use something called **string directives** to convert a part of a datetime to a string. For instance, `%A` produces the full name of the weekday. Consult [Python's documentation][1] to view all of the string directives. Below is an example using multiple string directives to form a complex string from a datetime. You can write any other text intertwined with the directives.

By default, the maximum column width is 50 characters. The `set_option` function is used to increase this width so that the entire value is viewable in the output.

[1]: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [None]:
pd.set_option('display.max_colwidth', 100)
start.dt.strftime('On %A, %B %d, %Y at %X something great happened').head()
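
For a smaller example with predictable results, the sketch below formats two hypothetical timestamps using only numeric directives, which do not depend on the locale of the machine running the code.

```python
import pandas as pd

# Hypothetical timestamps, not taken from the bikes dataset
s = pd.Series(pd.to_datetime(['2020-01-01 08:15:30', '2020-02-29 23:59:59']))

iso_date = s.dt.strftime('%Y-%m-%d')   # four-digit year, month, day
clock = s.dt.strftime('%H:%M')         # 24-hour clock and minutes

# Literal text can be mixed freely with the directives
sentence = s.dt.strftime('Ride began on %d/%m/%Y at %H:%M')
print(sentence)
```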

## Convert to period
A period is a special data type unique to pandas (it does not exist in numpy) that represents an entire span of time, such as the entire month of June 2012, the entire year 1998, or the entire minute of June 11, 2011 12:34 p.m. This contrasts with a datetime, which represents a single moment in time, typically with nanosecond precision. A datetime is specific all the way down to its smallest unit, while a period refers to the whole span.

### Use offset aliases to convert to a period
To convert to a period use the same [offset aliases](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) from above. Let's convert the start datetime column to a period column representing an entire month.

In [None]:
per = start.dt.to_period('M').head()
per

Let's verify that the data type of this Series is indeed a period.

In [None]:
per.dtype

Let's see another example with a different offset alias converting the datetime to a time period of an hour.

In [None]:
start.dt.to_period('h').head()

### Period Series also have a `dt` accessor
A Series with data type of period has its own special attributes and methods accessible with the `dt` accessor. They overlap substantially with the datetime `dt` attributes and methods. Currently the [official documentation only shows the period properties][1]. You can discover all of the attributes and methods and how to use all of the methods by placing a dot after the `dt` and pressing tab. Below, we get the start and end of the period. Note that pandas returns these values as datetimes and not periods.

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#period-properties

In [None]:
per.dt.start_time

In [None]:
per.dt.end_time
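
The relationship between a period and its boundaries can be sketched with two hypothetical timestamps. Note that `end_time` is the last representable instant inside the period, so the example normalizes it to midnight before comparing dates.

```python
import pandas as pd

# Hypothetical timestamps in two different months
s = pd.Series(pd.to_datetime(['2020-01-15 08:15:30', '2020-02-29 23:59:59']))

monthly = s.dt.to_period('M')       # each value now stands for a whole month
starts = monthly.dt.start_time      # first instant of each month, as datetimes
ends = monthly.dt.end_time          # last instant of each month, as datetimes

print(monthly)
print(starts)
print(ends.dt.normalize())          # midnight of the final day of each month
```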

## Timedeltas
Timedeltas are a separate data type that represents an amount of time, such as 5 minutes and 34 seconds. The largest unit of a timedelta is days, and in pandas they have nanosecond precision by default. Timedeltas are also available in numpy. Timedelta Series have special attributes and methods accessible with the `dt` accessor, as you can [find in the documentation][1].

### Creating a Timedelta
One way to create a timedelta is to subtract one datetime Series from another. Here, we select `stoptime` as a Series and subtract the `start` Series from it.

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#timedelta-properties

In [None]:
stop = bikes['stoptime']
ride_length = stop - start
ride_length.head()

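Again, the best way to discover and learn about the special attributes and methods is by pressing tab after placing a dot after `dt`. Let's begin by converting each of the timedeltas into seconds with the `total_seconds` method. The similarly named `seconds` attribute returns only the seconds component of each timedelta (from 0 to 86,399) and ignores any whole days.

In [None]:
ride_length.dt.total_seconds().head()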
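
The difference between the `seconds` attribute and the `total_seconds` method only shows up for durations of a day or more. The sketch below assumes two hypothetical timedeltas that differ by exactly one day.

```python
import pandas as pd

# Hypothetical durations: 5 min 34 s, and the same plus one full day
td = pd.Series(pd.to_timedelta(['0 days 00:05:34', '1 days 00:05:34']))

seconds_part = td.dt.seconds        # seconds component only, days excluded
days_part = td.dt.days              # whole days component
total = td.dt.total_seconds()       # entire duration expressed in seconds

print(pd.DataFrame({'timedelta': td, 'days': days_part,
                    'seconds': seconds_part, 'total_seconds': total}))
```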

Numbers may be placed next to offset aliases to designate a more specific amount of time. Below, we round to the nearest 10 minutes.

In [None]:
ride_length.dt.round('10min').head()
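
A number in front of an alias works the same way with `floor` and `ceil`. As a quick sketch with hypothetical durations (chosen to avoid exact halfway cases):

```python
import pandas as pd

# Hypothetical ride lengths
td = pd.Series(pd.to_timedelta(['00:14:00', '00:16:00', '01:02:59']))

rounded = td.dt.round('10min')   # nearest multiple of 10 minutes
floored = td.dt.floor('15min')   # previous multiple of 15 minutes

print(pd.DataFrame({'original': td, 'round_10min': rounded,
                    'floor_15min': floored}))
```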

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of bike rides happen in January?</span>

### Exercise 2
<span  style="color:green; font-size:16px">What percentage of bike rides happen on the weekend?</span>

### Exercise 3
<span  style="color:green; font-size:16px">What percentage of bike rides happen on the last day of the month?</span>

### Exercise 4
<span  style="color:green; font-size:16px">We would expect that the value of the minutes recorded for each starting ride is approximately random. Can you show some data that confirms or rejects this?</span>

### Exercise 5
<span  style="color:green; font-size:16px">Assign the length of the ride to `ride_length`. Then find the percentage of rides that lasted longer than 30 minutes.</span>