# Chapter 11 - Time Series

Time series data is an important form of structured data in many different fields, such
as finance, economics, ecology, neuroscience, and physics. Anything that is observed
or measured at many points in time forms a time series. Many time series are fixed
frequency, which is to say that data points occur at regular intervals according to some
rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can
also be irregular, without a fixed unit of time or offset between units. How you mark
and refer to time series data depends on the application, and you may have one of the
following:

    • Timestamps, specific instants in time
    • Fixed periods, such as the month January 2007 or the full year 2010
    • Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
    • Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)

In this chapter, I am mainly concerned with time series in the first three categories,
though many of the techniques can be applied to experimental time series where the
index may be an integer or floating-point number indicating elapsed time from the
start of the experiment. The simplest and most widely used time series are
those indexed by timestamp.
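As a rough orientation, the four ways of marking time listed above map onto pandas objects approximately as follows. This is only a sketch: the dates, the `diameter_cm` values, and the variable names are made up for illustration.

```python
import pandas as pd

# 1. Timestamp: a specific instant in time
stamp = pd.Timestamp("2007-01-15 09:30")

# 2. Fixed period: the whole month of January 2007
period = pd.Period("2007-01", freq="M")

# 3. Interval: an explicit start and end timestamp (closed on the left here)
interval = pd.Interval(
    pd.Timestamp("2007-01-01"), pd.Timestamp("2007-02-01"), closed="left"
)

# 4. Elapsed time: measurements indexed by time since a start point
#    (hypothetical cookie diameters in centimeters, one reading per second)
diameter_cm = pd.Series([5.0, 5.2, 5.5], index=pd.to_timedelta([0, 1, 2], unit="s"))

print(period.start_time, period.end_time)  # the period spans the full month
print(stamp in interval)                   # the instant falls inside the interval
print(diameter_cm)
```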

pandas provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and
resample irregular- and fixed-frequency time series. Some of these tools are especially
useful for financial and economics applications, but you could certainly use them to
analyze server log data, too.

## 11.1 Date and Time Data Types and Tools

The Python standard library includes data types for date and time data, as well as
calendar-related functionality. The datetime, time, and calendar modules are the
main places to start. The datetime.datetime type, or simply datetime, is widely
used:

In [1]:
from datetime import datetime

In [2]:
now = datetime.now()
now

datetime.datetime(2020, 8, 18, 11, 10, 34, 438170)

In [3]:
now.year, now.month, now.day

(2020, 8, 18)

datetime stores both the date and time down to the microsecond. timedelta represents the temporal difference between two datetime objects:

In [4]:
delta = datetime(2011,1,7) - datetime(2008,6,24,8,15)
delta

datetime.timedelta(days=926, seconds=56700)

In [5]:
delta.days

926

In [6]:
delta.seconds

56700
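Note that `days` and `seconds` are separate components of the difference, so `delta.seconds` alone is not the total elapsed time. A short sketch using the standard library's `timedelta.total_seconds` method (not shown in the text above) makes the distinction concrete:

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are stored separately; seconds is always less than one day
print(delta.days, delta.seconds)

# total_seconds() combines both components into a single float
total = delta.total_seconds()
print(total == delta.days * 86400 + delta.seconds)
```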

You can add (or subtract) a timedelta or multiple thereof to a datetime object to
yield a new shifted object:

In [7]:
from datetime import timedelta

In [8]:
start = datetime(2011,1,7)
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [9]:
start - 2*timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)

Table 11-1 summarizes the data types in the datetime module. While this chapter is
mainly concerned with the data types in pandas and higher-level time series manipulation, you may encounter the datetime-based types in many other places in Python
in the wild.

![](datetime_module.jpg)

### Converting Between String and Datetime

You can format datetime objects and pandas Timestamp objects, which I’ll introduce
later, as strings using str or the strftime method, passing a format specification:

In [11]:
stamp = datetime(2011,1,3)
str(stamp)

'2011-01-03 00:00:00'

In [12]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

See Table 11-2 for a complete list of the format codes.

![](datetime_format.jpg)
![](datetime_format2.jpg)
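To make the format codes more concrete, here is a small sketch that combines several of them in calls to `strftime`. The timestamp and the particular format strings are arbitrary choices for illustration; only standard codes from the `datetime` module are used.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 14, 5, 9)

# Four-digit year, two-digit month and day, 24-hour time
iso_like = stamp.strftime("%Y-%m-%d %H:%M:%S")

# Day first with a two-digit year
day_first = stamp.strftime("%d/%m/%y")

# Day of the year as a zero-padded three-digit number
day_of_year = stamp.strftime("%j")

print(iso_like)     # 2011-01-03 14:05:09
print(day_first)    # 03/01/11
print(day_of_year)  # 003
```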

You can use these same format codes to convert strings to dates using
datetime.strptime:

In [15]:
value = '2011-01-03'

In [16]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [17]:
datestr = ['7/6/2011', '8/6/2011']

In [18]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestr]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]

datetime.strptime is a good way to parse a date with a known format. However, it
can be a bit annoying to have to write a format spec each time, especially for common
date formats. In this case, you can use the parser.parse method in the third-party
dateutil package (this is installed automatically when you install pandas):

In [19]:
from dateutil.parser import parse

In [20]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

dateutil is capable of parsing most human-intelligible date representations:

In [21]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In international locales, day appearing before month is very common, so you can pass
dayfirst=True to indicate this:

In [22]:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)

pandas is generally oriented toward working with arrays of dates, whether used as an
axis index or a column in a DataFrame. The to_datetime function parses many different kinds of date representations. Standard date formats like ISO 8601 can be
parsed very quickly:

In [25]:
import pandas as pd

In [26]:
datestr = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [27]:
pd.to_datetime(datestr)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

It also handles values that should be considered missing (None, empty string, etc.):

In [28]:
idx = pd.to_datetime(datestr + [None])
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [29]:
idx[2]

NaT

In [30]:
pd.isnull(idx)

array([False, False,  True])

NaT (Not a Time) is pandas’s null value for timestamp data.
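When the input may contain strings that are not dates at all, one option is to state the expected format and ask pandas to turn anything unparseable into NaT. The sketch below uses the `format` and `errors` parameters of `pd.to_datetime`, which are not covered in the text above; the input list is invented for illustration.

```python
import pandas as pd

raw = ["2011-07-06", "not a date", "", None]

# errors="coerce" converts values that cannot be parsed into NaT
# instead of raising an exception
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna())  # flags the entries that became NaT
```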

datetime objects also have a number of locale-specific formatting options for systems
in other countries or languages. For example, the abbreviated month names will be
different on German or French systems compared with English systems. See
Table 11-3 for a listing.

![](date_format.jpg)
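The locale-specific codes can be tried out as in the sketch below. The codes used here (`%a`, `%A`, `%b`, `%B`, `%c`, `%p`, `%x`, `%X`) are standard `strftime` directives; their output depends on the active locale, so no expected output is shown.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 14, 5, 9)

# Each of these directives is rendered according to the current locale
codes = ["%a", "%A", "%b", "%B", "%c", "%p", "%x", "%X"]
formatted = {code: stamp.strftime(code) for code in codes}

for code, text in formatted.items():
    print(code, "->", text)
```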

## 11.2 Time Series Basics

A basic kind of time series object in pandas is a Series indexed by timestamps, which
are often represented outside of pandas as Python strings or datetime objects:

In [33]:
import numpy as np

In [31]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), datetime(2011, 1, 8),
        datetime(2011, 1, 10), datetime(2011, 1, 12)]
dates

[datetime.datetime(2011, 1, 2, 0, 0),
 datetime.datetime(2011, 1, 5, 0, 0),
 datetime.datetime(2011, 1, 7, 0, 0),
 datetime.datetime(2011, 1, 8, 0, 0),
 datetime.datetime(2011, 1, 10, 0, 0),
 datetime.datetime(2011, 1, 12, 0, 0)]

In [34]:
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02    0.112629
2011-01-05    1.651245
2011-01-07    1.580482
2011-01-08    1.385119
2011-01-10    0.729385
2011-01-12    1.199008
dtype: float64

Under the hood, these datetime objects have been put in a DatetimeIndex:

In [35]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

As with other Series, arithmetic operations between differently indexed time series
automatically align on the dates:

In [36]:
ts + ts[::2]

2011-01-02    0.225258
2011-01-05         NaN
2011-01-07    3.160964
2011-01-08         NaN
2011-01-10    1.458771
2011-01-12         NaN
dtype: float64

Recall that ts[::2] selects every second element in ts.

pandas stores timestamps using NumPy’s datetime64 data type at the nanosecond
resolution:

In [37]:
ts.index.dtype

dtype('<M8[ns]')

Scalar values from a DatetimeIndex are pandas Timestamp objects:

In [38]:
stamp = ts.index[0]
stamp

Timestamp('2011-01-02 00:00:00')

A Timestamp can be substituted anywhere you would use a datetime object. Additionally,
it can store frequency information (if any) and understands how to do time
zone conversions and other kinds of manipulations. More on both of these things
later.
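As a brief illustration of that interchangeability, the sketch below mixes a `Timestamp` with standard-library objects and then performs a simple time zone conversion. The zone names and the three-day offset are arbitrary choices for the example.

```python
from datetime import datetime, timedelta

import pandas as pd

stamp = pd.Timestamp("2011-01-02")

# Timestamp is a subclass of datetime, so it compares and combines with
# standard-library objects
print(isinstance(stamp, datetime))
print(stamp == datetime(2011, 1, 2))
shifted = stamp + timedelta(days=3)
print(shifted)

# Time zone handling: attach UTC, then convert to another zone
utc_stamp = stamp.tz_localize("UTC")
eastern = utc_stamp.tz_convert("America/New_York")
print(eastern)
```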

### Indexing, Selection, Subsetting

Time series behave like any other pandas.Series when you are indexing and selecting
data based on label:

In [39]:
stamp = ts.index[2]

In [40]:
ts[stamp]

1.580482030190967

As a convenience, you can also pass a string that is interpretable as a date:

In [41]:
ts['1/10/2011']

0.7293852650818828

In [42]:
ts['20110110']

0.7293852650818828

For longer time series, a year or only a year and month can be passed to easily select
slices of data:

In [43]:
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts.head()

2000-01-01    0.147492
2000-01-02   -2.982264
2000-01-03    0.638768
2000-01-04   -0.968138
2000-01-05    0.185274
Freq: D, dtype: float64

In [44]:
longer_ts['2001']

2001-01-01   -0.619026
2001-01-02    0.443069
2001-01-03   -0.188086
2001-01-04   -0.106184
2001-01-05    0.988790
2001-01-06    0.614062
2001-01-07    1.115592
2001-01-08    0.932176
2001-01-09   -1.235637
2001-01-10   -0.025724
2001-01-11   -0.068994
2001-01-12   -0.245707
2001-01-13    0.346583
2001-01-14    0.905761
2001-01-15   -0.135465
2001-01-16    1.081789
2001-01-17    1.067851
2001-01-18   -0.223208
2001-01-19    0.708369
2001-01-20   -0.810706
2001-01-21    1.332628
2001-01-22    1.476509
2001-01-23    0.922133
2001-01-24   -1.585332
2001-01-25    1.487634
2001-01-26   -0.575635
2001-01-27    0.582061
2001-01-28   -2.076041
2001-01-29   -1.724037
2001-01-30    0.488924
                ...   
2001-12-02   -1.078059
2001-12-03   -0.347959
2001-12-04   -0.373720
2001-12-05    0.384154
2001-12-06   -0.042122
2001-12-07    0.827835
2001-12-08   -0.413076
2001-12-09   -1.694091
2001-12-10    2.899343
2001-12-11   -0.022732
2001-12-12    0.222941
2001-12-13   -1.188600
                ...

Here, the string '2001' is interpreted as a year and selects that time period. This also
works if you specify the month:

In [45]:
longer_ts['2001-05']

2001-05-01    0.550777
2001-05-02   -0.775751
2001-05-03    0.866286
2001-05-04    0.068707
2001-05-05   -1.089961
2001-05-06    0.737064
2001-05-07    0.038989
2001-05-08    0.160904
2001-05-09    0.402760
2001-05-10   -1.129192
2001-05-11   -1.806964
2001-05-12    0.016255
2001-05-13   -1.175877
2001-05-14   -0.388821
2001-05-15    2.092732
2001-05-16   -0.681957
2001-05-17    0.708845
2001-05-18   -1.561575
2001-05-19    0.160577
2001-05-20    0.583594
2001-05-21   -0.147880
2001-05-22   -0.858209
2001-05-23   -0.858806
2001-05-24    0.088548
2001-05-25   -0.165873
2001-05-26   -1.590058
2001-05-27    1.584073
2001-05-28   -0.603733
2001-05-29    1.433337
2001-05-30   -0.736589
2001-05-31    0.615076
Freq: D, dtype: float64

Slicing with datetime objects works as well:

In [46]:
ts[datetime(2011,1,7):]

2011-01-07    1.580482
2011-01-08    1.385119
2011-01-10    0.729385
2011-01-12    1.199008
dtype: float64

Because most time series data is ordered chronologically, you can slice with timestamps
not contained in a time series to perform a range query:

In [47]:
ts

2011-01-02    0.112629
2011-01-05    1.651245
2011-01-07    1.580482
2011-01-08    1.385119
2011-01-10    0.729385
2011-01-12    1.199008
dtype: float64

In [48]:
ts['1/6/2011':'1/11/2011']

2011-01-07    1.580482
2011-01-08    1.385119
2011-01-10    0.729385
dtype: float64

As before, you can pass a string date, datetime, or timestamp. Remember that
slicing in this manner produces views on the source time series, like slicing NumPy
arrays. This means that no data is copied and modifications on the slice will be reflected
in the original data.

There is an equivalent instance method, truncate, that slices a Series between two
dates:

In [49]:
ts.truncate(after='1/9/2011')

2011-01-02    0.112629
2011-01-05    1.651245
2011-01-07    1.580482
2011-01-08    1.385119
dtype: float64
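`truncate` also accepts a `before` argument, so both ends of the window can be given at once. The following sketch rebuilds the small series with placeholder values (0.0 through 5.0 rather than random numbers) and compares the result with the equivalent label-based slice:

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
# Placeholder values instead of random numbers
ts = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], index=dates)

# Keep only observations from January 4 through January 9, inclusive
window = ts.truncate(before="2011-01-04", after="2011-01-09")
print(window)

# The same selection written as a label-based slice
same_as_slice = window.equals(ts.loc["2011-01-04":"2011-01-09"])
print(same_as_slice)
```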

All of this holds true for DataFrame as well, indexing on its rows:

In [51]:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')

In [53]:
long_df = pd.DataFrame(np.random.randn(100,4), index=dates, columns=['Colorado', 'Texas','New York', 'Ohio'])
long_df.head()

            Colorado     Texas  New York      Ohio
2000-01-05 -0.364139 -0.073836 -0.918964  0.370698
2000-01-12  0.241748 -0.579006  0.674587  0.929504
2000-01-19 -1.327692  0.430650  1.776280 -1.130488
2000-01-26  1.371841 -0.812181  0.317693 -1.527935
2000-02-02  0.120732 -0.603248 -0.010529  0.300139


In [54]:
long_df.loc['5-2001']

            Colorado     Texas  New York      Ohio
2001-05-02  1.735930  0.777249 -1.170007 -0.418344
2001-05-09 -0.240786  0.202539  0.572415 -1.189443
2001-05-16  0.168579  0.741726 -1.215811 -0.362379
2001-05-23 -0.117310  0.563970 -0.680075 -1.269951
2001-05-30 -0.889851  1.981695  0.206040  0.087757


### Time Series with Duplicate Indices

In some applications, there may be multiple data observations falling on a particular
timestamp. Here is an example:

In [57]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000','1/2/2000', '1/3/2000'])
type(dates)

pandas.core.indexes.datetimes.DatetimeIndex

In [58]:
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

We can tell that the index is not unique by checking its is_unique property:

In [59]:
dup_ts.index.is_unique

False

Indexing into this time series will now either produce scalar values or slices depending on whether a timestamp is duplicated:

In [60]:
dup_ts['1/3/2000'] # Not duplicated.

4

In [61]:
dup_ts['1/2/2000'] # Duplicated.

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

Suppose you wanted to aggregate the data having non-unique timestamps. One way
to do this is to use groupby and pass level=0:

In [62]:
grouped = dup_ts.groupby(level=0)
grouped

<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000017BD25D4B70>

In [63]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [64]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64

## 11.3 Date Ranges, Frequencies, and Shifting

Generic time series in pandas are assumed to be irregular; that is, they have no fixed
frequency. For many applications this is sufficient. However, it’s often desirable to
work relative to a fixed frequency, such as daily, monthly, or every 15 minutes, even if
that means introducing missing values into a time series. Fortunately, pandas has a
full suite of standard time series frequencies and tools for resampling, inferring frequencies,
and generating fixed-frequency date ranges. For example, you can convert
the sample time series to fixed daily frequency by calling resample:

In [65]:
ts

2011-01-02    0.112629
2011-01-05    1.651245
2011-01-07    1.580482
2011-01-08    1.385119
2011-01-10    0.729385
2011-01-12    1.199008
dtype: float64

In [66]:
resampler = ts.resample('D')

The string 'D' is interpreted as daily frequency.
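The resampler is a lazy, groupby-like object; nothing is computed until you apply a method to it. As a minimal sketch (with made-up values in place of the random ones used above), taking the mean of each daily bin shows the missing values that the fixed daily frequency introduces:

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
# Placeholder values instead of random numbers
ts = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], index=dates)

# One bin per calendar day; days without an observation come out as NaN
daily = ts.resample("D").mean()

print(daily)
print(daily.index.freq)
```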

### Generating Date Ranges

While I used it previously without explanation, pandas.date_range is responsible for
generating a DatetimeIndex with an indicated length according to a particular
frequency:

In [68]:
index = pd.date_range('2012-04-01', '2012-06-01')
type(index)

pandas.core.indexes.datetimes.DatetimeIndex

By default, date_range generates daily timestamps. If you pass only a start or end
date, you must pass a number of periods to generate:

In [69]:
pd.date_range(start='2012-04-01', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [70]:
pd.date_range(end='2012-06-01', periods=20)

DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')

The start and end dates define strict boundaries for the generated date index. For
example, if you wanted a date index containing the last business day of each month,
you would pass the 'BM' frequency (business end of month; see Table 11-4 for a more
complete listing of frequencies). Only dates falling on or inside the date interval will
be included:

In [71]:
pd.date_range('2000-01-01', '2000-12-01', freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')
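String aliases for frequencies have changed between pandas releases (as far as I am aware, newer versions spell the business month end alias `'BME'` and warn about `'BM'`). If that is a concern, one sketch that avoids the alias is to pass the offset object itself; `pd.offsets.BMonthEnd` is the business month end offset class:

```python
import pandas as pd

# Last business day of each month, restricted to the given date interval
idx = pd.date_range("2000-01-01", "2000-12-01", freq=pd.offsets.BMonthEnd())
print(idx)
```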

![](timeseries_freq.jpg)

date_range by default preserves the time (if any) of the start or end timestamp:

In [72]:
pd.date_range('2012-05-02 12:56:31', periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

Sometimes you will have start or end dates with time information but want to generate
a set of timestamps normalized to midnight as a convention. To do this, there is a
normalize option:

In [73]:
pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')

### Frequencies and Date Offsets

Frequencies in pandas are composed of a base frequency and a multiplier. Base frequencies
are typically referred to by a string alias, like 'M' for monthly or 'H' for
hourly. For each base frequency, there is an object, generally referred to as a
date offset. For example, hourly frequency can be represented with the Hour class:
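A minimal sketch of how such offset objects might be used, assuming the `Hour` and `Minute` classes from `pandas.tseries.offsets`; the multiplier 4 and the dates are arbitrary choices for the example:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

hour = Hour()          # base frequency: one hour
four_hours = Hour(4)   # multiplier of 4 applied to the base frequency

# Offsets of this kind can be combined by addition
two_and_a_half_hours = Hour(2) + Minute(30)

# An offset object can be passed where a frequency string is accepted
idx = pd.date_range("2000-01-01", periods=4, freq=four_hours)
print(idx)
```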

#### Week of month dates

One useful frequency class is “week of month,” starting with WOM. This enables you to
get dates like the third Friday of each month:
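For instance, the alias `'WOM-3FRI'` selects the third Friday of each month. The date interval below is an arbitrary choice for illustration:

```python
import pandas as pd

# 'WOM-3FRI' means the third Friday of each month
third_fridays = pd.date_range("2012-01-01", "2012-09-01", freq="WOM-3FRI")
print(list(third_fridays))
```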

### Shifting (Leading and Lagging) Data

“Shifting” refers to moving data backward and forward through time. Both Series and
DataFrame have a shift method for doing naive shifts forward or backward, leaving
the index unmodified:
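A small sketch of a naive shift in both directions, using a made-up four-element daily series. The values move while the index stays where it is, so missing values appear at one end:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range("2000-01-01", periods=4))

lagged = ts.shift(2)   # values move forward in time; the first two become NaN
led = ts.shift(-2)     # values move backward in time; the last two become NaN

print(lagged)
print(led)
print(lagged.index.equals(ts.index))  # the index is unchanged
```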