In [1]:
import pandas as pd
import numpy as np

# Shifting (Leading and Lagging) Data
**Shifting** refers to moving data backward and forward through time.  
  
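Both Series and DataFrame have a ***shift*** method for doing naive shifts forward or backward: the values move while the index stays unmodified. A positive shift lags the data (each row gets an earlier value), and a negative shift leads it (each row gets a later value).  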
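A minimal sketch of lagging and leading on a small DataFrame. The column names and values are made up for illustration. The index is identical before and after the shift, and only the values move.

```python
import pandas as pd

# Hypothetical daily data, made up for illustration
df = pd.DataFrame({'sales': [10, 20, 30, 40], 'visits': [1, 2, 3, 4]},
                  index=pd.date_range('2020-01-01', periods=4, freq='D'))

lagged = df.shift(1)   # lag: each row holds the previous day's values
led = df.shift(-1)     # lead: each row holds the next day's values

print(lagged)
print(led)

# The index is left unmodified; only the values move
assert lagged.index.equals(df.index)
```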

  
  
  
Python's datetime documentation also uses the word "naive", in a different sense: there are two kinds of date and time objects, “naive” and “aware”.  
  
An aware object has sufficient knowledge of applicable algorithmic and political time adjustments, such as time zone and daylight saving time information, to locate itself relative to other aware objects. An aware object is used to represent a specific moment in time that is not open to interpretation.

A naive object does not contain enough information to unambiguously locate itself relative to other date/time objects. Whether a naive object represents **Coordinated Universal Time (UTC)**, local time, or time in some other timezone is purely up to the program, just like it is up to the program whether a particular number represents metres, miles, or mass. Naive objects are easy to understand and to work with, at the cost of ignoring some aspects of reality.  
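In pandas terms, a ***Timestamp*** without a time zone is naive and one with a time zone is aware. A minimal sketch; the date and the `America/New_York` zone are arbitrary choices for illustration, and the conversion assumes time zone data is available on the system:

```python
import pandas as pd

naive = pd.Timestamp('2020-05-09 12:00')   # no time zone attached
aware = naive.tz_localize('UTC')           # interpret the same wall time as UTC

print(naive.tz)   # None
print(aware.tz)   # UTC

# An aware timestamp can be converted unambiguously to another zone
ny = aware.tz_convert('America/New_York')
print(ny)         # 2020-05-09 08:00:00-04:00 (daylight saving time)

# Ordering a naive and an aware timestamp is refused with a TypeError
try:
    _ = naive < aware
    comparable = True
except TypeError:
    comparable = False
print(comparable)  # False
```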

In [2]:
# MonthEnd frequency ('M'; pandas 2.2+ spells this alias 'ME')
ts = pd.Series(np.arange(4), 
               index=pd.date_range('4/20/2020', periods=4, freq='M'))
ts

2020-04-30    0
2020-05-31    1
2020-06-30    2
2020-07-31    3
Freq: M, dtype: int64

Shifting the data this way introduces missing values (NaN) at the start or the end of the series, depending on the direction of the shift:

In [3]:
# Shift the data forward two periods
ts.shift(2)

2020-04-30    NaN
2020-05-31    NaN
2020-06-30    0.0
2020-07-31    1.0
Freq: M, dtype: float64

In [4]:
# Shift the data backward two periods
ts.shift(-2)

2020-04-30    2.0
2020-05-31    3.0
2020-06-30    NaN
2020-07-31    NaN
Freq: M, dtype: float64
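In the outputs above the dtype changes from int64 to float64, because NaN cannot be stored in an integer series. If that matters, ***shift*** accepts a ***fill_value*** for the vacated positions. A minimal sketch, assuming 0 is an acceptable placeholder for the missing periods (a modelling choice, not a default); the month-end frequency is passed as an offset object here:

```python
import numpy as np
import pandas as pd

# Same data as ts above, with the month-end frequency given as an offset object
ts = pd.Series(np.arange(4),
               index=pd.date_range('2020-04-20', periods=4, freq=pd.offsets.MonthEnd()))

shifted = ts.shift(2, fill_value=0)   # vacated slots get 0 instead of NaN
print(shifted)
print(shifted.dtype)                  # same integer dtype as ts, no float upcast
```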

A common use of ***shift*** is computing **percent changes** in a time series: divide each value by the previous one and subtract 1.

In [6]:
# Percent change from the previous period (equivalent to ts.pct_change());
# the first change is inf because the previous value is 0
ts / ts.shift(1) - 1

2020-04-30    NaN
2020-05-31    inf
2020-06-30    1.0
2020-07-31    0.5
Freq: M, dtype: float64
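pandas also provides this calculation as ***pct_change***. A small check on made-up prices (chosen without zeros, so no division by zero occurs) that it matches the shift-based formula:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly prices, made up for illustration
prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range('2020-04-30', periods=4,
                                       freq=pd.offsets.MonthEnd()))

manual = prices / prices.shift(1) - 1   # (current - previous) / previous
builtin = prices.pct_change()

print(pd.DataFrame({'manual': manual, 'pct_change': builtin}))
# Both give NaN, 0.10, -0.10, 0.10 (up to floating-point rounding)
print(np.allclose(manual, builtin, equal_nan=True))  # True
```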

Passing a ***freq*** argument shifts the timestamps instead of the data, so the index moves and no values are dropped:

In [8]:
# Shift the timestamp forward by two months:
ts.shift(2, freq='M')

2020-06-30    0
2020-07-31    1
2020-08-31    2
2020-09-30    3
Freq: M, dtype: int64

In [9]:
# Shift the timestamps forward by three days
ts.shift(3, freq='D')

2020-05-03    0
2020-06-03    1
2020-07-03    2
2020-08-03    3
dtype: int64

In [10]:
ts.shift(1, freq='90min')  # 90 minutes ('90T' in older code; 'T' is deprecated in pandas 2.2+)

2020-04-30 01:30:00    0
2020-05-31 01:30:00    1
2020-06-30 01:30:00    2
2020-07-31 01:30:00    3
Freq: M, dtype: int64
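The ***freq*** argument also accepts offset objects and timedeltas, not only strings. A short check that shifting with a frequency moves the index and leaves the values untouched (the series is rebuilt so the block runs on its own):

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(4),
               index=pd.date_range('2020-04-20', periods=4, freq=pd.offsets.MonthEnd()))

by_offset = ts.shift(2, freq=pd.offsets.MonthEnd())     # two month ends later
by_delta = ts.shift(1, freq=pd.Timedelta(minutes=90))   # 90 minutes later

print(by_offset)
print(by_delta)

# Values are unchanged and nothing becomes NaN; only the timestamps move
assert by_offset.tolist() == ts.tolist()
assert by_offset.index[0] == pd.Timestamp('2020-06-30')
assert by_delta.index[0] == pd.Timestamp('2020-04-30 01:30')
```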

## Shifting dates with offsets
Pandas date **offsets** can also be added to (or subtracted from) plain ***datetime*** and ***Timestamp*** objects:

In [11]:
from pandas.tseries.offsets import Day, MonthEnd
from datetime import datetime

In [12]:
now = datetime.now()
now

datetime.datetime(2020, 5, 9, 12, 18, 40, 456068)

In [13]:
now + 3 * Day()

Timestamp('2020-05-12 12:18:40.456068')

In [14]:
# MonthEnd is an anchored offset: adding it moves the date to the next month end
now + MonthEnd()

Timestamp('2020-05-31 12:18:40.456068')

In [15]:
now + MonthEnd(2)

Timestamp('2020-06-30 12:18:40.456068')
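Adding an anchored offset to a date that already sits on the anchor moves it a full period. A small sketch with fixed dates (chosen for illustration) so the results do not depend on when the cell runs:

```python
import pandas as pd
from pandas.tseries.offsets import Day, MonthEnd

mid_month = pd.Timestamp('2020-05-09')
month_end = pd.Timestamp('2020-05-31')

print(mid_month + 3 * Day())    # 2020-05-12 00:00:00
print(mid_month + MonthEnd())   # 2020-05-31 00:00:00: rolls to the current month's end
print(month_end + MonthEnd())   # 2020-06-30 00:00:00: already on a month end, so moves one month
print(mid_month - MonthEnd())   # 2020-04-30 00:00:00: subtracting goes to the previous month end
```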

Anchored offsets can also explicitly "roll" a date forward or backward with their ***rollforward*** and ***rollback*** methods:

In [16]:
offset = MonthEnd()
type(offset)

pandas.tseries.offsets.MonthEnd

In [17]:
# Roll forward to month end
offset.rollforward(now)

Timestamp('2020-05-31 12:18:40.456068')

In [18]:
# Roll back to the previous month end
offset.rollback(now)

Timestamp('2020-04-30 12:18:40.456068')
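Unlike adding the offset, ***rollforward*** and ***rollback*** leave a date that is already on the anchor where it is. A minimal comparison with fixed dates:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
on_anchor = pd.Timestamp('2020-05-31')
off_anchor = pd.Timestamp('2020-05-09')

print(offset.rollforward(on_anchor))   # 2020-05-31 00:00:00 (unchanged)
print(offset.rollback(on_anchor))      # 2020-05-31 00:00:00 (unchanged)
print(on_anchor + offset)              # 2020-06-30 00:00:00 (adding always moves)
print(offset.rollback(off_anchor))     # 2020-04-30 00:00:00
```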

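Because ***rollforward*** maps each timestamp to its month end, it can be passed to ***groupby*** as a key function to aggregate by month: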

In [19]:
# 20 random values spaced 4 days apart (unseeded, so the numbers change on every run)
ts = pd.Series(np.random.randn(20),
               index=pd.date_range('04/20/2020', periods=20, freq='4d'))
ts

2020-04-20   -0.416804
2020-04-24    0.483630
2020-04-28   -0.520229
2020-05-02   -1.526423
2020-05-06    0.905483
2020-05-10    1.782720
2020-05-14   -0.392640
2020-05-18    1.316044
2020-05-22    0.931504
2020-05-26    0.859547
2020-05-30   -0.366964
2020-06-03   -0.897200
2020-06-07   -0.813520
2020-06-11   -0.947003
2020-06-15    1.688285
2020-06-19   -0.505076
2020-06-23    0.274016
2020-06-27    0.699636
2020-07-01   -0.280650
2020-07-05   -0.211491
Freq: 4D, dtype: float64

In [20]:
# Group each timestamp by the month end it rolls forward to, then average
ts.groupby(offset.rollforward).mean()

2020-04-30   -0.151134
2020-05-31    0.438659
2020-06-30   -0.071552
2020-07-31   -0.246070
dtype: float64
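For this kind of monthly aggregation, ***resample*** is the more common tool. A sketch with deterministic values (`np.arange` instead of random numbers, so the result can be checked) showing both approaches agree here. One difference: ***resample*** also emits a month with no observations (as NaN), while the ***groupby*** version simply has no row for it.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Deterministic stand-in for the random series above: 0.0, 1.0, ..., 19.0 every 4 days
ts = pd.Series(np.arange(20, dtype=float),
               index=pd.date_range('2020-04-20', periods=20, freq='4D'))

offset = MonthEnd()
by_groupby = ts.groupby(offset.rollforward).mean()
by_resample = ts.resample(offset).mean()   # bins labelled by month end

print(by_groupby)
print(by_resample)
```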