## Pandas Tutorial 19: Time Series Analysis: Period and PeriodIndex
In this tutorial, we continue exploring Pandas time series analysis by introducing `Period` and `PeriodIndex`. A period represents a span of time (such as a day, month, quarter, or year) rather than a single instant, and periods are widely used in financial analysis. Pandas supports period arithmetic, allowing you to create quarterly, yearly, and other time periods and perform various operations on them. You can use `period_range` to generate a `PeriodIndex` between a defined start and end.

#### Topics covered:
* **What is a timestamp?**
* **What is a timespan?**
* **Using the `Period` class**
* **Arithmetic operations on period objects**
* **Creating quarterly time periods**
* **Using the `asfreq()` function**
* **List of all frequencies**
* **Creating a `PeriodIndex`**
* **Converting a `PeriodIndex` to a `DatetimeIndex`**
* **Converting a `DatetimeIndex` to a `PeriodIndex`**
* **Using `set_index()` with periods**
* **Converting an index to `PeriodIndex` using `PeriodIndex()`**
* **Creating new columns in a DataFrame**

This tutorial will guide you through working with time periods and period indexing in Pandas for advanced time series analysis.

In [2]:
import pandas as pd
import numpy as np

## Yearly Period

In Pandas, a `Period` object represents a span of time, such as a year, month, or quarter. 

Here, we create a yearly period for 2016 and explore some of its properties: `start_time`, `end_time`, and `is_leap_year`.

In [3]:
# Creates a yearly period for 2016
y = pd.Period('2016')
y

Period('2016', 'A-DEC')

In [4]:
y.start_time

Timestamp('2016-01-01 00:00:00')

In [5]:
y.end_time

Timestamp('2016-12-31 23:59:59.999999999')

In [6]:
y.is_leap_year

True

## Monthly Period

A `Period` object can also represent smaller spans of time, such as months. 

Here, we create a monthly period for December 2017 and examine its start and end times. We also demonstrate how to perform arithmetic operations with periods, such as shifting to the next month.

In [7]:
# Creates a monthly period for December 2017
m = pd.Period('2017-12')
m

Period('2017-12', 'M')

In [8]:
m.start_time

Timestamp('2017-12-01 00:00:00')

In [9]:
m.end_time

Timestamp('2017-12-31 23:59:59.999999999')

**December 2017 → January 2018**

In [10]:
# Shifts the period forward by one month (January 2018)
m+1

Period('2018-01', 'M')

## Daily Period

`Period` objects can also represent a single day. Here, we create a daily period for February 28, 2016, and explore its start and end times.

Additionally, we show how to perform arithmetic operations with periods by shifting the period to the next day.

In [11]:
# Creates a daily period for February 28, 2016
d = pd.Period('2016-02-28', freq='D')
d

Period('2016-02-28', 'D')

In [12]:
d.start_time

Timestamp('2016-02-28 00:00:00')

In [13]:
d.end_time

Timestamp('2016-02-28 23:59:59.999999999')

**February 28 → February 29**

In [14]:
# Shifts the period forward by one day 
d+1

Period('2016-02-29', 'D')
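
Period arithmetic follows the calendar, so the same shift lands on a different date in a non-leap year. The short sketch below contrasts 2016 with 2017; the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# The same calendar day in a leap year (2016) and a non-leap year (2017)
d_leap = pd.Period('2016-02-28', freq='D')
d_common = pd.Period('2017-02-28', freq='D')

next_leap = d_leap + 1      # 2016 has a February 29
next_common = d_common + 1  # 2017 does not, so the shift rolls into March

print(next_leap)    # 2016-02-29
print(next_common)  # 2017-03-01
```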

## Hourly Period

In addition to daily and monthly periods, Pandas allows you to work with periods at the hourly level. 

Here, we create an hourly period for 11:00 PM on August 15, 2017, and demonstrate how to shift it by one hour. We also show how to achieve the same result using `pandas` offsets.

In [15]:
# Creates an hourly period for 11 PM, August 15, 2017
h = pd.Period('2017-08-15 23:00:00', freq='H')
h

Period('2017-08-15 23:00', 'H')

In [16]:
# Shifts the period by one hour (Midnight, August 16, 2017)
h+1

Period('2017-08-16 00:00', 'H')

**Achieving the same result with a pandas `Hour` offset**

You can also use Pandas offsets to shift periods. Here's how to add one hour using a `pd.offsets.Hour` offset object.

In [17]:
# Shifts the period forward by one hour
h+pd.offsets.Hour(1)

Period('2017-08-16 00:00', 'H')

## Quarterly Period

Here, we create a quarterly period representing the first quarter of fiscal year 2017, where the fiscal year ends in January (denoted by `freq='Q-JAN'`). Because the fiscal year is labeled by the calendar year in which it ends, this quarter runs from February through April 2016. We then explore the start and end times of this quarter and use the `asfreq()` method to convert the quarterly period to a monthly frequency.

In [18]:
# Creates a quarterly period for Q1 of fiscal 2017, with the fiscal year ending in January
q1 = pd.Period('2017Q1', freq='Q-JAN')
q1

Period('2017Q1', 'Q-JAN')

In [19]:
q1.start_time

Timestamp('2016-02-01 00:00:00')

In [20]:
q1.end_time

Timestamp('2016-04-30 23:59:59.999999999')

**Use asfreq to convert period to a different frequency**

The `asfreq()` method allows you to convert the period to a different frequency. 

Here, we convert the quarterly period to a monthly frequency and specify whether to return the first or the last month of the quarter.

In [21]:
# Converts to the first month in the quarter (February 2016)
q1.asfreq('M', how='start')

Period('2016-02', 'M')

In [50]:
# Converts to the last month in the quarter (April 2016)
q1.asfreq('M', how='end')

Period('2016-04', 'M')
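
The same conversion works for finer frequencies. As a minimal sketch, the example below converts the fiscal quarter to daily periods to find its first and last day; `first_day` and `last_day` are illustrative names.

```python
import pandas as pd

# Q1 of fiscal 2017, with the fiscal year ending in January
q1 = pd.Period('2017Q1', freq='Q-JAN')

# how='start' picks the first sub-period, how='end' the last one
first_day = q1.asfreq('D', how='start')
last_day = q1.asfreq('D', how='end')

print(first_day)  # 2016-02-01
print(last_day)   # 2016-04-30
```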

## Weekly Period

Here, we create a weekly period for the week containing July 5, 2017, and demonstrate how to perform period arithmetic by subtracting weeks. We also calculate the difference between two weekly periods to see how many weeks apart they are.

In [23]:
# Creates a weekly period for the week that contains July 5, 2017 (Monday to Sunday)
w = pd.Period('2017-07-05', freq='W')
w

Period('2017-07-03/2017-07-09', 'W-SUN')

In [24]:
# Subtracts one week from the period (returns the previous week)
w-1

Period('2017-06-26/2017-07-02', 'W-SUN')

You can create another weekly period and find the difference between the two periods:

In [25]:
# Creates another weekly period for the week that contains August 15, 2017
w2 = pd.Period('2017-08-15', freq='W')
w2

Period('2017-08-14/2017-08-20', 'W-SUN')

In [26]:
# Calculates the difference between the two periods in weeks
w2-w

<6 * Weeks: weekday=6>
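
The difference shown above is an offset object rather than a plain number. In recent pandas versions, its `n` attribute holds the integer count; the sketch below assumes such a version, and `weeks_apart` is an illustrative name.

```python
import pandas as pd

w = pd.Period('2017-07-05', freq='W')
w2 = pd.Period('2017-08-15', freq='W')

# Subtracting two periods of the same frequency yields an offset;
# .n extracts the number of weeks as an integer
diff = w2 - w
weeks_apart = diff.n

print(weeks_apart)  # 6
```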

## PeriodIndex and `period_range`

Pandas provides powerful tools for generating a range of periods using `period_range()`. Here, we create a range of quarterly periods between 2011 and 2017 and examine the start and end times of the first period.

In [27]:
# Creates a range of quarterly periods from 2011Q1 through 2017Q1
r = pd.period_range('2011', '2017', freq='Q')
r

PeriodIndex(['2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1', '2012Q2',
             '2012Q3', '2012Q4', '2013Q1', '2013Q2', '2013Q3', '2013Q4',
             '2014Q1', '2014Q2', '2014Q3', '2014Q4', '2015Q1', '2015Q2',
             '2015Q3', '2015Q4', '2016Q1', '2016Q2', '2016Q3', '2016Q4',
             '2017Q1'],
            dtype='period[Q-DEC]')

In [28]:
# Start of 1st period: January 1, 2011
r[0].start_time

Timestamp('2011-01-01 00:00:00')

In [29]:
# End of 1st period: March 31, 2011
r[0].end_time

Timestamp('2011-03-31 23:59:59.999999999')

**Walmart's fiscal year ends in January. Below is how you generate Walmart's fiscal quarters between 2011 and 2017:**

In [30]:
# Creates quarterly periods aligned with Walmart's fiscal year (ending in January)
r = pd.period_range('2011', '2017', freq='Q-JAN')
r

PeriodIndex(['2011Q4', '2012Q1', '2012Q2', '2012Q3', '2012Q4', '2013Q1',
             '2013Q2', '2013Q3', '2013Q4', '2014Q1', '2014Q2', '2014Q3',
             '2014Q4', '2015Q1', '2015Q2', '2015Q3', '2015Q4', '2016Q1',
             '2016Q2', '2016Q3', '2016Q4', '2017Q1', '2017Q2', '2017Q3',
             '2017Q4'],
            dtype='period[Q-JAN]')
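
The range above starts at `2011Q4` rather than `2011Q1` because January 2011 falls in the last quarter of the fiscal year that ends that month. The following sketch illustrates how fiscal labels relate to calendar dates, using the `qyear` and `year` attributes of `Period`; the variable names are illustrative.

```python
import pandas as pd

q = pd.Period('2017Q1', freq='Q-JAN')
print(q.qyear)  # 2017, the fiscal year used in the label
print(q.year)   # 2016, the calendar year the quarter falls in

# January 2011 belongs to the final quarter of fiscal year 2011
jan_2011 = pd.Period('2011-01-01', freq='Q-JAN')
print(jan_2011)  # 2011Q4
```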

You can also generate a `PeriodIndex` with a custom frequency and a specific number of periods:

In [33]:
# Creates a PeriodIndex of 10 periods at 3-month intervals, starting from January 2016
r = pd.period_range(start='2016-01', freq='3M', periods=10)
r

PeriodIndex(['2016-01', '2016-04', '2016-07', '2016-10', '2017-01', '2017-04',
             '2017-07', '2017-10', '2018-01', '2018-04'],
            dtype='period[3M]')

### Creating a PeriodIndex for Series

You can create a `PeriodIndex` and use it as the index for a Pandas Series. Here's how to generate quarterly periods for Walmart's fiscal year and create a random Series:

In [34]:
# Generates 10 quarterly periods aligned with Walmart's fiscal year
idx = pd.period_range('2011', periods=10, freq='Q-JAN')
idx

PeriodIndex(['2011Q4', '2012Q1', '2012Q2', '2012Q3', '2012Q4', '2013Q1',
             '2013Q2', '2013Q3', '2013Q4', '2014Q1'],
            dtype='period[Q-JAN]')

In [35]:
# Creates a Series with random values and the PeriodIndex as the index
ps = pd.Series(np.random.randn(len(idx)), idx)
ps

2011Q4   -2.021685
2012Q1    0.114742
2012Q2    0.846414
2012Q3   -0.739876
2012Q4   -0.057268
2013Q1   -0.416103
2013Q2    0.420937
2013Q3   -0.085259
2013Q4   -0.697850
2014Q1   -1.250622
Freq: Q-JAN, dtype: float64

In [36]:
# Displays the PeriodIndex of the Series
ps.index

PeriodIndex(['2011Q4', '2012Q1', '2012Q2', '2012Q3', '2012Q4', '2013Q1',
             '2013Q2', '2013Q3', '2013Q4', '2014Q1'],
            dtype='period[Q-JAN]')

## Partial Indexing

In time series analysis, Pandas allows for **partial indexing**, where you can select data by specifying a year or a range of years. This makes it easy to retrieve subsets of data based on time.

**Key Features:**
* **Partial indexing:** Select data by providing just a year, and Pandas returns every period that overlaps that calendar year.
* **Range slicing:** You can specify a range of years to retrieve data across multiple periods.

In [37]:
# Retrieves the fiscal quarters that overlap calendar year 2012 (2012Q4 through 2013Q4)
ps['2012']

2012Q4   -0.057268
2013Q1   -0.416103
2013Q2    0.420937
2013Q3   -0.085259
2013Q4   -0.697850
Freq: Q-JAN, dtype: float64

In [38]:
# Retrieves the fiscal quarters that overlap calendar years 2011 through 2013 (inclusive)
ps['2011':'2013']

2011Q4   -2.021685
2012Q1    0.114742
2012Q2    0.846414
2012Q3   -0.739876
2012Q4   -0.057268
2013Q1   -0.416103
2013Q2    0.420937
2013Q3   -0.085259
2013Q4   -0.697850
2014Q1   -1.250622
Freq: Q-JAN, dtype: float64
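
Because the index holds fiscal quarters, a year string selects by calendar year, not by fiscal label. The sketch below reproduces the selection with deterministic values (an assumption made here so the result is repeatable) and uses `.loc` for the lookup.

```python
import numpy as np
import pandas as pd

# Same fiscal-quarter index as above, with predictable values instead of random ones
idx = pd.period_range('2011', periods=10, freq='Q-JAN')
ps = pd.Series(np.arange(len(idx)), index=idx)

# '2012' matches every quarter that overlaps calendar year 2012
in_2012 = ps.loc['2012']
labels = [str(p) for p in in_2012.index]

print(labels)  # ['2012Q4', '2013Q1', '2013Q2', '2013Q3', '2013Q4']
```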

### Converting Between Representations: `Period` and `Timestamp`
Pandas provides the ability to convert between `Period` and `Timestamp` representations in time series data. This is useful when working with different time-based formats.

**Key Features:**
* `to_timestamp()`: Converts a `PeriodIndex` to a `DatetimeIndex` (useful for precise date-based analysis). 
* `to_period()`: Converts a `DatetimeIndex` back to a `PeriodIndex` (for working with time periods).

In [39]:
# Converts a PeriodIndex to a DatetimeIndex (Timestamp)
pst = ps.to_timestamp()
pst

2010-11-01   -2.021685
2011-02-01    0.114742
2011-05-01    0.846414
2011-08-01   -0.739876
2011-11-01   -0.057268
2012-02-01   -0.416103
2012-05-01    0.420937
2012-08-01   -0.085259
2012-11-01   -0.697850
2013-02-01   -1.250622
Freq: QS-NOV, dtype: float64

In [40]:
# Shows the DatetimeIndex
pst.index

DatetimeIndex(['2010-11-01', '2011-02-01', '2011-05-01', '2011-08-01',
               '2011-11-01', '2012-02-01', '2012-05-01', '2012-08-01',
               '2012-11-01', '2013-02-01'],
              dtype='datetime64[ns]', freq='QS-NOV')

In [41]:
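# Converts back from DatetimeIndex to PeriodIndex; the resulting frequency is Q-DEC,
# so the labels are calendar quarters rather than the original Q-JAN fiscal quarters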
ps = pst.to_period()
ps

2010Q4   -2.021685
2011Q1    0.114742
2011Q2    0.846414
2011Q3   -0.739876
2011Q4   -0.057268
2012Q1   -0.416103
2012Q2    0.420937
2012Q3   -0.085259
2012Q4   -0.697850
2013Q1   -1.250622
Freq: Q-DEC, dtype: float64

In [42]:
# Shows the PeriodIndex
ps.index

PeriodIndex(['2010Q4', '2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1',
             '2012Q2', '2012Q3', '2012Q4', '2013Q1'],
            dtype='period[Q-DEC]')
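
The round trip above ends with calendar quarters (`Q-DEC`) instead of the original fiscal quarters. One way to keep the fiscal alignment, sketched here with deterministic values and illustrative names, is to pass the frequency explicitly to `to_period()`.

```python
import numpy as np
import pandas as pd

idx = pd.period_range('2011', periods=10, freq='Q-JAN')
ps = pd.Series(np.arange(len(idx)), index=idx)

# PeriodIndex -> DatetimeIndex (start of each quarter)
pst = ps.to_timestamp()

# DatetimeIndex -> PeriodIndex, stating the fiscal frequency explicitly
fiscal = pst.to_period('Q-JAN')

print(fiscal.index[0])  # 2011Q4
```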

## Processing Walmart's Financials

Here, we load Walmart's financial data from a CSV file, set a `PeriodIndex` to align the data with Walmart's fiscal quarters, and reshape the DataFrame for time-based analysis.

**Key Features:**
* `set_index()`: Sets "Line Item" as the index of the DataFrame.
* `T` **(Transpose)**: Switches rows and columns for easier time-based operations.
* `PeriodIndex()`: Converts the index to quarterly periods aligned with Walmart's fiscal year (`Q-JAN`).
* `start_time`: Retrieves the start date of a specific period, useful for time-based analysis.

In [43]:
df = pd.read_csv("wmt.csv")
df

|   | Line Item | 2017Q1 | 2017Q2 | 2017Q3 | 2017Q4 | 2018Q1 |
|---|---|---|---|---|---|---|
| 0 | Revenue | 115904 | 120854 | 118179 | 130936 | 117542 |
| 1 | Expenses | 86544 | 89485 | 87484 | 97743 | 87688 |
| 2 | Profit | 29360 | 31369 | 30695 | 33193 | 29854 |


In [44]:
# Sets 'Line Item' as the index
df.set_index("Line Item", inplace=True)
# Transposes the DataFrame so that periods are the rows
df = df.T
df

| Line Item | Revenue | Expenses | Profit |
|---|---|---|---|
| 2017Q1 | 115904 | 86544 | 29360 |
| 2017Q2 | 120854 | 89485 | 31369 |
| 2017Q3 | 118179 | 87484 | 30695 |
| 2017Q4 | 130936 | 97743 | 33193 |
| 2018Q1 | 117542 | 87688 | 29854 |


In [45]:
# Converts the index to a PeriodIndex aligned with Walmart's fiscal year
df.index = pd.PeriodIndex(df.index, freq="Q-JAN")
df

| Line Item | Revenue | Expenses | Profit |
|---|---|---|---|
| 2017Q1 | 115904 | 86544 | 29360 |
| 2017Q2 | 120854 | 89485 | 31369 |
| 2017Q3 | 118179 | 87484 | 30695 |
| 2017Q4 | 130936 | 97743 | 33193 |
| 2018Q1 | 117542 | 87688 | 29854 |


In [46]:
# Displays the PeriodIndex
df.index

PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1'], dtype='period[Q-JAN]')

In [47]:
# Retrieves the start time of the first period
df.index[0].start_time

Timestamp('2016-02-01 00:00:00')
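
If `wmt.csv` is not at hand, the same steps can be tried on an inline DataFrame. The sketch below is a stand-in that assumes the file contains exactly the figures displayed above; it chains `set_index()` and `T` instead of using `inplace=True`.

```python
import pandas as pd

# Inline stand-in for wmt.csv, built from the figures displayed above
df = pd.DataFrame({
    "Line Item": ["Revenue", "Expenses", "Profit"],
    "2017Q1": [115904, 86544, 29360],
    "2017Q2": [120854, 89485, 31369],
    "2017Q3": [118179, 87484, 30695],
    "2017Q4": [130936, 97743, 33193],
    "2018Q1": [117542, 87688, 29854],
})

# Line items become columns, quarter labels become the row index
df = df.set_index("Line Item").T

# Interpret the quarter labels as fiscal quarters ending in January
df.index = pd.PeriodIndex(df.index, freq="Q-JAN")

print(df.index[0].start_time)  # 2016-02-01 00:00:00
```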

### Adding Start and End Date Columns to the DataFrame

Here, we add two new columns to the DataFrame: "Start Date" and "End Date". These columns will contain the start and end times of each period in the `PeriodIndex`.

In [48]:
# Adds a 'Start Date' column with the start time of each period
df["Start Date"] = df.index.map(lambda x: x.start_time)
df

| Line Item | Revenue | Expenses | Profit | Start Date |
|---|---|---|---|---|
| 2017Q1 | 115904 | 86544 | 29360 | 2016-02-01 |
| 2017Q2 | 120854 | 89485 | 31369 | 2016-05-01 |
| 2017Q3 | 118179 | 87484 | 30695 | 2016-08-01 |
| 2017Q4 | 130936 | 97743 | 33193 | 2016-11-01 |
| 2018Q1 | 117542 | 87688 | 29854 | 2017-02-01 |


In [49]:
# Adds an 'End Date' column with the end time of each period
df['End Date'] = df.index.map(lambda x: x.end_time)
df

| Line Item | Revenue | Expenses | Profit | Start Date | End Date |
|---|---|---|---|---|---|
| 2017Q1 | 115904 | 86544 | 29360 | 2016-02-01 | 2016-04-30 23:59:59.999999999 |
| 2017Q2 | 120854 | 89485 | 31369 | 2016-05-01 | 2016-07-31 23:59:59.999999999 |
| 2017Q3 | 118179 | 87484 | 30695 | 2016-08-01 | 2016-10-31 23:59:59.999999999 |
| 2017Q4 | 130936 | 97743 | 33193 | 2016-11-01 | 2017-01-31 23:59:59.999999999 |
| 2018Q1 | 117542 | 87688 | 29854 | 2017-02-01 | 2017-04-30 23:59:59.999999999 |
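
As an alternative to mapping a lambda over the index, `PeriodIndex` exposes `start_time` and `end_time` directly. The sketch below uses a small illustrative DataFrame with only the revenue figures shown above, and optionally calls `normalize()` to drop the `23:59:59.999999999` time component from the end dates.

```python
import pandas as pd

idx = pd.PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1'], freq='Q-JAN')
df = pd.DataFrame({'Revenue': [115904, 120854, 118179, 130936, 117542]}, index=idx)

# Vectorized equivalents of the lambda-based versions
df['Start Date'] = idx.start_time
# normalize() resets each timestamp to midnight, leaving only the date
df['End Date'] = idx.end_time.normalize()

print(df)
```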
