## 11.1 Date and Time Data Types and Tools
The Python standard library includes data types for date and time data, as well as calendar-related functionality. The `datetime`, `time`, and `calendar` modules are the main places to start. The `datetime.datetime` type, or simply datetime, is widely used:

from datetime import datetime
now = datetime.now()


now.year, now.month, now.day

`datetime` stores both the date and time down to the microsecond. `datetime.timedelta`, or simply `timedelta`, represents the temporal difference between two datetime objects:

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
#datetime(year, month, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]])


delta.days


delta.seconds
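A `timedelta` stores only days, seconds, and microseconds, so `delta.seconds` is the leftover part of a day rather than the whole duration. The following is a minimal sketch of how the pieces relate, reusing the two dates above; `total_seconds` is a standard `timedelta` method that is not otherwise used in this section:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are separate components of the same duration
print(delta.days)             # 926
print(delta.seconds)          # 56700 (15 hours 45 minutes)
print(delta.total_seconds())  # 80063100.0

# the components recombine into the original timedelta
rebuilt = timedelta(days=delta.days, seconds=delta.seconds)
print(rebuilt == delta)       # True
```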

You can add (or subtract) a timedelta or multiple thereof to a datetime object to yield a new shifted object:

from datetime import timedelta
start = datetime(2011, 1, 7)
start + timedelta(12)
#timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)



start - 2 * timedelta(12)

Table 11.1: Types in the datetime module
Type	|Description
|:-----------|:--------------------------------------------------|
date|	Store calendar date (year, month, day) using the Gregorian calendar
time	|Store time of day as hours, minutes, seconds, and microseconds
datetime|	Store both date and time
timedelta|	The difference between two datetime values (as days, seconds, and microseconds)
tzinfo|	Base type for storing time zone information

### Converting Between String and Datetime
You can format datetime objects and pandas Timestamp objects, which I’ll introduce later, as strings using str or the strftime method, passing a format specification:

Table 11.2: datetime format specification (ISO C89 compatible)
Type	|Description
|:-----|:---------------------------------------------------------------|
%Y|	Four-digit year
%y|	Two-digit year
%m|	Two-digit month [01, 12]
%d|	Two-digit day [01, 31]
%H|	Hour (24-hour clock) [00, 23]
%I|	Hour (12-hour clock) [01, 12]
%M|	Two-digit minute [00, 59]
%S|	Second [00, 61] (seconds 60, 61 account for leap seconds)
%f|	Microsecond as an integer, zero-padded (from 000000 to 999999)
%j|	Day of the year as a zero-padded integer (from 001 to 366)
%w|	Weekday as an integer [0 (Sunday), 6]
%u|	Weekday as an integer starting from 1, where 1 is Monday.
%U|	Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”
%W|	Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”
%z|	UTC time zone offset as +HHMM or -HHMM; empty if time zone naive
%Z|	Time zone name as a string, or empty string if no time zone
%F|	Shortcut for %Y-%m-%d (e.g., 2012-04-18)
%D|	Shortcut for %m/%d/%y (e.g., 04/18/12)

You can use many of the same format codes to convert strings to dates using `datetime.strptime` (but some codes, like %F, cannot be used):

stamp = datetime(2011, 1, 3)
str(stamp)


stamp.strftime("%Y-%m-%d")

value = "2011-01-03"
datetime.strptime(value, "%Y-%m-%d")
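As a rough illustration of how formatting and parsing mirror each other, the sketch below round-trips a value through one format specification and then parses a small list of strings. The sample values and the `fmt` name are made up for the example:

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 14, 30)

# format with explicit codes, then parse back with the same specification
fmt = "%Y-%m-%d %H:%M"
text = stamp.strftime(fmt)
parsed = datetime.strptime(text, fmt)

print(text)             # 2011-01-03 14:30
print(parsed == stamp)  # True

# parse several strings that share one known format
datestrs = ["7/6/2011", "8/6/2011"]
dates = [datetime.strptime(x, "%m/%d/%Y") for x in datestrs]
print(dates[0])         # 2011-07-06 00:00:00
```

`datetime.strptime` is a good fit when the format is known in advance, since a string that does not match raises an error instead of being guessed at.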


The `pandas.to_datetime` function parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed quickly:

import numpy as np
import pandas as pd

datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]
pd.to_datetime(datestrs)

It also handles values that should be considered missing (None, empty string, etc.), converting them to `NaT` (Not a Time), which is pandas’s null value for timestamp data:

idx = pd.to_datetime(datestrs + [None])
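A short sketch of what the resulting index looks like and how the missing entry can be detected; it repeats the two date strings from above so that it runs on its own:

```python
import pandas as pd

datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]
idx = pd.to_datetime(datestrs + [None])

# the None entry becomes NaT, which pd.isna reports as missing
print(idx[2] is pd.NaT)        # True
print(pd.isna(idx).tolist())   # [False, False, True]
```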

:::{.callout-warning}
`dateutil.parser` is a useful but imperfect tool. Notably, it will recognize some strings as dates that you might prefer that it didn’t; for example, "42" will be parsed as the year 2042 with today’s calendar date.
:::

datetime objects also have a number of locale-specific formatting options for systems in other countries or languages. For example, the abbreviated month names will be different on German or French systems compared with English systems. See Table 11.3 for a listing.

Table 11.3: Locale-specific date formatting

Type	|Description
|:------|:--------------------------------------------------|
%a	|Abbreviated weekday name
%A	|Full weekday name
%b	|Abbreviated month name
%B	|Full month name
%c	|Full date and time (e.g., ‘Tue 01 May 2012 04:20:57 PM’)
%p	|Locale equivalent of AM or PM
%x	|Locale-appropriate formatted date (e.g., in the United States, May 1, 2012 yields ‘05/01/2012’)
%X	|Locale-appropriate time (e.g., ‘04:24:12 PM’)

## 11.2 Time Series Basics
A basic kind of time series object in pandas is a Series indexed by timestamps, which are often represented outside of pandas as Python strings or datetime objects:

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.standard_normal(6), index=dates)


Like other Series, arithmetic operations between differently indexed time series automatically align on the dates:

ts + ts[::2] # note the data are aligned by date

pandas stores timestamps using NumPy’s `datetime64` data type at nanosecond resolution, and scalar values from a `DatetimeIndex` are pandas `Timestamp` objects.

A `pandas.Timestamp` can be substituted most places where you would use a `datetime` object. The reverse is not true, however, because `pandas.Timestamp` can store nanosecond precision data, while datetime stores only up to microseconds. Additionally, `pandas.Timestamp` can store frequency information (if any) and understands how to do time zone conversions and other kinds of manipulations. More on both of these things later in Time Zone Handling.
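The relationship between `datetime`, `DatetimeIndex`, and `Timestamp` can be checked directly. This is a small sketch with made-up values; the exact `datetime64` unit that pandas picks for the index may differ between pandas versions, so it is not printed here:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# the list of datetime objects is stored as a DatetimeIndex
print(type(ts.index).__name__)   # DatetimeIndex

# scalar values taken from the index are Timestamp objects,
# and Timestamp is itself a subclass of datetime
stamp = ts.index[0]
print(isinstance(stamp, pd.Timestamp), isinstance(stamp, datetime))  # True True

# Timestamp carries a nanosecond field that datetime lacks
fine = pd.Timestamp("2011-01-02 00:00:00.000000001")
print(fine.nanosecond)           # 1
```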

### Indexing, Selection, Subsetting
Time series behave like any other Series when you are indexing and selecting data based on the label.

As a convenience, you can also pass a string that is interpretable as a date:

ts["2011-01-10"]

longer_ts = pd.Series(np.random.standard_normal(1000),
                      index=pd.date_range("2000-01-01", periods=1000))


longer_ts["2001"] # data of the year 2001

longer_ts["2001-05"] # data of year 2001 and the 5th month

Slicing with datetime objects works as well:

ts[datetime(2011, 1, 7):]


ts[datetime(2011, 1, 7):datetime(2011, 1, 10)]

Because most time series data is ordered chronologically, you can slice with timestamps not contained in a time series to perform a range query:


ts["2011-01-06":"2011-01-11"]

As before, you can pass a string date, datetime, or timestamp. Remember that slicing in this manner produces views on the source time series, like slicing NumPy arrays. This means that no data is copied, and modifications on the slice will be reflected in the original data.

There is an equivalent instance method, `truncate`, that slices a Series between two dates:

ts.truncate(after="2011-01-09")
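To make the range query concrete, here is a sketch that uses integer values instead of random ones so the selected rows are easy to see. It uses `.loc` for the string slice, which is an equivalent and slightly more explicit spelling of the bracket form above:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(range(6), index=dates)

# neither endpoint is in the index; both ends of the range are inclusive
window = ts.loc["2011-01-06":"2011-01-11"]
print(window.index.day.tolist())    # [7, 8, 10]

# truncate drops everything after the given date
head = ts.truncate(after="2011-01-09")
print(head.index.day.tolist())      # [2, 5, 7, 8]
```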

All of this holds true for DataFrame as well, indexing on its rows:

dates = pd.date_range("2000-01-01", periods=100, freq="W-WED")
long_df = pd.DataFrame(np.random.standard_normal((100, 4)),
                       index=dates,
                       columns=["Colorado", "Texas",
                                "New York", "Ohio"])

long_df.loc["2001-05"]

### Time Series with Duplicate Indices
In some applications, there may be multiple data observations falling on a particular timestamp. 

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
dup_ts = pd.Series(np.arange(5), index=dates)

dup_ts.index.is_unique

Suppose you wanted to aggregate the data having nonunique timestamps. One way to do this is to use `groupby` and pass `level=0` (the one and only level):

grouped = dup_ts.groupby(level=0)
grouped.mean()


## 11.3 Date Ranges, Frequencies, and Shifting
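Generic time series in pandas are assumed to be irregular; that is, they have no fixed frequency. It is often desirable, though, to work relative to a fixed frequency such as daily or monthly, even if that means introducing missing values into the series. For example, you can convert the sample time series to fixed daily frequency by calling resample: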

resampler = ts.resample("D")

### Generating Date Ranges
`pandas.date_range` is responsible for generating a `DatetimeIndex` with an indicated length according to a particular frequency:

index = pd.date_range("2012-04-01", "2012-06-01")

By default, `pandas.date_range` generates daily timestamps. If you pass only a start or end date, you must pass a number of `periods` to generate:

pd.date_range(start="2012-04-01", periods=20)


pd.date_range(end="2012-06-01", periods=20)

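The `start` and `end` dates define strict boundaries for the generated date index. For example, if you wanted a date index containing the last business day of each month, you would pass the `"BM"` frequency (business end of month; see a more complete listing of frequencies in Table 11.4), and only dates falling on or inside the date interval will be included. The aliases used here follow the pandas versions these notes were written against; pandas 2.2 deprecated several of them (for example, `"M"`, `"BM"`, `"H"`, and `"T"` in favor of `"ME"`, `"BME"`, `"h"`, and `"min"`), so newer installations may warn about the older spellings: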

pd.date_range("2000-01-01", "2000-12-01", freq="BM")

Table 11.4: Base time series frequencies (not comprehensive)
Alias	|Offset type	|Description
|:----------|:--------|:-----------------------------------------------------|
D|	Day|	Calendar daily
B|	BusinessDay|	Business daily
H|	Hour|	Hourly
T or min|	Minute|	Once a minute
S	|Second|	Once a second
L or ms|	Milli|	Millisecond (1/1,000 of 1 second)
U	|Micro	|Microsecond (1/1,000,000 of 1 second)
M	|MonthEnd|	Last calendar day of month
BM	|BusinessMonthEnd|	Last business day (weekday) of month
MS|	MonthBegin|	First calendar day of month
BMS	|BusinessMonthBegin|	First weekday of month
W-MON, W-TUE, ...|	Week|	Weekly on given day of week (MON, TUE, WED, THU, FRI, SAT, or SUN)
WOM-1MON, WOM-2MON, ...	|WeekOfMonth|	Generate weekly dates in the first, second, third, or fourth week of the month (e.g., WOM-3FRI for the third Friday of each month)
Q-JAN, Q-FEB, ...|	QuarterEnd|	Quarterly dates anchored on last calendar day of each month, for year ending in indicated month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC)
BQ-JAN, BQ-FEB, ...	|BusinessQuarterEnd|	Quarterly dates anchored on last weekday of each month, for year ending in indicated month
QS-JAN, QS-FEB, ...	|QuarterBegin|	Quarterly dates anchored on first calendar day of each month, for year ending in indicated month
BQS-JAN, BQS-FEB, ...	|BusinessQuarterBegin|	Quarterly dates anchored on first weekday of each month, for year ending in indicated month
A-JAN, A-FEB, ...	|YearEnd|	Annual dates anchored on last calendar day of given month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC)
BA-JAN, BA-FEB, ...|	BusinessYearEnd|	Annual dates anchored on last weekday of given month
AS-JAN, AS-FEB, ...	|YearBegin|	Annual dates anchored on first day of given month
BAS-JAN, BAS-FEB, ...	|BusinessYearBegin|	Annual dates anchored on first weekday of given month

`pandas.date_range` by default preserves the time (if any) of the start or end timestamp:

pd.date_range("2012-05-02 12:56:31", periods=5)

Sometimes you will have start or end dates with time information but want to generate a set of timestamps normalized to midnight as a convention. To do this, there is a `normalize` option:

pd.date_range("2012-05-02 12:56:31", periods=5, normalize=True)

### Frequencies and Date Offsets
Frequencies in pandas are composed of a base frequency and a multiplier. Base frequencies are typically referred to by a string alias, like "M" for monthly or "H" for hourly. For each base frequency, there is an object referred to as a date offset. For example, hourly frequency can be represented with the Hour class:

from pandas.tseries.offsets import Hour, Minute
hour = Hour()

You can define a multiple of an offset by passing an integer:

four_hours = Hour(4) # <4*Hours>

In most applications, you would never need to explicitly create one of these objects; instead you'd use a string alias like `"H"` or `"4H"`. Putting an integer before the base frequency creates a multiple:

pd.date_range("2000-01-01", "2000-01-03 23:59", freq="4H")

Many offsets can be combined by addition:

Hour(2) + Minute(30)  # <150 * Minutes>

Similarly, you can pass frequency strings, like "1h30min", that will effectively be parsed to the same expression:

pd.date_range("2000-01-01", periods=10, freq="1h30min")
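The two ways of writing a compound frequency can be compared side by side. This is a small sketch; the three-element range is chosen only to keep the output short:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# adding two tick offsets yields a single combined offset
combined = Hour(2) + Minute(30)
print(combined == Minute(150))   # True

# a compound frequency string is parsed the same way;
# "1h30min" steps through the range in 90-minute increments
rng = pd.date_range("2000-01-01", periods=3, freq="1h30min")
steps = [t.strftime("%H:%M") for t in rng]
print(steps)                     # ['00:00', '01:30', '03:00']
```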

Some frequencies describe points in time that are not evenly spaced. For example, "M" (calendar month end) and "BM" (last business/weekday of month) depend on the number of days in a month and, in the latter case, whether the month ends on a weekend or not. We refer to these as `anchored offsets`.

### Week of month dates
One useful frequency class is “week of month,” starting with WOM. This enables you to get dates like the third Friday of each month:

monthly_dates = pd.date_range("2012-01-01", "2012-09-01", freq="WOM-3FRI")
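One way to convince yourself that `WOM-3FRI` really produces third Fridays is to check the weekday and day-of-month of every generated date. A minimal sketch:

```python
import pandas as pd

monthly_dates = pd.date_range("2012-01-01", "2012-09-01", freq="WOM-3FRI")

# every generated date is a Friday (weekday 4) falling on day 15-21,
# which is exactly the third Friday of its month
print(all(d.weekday() == 4 for d in monthly_dates))    # True
print(all(15 <= d.day <= 21 for d in monthly_dates))   # True

# January through August; the third Friday of September is past the end date
print(len(monthly_dates))                              # 8
print(monthly_dates[0].date())                         # 2012-01-20
```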

### Shifting (Leading and Lagging) Data
Shifting refers to moving data backward and forward through time. Both Series and DataFrame have a shift method for doing naive shifts forward or backward, leaving the index unmodified:

ts = pd.Series(np.random.standard_normal(4),
               index=pd.date_range("2000-01-01", periods=4, freq="M"))

ts.shift(2)


ts.shift(-2)

A common use of shift is computing consecutive percent changes in a time series or multiple time series as DataFrame columns. This is expressed as:

`ts / ts.shift(1) - 1`

Because naive shifts leave the index unmodified, some data is discarded. Thus if the frequency is known, it can be passed to shift to advance the timestamps instead of simply the data:

ts.shift(2, freq="M")

ts.shift(1, freq="D")


ts.shift(1, freq="90T")

The `T` here stands for minutes. Note that the freq parameter here indicates the offset to apply to the timestamps, but it does not change the underlying frequency of the data, if any.
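The difference between a naive shift and a shift with `freq` is easiest to see with fixed values. The sketch below uses a small daily series (an assumption made for readability; the series above is monthly and random):

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range("2000-01-01", periods=4, freq="D"))

# naive shift: values move down one slot, the index stays put,
# and the last value falls off the end
naive = ts.shift(1)
print(naive.tolist())         # [nan, 1.0, 2.0, 3.0]

# shift with freq: the index moves, every value is kept
moved = ts.shift(1, freq="D")
print(moved.index[0].date())  # 2000-01-02
print(moved.tolist())         # [1.0, 2.0, 3.0, 4.0]

# consecutive percent change built from a naive shift
pct = ts / ts.shift(1) - 1
print(pct.round(3).tolist())  # [nan, 1.0, 0.5, 0.333]
```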

### Shifting dates with offsets
The pandas date offsets can also be used with `datetime` or `Timestamp` objects:

from pandas.tseries.offsets import Day, MonthEnd
now = datetime(2011, 11, 17)
now + 3 * Day()  # Timestamp('2011-11-20 00:00:00')

If you add an `anchored` offset like `MonthEnd`, the first increment will "roll forward" a date to the next date according to the frequency rule:

now + MonthEnd() # Timestamp('2011-11-30 00:00:00')

now + MonthEnd(2) # Timestamp('2011-12-31 00:00:00')

Anchored offsets can explicitly “roll” dates forward or backward by simply using their `rollforward` and `rollback` methods, respectively:

offset = MonthEnd()
offset.rollforward(now) #Timestamp('2011-11-30 00:00:00')

offset.rollback(now) # previous month end Timestamp('2011-10-31 00:00:00')
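A detail worth checking is what happens when the date already sits on the anchor. The sketch below contrasts `rollforward` with plain addition for that case; the names `mid_month` and `on_anchor` are illustrative:

```python
from datetime import datetime
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = datetime(2011, 11, 17)
on_anchor = pd.Timestamp("2011-11-30")

# from a date that is not a month end, both calls move to an anchor
print(offset.rollforward(mid_month))  # 2011-11-30 00:00:00
print(offset.rollback(mid_month))     # 2011-10-31 00:00:00

# a date already on the anchor is left alone by rollforward ...
print(offset.rollforward(on_anchor))  # 2011-11-30 00:00:00
# ... but adding the offset always advances to the next month end
print(on_anchor + offset)             # 2011-12-31 00:00:00
```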

A creative use of date offsets is to use these methods with `groupby`:

ts.groupby(MonthEnd().rollforward).mean()

ts["2000-03"].mean()

Of course, an easier and faster way to do this is with `resample`:

ts.resample("M").mean()

## 11.4 Time Zone Handling
Working with time zones can be one of the most unpleasant parts of time series manipulation. As a result, many time series users choose to work with time series in coordinated universal time or UTC, which is the geography-independent international standard. Time zones are expressed as offsets from UTC; for example, New York is four hours behind UTC during daylight saving time (DST) and five hours behind the rest of the year.

In Python, time zone information comes from the third-party `pytz` library (installable with pip or conda), which exposes the Olson database, a compilation of world time zone information. This is especially important for historical data because the DST transition dates (and even UTC offsets) have been changed numerous times depending on the regional laws. In the United States, the DST transition times have been changed many times since 1900!

import pytz
pytz.common_timezones[-5:] # ['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

To get a time zone object from `pytz`, use `pytz.timezone`. Methods in pandas will accept either time zone names or these objects:

tz = pytz.timezone("America/New_York")

### Time Zone Localization and Conversion
By default, time series in pandas are `time zone naive` (no time zone). For example, consider the following time series:

print(ts.index.tz)

Date ranges can be generated with a time zone set:

pd.date_range("2012-03-09 09:30", periods=10, tz="UTC")

Conversion from naive to localized (reinterpreted as having been observed in a particular time zone) is handled by the `tz_localize` method:

ts_utc = ts.tz_localize("UTC")

ts_utc.index

Once a time series has been localized to a particular time zone, it can be converted to another time zone with `tz_convert`:

ts_utc.tz_convert("America/New_York")# note the -5 and -4(DST): offset from UTC time

ts_eastern = ts.tz_localize("America/New_York")  # localize the naive series directly
ts_eastern.tz_convert("UTC")

`tz_localize` and `tz_convert` are also instance methods on DatetimeIndex:
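As a sketch of the index-level methods, the example below localizes a naive daily index to UTC and then views the same instants in New York time. The three dates are chosen (as an assumption for the example) to straddle the 2012 daylight saving transition:

```python
import pandas as pd

idx = pd.date_range("2012-03-09 09:30", periods=3, freq="D")
print(idx.tz)                       # None

# attach a zone to the naive index, then view the same instants elsewhere
idx_utc = idx.tz_localize("UTC")
idx_ny = idx_utc.tz_convert("America/New_York")

# New York moved to daylight saving time on 2012-03-11,
# so the UTC offset changes from -5 to -4 hours
print(idx_ny.hour.tolist())         # [4, 4, 5]

# conversion changes the wall-clock view, not the underlying instants
print((idx_ny == idx_utc).all())    # True
```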

### Operations with Time Zone-Aware Timestamp Objects
Like time series and date ranges, individual Timestamp objects can be localized from naive to time zone-aware and converted from one time zone to another.
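A minimal sketch of localizing and converting a single `Timestamp`; the timestamp value and the variable names are illustrative:

```python
import pandas as pd

stamp = pd.Timestamp("2011-03-12 04:00")
print(stamp.tz)                     # None

stamp_utc = stamp.tz_localize("UTC")
stamp_eastern = stamp_utc.tz_convert("America/New_York")

# 04:00 UTC is 23:00 on the previous day in New York
# (UTC-5, since daylight saving time had not yet started)
print(stamp_eastern.day, stamp_eastern.hour)   # 11 23

# both objects describe the same instant
print(stamp_utc == stamp_eastern)   # True
```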

You can also pass a time zone when creating the Timestamp:

stamp_moscow = pd.Timestamp("2011-03-12 04:00", tz="Europe/Moscow")

Time zone-aware Timestamp objects internally store a UTC timestamp value as nanoseconds since the Unix epoch (January 1, 1970), so changing the time zone does not alter the internal UTC value:

stamp_utc = pd.Timestamp("2011-03-12 04:00").tz_localize("UTC")  # illustrative timestamp
stamp_utc.value


stamp_utc.tz_convert("America/New_York").value

When performing time arithmetic using pandas’s DateOffset objects, pandas respects daylight saving time transitions where possible. Here we construct timestamps that occur right before DST transitions (forward and backward). First, 30 minutes before transitioning to DST:

stamp = pd.Timestamp("2012-03-11 01:30", tz="US/Eastern") # 30 minutes before the DST transition


stamp + Hour()

Then, 90 minutes before transitioning out of DST:

stamp = pd.Timestamp("2012-11-04 00:30", tz="US/Eastern") # 90 mins before transition out of DST


stamp + 2 * Hour()

### Operations Between Different Time Zones
If two time series with different time zones are combined, the result will be UTC. Since the timestamps are stored under the hood in UTC, this is a straightforward operation and requires no conversion:

dates = pd.date_range("2012-03-07 09:30", periods=10, freq="B")
ts = pd.Series(np.random.standard_normal(len(dates)), index=dates)


ts1 = ts[:7].tz_localize("Europe/London")

ts2 = ts1[2:].tz_convert("Europe/Moscow")


result = ts1 + ts2

Operations between time zone-naive and time zone-aware data are not supported and will raise an exception.

## 11.5 Periods and Period Arithmetic
Periods represent time spans, like days, months, quarters, or years. The `pandas.Period` class represents this data type, requiring a string or integer and a supported frequency from Table 11.4:

p = pd.Period("2011", freq="A-DEC") # the entire year 2011

In this case, the Period object represents the full time span from January 1, 2011, to December 31, 2011, inclusive. Conveniently, adding an integer to a period (or subtracting one from it) shifts the period by that many units of its frequency:

p + 5


If two periods have the same frequency, their difference is the number of units between them as a date offset:

pd.Period("2014", freq="A-DEC") - p
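The same arithmetic works at any frequency. The sketch below uses a monthly period rather than the annual one above, purely as an assumption to keep the example independent of how a given pandas version spells the annual alias:

```python
import pandas as pd

p = pd.Period("2011-01", freq="M")

# integer arithmetic moves whole periods
print(p + 5)                       # 2011-06

# the period covers a span with a start and an end
print(p.start_time)                # 2011-01-01 00:00:00
print(p.end_time.date())           # 2011-01-31

# the difference of two same-frequency periods is an offset; .n is its count
diff = pd.Period("2014-01", freq="M") - p
print(diff.n)                      # 36
```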

Regular ranges of periods can be constructed with the period_range function:

periods = pd.period_range("2000-01-01", "2000-06-30", freq="M")

The `PeriodIndex` class stores a sequence of periods and can serve as an axis index in any pandas data structure:

pd.Series(np.random.standard_normal(6), index=periods)

If you have an array of strings, you can also use the `PeriodIndex` class, where all of its values are periods:

values = ["2001Q3", "2002Q2", "2003Q1"]
index = pd.PeriodIndex(values, freq="Q-DEC")

### Period Frequency Conversion
Periods and PeriodIndex objects can be converted to another frequency with their `asfreq` method. As an example, suppose we had an annual period and wanted to convert it into a monthly period either at the start or end of the year:

p = pd.Period("2011", freq="A-DEC")

p + 1  # 2012: the frequency is annual, with each year ending in December

pm = p.asfreq("M", how="start")  # monthly period at the start of the year: 2011-01

pmd = p.asfreq("M", how="end")  # monthly period at the end of the year: 2011-12

p.asfreq("M")  # same as p.asfreq("M", how="end")

You can think of `Period("2011", "A-DEC")` as being a sort of cursor pointing to a span of time, subdivided by monthly periods. For a fiscal year ending on a month other than December, the corresponding monthly subperiods are different:

p = pd.Period("2011", freq="A-JUN")  # fiscal year 2011 ends in June 2011

pmj = p.asfreq("M", how="start")  # 2010-07: the fiscal year starts in July 2010

pme = p.asfreq("M", how="end")  # 2011-06: the fiscal year ends in June 2011

When you are converting from high to low frequency, pandas determines the superperiod, depending on where the subperiod “belongs.” For example, in A-JUN frequency, the month Aug-2011 is actually part of the 2012 period:

p = pd.Period("Aug-2011", "M")  # higher frequency


pma = p.asfreq("A-JUN")  # lower frequency: 2012, the fiscal year ending in June 2012

Whole PeriodIndex objects or time series can be similarly converted with the same semantics:

periods = pd.period_range("2006", "2009", freq="A-DEC")

ts = pd.Series(np.random.standard_normal(len(periods)), index=periods)

ts.asfreq("M", how="start")

ts.asfreq("B", how="end")

### Quarterly Period Frequencies
Quarterly data is standard in accounting, finance, and other fields. Much quarterly data is reported relative to a fiscal year end, typically the last calendar or business day of one of the 12 months of the year. Thus, the period 2012Q4 has a different meaning depending on fiscal year end. pandas supports all 12 possible quarterly frequencies as Q-JAN through Q-DEC:

p = pd.Period("2012Q4", freq="Q-JAN")

p.asfreq("D", how="start")  # first day of the quarter, at daily frequency
# with the fiscal year ending in January, 2012Q4 runs from 2011-11-01 to 2012-01-31


p.asfreq("D", how="end")
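The fiscal-quarter bookkeeping can be verified in both directions. This sketch converts the quarter down to days and then maps a single month back up to its fiscal quarter; the `aug` variable is introduced only for the example:

```python
import pandas as pd

p = pd.Period("2012Q4", freq="Q-JAN")

# fiscal 2012 ends in January 2012, so its fourth quarter is Nov 2011 - Jan 2012
start_day = p.asfreq("D", how="start")
end_day = p.asfreq("D", how="end")
print(start_day, end_day)           # 2011-11-01 2012-01-31

# going the other way, a month maps to the fiscal quarter that contains it
aug = pd.Period("2011-08", freq="M")
quarter = aug.asfreq("Q-JAN")
print(quarter)                      # 2012Q3
```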

Thus, it’s possible to do convenient period arithmetic; for example, to get the timestamp at 4 P.M. on the second-to-last business day of the quarter, you could do:

p.asfreq("B", how="end")

(p.asfreq("B", how="end") - 1).asfreq("T", how="start")

p4pm = (p.asfreq("B", how="end") - 1).asfreq("T", how="start") + 16 * 60 # T for minute


p4pm.to_timestamp()

The `to_timestamp` method returns the Timestamp at the start of the period by default.

You can generate quarterly ranges using `pandas.period_range`. The arithmetic is identical, too:

periods = pd.period_range("2011Q3", "2012Q4", freq="Q-JAN")
ts = pd.Series(np.arange(len(periods)), index=periods)

new_periods = (periods.asfreq("B", "end") - 1).asfreq("H", "start") + 16
ts.index = new_periods.to_timestamp()

### Converting Timestamps to Periods (and Back)
Series and DataFrame objects indexed by timestamps can be converted to periods with the to_period method:

dates = pd.date_range("2000-01-01", periods=3, freq="M")
ts = pd.Series(np.random.standard_normal(3), index=dates)

pts = ts.to_period()

Since periods refer to nonoverlapping time spans, a timestamp can only belong to a single period for a given frequency. While the frequency of the new PeriodIndex is inferred from the timestamps by default, you can specify any supported frequency (most of those listed in Table 11.4 are supported). There is also no problem with having duplicate periods in the result:

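ts2 = pd.Series(np.random.standard_normal(6), index=pd.date_range("2000-01-29", periods=6))  # illustrative: six days spanning a month boundary
ts2.to_period("M")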

To convert back to timestamps, use the `to_timestamp` method, which returns a DatetimeIndex:

pts = ts2.to_period() #default freq=D

pts.to_timestamp(how="end")
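The round trip between timestamps and periods can be sketched with a short daily series that crosses a month boundary. The dates and integer values are made up so the duplicate periods are visible:

```python
import numpy as np
import pandas as pd

# six daily timestamps that straddle a month boundary
dates = pd.date_range("2000-01-29", periods=6, freq="D")
ts2 = pd.Series(np.arange(6), index=dates)

# each timestamp lands in exactly one monthly period, so periods repeat
pts = ts2.to_period("M")
labels = pts.index.astype(str).tolist()
print(labels)
# ['2000-01', '2000-01', '2000-01', '2000-02', '2000-02', '2000-02']
print(pts.index.is_unique)           # False

# converting back places each value at the start of its period by default
back = pts.to_timestamp()
print(back.index[0].date(), back.index[-1].date())   # 2000-01-01 2000-02-01
```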

### Creating a PeriodIndex from Arrays
Fixed frequency datasets are sometimes stored with time span information spread across multiple columns. For example, in this macroeconomic dataset, the year and quarter are in different columns:

By passing these arrays to PeriodIndex with a frequency, you can combine them to form an index for the DataFrame:

index = pd.PeriodIndex(year=data["year"], quarter=data["quarter"],
                       freq="Q-DEC")
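Since the dataset itself is not shown here, the sketch below uses a small hypothetical `data` frame with `year` and `quarter` columns. It also takes a different route from the constructor call above: it builds `"1959Q1"`-style labels and lets `PeriodIndex` parse them, which is an alternative chosen because it does not depend on the keyword form of the constructor. Recent pandas releases also provide `PeriodIndex.from_fields` for combining such columns.

```python
import pandas as pd

# hypothetical stand-in for the dataset: one row per quarter
data = pd.DataFrame({"year": [1959, 1959, 1959, 1959, 1960],
                     "quarter": [1, 2, 3, 4, 1],
                     "value": [1.0, 2.0, 3.0, 4.0, 5.0]})

# build "1959Q1"-style labels from the two columns and parse them as periods
labels = data["year"].astype(str) + "Q" + data["quarter"].astype(str)
index = pd.PeriodIndex(labels.tolist(), freq="Q-DEC")

data.index = index
print(data.index[0], data.index[-1])   # 1959Q1 1960Q1
```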

## 11.6 Resampling and Frequency Conversion
Resampling refers to the process of converting a time series from one frequency to another. Aggregating higher frequency data to lower frequency is called downsampling, while converting lower frequency to higher frequency is called upsampling. Not all resampling falls into either of these categories; for example, converting W-WED (weekly on Wednesday) to W-FRI is neither upsampling nor downsampling.

pandas objects are equipped with a resample method, which is the workhorse function for all frequency conversion. resample has a similar API to groupby; you call resample to group the data, then call an aggregation function:

dates = pd.date_range("2000-01-01", periods=100)
ts = pd.Series(np.random.standard_normal(len(dates)), index=dates)

ts.resample("M").mean()


ts.resample("M", kind="period").mean()

Table 11.5: resample method arguments

Argument	|Description
|:------------|:--------------------------------------------------------------------------|
rule|	String, DateOffset, or timedelta indicating desired resampled frequency (for example, "M", "5min", or Second(15))
axis|	Axis to resample on; default axis=0
fill_method	|How to interpolate when upsampling, as in "ffill" or "bfill"; by default does no interpolation
closed	|In downsampling, which end of each interval is closed (inclusive), "right" or "left"
label|	In downsampling, how to label the aggregated result, with the "right" or "left" bin edge (e.g., the 9:30 to 9:35 five-minute interval could be labeled 9:30 or 9:35)
limit|	When forward or backward filling, the maximum number of periods to fill
kind|	Aggregate to periods ("period") or timestamps ("timestamp"); defaults to the type of index the time series has
convention|	When resampling periods, the convention ("start" or "end") for converting the low-frequency period to high frequency; defaults to "start"
origin|	The "base" timestamp from which to determine the resampling bin edges; can also be one of "epoch", "start", "start_day", "end", or "end_day"; see the resample docstring for full details
offset|	An offset timedelta added to the origin; defaults to None

### Downsampling
Downsampling is aggregating data to a regular, lower frequency. The data you’re aggregating doesn’t need to have a fixed frequency; the desired frequency defines bin edges that are used to slice the time series into pieces to aggregate. For example, to convert to monthly, "M" or "BM", you need to chop up the data into one-month intervals. Each interval is said to be half-open; a data point can belong only to one interval, and the union of the intervals must make up the whole time frame. There are a couple of things to think about when using resample to downsample data:

Which side of each interval is closed

How to label each aggregated bin, either with the start of the interval or the end

To illustrate, let’s look at some one-minute frequency data:

dates = pd.date_range("2000-01-01", periods=12, freq="T") # T for min
ts = pd.Series(np.arange(len(dates)), index=dates)

ts.resample("5min").sum()  # defaults: closed="left", label="left"
# each bin includes its left edge and is labeled by that left-edge timestamp

ts.resample("5min", closed="right").sum()  # bins are still labeled by the left edge
# the first timestamp falls into the previous bin, because it does not belong to (left_0, right_0]

ts.resample("5min", closed="right", label="right").sum()  # label by the right edge
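Because the series is just the integers 0 through 11, the effect of `closed` and `label` can be read off the sums. The sketch below repeats the setup so it is self-contained; it spells the one-minute frequency as `"min"`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(len(dates)), index=dates)

# default: bins are [00:00, 00:05), [00:05, 00:10), ... labeled by the left edge
left = ts.resample("5min").sum()
print(left.tolist())                  # [10, 35, 21]

# closed="right": bins are (23:55, 00:00], (00:00, 00:05], ...
# so the 00:00 observation falls into a bin of its own
right = ts.resample("5min", closed="right").sum()
print(right.tolist())                 # [0, 15, 40, 11]
print(right.index[0])                 # 1999-12-31 23:55:00

# label="right" names each bin by its right edge instead
both = ts.resample("5min", closed="right", label="right").sum()
print(both.index[0])                  # 2000-01-01 00:00:00
```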

Lastly, you might want to shift the result index by some amount, say subtracting one second from the right edge to make it more clear which interval the timestamp refers to. To do this, add an offset to the resulting index:

from pandas.tseries.frequencies import to_offset
result = ts.resample("5min", closed="right", label="right").sum()
result.index = result.index + to_offset("-1s")

### Open-high-low-close (OHLC) resampling
In finance, a popular way to aggregate a time series is to compute four values for each bucket: the first (open), last (close), maximum (high), and minimum (low) values. By using the ohlc aggregate function, you will obtain a DataFrame having columns containing these four aggregates, which are efficiently computed in a single function call:

ts.resample("5min").ohlc() # for each bucket

### Upsampling and Interpolation

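Upsampling converts data from a lower frequency to a higher frequency, so no aggregation is needed. Suppose `frame` is a DataFrame of weekly data observed on Wednesdays.

When you are using an aggregation function with this data, there is only one value per group, and missing values result in the gaps. We use the `asfreq` method to convert to the higher frequency without any aggregation: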

df_daily = frame.resample("D").asfreq()

Suppose you wanted to fill forward each weekly value on the non-Wednesdays. The same filling or interpolation methods available in the fillna and reindex methods are available for resampling:

frame.resample("D").ffill()

frame.resample("D").ffill(limit=2)

Notably, the new date index need not coincide with the old one at all:

frame.resample("W-THU").ffill()
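The `frame` used above is not defined in these notes, so the following sketch builds a hypothetical two-row weekly frame (values and column names are assumptions) and counts the gaps that each upsampling choice leaves behind:

```python
import numpy as np
import pandas as pd

# hypothetical weekly data: two Wednesdays, two columns
frame = pd.DataFrame(np.arange(4.0).reshape(2, 2),
                     index=pd.date_range("2000-01-01", periods=2, freq="W-WED"),
                     columns=["Colorado", "Texas"])

# asfreq inserts the missing days as NaN rows
df_daily = frame.resample("D").asfreq()
print(len(df_daily))                                 # 8
print(int(df_daily["Colorado"].isna().sum()))        # 6

# ffill carries each Wednesday forward; limit caps how far
filled = frame.resample("D").ffill(limit=2)
print(int(filled["Colorado"].isna().sum()))          # 4
```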

### Resampling with Periods
Resampling data indexed by periods is similar to timestamps:

annual_frame = frame.resample("A-DEC").mean()

Upsampling is more nuanced, as before resampling you must make a decision about which end of the time span in the new frequency to place the values. The convention argument defaults to "start" but can also be "end".

Since periods refer to time spans, the rules about upsampling and downsampling are more rigid:

In downsampling, the target frequency must be a superperiod of the source frequency.

In upsampling, the target frequency must be a subperiod of the source frequency.

If these rules are not satisfied, an exception will be raised. This mainly affects the quarterly, annual, and weekly frequencies; for example, the time spans defined by Q-MAR only line up with A-MAR, A-JUN, A-SEP, and A-DEC:

annual_frame.resample("Q-MAR").asfreq() #default: Place the value at the start

annual_frame.resample("Q-MAR").ffill()

### Grouped Time Resampling
For time series data, the resample method is semantically a group operation based on a time intervalization. Here's a small example table:

df.set_index("time").resample("5min").count()
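The example table is not reproduced in these notes. As a stand-in, the sketch below assumes a hypothetical `df` with one row per minute (the start time and `N` are made up) and shows the grouping that `resample` performs:

```python
import numpy as np
import pandas as pd

# hypothetical data: one observation per minute for 15 minutes
N = 15
times = pd.date_range("2017-05-20 00:00", freq="1min", periods=N)
df = pd.DataFrame({"time": times, "value": np.arange(N)})

# index by time, then count the rows that fall into each 5-minute bin
counts = df.set_index("time").resample("5min").count()
print(counts["value"].tolist())   # [5, 5, 5]
```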

Suppose that a DataFrame contains multiple time series, marked by an additional group key column:

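A = np.array([[1, 2], [3, 4]])
repeated_array = np.tile(A, (2, 3))  # np.tile repeats an array along each axis

N = 15  # illustrative number of time points (an assumption; not defined in these notes)
times = pd.date_range("2017-05-20 00:00", freq="1min", periods=N)  # illustrative timestamps
df2 = pd.DataFrame({"time": times.repeat(3),
                    "key": np.tile(["a", "b", "c"], N),
                    "value": np.arange(N * 3.)})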

To do the same resampling for each value of "key", we introduce the `pandas.Grouper` object. One constraint with using `pandas.Grouper` is that the time must be the index of the Series or DataFrame:

time_key = pd.Grouper(freq="5min")

We can then set the time index, group by `"key"` and `time_key`, and aggregate:

resampled = (df2.set_index("time") #time must be the index to use pd.Grouper
             .groupby(["key", time_key])
             .sum())

resampled.reset_index()
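Put together as one runnable piece, the grouped resampling might look like the sketch below. The values of `N` and `times` are assumptions, since they are not defined in these notes:

```python
import numpy as np
import pandas as pd

N = 15
times = pd.date_range("2017-05-20 00:00", freq="1min", periods=N)
df2 = pd.DataFrame({"time": times.repeat(3),
                    "key": np.tile(["a", "b", "c"], N),
                    "value": np.arange(N * 3.0)})

time_key = pd.Grouper(freq="5min")
resampled = (df2.set_index("time")      # Grouper(freq=...) bins on the index
             .groupby(["key", time_key])
             .sum())

# one row per (key, 5-minute bin) pair
print(len(resampled))                   # 9

# sums for key "a" across the three bins
a_sums = resampled.xs("a", level="key")["value"].tolist()
print(a_sums)                           # [30.0, 105.0, 180.0]
```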

## 11.7 Moving Window Functions
An important class of array transformations used for time series operations are statistics and other functions evaluated over a sliding window or with exponentially decaying weights. This can be useful for smoothing noisy or gappy data. I call these moving window functions, even though they include functions without a fixed-length window like exponentially weighted moving average. Like other statistical functions, these also automatically exclude missing data.

close_px_1.isna().sum()# number of na rows (added dates)

close_px_1[close_px_1.isna().any(axis=1)] # list all dates with nan

close_px = close_px.resample("B").ffill()

I now introduce the `rolling` operator, which behaves similarly to `resample` and `groupby`. It can be called on a Series or DataFrame along with a window (expressed as a number of periods; see Apple price with 250-day moving average for the plot created):

close_px["AAPL"].rolling(250).mean()  # note the first 249 values are NaN

close_px["AAPL"].rolling(250).mean().isna().sum()

close_px["AAPL"].plot()
close_px["AAPL"].rolling(250).mean().plot()


The expression `rolling(250)` is similar in behavior to groupby, but instead of grouping, it creates an object that enables grouping over a 250-day sliding window. So here we have the 250-day moving window average of Apple's stock price.

By default, `rolling` functions require all of the values in the window to be non-NA. This behavior can be changed to account for missing data and, in particular, the fact that you will have fewer than `window` periods of data at the beginning of the time series (see Apple 250-day daily return standard deviation):

close_px_all["AAPL"].resample("B").asfreq().isna().sum()

# With the default min_periods, any window that contains an NA value yields NaN
close_px_all["AAPL"].resample("B").asfreq().rolling(250).mean().plot()
# Replacing mean() with mean(skipna=True) does not help; relax min_periods instead

close_px["AAPL"].pct_change()

std250 = close_px["AAPL"].pct_change().rolling(250, min_periods=10).std()
std250[0:12]
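To see the effect of `min_periods` on a series small enough to check by hand, here is a minimal sketch. The values are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

# Default: every window of 3 must contain 3 non-NA values
strict = s.rolling(3).mean()

# Relaxed: a window needs only 2 non-NA values to produce a result
relaxed = s.rolling(3, min_periods=2).mean()

print(pd.DataFrame({"strict": strict, "relaxed": relaxed}))
```

Because every window of three here either sits at the start of the series or contains the missing value, the strict version is NaN throughout, while the relaxed version averages whatever non-NA values are available.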

To compute an expanding window mean, use the `expanding` operator instead of `rolling`. The expanding mean starts the time window from the same point as the rolling window and increases the size of the window until it encompasses the whole series. An expanding window mean on the std250 time series looks like this:

expanding_mean = std250.expanding().mean()
# Signature: expanding(min_periods=1, axis=0)
# Example: ts.expanding().sum() computes the cumulative sum of ts
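A small sketch, using a made-up four-element series `ts`, shows that the expanding sum matches `cumsum` and that the expanding mean is the running average of all values seen so far:

```python
import numpy as np
import pandas as pd

ts = pd.Series([2.0, 4.0, 6.0, 8.0])

expanding_sum = ts.expanding().sum()    # cumulative sum
expanding_avg = ts.expanding().mean()   # mean of all values up to each point

# The same running mean computed by hand: cumulative sum / number of values
manual_avg = ts.cumsum() / np.arange(1, len(ts) + 1)

print(pd.DataFrame({"sum": expanding_sum, "mean": expanding_avg}))
```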

Calling a moving window function on a DataFrame applies the transformation to each column (see Stock prices 60-day moving average (log y-axis)):

plt.style.use('grayscale')
close_px.rolling(60).mean().plot(logy=True)

The `rolling` function also accepts a string indicating a fixed-size time offset rather than a set number of periods. Using this notation can be useful for irregular time series. These are the same strings that you can pass to `resample`. For example, we could compute a 20-day rolling mean like so:

close_px.rolling("20D").mean()
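The difference between a count-based and a time-based window is easiest to see on an irregular series. The sketch below uses a made-up series with a one-week gap:

```python
import pandas as pd

# Hypothetical irregular series: observations on Jan 1, 2, 3, 10, and 11
idx = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                      "2020-01-10", "2020-01-11"])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

by_count = s.rolling(3, min_periods=1).mean()  # up to the last 3 observations
by_time = s.rolling("3D").mean()               # observations in the last 3 days

print(pd.DataFrame({"by_count": by_count, "by_time": by_time}))
```

On January 10, the count-based window still reaches back across the gap to January 2 and 3, whereas the time-based window contains only that day's observation.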

### Exponentially Weighted Functions
An alternative to using a fixed window size with equally weighted observations is to specify a constant decay factor to give more weight to more recent observations. There are a couple of ways to specify the decay factor. A popular one is using a span, which makes the result comparable to a simple moving window function with window size equal to the span.

Since an exponentially weighted statistic places more weight on more recent observations, it “adapts” faster to changes compared with the equal-weighted version.

`pandas` has the `ewm` operator (which stands for exponentially weighted moving) to go along with `rolling` and `expanding`. Here’s an example comparing a 30-day moving average of Apple’s stock price with an exponentially weighted (EW) moving average with `span=30` (see Simple moving average versus exponentially weighted):

plt.figure()
aapl_px = close_px["AAPL"]["2006":"2007"]

ma30 = aapl_px.rolling(30, min_periods=20).mean()
ewma30 = aapl_px.ewm(span=30).mean()

aapl_px.plot(style="k-", label="Price")
ma30.plot(style="k--", label="Simple Moving Avg")
ewma30.plot(style="k-", label="EW MA")
plt.legend()
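To make the role of `span` concrete: pandas documents the mapping from span to the smoothing factor as alpha = 2 / (span + 1). The sketch below checks this against a hand-written recursion on made-up data. It uses `adjust=False`, which selects the simple recursive form; the default `adjust=True` instead normalizes the weights, so its early values differ.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
span = 3
alpha = 2 / (span + 1)  # smoothing factor implied by the span

# Recursive form: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
manual = [x.iloc[0]]
for value in x.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * value)

ewma = x.ewm(span=span, adjust=False).mean()
print(pd.DataFrame({"ewm": ewma, "manual": manual}))
```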

### Binary Moving Window Functions
Some statistical operators, like correlation and covariance, need to operate on two time series. As an example, financial analysts are often interested in a stock’s correlation to a benchmark index like the S&P 500. To have a look at this, we first compute the percent change for all of our time series of interest:
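A minimal sketch of this step follows. Synthetic prices stand in for the real data, so the random values, the date range, and the column names (including `"SPX"` for the benchmark) are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic prices stand in for the real data; column names are assumptions
rng = np.random.default_rng(0)
dates = pd.bdate_range("2010-01-01", periods=300)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(300, 3)), axis=0)),
    index=dates, columns=["AAPL", "MSFT", "SPX"])

spx_rets = prices["SPX"].pct_change()            # benchmark returns
returns = prices[["AAPL", "MSFT"]].pct_change()  # stock returns

# Rolling correlation of each stock's returns with the benchmark returns
corr = returns.rolling(125, min_periods=100).corr(spx_rets)
print(corr.dropna().head())
```

The first row of each return series is NaN because there is no previous price, so with `min_periods=100` the first non-NA correlation appears at the 101st row.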

After we call `rolling`, the `corr` aggregation function can then compute the rolling correlation with `spx_rets`:

corr = returns["AAPL"].rolling(125, min_periods=100).corr(spx_rets)
corr.plot()

Suppose you wanted to compute the rolling correlation of the S&P 500 index with many stocks at once. We can compute all of the rolling correlations in one shot by calling `rolling` on the DataFrame and passing the `spx_rets` Series:

plt.figure()
corr = returns.rolling(125, min_periods=100).corr(spx_rets)
corr.plot()

### User-Defined Moving Window Functions
The `apply` method on rolling and related methods provides a way to apply an array function of your own creation over a moving window. The only requirement is that the function produce a single value (a reduction) from each piece of the array. For example, while we can compute sample quantiles using `rolling(...).quantile(q)`, we might be interested in the percentile rank of a particular value over the sample. The `scipy.stats.percentileofscore` function does just this:

from scipy.stats import percentileofscore
def score_at_2percent(x):
    return percentileofscore(x, 0.02)  # percentile rank of the value 0.02 (a 2% return) within x

result = returns["AAPL"].rolling(250).apply(score_at_2percent)
result.plot()