Anything that is recorded repeatedly at many points in time forms a time series. 

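Many time series are fixed frequency: data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can also be irregular, with no fixed unit of time or offset between points.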
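This short sketch contrasts the two cases. It uses illustrative timestamps that are not part of the original notes. A fixed-frequency index is built with pd.date_range, and an irregular one is built from explicit dates. It assumes that a DatetimeIndex constructed from a plain list records no frequency.

```python
import pandas as pd

# Fixed frequency: one point every 15 minutes (illustrative times)
fixed_idx = pd.date_range("2011-01-01 09:00", periods=4, freq="15min")
print(fixed_idx)

# Irregular: explicit dates with uneven gaps, so no frequency is recorded
irregular_idx = pd.DatetimeIndex(["2011-01-02", "2011-01-05", "2011-01-07"])

print(fixed_idx.freq)      # a 15-minute offset
print(irregular_idx.freq)  # None
```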

How you mark and refer to time series data depends on the application. Common options include:

- Timestamps: specific instants in time
- Fixed periods: such as the whole month of January 2011 or the whole year 2011
- Intervals of time: indicated by a start and an end timestamp. Periods can be thought of as special cases of intervals.
- Experiment or elapsed time: each timestamp is a measure of time relative to a particular start time (the index may be an integer or floating-point number indicating the time elapsed since the start of the experiment).
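One way to picture these four kinds is to map each one onto a pandas object. The sketch below is an assumption-laden illustration rather than the only mapping: pd.Timestamp for an instant, pd.Period for a fixed period, pd.Interval for an explicit start/end span, and a plain float index for elapsed time. The readings in the elapsed-time series are made up.

```python
import pandas as pd

# Timestamp: a specific instant
instant = pd.Timestamp("2011-01-07 09:30")

# Period: a whole span of time, here the month of January 2011
month = pd.Period("2011-01", freq="M")

# Interval: bounded by explicit start and end timestamps (right-closed by default)
window = pd.Interval(pd.Timestamp("2011-01-01"), pd.Timestamp("2011-01-15"))

# Elapsed time: hypothetical readings indexed by seconds since the experiment began
elapsed = pd.Series([0.1, 0.4, 0.9], index=[0.0, 1.5, 3.0], name="reading")
elapsed.index.name = "seconds_since_start"

print(instant in window)                 # True
print(month.start_time, month.end_time)  # first and last instants of January 2011
print(elapsed)
```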


In [None]:
import numpy as np
import pandas as pd

# 11.1 Date and Time Data Types and Tools



In [None]:
from datetime import datetime, timedelta

now = datetime.now()
now
now.year, now.month, now.day

# datetime stores both the date and time down to the microsecond. 
# datetime.timedelta represents the temporal difference between two datetime objects

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta
delta.days, delta.seconds

# Add or subtract a timedelta (or a multiple of one) to a datetime object to yield a new, shifted object
start = datetime(2011, 1, 7)
start + timedelta(12)
start - 2 * timedelta(12)
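The cell above only displays its last expression, so here is a sketch that prints each step. It shows how timedelta normalizes a difference into days and seconds. It also shows that the first positional argument of timedelta is days, which is why keyword arguments read more clearly. The names span and origin are illustrative.

```python
from datetime import datetime, timedelta

# A difference between datetimes is normalized into days + seconds (+ microseconds)
span = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
print(span.days, span.seconds)   # 926 56700
print(span.total_seconds())      # 80063100.0

# timedelta(12) means 12 days; keywords make that explicit
print(timedelta(12) == timedelta(days=12))      # True

origin = datetime(2011, 1, 7)
print(origin + timedelta(days=12, hours=6))     # 2011-01-19 06:00:00
print(origin - 2 * timedelta(days=12))          # 2010-12-14 00:00:00
```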

## Converting Between String and Datetime

In [None]:
# Format datetime objects and pandas Timestamp objects as strings using str or the strftime method

stamp = datetime(2011, 1, 3)
str(stamp)
stamp.strftime("%Y-%m-%d")

# The same format codes convert strings back to dates
value = "2011-01-03"

# Use datetime.strptime to parse a date with a known format
datetime.strptime(value, "%Y-%m-%d")
datestrs = ['7/6/2011', '8/6/2011']
[datetime.strptime(x, "%m/%d/%Y") for x in datestrs]
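The format you pass to strptime fully determines how the fields are read. This matters for strings like '7/6/2011' in the list above. The sketch below parses the same string two ways. It then shows that strftime and strptime with one shared format string round-trip a datetime. The variable names are illustrative.

```python
from datetime import datetime

raw = "7/6/2011"

# The same text means different dates depending on the declared field order
us_style = datetime.strptime(raw, "%m/%d/%Y")   # month first: July 6
eu_style = datetime.strptime(raw, "%d/%m/%Y")   # day first: June 7
print(us_style.date(), eu_style.date())         # 2011-07-06 2011-06-07

# strftime and strptime share format codes, so this round trip is lossless
fmt = "%Y-%m-%d %H:%M:%S"
moment = datetime(2011, 1, 3, 14, 30, 5)
text = moment.strftime(fmt)
print(text)                                     # 2011-01-03 14:30:05
print(datetime.strptime(text, fmt) == moment)   # True
```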


## Parsing Dates with pandas
Use pandas.to_datetime to parse strings into datetimes.
Datetime objects also have a number of locale-specific formatting options for systems in other countries or languages.
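A few strftime codes are locale-dependent: %a/%A (weekday name), %b/%B (month name), and %x (the locale's date representation). The sketch below prints them under whatever LC_TIME locale the process is using. In Python's default "C" locale they come out in English. Switching locales with locale.setlocale(locale.LC_TIME, ...) only works if the target locale is installed on the system, so the sketch does not attempt it.

```python
from datetime import datetime

when = datetime(2011, 1, 3)

# Output of these codes depends on the current LC_TIME locale
codes = ["%a", "%A", "%b", "%B", "%x"]
formatted = {code: when.strftime(code) for code in codes}

for code, text in formatted.items():
    print(code, text)
```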

In [None]:
# pandas.to_datetime parses many kinds of date representations, such as standard ISO 8601 formats
datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]

pd.to_datetime(datestrs)


# NaT (Not a Time) is pandas's null value for timestamp data; missing values such as None become NaT
idx = pd.to_datetime(datestrs + [None])
idx
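pandas.to_datetime also takes options that control the parsing. This sketch uses three of them with illustrative inputs. errors="coerce" turns an unparseable string into NaT instead of raising. format= fixes the field order explicitly. dayfirst=True asks the parser to read an ambiguous first field as the day.

```python
import pandas as pd

# errors="coerce": bad entries become NaT rather than raising an exception
parsed = pd.to_datetime(["2011-07-06", "not a date"], errors="coerce")
print(parsed)
print(parsed.isna())        # [False  True]

# An explicit format removes any guessing about field order
month_first = pd.to_datetime(["7/6/2011"], format="%m/%d/%Y")

# dayfirst=True reads the first field of an ambiguous date as the day
day_first = pd.to_datetime("7/6/2011", dayfirst=True)

print(month_first[0].date(), day_first.date())   # 2011-07-06 2011-06-07
```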

# 11.2 Time Series Basics
A basic kind of time series object in pandas is a Series indexed by timestamps, which are often represented outside of pandas as Python strings or datetime objects.

In [None]:
dates = [
    datetime(2011, 1, 2),
    datetime(2011, 1, 5),
    datetime(2011, 1, 7),
    datetime(2011, 1, 8),
    datetime(2011, 1, 10),
    datetime(2011, 1, 12),
]

ts = pd.Series(np.random.standard_normal(6), index=dates)

ts

In [None]:
# Arithmetic between differently indexed time series automatically aligns on the dates;
# [::2] selects every second element of ts, so dates missing from it yield NaN
ts + ts[::2]
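Because ts holds random numbers, the alignment is easier to check with fixed values. This sketch uses a small, made-up series with distinct names so it does not overwrite ts. Dates present in both operands are added. Dates present in only one of them become NaN.

```python
from datetime import datetime
import pandas as pd

demo_dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
              datetime(2011, 1, 7), datetime(2011, 1, 8)]
small_ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=demo_dates)

# small_ts[::2] keeps only Jan 2 and Jan 7; the other dates have no partner
aligned = small_ts + small_ts[::2]
print(aligned)
# 2011-01-02    2.0
# 2011-01-05    NaN
# 2011-01-07    6.0
# 2011-01-08    NaN
```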

In [None]:
# Find the index data type
ts.index.dtype

In [None]:
ts.index

In [None]:
# Scalar values from a DatetimeIndex are pandas Timestamp objects

stamp = ts.index[0]
stamp

A pandas.Timestamp can be substituted in most places where you would use a datetime object, but the reverse is not true: pandas.Timestamp can store nanosecond precision (and, in pandas versions before 2.0, frequency information), whereas a datetime object stores only up to microseconds.
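Here is a small sketch of that asymmetry using an illustrative timestamp. Timestamp subclasses datetime, so isinstance checks pass. Converting back with to_pydatetime drops the nanosecond part. The warn=False argument only suppresses the warning pandas emits about the discarded nanoseconds.

```python
from datetime import datetime
import pandas as pd

precise = pd.Timestamp("2011-01-02 00:00:00.000000001")

# Timestamp is a datetime subclass, so it works where a datetime is expected
print(isinstance(precise, datetime))   # True
print(precise.nanosecond)              # 1

# Converting to a plain datetime loses the nanoseconds
plain = precise.to_pydatetime(warn=False)
print(plain, plain.microsecond)        # 2011-01-02 00:00:00 0
```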

## Indexing, Selection, Subsetting

A time series behaves like any other Series when you index and select data based on labels.

In [None]:
# Select using a Timestamp taken from the index
stamp = ts.index[2]

ts[stamp]

In [None]:
# Select with a string that is interpretable as a date
# All three of these work
ts["2011/01/10"]
ts["20110110"]
ts["2011-01-10"]
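The cell above only displays its last lookup. This sketch checks that every spelling reaches the same element, using a small illustrative series with fixed values. The assumption, consistent with the cell above, is that pandas parses each string key into the same Timestamp before looking it up.

```python
from datetime import datetime
import pandas as pd

sample = pd.Series([1.0, 2.0, 3.0],
                   index=pd.to_datetime(["2011-01-07", "2011-01-08", "2011-01-10"]))

# Different spellings of the same date, plus a datetime object
picks = [sample["2011/01/10"], sample["20110110"], sample["2011-01-10"],
         sample[datetime(2011, 1, 10)]]
print(picks)   # all four lookups return the value stored for Jan 10
```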

In [None]:
# Build a longer series; passing a year, or a year and month, then selects slices of data
longer_ts = pd.Series(np.random.standard_normal(1000), index=pd.date_range('2011-01-01', periods=1000))

In [None]:
longer_ts['2012']

# Select the month
longer_ts['2012-02']

# Select by datetime object
ts[datetime(2011,1,7)]

# Slicing with timestamps not contained in the time series still works, as a range query
ts['2011-01-06':'2011-01-11']

# Use truncate to slice a Series between two dates
ts.truncate(after="2011-01-09")
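With deterministic values instead of random ones, the sizes of these selections can be checked directly. The sketch below uses illustrative series. A 1,000-day series starting on 2011-01-01 covers all of 2012, which is a leap year. A range slice on a sorted index works even when neither endpoint is present. truncate keeps both of its bounds inclusively.

```python
import numpy as np
import pandas as pd

daily = pd.Series(np.arange(1000), index=pd.date_range("2011-01-01", periods=1000))
print(len(daily.loc["2012"]))      # 366: every day of leap year 2012
print(len(daily.loc["2012-02"]))   # 29

sparse = pd.Series([1, 2, 3, 4],
                   index=pd.to_datetime(["2011-01-02", "2011-01-05",
                                         "2011-01-08", "2011-01-12"]))

# Neither endpoint is in the index; the slice returns everything in between
window_slice = sparse.loc["2011-01-04":"2011-01-10"]
print(window_slice)                # Jan 5 and Jan 8

# truncate includes both bounds
trimmed = sparse.truncate(before="2011-01-05", after="2011-01-08")
print(trimmed)
```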

In [None]:
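# Overwrite every value in the truncated result; compare ts and times
# in the next two cells to check whether the original series changed
times = ts.truncate(after="2011-01-09")
times[:] = np.nan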

In [None]:
ts

In [None]:
times
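Whether a selection shares memory with its source has changed across pandas versions, notably with the introduction of copy-on-write. This sketch takes the version-independent route. It calls .copy() explicitly, so modifying the subset is guaranteed to leave the original untouched. The series and names are illustrative.

```python
import numpy as np
import pandas as pd

source = pd.Series([1.0, 2.0, 3.0, 4.0], index=pd.date_range("2011-01-01", periods=4))

# An explicit copy makes the subset independent of source in any pandas version
head = source.truncate(after="2011-01-02").copy()
head[:] = np.nan

print(head)
print(source.isna().sum())   # 0: the original is unchanged
```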

In [None]:
# Indexing works the same way for a DataFrame

dates = pd.date_range("2000-01-01", periods=100, freq="W-WED")

long_df = pd.DataFrame(
    np.random.standard_normal((100, 4)),
    index=dates,
    columns=["Colorado", "Texas", "New York", "Ohio"],
)

long_df.loc['2001-05']
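To make the weekly example checkable, this sketch rebuilds a similar frame with fixed values instead of random ones. The data is illustrative. freq="W-WED" anchors every row on a Wednesday, so the first row falls on 2000-01-05 and May 2001 contains five rows. Row and column labels can also be combined in a single .loc call.

```python
import numpy as np
import pandas as pd

weekly_idx = pd.date_range("2000-01-01", periods=100, freq="W-WED")
frame = pd.DataFrame(np.arange(400).reshape(100, 4), index=weekly_idx,
                     columns=["Colorado", "Texas", "New York", "Ohio"])

print(weekly_idx[0])              # 2000-01-05 00:00:00, the first Wednesday

# A partial date string selects every row in that month
may_2001 = frame.loc["2001-05"]
print(may_2001.shape)             # (5, 4): May 2, 9, 16, 23 and 30

# Date range on the rows, label on the columns
texas_mid_may = frame.loc["2001-05-09":"2001-05-23", "Texas"]
print(texas_mid_may)
```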


## Time Series with Duplicated Indices

In some applications, there may be multiple data observations falling on a particular timestamp.

In [37]:
dates = pd.DatetimeIndex(['2000-01-01','2000-01-02','2000-01-02', '2000-01-02', '2000-01-03'])

dup_ts = pd.Series(np.arange(5), index=dates)

In [39]:
dup_ts.index.is_unique

False

In [40]:
# To aggregate data with non-unique timestamps, group by the first (and only) index level

grouped = dup_ts.groupby(level=0)

# Aggregate to one value per timestamp (the result is a Series, not a DataFrame)
grouped.mean()

2000-01-01    0.0
2000-01-02    2.0
2000-01-03    4.0
dtype: float64
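Besides grouping, there are other ways to work with the duplicated index. This sketch rebuilds the same data under new names. Indexing with a duplicated date returns a Series, while a date that occurs once returns a scalar. count() reports how many observations share each timestamp. Index.duplicated can keep just the first observation per date, which is one possible de-duplication policy among several.

```python
import numpy as np
import pandas as pd

dup_idx = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                            "2000-01-02", "2000-01-03"])
dups = pd.Series(np.arange(5), index=dup_idx)

print(dups.loc["2000-01-02"])     # three rows share this timestamp
print(dups.loc["2000-01-03"])     # unique timestamp: a scalar, 4

counts = dups.groupby(level=0).count()
print(counts.tolist())            # [1, 3, 1]

# Keep only the first observation for each timestamp
first_only = dups[~dups.index.duplicated(keep="first")]
print(first_only.tolist())        # [0, 1, 4]
```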