# Time Series Basics
A **Time Series** in pandas is a Series object indexed by timestamps.

**pandas.Timestamp** is the pandas replacement for **datetime.datetime** and is interchangeable with it in most cases. It is the type used for the entries of a *DatetimeIndex* and of other time-series-oriented data structures in pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html
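
As a minimal sketch of that interchangeability, the snippet below builds a Timestamp from a datetime and converts it back. The variable names are illustrative only.

```python
from datetime import datetime

import pandas as pd

py_dt = datetime(2019, 4, 2)
stamp = pd.Timestamp(py_dt)                # build a Timestamp from a datetime

# Timestamp subclasses datetime.datetime, so it passes isinstance checks
is_datetime = isinstance(stamp, datetime)

# The two objects compare equal when they describe the same moment
same_moment = stamp == py_dt

# Convert back to a plain datetime when a library insists on one
round_trip = stamp.to_pydatetime()

print(is_datetime, same_moment, type(round_trip).__name__)
```

The "in most cases" caveat still applies: a Timestamp carries nanosecond resolution, which a plain datetime cannot represent.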

In [2]:
from datetime import datetime
import pandas as pd
import numpy as np

## Create Time Series
Create a Time Series from a list of datetime objects.

In [3]:
dates = [datetime(2019, 4, 2),
        datetime(2019, 4, 4),
        datetime(2019, 4, 6),
        datetime(2019, 4, 8)]
dates

[datetime.datetime(2019, 4, 2, 0, 0),
 datetime.datetime(2019, 4, 4, 0, 0),
 datetime.datetime(2019, 4, 6, 0, 0),
 datetime.datetime(2019, 4, 8, 0, 0)]

In [4]:
ts = pd.Series([1, 2, 3, 4], index=dates)
ts

2019-04-02    1
2019-04-04    2
2019-04-06    3
2019-04-08    4
dtype: int64

Scalar values from a *DatetimeIndex* are pandas ***Timestamp*** objects:

In [5]:
ts.index

DatetimeIndex(['2019-04-02', '2019-04-04', '2019-04-06', '2019-04-08'], dtype='datetime64[ns]', freq=None)

In [6]:
ts.index[0]

Timestamp('2019-04-02 00:00:00')

Arithmetic operations between differently indexed time series **align on the dates**. A date present in only one operand yields NaN, so the result is upcast to float:

In [7]:
ts

2019-04-02    1
2019-04-04    2
2019-04-06    3
2019-04-08    4
dtype: int64

In [8]:
# Select every 2nd element
ts[::2]

2019-04-02    1
2019-04-06    3
dtype: int64

In [9]:
ts + ts[::2]

2019-04-02    2.0
2019-04-04    NaN
2019-04-06    6.0
2019-04-08    NaN
dtype: float64
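
If the NaN entries are unwanted, one option is Series.add with fill_value, which substitutes a default for dates missing from one operand. This is a sketch that rebuilds the same series so it runs on its own; treating a missing date as 0 is an assumption that may not suit every dataset.

```python
import pandas as pd

dates = pd.to_datetime(['2019-04-02', '2019-04-04', '2019-04-06', '2019-04-08'])
ts = pd.Series([1, 2, 3, 4], index=dates)

# Dates missing from ts[::2] are treated as 0 instead of producing NaN
filled = ts.add(ts[::2], fill_value=0)
print(filled)
```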

## Indexing, Selection, Subsetting

In [10]:
ts

2019-04-02    1
2019-04-04    2
2019-04-06    3
2019-04-08    4
dtype: int64

### Time Series indexing
A time series behaves like any other pandas.Series when you index and select data by label. A label can be a Timestamp taken from the index, such as the second entry:

In [11]:
# Second entry of the index (position 1)
stamp = ts.index[1]
stamp

Timestamp('2019-04-04 00:00:00')
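
The following sketch uses such a Timestamp as a label, rebuilding the series so it is self-contained. It goes through .loc to make the label-based lookup explicit, and shows that an equal datetime object finds the same entry.

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2019, 4, 2), datetime(2019, 4, 4),
         datetime(2019, 4, 6), datetime(2019, 4, 8)]
ts = pd.Series([1, 2, 3, 4], index=dates)

stamp = ts.index[1]                         # Timestamp('2019-04-04')
by_timestamp = ts.loc[stamp]                # look up by Timestamp label
by_datetime = ts.loc[datetime(2019, 4, 4)]  # an equal datetime works too
print(by_timestamp, by_datetime)
```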

A string that pandas can parse as a date also works as a label:

In [12]:
ts['2019/04/02']

1

In [13]:
ts['20190402']

1

Passing only a year selects every entry in that year:

In [14]:
ts['2019']

2019-04-02    1
2019-04-04    2
2019-04-06    3
2019-04-08    4
dtype: int64

Select a slice of data for a year and month:

In [15]:
ts['2019-04']

2019-04-02    1
2019-04-04    2
2019-04-06    3
2019-04-08    4
dtype: int64

A full date (year, month and day) matches a single entry in this index, so it returns a scalar rather than a slice:

In [16]:
ts['2019-04-04']

2
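
Because every entry of ts falls in April 2019, the year and month selections above return the whole series. The sketch below uses a hypothetical series named daily that spans two months, so the effect of partial-string selection is easier to see.

```python
import pandas as pd

# Hypothetical daily series running from 2019-03-30 to 2019-04-04
idx = pd.date_range('2019-03-30', periods=6, freq='D')
daily = pd.Series(range(6), index=idx)

march = daily.loc['2019-03']   # only the two March entries
april = daily.loc['2019-04']   # only the four April entries
year = daily.loc['2019']       # everything, since all dates are in 2019
print(len(march), len(april), len(year))
```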

### Slicing Time Series

In [17]:
ts

2019-04-02    1
2019-04-04    2
2019-04-06    3
2019-04-08    4
dtype: int64

Because most time series data is ordered chronologically, you can slice it with any of the following:
- date strings
- Timestamp objects
- datetime objects

In [18]:
ts['2019-04-01':'2019-04-04']

2019-04-02    1
2019-04-04    2
dtype: int64
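
As a sketch of the three options, the snippet below takes the same slice with strings, Timestamps and datetimes, using .loc on a rebuilt copy of the series. The endpoints do not have to exist in the index (2019-04-01 does not), and both endpoints are inclusive.

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2019, 4, 2), datetime(2019, 4, 4),
         datetime(2019, 4, 6), datetime(2019, 4, 8)]
ts = pd.Series([1, 2, 3, 4], index=dates)

by_string = ts.loc['2019-04-01':'2019-04-04']
by_timestamp = ts.loc[pd.Timestamp('2019-04-01'):pd.Timestamp('2019-04-04')]
by_datetime = ts.loc[datetime(2019, 4, 1):datetime(2019, 4, 4)]

# All three forms select the same two rows
print(by_string.equals(by_timestamp), by_string.equals(by_datetime))
```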

The truncate method also slices a Series at date boundaries:

In [19]:
# after= drops every entry later than the given date (April 5, 2019)
ts.truncate(after='04/05/2019')


2019-04-02    1
2019-04-04    2
dtype: int64
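
truncate also accepts a before argument, so both ends can be trimmed in one call. The sketch below uses ISO-formatted dates to avoid month/day ambiguity; the specific cut-off dates are chosen for illustration.

```python
import pandas as pd

dates = pd.to_datetime(['2019-04-02', '2019-04-04', '2019-04-06', '2019-04-08'])
ts = pd.Series([1, 2, 3, 4], index=dates)

# Keep only entries from April 3 through April 7 (the index must be sorted)
middle = ts.truncate(before='2019-04-03', after='2019-04-07')
print(middle)
```

truncate returns a new Series and leaves ts unchanged.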

In [20]:
ts

2019-04-02    1
2019-04-04    2
2019-04-06    3
2019-04-08    4
dtype: int64

### Time Series with Duplicate Indices
You can have multiple data observations falling on a particular timestamp:

In [21]:
dates = pd.DatetimeIndex(['1/1/2000', 
                          '1/2/2000', '1/2/2000', '1/2/2000', 
                          '1/3/2000'])

In [22]:
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int64

Check whether the index is unique:

In [23]:
dup_ts.index.is_unique

False

Indexing this object now produces either a scalar or a slice, depending on whether the timestamp is duplicated:

In [24]:
# A unique timestamp returns a scalar value
dup_ts['1/3/2000']

4

In [25]:
dup_ts['1/2/2000']

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int64
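
To find or drop the repeated timestamps, one possible approach is Index.duplicated, sketched below. Keeping the first observation per timestamp is an assumption made for illustration; which observation to keep depends on the data.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000',
                          '1/2/2000', '1/2/2000', '1/2/2000',
                          '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# True for every repeat of a timestamp after its first occurrence
is_repeat = dup_ts.index.duplicated(keep='first')

# Keep only the first observation for each timestamp
first_only = dup_ts[~is_repeat]
print(first_only)
```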

### Data Aggregation
Suppose you want to aggregate data that has non-unique timestamps. One way is to use groupby with level=0, which groups the values by their index labels:

In [26]:
grouped = dup_ts.groupby(level=0) # level=0 groups by the first (here, only) index level
grouped.describe()

            count  mean  std  min  25%  50%  75%  max
2000-01-01    1.0   0.0  NaN  0.0  0.0  0.0  0.0  0.0
2000-01-02    3.0   2.0  1.0  1.0  1.5  2.0  2.5  3.0
2000-01-03    1.0   4.0  NaN  4.0  4.0  4.0  4.0  4.0


In [27]:
grouped.mean()

2000-01-01    0.0
2000-01-02    2.0
2000-01-03    4.0
dtype: float64

In [28]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
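
Several aggregations can also be computed in one pass with agg, which returns a DataFrame with one column per function. This is a sketch on a rebuilt copy of the series; the choice of min, max and sum is illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000',
                          '1/2/2000', '1/2/2000', '1/2/2000',
                          '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# One row per unique timestamp, one column per aggregation
summary = dup_ts.groupby(level=0).agg(['min', 'max', 'sum'])
print(summary)
```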