<h2 align='center'> Time Series with Pandas</h2>

<h3> In this notebook we will work with a DatetimeIndex (time-stamped data) </h3>

We will look at NumPy datetime arrays and date ranges, as well as the pandas DatetimeIndex and its analysis tools.

In [2]:
# Import libraries
from datetime import datetime
import numpy as np
import pandas as pd

In [3]:
my_year = 2020
my_month = 1 #Jan
my_day = 2
my_hour = 13 #24 hour format
my_min = 30
my_sec = 15

In [4]:
# datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

my_date = datetime(my_year,my_month,my_day)
my_date

datetime.datetime(2020, 1, 2, 0, 0)

If we omit the hour, minute and second, Python sets them to zero by default.
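A quick sanity check of that default, using only the standard library. The variable names here are illustrative and not part of the original notebook:

```python
from datetime import datetime

# Omitting hour, minute and second is the same as passing 0 for each
short_form = datetime(2020, 1, 2)
full_form = datetime(2020, 1, 2, 0, 0, 0)

print(short_form == full_form)                                # True
print(short_form.hour, short_form.minute, short_form.second)  # 0 0 0
```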

In [5]:
my_date_time = datetime(my_year,my_month,my_day,my_hour,my_min,my_sec)
my_date_time

datetime.datetime(2020, 1, 2, 13, 30, 15)

<h4> Attributes </h4>

In [6]:
print('Day is: {}'.format(my_date_time.day))
print('Hour is: {}'.format(my_date_time.hour))

Day is: 2
Hour is: 13
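The other components are exposed the same way. This is a small sketch using standard datetime attributes and methods; strftime is shown with a numeric format string so the output does not depend on the locale:

```python
from datetime import datetime

my_date_time = datetime(2020, 1, 2, 13, 30, 15)

# Each component is a plain integer attribute
print(my_date_time.year, my_date_time.month, my_date_time.minute, my_date_time.second)  # 2020 1 30 15

# weekday() returns 0 for Monday up to 6 for Sunday
print(my_date_time.weekday())  # 3 (2 Jan 2020 was a Thursday)

# strftime formats the datetime as a string
print(my_date_time.strftime('%Y-%m-%d %H:%M'))  # 2020-01-02 13:30
```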


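NumPy handles dates more efficiently than Python's built-in datetime objects (a datetime64 array stores each value as a 64-bit integer, so operations can be vectorized), so let us look at NumPy's datetime64 format. First, check the type of the Python object: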

In [7]:
type(my_date_time)

datetime.datetime
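As a rough illustration of the vectorized side of that claim, a Python datetime can be converted to a NumPy datetime64 scalar, and a whole datetime64 array can be shifted in one operation with a timedelta64. This is a sketch, not a benchmark; the 's' (second) unit is a choice made here for the example:

```python
from datetime import datetime
import numpy as np

my_date_time = datetime(2020, 1, 2, 13, 30, 15)

# Convert a Python datetime to a NumPy datetime64 scalar with second precision
np_dt = np.datetime64(my_date_time, 's')
print(np_dt)  # 2020-01-02T13:30:15

# Add 7 days to every element at once (no Python-level loop)
days = np.array(['2020-03-15', '2020-03-16', '2020-03-17'], dtype='datetime64[D]')
shifted = days + np.timedelta64(7, 'D')
print(shifted)  # ['2020-03-22' '2020-03-23' '2020-03-24']
```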

<h4> Numpy arrays </h4>

In [8]:
np.array(['2020-03-15','2020-03-16','2020-03-17']) #as normal strings

array(['2020-03-15', '2020-03-16', '2020-03-17'], dtype='<U10')

In [9]:
np.array(['2020-03-15','2020-03-16','2020-03-17'],dtype='datetime64') # Now datetime objects

array(['2020-03-15', '2020-03-16', '2020-03-17'], dtype='datetime64[D]')

Here [D] means NumPy has applied day-level precision. We can change this to year, month, hour, minute or second precision as well.

In [10]:
np.array(['2020-03-15','2020-03-16','2020-03-17'],dtype='datetime64[Y]')

array(['2020', '2020', '2020'], dtype='datetime64[Y]')

In [11]:
np.array(['2020-03-15','2020-03-16','2020-03-17'],dtype='datetime64[M]')

array(['2020-03', '2020-03', '2020-03'], dtype='datetime64[M]')

In [12]:
np.array(['2020-03-15','2020-03-16','2020-03-17'],dtype='datetime64[h]')

array(['2020-03-15T00', '2020-03-16T00', '2020-03-17T00'],
      dtype='datetime64[h]')
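Precision can also be changed on an existing array with astype, and subtracting datetime64 values gives a timedelta64 in the array's unit. A minimal sketch using the same three dates:

```python
import numpy as np

days = np.array(['2020-03-15', '2020-03-16', '2020-03-17'], dtype='datetime64[D]')

# astype converts an existing array to a coarser (or finer) unit
monthly = days.astype('datetime64[M]')
print(monthly)  # ['2020-03' '2020-03' '2020-03']

# The difference between two datetime64 values is a timedelta64
span = days[-1] - days[0]
print(span)  # 2 days
```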

<h4> Numpy generating dates </h4>

In [13]:
np.arange(0,10,2) #to produce an array of numbers between 0-10 with a step of 2

array([0, 2, 4, 6, 8])

In [14]:
# generate dates from 1 June 2018 up to (but not including) 23 June 2018, with a step of 7 days
np.arange('2018-06-01','2018-06-23',7,dtype='datetime64[D]')

array(['2018-06-01', '2018-06-08', '2018-06-15', '2018-06-22'],
      dtype='datetime64[D]')
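As with integer np.arange, the stop value is exclusive. The sketch below moves the stop date back by one day to show that '2018-06-22' then drops out, and also builds a month-level range:

```python
import numpy as np

# '2018-06-22' is included only because it falls before the stop date '2018-06-23'
weekly = np.arange('2018-06-01', '2018-06-23', 7, dtype='datetime64[D]')
print(weekly[-1])  # 2018-06-22

# With stop = '2018-06-22' that date is excluded
shorter = np.arange('2018-06-01', '2018-06-22', 7, dtype='datetime64[D]')
print(shorter)  # ['2018-06-01' '2018-06-08' '2018-06-15']

# Month-level ranges follow the same rule
months = np.arange('2020-01', '2020-04', dtype='datetime64[M]')
print(months)  # ['2020-01' '2020-02' '2020-03']
```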

In [15]:
np.arange('1968','1976',dtype='datetime64[Y]') # default step is 1 unit (here 1 year); the stop year 1976 is excluded

array(['1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975'],
      dtype='datetime64[Y]')

<h4> Datetime using pandas </h4>

In [16]:
pd.date_range(start='2020-01-01',periods=7,freq='D') #D stands for day

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07'],
              dtype='datetime64[ns]', freq='D')

pandas has a specialized DatetimeIndex with nanosecond precision (dtype datetime64[ns]). pandas is also good at inferring dates from strings written in many different formats:

In [17]:
pd.date_range(start='Jan 01, 2018',periods=7,freq='D')

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07'],
              dtype='datetime64[ns]', freq='D')
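pd.date_range also accepts an end date instead of periods, and other frequency aliases. This sketch uses 'D' (calendar day) and 'B' (business day, Monday to Friday); 4 and 5 January 2020 fall on a weekend, so they are skipped:

```python
import pandas as pd

# start + end instead of start + periods; both endpoints are included
full = pd.date_range(start='2020-01-01', end='2020-01-05', freq='D')
print(len(full))  # 5

# 'B' = business days only
bdays = pd.date_range(start='2020-01-01', end='2020-01-07', freq='B')
print(list(bdays.strftime('%Y-%m-%d')))
# ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07']
```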

<h4> Specific datetime format using pandas </h4>
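    - to_datetime converts strings to datetimes. By default (format=None) pandas infers the layout and displays the result as YYYY-MM-DD; pass the format parameter when the input is ambiguous (e.g. DD/MM/YYYY) or uses non-standard separators.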

In [18]:
pd.to_datetime(['1/2/2018','Jan 03, 2018']) # '1/2/2018' is read as US-style MM/DD/YYYY -> 2 Jan and 3 Jan 2018

DatetimeIndex(['2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq=None)

In [19]:
pd.to_datetime(['2/1/2018','3/1/2018']) # meant as European DD/MM/YYYY (2 Jan, 3 Jan), but pandas reads MM/DD/YYYY by default -> 1 Feb and 1 Mar 2018

DatetimeIndex(['2018-02-01', '2018-03-01'], dtype='datetime64[ns]', freq=None)

In [20]:
pd.to_datetime(['2/1/2018','3/1/2018'],format='%d/%m/%Y') # explicit DD/MM/YYYY format -> 2 Jan and 3 Jan 2018

DatetimeIndex(['2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq=None)

In [21]:
pd.to_datetime(['2--1--2018','3--1--2018'],format='%d--%m--%Y')

DatetimeIndex(['2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq=None)
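Two related to_datetime parameters are worth knowing. dayfirst=True asks pandas to prefer DD/MM ordering (the pandas docs describe it as a preference, not a strict rule, so an explicit format remains the stricter choice), and errors='coerce' turns unparseable strings into NaT instead of raising. A short sketch:

```python
import pandas as pd

# Prefer day-first parsing for ambiguous dates
eu = pd.to_datetime(['2/1/2018', '3/1/2018'], dayfirst=True)
print(eu)  # 2 Jan and 3 Jan 2018

# Strings that do not match the format become NaT (missing timestamp)
mixed = pd.to_datetime(['2/1/2018', 'not a date'], format='%d/%m/%Y', errors='coerce')
print(mixed)  # 2018-01-02 and NaT
```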

<h4> Datetime analysis with pandas </h4>

In [22]:
data = np.random.randn(3,2)
cols = ['A','B']
print(data)

[[ 0.6214551  -1.36149821]
 [-0.50090468  0.23182034]
 [ 0.1677308  -2.72586462]]


In [23]:
idx = pd.date_range('2020-01-01',periods=3,freq='D') # 3 consecutive days, one per row of the data

df = pd.DataFrame(data,index=idx,columns=cols)
df

                   A         B
2020-01-01  0.621455 -1.361498
2020-01-02 -0.500905  0.231820
2020-01-03  0.167731 -2.725865
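With a DatetimeIndex, rows can be selected by date strings. This sketch rebuilds a DataFrame of the same shape; it uses a seeded generator (np.random.default_rng(0), an assumption for reproducibility), so its values differ from the random numbers shown above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range('2020-01-01', periods=3, freq='D')
df = pd.DataFrame(rng.standard_normal((3, 2)), index=idx, columns=['A', 'B'])

# A date string selects the matching row by label
print(df.loc['2020-01-02'])

# Label slices with date strings include both endpoints
print(df.loc['2020-01-01':'2020-01-02'])
```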


In [24]:
df.index

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq='D')
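A DatetimeIndex also exposes date components as vectorized attributes. A minimal sketch on the same three-day index (day_name() returns English names by default):

```python
import pandas as pd

idx = pd.date_range('2020-01-01', periods=3, freq='D')

# Component attributes return one value per timestamp
print(list(idx.year))  # [2020, 2020, 2020]
print(list(idx.day))   # [1, 2, 3]

# Weekday name of each timestamp
print(list(idx.day_name()))  # ['Wednesday', 'Thursday', 'Friday']
```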

In [25]:
print('Latest Date: {}'.format(df.index.max()))# most recent date
print('Location of Latest Date: {}'.format(df.index.argmax())) #argmax for latest date location

Latest Date: 2020-01-03 00:00:00
Location of Latest Date: 2


In [26]:
print('Oldest Date: {}'.format(df.index.min())) # earliest date
print('Location of Oldest Date: {}'.format(df.index.argmin())) # argmin gives the position of the earliest date

Oldest Date: 2020-01-01 00:00:00
Location of Oldest Date: 0
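Note that index.max()/argmax() look at the dates themselves. To find the date on which a column reaches its largest or smallest value, Series.idxmax()/idxmin() return the index label instead. The values below are hypothetical, chosen only so the result is easy to check:

```python
import pandas as pd

# Hypothetical data for illustration (not the notebook's random values)
idx = pd.date_range('2020-01-01', periods=3, freq='D')
df = pd.DataFrame({'A': [0.5, -1.0, 2.0], 'B': [3.0, -2.0, 0.0]}, index=idx)

# Date label where each column peaks / bottoms out
print(df['A'].idxmax())  # 2020-01-03 00:00:00
print(df['B'].idxmin())  # 2020-01-02 00:00:00
```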
