# Time Series
> Dates written in different formats can be standardized.  
> ```datetime``` and ```dateutil``` are important Python modules that convert a **string** into a datetime object.  

In [1]:
from datetime import datetime
d1 = datetime(year=2019, month=7, day=4)

print(type(d1))
print('----------------')

print(d1)
print('----------------')

print(d1.day)

<class 'datetime.datetime'>
----------------
2019-07-04 00:00:00
----------------
4


In [2]:
from dateutil import parser
d2 = parser.parse("4th of July, 2019")

print(type(d2))
print('----------------')

print(d2)
print('----------------')

print(d2.month)

<class 'datetime.datetime'>
----------------
2019-07-04 00:00:00
----------------
7
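
When the format of the incoming string is known in advance, it can be stated explicitly instead of letting a parser guess it. The sketch below uses **datetime.strptime()** from the standard library; the sample string and the name `d_explicit` are illustrative assumptions, not part of the notebook above.

```python
from datetime import datetime

# "04/07/2019" is ambiguous: 4 July (day first) or 7 April (month first)?
# An explicit format string removes the guesswork.
d_explicit = datetime.strptime("04/07/2019", "%d/%m/%Y")  # read as day/month/year

print(type(d_explicit))
print(d_explicit)
print(d_explicit.month)
```

**strptime()** raises a `ValueError` if the string does not match the format, which is often useful for catching malformed input early.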


## More Time Formatting/Current date and time
> ```%Y```, ```%m```, ```%d``` etc. are format codes.  
> **strftime()** takes a format string made of one or more format codes and returns the date and time formatted accordingly.  

In [3]:
from datetime import datetime
now = datetime.now() 
now

datetime.datetime(2019, 10, 21, 16, 15, 8, 159858)

In [4]:
year = now.strftime("%Y")
print("year:", year)
print('----------------')

month = now.strftime("%m")
print("month:", month)
print('----------------')

day = now.strftime("%d")
print("day:", day)
print('----------------')

time = now.strftime("%H:%M:%S")
print("time:", time)
print('----------------')

date_time = now.strftime("%m/%d/%Y, %H:%M:%S")
print("date and time:",date_time)

year: 2019
----------------
month: 10
----------------
day: 21
----------------
time: 16:15:08
----------------
date and time: 10/21/2019, 16:15:08
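
Because **datetime.now()** changes on every run, a fixed timestamp makes the format codes easier to check. This is a small sketch using the value shown in the output above (without microseconds); the variable names are illustrative.

```python
from datetime import datetime

# A fixed timestamp so the results are reproducible
fixed = datetime(2019, 10, 21, 16, 15, 8)

short_year = fixed.strftime("%y")          # two-digit year
day_of_year = fixed.strftime("%j")         # day of the year, zero-padded to 3 digits
stamp = fixed.strftime("%Y-%m-%d %H:%M")   # several codes combined in one string
iso = fixed.isoformat()                    # ISO 8601 string, no format codes needed

print(short_year, day_of_year, stamp, iso, sep="\n")
```

Only numeric codes are used here, since codes such as ```%A``` or ```%B``` return names that depend on the locale.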


## Convert Date into Numpy
> ```np.datetime64``` is a NumPy ```dtype```.  
> **np.datetime64()** can also be called directly to convert a string to a *np.datetime64* value.  
> Adding $+1$, $+2$, ..., $+p$ adds 1 day, 2 days, ..., $p$ days to a NumPy date with day resolution.   
> **Reference** (for the **strftime()** format codes): https://www.programiz.com/python-programming/datetime/strftime

In [5]:
import numpy as np
date = np.array('2019-07-04', dtype=np.datetime64)
date

array('2019-07-04', dtype='datetime64[D]')

In [6]:
date + np.arange(12)

array(['2019-07-04', '2019-07-05', '2019-07-06', '2019-07-07',
       '2019-07-08', '2019-07-09', '2019-07-10', '2019-07-11',
       '2019-07-12', '2019-07-13', '2019-07-14', '2019-07-15'],
      dtype='datetime64[D]')

In [7]:
np.datetime64('2019-07-04')

numpy.datetime64('2019-07-04')

In [8]:
np.datetime64('2019-07-04 23:00')

numpy.datetime64('2019-07-04T23:00')
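
The integer addition above works because NumPy treats the integer as a number of steps in the date's own unit. A minimal sketch of the same idea with explicit **np.timedelta64** values and a month-resolution date follows; the variable names are illustrative.

```python
import numpy as np

day = np.datetime64('2019-07-04')            # unit is inferred as days

three_days_later = day + np.timedelta64(3, 'D')
gap = np.datetime64('2019-08-04') - day      # the difference is a timedelta64

month = np.datetime64('2019-07')             # unit is inferred as months
next_month = month + 1                       # +1 now means one month, not one day

print(three_days_later)
print(gap)
print(next_month)
```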

## Convert Date to Pandas
> **pd.to_datetime** converts a string to a **Timestamp**, and a list of dates in various formats to a **DatetimeIndex**.   
> A sequence of dates can also be generated by adding **pd.to_timedelta** offsets to a date.  

In [9]:
import pandas as pd

data = pd.read_csv("todatetime.csv") 
data.head()

|   | Date | Time |
|---|------|------|
| 0 | 8/6/1993 | 12:42 PM |
| 1 | 3/31/1996 | 6:53 AM |
| 2 | 4/23/1993 | 11:17 AM |
| 3 | 3/4/2005 | 1:00 PM |
| 4 | 1/24/1998 | 4:47 PM |


In [10]:
# Overwriting data with new format 
data['Date']= pd.to_datetime(data['Date']) 
data.head()

|   | Date | Time |
|---|------|------|
| 0 | 1993-08-06 | 12:42 PM |
| 1 | 1996-03-31 | 6:53 AM |
| 2 | 1993-04-23 | 11:17 AM |
| 3 | 2005-03-04 | 1:00 PM |
| 4 | 1998-01-24 | 4:47 PM |


In [11]:
data.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
Date    1000 non-null datetime64[ns]
Time    1000 non-null object
dtypes: datetime64[ns](1), object(1)
memory usage: 15.7+ KB


In [12]:
date = pd.to_datetime("1st of July, 2019")
type(date)

pandas._libs.tslibs.timestamps.Timestamp

In [13]:
date.strftime('%A')

'Monday'

In [14]:
date + pd.to_timedelta(arg=np.arange(12), unit='D')

DatetimeIndex(['2019-07-01', '2019-07-02', '2019-07-03', '2019-07-04',
               '2019-07-05', '2019-07-06', '2019-07-07', '2019-07-08',
               '2019-07-09', '2019-07-10', '2019-07-11', '2019-07-12'],
              dtype='datetime64[ns]', freq=None)

In [15]:
dates = pd.to_datetime([datetime(2019, 7, 3), '4th of July, 2019',
                        '2019-Jul-6', '07-07-2019', '20190708'])
print( dates )
print('----------------')

print( dates.strftime('%d') )
print('----------------')

print( dates[0] )

DatetimeIndex(['2019-07-03', '2019-07-04', '2019-07-06', '2019-07-07',
               '2019-07-08'],
              dtype='datetime64[ns]', freq=None)
----------------
Index(['03', '04', '06', '07', '08'], dtype='object')
----------------
2019-07-03 00:00:00


In [16]:
d3 = dates.to_period('D')

print( d3 )
print('----------------')

print( d3[0] )

PeriodIndex(['2019-07-03', '2019-07-04', '2019-07-06', '2019-07-07',
             '2019-07-08'],
            dtype='period[D]', freq='D')
----------------
2019-07-03


In [17]:
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
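
When all strings share one known layout, **pd.to_datetime** also accepts an explicit `format`, and `errors='coerce'` turns unparseable entries into `NaT` instead of raising. The sketch below assumes a small made-up Series named `raw` with day-first dates; it is not taken from `todatetime.csv`.

```python
import pandas as pd

# Hypothetical day-first strings plus one bad entry
raw = pd.Series(['04/07/2019', '05/07/2019', 'not a date'])

# format states the layout explicitly; errors='coerce' maps failures to NaT
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')

print(parsed)
print(parsed.isna().sum(), "entry could not be parsed")
```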

## Activity 1
> The **year, month, day, hour, minute, and second** are specified.  
> Use **datetime()** to do the following:  
> Create a variable **my_date** that reports **year, month, day**.  
> Create a variable **my_date_time** that reports  **year, month, day, hour, minute, second**.   
> Grab the **month** from **my_date**.   
> Grab the **hour** from **my_date_time**.   

In [18]:
my_year = 2019
my_month = 2
my_day = 1
my_hour = 22
my_minute = 30
my_second = 15

In [1]:
from datetime import datetime

my_year = 2019
my_month = 2
my_day = 1
my_hour = 22
my_minute = 30
my_second = 15

my_date = datetime(my_year,my_month,my_day)
print(my_date)
print(my_date.month) 

my_date_time = datetime(my_year,my_month,my_day,my_hour,my_minute,my_second)
print(my_date_time)
print(my_date_time.hour)
print(my_date_time.day)

2019-02-01 00:00:00
2
2019-02-01 22:30:15
22
1


## DatetimeIndex as Pandas Index
> Pandas **DatetimeIndex()** converts a list of dates into a *date time index*.    
> Slicing can be based on part of the **Date**, such as the year.  

In [19]:
index = pd.DatetimeIndex(['2018-07-04', '2018-08-04',
                          '2019-07-04', '2019-08-04'])
data  = pd.Series([10.1, 10.5, 10.2, 10.6], index=index)

print(data)
print('----------------')
print(data.index)

2018-07-04    10.1
2018-08-04    10.5
2019-07-04    10.2
2019-08-04    10.6
dtype: float64
----------------
DatetimeIndex(['2018-07-04', '2018-08-04', '2019-07-04', '2019-08-04'], dtype='datetime64[ns]', freq=None)


### Slicing date time index (Explicit Indexing)

In [20]:
data['2018-07-04':'2019-07-04']

2018-07-04    10.1
2018-08-04    10.5
2019-07-04    10.2
dtype: float64

In [21]:
data['2019']

2019-07-04    10.2
2019-08-04    10.6
dtype: float64
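
The same partial-date selection can be written with **.loc**, which makes it explicit that the labels, not the positions, are being used. A short sketch that rebuilds the series from this section (the names `aug_2018` and `window` are illustrative):

```python
import pandas as pd

index = pd.DatetimeIndex(['2018-07-04', '2018-08-04',
                          '2019-07-04', '2019-08-04'])
data = pd.Series([10.1, 10.5, 10.2, 10.6], index=index)

# A year-month string selects every row inside that month
aug_2018 = data.loc['2018-08']

# In a slice, both endpoints are inclusive and each covers its whole month
window = data.loc['2018-08':'2019-07']

print(aug_2018)
print('----------------')
print(window)
```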

### Functions on Date Time Index
> On a date time index, **argmax()** and **argmin()** return the integer positions of the latest and earliest dates.  
> The latest and earliest dates themselves are returned by **max()** and **min()**.  

In [22]:
# Latest Date
print(data.index.argmax())
print('----------------')

print(data.index.max())
print('----------------')

# Earliest Date
print(data.index.argmin())
print('----------------')

print(data.index.min())

3
----------------
2019-08-04 00:00:00
----------------
0
----------------
2018-07-04 00:00:00
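
The calls above look at the dates in the index, not at the values of the series. To find the date on which the *values* peak, **idxmax()** and **idxmin()** on the Series can be used. A minimal sketch, rebuilding the same series (the result names are illustrative):

```python
import pandas as pd

index = pd.DatetimeIndex(['2018-07-04', '2018-08-04',
                          '2019-07-04', '2019-08-04'])
data = pd.Series([10.1, 10.5, 10.2, 10.6], index=index)

# Position of the latest date, and the date stored at that position
latest_pos = data.index.argmax()
latest_date = data.index[latest_pos]

# Dates at which the series values are largest and smallest
peak_date = data.idxmax()
low_date = data.idxmin()

print(latest_pos, latest_date)
print(peak_date, data.max())
print(low_date, data.min())
```

In this small example the largest value happens to fall on the latest date, so `peak_date` and `latest_date` coincide; in general they need not.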


## Activity 2
> Create a date variable **ind** that lists 10 days from 2019-09-01 using **to_timedelta()**.  
> Convert **ind** into a date time index and drop the HMS using **to_period** in the same line.  
> Generate 2 columns of 10 numbers drawn from $U(0,10)$ using **rand()**.   
> Combine the values using **ind** as index, and columns ```['A','B']```.  
> Print a statement: ```'On 2019-09-01, the value is 2019-09-01'```.  

In [2]:
import numpy as np
import pandas as pd
from datetime import datetime

ind = datetime(2019, 9, 1) + pd.to_timedelta(arg=np.arange(10), unit='D')
ind = pd.DatetimeIndex(ind).to_period('D')

data = 10*np.random.rand(10,2)
cols = ['A','B']

df = pd.DataFrame(data=data,index=ind,columns=cols)
df

|   | A | B |
|---|---|---|
| 2019-09-01 | 2.528834 | 0.553407 |
| 2019-09-02 | 6.215127 | 9.287794 |
| 2019-09-03 | 6.058875 | 8.816503 |
| 2019-09-04 | 0.523226 | 0.808427 |
| 2019-09-05 | 0.441774 | 5.828309 |
| 2019-09-06 | 5.717486 | 5.651766 |
| 2019-09-07 | 6.132311 | 9.463576 |
| 2019-09-08 | 5.966923 | 7.203906 |
| 2019-09-09 | 3.583831 | 8.798685 |
| 2019-09-10 | 7.625294 | 7.957557 |


In [3]:
'On {}, the value is {}'.format(df.index[df.index.argmin()], df.index.min())

'On 2019-09-01, the value is 2019-09-01'
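
As a side note, the daily index built in this activity can likely be produced in a single call with **pd.period_range**. This is an alternative sketch, not the method the activity asks for; the names `two_step` and `one_step` are illustrative.

```python
import numpy as np
import pandas as pd
from datetime import datetime

# The approach used in the activity: add offsets, then drop the time part
two_step = pd.DatetimeIndex(
    datetime(2019, 9, 1) + pd.to_timedelta(np.arange(10), unit='D')
).to_period('D')

# A single call that yields daily periods directly
one_step = pd.period_range('2019-09-01', periods=10, freq='D')

print(one_step)
print(one_step.equals(two_step))
```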

## Equal Time Intervals
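> **date_range(start,end)** produces every date from start to end; the default frequency is daily.   
> **date_range(start,periods,freq)** produces the given number of periods at frequency freq.  
> **period_range(start,periods,freq)** is similar, but produces periods instead of timestamps.  
> **timedelta_range(start,periods,freq)** produces a range of durations.  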

In [23]:
pd.date_range('2019-07-01', '2019-07-14')

DatetimeIndex(['2019-07-01', '2019-07-02', '2019-07-03', '2019-07-04',
               '2019-07-05', '2019-07-06', '2019-07-07', '2019-07-08',
               '2019-07-09', '2019-07-10', '2019-07-11', '2019-07-12',
               '2019-07-13', '2019-07-14'],
              dtype='datetime64[ns]', freq='D')

In [24]:
pd.date_range('2019-07-01', periods=14)

DatetimeIndex(['2019-07-01', '2019-07-02', '2019-07-03', '2019-07-04',
               '2019-07-05', '2019-07-06', '2019-07-07', '2019-07-08',
               '2019-07-09', '2019-07-10', '2019-07-11', '2019-07-12',
               '2019-07-13', '2019-07-14'],
              dtype='datetime64[ns]', freq='D')

In [25]:
pd.date_range('2019-07-01', periods=14, freq='H')

DatetimeIndex(['2019-07-01 00:00:00', '2019-07-01 01:00:00',
               '2019-07-01 02:00:00', '2019-07-01 03:00:00',
               '2019-07-01 04:00:00', '2019-07-01 05:00:00',
               '2019-07-01 06:00:00', '2019-07-01 07:00:00',
               '2019-07-01 08:00:00', '2019-07-01 09:00:00',
               '2019-07-01 10:00:00', '2019-07-01 11:00:00',
               '2019-07-01 12:00:00', '2019-07-01 13:00:00'],
              dtype='datetime64[ns]', freq='H')

In [26]:
pd.period_range('2018-9', periods=12, freq='M')

PeriodIndex(['2018-09', '2018-10', '2018-11', '2018-12', '2019-01', '2019-02',
             '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08'],
            dtype='period[M]', freq='M')

In [27]:
pd.timedelta_range(0, periods=12, freq='H')

TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00',
                '10:00:00', '11:00:00'],
               dtype='timedelta64[ns]', freq='H')

In [28]:
pd.timedelta_range(0, periods=6, freq="2H30T")

TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                '12:30:00'],
               dtype='timedelta64[ns]', freq='150T')

In [29]:
from pandas.tseries.offsets import BDay
pd.date_range('2019-07-01', periods=5, freq=BDay())

DatetimeIndex(['2019-07-01', '2019-07-02', '2019-07-03', '2019-07-04',
               '2019-07-05'],
              dtype='datetime64[ns]', freq='B')
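
Two further variations may help clarify the functions in this section: giving **date_range** a start, an end and a number of periods spaces the points evenly, and a business-day frequency skips weekends when the range crosses one. The sketch below assumes the string alias ```'B'``` as a shorthand for **BDay()**; the variable names are illustrative.

```python
import pandas as pd

# start + end + periods: three evenly spaced timestamps across the week
even = pd.date_range('2019-07-01', '2019-07-07', periods=3)

# Business days starting on Friday 2019-07-05: Saturday and Sunday are skipped
bdays = pd.date_range('2019-07-05', periods=3, freq='B')

print(even)
print(bdays)
```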