In [1]:
# Styling notebook
from IPython.display import HTML

def css_styling():
    # Read the custom stylesheet and return it as renderable HTML.
    with open("rise.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()

<div style="font-size:2em; text-align:center; margin-top:30px; margin-bottom:20px">Data Science Academy 7</div>
<hr>
<br>

<div style="font-size:4em; text-align:center; margin-bottom:30px; color:#00746E"><b>Time Series Data</b></div>

Python offers several libraries for converting datatypes. The example below uses pandas to <b>convert a column to datetime format</b>. First, import the data from a CSV file and check the datatypes.

In [2]:
import pandas as pd

In [4]:
csv_input = pd.read_csv("../input/sample_data.csv")
csv_input

Unnamed: 0,time,variable,duration
0,1/20/2021,1.11,10
1,2/20/2021,2.22,20
2,3/20/2021,3.33,30
3,4/20/2021,4.44,40
4,5/20/2021,5.55,50
5,6/20/2021,6.66,60
6,7/20/2021,7.77,70
7,8/20/2021,8.88,80
8,9/20/2021,9.99,90
9,10/20/2021,10.1,100


In [5]:
csv_input.dtypes

time         object
variable    float64
duration      int64
dtype: object

As shown above, the column 'time' is recognized not as datetime but as object (string). The function pd.to_datetime converts it to datetime format, as shown below. Check the datatype again after the conversion.

In [6]:
csv_input['time'] = pd.to_datetime(csv_input['time'], format = "%m/%d/%Y")
csv_input

Unnamed: 0,time,variable,duration
0,2021-01-20,1.11,10
1,2021-02-20,2.22,20
2,2021-03-20,3.33,30
3,2021-04-20,4.44,40
4,2021-05-20,5.55,50
5,2021-06-20,6.66,60
6,2021-07-20,7.77,70
7,2021-08-20,8.88,80
8,2021-09-20,9.99,90
9,2021-10-20,10.1,100


In [7]:
csv_input.dtypes

time        datetime64[ns]
variable           float64
duration             int64
dtype: object
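Real files often contain entries that do not match the expected format. The sketch below is one possible way to handle them, using the `errors="coerce"` option of `pd.to_datetime`. The series `raw` and its values are made up for illustration and are not part of sample_data.csv.

```python
import pandas as pd

# Hypothetical values; the last entry is deliberately malformed.
raw = pd.Series(["1/20/2021", "2/20/2021", "not a date"])

# errors="coerce" replaces unparseable entries with NaT instead of raising.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

print(parsed)
print("Entries that failed to parse:", parsed.isna().sum())
```

Counting the NaT values after the conversion is a quick way to see how many rows need cleaning before any further analysis.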

Pandas also provides a function to convert a <b>datatype to duration</b>: `pd.to_timedelta`, which takes the unit of time that the input values are expressed in. Try it out on the column 'duration' and check the datatype again.
More info: type `help(pd.to_datetime)`

In [9]:
# Example
help(pd.to_datetime)

Help on function to_datetime in module pandas.core.tools.datetimes:

to_datetime(arg: Union[~DatetimeScalar, List, Tuple, ~ArrayLike, ForwardRef('Series')], errors: str = 'raise', dayfirst: bool = False, yearfirst: bool = False, utc: Union[bool, NoneType] = None, format: Union[str, NoneType] = None, exact: bool = True, unit: Union[str, NoneType] = None, infer_datetime_format: bool = False, origin='unix', cache: bool = True) -> Union[pandas.core.indexes.datetimes.DatetimeIndex, ForwardRef('Series'), ~DatetimeScalar, ForwardRef('NaTType')]
    Convert argument to datetime.
    
    Parameters
    ----------
    arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
        The object to convert to a datetime.
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception.
        - If 'coerce', then invalid parsing will be set as NaT.
        - If 'ignore', then invalid parsing will return the i

In [6]:
csv_input['duration'] = pd.to_timedelta(csv_input['duration'], unit="m")  # unit "m": the input values are minutes
csv_input

Unnamed: 0,time,variable,duration
0,2021-01-20,1.11,00:10:00
1,2021-02-20,2.22,00:20:00
2,2021-03-20,3.33,00:30:00
3,2021-04-20,4.44,00:40:00
4,2021-05-20,5.55,00:50:00
5,2021-06-20,6.66,01:00:00
6,2021-07-20,7.77,01:10:00
7,2021-08-20,8.88,01:20:00
8,2021-09-20,9.99,01:30:00
9,2021-10-20,10.1,01:40:00


In [7]:
csv_input.dtypes

time         datetime64[ns]
variable            float64
duration    timedelta64[ns]
dtype: object
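Once both columns have the right datatypes, they can be combined arithmetically. The following is a minimal sketch on a made-up two-row frame that mirrors the 'time' and 'duration' columns above. The column names `end_time` and `duration_min` are illustrative and do not appear in the original data.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the columns used above.
df = pd.DataFrame({
    "time": pd.to_datetime(["1/20/2021", "2/20/2021"], format="%m/%d/%Y"),
    "duration": pd.to_timedelta([10, 20], unit="m"),
})

# datetime + timedelta gives a datetime (here: the moment each duration ends).
df["end_time"] = df["time"] + df["duration"]

# A timedelta column can be turned back into plain numbers, here minutes.
df["duration_min"] = df["duration"].dt.total_seconds() / 60

print(df)
```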

Using the correct datatype makes the data easier to manage, for example when sorting or filtering by date.

In [8]:
csv_input[(csv_input['time'] > '2021-03-01') & (csv_input['time'] < '2021-10-01')]

Unnamed: 0,time,variable,duration
2,2021-03-20,3.33,00:30:00
3,2021-04-20,4.44,00:40:00
4,2021-05-20,5.55,00:50:00
5,2021-06-20,6.66,01:00:00
6,2021-07-20,7.77,01:10:00
7,2021-08-20,8.88,01:20:00
8,2021-09-20,9.99,01:30:00
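Sorting benefits from the datetime datatype as well. The sketch below uses three made-up dates to compare sorting the raw strings with sorting the converted column. It also shows `Series.between` as one alternative way to write a date-range filter. Note that `between` includes both endpoints by default, unlike the strict `>` and `<` comparison used above.

```python
import pandas as pd

# Hypothetical unsorted dates, stored as strings at first.
df = pd.DataFrame({"time": ["10/20/2021", "2/20/2021", "9/20/2021"]})

# Strings sort character by character, so "10/..." lands before "2/...".
string_order = df.sort_values("time")["time"].tolist()
print(string_order)

# After conversion, sorting follows the calendar.
df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y")
date_order = df.sort_values("time")["time"].dt.strftime("%Y-%m-%d").tolist()
print(date_order)

# Date-range filter written with Series.between (both ends inclusive).
in_range = df[df["time"].between("2021-03-01", "2021-10-01")]
print(in_range)
```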


The example below extracts information from datetime data. Converting to the <b>datetime datatype</b> lets the user extract and use every component of the data.

<font color='#00a19d'> !! Note that Python is very sensitive to datatypes: each datatype has its own <i>functions and characteristics</i></font>. 

In [9]:
user_date = "11 Mar, 2021"
type(user_date)

str

In [10]:
user_date = pd.to_datetime(user_date)
user_date

Timestamp('2021-03-11 00:00:00')
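A Timestamp exposes its components as attributes. The short sketch below builds the same date from an ISO string (an assumption made only to keep the example self-contained) and reads a few of those components. The series `s` is made up to show the column-wise equivalent through the `.dt` accessor.

```python
import pandas as pd

# Same date as above, written in ISO format for this standalone example.
ts = pd.Timestamp("2021-03-11")

# Individual components are plain attributes.
print(ts.year, ts.month, ts.day)

# Helper methods derive further information from the date.
print(ts.day_name())
print(ts.strftime("%d/%m/%Y"))

# For a whole column, the same attributes are reached through .dt
s = pd.Series(pd.to_datetime(["2021-01-20", "2021-02-20"]))
months = s.dt.month.tolist()
print(months)
```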

In [11]:
user_date2 = pd.to_datetime("2021-03-09 15:30:20")
duration = user_date - user_date2
duration

Timedelta('1 days 08:29:40')

In [12]:
duration.components.hours

8

In [13]:
c = duration.components
print(f"Duration = {c.days} day(s) {c.hours} hour(s) {c.minutes} minute(s) {c.seconds} second(s)")

Duration = 1 day(s) 8 hour(s) 29 minute(s) 40 second(s)
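When a single number is more convenient than separate components, `Timedelta.total_seconds()` is one option. The sketch below recreates the same duration and compares three values that are easy to confuse: the `seconds` component, the `seconds` attribute, and the total number of seconds. The variable names are illustrative.

```python
import pandas as pd

# Same two timestamps as in the cells above.
duration = pd.Timestamp("2021-03-11") - pd.Timestamp("2021-03-09 15:30:20")

# Seconds part of the day/hour/minute/second breakdown.
component_seconds = duration.components.seconds

# Seconds within the last partial day (hours and minutes included, days excluded).
attribute_seconds = duration.seconds

# Length of the whole duration in seconds, days included.
total_seconds = duration.total_seconds()

print(component_seconds, attribute_seconds, total_seconds)
print("Total hours:", total_seconds / 3600)
```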


## END