# Time features from the datetime variable

Time series data are, by definition, indexed by time. Each timestamp carries both date and time information, and we can extract a number of features from it.

In this notebook, we will see how we can easily derive many time-related features.


## Features from the time part

Below are some of the features that we can extract off-the-shelf using [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#time-date-components):

- pandas.Series.dt.hour
- pandas.Series.dt.minute
- pandas.Series.dt.second
- pandas.Series.dt.microsecond
- pandas.Series.dt.nanosecond
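
As a quick illustration of the attributes listed above, the sketch below applies them to a small hand-made series. The series `ts` and its two timestamps are made up for this example and are not taken from the retail dataset.

```python
import pandas as pd

# Hypothetical toy timestamps (not from the dataset) with sub-second precision.
ts = pd.Series(
    pd.to_datetime(
        ["2021-03-05 14:07:31.250000500", "2021-03-05 23:59:59.000001002"]
    )
)

# Each attribute returns one integer per timestamp.
parts = pd.DataFrame(
    {
        "hour": ts.dt.hour,
        "minute": ts.dt.minute,
        "second": ts.dt.second,
        "microsecond": ts.dt.microsecond,
        "nanosecond": ts.dt.nanosecond,
    }
)

print(parts)
```

Note that `microsecond` and `nanosecond` hold only their own part of the fractional second: the first timestamp has 250000 microseconds and 500 nanoseconds.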


## The dataset

We will use the Online Retail II Data Set available in the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/machine-learning-databases/00502/).

Download the xlsx file from the link above and save it in the **Datasets** folder within this repo.

**Citation**:

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

## In this demo

We will extract different time-related features from the datetime variable **InvoiceDate**.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Load the data

In [2]:
# File path:
file = "../Datasets/online_retail_II.xlsx"

# The data is provided as two sheets in a single Excel file.
# Each sheet contains a different time period.
# Load both and join them into a single dataframe
# as shown below:

df_1 = pd.read_excel(file, sheet_name="Year 2009-2010")
df_2 = pd.read_excel(file, sheet_name="Year 2010-2011")

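# ignore_index=True gives the combined dataframe a fresh 0..n-1 index
# instead of repeating the row labels of each sheet.
data = pd.concat([df_1, df_2], ignore_index=True)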

print(data.shape)

data.head()

(1067371, 8)


Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom


In this dataset, the datetime variable is in a column called InvoiceDate. It could also be in the dataframe index. The procedure for extracting the date and time features is essentially the same: for a column we use the pandas `.dt` accessor, as shown below, whereas for a `DatetimeIndex` the same attributes (for example, `data.index.hour`) are available directly on the index.
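
A minimal sketch comparing the two cases, using a made-up two-row dataframe called `toy` (not the retail data): when the datetime lives in a column we go through `.dt`, and when it lives in the index the attribute is read from the index itself.

```python
import pandas as pd

# Hypothetical toy data (not the retail dataset).
toy = pd.DataFrame(
    {
        "InvoiceDate": pd.to_datetime(
            ["2009-12-01 07:45:00", "2009-12-01 13:20:30"]
        ),
        "Quantity": [12, 48],
    }
)

# Datetime stored in a column: use the .dt accessor.
hour_from_column = toy["InvoiceDate"].dt.hour

# Datetime stored in the index: the attribute is available
# directly on the DatetimeIndex, without .dt.
toy_indexed = toy.set_index("InvoiceDate")
hour_from_index = toy_indexed.index.hour

print(hour_from_column.tolist())
print(hour_from_index.tolist())
```

Both calls return the same hours, so the features derived later in this notebook do not depend on where the datetime variable is stored.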

The dataset contains sales information for different customers in different countries. Customers may have made one or multiple purchases from the business that provided the data.

## Variable format

In [3]:
# Let's determine the type of data in the datetime variable.

data["InvoiceDate"].dtypes

dtype('<M8[ns]')

In this dataset, the variable is already parsed as datetime data.

In some datasets, the datetime variable is stored with dtype object, that is, as strings. In those cases, we should parse it into datetime before carrying on with the rest of the notebook, as we do in the following cell (here the values do not change, because they are already datetimes):

In [4]:
# This is how we parse date strings into datetime format.

data["date"] = pd.to_datetime(data["InvoiceDate"])

data[["date", "InvoiceDate"]].head()

Unnamed: 0,date,InvoiceDate
0,2009-12-01 07:45:00,2009-12-01 07:45:00
1,2009-12-01 07:45:00,2009-12-01 07:45:00
2,2009-12-01 07:45:00,2009-12-01 07:45:00
3,2009-12-01 07:45:00,2009-12-01 07:45:00
4,2009-12-01 07:45:00,2009-12-01 07:45:00
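
Because InvoiceDate is already datetime, the cell above does not change the values. To show what parsing looks like when the dates really arrive as text, here is a small sketch on a made-up series `raw`. The day/month/year layout and the invalid entry are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical raw values where the dates arrive as strings.
raw = pd.Series(["01/12/2009 07:45", "01/12/2009 13:20", "not a date"])

# An explicit format removes the day/month ambiguity;
# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")

print(parsed)
```

After parsing, the `.dt` attributes used in the rest of this notebook work on `parsed`, and the invalid entry shows up as a missing value that we can handle explicitly.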


## Extract the time part

In [5]:
# Extract time part.

# (We would normally not use this as a predictive feature,
# but it might be handy for data analysis).

data["time_part"] = data["date"].dt.time

data["time_part"].head()

0    07:45:00
1    07:45:00
2    07:45:00
3    07:45:00
4    07:45:00
Name: time_part, dtype: object

### Extract the hour, minute and second

In [6]:
data["hour"] = data["date"].dt.hour
data["min"] = data["date"].dt.minute
data["sec"] = data["date"].dt.second

# We do not have microseconds or nanoseconds in this dataset,
# but if we did, we could extract them as follows:

data["microsec"] = data["date"].dt.microsecond
data["nanosec"] = data["date"].dt.nanosecond

data.head()

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,date,time_part,hour,min,sec,microsec,nanosec
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0


### Extract the hour, minute and second at the same time

In [7]:
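# Now, let's repeat what we did in the previous cell in a single command.
# We pass index=data.index so that the new columns line up row by row
# with data, even if its index contains repeated labels.

data[["h", "m", "s"]] = pd.DataFrame(
    [(x.hour, x.minute, x.second) for x in data["date"]],
    index=data.index,
)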

data.head()

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,date,time_part,hour,min,sec,microsec,nanosec,h,m,s
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0,7,45,0
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0,7,45,0
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0,7,45,0
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0,7,45,0
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom,2009-12-01 07:45:00,07:45:00,7,45,0,0,0,7,45,0
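
The list comprehension above visits every timestamp in a Python loop, which can be slow on more than a million rows. As an alternative sketch (not the approach used in the cell above), the same three columns can be created in one statement from the vectorised `.dt` attributes. The dataframe `toy` is made up for illustration; the column names `h`, `m` and `s` mirror the cell above.

```python
import pandas as pd

# Hypothetical toy data with repeated index labels,
# as can happen after pd.concat without ignore_index=True.
toy = pd.DataFrame(
    {"date": pd.to_datetime(["2009-12-01 07:45:00", "2010-12-01 08:26:15"])},
    index=[0, 0],
)

# One statement, no Python-level loop: each keyword becomes a new column.
toy = toy.assign(
    h=toy["date"].dt.hour,
    m=toy["date"].dt.minute,
    s=toy["date"].dt.second,
)

print(toy)
```

Since each new column is derived from `toy["date"]` itself, it shares the dataframe's index and the values stay on the correct rows.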


## Work with different timezones

In the next few cells, we will see how to work with timestamps that are in different time zones.

In [8]:
# First, let's create a toy dataframe with some timestamps in different time zones.

df = pd.DataFrame()

df["time"] = pd.concat(
    [
        pd.Series(
            pd.date_range(
                start="2014-08-01 09:00", freq="H", periods=3, tz="Europe/Berlin"
            )
        ),
        pd.Series(
            pd.date_range(
                start="2014-08-01 09:00", freq="H", periods=3, tz="US/Central"
            )
        ),
    ],
    axis=0,
)

df

Unnamed: 0,time
0,2014-08-01 09:00:00+02:00
1,2014-08-01 10:00:00+02:00
2,2014-08-01 11:00:00+02:00
0,2014-08-01 09:00:00-05:00
1,2014-08-01 10:00:00-05:00
2,2014-08-01 11:00:00-05:00


We can see the different time zones in the +02:00 and -05:00 offsets, which are expressed relative to Coordinated Universal Time (UTC).
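
One practical consequence of mixing time zones in a single column is that pandas stores the values with `object` dtype rather than a datetime dtype, so the `.dt` accessor is not available until all timestamps share one time zone. The sketch below checks this on a made-up two-row series `mixed`. It uses "America/Chicago", the zone that "US/Central" is an alias for.

```python
import pandas as pd

# Hypothetical toy timestamps: same wall-clock time, two different zones.
berlin = pd.Series(pd.to_datetime(["2014-08-01 09:00"])).dt.tz_localize(
    "Europe/Berlin"
)
chicago = pd.Series(pd.to_datetime(["2014-08-01 09:00"])).dt.tz_localize(
    "America/Chicago"
)

mixed = pd.concat([berlin, chicago], ignore_index=True)
print(mixed.dtype)

# The .dt accessor needs a datetime dtype, so it fails on the mixed column.
try:
    mixed.dt.hour
    dt_available = True
except AttributeError:
    dt_available = False
print(dt_available)

# After converting everything to UTC, the column is datetime again.
unified = pd.to_datetime(mixed, utc=True)
print(unified.dt.hour.tolist())
```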

In [9]:
# To work with different time zones, we first convert all
# timestamps to UTC by setting utc=True.

df["time_utc"] = pd.to_datetime(df["time"], utc=True)

# Next, we change all timestamps to the desired timezone,
# e.g., Europe/London, as in this example.

df["time_london"] = df["time_utc"].dt.tz_convert("Europe/London")


df

Unnamed: 0,time,time_utc,time_london
0,2014-08-01 09:00:00+02:00,2014-08-01 07:00:00+00:00,2014-08-01 08:00:00+01:00
1,2014-08-01 10:00:00+02:00,2014-08-01 08:00:00+00:00,2014-08-01 09:00:00+01:00
2,2014-08-01 11:00:00+02:00,2014-08-01 09:00:00+00:00,2014-08-01 10:00:00+01:00
0,2014-08-01 09:00:00-05:00,2014-08-01 14:00:00+00:00,2014-08-01 15:00:00+01:00
1,2014-08-01 10:00:00-05:00,2014-08-01 15:00:00+00:00,2014-08-01 16:00:00+01:00
2,2014-08-01 11:00:00-05:00,2014-08-01 16:00:00+00:00,2014-08-01 17:00:00+01:00


Whether to unify the timezone depends on the use case. If we are forecasting sales for different countries, perhaps it is better to keep each country's respective time zone, since we will treat those series independently.
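
If we do keep each country's own time zone, one way to derive a local-hour feature is to convert the UTC timestamps group by group. This is only a sketch under stated assumptions: the dataframe `sales`, its `tz` column and the `local_hour` feature are hypothetical, and in practice the mapping from country to time zone would have to come from somewhere else.

```python
import pandas as pd

# Hypothetical data: UTC timestamps plus the time zone of each sale.
sales = pd.DataFrame(
    {
        "time_utc": pd.to_datetime(
            ["2014-08-01 07:00", "2014-08-01 14:00"], utc=True
        ),
        "tz": ["Europe/Berlin", "America/Chicago"],
    }
)

# Convert each group of rows to its own time zone and keep the local hour.
sales["local_hour"] = -1
for tz in sales["tz"].unique():
    mask = sales["tz"] == tz
    sales.loc[mask, "local_hour"] = (
        sales.loc[mask, "time_utc"].dt.tz_convert(tz).dt.hour
    )

print(sales)
```

Both sales happened at 9 in the morning local time, even though their UTC hours are 7 and 14. Which of the two is the more useful feature depends on the use case.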

If we have a small company that sells mostly in its home country and only occasionally abroad, the timestamps are probably already in the local time zone. If they are not, we may want to localize them to our time zone.
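
To make the last point concrete, here is a minimal sketch of attaching a time zone to naive timestamps with `tz_localize`, in contrast to `tz_convert`, which shifts timestamps that are already time-zone aware. The series `naive` and the choice of Europe/London as "our" time zone are assumptions for illustration.

```python
import pandas as pd

# Hypothetical naive timestamps (no time zone attached).
naive = pd.Series(pd.to_datetime(["2014-08-01 09:00", "2014-08-01 10:00"]))

# tz_localize keeps the wall-clock time and attaches the time zone.
local = naive.dt.tz_localize("Europe/London")

# tz_convert keeps the instant and changes the wall-clock representation.
berlin = local.dt.tz_convert("Europe/Berlin")

print(local.dt.hour.tolist())
print(berlin.dt.hour.tolist())
```

The localized series keeps hours 9 and 10, while the converted series shows 10 and 11, because Berlin is one hour ahead of London on that date.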