# Periodic features

Periodic features are those that repeat their values at regular intervals, such as the hour of the day, the day of the week, and the month of the year.

With cyclical or periodic features, values that are very different in absolute magnitude can actually be close. For example, January (1) is adjacent to December (12), even though their numeric encodings are far apart.

We can use periodic functions, like sine and cosine, to transform cyclical features and help machine learning models pick up their intrinsic nature.
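To make the idea concrete, here is a minimal sketch of the transformation done by hand with NumPy. It assumes the months are encoded as 1 to 12 and that the cycle length is 12; the helper `distance` is a hypothetical name used only for this illustration.

```python
import numpy as np

# Months encoded as 1..12; we assume a cycle length (period) of 12.
months = np.arange(1, 13)
period = 12

# Map each month onto the unit circle.
month_sin = np.sin(2 * np.pi * months / period)
month_cos = np.cos(2 * np.pi * months / period)


def distance(a, b):
    """Euclidean distance between months a and b in (sin, cos) space."""
    i, j = a - 1, b - 1
    return np.hypot(month_sin[i] - month_sin[j], month_cos[i] - month_cos[j])


dec_jan = distance(12, 1)  # neighbours across the year boundary
jan_feb = distance(1, 2)   # neighbours within the year
jan_jul = distance(1, 7)   # opposite sides of the cycle

print(dec_jan, jan_feb, jan_jul)
```

After the transformation, December and January are as close to each other as January and February, whereas January and July sit on opposite sides of the circle.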

# Pollutants

Let's work with the air quality dataset that we created in the notebook **pollutants**, which you can find in the folder **02-Datasets** in this repository.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# to automate many of our engineering processes
from feature_engine.creation import CyclicalFeatures
from feature_engine.datetime import DatetimeFeatures

from sklearn.pipeline import Pipeline

In [3]:
filename = '../../AirQualityUCI_ready.csv'

data = pd.read_csv(filename)

data.head()

Unnamed: 0,Date_Time,CO_sensor
0,2004-10-03 18:00:00,1360.0
1,2004-10-03 19:00:00,1292.0
2,2004-10-03 20:00:00,1402.0
3,2004-10-03 21:00:00,1376.0
4,2004-10-03 22:00:00,1272.0


In [4]:
# Cast the date variable to datetime format.
data['Date_Time'] = pd.to_datetime(data['Date_Time'])

## Add temporal features

We will extract the month and the hour from the `Date_Time` variable.

In [5]:
# Get datetime features from the datetime variable
# and apply periodic transformation.

pipe = Pipeline([
    
    # create datetime features.
    ('datetime', DatetimeFeatures(
        variables="Date_Time",
        features_to_extract=["month", "hour"],
        drop_original=True,
    )),

    # apply sine and cosine transformation.
    ('cyclical', CyclicalFeatures(
        variables=["Date_Time_month", "Date_Time_hour"],
    )),
])

In [6]:
# Extract the features.

data = pipe.fit_transform(data)

data.head()

Unnamed: 0,CO_sensor,Date_Time_month,Date_Time_hour,Date_Time_month_sin,Date_Time_month_cos,Date_Time_hour_sin,Date_Time_hour_cos
0,1360.0,10,18,-0.866025,0.5,-0.979084,0.203456
1,1292.0,10,19,-0.866025,0.5,-0.887885,0.460065
2,1402.0,10,20,-0.866025,0.5,-0.730836,0.682553
3,1376.0,10,21,-0.866025,0.5,-0.519584,0.854419
4,1272.0,10,22,-0.866025,0.5,-0.269797,0.962917
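The values in the first row of the table above are consistent with dividing each variable by its maximum value before applying sine and cosine, that is, `sin(2 * pi * x / max)` with a maximum of 12 for the month and 23 for the hour. The sketch below reproduces that row by hand; the maxima are assumptions inferred from the output, not values read from the fitted transformer.

```python
import numpy as np

# First row of the table: month = 10, hour = 18.
month, hour = 10, 18

# Assumed maxima, inferred from the output above.
month_max, hour_max = 12, 23

month_sin = np.sin(2 * np.pi * month / month_max)
month_cos = np.cos(2 * np.pi * month / month_max)
hour_sin = np.sin(2 * np.pi * hour / hour_max)
hour_cos = np.cos(2 * np.pi * hour / hour_max)

print(month_sin, month_cos, hour_sin, hour_cos)
```

Note that with a divisor of 23, hour 23 lands on the same point of the circle as hour 0. If that matters for your use case, `CyclicalFeatures` accepts a `max_values` dictionary, which should allow setting the divisor to 24 instead.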


# Compare model performance

Now, let's compare the performance of a model trained on the raw features with that of a model trained on the trigonometrically transformed features.

**NOTE**: I am going to do a quick and dirty job to prove my point. Keep in mind that we should split the data into a train and a test set or use cross-validation to have an accurate measure of the model performance.
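For reference, this is roughly what a cross-validated comparison could look like. To keep the sketch self-contained it runs on synthetic data: the hourly signal, the noise level, and the variable names `X_raw` and `X_trig` are all made up for illustration and are not the air quality dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 days of hourly readings with a daily cycle.
hours = np.tile(np.arange(24), 200)
target = (
    1000
    + 200 * np.cos(2 * np.pi * hours / 24)
    + rng.normal(0, 50, hours.size)
)

# Raw encoding versus sine and cosine encoding of the hour.
X_raw = pd.DataFrame({"hour": hours})
X_trig = pd.DataFrame({
    "hour_sin": np.sin(2 * np.pi * hours / 24),
    "hour_cos": np.cos(2 * np.pi * hours / 24),
})

reg = Lasso(random_state=10)

# 5-fold cross-validated R2 for each feature set.
r2_raw = cross_val_score(reg, X_raw, target, cv=5, scoring="r2")
r2_trig = cross_val_score(reg, X_trig, target, cv=5, scoring="r2")

print("raw:", r2_raw.mean())
print("trigonometric:", r2_trig.mean())
```

With real time series, a splitter that respects temporal order, such as scikit-learn's `TimeSeriesSplit`, is usually a better choice than plain k-fold.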

In [7]:
from sklearn.linear_model import Lasso

In [8]:
# Lasso regression.

reg = Lasso(random_state=10)

In [9]:
# Fit Lasso to the raw inputs.

reg.fit(data[["Date_Time_month", "Date_Time_hour"]], data["CO_sensor"])

# Get the R2
reg.score(data[["Date_Time_month", "Date_Time_hour"]], data["CO_sensor"])

0.10359654878036195

In [10]:
# Capture the trigonometrically transformed variables in a list.

trig_vars = [var for var in data.columns if 'sin' in var or 'cos' in var]

trig_vars

['Date_Time_month_sin',
 'Date_Time_month_cos',
 'Date_Time_hour_sin',
 'Date_Time_hour_cos']

In [11]:
# Fit Lasso with transformed inputs.

reg.fit(data[trig_vars], data["CO_sensor"])

# Get the R2.
reg.score(data[trig_vars], data["CO_sensor"])

0.13263074168408584

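As we see, with the transformed features the R2 increased from about 0.104 to about 0.133, a relative improvement of roughly 28%. Keep in mind that both values were computed on the same data used to fit the model.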

You can find more details on working with temporal features in the [Scikit-learn docs](https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html#trigonometric-features).