Updated: I initially proposed an API like step_date(..., cyclic = TRUE). I now think that would be confusing because features = "month" would then mean "time of month" whereas it currently means "month of year". Instead, I propose a new function called step_date_cyclic.
Feature
When forecasting, cyclical time trends can be important predictors. I propose adding a new function called step_date_cyclic that encodes POSIXct columns as trigonometric columns, as mlr3pipelines::PipeOpDateFeatures() does. E.g., for features = "month" it would add two columns: sin(time) and cos(time). This avoids the data reduction inherent in the current categorical feature coding. And since many learners support numerical features, this enables more models to work on forecasting.
Here's one exposition of the rationale behind this feature coding: https://towardsdatascience.com/cyclical-features-encoding-its-about-time-ce23581845ca
R code demo
Say the user has data with some timestamps:
df = data.frame(
timestamps = seq(as.POSIXct("2021-02-01"), as.POSIXct("2021-05-01"), length.out = 300)
)
Calling step_date_cyclic(..., features = "month") would add two new columns:
timestamps_secs = as.numeric(df$timestamps)
secs_per_month = 30 * 24 * 60 * 60  # approximates a month as 30 days
df$timestamps_month_sin = sin(timestamps_secs * 2 * pi / secs_per_month)
df$timestamps_month_cos = cos(timestamps_secs * 2 * pi / secs_per_month)
Visually:
plot(timestamps_month_sin ~ timestamps, df)
points(timestamps_month_cos ~ timestamps, df, col = "red")

Jointly, these two columns uniquely identify each time of month:
plot(timestamps_month_cos ~ timestamps_month_sin, df)
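
As a rough numerical check of the uniqueness claim, the sketch below rebuilds the demo columns and recovers the position within the 30-day period from the (sin, cos) pair using base R's atan2(). The variable names and the UTC time zone are my own choices for illustration, not part of the proposal.

```r
# Rebuild the demo data so this snippet runs on its own (tz = "UTC" is an assumption).
timestamps = seq(as.POSIXct("2021-02-01", tz = "UTC"),
                 as.POSIXct("2021-05-01", tz = "UTC"),
                 length.out = 300)
secs = as.numeric(timestamps)
secs_per_month = 30 * 24 * 60 * 60  # same 30-day approximation as the demo

angle = secs * 2 * pi / secs_per_month
month_sin = sin(angle)
month_cos = cos(angle)

# atan2() returns the angle in (-pi, pi]; rescale it to a fraction of the period in [0, 1).
phase_recovered = (atan2(month_sin, month_cos) / (2 * pi)) %% 1
phase_expected = (secs / secs_per_month) %% 1

# Compare on the circle, so that 0.999... and 0.000... count as neighbours.
d = abs(phase_recovered - phase_expected)
max_circular_error = max(pmin(d, 1 - d))
print(max_circular_error)
```

The error should be at floating-point level, i.e. the pair determines the time within the period. Neither column alone does, since each sin or cos value occurs twice per cycle.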

Postscript
For each entry in features, two columns would be added. The computation is the same as above, with secs_per_{feature} in place of secs_per_month.
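
To make the generalization concrete, here is a minimal base-R sketch. add_cyclic_features() and the secs_per lookup are hypothetical names, not existing recipes functions, and the period lengths (30-day month, 365-day year) are fixed approximations I am assuming for illustration. A real step_date_cyclic would presumably be implemented as a proper recipes step.

```r
# Hypothetical lookup of period lengths in seconds (fixed-length approximations).
secs_per = c(
  minute = 60,
  hour   = 60 * 60,
  day    = 24 * 60 * 60,
  week   = 7 * 24 * 60 * 60,
  month  = 30 * 24 * 60 * 60,   # 30-day month, as in the demo
  year   = 365 * 24 * 60 * 60   # ignores leap years
)

# Hypothetical helper: adds <column>_<feature>_sin and <column>_<feature>_cos
# for every requested feature.
add_cyclic_features = function(data, column, features = "month") {
  stopifnot(inherits(data[[column]], "POSIXct"))
  unknown = setdiff(features, names(secs_per))
  if (length(unknown) > 0) {
    stop("Unknown features: ", paste(unknown, collapse = ", "))
  }
  secs = as.numeric(data[[column]])
  for (feature in features) {
    angle = secs * 2 * pi / secs_per[[feature]]
    data[[paste0(column, "_", feature, "_sin")]] = sin(angle)
    data[[paste0(column, "_", feature, "_cos")]] = cos(angle)
  }
  data
}

df = data.frame(
  timestamps = seq(as.POSIXct("2021-02-01", tz = "UTC"),
                   as.POSIXct("2021-05-01", tz = "UTC"),
                   length.out = 300)
)
out = add_cyclic_features(df, "timestamps", features = c("day", "month"))
print(names(out))
```

With features = c("day", "month") this adds four columns, following the timestamps_month_sin naming used in the demo.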