Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Number of folds not reported correctly when passing ExpandingWindowSplitter #3011

Open
2 of 3 tasks
ngupta23 opened this issue Oct 2, 2022 · 2 comments · May be fixed by #4104
Open
2 of 3 tasks

[BUG]: Number of folds not reported correctly when passing ExpandingWindowSplitter #3011

ngupta23 opened this issue Oct 2, 2022 · 2 comments · May be fixed by #4104
Assignees
Labels
bug Something isn't working good first issue Good for newcomers time_series Topics related to the time series
Milestone

Comments

@ngupta23
Copy link
Collaborator

ngupta23 commented Oct 2, 2022

pycaret version checks

Issue Description

The number of folds is not shown correctly when passing an sktime ExpandingWindowSplitter to the fold parameter.

Reproducible Example

# !pip install pycaret==3.0.0rc3

import numpy as np
from pycaret.time_series import TSForecastingExperiment
from pycaret.datasets import get_data
from sktime.forecasting.model_selection import ExpandingWindowSplitter

y = get_data(114, folder="time_series/seasonal", verbose=False)
cv = ExpandingWindowSplitter(fh=np.arange(1, 13), initial_window=24, step_length=4)
exp = TSForecastingExperiment()
exp.setup(y, fh=12, fold=cv)

Expected Behavior

Expected: "Fold Number" should list the number of folds in the CV step, but it just lists the ExpandingWindowSplitter object

Actual results:
image

Actual Results

Shown above.

Installed Versions

System: python: 3.7.14 (default, Sep 8 2022, 00:06:44) [GCC 7.5.0] executable: /usr/bin/python3 machine: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic

PyCaret required dependencies:
pip: 21.1.3
setuptools: 57.4.0
pycaret: 3.0.0.rc3
IPython: 7.9.0
ipywidgets: 7.7.1
tqdm: 4.64.1
numpy: 1.21.6
pandas: 1.3.5
jinja2: 2.11.3
scipy: 1.7.3
joblib: 1.1.0
sklearn: 1.0.2
pyod: Installed but version unavailable
imblearn: 0.8.1
category_encoders: 2.5.0
lightgbm: 3.3.2
numba: 0.55.2
requests: 2.28.1
matplotlib: 3.5.3
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.5.0
kaleido: 0.2.1
statsmodels: 0.13.2
sktime: 0.11.4
tbats: Installed but version unavailable
pmdarima: 2.0.1
psutil: 5.9.2

@ngupta23 ngupta23 added bug Something isn't working time_series Topics related to the time series labels Oct 2, 2022
@ngupta23 ngupta23 added this to the pycaret 3.0.0 milestone Oct 2, 2022
@ngupta23 ngupta23 added the good first issue Good for newcomers label Oct 2, 2022
@ngupta23 ngupta23 modified the milestones: pycaret 3.0.0, pycaret 3.1.0 Nov 9, 2022
@ngupta23 ngupta23 modified the milestones: 3.1.2, 3.2.1 Feb 2, 2024
@mhsizar
Copy link

mhsizar commented Dec 2, 2024

Hi @ngupta23,

I was looking into this, and initially, I thought the _set_fold_generator function in the oop.py file under time_series/forecasting directory was not handling fold number calculation properly:

def _set_fold_generator(self) -> "TSForecastingExperiment":
    """Sets up the cross-validation fold generator for the training dataset."""
    possible_time_series_fold_strategies = ["expanding", "sliding", "rolling"]

    if not (
        self.fold_strategy in possible_time_series_fold_strategies
        or is_sklearn_cv_generator(self.fold_strategy)
    ):
        raise TypeError(
            "fold_strategy parameter must be either a sktime compatible CV generator "
            f"object or one of '{', '.join(possible_time_series_fold_strategies)}'."
        )

    if self.fold_strategy in possible_time_series_fold_strategies:
        # Number of folds
        self.fold_param = self.fold
        self.fold_generator = self.get_fold_generator(fold=self.fold_param)
    else:
        self.fold_generator = self.fold_strategy
        self.fold_param = self.fold_generator.get_n_splits(y=self.y_train)

    return self

So, I modified the second if block to calculate the fold count for CV objects like ExpandingWindowSplitter correctly:

    if self.fold_strategy in possible_time_series_fold_strategies:
        # Calculate the fold_param based on splits
        self.fold_generator = self.get_fold_generator(fold=self.fold)
        self.fold_param = (
            self.fold_generator.get_n_splits(y=self.y_train)
            if hasattr(self.fold_generator, 'get_n_splits') else None  
        )

This fixed the output. So running the same code as you, I got this:

image

Now the fold number is showing up correctly. I was about to make a pull request with the changes I have made, however, at this point, I started looking at the documentation of Pycaret's Time Series libraries. And I found out that the fold parameter should only get int type argument, and the CV objects should be passed to the fold_strategy object instead!

So, your code should be modified to this:

# !pip install pycaret==3.0.0rc3

import numpy as np
from pycaret.time_series import TSForecastingExperiment
from pycaret.datasets import get_data
from sktime.forecasting.model_selection import ExpandingWindowSplitter

y = get_data(114, folder="time_series/seasonal", verbose=False)
cv = ExpandingWindowSplitter(fh=np.arange(1, 13), initial_window=24, step_length=4)
exp = TSForecastingExperiment()
exp.setup(y, fh=12, fold_strategy=cv)

This should return the output as expected!

@mhsizar
Copy link

mhsizar commented Dec 2, 2024

However, since passing a CV object to the fold parameter still runs the code without throwing an error and returning the object as the Fold Number in output, I believe we should raise a TypeError when any type of argument except int is passed into the fold parameter. So I have modified the setup() function in pycaret/time_series/forecasting/oop.py and inserted the following code block:

# Validate fold
        if not isinstance(fold, int):
            raise TypeError(
                f"The 'fold' parameter must be an integer. You provided: {type(fold).__name__}. "
                "If you intended to use a custom cross-validation object such as SlidingWindowSplitter or "
                "ExpandingWindowSplitter, please pass it to the 'fold_strategy' parameter instead. "
                "The 'fold' parameter is ignored when 'fold_strategy' is a custom CV object."
            )

I am making a pull request. Let me know if you have any suggestions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers time_series Topics related to the time series
Projects
None yet
2 participants