[BUG]: Number of folds not reported correctly when passing ExpandingWindowSplitter #3011

ngupta23 · 2022-10-02T15:21:47Z

pycaret version checks

I have checked that this issue has not already been reported here.
I have confirmed this bug exists on the latest version of pycaret.
I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).

Issue Description

The number of folds is not shown correctly when passing an sktime ExpandingWindowSplitter to the fold parameter.

Reproducible Example

# !pip install pycaret==3.0.0rc3

import numpy as np
from pycaret.time_series import TSForecastingExperiment
from pycaret.datasets import get_data
from sktime.forecasting.model_selection import ExpandingWindowSplitter

y = get_data(114, folder="time_series/seasonal", verbose=False)
cv = ExpandingWindowSplitter(fh=np.arange(1, 13), initial_window=24, step_length=4)
exp = TSForecastingExperiment()
exp.setup(y, fh=12, fold=cv)

Expected Behavior

Expected: "Fold Number" should list the number of folds in the CV step, but it just lists the ExpandingWindowSplitter object

Actual results:

Actual Results

Shown above.

Installed Versions

System: python: 3.7.14 (default, Sep 8 2022, 00:06:44) [GCC 7.5.0] executable: /usr/bin/python3 machine: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic

PyCaret required dependencies:
pip: 21.1.3
setuptools: 57.4.0
pycaret: 3.0.0.rc3
IPython: 7.9.0
ipywidgets: 7.7.1
tqdm: 4.64.1
numpy: 1.21.6
pandas: 1.3.5
jinja2: 2.11.3
scipy: 1.7.3
joblib: 1.1.0
sklearn: 1.0.2
pyod: Installed but version unavailable
imblearn: 0.8.1
category_encoders: 2.5.0
lightgbm: 3.3.2
numba: 0.55.2
requests: 2.28.1
matplotlib: 3.5.3
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.5.0
kaleido: 0.2.1
statsmodels: 0.13.2
sktime: 0.11.4
tbats: Installed but version unavailable
pmdarima: 2.0.1
psutil: 5.9.2

The text was updated successfully, but these errors were encountered:

mhsizar · 2024-12-02T16:18:48Z

Hi @ngupta23,

I was looking into this, and initially, I thought the _set_fold_generator function in the oop.py file under time_series/forecasting directory was not handling fold number calculation properly:

def _set_fold_generator(self) -> "TSForecastingExperiment":
    """Sets up the cross-validation fold generator for the training dataset."""
    possible_time_series_fold_strategies = ["expanding", "sliding", "rolling"]

    if not (
        self.fold_strategy in possible_time_series_fold_strategies
        or is_sklearn_cv_generator(self.fold_strategy)
    ):
        raise TypeError(
            "fold_strategy parameter must be either a sktime compatible CV generator "
            f"object or one of '{', '.join(possible_time_series_fold_strategies)}'."
        )

    if self.fold_strategy in possible_time_series_fold_strategies:
        # Number of folds
        self.fold_param = self.fold
        self.fold_generator = self.get_fold_generator(fold=self.fold_param)
    else:
        self.fold_generator = self.fold_strategy
        self.fold_param = self.fold_generator.get_n_splits(y=self.y_train)

    return self

So, I modified the second if block to calculate the fold count for CV objects like ExpandingWindowSplitter correctly:

    if self.fold_strategy in possible_time_series_fold_strategies:
        # Calculate the fold_param based on splits
        self.fold_generator = self.get_fold_generator(fold=self.fold)
        self.fold_param = (
            self.fold_generator.get_n_splits(y=self.y_train)
            if hasattr(self.fold_generator, 'get_n_splits') else None  
        )

This fixed the output. So running the same code as you, I got this:

Now the fold number is showing up correctly. I was about to make a pull request with the changes I have made, however, at this point, I started looking at the documentation of Pycaret's Time Series libraries. And I found out that the fold parameter should only get int type argument, and the CV objects should be passed to the fold_strategy object instead!

So, your code should be modified to this:

# !pip install pycaret==3.0.0rc3

import numpy as np
from pycaret.time_series import TSForecastingExperiment
from pycaret.datasets import get_data
from sktime.forecasting.model_selection import ExpandingWindowSplitter

y = get_data(114, folder="time_series/seasonal", verbose=False)
cv = ExpandingWindowSplitter(fh=np.arange(1, 13), initial_window=24, step_length=4)
exp = TSForecastingExperiment()
exp.setup(y, fh=12, fold_strategy=cv)

This should return the output as expected!

mhsizar · 2024-12-02T16:25:02Z

However, since passing a CV object to the fold parameter still runs the code without throwing an error and returning the object as the Fold Number in output, I believe we should raise a TypeError when any type of argument except int is passed into the fold parameter. So I have modified the setup() function in pycaret/time_series/forecasting/oop.py and inserted the following code block:

# Validate fold
        if not isinstance(fold, int):
            raise TypeError(
                f"The 'fold' parameter must be an integer. You provided: {type(fold).__name__}. "
                "If you intended to use a custom cross-validation object such as SlidingWindowSplitter or "
                "ExpandingWindowSplitter, please pass it to the 'fold_strategy' parameter instead. "
                "The 'fold' parameter is ignored when 'fold_strategy' is a custom CV object."
            )

I am making a pull request. Let me know if you have any suggestions!

ngupta23 added bug time_series labels Oct 2, 2022

ngupta23 added this to the pycaret 3.0.0 milestone Oct 2, 2022

ngupta23 added the good first issue label Oct 2, 2022

moezali1 assigned ngupta23 Oct 10, 2022

ngupta23 modified the milestones: pycaret 3.0.0, pycaret 3.1.0 Nov 9, 2022

ngupta23 modified the milestones: 3.1.2, 3.2.1 Feb 2, 2024

mhsizar linked a pull request Dec 2, 2024 that will close this issue

Number of folds not reported correctly when passing non-integers into the fold parameter #4104

Open

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG]: Number of folds not reported correctly when passing ExpandingWindowSplitter #3011

[BUG]: Number of folds not reported correctly when passing ExpandingWindowSplitter #3011

ngupta23 commented Oct 2, 2022 •

edited

Loading

mhsizar commented Dec 2, 2024

mhsizar commented Dec 2, 2024 •

edited

Loading

[BUG]: Number of folds not reported correctly when passing ExpandingWindowSplitter #3011

[BUG]: Number of folds not reported correctly when passing ExpandingWindowSplitter #3011

Comments

ngupta23 commented Oct 2, 2022 • edited Loading

pycaret version checks

Issue Description

Reproducible Example

Expected Behavior

Actual Results

Installed Versions

mhsizar commented Dec 2, 2024

mhsizar commented Dec 2, 2024 • edited Loading

ngupta23 commented Oct 2, 2022 •

edited

Loading

mhsizar commented Dec 2, 2024 •

edited

Loading