# Statistical Backtesting Time Series for Freqtrade
This notebook helps you define a proper timerange for training, validating and testing your strategy.

Classical way of backtesting:
- split the data chronologically into train / validate / test (60% / 20% / 20%)
- train dataset: train & fit the model, adjust params
- validate dataset: evaluate performance, tune hyperparameters, select the best model
- test dataset: final evaluation on unseen data, which estimates real-world performance
- never go back and tune after looking at the test dataset (you would be overfitting to it)
- ... unless you are prepared to use another test dataset / dry run

A better way:
- define your strategy
- run the Unanchored Walk Forward Optimization to get the folds
- train your strategy on the first training fold
- test it on the first test fold
- repeat until the last fold
- how did your strategy behave after experiencing all those market conditions?

I recommend using the Walk Forward Optimization which, if correctly implemented, is one of the best ways to avoid overfitting a trading strategy.
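
The steps listed under "A better way" boil down to a loop over (train, test) folds. Below is a minimal sketch of that loop, assuming a placeholder model (the mean close) and a placeholder score (mean absolute error). `walk_forward`, `fit_mean` and `mean_abs_error` are hypothetical names used only for illustration; a real strategy would run hyperopt on the training fold and a backtest on the test fold instead.

```python
import pandas as pd


def walk_forward(folds, fit, evaluate):
    """Fit on each training fold, then score on the test fold that follows it."""
    scores = []
    for train_fold, test_fold in folds:
        params = fit(train_fold)                    # only sees the training window
        scores.append(evaluate(params, test_fold))  # scored on unseen data
    return scores


def fit_mean(train_fold):
    """Placeholder 'model': the mean close of the training window."""
    return train_fold["close"].mean()


def mean_abs_error(level, test_fold):
    """Placeholder score: mean absolute distance between test closes and the level."""
    return (test_fold["close"] - level).abs().mean()


# Toy data (hypothetical): 12 daily closes that rise by 1 each day.
toy = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=12, freq="D"),
    "close": [float(v) for v in range(100, 112)],
})

# Three rolling folds: 3 training rows followed by 1 test row.
toy_folds = [(toy.iloc[i:i + 3], toy.iloc[i + 3:i + 4]) for i in (0, 4, 8)]

fold_scores = walk_forward(toy_folds, fit_mean, mean_abs_error)
print(fold_scores)
```

Looking at the per-fold scores side by side, rather than at one aggregate number, is what shows how the strategy behaved across the different market conditions.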

## Imports & config

In [1]:
%matplotlib inline
import requests

import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection._split import _BaseKFold, indexable, _num_samples
from sklearn.utils.validation import _deprecate_positional_args

import os

In [2]:
os.chdir("/freqtrade")
token = pd.read_feather('user_data/data/binance/futures/BTC_USDT_USDT-1d-futures.feather')
start_date = token['date'].min().strftime('%Y%m%d')
end_date = token['date'].max().strftime('%Y%m%d')

print(f"Timerange: {start_date}-{end_date}")

Timerange: 20210101-20230821


## Preferred methods

### Unanchored Walk Forward Optimization
Here we have a rolling window of training/test data and we continuously adapt to the market.


In [3]:
df = token[['date', 'close']]

# Define the number of splits and the test window length relative to the training window (0.2 = the test window is 20% as long as the training window)
n_splits = 5
test_ratio = 0.2

# Calculate the step size for each fold
step_size = len(df) // n_splits

# Create an empty list to store the folds
folds = []

# Split the data into folds with unanchored walk-forward
for i in range(n_splits):
    start_idx = i * step_size
    test_days = int((step_size + 1) * test_ratio)
    end_idx = start_idx + step_size + test_days
    
    
    # Ensure the end index is within the data range
    if end_idx > len(df):
        end_idx = len(df)
    
    train_fold = df.iloc[start_idx:end_idx - test_days]
    test_fold = df.iloc[end_idx - test_days:end_idx]
    folds.append((train_fold, test_fold))

# Display the data and time ranges in each fold
for i, (train_fold, test_fold) in enumerate(folds):
    train_start_date = pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')
    train_end_date = pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')
    test_start_date = pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')
    test_end_date = pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')
    
    print(f"Fold {i+1}:")
    print(f"\tTrain:\t{train_start_date}-{train_end_date}\t{train_fold.shape}")
    print(f"\tTest:\t{test_start_date}-{test_end_date}\t{test_fold.shape}")

print("\n")

# Display the fold ranges and their values after the loop
for i, (train_fold, test_fold) in enumerate(folds):
    fold_train_range = f"fold{i+1}_train=\"{pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')}\""
    fold_test_range = f"fold{i+1}_test=\"{pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')}\""
    
    print(f"{fold_train_range}")
    print(f"{fold_test_range}")


Fold 1:
	Train:	20210101-20210711	(192, 2)
	Test:	20210712-20210818	(38, 2)
Fold 2:
	Train:	20210712-20220119	(192, 2)
	Test:	20220120-20220226	(38, 2)
Fold 3:
	Train:	20220120-20220730	(192, 2)
	Test:	20220731-20220906	(38, 2)
Fold 4:
	Train:	20220731-20230207	(192, 2)
	Test:	20230208-20230317	(38, 2)
Fold 5:
	Train:	20230208-20230714	(157, 2)
	Test:	20230715-20230821	(38, 2)


fold1_train="20210101-20210711"
fold1_test="20210712-20210818"
fold2_train="20210712-20220119"
fold2_test="20220120-20220226"
fold3_train="20220120-20220730"
fold3_test="20220731-20220906"
fold4_train="20220731-20230207"
fold4_test="20230208-20230317"
fold5_train="20230208-20230714"
fold5_test="20230715-20230821"
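
The `foldN_train` / `foldN_test` strings printed above use the `YYYYMMDD-YYYYMMDD` format that Freqtrade's `--timerange` option expects. As a rough sketch, the ranges can be turned into one hyperopt command and one backtesting command per fold. The strategy name `MyStrategy` and the choice of `SharpeHyperOptLoss` are assumptions for illustration; a real run will usually also need a config file and other options.

```python
import shlex

# Timeranges copied from the first two folds printed above.
fold_ranges = [
    ("20210101-20210711", "20210712-20210818"),
    ("20210712-20220119", "20220120-20220226"),
]

STRATEGY = "MyStrategy"        # hypothetical strategy name
LOSS = "SharpeHyperOptLoss"    # illustrative choice of hyperopt loss function


def fold_commands(train_range, test_range, strategy=STRATEGY, loss=LOSS):
    """Return (optimise, evaluate) command lines for one fold."""
    optimise = ["freqtrade", "hyperopt", "--strategy", strategy,
                "--hyperopt-loss", loss, "--timerange", train_range]
    evaluate = ["freqtrade", "backtesting", "--strategy", strategy,
                "--timerange", test_range]
    return shlex.join(optimise), shlex.join(evaluate)


commands = [fold_commands(train, test) for train, test in fold_ranges]
for optimise, evaluate in commands:
    print(optimise)
    print(evaluate)
```

The commands are only printed here, not executed, so they can be reviewed before being pasted into a shell.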


### Splitting the training dataset via kfold into separate timeranges with test sets
Here we split the data into 5 folds.
With this split the test set of the last fold can end up very small (only 3 rows here, the remainder of the integer division), but you can reserve the last fold for a final test.
The disadvantage of this method compared to a walk forward approach is that we don't adapt to the new market regime, so old data might have the same impact as new data. Think Moving Average vs Exponential Moving Average.
Read more here: https://medium.com/eatpredlove/time-series-cross-validation-a-walk-forward-approach-in-python-8534dd1db51a
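
To make the Moving Average vs Exponential Moving Average analogy concrete, here is a small sketch on made-up prices with an abrupt regime change. The series and the window/span of 10 are arbitrary assumptions chosen only to show the effect.

```python
import pandas as pd

# Toy series (hypothetical): ten closes at 100 followed by three closes at 200.
close = pd.Series([100.0] * 10 + [200.0] * 3)

# Simple moving average: each of the last 10 observations has the same weight.
sma = close.rolling(window=10).mean()

# Exponential moving average: recent observations weigh more than old ones.
ema = close.ewm(span=10, adjust=False).mean()

print(f"SMA after the regime change: {sma.iloc[-1]:.1f}")
print(f"EMA after the regime change: {ema.iloc[-1]:.1f}")
```

After the jump the exponential average sits closer to the new price level than the simple one, which is the sense in which weighting recent data more heavily adapts faster to a new regime.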

In [4]:
df = token[['date', 'close']]

# Define the number of splits
n_splits = 5

# Calculate the step size for each fold
step_size = len(df) // n_splits

# Create an empty list to store the folds
folds = []

print(f"Step size: {step_size}\n")

# Split the data into folds
for i in range(n_splits):
    start_idx = i * step_size
    end_idx = start_idx + step_size

    # Split the data into train and test based on indices
    train_fold = df.iloc[start_idx:end_idx]
    test_fold = df.iloc[end_idx:end_idx + step_size] if i < n_splits - 1 else df.iloc[end_idx:]
    
    folds.append((train_fold, test_fold))

# Display the data and time ranges in each fold
for i, (train_fold, test_fold) in enumerate(folds):
    train_start_date = pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')
    train_end_date = pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')
    test_start_date = pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')
    test_end_date = pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')
        
    print(f"Fold {i+1}:")
    print(f"\tTrain:\t{train_start_date}-{train_end_date}\t{train_fold.shape}")
    print(f"\tTest:\t{test_start_date}-{test_end_date}\t{test_fold.shape}")

print("\n")

# Display the fold ranges and their values after the loop
for i, (train_fold, test_fold) in enumerate(folds):
    fold_train_range = f"fold{i+1}_train=\"{pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')}\""
    fold_test_range = f"fold{i+1}_test=\"{pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')}\""
    
    print(f"{fold_train_range}")
    print(f"{fold_test_range}")


Step size: 192

Fold 1:
	Train:	20210101-20210711	(192, 2)
	Test:	20210712-20220119	(192, 2)
Fold 2:
	Train:	20210712-20220119	(192, 2)
	Test:	20220120-20220730	(192, 2)
Fold 3:
	Train:	20220120-20220730	(192, 2)
	Test:	20220731-20230207	(192, 2)
Fold 4:
	Train:	20220731-20230207	(192, 2)
	Test:	20230208-20230818	(192, 2)
Fold 5:
	Train:	20230208-20230818	(192, 2)
	Test:	20230819-20230821	(3, 2)


fold1_train="20210101-20210711"
fold1_test="20210712-20220119"
fold2_train="20210712-20220119"
fold2_test="20220120-20220730"
fold3_train="20220120-20220730"
fold3_test="20220731-20230207"
fold4_train="20220731-20230207"
fold4_test="20230208-20230818"
fold5_train="20230208-20230818"
fold5_test="20230819-20230821"
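
One possible way to avoid the 3-row test set in the last fold is to cut the data into `n_splits + 1` nearly equal blocks and pair each block with the one that follows it. This is a sketch, not the method used above, and it trades slightly shorter windows (about 160 rows instead of 192) for a full-sized last test fold. The row count 963 is taken from the dataset used in this notebook.

```python
import numpy as np

n_rows = 963      # number of daily candles in the dataset used above
n_splits = 5

# n_splits folds need n_splits + 1 consecutive blocks:
# block i is the training window and block i + 1 is its test window.
blocks = np.array_split(np.arange(n_rows), n_splits + 1)

balanced_folds = [(blocks[i], blocks[i + 1]) for i in range(n_splits)]

for i, (train_idx, test_idx) in enumerate(balanced_folds, start=1):
    print(f"Fold {i}: train rows {train_idx[0]}-{train_idx[-1]}, "
          f"test rows {test_idx[0]}-{test_idx[-1]} ({len(test_idx)} rows)")
```

The index arrays can be passed to `df.iloc[...]` to obtain the date ranges in the same way as in the cell above.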


### Anchored Walk Forward Optimization
Here we split the data starting from the same start date and see how our strategy evolves, how the parameters are modified up to fold 5.
Each fold has a training period and a test period. The advantage of the anchored variant is that each successive fold trains on more data.

In [5]:
df = token[['date', 'close']]

# Define the number of splits
n_splits = 5

# Initialize TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=n_splits)

# Create an empty list to store the folds
folds = []

# Split the data using TimeSeriesSplit
for train_idx, test_idx in tscv.split(df):
    train_fold = df.iloc[train_idx]
    test_fold = df.iloc[test_idx]
    folds.append((train_fold, test_fold))

# Display the data and time ranges in each fold
for i, (train_fold, test_fold) in enumerate(folds):
    train_start_date = pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')
    train_end_date = pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')
    test_start_date = pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')
    test_end_date = pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')
    
    print(f"Fold {i+1}:")
    print(f"\tTrain:\t{train_start_date}-{train_end_date}\t{train_fold.shape}")
    print(f"\tTest:\t{test_start_date}-{test_end_date}\t{test_fold.shape}")

print("\n")

# Display the fold ranges and their values after the loop
for i, (train_fold, test_fold) in enumerate(folds):
    fold_train_range = f"fold{i+1}_train=\"{pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')}\""
    fold_test_range = f"fold{i+1}_test=\"{pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')}\""
    
    print(f"{fold_train_range}")
    print(f"{fold_test_range}")


Fold 1:
	Train:	20210101-20210612	(163, 2)
	Test:	20210613-20211119	(160, 2)
Fold 2:
	Train:	20210101-20211119	(323, 2)
	Test:	20211120-20220428	(160, 2)
Fold 3:
	Train:	20210101-20220428	(483, 2)
	Test:	20220429-20221005	(160, 2)
Fold 4:
	Train:	20210101-20221005	(643, 2)
	Test:	20221006-20230314	(160, 2)
Fold 5:
	Train:	20210101-20230314	(803, 2)
	Test:	20230315-20230821	(160, 2)


fold1_train="20210101-20210612"
fold1_test="20210613-20211119"
fold2_train="20210101-20211119"
fold2_test="20211120-20220428"
fold3_train="20210101-20220428"
fold3_test="20220429-20221005"
fold4_train="20210101-20221005"
fold4_test="20221006-20230314"
fold5_train="20210101-20230314"
fold5_test="20230315-20230821"
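
The anchoring is easier to see on a toy input. The sketch below runs `TimeSeriesSplit` on 12 made-up samples and, for comparison, adds its `max_train_size` parameter, which caps the training window and so turns the expanding window into a rolling one. The sample count and split count are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)   # 12 toy samples in time order

# Anchored: every training set starts at index 0 and grows with each split.
splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in splits:
    print("anchored train:", train_idx.tolist(), "test:", test_idx.tolist())

# Capped: the training window keeps at most the 3 most recent samples.
capped = list(TimeSeriesSplit(n_splits=3, max_train_size=3).split(X))
for train_idx, test_idx in capped:
    print("capped   train:", train_idx.tolist(), "test:", test_idx.tolist())
```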


### Anchored Walk Forward Optimization v2
Here we split the data starting from the same start date and see how our strategy evolves, how the parameters are modified up to fold 5.
Each fold has a training period and a test period. The difference in v2 is that we have a longer training period in the beginning.

In [6]:
df = token[['date', 'close']]

import numpy as np

# credits: https://medium.com/eatpredlove/time-series-cross-validation-a-walk-forward-approach-in-python-8534dd1db51a
class expanding_window(object):
    '''	
    Parameters 
    ----------
    
    Note that if you define a horizon that is too long, the last split uses a shorter horizon
    so that only the validation data that is left gets used. This is similar to Prof. Rob Hyndman's tsCV.
    
    
    initial: int
        initial train length 
    horizon: int 
        forecast horizon (forecast length). Default = 1
    period: int 
        length of train data to add each iteration 
    '''
    

    def __init__(self,initial= 1,horizon = 1,period = 1):
        self.initial = initial
        self.horizon = horizon 
        self.period = period 


    def split(self,data):
        '''
        Parameters 
        ----------
        
        Data: Training data 
        
        Returns 
        -------
        train_index ,test_index: 
            index for train and valid set similar to sklearn model selection
        '''
        self.data = data
        self.counter = 0 # for us to iterate and track later 


        data_length = data.shape[0] # rows 
        data_index = list(np.arange(data_length))
         
        output_train = []
        output_test = []
        # append initial 
        output_train.append(list(np.arange(self.initial)))
        progress = [x for x in data_index if x not in list(np.arange(self.initial)) ] # indexes left to append to train 
        output_test.append([x for x in data_index if x not in output_train[self.counter]][:self.horizon] )
        # clip initial indexes from progress since that is what we are left 
         
        while len(progress) != 0:
            temp = progress[:self.period]
            to_add = output_train[self.counter] + temp
            # update the train index 
            output_train.append(to_add)
            # increment counter 
            self.counter +=1 
            # then we update the test index 
            
            to_add_test = [x for x in data_index if x not in output_train[self.counter] ][:self.horizon]
            output_test.append(to_add_test)

            # update progress 
            progress = [x for x in data_index if x not in output_train[self.counter]]	
            
        # clip the last element of output_train and output_test
        output_train = output_train[:-1]
        output_test = output_test[:-1]
        
        # mimic sklearn output 
        index_output = [(train,test) for train,test in zip(output_train,output_test)]
        
        return index_output


# We train on 1 year worth of data, then we have a test data length of 2 months (60 days) and we add 6 months (180 days) to the training data on each iteration
tscv = expanding_window(initial = 365, horizon = 2*30,period = 6*30)

# Create an empty list to store the folds
folds = []

# Split the data using the expanding_window splitter defined above
for train_idx, test_idx in tscv.split(df):
    train_fold = df.iloc[train_idx]
    test_fold = df.iloc[test_idx]
    folds.append((train_fold, test_fold))

# Display the data and time ranges in each fold
for i, (train_fold, test_fold) in enumerate(folds):
    train_start_date = pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')
    train_end_date = pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')
    test_start_date = pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')
    test_end_date = pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')
    
    print(f"Fold {i+1}:")
    print(f"\tTrain:\t{train_start_date}-{train_end_date}\t{train_fold.shape}")
    print(f"\tTest:\t{test_start_date}-{test_end_date}\t{test_fold.shape}")

print("\n")

# Display the fold ranges and their values after the loop
for i, (train_fold, test_fold) in enumerate(folds):
    fold_train_range = f"fold{i+1}_train=\"{pd.to_datetime(train_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(train_fold['date'].max()).strftime('%Y%m%d')}\""
    fold_test_range = f"fold{i+1}_test=\"{pd.to_datetime(test_fold['date'].min()).strftime('%Y%m%d')}-{pd.to_datetime(test_fold['date'].max()).strftime('%Y%m%d')}\""
    
    print(f"{fold_train_range}")
    print(f"{fold_test_range}")


Fold 1:
	Train:	20210101-20211231	(365, 2)
	Test:	20220101-20220301	(60, 2)
Fold 2:
	Train:	20210101-20220629	(545, 2)
	Test:	20220630-20220828	(60, 2)
Fold 3:
	Train:	20210101-20221226	(725, 2)
	Test:	20221227-20230224	(60, 2)
Fold 4:
	Train:	20210101-20230624	(905, 2)
	Test:	20230625-20230821	(58, 2)


fold1_train="20210101-20211231"
fold1_test="20220101-20220301"
fold2_train="20210101-20220629"
fold2_test="20220630-20220828"
fold3_train="20210101-20221226"
fold3_test="20221227-20230224"
fold4_train="20210101-20230624"
fold4_test="20230625-20230821"
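
The credited class builds its folds with repeated list membership checks, which gets slow on long datasets. As an alternative sketch, the same window arithmetic can be written with plain index ranges. `expanding_window_indices` is a hypothetical helper, not part of the credited code, and it is only checked here against the parameters used above (963 rows, initial 365, horizon 60, period 180).

```python
def expanding_window_indices(n_rows, initial, horizon, period):
    """Yield (train, test) index ranges for an anchored, expanding window.

    Hypothetical helper for illustration only.
    """
    train_end = initial
    while train_end < n_rows:
        # The last test window is clipped to the rows that are left.
        test_end = min(train_end + horizon, n_rows)
        yield range(0, train_end), range(train_end, test_end)
        train_end += period   # grow the training window for the next fold


sizes = [(len(train), len(test))
         for train, test in expanding_window_indices(963, 365, 2 * 30, 6 * 30)]
print(sizes)
```

Each yielded `range` exposes `.start` and `.stop`, so a fold can be sliced with `df.iloc[train.start:train.stop]`.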


## Not used

### Splitting the data into a separate training, validation and test dataset
Normally we would split the data chronologically into a 60-20-20 split for training, validation and testing.
With time series data each part may cover a different type of market (e.g. training on a downtrend and testing on an uptrend).
The results might be too optimistic (or pessimistic) and not statistically sound.
So we're skipping it in favor of a walk forward optimization or a kfold split.

### Splitting the data using train_test_split() 80-20

The issue with train_test_split() is that you might train your model in a specific market situation like a downtrend and test it right when the market changes, maybe to an uptrend. On a long biased strategy this would show more profit.
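
One rough way to check whether a single split straddles a regime change is to compare the buy-and-hold return of the two parts. The sketch below uses made-up prices (a downtrend followed by a rebound). `period_return` is a hypothetical helper, and buy-and-hold return is only a crude proxy for the market regime.

```python
import pandas as pd

# Toy closes (hypothetical): a downtrend followed by a sharp rebound.
toy = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "close": [100.0, 95.0, 90.0, 85.0, 80.0, 75.0, 70.0, 65.0, 80.0, 100.0],
})


def period_return(frame):
    """Return of buying at the first close and selling at the last close."""
    return frame["close"].iloc[-1] / frame["close"].iloc[0] - 1.0


split_at = int(len(toy) * 0.8)                  # chronological 80/20 split
train, test = toy.iloc[:split_at], toy.iloc[split_at:]

train_return = period_return(train)
test_return = period_return(test)
print(f"Train buy-and-hold return: {train_return:+.1%}")
print(f"Test buy-and-hold return:  {test_return:+.1%}")
```

When the two returns have opposite signs, as in this toy case, the test period is measuring a different market than the one the strategy was fitted on.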

Try using the other methods instead.

In [7]:
df = token[['date', 'close']]

# Splitting the data into training (80%) and test (20%)
train, test = train_test_split(df, test_size=0.2, shuffle=False)

# Get the date ranges for training and test sets
train_start_date = train['date'].iloc[0].strftime('%Y%m%d')
train_end_date = train['date'].iloc[-1].strftime('%Y%m%d')
test_start_date = test['date'].iloc[0].strftime('%Y%m%d')
test_end_date = test['date'].iloc[-1].strftime('%Y%m%d')

# Display the date ranges
print(f"Train:\t\t{train_start_date}-{train_end_date}")
print(f"Test:\t\t{test_start_date}-{test_end_date}")

print(f"\nTraining length:\t{len(train)}")
print(f"Testing length:\t\t{len(test)}")

Train:		20210101-20230209
Test:		20230210-20230821

Training length:	770
Testing length:		193


### Splitting the data using train_test_split() 60-20-20

In [8]:
df = token[['date', 'close']]

# Splitting the data into training (60%) and temp (40%)
train, temp = train_test_split(df, test_size=0.4, shuffle=False)

# Splitting the temp data into validation (50%) and test (50%) to achieve an overall split of 60-20-20
val, test = train_test_split(temp, test_size=0.5, shuffle=False)

# Get the date ranges for training, validation and test sets
train_start_date = train['date'].iloc[0].strftime('%Y%m%d')
train_end_date = train['date'].iloc[-1].strftime('%Y%m%d')
val_start_date = val['date'].iloc[0].strftime('%Y%m%d')
val_end_date = val['date'].iloc[-1].strftime('%Y%m%d')
test_start_date = test['date'].iloc[0].strftime('%Y%m%d')
test_end_date = test['date'].iloc[-1].strftime('%Y%m%d')

# Display the date ranges
print(f"Train:\t\t{train_start_date}-{train_end_date}")
print(f"Validate:\t{val_start_date}-{val_end_date}")
print(f"Test:\t\t{test_start_date}-{test_end_date}")

print(f"\nTraining length:\t{len(train)}")
print(f"Validation length:\t{len(val)}")
print(f"Testing length:\t\t{len(test)}")

Train:		20210101-20220731
Validate:	20220801-20230209
Test:		20230210-20230821

Training length:	577
Validation length:	193
Testing length:		193


### Splitting the data using KFold
This approach is problematic for time series: most folds train on data from after the test period, so the model is tested on a past it has effectively already seen (look-ahead bias).

In [9]:
df = token[['date', 'close']]

# Initialize KFold with the desired number of splits
kfold = KFold(n_splits=5, shuffle=False)  # Let's use 5 splits as an example

# Define date format for printing
date_format = "%Y%m%d"

for train_idx, test_idx in kfold.split(df):
    train_data, test_data = df.iloc[train_idx], df.iloc[test_idx]    
    
    # Get date ranges
    train_start_date = train_data['date'].min().strftime(date_format)
    train_end_date = train_data['date'].max().strftime(date_format)
    test_start_date = test_data['date'].min().strftime(date_format)
    test_end_date = test_data['date'].max().strftime(date_format)
    
    # Print the train and test data ranges
    print(f"Train:\t\t{train_start_date}-{train_end_date}\t\tshape:\t{train_data.shape}")
    print(f"Test:\t\t{test_start_date}-{test_end_date}\t\tshape:\t{test_data.shape}")
    print("------")


Train:		20210713-20230821		shape:	(770, 2)
Test:		20210101-20210712		shape:	(193, 2)
------
Train:		20210101-20230821		shape:	(770, 2)
Test:		20210713-20220121		shape:	(193, 2)
------
Train:		20210101-20230821		shape:	(770, 2)
Test:		20220122-20220802		shape:	(193, 2)
------
Train:		20210101-20230821		shape:	(771, 2)
Test:		20220803-20230210		shape:	(192, 2)
------
Train:		20210101-20230210		shape:	(771, 2)
Test:		20230211-20230821		shape:	(192, 2)
------
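
The look-ahead can be checked mechanically. As a small sketch on 10 made-up samples, the hypothetical helper `has_lookahead` flags any split whose training indices extend past the start of its test indices, and compares `KFold` with `TimeSeriesSplit`.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)   # 10 toy samples in time order


def has_lookahead(train_idx, test_idx):
    """True if any training row comes after the first test row."""
    return bool(train_idx.max() > test_idx.min())


kfold_flags = [has_lookahead(train_idx, test_idx)
               for train_idx, test_idx in KFold(n_splits=5, shuffle=False).split(X)]
tss_flags = [has_lookahead(train_idx, test_idx)
             for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X)]

print("KFold:          ", kfold_flags)
print("TimeSeriesSplit:", tss_flags)
```

Only the last `KFold` split, where the test block sits at the very end of the data, is free of future data in its training set.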


### Found GroupTimeSeriesSplit on Kaggle which might provide another option for splitting the timeranges
https://www.kaggle.com/code/jorijnsmit/found-the-holy-grail-grouptimeseriessplit/comments

In [10]:
df = token[['date', 'close']]

# https://github.com/getgaurav2/scikit-learn/blob/d4a3af5cc9da3a76f0266932644b884c99724c57/sklearn/model_selection/_split.py#L2243
class GroupTimeSeriesSplit(_BaseKFold):
    """Time Series cross-validator variant with non-overlapping groups.
    Provides train/test indices to split time series data samples
    that are observed at fixed time intervals according to a
    third-party provided group.
    In each split, test indices must be higher than before, and thus shuffling
    in cross validator is inappropriate.
    This cross-validation object is a variation of :class:`KFold`.
    In the kth split, it returns first k folds as train set and the
    (k+1)th fold as test set.
    The same group will not appear in two different folds (the number of
    distinct groups has to be at least equal to the number of folds).
    Note that unlike standard cross-validation methods, successive
    training sets are supersets of those that come before them.
    Read more in the :ref:`User Guide <cross_validation>`.
    Parameters
    ----------
    n_splits : int, default=5
        Number of splits. Must be at least 2.
    max_train_size : int, default=None
        Maximum size for a single training set.
    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.model_selection import GroupTimeSeriesSplit
    >>> groups = np.array(['a', 'a', 'a', 'a', 'a', 'a',\
                           'b', 'b', 'b', 'b', 'b',\
                           'c', 'c', 'c', 'c',\
                           'd', 'd', 'd'])
    >>> gtss = GroupTimeSeriesSplit(n_splits=3)
    >>> for train_idx, test_idx in gtss.split(groups, groups=groups):
    ...     print("TRAIN:", train_idx, "TEST:", test_idx)
    ...     print("TRAIN GROUP:", groups[train_idx],\
                  "TEST GROUP:", groups[test_idx])
    TRAIN: [0, 1, 2, 3, 4, 5] TEST: [6, 7, 8, 9, 10]
    TRAIN GROUP: ['a' 'a' 'a' 'a' 'a' 'a']\
    TEST GROUP: ['b' 'b' 'b' 'b' 'b']
    TRAIN: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] TEST: [11, 12, 13, 14]
    TRAIN GROUP: ['a' 'a' 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'b']\
    TEST GROUP: ['c' 'c' 'c' 'c']
    TRAIN: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]\
    TEST: [15, 16, 17]
    TRAIN GROUP: ['a' 'a' 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'b' 'c' 'c' 'c' 'c']\
    TEST GROUP: ['d' 'd' 'd']
    """
    @_deprecate_positional_args
    def __init__(self,
                 n_splits=5,
                 *,
                 max_train_size=None
                 ):
        super().__init__(n_splits, shuffle=False, random_state=None)
        self.max_train_size = max_train_size

    def split(self, X, y=None, groups=None):
        """Generate indices to split data into training and test set.
        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data, where n_samples is the number of samples
            and n_features is the number of features.
        y : array-like of shape (n_samples,)
            Always ignored, exists for compatibility.
        groups : array-like of shape (n_samples,)
            Group labels for the samples used while splitting the dataset into
            train/test set.
        Yields
        ------
        train : ndarray
            The training set indices for that split.
        test : ndarray
            The testing set indices for that split.
        """
        if groups is None:
            raise ValueError(
                "The 'groups' parameter should not be None")
        X, y, groups = indexable(X, y, groups)
        n_samples = _num_samples(X)
        n_splits = self.n_splits
        n_folds = n_splits + 1
        group_dict = {}
        u, ind = np.unique(groups, return_index=True)
        unique_groups = u[np.argsort(ind)]
        n_samples = _num_samples(X)
        n_groups = _num_samples(unique_groups)
        for idx in np.arange(n_samples):
            if (groups[idx] in group_dict):
                group_dict[groups[idx]].append(idx)
            else:
                group_dict[groups[idx]] = [idx]
        if n_folds > n_groups:
            raise ValueError(
                ("Cannot have number of folds={0} greater than"
                 " the number of groups={1}").format(n_folds,
                                                     n_groups))
        group_test_size = n_groups // n_folds
        group_test_starts = range(n_groups - n_splits * group_test_size,
                                  n_groups, group_test_size)
        for group_test_start in group_test_starts:
            train_array = []
            test_array = []
            for train_group_idx in unique_groups[:group_test_start]:
                train_array_tmp = group_dict[train_group_idx]
                train_array = np.sort(np.unique(
                                      np.concatenate((train_array,
                                                      train_array_tmp)),
                                      axis=None), axis=None)
            train_end = train_array.size
            if self.max_train_size and self.max_train_size < train_end:
                train_array = train_array[train_end -
                                          self.max_train_size:train_end]
            for test_group_idx in unique_groups[group_test_start:
                                                group_test_start +
                                                group_test_size]:
                test_array_tmp = group_dict[test_group_idx]
                test_array = np.sort(np.unique(
                                              np.concatenate((test_array,
                                                              test_array_tmp)),
                                     axis=None), axis=None)
            yield [int(i) for i in train_array], [int(i) for i in test_array]
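
The class above requires a `groups` array with one label per row, which this notebook does not build. One possible choice, shown here as an assumption, is to label each candle with its calendar month so that a month never ends up split between train and test. The sketch uses made-up timezone-naive dates; timezone-aware dates may need the timezone removed first, for example with `dt.tz_localize(None)`.

```python
import pandas as pd

# Toy daily dates (hypothetical): 90 days starting on 2021-01-01.
dates = pd.Series(pd.date_range("2021-01-01", periods=90, freq="D"))

# One label per row, e.g. "2021-01"; rows from the same month share a group.
groups = dates.dt.to_period("M").astype(str).to_numpy()

group_sizes = pd.Series(groups).value_counts(sort=False)
print(group_sizes)
```

An array built this way could then be passed as `groups=` to the `split()` method defined above, provided there are at least `n_splits + 1` distinct months.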
