Add option to choose 'or' mode in TimeSeries.gaps() #629

Closed
hrzn opened this issue Dec 1, 2021 · 3 comments · Fixed by #1265

@hrzn
Contributor

hrzn commented Dec 1, 2021

Right now, for multivariate TimeSeries, a period only counts as a gap if all components are NaN. It should also be possible to treat periods where any component is NaN as a gap.

To be discussed with @pennfranc.
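
To make the difference between the two modes concrete, here is a minimal sketch on a plain pandas DataFrame standing in for the values of a two-component series. The column names and values are made up for illustration; this is not the darts implementation.

```python
import numpy as np
import pandas as pd

# Toy two-component series; names and values are illustrative assumptions.
df = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan, 4.0], "b": [1.0, 2.0, np.nan, 4.0]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# Current behaviour: a time step is a gap only if every component is NaN.
all_mask = df.isna().all(axis=1)
# Requested alternative: a time step is a gap if at least one component is NaN.
any_mask = df.isna().any(axis=1)

print(all_mask.tolist())  # [False, False, True, False]
print(any_mask.tolist())  # [False, True, True, False]
```

The second time step is where the two modes disagree: only component `a` is missing there.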

@hrzn hrzn created this issue from a note in darts (To do) Dec 1, 2021
@hrzn hrzn added improvement New feature or improvement good first issue Good for newcomers labels Dec 1, 2021
@ghost

ghost commented Dec 16, 2021

The code below adds the option to TimeSeries.gaps(). It also returns an empty DataFrame if no gaps are found. I did not include the code in a pull request because:

  • I did no proper testing, and
  • I did not investigate the implications for other parts of the library. In particular, the additional change that gaps() can return an empty pd.DataFrame (when there are no gaps, see issue "TimeSeries.gaps() not working if there are no gaps" #647) will throw an exception if you use it with darts.utils.missing_values.extract_subseries.

Hope the code can still be of value for somebody...

def gaps(self, mode: str = 'all') -> pd.DataFrame:
    """
    A function to compute and return gaps in the TimeSeries. Works only on deterministic time series (1 sample).

    Parameters
    ----------
    mode
        Only relevant for multivariate time series. The mode defines how gaps are defined. Set to
        'any' if periods where any value is NaN should be considered as gaps. 'all' will only
        consider periods where all values are NaN. Defaults to 'all'.

    Returns
    -------
    pd.DataFrame
        A pandas.DataFrame containing a row for every gap (consecutive rows of the underlying
        DataFrame that are NaN according to `mode`) in this time series. The DataFrame contains
        three columns that include the start and end time stamps of the gap and the integer length
        of the gap (in `self.freq` units if the series is indexed by a DatetimeIndex). It is empty
        if the series has no gaps.
    """

    df = self.pd_dataframe()

    if mode == 'all':
        is_nan_series = df.isna().all(axis=1).astype(int)
    elif mode == 'any':
        is_nan_series = df.isna().any(axis=1).astype(int)
    else:
        raise_log(ValueError(f"Keyword mode accepts only 'any' or 'all'. Provided {mode}"), logger)
    diff = pd.Series(np.diff(is_nan_series.values), index=is_nan_series.index[:-1])
    gap_starts = diff[diff == 1].index + self._freq
    gap_ends = diff[diff == -1].index

    if is_nan_series.iloc[0] == 1:
        gap_starts = gap_starts.insert(0, self.start_time())
    if is_nan_series.iloc[-1] == 1:
        gap_ends = gap_ends.insert(len(gap_ends), self.end_time())

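    if gap_starts.size == 0:
        # No gaps: return an empty frame that still has all three documented columns.
        return pd.DataFrame(columns=['gap_start', 'gap_end', 'gap_size'])

    def intvl(start, end):
        # Number of time steps in the gap, counting both end points.
        if self._has_datetime_index:
            return pd.date_range(start=start, end=end, freq=self._freq).size
        else:
            # Integer index: assumes self._freq is the integer step of the index.
            return (end - start) // self._freq + 1

    gap_df = pd.DataFrame({'gap_start': gap_starts, 'gap_end': gap_ends})
    gap_df['gap_size'] = gap_df.apply(
        lambda row: intvl(start=row.gap_start, end=row.gap_end), axis=1
    )

    return gap_df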
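
For anyone who wants to try the idea outside the library, the same edge-detection logic can be sketched on a plain pandas DataFrame. This is only a rough standalone illustration: `find_gaps` is a hypothetical helper name, not a darts function, and it assumes a regularly spaced index and measures `gap_size` in number of rows.

```python
import numpy as np
import pandas as pd


def find_gaps(df: pd.DataFrame, mode: str = "all") -> pd.DataFrame:
    """Hypothetical helper (not part of darts): locate NaN gaps in a DataFrame."""
    if mode not in ("all", "any"):
        raise ValueError(f"mode must be 'all' or 'any', got {mode!r}")
    isna = df.isna()
    is_nan = (isna.all(axis=1) if mode == "all" else isna.any(axis=1)).to_numpy()

    # Pad with False on both sides so gaps touching either end are detected too.
    padded = np.concatenate(([False], is_nan, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)     # position of the first NaN row of each gap
    ends = np.flatnonzero(edges == -1) - 1  # position of the last NaN row of each gap

    # With no gaps, this is an empty frame that still has all three columns.
    return pd.DataFrame(
        {
            "gap_start": df.index[starts],
            "gap_end": df.index[ends],
            "gap_size": ends - starts + 1,
        }
    )


# Toy data; values are made up for illustration.
df = pd.DataFrame(
    {"a": [np.nan, 1.0, np.nan, np.nan, 5.0], "b": [np.nan, 1.0, 2.0, np.nan, 5.0]},
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)
gaps_all = find_gaps(df, mode="all")  # gaps at 2021-01-01 and 2021-01-04
gaps_any = find_gaps(df, mode="any")  # gaps at 2021-01-01 and 2021-01-03..04
no_gaps = find_gaps(df.dropna())      # empty result, three columns
```

Padding the mask on both sides avoids the separate handling of gaps at the first and last time step that the method above needs.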

@hrzn
Contributor Author

hrzn commented Aug 15, 2022

Thank you @phubermi, and sorry for the late reaction. I think your message somehow slipped through.
Would you be willing to turn this into a PR?

@madtoinou madtoinou moved this from To do to In progress in darts Oct 5, 2022
@madtoinou madtoinou self-assigned this Oct 5, 2022
@madtoinou madtoinou mentioned this issue Oct 6, 2022
@madtoinou madtoinou moved this from In progress to In review in darts Oct 6, 2022
@mukkla

mukkla commented Oct 6, 2022

@madtoinou: Thanks a lot for taking care of it!

darts automation moved this from In review to Done Oct 13, 2022