Change method TSDataset.describe #409

Merged
merged 18 commits into from
Dec 28, 2021

Conversation

Mr-Geekman
Contributor

@Mr-Geekman Mr-Geekman commented Dec 22, 2021

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Proposed Changes

See #347

Related Issue

#347

Closing issues

Closes #347

@Mr-Geekman Mr-Geekman self-assigned this Dec 22, 2021
@Mr-Geekman Mr-Geekman added the enhancement New feature or request label Dec 22, 2021
@Mr-Geekman Mr-Geekman changed the title Add method describe to TSDataset Change method TSDataset.describe Dec 22, 2021
@codecov-commenter
codecov-commenter commented Dec 22, 2021

Codecov Report

Merging #409 (561cff2) into master (ecf872d) will decrease coverage by 0.14%.
The diff coverage is 69.56%.

@@            Coverage Diff             @@
##           master     #409      +/-   ##
==========================================
- Coverage   87.22%   87.08%   -0.15%     
==========================================
  Files          99       99              
  Lines        5003     5047      +44     
==========================================
+ Hits         4364     4395      +31     
- Misses        639      652      +13     
Impacted Files                Coverage Δ
etna/datasets/tsdataset.py    89.86% <69.56%> (-3.39%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Mr-Geekman Mr-Geekman marked this pull request as ready for review December 22, 2021 10:56
           start_date    end_date  length  num_missing  num_segments  num_exogs  num_regressors  freq
segments
segment_0  2021-06-01  2021-06-30      30            0             2          1               1     D
segment_1  2021-06-01  2021-06-30      30            0             2          1               1     D

Looks like there are only four segment-specific columns: start_date, end_date, num_missing, length; the other ones duplicate each other, don't they?
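
As a minimal sketch of this point, the example table above can be retyped as a plain pandas DataFrame and split into dataset-level and segment-level columns. The data below is copied by hand from this thread, not produced by the actual method, and the `dataset_level` list is an assumption about which columns describe the dataset as a whole.

```python
import pandas as pd

# Hand-typed copy of the example table from this thread (not real method output).
desc = pd.DataFrame(
    {
        "start_date": pd.to_datetime(["2021-06-01", "2021-06-01"]),
        "end_date": pd.to_datetime(["2021-06-30", "2021-06-30"]),
        "length": [30, 30],
        "num_missing": [0, 0],
        "num_segments": [2, 2],
        "num_exogs": [1, 1],
        "num_regressors": [1, 1],
        "freq": ["D", "D"],
    },
    index=pd.Index(["segment_0", "segment_1"], name="segments"),
)

# Assumption: these columns describe the dataset as a whole, so every row repeats them.
dataset_level = ["num_segments", "num_exogs", "num_regressors", "freq"]

# True for each dataset-level column that holds one value repeated across segments.
repeated = desc[dataset_level].nunique() == 1

# Whatever remains can genuinely vary from segment to segment.
segment_level = [col for col in desc.columns if col not in dataset_level]
print(segment_level)
```

In this particular table the two segments happen to be identical, so the segment-level columns also look constant; the difference is that they are not constant by definition.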

I think we should change *_date -> *_timestamp

Information about individual segments:
* start_timestamp: beginning of the segment, missing values in the beginning are ignored
* length: length according to start_date and end_date
* num_missing: number of missing variables between start_date and end_date

Suggested change
- * num_missing: number of missing variables between start_date and end_date
+ * num_missing: number of missing variables between start_timestamp and end_timestamp


Information about individual segments:
* start_timestamp: beginning of the segment, missing values in the beginning are ignored
* length: length according to start_date and end_date

Suggested change
- * length: length according to start_date and end_date
+ * length: length according to start_timestamp and end_timestamp

>>> ts = TSDataset(df_ts_format, df_exog=df_exog_ts_format, freq="D")
>>> ts.info()
<class 'etna.datasets.TSDataset'>
end_timestamp: 2021-06-30 00:00:00

Actually, end_timestamp can be different for different series.
I can run something like

ts = TSDataset(...)
ts.info()

and should get a valid result.
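
To illustrate the concern, here is a small pandas-only sketch. It does not use TSDataset at all; the wide frame (timestamps as rows, one column per segment) and its values are made up for the example. It shows that the last observed timestamp can differ between segments even though the index ends at a single common timestamp.

```python
import numpy as np
import pandas as pd

# Made-up data: segment_1 has no observations for the last two days.
index = pd.date_range("2021-06-01", periods=5, freq="D")
wide = pd.DataFrame(
    {
        "segment_0": [1.0, 2.0, 3.0, 4.0, 5.0],
        "segment_1": [1.0, 2.0, 3.0, np.nan, np.nan],
    },
    index=index,
)

# Last timestamp with an observed value, computed separately for each segment.
last_observed = pd.Series(
    {segment: wide[segment].last_valid_index() for segment in wide.columns}
)

# End of the shared index, which is identical for all segments.
common_end = wide.index.max()

print(last_observed)
print(common_end)
```

A single end_timestamp in the summary corresponds to `common_end`; a per-segment value would correspond to `last_observed`.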

* start_timestamp: beginning of the segment, missing values in the beginning are ignored
* end_timestamp: ending of the segment, missing values are not ignored, common for all segments
* length: length according to start_date and end_date
* num_missing: number of missing variables between start_date and end_date

Suggested change
- * num_missing: number of missing variables between start_date and end_date
+ * num_missing: number of missing variables between start_timestamp and end_timestamp

Method describes dataset in segment-wise fashion. Description columns:
* start_timestamp: beginning of the segment, missing values in the beginning are ignored
* end_timestamp: ending of the segment, missing values are not ignored, common for all segments
* length: length according to start_date and end_date

Suggested change
- * length: length according to start_date and end_date
+ * length: length according to start_timestamp and end_timestamp
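
The column definitions discussed above can be sketched in plain pandas as follows. This is not the code from this PR: `describe_segments` is a hypothetical helper, the input is assumed to be a wide frame with a regular, gap-free timestamp index and one column per segment, and each segment is assumed to contain at least one observed value.

```python
import numpy as np
import pandas as pd


def describe_segments(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-segment summary of a wide frame (timestamps x segments)."""
    # Common for all segments: trailing missing values are not ignored.
    end = wide.index.max()
    rows = {}
    for segment in wide.columns:
        series = wide[segment]
        # Leading missing values are ignored when looking for the start.
        start = series.first_valid_index()
        window = series.loc[start:end]
        rows[segment] = {
            "start_timestamp": start,
            "end_timestamp": end,
            # Number of index steps from start_timestamp to end_timestamp, inclusive.
            "length": len(window),
            # Missing values between start_timestamp and end_timestamp.
            "num_missing": int(window.isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index").rename_axis("segments")


# Made-up data for illustration.
index = pd.date_range("2021-06-01", periods=6, freq="D")
wide = pd.DataFrame(
    {
        "segment_0": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
        "segment_1": [np.nan, np.nan, 3.0, 4.0, 5.0, np.nan],
    },
    index=index,
)
result = describe_segments(wide)
print(result)
```

Here segment_1 starts two days later than segment_0, so its length is shorter, while the trailing missing value still counts toward num_missing because end_timestamp is shared.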

@julia-shenshina julia-shenshina merged commit f4d0a34 into master Dec 28, 2021
@martins0n martins0n deleted the issue-347 branch January 21, 2022 09:41
Development

Successfully merging this pull request may close these issues.

Add method describe to TSDataset
3 participants