Features/indexing #150

grll · 2020-07-16T08:53:26Z

Fixes #DARTS-95.

Summary

Add better indexing support and column names to TimeSeries.
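To make the summary concrete, here is a minimal sketch of what indexing by column name, position, and time can look like on top of a DataFrame. It uses .loc for labels (column names, timestamps) and .iloc for positions, in the spirit of the later "refactor indexing only based on loc and iloc" commit. MiniSeries is a hypothetical stand-in, not the darts TimeSeries API, and the key-type dispatch rules are assumptions for illustration.

```python
import numpy as np
import pandas as pd


class MiniSeries:
    """Hypothetical stand-in for TimeSeries: wraps a DataFrame whose index is a
    DatetimeIndex, to illustrate indexing by column name, position, or time."""

    def __init__(self, df: pd.DataFrame):
        self._df = df.sort_index()  # sorted copy; the caller's df is untouched

    @property
    def columns(self) -> list:
        return self._df.columns.tolist()

    @property
    def values(self) -> np.ndarray:
        return self._df.to_numpy()

    def __getitem__(self, key) -> "MiniSeries":
        # Column names (str or list of str): label-based selection on columns
        if isinstance(key, str):
            return MiniSeries(self._df.loc[:, [key]])
        if isinstance(key, list) and all(isinstance(k, str) for k in key):
            return MiniSeries(self._df.loc[:, key])
        # Integer position or integer slice: positional selection on rows
        if isinstance(key, int):
            return MiniSeries(self._df.iloc[[key], :])
        if isinstance(key, slice) and all(
                b is None or isinstance(b, int) for b in (key.start, key.stop, key.step)):
            return MiniSeries(self._df.iloc[key, :])
        # Timestamp: label-based selection on the time index
        if isinstance(key, pd.Timestamp):
            return MiniSeries(self._df.loc[[key], :])
        raise TypeError("unsupported key type: {}".format(type(key).__name__))


times = pd.date_range("2020-01-01", periods=4, freq="D")
s = MiniSeries(pd.DataFrame(np.arange(8).reshape(4, 2), index=times, columns=["a", "b"]))

print(s["a"].columns)                        # ['a']
print(s[1:3].values.shape)                   # (2, 2)
print(s[pd.Timestamp("2020-01-02")].values)  # [[2 3]]
```

Treating a bare str as a column name (rather than a date string) is a design choice made here to keep the dispatch unambiguous.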

Other Information

darts/timeseries.py

Co-authored-by: Julien Herzen <julien.herzen@unit8.co>

hrzn · 2020-07-17T19:55:18Z

darts/timeseries.py

+                columns = [columns]
+            raise_if_not(len(columns) == df.shape[1], "`{}` columns specified when `{}` was expected from `df` "
+                                                      "shape".format(len(columns), df.shape[1]))
+            df.columns = columns


Here and in line 64 we are changing the DataFrame provided by the user. I think we should not touch it, but rather do everything on the "internal" dataframe df_, which is a copy of df created on line 74.
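A minimal sketch of the suggested fix, assuming a hypothetical helper with_columns(): it applies the same length check as the diff above, but renames an explicit internal copy df_ instead of the user's DataFrame. The darts-internal raise_if_not is swapped for a plain ValueError so the snippet runs on its own.

```python
from typing import List, Optional, Union

import pandas as pd


def with_columns(df: pd.DataFrame,
                 columns: Optional[Union[List[str], str]] = None) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `df` with `columns` applied."""
    df_ = df.copy()  # explicit copy: nothing below touches the caller's df
    if columns is None:
        return df_
    if isinstance(columns, str):
        columns = [columns]  # univariate shorthand
    if len(columns) != df_.shape[1]:
        raise ValueError("`{}` columns specified when `{}` was expected from `df` "
                         "shape".format(len(columns), df_.shape[1]))
    df_.columns = columns  # safe: only the internal copy is renamed
    return df_


user_df = pd.DataFrame({0: [1.0, 2.0]})
internal = with_columns(user_df, "price")
print(user_df.columns.tolist())   # [0]
print(internal.columns.tolist())  # ['price']
```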

Ah, indeed, I agree.

By the way, relying on sort_index() to create the copy of the df might make it unclear that it is a copy.

opinions @TheMP @pennfranc @Droxef ?

Yes, actually this made me notice a problem we currently have, I think. In line 62 you change the columns of the original data frame, right? This means that the data frame the user provides gets modified.

@pennfranc You are right; I modified this behavior in the latest commits. My question was more about using sort_index() to return a new DF: I was personally not sure whether it returns a copy or a view in that case.

Yes, good point. I think a comment clarifying at which point we create the copy of the df would be nice.

+1 for a comment to make it clear. Another option would be to split it into two explicit steps: df.copy() first, and then df_.sort_index(inplace=True).
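The two-step option can be sketched as follows (user_df and df_ are illustrative names). The explicit copy() documents where the copy happens, so readers do not have to know that sort_index() with the default inplace=False returns a new object.

```python
import pandas as pd

user_df = pd.DataFrame(
    {"a": [3.0, 1.0, 2.0]},
    index=pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"]),
)

# Step 1: explicit copy, so it is obvious that df_ is independent of user_df.
df_ = user_df.copy()
# Step 2: sort the internal copy in place.
df_.sort_index(inplace=True)

# Mutating the internal copy does not affect the caller's DataFrame.
df_.columns = ["renamed"]
print(user_df.index[0])           # 2020-01-03 00:00:00 (still unsorted)
print(user_df.columns.tolist())   # ['a']
print(df_["renamed"].tolist())    # [1.0, 2.0, 3.0]
```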

darts/timeseries.py

pennfranc · 2020-07-22T11:57:01Z

darts/timeseries.py

@@ -541,7 +567,8 @@ def from_dataframe(df: pd.DataFrame,
    def from_times_and_values(times: pd.DatetimeIndex,
                              values: np.ndarray,
                              freq: Optional[str] = None,
-                              fill_missing_dates: Optional[bool] = True) -> 'TimeSeries':
+                              fill_missing_dates: Optional[bool] = True,
+                              column: Optional[str] = None,) -> 'TimeSeries':


This setup with the column parameter assumes that we are creating a univariate time series, right? But technically we can also use this function to create a multivariate time series by passing a 2-dimensional numpy array as values. So we could just change the type to also allow a list of strings, to support multivariate time series creation, right?

You are right; I would have to check that everything works with a 2D array.

I think you'll have to use an argument similar to the init method: columns: Optional[Union[List[str], str]].

I changed it to rely more on pandas by accepting a pd._typing.Axes as the type of the columns parameter. It's up to the user to provide either a list of column names or a pandas Index.
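A minimal sketch of the DataFrame-building step discussed in this thread, assuming a hypothetical function df_from_times_and_values(): 1D values become a single column, 2D values give one column per series, and columns may be a str, a list of str, or a pd.Index. It deliberately avoids importing the private pd._typing.Axes and lets the pd.DataFrame constructor reject a column count that does not match the values.

```python
from typing import Optional, Sequence, Union

import numpy as np
import pandas as pd


def df_from_times_and_values(times: pd.DatetimeIndex,
                             values: np.ndarray,
                             columns: Optional[Union[str, Sequence[str], pd.Index]] = None
                             ) -> pd.DataFrame:
    """Hypothetical sketch, not the darts implementation."""
    values = np.asarray(values)
    if values.ndim == 1:
        values = values.reshape(-1, 1)  # univariate: a single column
    if isinstance(columns, str):
        columns = [columns]  # a single name for a univariate series
    # pandas raises ValueError if len(columns) != values.shape[1]
    return pd.DataFrame(values, index=times, columns=columns)


times = pd.date_range("2020-01-01", periods=3, freq="D")
uni = df_from_times_and_values(times, np.array([1.0, 2.0, 3.0]), "y")
multi = df_from_times_and_values(times, np.ones((3, 2)), pd.Index(["x", "y"]))
print(uni.columns.tolist(), multi.columns.tolist())  # ['y'] ['x', 'y']
```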

hrzn · 2020-07-22T18:35:59Z

darts/timeseries.py

@@ -21,7 +21,8 @@ class TimeSeries:
    def __init__(self,
                 df: pd.DataFrame,
                 freq: Optional[str] = None,
-                 fill_missing_dates: Optional[bool] = True):
+                 fill_missing_dates: Optional[bool] = True,
+                 columns: Optional[Union[List[str], str]] = None):


I'm still not convinced that we need this argument in the init method. I think it's enough to simply re-use the column names of the dataframe, which would simplify everything, for instance the checks that column names are unique and of the correct size. Pandas can take care of that, and removing these checks from darts would improve separation of concerns IMO. It would still be good to keep this argument in the from_times_and_values() method, though, in order to build a DF with the correct column names before building the TimeSeries.
WDYT?

@grll what was the final decision regarding this comment from @hrzn?

I agree that separation of concerns is good, but we still need to enforce a few things that are not guaranteed by pd.DataFrame, such as uniqueness and str format. In the latest commit I created a private method to do that, since it is used in a couple of places.


grll added 2 commits July 15, 2020 18:03

add support for columns to the TimeSeries object

425a479

add colum support indexing to timeseries

beb6432

grll requested review from hrzn and TheMP as code owners July 16, 2020 08:53

fix wrong docstring

20064aa

Kostiiii reviewed Jul 16, 2020

darts/timeseries.py (outdated)

pennfranc reviewed Jul 16, 2020

darts/timeseries.py (outdated)

refactor indexing, fix docstring, columns as last arg

8c9f224

grll commented Jul 16, 2020

darts/timeseries.py

clean indexing method

a8021ed

hrzn reviewed Jul 16, 2020

darts/timeseries.py (outdated)

darts/timeseries.py (outdated)

grll and others added 2 commits July 16, 2020 15:54

refactor indexing only based on loc and iloc

4bde24b

Update darts/timeseries.py

f9d89a8

Co-authored-by: Julien Herzen <julien.herzen@unit8.co>

hrzn reviewed Jul 17, 2020

darts/timeseries.py

darts/timeseries.py (outdated)

darts/timeseries.py (outdated)

darts/timeseries.py

grll and others added 7 commits July 20, 2020 11:25

use underlying columns by default

9c0c9e6

fix column added on intern _df and use self.freq_str

cd6df5d

fix parameter position in from_times_and_values

6bef192

fix the tests to use str columns

fb8b78d

fix docstring timeseries

9ad5c46

remove None check on df that should exists

1cde216

Merge branch 'develop' into features/indexing

66105c7

pennfranc reviewed Jul 22, 2020

hrzn reviewed Jul 22, 2020

Merge branch 'develop' into features/indexing

8ddc228

TheMP approved these changes Jul 28, 2020

grll and others added 4 commits July 28, 2020 16:10

add comment for clarifying that _df is a copy

cade004

add separate function to process columns

38635ab

Merge branch 'features/indexing' of github.com:unit8co/darts into features/int-indexing

f96f169

Merge branch 'develop' into features/indexing

28173d1

grll and others added 3 commits July 30, 2020 15:31

adapt map with str col indexing

b515be0

Merge branch 'develop' into features/indexing

1a2189b

Merge branch 'develop' into features/indexing

4261a1f

grll merged commit 0d9b394 into develop Aug 3, 2020

LeoTafti deleted the features/indexing branch October 15, 2020 08:04
