Skip to content

Conversation

@daavoo
Copy link
Contributor

@daavoo daavoo commented Nov 29, 2021

  • TabularData: Add ignore_empty flag to drop_duplicates

  • TabularData: Fix drop_duplicates for rich_text NaN

@daavoo daavoo requested a review from a team as a code owner November 29, 2021 21:42
@daavoo daavoo requested a review from karajan1001 November 29, 2021 21:42
@daavoo daavoo force-pushed the drop-duplicates-nan branch from 70769af to b2c7623 Compare December 7, 2021 19:52
@daavoo daavoo changed the title TabularData: Fix drop_duplicates for rich_text NaN. TabularData: Add nan_is_value flag to drop_duplicates. Dec 7, 2021
@pmrowla
Copy link
Contributor

pmrowla commented Dec 17, 2021

If the problem here is that TabularData is being filled with a mix of None and "-" strings (i.e. table._fill_value), this seems like a bug that needs to be fixed in whatever is generating the TabularData, and not in specific UI functions that filter/render TabularData. (It seems like this will probably also show up in places other than drop_duplicates at some point)

TabularData should either

  • always store None values as None (and replace them with _fill_value at the time they are rendered to UI)
  • always replace None values with _fill_value at the time the value is added to the table/row/cell.

Looking at the code it looks like the 2nd option is what is supposed to be happening (@skshetry?), so my understanding is that drop_duplicates should never be encountering None values in the first place. (In which case this needs to be fixed somewhere else, presumably somewhere in exp show?)

@daavoo
Copy link
Contributor Author

daavoo commented Dec 17, 2021

If the problem here is that TabularData is being filled with a mix of None and "-" strings (i.e. table._fill_value)

That's not the problem.

The table is being filled with a mix of _fill_value of type string and _fill_value of type rich.Text (from show_experiments).

This is addressed in the first commit b09d353

and not in specific UI functions that filter/render TabularData

The nan_is_value flag for drop_duplicates is not fixing any problem but allowing to configure 2 valid behaviors that depend on the use case.

drop_duplicates should remove columns where all rows have the same value. There is an ambiguity on how to treat NaNs, i.e the following column:

foo
foo
foo
None

For some use cases, could be considered all duplicates because None it's not considered a relevant value. For others, None is considered a relevant value and thus this column should not be dropped.

always store None values as None (and replace them with _fill_value at the time they are rendered to UI)

I prefer this option. Was trying to not introduce many changes here but it's probably the way to go.

@daavoo daavoo marked this pull request as draft December 17, 2021 10:47
@pmrowla
Copy link
Contributor

pmrowla commented Dec 17, 2021

For some use cases, could be considered all duplicates because None it's not considered a relevant value. For others, None is considered a relevant value and thus this column should not be dropped.

Ah ok, that makes sense then.

always store None values as None (and replace them with _fill_value at the time they are rendered to UI)

I prefer this option. Was trying to not introduce many changes here but it's probably the way to go.

After reading your explanation, I think this PR is probably fine for now, maybe just open a separate issue regarding making TabularData store None types instead of the fill strings.


As a side note, I would say that NaN is not the same thing as a None/null (even if python doesnt have native types for nan) and would maybe name this something else like ignore_none/ignore_empty/skip_empty (to skip/ignore empty cells when determining if the value changed).

@daavoo daavoo force-pushed the drop-duplicates-nan branch from b2c7623 to 7c2c194 Compare December 17, 2021 16:16
When filling missing values with `ui.rich_text` (i.e. in experiments show CMD), those values were not being correctly matched against `self._fill_value`)
@daavoo daavoo force-pushed the drop-duplicates-nan branch from 7c2c194 to c37db07 Compare December 17, 2021 16:47
@daavoo daavoo changed the title TabularData: Add nan_is_value flag to drop_duplicates. TabularData: Add ignore_empty flag to drop_duplicates. Dec 17, 2021
@daavoo daavoo marked this pull request as ready for review December 17, 2021 16:48
@daavoo daavoo requested a review from karajan1001 December 20, 2021 17:43
Configures whether to consider missing values as relevant or not.
@daavoo daavoo force-pushed the drop-duplicates-nan branch from c37db07 to 251b931 Compare December 27, 2021 10:40
@daavoo daavoo merged commit 9a5614d into main Dec 29, 2021
@daavoo daavoo deleted the drop-duplicates-nan branch December 29, 2021 20:33
@efiop efiop added the ui user interface / interaction label Jan 14, 2022
@daavoo daavoo added the skip-changelog Skips changelog label Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-changelog Skips changelog ui user interface / interaction

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants