Flexible data slicing #34

Firenze11 · 2019-09-16T05:13:18Z

Enable a flexible data slicing mechanism based on the following fields in state:
- isManualSegmentation: whether to apply manual (filter-based) or automatic (kmeans) data slicing
- baseCols: which columns to slice on (either by creating filters for these columns, or by feeding them to kmeans clustering)
- nClusters: number of clusters to use (only applicable to automatic slicing)
- segmentFilters: filter logic corresponding to each data segment (only applicable to manual slicing)
- segmentGroups: which segments to group together for comparison against each other
Enable automatic updating of all the fields above whenever one of them changes (usually when the user triggers an action that updates one of the fields)
- States form a dependency chain: {data, isManualSegmentation} -> {baseCols} -> {nClusters, segmentFilters} -> {segmentGroups}.
- Any change in an upstream state can cause changes in downstream states (e.g. a change in data will change baseCols if the current value of baseCols becomes invalid given the new value of data).
- Changes in downstream states don't affect upstream states (e.g. changing segmentGroups won't cause nClusters to become invalid).
- State update logic: if a state A changes, then for each of its downstream states B, check whether B has become invalid. If so, set B to its default based on the current upstream states; if it is still valid, leave its value unchanged. A change in B could in turn invalidate a further downstream state C, so we need to recurse on all downstream states.
- Created a helper function validateAndSetDefaultStatesConfigurator to execute the logic above.
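
The update logic above can be sketched roughly as follows. This is only an illustration, not the code in default-states.js: it assumes isValidFuncs and setDefaultFuncs are plain objects keyed by field name, that fields are listed in upstream-to-downstream order, and it uses made-up validity rules and an invented nCols field for the toy state.

```javascript
// Hypothetical sketch of the validate-then-default pass (not the real helper).
// Fields are visited upstream first, so a reset upstream value is already in
// place when a downstream field is validated.
function validateAndSetDefaultStatesConfigurator(fields, isValidFuncs, setDefaultFuncs) {
  return state =>
    fields.reduce(
      (current, field) =>
        isValidFuncs[field](current)
          ? current // still valid: keep the value untouched
          : {...current, [field]: setDefaultFuncs[field](current)}, // reset to default
      state
    );
}

// Toy validity rules, invented for this example only.
const isValidFuncs = {
  baseCols: s => s.baseCols.length > 0 && s.baseCols.every(id => id < s.nCols),
  segmentGroups: s =>
    s.segmentGroups.every(g => g.length > 0 && g.every(id => id < s.nClusters)),
};
const setDefaultFuncs = {
  baseCols: () => [0],
  segmentGroups: () => [[0], [1]],
};

const validateAndSetDefaultStates = validateAndSetDefaultStatesConfigurator(
  ['baseCols', 'segmentGroups'],
  isValidFuncs,
  setDefaultFuncs
);

const prevState = {nCols: 2, baseCols: [0, 5], nClusters: 2, segmentGroups: [[0], [3]]};
const nextState = validateAndSetDefaultStates(prevState);
// nextState.baseCols and nextState.segmentGroups are reset to their defaults;
// prevState itself is not mutated because every reset creates a new object.
```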

kenns29

Created a helper function validateAndSetDefaultStatesConfigurator to execute the logic above.

Please clarify one thing: where should the validateAndSetDefaultStatesConfigurator function be used? Since this function modifies the state, I suppose it shouldn't be used inside a reducer, but rather inside the components that modify the related states (isManualSegmentation, etc.), e.g. by triggering an action inside componentDidUpdate. If this is the case, you must be prepared for the complexity that this type of pattern introduces.

modules/manifold/src/utils/default-states.js

kenns29 · 2019-09-16T16:24:57Z

modules/manifold/src/utils/utils.js

@@ -388,6 +358,32 @@ export function zipObjects(arrays, joinField, rename) {
  });
 }

+export function product(collectionArr) {
+  let result = [];
+  function recur(collection) {


Does this have to be done recursively? Maybe a simple loop would do the job more efficiently.

There's no space or time complexity difference between the recursive and iterative approaches.
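
For comparison, here is what a loop-based version might look like. It assumes product is meant to return the Cartesian product of the input collections (the hunk above only shows the first lines of the function), and productIterative is a made-up name, not something in utils.js.

```javascript
// Hypothetical iterative Cartesian product; assumes each element of
// collectionArr is an array of candidate values.
function productIterative(collectionArr) {
  // Start with one empty combination and extend it one collection at a time.
  let result = [[]];
  for (const collection of collectionArr) {
    const next = [];
    for (const prefix of result) {
      for (const item of collection) {
        next.push([...prefix, item]);
      }
    }
    result = next;
  }
  return result;
}

const combos = productIterative([[1, 2], ['a', 'b']]);
// combos: [[1, 'a'], [1, 'b'], [2, 'a'], [2, 'b']]
```

Either way the output has one entry per combination, so the size of the result dominates the cost in both versions.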

jest.config.js

modules/manifold/src/constants/constants.js

gnavvy · 2019-09-16T17:18:13Z

modules/manifold/src/reducers/index.js

  // columns: array of array of columns; fields: array of field metadata
  data: {columns: [], fields: []},
-  // collumnTypeRanges: map from "column type" (x, yPred, etc) to array of 2 elements indicating the start and end index of that column type in dataset
+  // map from "column type" (x, yPred, etc) to array of 2 elements indicating the start and end index of that column type in dataset


If we use start and end indices here, are we assuming that columns of the same type will be provided in consecutive order? Or will we sort the columns for the user?

Yes, we are assuming the columns of the same type will be provided in consecutive order. We don't remove or insert columns (only replace or append) since we refer to columns by ids, and want to ensure the reference is still valid after replacing or appending.
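
To make the layout concrete, a small sketch of how such a range map could be used. The concrete ranges, the exclusive end index, and the helper name getColumnsByType are all assumptions for illustration; the thread does not say whether the end index is inclusive.

```javascript
// Assumed shape: [start, end) with an exclusive end index (not confirmed above).
const columnTypeRanges = {x: [0, 2], yPred: [2, 4], yTrue: [4, 5]};

// Hypothetical helper: columns of one type are a contiguous slice.
function getColumnsByType(columns, ranges, type) {
  const [start, end] = ranges[type];
  return columns.slice(start, end);
}

const columns = [['x0'], ['x1'], ['pred0'], ['pred1'], ['true0']];
const yPredColumns = getColumnsByType(columns, columnTypeRanges, 'yPred');
// yPredColumns: [['pred0'], ['pred1']]
```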

gnavvy · 2019-09-16T17:23:16Z

modules/manifold/src/selectors/adaptors.js

+    const nonGroup = segmentIds.filter(
+      e => !segmentGroups[0].includes(e) && !segmentGroups[1].includes(e)
+    );
+    return nonGroup.concat(segmentGroups[1]).concat(segmentGroups[0]);


shall we comment on the order here?
[nonGroup, segmentGroups[1], segmentGroups[0]]
why 1 and then 0?

Groups are returned in reverse order of priority: we assume the user cares most about segmentGroups[0], then segmentGroups[1], then nonGroup.
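
A quick worked example of the ordering, wrapping the lines from the hunk in a function so it can be run on its own. The name orderSegmentIds and the sample ids are made up for illustration.

```javascript
// Same logic as the hunk above, wrapped in a hypothetical function.
function orderSegmentIds(segmentIds, segmentGroups) {
  const nonGroup = segmentIds.filter(
    e => !segmentGroups[0].includes(e) && !segmentGroups[1].includes(e)
  );
  // Ungrouped segments first, then group 1, then group 0 last.
  return nonGroup.concat(segmentGroups[1]).concat(segmentGroups[0]);
}

const ordered = orderSegmentIds([0, 1, 2, 3, 4], [[1, 3], [0]]);
// ordered: [2, 4, 0, 1, 3]
```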

gnavvy · 2019-09-16T17:29:13Z

modules/manifold/src/utils/data-slicing.js

+  const clusteringInputDataset = gatherDataset(data, colIds);
+
+  const {columns} = clusteringInputDataset;
+  const clusterIds = computeClusters(columns, nClusters, true);


Not in this diff, but what do you think of using one object as the input parameter instead of three parameters? That way the order of the params won't matter and it is easier to infer the keys (e.g. what is the true for here?)

Sure. true was an indicator of whether the input shape is [nDataPoints, nFeatures] or [nFeatures, nDataPoints]. It was added as a quick fix when we did the refactoring to migrate to a column-based data format. Will update computeClusters so we don't have to use true.
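
One way the options-object idea could look, as a sketch only. prepareClusteringInput, transpose, and the isColumnMajor key are hypothetical names; the real computeClusters signature and the exact meaning of the boolean are not shown in this thread.

```javascript
// Hypothetical helper: flip [nFeatures, nDataPoints] into [nDataPoints, nFeatures].
function transpose(matrix) {
  if (matrix.length === 0) return [];
  return matrix[0].map((_, j) => matrix.map(row => row[j]));
}

// Named keys replace the positional boolean, so the call site documents itself.
function prepareClusteringInput({data, nClusters, isColumnMajor = true}) {
  const rows = isColumnMajor ? transpose(data) : data;
  return {rows, nClusters};
}

const input = prepareClusteringInput({
  data: [[1, 2, 3], [4, 5, 6]], // two feature columns, three data points
  nClusters: 2,
  isColumnMajor: true,
});
// input.rows: [[1, 4], [2, 5], [3, 6]]
```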

gnavvy · 2019-09-16T18:20:54Z

modules/manifold/src/utils/data-processor.js

      },
    ],
  };
 }

 /**
 * compute model score columns given yPred, yTrue columns
- * @param {Array<Array<Number|String>} yPred prediction columns, from all models and all classes
+ * @param {Array<Array<Number>} yPred prediction columns, from all models and all classes


could you explain this change?

It was a mistake before. Our documentation (https://github.com/uber/manifold#upload-csv-to-demo-app) states that values in yPred should always be numbers.

modules/manifold/src/utils/data-processor.js

gnavvy · 2019-09-16T18:30:43Z

modules/manifold/src/utils/default-states.js

+        segmentFilter.length === baseCols.length &&
+        segmentFilter.every((filter, i) => {
+          const colId = filter.key;
+          // each baseCol and their corresponding filter must be in the same order


Besides returning false, shall we provide (e.g. console.debug or assert) detailed info like this?
Easier to debug.

Was thinking about enabling some logging in validateAndSetDefaultStateSingle or validateAndSetDefaultStatesConfigurator when env === 'development'. Probably not in the isValid... functions themselves, since this type of invalidity is not an indicator of a bug but a legitimate invalidation caused by some other state changing.

gnavvy · 2019-09-16T18:34:27Z

modules/manifold/src/utils/default-states.js

+
+    case FEATURE_TYPE.CATEGORICAL:
+      return [[domain[0]], [domain[1]]];
+


how do we derive the default filter value for FEATURE_TYPE.FUNC?

You mean FILTER_TYPE.FUNC? This has not been implemented. Currently FEATURE_TYPE only has a few options https://github.com/uber/manifold/blob/master/modules/mlvis-common/src/constants/index.js#L14 and only CATEGORICAL and NUMERICAL are allowed to have corresponding filters.

modules/manifold/src/utils/default-states.js

gnavvy · 2019-09-16T18:36:25Z

modules/manifold/src/utils/default-states.js

+    return (
+      filterType === FILTER_TYPE.INCLUDE &&
+      // filter must consain a subset of column domain
+      value.length < domain.length &&


can value.length === domain.length?

No, we assume a filter should contain only a subset of original data.
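
A minimal sketch of the strict-subset rule for INCLUDE filters. Only the first two conditions appear in the hunk above; the membership check, the function name isValidIncludeFilter, and the placeholder FILTER_TYPE value are assumptions.

```javascript
// Placeholder constant for the example; the real FILTER_TYPE lives elsewhere.
const FILTER_TYPE = {INCLUDE: 'include'};

// Hypothetical check: an INCLUDE filter must pick a strict subset of the domain.
function isValidIncludeFilter(filter, domain) {
  const {type: filterType, value} = filter;
  return (
    filterType === FILTER_TYPE.INCLUDE &&
    // strictly fewer values than the domain, so the filter actually filters
    value.length < domain.length &&
    // assumed extra condition: every selected value exists in the domain
    value.every(v => domain.includes(v))
  );
}

const domain = ['a', 'b', 'c'];
const subsetOk = isValidIncludeFilter({type: FILTER_TYPE.INCLUDE, value: ['a', 'b']}, domain);
const fullDomain = isValidIncludeFilter({type: FILTER_TYPE.INCLUDE, value: ['a', 'b', 'c']}, domain);
// subsetOk is true; fullDomain is false because value.length === domain.length
```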

gnavvy · 2019-09-16T18:38:05Z

modules/manifold/src/utils/default-states.js

+ * @return {boolean}
+ */
+export const isValidSegmentFilterFromFieldDef = (filter, field) => {
+  const {type: filterType, value} = filter;


nit: shall we rename value to range in this function?

since filter.value could also be a function, let's not rename it to "range"

gnavvy · 2019-09-16T18:46:33Z

modules/manifold/src/utils/default-states.js

+  const nSegments = isManualSegmentation ? segmentFilters.length : nClusters;
+  for (let i = 0; i < segmentGroups.length; i++) {
+    if (!segmentGroups[i].length) {
+      return false;


let's provide more detail on why the isValidSegmentGroups check fails.

Added comments
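
For readers without the updated file, a sketch of what a commented check might look like. Only the nSegments line and the empty-group check come from the hunk above; the range and duplicate checks, and passing the fields as one object, are assumptions.

```javascript
// Hypothetical version with the failure reasons spelled out in comments.
function isValidSegmentGroups({isManualSegmentation, segmentFilters, nClusters, segmentGroups}) {
  const nSegments = isManualSegmentation ? segmentFilters.length : nClusters;
  const seen = new Set();
  for (let i = 0; i < segmentGroups.length; i++) {
    // invalid: a group with no segments has nothing to compare
    if (!segmentGroups[i].length) {
      return false;
    }
    for (const id of segmentGroups[i]) {
      // assumed: segment ids must refer to an existing segment
      if (!Number.isInteger(id) || id < 0 || id >= nSegments) {
        return false;
      }
      // assumed: a segment cannot appear twice across the groups
      if (seen.has(id)) {
        return false;
      }
      seen.add(id);
    }
  }
  return true;
}

const base = {isManualSegmentation: false, segmentFilters: [], nClusters: 4};
const okGroups = isValidSegmentGroups({...base, segmentGroups: [[0, 1], [2, 3]]});
const emptyGroup = isValidSegmentGroups({...base, segmentGroups: [[0], []]});
// okGroups is true; emptyGroup is false
```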

Firenze11 · 2019-09-16T18:51:34Z

Please clarify one thing: where should the validateAndSetDefaultStatesConfigurator function be used? Since this function modifies the state, I suppose it shouldn't be used inside a reducer, but rather inside the components that modify the related states (isManualSegmentation, etc.), e.g. by triggering an action inside componentDidUpdate. If this is the case, you must be prepared for the complexity that this type of pattern introduces.

validateAndSetDefaultStatesConfigurator doesn't mutate state, and should be used in the reducer. In particular, it should be used in action handlers such as handleUpdateFieldA, where updating fieldA could invalidate other fields in the current state (state.fieldB, state.fieldC):

const validateAndSetDefaultStates = validateAndSetDefaultStatesConfigurator(
    ['fieldB', 'fieldC'],
    isValidFuncs,
    setDefaultFuncs
);
const handleUpdateFieldA = (state, action) => {
    return validateAndSetDefaultStates({
        ...state,
        fieldA: action.payload,
    });
}

modules/manifold/src/utils/utils.js

Firenze11

Addressed comments



Firenze11 added 4 commits September 13, 2019 16:26

removed unnecessary states, renamed states

5ffde2c

flexible base cols

0087704

auto validation and set-default for all states when one state is changed

a919be1

fix lint and add inline docs

a30a9f6

kenns29 reviewed Sep 16, 2019


gnavvy reviewed Sep 16, 2019



Firenze11 commented Sep 16, 2019


respond to comments, removed legacy isValidSegmentGroup usage

857e5f4

Firenze11 changed the base branch from publish to master September 16, 2019 20:23

Firenze11 changed the base branch from master to publish September 16, 2019 20:24

Firenze11 mentioned this pull request Sep 16, 2019

Flexible data slicing logic #20

Closed

gnavvy approved these changes Sep 16, 2019


Firenze11 merged commit 125729c into publish Sep 16, 2019

Firenze11 mentioned this pull request Sep 16, 2019

combined pull request #38

Merged

Firenze11 added a commit that referenced this pull request Sep 16, 2019

Revert "Flexible data slicing (#34)"

7eb0711

This reverts commit 125729c.

Firenze11 mentioned this pull request Sep 28, 2019

Flexible slice ui #50

Merged
