Flatten nested fields #3676

domoritz · 2018-04-27T23:52:52Z

Always flattens nested fields. Flattening happens when data is parsed. This is similar to what we are doing already, since an expression always returns flattened data and we often generate expressions instead of using Vega's native format.parse.

Fixes #3369
Fixes #3368
Fixes #3251
Fixes #3441

The logic is now as follows: the input to any transform could be nested. After the transforms, all data is flat and should be accessed as such.

domoritz · 2018-04-28T04:30:20Z

@nyurik do you have a few test cases that I can try?

nyurik · 2018-04-28T06:03:25Z

Not readily available, no.

domoritz · 2018-04-29T04:42:49Z

In expressions, we require users to write the correct, non-nested, version.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "description": "Unable to truly filter data",
  "data": {
    "values":[
      {
        "rank": "1",
        "options": {
          "price": 10
        }
      },
      {
        "rank": "2",
        "options": {
          "price": 16 
        }
      },
      {
        "rank": "3",
        "options": {
          "price": 17 
        }
      } 
    ]
  },
  "transform": [{"filter": "datum['options.price'] > 10"}],
  "mark": "line",
  "encoding": {
    "x": {"field": "rank", "type": "ordinal"},
    "y": {"field": "options.price", "type": "quantitative"}
  }
}

However, for all other filters, aggregates, and so on where users specify a field, we automatically do the right thing.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "description": "Unable to truly filter data",
  "data": {
    "values":[
      {
        "rank": "1",
        "options": {
          "price": 10
        }
      },
      {
        "rank": "2",
        "options": {
          "price": 16 
        }
      },
      {
        "rank": "3",
        "options": {
          "price": 16
        }
      } 
    ]
  },
  "transform": [{"filter": {"field": "options.price", "equal": 16}}],
  "mark": "line",
  "encoding": {
    "x": {"field": "rank", "type": "ordinal"},
    "y": {"field": "options.price", "type": "quantitative"}
  }
}

kanitw · 2018-04-29T05:28:32Z

src/compile/data/formatparse.ts

@@ -74,6 +79,11 @@ export class ParseNode extends DataFlowNode {
            return;
          }
          parse[fieldDef.field] = 'number';
+        } else if (countAccessPath(fieldDef.field) > 1) {


countAccessPath is a pretty confusing name.

accessPathDepth?

Also why is this case checked after isTimeFieldDef / isNumberFieldDef?

What if a number field def has nested field?

Then we flatten the field already. Note that every expression flattens.
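
As an illustration of the check discussed in this thread, here is a minimal sketch of how the depth of a field's access path could be counted. The helper `splitPath` is hypothetical and only handles dot notation with `\.` escapes; the real implementation in the PR may rely on a library utility and also handle bracket notation.

```typescript
// Hypothetical helper: split a field string into path segments.
// An unescaped "." separates segments; "\." is a literal dot inside a key.
function splitPath(field: string): string[] {
  const segments: string[] = [];
  let current = '';
  for (let i = 0; i < field.length; i++) {
    const ch = field.charAt(i);
    if (ch === '\\' && field.charAt(i + 1) === '.') {
      current += '.'; // escaped dot stays inside the current segment
      i++; // skip the dot that was just consumed
    } else if (ch === '.') {
      segments.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  segments.push(current);
  return segments;
}

// Sketch of the depth check: a depth above 1 means the field is nested.
function accessPathDepth(field: string): number {
  return splitPath(field).length;
}
```

Under these assumptions, "options.price" has depth 2 and would take the flatten branch, while "rank" and the escaped "source\\.reco" both have depth 1.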

kanitw · 2018-04-29T05:41:00Z

In expressions, we require users to write the correct, non-nested, version.

Why is that the case? What if you never have an encoding that uses the field "options.price"?

Your example is correct because the encoding makes options.price get flattened so datum['options.price'] in the expression is fine.

But in general, isn't it more correct to filter by the nested expression?

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "description": "Unable to truly filter data",
  "data": {
    "values":[
      {
        "rank": "1",
        "options": {
          "price": 10
        }
      },
      {
        "rank": "2",
        "options": {
          "price": 16 
        }
      },
      {
        "rank": "3",
        "options": {
          "price": 17 
        }
      } 
    ]
  },
  "transform": [{"filter": "datum.options.price > 10"}],
  "mark": "point",
  "encoding": {
    "x": {"field": "rank", "type": "ordinal"}
  }
}

Note that what you do here is not literally flattening; rather, it augments each data point with a flattened field, while the old nested field still remains.
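
The distinction above can be sketched as follows. This is a rough illustration, not the PR's code: `addFlattenedField` and the `Row` type are hypothetical names, and the path is passed in already split into segments.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper: copy the value found at a nested path onto a flat key
// (the segments joined by "."), leaving the original nested object in place.
function addFlattenedField(datum: Row, path: string[]): Row {
  let value: unknown = datum;
  for (const key of path) {
    // Guard each step so a missing sub-object yields undefined instead of throwing.
    value = value != null && typeof value === 'object' ? (value as Row)[key] : undefined;
  }
  return {...datum, [path.join('.')]: value};
}

const row = addFlattenedField({rank: '1', options: {price: 10}}, ['options', 'price']);
// row now carries both the derived flat key "options.price" and the original "options" object.
```

Because the nested object survives, an expression such as datum.options.price and a field reference to "options.price" can both resolve on the same row.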

kanitw · 2018-04-29T05:42:46Z

examples/specs/test_subobject_missing.vl.json

@@ -0,0 +1,29 @@
+{
+  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
+  "description": "Unable to truly filter data",


This description is very confusing.

The description suggests that this spec does NOT work, but it seems to be working. So I don't understand what's the point you're trying to make here. (What is "truly filter" data?)

Ups, copy pasta.

domoritz · 2018-04-29T16:30:05Z

@kanitw The case you described is okay. Yes, you don't have to use the flattened field and there may be fields that we don't know how to flatten.

kanitw · 2018-04-29T16:32:04Z

Yes, you don't have to use the flattened field and there may be fields that we don't know how to flatten.

I don't understand this sentence. Can you explain more?

domoritz · 2018-04-29T16:35:05Z

The idea with this PR is that as long as you only use fields and no expressions, everything will just work even with nested fields because we automatically flatten them. However, if you use an expression, we don't parse them and thus you will be able to access the nested fields as well. This is okay and I don't think we want to prevent it.

kanitw · 2018-04-29T17:38:46Z

examples/compiled/test_subobject_nested.vg.json

@@ -84,17 +84,17 @@
            "transform": [
                {
                    "type": "formula",
-                    "expr": "toNumber(datum[\"source\"][\"reco\"])",
+                    "expr": "toNumber(datum[\"source\"] && datum[\"source\"][\"reco\"])",


If source is missing, what does this output?
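
A small sketch of what the guarded expression evaluates to when the sub-object is absent. `toNumberSketch` is a stand-in written for this example; it assumes Vega's toNumber maps null, undefined, and the empty string to null and coerces everything else, which should be verified against the Vega documentation.

```typescript
// Stand-in for Vega's toNumber (an assumption about its behaviour, not the
// library code): null, undefined and '' become null, anything else is coerced.
function toNumberSketch(value: unknown): number | null {
  return value == null || value === '' ? null : Number(value);
}

// Mirrors the generated expression:
//   toNumber(datum["source"] && datum["source"]["reco"])
function guardedReco(datum: {source?: {reco?: unknown}}): number | null {
  // If source is undefined, && short-circuits and yields undefined
  // instead of throwing on the property access.
  return toNumberSketch(datum.source && datum.source.reco);
}
```

Under that assumption, a row without source produces null rather than an error, and a row with source.reco set to "2" produces the number 2.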

kanitw · 2018-04-29T17:40:28Z

In #3369, there are links to many other related issues (mostly in the field nesting milestones).

Does this PR fix each of those? If so, list them so we can close them once we merge.

kanitw · 2018-04-29T23:11:15Z

src/compile/data/formatparse.ts

+        } else if (accessPathDepth(fieldDef.field) > 1) {
+          // For non-date/non-number (strings and booleans), derive a flattened field for a referenced nested field. 
+          // (Parsing numbers / dates already flatten numeric and temporal fields.)
+          parse[fieldDef.field] = 'flatten';


@domoritz Don't you have to add the same logic above for transform -- for all of aggregate, timeUnit, window, filter.

Side issue: I also wonder if the original logic above is behaving correctly for standalone timeUnit transform.
(the field in timeUnit won't get included in parse -- see line_timeunit_transform for example.)

Don't you have to add the same logic above for transform -- for all of aggregate, timeUnit, window, filter.

Hmm, yes. I'm surprised we don't do that already. That seems like an orthogonal bug, doesn't it?

Side issue: I also wonder if the original logic above is behaving correctly for standalone timeUnit transform.

It looks correct, although we parse the generated date as a date again (which is not necessary, but we would need to track fields to remove it).

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "transform": [{"timeUnit": "month", "field": "foo.date", "as": "month"}],
  "encoding": {
    "x": {"field": "month", "type": "temporal"}
  }
}

Vega

{
  "name": "source_0",
  "url": "data/seattle-weather.csv",
  "format": {"type": "csv"},
  "transform": [
    {
      "type": "formula",
      "as": "month",
      "expr": "datetime(0, month(datum[\"foo\"] && datum[\"foo\"][\"date\"]), 1, 0, 0, 0, 0)"
    },
    {
      "type": "formula",
      "expr": "toDate(datum[\"month\"])",
      "as": "month"
    },
    {
      "type": "filter",
      "expr": "datum[\"month\"] !== null && !isNaN(datum[\"month\"])"
    }
  ]
}

kanitw · 2018-04-29T23:59:34Z

@domoritz Don't you have to add the same logic above for transform -- for all of aggregate, timeUnit, window, filter.

Hmm, yes. I'm surprised we don't do that already. That seems like an orthogonal bug, doesn't it?

It's not orthogonal, since you didn't add the new nested-field logic (this PR's concern) to the code above. This means that a nested "field" in aggregate, timeUnit, filter, window, or bin may not work?

(It might, but you better check)

domoritz · 2018-04-30T00:02:10Z

domoritz · 2018-04-30T01:19:17Z

To test filter

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "values": {
      "hits": {
        "hits": [
          {"source": {"reco": 2, "yes": 1}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 2, "yes": 0}},
          {"source": {"reco": 1, "yes": 3}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 0}},
          {"source": {"reco": 1, "yes": 1}}
        ]
      }
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [{
    "filter": {
      "field": "source.reco",
      "equal": 1
    }
  }],
  "mark": "point",
  "encoding": {
    "x": {"field": "source.reco", "type": "ordinal"},
    "y": {"field": "source.yes", "type": "ordinal"}
  }
}

To test calculate

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "values": {
      "hits": {
        "hits": [
          {"source": {"reco": 2, "yes": 1}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 2, "yes": 0}},
          {"source": {"reco": 1, "yes": 3}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 0}},
          {"source": {"reco": 1, "yes": 1}}
        ]
      }
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [{
    "calculate": "datum.source.reco",
    "as": "reco"
  }],
  "mark": "point",
  "encoding": {
    "x": {"field": "reco", "type": "ordinal"},
    "y": {"field": "source.yes", "type": "ordinal"}
  }
}

to test aggregate

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "values": {
      "hits": {
        "hits": [
          {"source": {"reco": 2, "yes": 1}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 2, "yes": 0}},
          {"source": {"reco": 1, "yes": 3}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 0}},
          {"source": {"reco": 1, "yes": 1}}
        ]
      }
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [{
    "aggregate": [{
      "op": "mean",
      "field": "source.reco",
      "as": "mean_reco"
    }],
    "groupby": ["source.yes"]
  }],
  "mark": "point",
  "encoding": {
    "x": {"field": "mean_reco", "type": "ordinal"},
    "y": {"field": "source.yes", "type": "ordinal"}
  }
}

to test bin

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "values": {
      "hits": {
        "hits": [
          {"source": {"reco": 2, "yes": 1}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 2, "yes": 0}},
          {"source": {"reco": 1, "yes": 3}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 0}},
          {"source": {"reco": 1, "yes": 1}}
        ]
      }
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [{
    "bin": true,
    "field": "source.reco",
    "as": "reco_binned"
  }],
  "mark": "point",
  "encoding": {
    "x": {"field": "reco_binned", "type": "ordinal"},
    "y": {"field": "source.yes", "type": "ordinal"}
  }
}

to test window

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "values": {
      "hits": {
        "hits": [
          {"source": {"reco": 2, "yes": 1}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 2, "yes": 0}},
          {"source": {"reco": 1, "yes": 3}},
          {"source": {"reco": 3, "yes": 4}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 1}},
          {"source": {"reco": 1, "yes": 0}},
          {"source": {"reco": 1, "yes": 1}}
        ]
      }
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [{
    "window": [{
      "op": "mean",
      "field": "source.reco",
      "as": "mean_reco"
    }],
    "groupby": ["source.yes"]
  }],
  "mark": "point",
  "encoding": {
    "x": {"field": "mean_reco", "type": "ordinal"},
    "y": {"field": "source.yes", "type": "ordinal"}
  }
}

Test escaped dots

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "values": {
      "hits": {
        "hits": [
          {"source.reco": 2 ,"source.yes": 1},
          {"source.reco": 3 ,"source.yes": 4},
          {"source.reco": 2 ,"source.yes": 0},
          {"source.reco": 1 ,"source.yes": 3},
          {"source.reco": 3 ,"source.yes": 4},
          {"source.reco": 1 ,"source.yes": 1},
          {"source.reco": 1 ,"source.yes": 1},
          {"source.reco": 1 ,"source.yes": 1},
          {"source.reco": 1 ,"source.yes": 0},
          {"source.reco": 1 ,"source.yes": 1}
        ]
      }
    },
    "format": {"property": "hits.hits"}
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "source\\.reco", "type": "ordinal"},
    "y": {"field": "source\\.yes", "type": "ordinal"}
  }
}

domoritz changed the title ~~Flatten nested fields. Fixes #3369~~ Flatten nested fields Apr 28, 2018

domoritz changed the title ~~Flatten nested fields~~ [WIP] Flatten nested fields Apr 28, 2018

Flatten nested fields. Fixes #3369

6fdd02d

domoritz force-pushed the dom/flatten branch from 3e9640c to 6fdd02d Compare April 28, 2018 00:34

domoritz mentioned this pull request Apr 28, 2018

Unable to filter out null objects if referenced by encoding #3441

Closed

Add datum to access path. Fixes #3441

1593760

domoritz force-pushed the dom/flatten branch from 4bbae85 to 27b150e Compare April 28, 2018 04:14

domoritz requested a review from kanitw April 28, 2018 04:19

This was referenced Apr 29, 2018

Nested data accesses break output #3368

Closed

Incorrect processing of tooltip vega/vega-parser#64

Closed

Only add parse for non-nested fields. Always flatten nested fields.

512172a

domoritz force-pushed the dom/flatten branch from 27b150e to 09ba777 Compare April 29, 2018 04:34

domoritz changed the title ~~[WIP] Flatten nested fields~~ Flatten nested fields Apr 29, 2018

domoritz force-pushed the dom/flatten branch from e5befd1 to 9c4cbe9 Compare April 29, 2018 05:25

kanitw reviewed Apr 29, 2018

domoritz force-pushed the dom/flatten branch from 23c3c6a to 2b50edf Compare April 29, 2018 16:44

kanitw reviewed Apr 29, 2018

domoritz force-pushed the dom/flatten branch from 8bf93e5 to e40978e Compare April 29, 2018 23:49

domoritz changed the title ~~Flatten nested fields~~ [WIP] Flatten nested fields Apr 30, 2018

domoritz force-pushed the dom/flatten branch from e60708a to 39bf57f Compare April 30, 2018 03:46

domoritz changed the title ~~[WIP] Flatten nested fields~~ Flatten nested fields Apr 30, 2018

domoritz force-pushed the dom/flatten branch 2 times, most recently from 2fd6033 to a855ad9 Compare April 30, 2018 05:59

domoritz added 3 commits April 30, 2018 15:00

Add example with missing subobject. Add more tests to formatparse. Al…

d382fb2

…ways access flattened fields after transforms.

Rebuild examples

b2cffed

Update and simplify CO2 example

73c8e49

domoritz force-pushed the dom/flatten branch from d4dc47f to 73c8e49 Compare April 30, 2018 22:04

kanitw approved these changes May 1, 2018

domoritz merged commit 73c8e49 into master May 3, 2018

domoritz deleted the dom/flatten branch May 3, 2018 16:05

nyurik mentioned this pull request Jun 5, 2018

Faceting a Plot into a Trellis Plot doesn't work in kibana vega for parent-child type filed elastic/kibana#19642

Closed

domoritz mentioned this pull request Aug 28, 2019

Selections are not working with nested data #5334

Closed

arvind mentioned this pull request Sep 9, 2019

Selection fields are flattened, so should be directly accessed. vega/vega#2002

Closed
