-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flatten nested fields #3676
Flatten nested fields #3676
Conversation
@nyurik do you have a few test cases that I can try? |
Not readily available, no. |
In expressions, we require users to write the correct, non-nested, version. {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"description": "Unable to truly filter data",
"data": {
"values":[
{
"rank": "1",
"options": {
"price": 10
}
},
{
"rank": "2",
"options": {
"price": 16
}
},
{
"rank": "3",
"options": {
"price": 17
}
}
]
},
"transform": [{"filter": "datum['options.price'] > 10"}],
"mark": "line",
"encoding": {
"x": {"field": "rank", "type": "ordinal"},
"y": {"field": "options.price", "type": "quantitative"}
}
} However, for all other filters, aggregates, and so on where users specify a field, we automatically o the right thing. {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"description": "Unable to truly filter data",
"data": {
"values":[
{
"rank": "1",
"options": {
"price": 10
}
},
{
"rank": "2",
"options": {
"price": 16
}
},
{
"rank": "3",
"options": {
"price": 16
}
}
]
},
"transform": [{"filter": {"field": "options.price", "equal": 16}}],
"mark": "line",
"encoding": {
"x": {"field": "rank", "type": "ordinal"},
"y": {"field": "options.price", "type": "quantitative"}
}
} |
src/compile/data/formatparse.ts
Outdated
@@ -74,6 +79,11 @@ export class ParseNode extends DataFlowNode { | |||
return; | |||
} | |||
parse[fieldDef.field] = 'number'; | |||
} else if (countAccessPath(fieldDef.field) > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
countAccessPath is a pretty confusing name.
accessPathDepth
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also why is this case checked after isTimeFieldDef
/ isNumberFieldDef
?
What if a number field def has nested field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we flatten the field already. Note that every expression flattens.
Why is that the case? What if you never have an encoding that uses the field Your example is correct because the encoding makes but in general, isn't it more correct to filter by the nested expression?
Note that what you do here is not literally flattening, but rather augment each data point with a flattened field, while the old nested field still remains. |
@@ -0,0 +1,29 @@ | |||
{ | |||
"$schema": "https://vega.github.io/schema/vega-lite/v2.json", | |||
"description": "Unable to truly filter data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is very confusing.
The description suggests that this spec does NOT work, but it seems to be working. So I don't understand what's the point you're trying to make here. (What is "truly filter" data?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, copy pasta.
@kanitw The case you described is okay. Yes, you don't have to use the flattened field and there may be fields that we don't know how to flatten. |
I don't understand this sentence. Can you explain more? |
The idea with this PR is that as long as you only use fields and no expressions, everything will just work even with nested fields because we automatically flatten them. However, if you use an expression, we don't parse them and thus you will be able to access the nested fields as well. This is okay and I don't think we want to prevent it. |
@@ -84,17 +84,17 @@ | |||
"transform": [ | |||
{ | |||
"type": "formula", | |||
"expr": "toNumber(datum[\"source\"][\"reco\"])", | |||
"expr": "toNumber(datum[\"source\"] && datum[\"source\"][\"reco\"])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if source is missing, what this does output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #3369, there are links to many other related issues (mostly in the field nesting milestones). Does this PR fix each of those, if so, listed them so we can close once we merge? |
src/compile/data/formatparse.ts
Outdated
} else if (accessPathDepth(fieldDef.field) > 1) { | ||
// For non-date/non-number (strings and booleans), derive a flattened field for a referenced nested field. | ||
// (Parsing numbers / dates already flatten numeric and temporal fields.) | ||
parse[fieldDef.field] = 'flatten'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@domoritz Don't you have to add the same logic above for transform
-- for all of aggregate
, timeUnit
, window
, filter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side issue: I also wonder if the original logic above is behaving correctly for standalone timeUnit transform.
(the field in timeUnit
won't get included in parse -- see line_timeunit_transform
for example.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you have to add the same logic above for transform -- for all of aggregate, timeUnit, window, filter.
Hmm, yes. I'm surprised we don't do that already. That seems like an orthogonal bug, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side issue: I also wonder if the original logic above is behaving correctly for standalone timeUnit transform.
It looks correct although we parse the generates date as date again (which is not necessary but we need to track fields to remove it).
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {"url": "data/seattle-weather.csv"},
"mark": "point",
"transform": [{
"timeUnit": "month",
"field": "foo.date",
"as": "month"
}],
"encoding": {
"x": {
"field": "month",
"type": "temporal"
}
}
}
Vega
{
"name": "source_0",
"url": "data/seattle-weather.csv",
"format": {"type": "csv"},
"transform": [
{
"type": "formula",
"as": "month",
"expr": "datetime(0, month(datum[\"foo\"] && datum[\"foo\"][\"date\"]), 1, 0, 0, 0, 0)"
},
{
"type": "formula",
"expr": "toDate(datum[\"month\"])",
"as": "month"
},
{
"type": "filter",
"expr": "datum[\"month\"] !== null && !isNaN(datum[\"month\"])"
}
]
}
It's not orthogonal since you didn't add the new nested field logic (this PR's concern) to above? This means that nested (It might, but you better check) |
test
|
To test filter {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"values": {
"hits": {
"hits": [
{"source": {"reco": 2, "yes": 1}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 2, "yes": 0}},
{"source": {"reco": 1, "yes": 3}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 0}},
{"source": {"reco": 1, "yes": 1}}
]
}
},
"format": {"property": "hits.hits"}
},
"transform": [{
"filter": {
"field": "source.reco",
"equal": 1
}
}],
"mark": "point",
"encoding": {
"x": {"field": "source.reco", "type": "ordinal"},
"y": {"field": "source.yes", "type": "ordinal"}
}
} To test calculate {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"values": {
"hits": {
"hits": [
{"source": {"reco": 2, "yes": 1}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 2, "yes": 0}},
{"source": {"reco": 1, "yes": 3}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 0}},
{"source": {"reco": 1, "yes": 1}}
]
}
},
"format": {"property": "hits.hits"}
},
"transform": [{
"calculate": "datum.source.reco",
"as": "reco"
}],
"mark": "point",
"encoding": {
"x": {"field": "reco", "type": "ordinal"},
"y": {"field": "source.yes", "type": "ordinal"}
}
} to test aggregate {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"values": {
"hits": {
"hits": [
{"source": {"reco": 2, "yes": 1}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 2, "yes": 0}},
{"source": {"reco": 1, "yes": 3}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 0}},
{"source": {"reco": 1, "yes": 1}}
]
}
},
"format": {"property": "hits.hits"}
},
"transform": [{
"aggregate": [{
"op": "mean",
"field": "source.reco",
"as": "mean_reco"
}],
"groupby": ["source.yes"]
}],
"mark": "point",
"encoding": {
"x": {"field": "mean_reco", "type": "ordinal"},
"y": {"field": "source.yes", "type": "ordinal"}
}
} to test bin {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"values": {
"hits": {
"hits": [
{"source": {"reco": 2, "yes": 1}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 2, "yes": 0}},
{"source": {"reco": 1, "yes": 3}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 0}},
{"source": {"reco": 1, "yes": 1}}
]
}
},
"format": {"property": "hits.hits"}
},
"transform": [{
"bin": true,
"field": "source.reco",
"as": "reco_binned"
}],
"mark": "point",
"encoding": {
"x": {"field": "reco_binned", "type": "ordinal"},
"y": {"field": "source.yes", "type": "ordinal"}
}
} to test window {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"values": {
"hits": {
"hits": [
{"source": {"reco": 2, "yes": 1}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 2, "yes": 0}},
{"source": {"reco": 1, "yes": 3}},
{"source": {"reco": 3, "yes": 4}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 1}},
{"source": {"reco": 1, "yes": 0}},
{"source": {"reco": 1, "yes": 1}}
]
}
},
"format": {"property": "hits.hits"}
},
"transform": [{
"window": [{
"op": "mean",
"field": "source.reco",
"as": "mean_reco"
}],
"groupby": ["source.yes"]
}],
"mark": "point",
"encoding": {
"x": {"field": "mean_reco", "type": "ordinal"},
"y": {"field": "source.yes", "type": "ordinal"}
}
} Test escaped dots {
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"values": {
"hits": {
"hits": [
{"source.reco": 2 ,"source.yes": 1},
{"source.reco": 3 ,"source.yes": 4},
{"source.reco": 2 ,"source.yes": 0},
{"source.reco": 1 ,"source.yes": 3},
{"source.reco": 3 ,"source.yes": 4},
{"source.reco": 1 ,"source.yes": 1},
{"source.reco": 1 ,"source.yes": 1},
{"source.reco": 1 ,"source.yes": 1},
{"source.reco": 1 ,"source.yes": 0},
{"source.reco": 1 ,"source.yes": 1}
]
}
},
"format": {"property": "hits.hits"}
},
"mark": "point",
"encoding": {
"x": {"field": "source\\.reco", "type": "ordinal"},
"y": {"field": "source\\.yes", "type": "ordinal"}
}
} |
2fd6033
to
a855ad9
Compare
…ways access flattened fields after transforms.
Always flattens nested fields. Flattening happens when data is parsed. This is similar to what we are doing already since an expression always returns flattened data and we often generate expressions instead of suing vega's native format.parse.
Fixes #3369
Fixes #3368
Fixes #3251
Fixes #3441
The logic is now as follows: the input to any transform could be nested. After the transforms, all data is flat and should be accessed as such.