can't impute binned field #4939

Open
iliatimofeev opened this issue May 4, 2019 · 6 comments
Labels
Area - Data & Transform Bug 🐛 P3 Should be fixed at some point

Comments

@iliatimofeev

Vega-Lite partial spec (full VL spec):

  "encoding": {
    "x": {
      "bin": {"maxbins": 30},
      "field": "IMDB_Rating",
      "type": "quantitative"
    },
    "y": {
      "field": "IMDB_Votes",
      "type": "quantitative",
      "aggregate": "sum",
      "impute": {"value": 0}
    },
    "color": {"field": "Major_Genre", "type": "nominal"}
  }

image

Expected (proposed Vega spec):

[image: visualization - 2019-05-05T021304 126]

Problem with the generated spec:

"defined": {
    "signal": "datum[\"bin_maxbins_30_IMDB_Rating\"] !== null && 
              !isNaN(datum[\"bin_maxbins_30_IMDB_Rating\"]) &&
              datum[\"sum_IMDB_Votes\"] !== null &&
              !isNaN(datum[\"sum_IMDB_Votes\"])"
 }

bin_maxbins_30_IMDB_Rating is undefined on rows added by the impute; the impute key is bin_maxbins_30_IMDB_Rating_mid, so that is the only bin field those rows carry.
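
To make the mismatch concrete, here is a minimal Python sketch of what seems to happen. It is not Vega or Vega-Lite code: the `impute` and `is_defined` helpers are hypothetical stand-ins written for illustration, and the sample rows are made up. Only the three field names come from the generated spec above.

```python
import math

BIN = "bin_maxbins_30_IMDB_Rating"          # bin field tested by "defined"
BIN_MID = "bin_maxbins_30_IMDB_Rating_mid"  # bin midpoint field used as the impute key
SUM = "sum_IMDB_Votes"


def impute(rows, field, key, groupby, value):
    """Hypothetical impute: add a row for every missing (group, key) pair."""
    key_values = sorted({r[key] for r in rows})
    groups = sorted({r[groupby] for r in rows})
    present = {(r[groupby], r[key]) for r in rows}
    added = [
        {groupby: g, key: k, field: value}  # only these three fields are set
        for g in groups
        for k in key_values
        if (g, k) not in present
    ]
    return rows + added


def is_defined(row, name):
    """Mimic `datum[name] !== null && !isNaN(datum[name])`.

    A missing field is treated like JavaScript's undefined, for which isNaN is true.
    """
    v = row.get(name, math.nan)
    return v is not None and not math.isnan(v)


# Made-up aggregated rows: Drama has no entry for the 6.75 midpoint.
rows = [
    {"Major_Genre": "Action", BIN: 6.0, BIN_MID: 6.25, SUM: 100},
    {"Major_Genre": "Action", BIN: 6.5, BIN_MID: 6.75, SUM: 250},
    {"Major_Genre": "Drama", BIN: 6.0, BIN_MID: 6.25, SUM: 80},
]

imputed = impute(rows, field=SUM, key=BIN_MID, groupby="Major_Genre", value=0)
defined_rows = [r for r in imputed if is_defined(r, BIN) and is_defined(r, SUM)]
```

In this sketch the row imputed for Drama has the `_mid` key and a sum of 0 but no `bin_maxbins_30_IMDB_Rating` value, so the `defined` test rejects it even though the imputation itself worked.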

@iliatimofeev
Author

Another example where imputing a binned field does not work:
[image: visualization - 2019-05-20T212224 116]
vega editor

@domoritz
Member

domoritz commented Jun 3, 2019

Looks like we have not implemented imputation correctly for binned data. I suspect we would be okay with imputing before binning but I'm not sure.

@kanitw kanitw added Bug 🐛 P2 Important Issues that should be fixed soon labels Jun 3, 2019
@iliatimofeev
Author

I'm a bit confused about imputing before binning.

If I'm correct, impute in the encoding of an X or Y channel is interpreted as imputing with the other axis as the key. If that other axis is binned, then the key should be the binned field. So, as I understand it, binning should happen before imputing.

@kanitw kanitw added Help wanted 🙏 P3 Should be fixed at some point and removed P2 Important Issues that should be fixed soon labels Jun 8, 2019
@kanitw kanitw added this to the x.x Data & Transforms milestone Jun 8, 2019
@domoritz
Member

domoritz commented Jun 9, 2019

imputing before binning: impute over the raw values (probably super expensive)
impute after binning: impute the binned values and add 0 in your example (probably better now that I think about it)
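
As a rough illustration of the second option, the sketch below bins first, sums per (genre, bin), and then fills missing combinations with 0. It is plain Python rather than Vega-Lite internals; the fixed-step `bin_start` helper, the `bin_then_impute` function, and the sample records are all assumptions made for this example.

```python
import math
from collections import defaultdict


def bin_start(value, step):
    """Lower edge of the fixed-width bin that contains value."""
    return math.floor(value / step) * step


def bin_then_impute(records, step, fill=0):
    """Bin ratings, sum votes per (genre, bin), then fill missing pairs."""
    totals = defaultdict(float)
    for genre, rating, votes in records:
        totals[(genre, bin_start(rating, step))] += votes

    genres = sorted({g for g, _ in totals})
    bins = sorted({b for _, b in totals})  # only bins that occur in the data
    return {(g, b): totals.get((g, b), fill) for g in genres for b in bins}


# Made-up (Major_Genre, IMDB_Rating, IMDB_Votes) records.
records = [
    ("Action", 6.1, 100),
    ("Action", 6.3, 50),
    ("Action", 7.2, 250),
    ("Drama", 6.4, 80),
]

result = bin_then_impute(records, step=0.5)
```

Here the imputation only has to consider one row per (genre, bin) pair instead of every raw value, which is presumably why it is the cheaper option. Note that this sketch fills only bins that appear somewhere in the data; a bin that is empty for every genre stays absent.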

@kanitw
Member

kanitw commented Jun 9, 2019
