Do I understand binning correctly? #59
Not sure about the second point. You mentioned having two … If you really mean having two … (Please correct me if I'm wrong.)
I think the choice between ordinal and quantitative depends on the number of bins.
The point I made only applies to binned fields (binning makes a field O), not to O fields in general.
True, I think ordinal might make sense with some binning methods (e.g., binning ordinal/string fields with a hashing function). However, we currently only do uniform binning for quantitative fields, with size ≤ 20.
Agreed. What I meant is the way we show the ticks on the axis. If we only have 5 bins, we can label each bin with its range, e.g., 0-10, 10-20, 20-30, and 30-40. Otherwise, just show ticks as we do for Q.
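A rough Python sketch of that labeling rule. The helper `bin_labels` and the cutoff `max_range_labels=5` are hypothetical, chosen only to illustrate "few bins get range labels, many bins get plain ticks":

```python
def bin_labels(start: float, step: float, nbins: int, max_range_labels: int = 5) -> list:
    """Return axis labels for uniform bins (hypothetical helper).

    With few bins, label each bin by its range ("0-10"); otherwise fall back to
    plain tick labels at the bin edges, as for a quantitative axis.
    """
    def fmt(v: float) -> str:
        # Compact formatting: 10.0 -> "10", 0.30000000000000004 -> "0.3"
        return f"{v:g}"

    edges = [start + i * step for i in range(nbins + 1)]
    if nbins <= max_range_labels:
        # One "lo-hi" label per bin
        return [f"{fmt(lo)}-{fmt(hi)}" for lo, hi in zip(edges, edges[1:])]
    # One plain label per bin edge
    return [fmt(e) for e in edges]

print(bin_labels(0, 10, 4))  # ['0-10', '10-20', '20-30', '30-40']
```

Where exactly the cutoff sits is an open design question; the sketch only shows that the choice can be driven by the bin count alone.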
I guess we’re done with this question. Please reopen if you disagree.
I'm just trying to confirm the semantics of binning.
To create a histogram, we need to use the same column for x and y (y could also be another quantitative field), bin x, and sum y (in some cases another aggregation might make sense). We also need to set the type of the binned field to O (ordinal) to fix the labels. At some point, I believe we want labels that show each bin's range, and we want to stop hiding bins that have no values, but we can leave that for later.
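To make the "bin x, aggregate y" semantics concrete, here is a small Python sketch. `histogram` is a hypothetical helper, not Vega-Lite code. It assumes the bins are already chosen (`start`, `step`, `nbins`) and that every x falls inside them. Empty bins are reported as 0 instead of being dropped, which fits a count or sum:

```python
from collections import defaultdict
from typing import Callable, Sequence

def histogram(xs: Sequence[float], ys: Sequence[float], start: float, step: float,
              nbins: int, agg: Callable[[list], float] = sum) -> list:
    """Bin x into `nbins` uniform bins and aggregate the matching y values per bin.

    Assumes every x lies in [start, start + nbins * step]; empty bins are reported as 0.
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        # Index of the bin containing x; the top edge is folded into the last bin.
        i = min(int((x - start) // step), nbins - 1)
        groups[i].append(y)
    # Emit every bin, including empty ones, so bins with no values are not hidden.
    return [agg(groups[i]) if i in groups else 0 for i in range(nbins)]

xs = [1, 2, 12, 15, 37]
print(histogram(xs, xs, start=0, step=10, nbins=4))           # [3, 27, 0, 37]  (sum of x per bin)
print(histogram(xs, xs, start=0, step=10, nbins=4, agg=len))  # [2, 2, 0, 1]    (count per bin)
```

With the same column on x and y, `agg=len` gives the classic count histogram. `sum` (the aggregation described above) weights each bin by the values themselves.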
Vega-Lite automatically chooses a good bin size that optimizes for 1) not too many bins and 2) nice bin widths (based on the base, usually 10). The only inputs to this binning function are the min and max.
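A minimal Python sketch of this kind of step selection. It assumes at most 20 bins, base 10, and candidate divisors 5 and 2. These defaults are loosely modeled on the `maxbins`, `base`, and `divide` parameters of Vega's `bin` transform. `nice_bins` itself is a hypothetical helper, not Vega-Lite's implementation. As described above, its only data inputs are the min and max:

```python
import math

def nice_bins(lo: float, hi: float, maxbins: int = 20, base: int = 10,
              divide: tuple = (5, 2)) -> tuple:
    """Return (start, stop, step) for uniform 'nice' bins covering [lo, hi] (sketch)."""
    span = (hi - lo) or abs(lo) or 1.0
    logb = math.log(base)
    # Start from a power of `base` a couple of orders of magnitude below the span.
    level = math.ceil(math.log(maxbins) / logb)
    step = base ** (round(math.log(span) / logb) - level)
    # Grow the step by powers of the base while it would produce too many bins.
    while math.ceil(span / step) > maxbins:
        step *= base
    # Try to shrink the step by each "nice" divisor while staying within maxbins.
    for d in divide:
        candidate = step / d
        if span / candidate <= maxbins:
            step = candidate
    # Snap the outer boundaries to multiples of the step.
    start = math.floor(lo / step) * step
    stop = math.ceil(hi / step) * step
    if stop == start:
        stop = start + step
    return start, stop, step

print(nice_bins(0, 100))  # (0.0, 100.0, 5.0)
print(nice_bins(3, 47))   # (0.0, 50.0, 5.0)
```

The step is always a power of 10 divided by 1, 2, or 5, which gives the "nice" widths. The `maxbins` cap keeps the count at or below 20, matching the "size ≤ 20" limit mentioned earlier in the thread.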
Choosing two ordinals only makes sense if we also map the count (which needs no field) to alpha, size, or color. Alternatively, we could map the sum, max, ... of the quantitative field to alpha, size, or color. But it never makes sense to use a field we haven't already used (?).
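A small Python sketch of the two-ordinal case. It assumes both axes are uniformly binned quantitative fields. It counts points per (x bin, y bin) cell, then maps each count linearly to an opacity. `binned_counts` and `counts_to_alpha` are hypothetical helpers, and the linear count-to-alpha mapping is an illustrative choice, not Vega-Lite's scale:

```python
from collections import Counter

def binned_counts(xs, ys, x_start, x_step, y_start, y_step):
    """Count points per (x bin, y bin) cell for two binned (ordinal) dimensions."""
    cells = Counter()
    for x, y in zip(xs, ys):
        cell = (int((x - x_start) // x_step), int((y - y_start) // y_step))
        cells[cell] += 1
    return dict(cells)

def counts_to_alpha(cells):
    """Map each cell's count linearly to an opacity in (0, 1]."""
    peak = max(cells.values())
    return {cell: n / peak for cell, n in cells.items()}

cells = binned_counts([1, 2, 15], [1, 3, 25], 0, 10, 0, 10)
print(cells)                   # {(0, 0): 2, (1, 2): 1}
print(counts_to_alpha(cells))  # {(0, 0): 1.0, (1, 2): 0.5}
```

To encode the sum or max of one of the already-used quantitative fields instead of the count, the counter would collect those values per cell and aggregate them. No additional field is needed.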
What I'm trying to understand is what the limitations are and what we can propagate automatically (or disable in the interface). I don't see the general rules yet, but I will think more about this.