
Boxplots with many identical values or just one value are missing the median line #8126

Closed
joelostblom opened this issue Apr 25, 2022 · 8 comments · Fixed by #8339

@joelostblom
Contributor

For datasets with many identical values, it is understandable that no box is drawn between q1 and q3, since they coincide, but the median line at q2 should always be present. Currently only the outliers are shown, which is confusing: it suggests that the dataset contains only a few observations rather than potentially many observations compressed at the same value.

image

I would expect the chart to look the way seaborn draws it:
image

@joelostblom
Contributor Author

joelostblom commented Apr 26, 2022

On closer inspection, the median line is actually there, but because it is drawn in white, it is invisible unless the colored box is present or the chart background is dark:

image

It would be nice to add some logic that changes the color of this line to the color of the box/outliers when the box is not present (blue in this case). This could be thought of as compressing the box (q1 to q3) into a line at q2 and drawing it on top of the median line.

Maybe also increase the thickness slightly, to 2 (only when there is no box). That leads to this appearance, which I think makes it clear what is going on:

image
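The proposed rule can be sketched as a small decision function. This is a hypothetical helper (`medianStyle`, `BoxSummary` and `MedianStyle` are names invented here, not Vega-Lite API); it only shows the logic that would have to end up in the generated Vega spec. Because q1 ≤ median ≤ q3, checking q1 === q3 is enough to detect a collapsed box.

```typescript
// Hypothetical helper: not part of Vega-Lite, only illustrates the proposal.
interface BoxSummary {
  q1: number;
  median: number;
  q3: number;
}

interface MedianStyle {
  color: string;
  strokeWidth: number;
}

// Pick the median line style. When q1 === q3 the box has zero height
// (and, since q1 <= median <= q3, the median sits on it too), so a white
// line would vanish on a white background. In that case reuse the box
// color and draw a slightly thicker line, as if the box were compressed
// into a line at q2.
function medianStyle(
  s: BoxSummary,
  boxColor: string,
  defaultColor: string = "white",
  defaultWidth: number = 1,
): MedianStyle {
  const boxCollapsed = s.q1 === s.q3;
  return boxCollapsed
    ? { color: boxColor, strokeWidth: 2 }
    : { color: defaultColor, strokeWidth: defaultWidth };
}

// Many identical values: the box collapses, so the median takes the box color.
console.log(medianStyle({ q1: 5, median: 5, q3: 5 }, "steelblue"));
// A normal spread: the default white median is kept.
console.log(medianStyle({ q1: 2, median: 5, q3: 8 }, "steelblue"));
```

The first call returns the box color with a width of 2; the second keeps the white, 1-pixel default.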

@domoritz
Member

Ah, good catch. The tricky bit is that Vega-Lite never sees the data, so we have to build the logic into the generated Vega spec.

@joelostblom
Contributor Author

joelostblom commented Apr 30, 2022

Another scenario where the current behavior makes it hard to detect the median is when it equals one of the quartiles, as in this case:

image

An alternative to introducing logic for these special cases on the Vega side would be to change the default median line to a thicker black line (a line in the same grey as the whiskers is hard to see):

image

This doesn't look quite as good as white in most cases, but it does solve both of the edge cases I have reported here.

image

image

Another example:

image

image
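The black-median alternative needs no data-dependent logic and can be tried per chart by overriding the `median` part of the `boxplot` mark. A sketch with made-up field names (`group`, `value`) and toy data; the `thickness` entry assumes that properties given for the `median` part are forwarded to the tick mark that draws the median.

```typescript
// Sketch: keep the built-in boxplot but restyle its median part.
// Field names and toy data are made up; `thickness` assumes the median part's
// properties are forwarded to the tick mark that draws the median.
const blackMedianSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: {
    values: [
      { group: "A", value: 3 },
      { group: "A", value: 3 },
      { group: "A", value: 3 },
      { group: "B", value: 1 },
      { group: "B", value: 4 },
      { group: "B", value: 6 },
      { group: "B", value: 9 },
    ],
  },
  mark: {
    type: "boxplot",
    // Black instead of the default white, so the median stays visible when
    // the box collapses or when the median coincides with a quartile.
    median: { color: "black", thickness: 2 },
  },
  encoding: {
    x: { field: "group", type: "nominal" },
    y: { field: "value", type: "quantitative" },
  },
};

// Print the spec so it can be pasted into the Vega Editor.
console.log(JSON.stringify(blackMedianSpec, null, 2));
```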

@domoritz
Member

Could we add a colored outline around a white line?

@joelostblom
Contributor Author

I tried that briefly earlier, but it was difficult to get the top and bottom of the outline flush with the box, because the corners look slightly rounded regardless of the cap style I choose:

image

If you are OK with the median line being contained within the box (rather than the current appearance of splitting the box in two), then I think it can work:

image
(I'm not sure whether some might consider it incorrect that a bit of the box seems to stick out under the median because of the outline, even though their values are exactly the same. This is a very small difference, though.)

image

@domoritz
Member

domoritz commented May 3, 2022

I'll defer to @kanitw who might have a better idea.

@kanitw kanitw changed the title Boxplots with many identical values are missing the median line Boxplots with many identical values or just one value are missing the median line Nov 14, 2023
@kanitw
Member

kanitw commented Nov 14, 2023

Boxplots with just one value also suffer from this problem.

I think another option to consider is conditional encoding (don't use the white color if min === median === max)?
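Since Vega-Lite never sees the data, such a condition has to be an expression that Vega evaluates per datum. A minimal sketch, assuming a hand-built layered boxplot rather than the built-in `boxplot` composite mark, with made-up field names (`group`, `value`) and toy data: aggregate the quartiles, draw the box as a ranged `bar`, and color the median `tick` conditionally. In JavaScript and Vega expressions a chained comparison like `a === b === c` compares a boolean with a number, so the test uses `datum.q1 === datum.q3`, which also covers the single-value case (min === max implies q1 === q3).

```typescript
// Sketch of a hand-built boxplot whose median color depends on the data.
// Field names ("group", "value") and the toy data are made up for illustration.
const conditionalMedianSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: {
    values: [
      // Group A: identical values, so q1 === median === q3.
      { group: "A", value: 3 },
      { group: "A", value: 3 },
      { group: "A", value: 3 },
      // Group B: a normal spread.
      { group: "B", value: 1 },
      { group: "B", value: 4 },
      { group: "B", value: 6 },
      { group: "B", value: 9 },
    ],
  },
  // Compute the quartiles per group; this runs in Vega at render time.
  transform: [
    {
      aggregate: [
        { op: "q1", field: "value", as: "q1" },
        { op: "median", field: "value", as: "median" },
        { op: "q3", field: "value", as: "q3" },
      ],
      groupby: ["group"],
    },
  ],
  encoding: { x: { field: "group", type: "nominal" } },
  layer: [
    // The box: a bar spanning q1..q3 (zero height when they coincide).
    {
      mark: { type: "bar", color: "steelblue" },
      encoding: {
        y: { field: "q1", type: "quantitative" },
        y2: { field: "q3" },
      },
    },
    // The median: white by default, box-colored when the box collapses.
    {
      mark: { type: "tick" },
      encoding: {
        y: { field: "median", type: "quantitative" },
        color: {
          condition: { test: "datum.q1 === datum.q3", value: "steelblue" },
          value: "white",
        },
      },
    },
  ],
};

// Print the spec so it can be pasted into the Vega Editor.
console.log(JSON.stringify(conditionalMedianSpec, null, 2));
```

With this toy data, group A should show a steelblue median line where its box has collapsed, while group B keeps the white median on top of its box.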

@joelostblom
Contributor Author

Yes, that sounds like a good alternative too.


3 participants