Support of ordinal based on pandas' ordered Categorical type? #245
Comments
Hi, thanks for the report! Currently, because of a bug in behavior for categoricals in Pandas' This is something that we should figure out how to address. In the meantime, you can specify the order manually within the encoding with, e.g. |
@jakevdp thanks for the feedback. I had to adapt your suggestion, since it's the color I want to order, rather than x. This is what I ended up with:
c = Chart(data_samp)
cut_cat = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
cut_scale = Scale(domain=cut_cat, type='ordinal')
c.mark_circle().encode(x='carat', y='price',
color=Color('cut:O', scale=cut_scale))
So there are two remaining issues:
|
Huh... that's not great. I'm not certain why Vega-Lite requires you to specify 'ordinal' in both places, or why the color order is so strange. Maybe @kanitw would have ideas? |
Workaround for color ordering vega/altair#245 is imperfect.
Having the same issue. See this notebook where I use nominal ordering as a workaround to the color issue: Another issue arises where the legend ordering doesn't match the ordering in the plot. The legend is in the right order, but the area marks are incorrectly ordered. |
I think this would be worth posting as a bug to Vega-Lite itself. |
@pierre-haessig -- Thanks for reporting. This is definitely a bug. I just created vega/vega-lite#1732. The issue contains a workaround for this: use the nominal type, as @dhimmel suggests, and set a custom color range manually. (You might find colorbrewer useful.) We will make sure to fix this for the 2.0 release. (We probably won't fix this in 1.x, since it should be very easy to fix in Vega 3 but quite complicated to do in Vega 2. Since we have a temporary workaround, we will focus our efforts on 2.0 development.) |
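For readers who want to see what this workaround looks like at the spec level, here is a minimal sketch that builds the color encoding as a plain Python dict in Vega-Lite's JSON form. The helper name `nominal_color_encoding` and the five green hex values are assumptions chosen for illustration, not something from this thread; the category order is the one listed earlier for `cut`.

```python
import json

def nominal_color_encoding(field, categories, colors):
    """Build a Vega-Lite style color encoding with a hand-fixed category order.

    Hypothetical helper for illustration; not part of Altair or Vega-Lite.
    """
    if len(categories) != len(colors):
        raise ValueError("need exactly one color per category")
    return {
        "field": field,
        "type": "nominal",
        # domain fixes the category order, range assigns one color to each
        "scale": {"domain": list(categories), "range": list(colors)},
    }

cut_cat = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
# Light-to-dark greens, picked for illustration only
greens = ['#edf8e9', '#bae4b3', '#74c476', '#31a354', '#006d2c']

color_encoding = nominal_color_encoding('cut', cut_cat, greens)
print(json.dumps(color_encoding, indent=2))
```

Pairing `domain` and `range` element by element is what keeps the legend and the colors in the intended order without relying on the ordinal type.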
@kanitw I posted vega/vega-lite#1732 (comment) before I saw the previous comment.
It's not quite a workaround because the marks (bands) are not in the right order? Is there a way to fix that? |
For other people following this issue, here is a workaround for the following question.
|
FWIW the to_json segfault issue for Categorical dtype appears to be fixed in pandas: pandas-dev/pandas#12802 |
@dsaxton Did your PR work for using pandas categoricals? I would love this functionality in Altair, and it seems promising that the pandas JSON bug has been fixed! |
It seemed to be working, although I didn't do any testing outside of Altair's CI |
Just to add another example where this issue is limiting. The Stacked Bar Chart with Sorted Segments example in the Example Gallery doesn't sort if one uses custom ordering on a pandas categorical data type.
import altair as alt
import pandas as pd
from vega_datasets import data
source = data.barley()
# custom ordering of categorical data type
site_lst = ['Crookston', 'Morris', 'University Farm','Duluth', 'Grand Rapids', 'Waseca']
source.site = pd.Categorical(source.site, site_lst, ordered = True)
# The stacks are not ordered according to the 'site' variable ordering
# They use alphabetical sort by default and I have no idea how to alter this to work.
alt.Chart(source).mark_bar().encode(
x='sum(yield)',
y='variety',
color='site',
order=alt.Order(
# Sort the segments of the bars by this field
'site',
sort= 'ascending'
)
)
Things I have tried to no avail:
|
@sadzart Custom sorting is possible by accessing undocumented fields that are created during compilation, though this is probably not recommended. (related to vega/vega-lite#1734 (comment))
import altair as alt
import pandas as pd
from vega_datasets import data
source = data.barley()
# custom ordering of categorical data type
site_lst = ['Crookston', 'Morris', 'University Farm','Duluth', 'Grand Rapids', 'Waseca']
source.site = pd.Categorical(source.site, site_lst, ordered = True)
alt.Chart(source).mark_bar().encode(
x='sum(yield)',
y='variety',
color=alt.Color('site', sort=alt.Sort(site_lst)),
order=alt.Order('color_site_sort_index:Q',
sort='ascending'
)
)
By introducing a
To observe what happens to your chart, you can inspect the data viewer in the Vega editor. |
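An alternative that avoids depending on the undocumented field is to compute the rank yourself, store it as an ordinary column, and point `order` at that column. The sketch below uses plain Python records and a made-up field name `site_rank` (an assumption, not something Altair generates); the rows and their values are placeholders, not the real barley data. With a DataFrame whose `site` column is an ordered Categorical, `source['site'].cat.codes` should yield the same integers.

```python
site_lst = ['Crookston', 'Morris', 'University Farm', 'Duluth', 'Grand Rapids', 'Waseca']

# Map each category to its position in the desired order
rank_of = {site: i for i, site in enumerate(site_lst)}

# Placeholder rows standing in for the barley data (values are made up)
records = [
    {'variety': 'A', 'site': 'Waseca', 'yield': 1.0},
    {'variety': 'A', 'site': 'Crookston', 'yield': 2.0},
    {'variety': 'B', 'site': 'Duluth', 'yield': 3.0},
]

# Attach the rank as a regular numeric field
for row in records:
    row['site_rank'] = rank_of[row['site']]

print([(row['site'], row['site_rank']) for row in records])

# In the chart, the order channel would then reference the explicit field:
#   order=alt.Order('site_rank:Q', sort='ascending')
```

Because `site_rank` is an ordinary data field, it does not depend on how Vega-Lite names its internal sort columns.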
@mattijn Worked like a charm. Thanks |
That's a really neat solution @mattijn - but I'm having trouble with my dataset. I can change the legend order, but the order on the actual plot doesn't change. Here's a minimal example:
The ideal order would be, from left to right: N/A, 5 - Strongly Agree, 4 - Agree, etc. What I am able to accomplish is my ideal order from right to left. Inverting the sort_order list just changes the legend order, but not the order in which the data is plotted. FYI: I know this is an issue with my plot because I can reproduce your example just fine. Thanks in advance |
The syntax of the undocumented feature is |
Whoops!! Good catch! Thanks that works. Full code for someone looking to reproduce:
|
This is surprisingly weird. The official documentation actually mentions an issue post that advises referencing an undocumented internal variable from Vega-Lite. Docs: https://altair-viz.github.io/user_guide/encodings/channels.html#order The issue referenced: vega/altair#245 (comment) I'm not quite sure how the alt.Order object works, but this implementation works for now. I would still like to verify and test it carefully.
I've just started to play with altair, using the diamonds dataset. Here is the notebook to clarify what I did https://gist.github.com/pierre-haessig/09fa9268aa0a0e7d91356f681f96ca18
Since I'm not familiar with altair, I may have missed something, but I've got the feeling that ordered Categorical types from pandas are not supported.
Indeed, if I use a color='cut' encoding, when cut is a pandas Series with an ordered category dtype, I get by default a nominal type of coloring (with "unordered" colors).
On the other hand, if I force the use of the ordinal type with color='cut:O', I do get ordered colors (the shades of green), but the order is wrong! (I get Fair, Good, Ideal, Premium, Very Good, while the correct order is 'Fair', 'Good', 'Very Good', 'Premium', 'Ideal', as manually defined in pandas' category.)
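The wrong order reported here matches what a plain alphabetical sort of the category labels produces, which suggests that the pandas category order is not being passed along. A quick check in plain Python (no Altair or pandas involved; the variable names are illustrative):

```python
# Order as defined in the pandas category
cut_cat = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']

# Order observed in the chart legend, as reported above
observed = ['Fair', 'Good', 'Ideal', 'Premium', 'Very Good']

# A plain string sort of the labels
alphabetical = sorted(cut_cat)

print(alphabetical)
print(alphabetical == observed)
```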