facet_wrap() fails with mixed-type values for color, such as str and float #589

Open
wildmichael opened this issue Jan 21, 2017 · 4 comments

@wildmichael

When I try to do something like this:

import pandas as pd
from ggplot import *

df = pd.DataFrame([
        [0, 0, 0, 'foo'],
        [1, 1, 1, 'bar'],
        [2, 4, 8, 'froz'],
        [3, 9, 27, 'baz']],
    columns=['A', 'B', 'C', 'D'])
# Stack columns C (integers) and D (strings) into one mixed-type 'value' column
df = pd.melt(df, id_vars=['A', 'B'])
ggplot(df, aes(x='A', y='B', color='value')) + geom_point() + facet_wrap('variable', ncol=2)

I get the following TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Anaconda2\envs\data\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
    670                 type_pprinters=self.type_printers,
    671                 deferred_pprinters=self.deferred_printers)
--> 672             printer.pretty(obj)
    673             printer.flush()
    674             return stream.getvalue()

C:\Anaconda2\envs\data\lib\site-packages\IPython\lib\pretty.py in pretty(self, obj)
    381                             if callable(meth):
    382                                 return meth(obj, self, cycle)
--> 383             return _default_pprint(obj, self, cycle)
    384         finally:
    385             self.end_group()

C:\Anaconda2\envs\data\lib\site-packages\IPython\lib\pretty.py in _default_pprint(obj, p, cycle)
    501     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502         # A user-provided repr. Find newlines and replace them with p.break_()
--> 503         _repr_pprint(obj, p, cycle)
    504         return
    505     p.begin_group(1, '<')

C:\Anaconda2\envs\data\lib\site-packages\IPython\lib\pretty.py in _repr_pprint(obj, p, cycle)
    699     """A pprint that just redirects to the normal repr function."""
    700     # Find newlines and replace them with p.break_()
--> 701     output = repr(obj)
    702     for idx,output_line in enumerate(output.splitlines()):
    703         if idx:

C:\Anaconda2\envs\data\lib\site-packages\ggplot\ggplot.py in __repr__(self)
    114 
    115     def __repr__(self):
--> 116         self.make()
    117         # this is nice for dev but not the best for "real"
    118         if os.environ.get("GGPLOT_DEV"):

C:\Anaconda2\envs\data\lib\site-packages\ggplot\ggplot.py in make(self)
    626             self.apply_scales()
    627 
--> 628             legend, groups = self._construct_plot_data()
    629             self._aes.legend = legend
    630             for _, group in groups:

C:\Anaconda2\envs\data\lib\site-packages\ggplot\ggplot.py in _construct_plot_data(self)
    367                     continue
    368 
--> 369                 for item in sorted(data[colname].unique()):
    370                     mapper[item] = next(mapping)
    371 

TypeError: unorderable types: str() < int()

Removing color='value' from the aes() call does not trigger the issue.

Admittedly I am a novice to ggplot... But I suspect the problem is that the system tries to construct a common color scale, which of course is bound to fail the way it is handled now.
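
The failing step can be reproduced without ggplot. This is a minimal sketch, assuming Python 3 (which the error message implies); the list is typed out by hand to mirror what pd.melt stacks into the value column, and the variable names are illustrative only.

```python
# Values of columns C (integers) and D (strings) after melting, typed out by hand
values = [0, 1, 8, 27, 'foo', 'bar', 'froz', 'baz']

try:
    # Same kind of call as sorted(data[colname].unique()) in the traceback
    sorted(values)
    error = None
except TypeError as exc:
    # Python 3 refuses to order int and str against each other
    error = exc

print(type(error).__name__, error)
```

The exact wording of the message differs between Python 3 versions, but the exception type is TypeError in each case.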

@DSLituiev

DSLituiev commented Jan 22, 2017

The issue arises because ggplot.py:369 tries to sort values of mixed data types. One workaround is to convert the column to a pandas Categorical with df.value = df.value.astype('category'); however, sorting will then fail with the current code. Moreover, the sorting seems redundant as long as an ordinary dict is used, since the keys need to be sorted again in legend.py. However, legend.py has no way to sort the keys if they are of different data types (the ordering is lost).

The only solution I see is to use the sorted_unique function from utils.py at ggplot.py L362 and L369, and to define mapper as an OrderedDict several lines earlier.
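
The idea of an order-preserving mapper can be sketched roughly as follows. This is not the code from the referenced commit, and unique_in_stable_order is a hypothetical helper written for illustration; it is not the sorted_unique function in utils.py, whose implementation is not shown in this thread. The sketch assumes Python 3 and uses a made-up three-color cycle in place of ggplot's real color mapping.

```python
from collections import OrderedDict
from itertools import cycle


def unique_in_stable_order(values):
    """Hypothetical helper: unique values, sorted when they are mutually
    orderable, otherwise kept in order of first appearance."""
    uniques = list(OrderedDict.fromkeys(values))
    try:
        return sorted(uniques)
    except TypeError:
        # Mixed types such as int and str cannot be compared in Python 3
        return uniques


# Hand-typed stand-in for the melted 'value' column
values = [0, 1, 8, 27, 'foo', 'bar', 'froz', 'baz']
# Made-up color cycle, standing in for the real aesthetic mapping
colors = cycle(['red', 'green', 'blue'])

# An OrderedDict remembers insertion order, so a legend can iterate the keys
# later without having to sort them a second time
mapper = OrderedDict(
    (item, next(colors)) for item in unique_in_stable_order(values))

for key, color in mapper.items():
    print(repr(key), color)
```

Because the mapper keeps its insertion order, code that builds the legend would not need to re-sort the keys, which is the step that cannot work for mixed types.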

DSLituiev added a commit to DSLituiev/ggplot that referenced this issue Jan 22, 2017
@wildmichael
Author

Haven't had the opportunity to try your commit, but just to note: if there were a way to get an individual legend per graph (which I would prefer for my application), the problem would also be solved for me, because within a single graph I wouldn't be mixing data types.

@DSLituiev

Then I'd just split the data frame in two with a mask and plot the parts separately.
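
A minimal sketch of that split, using the data frame from the original report. The mask here tests the Python type of each value; the names is_text, df_text and df_numeric are illustrative, and selecting on the variable column (df['variable'] == 'D') would work equally well for this data.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 'foo'],
     [1, 1, 1, 'bar'],
     [2, 4, 8, 'froz'],
     [3, 9, 27, 'baz']],
    columns=['A', 'B', 'C', 'D'])
df = pd.melt(df, id_vars=['A', 'B'])

# Boolean mask: True where the melted value is a string
is_text = df['value'].map(lambda v: isinstance(v, str)).astype(bool)

df_text = df[is_text]       # rows that came from column D
df_numeric = df[~is_text]   # rows that came from column C

print(df_text)
print(df_numeric)
```

Each part now holds values of a single type, so each can be passed to its own ggplot call without triggering the mixed-type sort.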

@wildmichael
Author

wildmichael commented Jan 22, 2017 via email

wildmichael pushed a commit to wildmichael/ggplot that referenced this issue Jan 23, 2017
wildmichael pushed a commit to wildmichael/ggplot that referenced this issue Jan 23, 2017
wildmichael added a commit to wildmichael/ggplot that referenced this issue Jan 23, 2017
DSLituiev added a commit to DSLituiev/ggplot that referenced this issue Jan 23, 2017