Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring: rebuild geom like the layers in ggplot2 #221

Closed
10 tasks
jankatins opened this issue Feb 23, 2014 · 27 comments
Closed
10 tasks

Refactoring: rebuild geom like the layers in ggplot2 #221

jankatins opened this issue Feb 23, 2014 · 27 comments

Comments

@jankatins
Copy link
Contributor

Currently a layer is just data (all variables needed for the matplotlib plot functions in a dic of lists https://github.com/yhat/ggplot/blob/master/ggplot/ggplot.py#L455 and http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html). In ggplot2 it is more (see https://github.com/hadley/ggplot2/blob/master/R/layer.r):

  • It contains the geom (and the plotting method -> like currently the geom subclasses)
  • it contains a dataframe with the needed variables (-> Like currently the result of ggplot._get_layer() and the _transform() methods)
  • if has a default "statistics", which would do some "translations" of values so that they can be used in the specific geom (e.g. if it is a geom_bar, use binning if x is a continuous value); can be doing nothing (-> similar to the current _transform() method)
  • it has some default scales (e.g. if it is geom_bar, use discrete scales)
  • a default position (jitter, bar charts with 100% or not,...)
  • a flag to display a guide/legend

The ggplot2 object contains a list of layers, like ggplot.geoms. So one way to refactor would be to rename geom to layer (or simple see the geom as a layer...) and add all the functionality. Adding it to a new layer object has the advantage that the layer can know about the ggplot object and the data in there, the geom must not have a reference to that (you must be able to add a geom to two different ggplot objects)

  • add a new layer object and refactor all __radd__() code to add such a object and not the geom itself.
  • pass in the layer data as a pandas.DataFrame and not as a dic of lists -> This would mean that geom_point.plot_layer() has to deal with different colors by doing a groupby instead of ggplot._get_layers(). As some geoms already construct again a dataframe from the dict of lists, I think this would make these geoms faster :-) [this step is independed of all other stuff here]
  • include a default statistics in ayer -> This means that something similar to _apply_transforms needs to be called if statistics != "indentity" and for each variable name like '..xxx..' the statistics needs to be asked to provide the values.
  • include default scale -> should add a scale_* to the plot?
  • include default position in geom -> In some cases that means changing the data which is plotted (jitter), in some cases changes the parameter to mathplotlib (geom_bar -> stacked, 100%, next to each other)
  • move the _get_layer code into the layer/geom and pass in the original (or transformed?) dataframe via a generic layer.plot(data, ax) (or refactor the current geom.plot_layer -> see geom: improvements to the base class #175)
  • move the legend triggering to the geom.plot_layer() code path -> only print legends for what we use not everything which is in aes(...)

Todo:

  • investigate what happens when you add geoms with different default scales to the ggplot object in ggplot2. Errors? Last one wins?
  • investigate when the transformation should happen and how to prevent data copying. factor() needs to happen at first but it would be nice to not copy the whole dataframe just to apply a different x axis statistics -> just store the diff to the main dataframe in the layer and then combine them during plotting?
  • How to make it possible to add stat_bin and get a complete layer? -> the statistics need to have a way to add a layer too? Could be done by defining __radd__() to produce the default geom for the statistics, adding the statistics to these geom and then add the geom to the ggplot object.
@has2k1
Copy link

has2k1 commented Mar 3, 2014

Looks like this is meant to resolve the TODOs in ggplot.py. It would
also make aesthetics applied to geom_* aware of the dataframe in ggplot.py, and maybe
even help clean up the way assign_visual_mapping is used. I have fixed a bug (#232) related to this but I think it all ends at this issue.

Is this correct per your vision; a ggplot object instantiates each layer, which takes a geom as a parameter?

@jankatins
Copy link
Contributor Author

No, the geoms add a layer and ggplot.draw() then does everything nessesary to build the needed things for the layer by calling the appropriate methods in the layer (data, ...). Currently this is mostly done in ggplot.draw() directly.

@has2k1
Copy link

has2k1 commented Mar 15, 2014

I have had more time to think about this, skimmed the ggplot2 code and now have a clearer sense of how the it would all come together.

I will focus on the geoms and stats, thats where I would like to contribute.

First the picture.

The layer would contain a geom and a stat. At some pointboth the geom and stat must be instantiated and ready to call the key methods.

The key methods are something like.

For the stat:

  • stat_xxx.calculate(data)
  • stat_xxx.calculate_groups(data)

For the geom:

  • geom_xxx.draw(data, ax),
  • geom_xxx.draw_groups(data, ax)

Other than in the declarations and in __radd__, the geom does not need to know about the stat, and viceversa. The data from the stat calculation is passed into the geom draw function all within the layer. The layer makes sure the aesthetics requirements are met and also deals with
their precedence.

Based on this picture and if you have a similar view, then from the stuff I have done in #246 I can rework the geoms and the stats into this shape. I could leave the plot method in place so that the code still works.

Also within the geoms and stats I would specify the DEFAULT_AES as a dictionary with the default aesthetic values. Together with the REQUIRED_AES that would make up all
the valid aesthetics. The defaults would stop being set in the plot function or hidden in the matplotlib specifications.

As the data splitting now moves to the compute and draw functions, we could have atleast two helper functions. One function returns a straight data.to_dict and the
other a custom groupby. That would cover majority of the munging scenarios.

@jankatins
Copy link
Contributor Author

IMO this looks good!

I've just one question: Why would you need a draw_groups and calculate_groups? For the facets? If so it needs to have a signature of draw_groups(data, groupby_variables. axes) but I think that faceting is better done in a method in ggplot (see my ideas in the other refactoring issue, which would basically all ggplot drawing into a faceting one, but in case of no faceting use a [[None,all_data]] list to iterate over instead of the groupby result).

If this is for the different lines (like currently in case we have different colors for different series of a line plot -> the reason why _get_layers() returns a list of data): I think that would be something which should be hidden in the plot_layer() method and data should become a DataFrame. plot_layer can now do a groupby (and some other plotting ones don't need to construct their own dataframes) and plot in a loop.

Otherwise that's almost exactly as I understand the ggplot2 code :-)

Thanks for your work!

@has2k1
Copy link

has2k1 commented Mar 15, 2014

Yes it is the case of different series of a line, but my mistake, I had ruled out the _groups functions as key. The geoms and stats can partition the work as they please.

I think I'm on the same track as to where the splitting would happen. Here is how
I imagine the stats and geoms getting to work.

ggplot.draw() in ggplot.py

for layer in self.layers:
    ax = plt.gca()
    layer.plot(ax)

And layer.plot() in layer.py

data = self.stat.compute(data)  # <-- groupbys in here
self.geom.draw(data, ax)          # <-- groupbys in here

Currently, the problem with _get_layers() is the location, but the code in it can be fashioned into helper function(s).

@jankatins
Copy link
Contributor Author

My idea was

# in ggplot.draw()
# construct the axis vie gridlayout, 
# must be arranged to suite the facets and the order in 
# which the groupby is done further down in layer.plot()
axis = [<axis for each facet or just one ax>]
for layer in self.layers:
    # stores the data in the layer -> color is the same over all facets, etc
    # does first additional aes transforms and then the statistics transforms
    # stores the data in the layer
    # must also submit/ return legends and axis limits? 
    # Not sure how to do that
    legends, limits = layer.compute_data(self.orig_data)
    self.adjust_axis_limits(limits)
    self.add_legends(legends)
    # now plot the layer:
    layer.plot(self, axis)

# in layer.plot(ggplot, axis)
if ggplot.facets:
     plot_data = self.data.groupby(ggplot.facets)
else:
     plot_data = [None, self.data]

# Here we need to have the same order...
for  ax, key, data in zip(axis, plot_data):
    self.geom.plot_layer(data, ax)

# in *some* specific geoms, like the line one:
# geom_line.plot(data, ax):
if "color" in data:
     # plot each line separatly
     for line_color, line_data in data.groupby("color"):
         ax.plot(line_color, line_data.x, line_data.y) 
else:
    ax.plot(default_color, data.x, data.y) 

@has2k1
Copy link

has2k1 commented Mar 16, 2014

Great, I am on the same page.

Anything missing from this?

class geom(object):
    """Base class of all Geoms"""
    DEFAULT_AES = dict()
    REQUIRED_AES = set()
    PARAMS = dict()
    TRANSLATIONS = dict()

    data = None
    aes = None
    manual_aes = None
    params = None

    def plot_layer(self, data, ax):
        pass
class stat(object):
    """Base class of all Stats"""
    DEFAULT_AES = dict()
    REQUIRED_AES = set()
    PARAMS = dict()
    TRANSLATIONS = dict()

    data = None
    aes = None
    manual_aes = None
    params = None

    def compute(self, data, ax):
        stuff = None   # For the legend and limits ??
        return (data, stuff)

About the legends and limits. I may have seen that hickup in components.assign_visual_mapping and functions it calls, especially colors.assign_colors().

Here is what I noted, uncertainity on my part and not a solution.

def assign_colors(data, aes, gg):
# TODO: Potential problem ahead. May be just move all legend stuff
# to ggplot, need gg.colormap, gg.legend and data[color_col] in one place.
# Why does the current code use gg.data, is it an error?
# Is it better to have the colormap set in the layer?

@jankatins
Copy link
Contributor Author

As I understood the stats they don't need a DEFAULT_AES and so on, just a compute and a __radd__(gg), which would add a default_geom(stat=self) to the gg.

Legends: currently the legends get set to everything which is in one (or more) of the aes, so in case you specify fill but no geom uses it it is still in the legends. That's why the geom should set legends, which know which aes will be used.

I think noone minds where they live. They need to now about stuff in ggplot (colors and so on), but actually they are a result of the geoms (which knows hat it has to plot...)

@has2k1
Copy link

has2k1 commented Mar 17, 2014

You are right about the stats, they use the default aesthetics for the geoms. As such the stat translations would also go away. Between my limited R familiarity, the proto system ggplot2 uses and thinking about whether we would be able to generate the documentation, I just kept it similar to the geoms. Filling them out would have been completely duplicate work. Until I am sure of the final form that you will expect I will not touch anything.

There is also an issue of the translations, as you may need to modify how they happen. In #246 I did not follow through in converting potential conflict translations e.g. {'color': 'edgecolor', 'fill': 'color'} into an ordered dictionary. I was too mindful of uniformity but I think that is something the aesthetic renaming function can expect and should have no problem with.

About the legends, somewhere in the ggplot2 documentation is a good but unimplemented idea of turning the show_guide parameter from a boolean into a list that controls which legends get shown. Can we can have this? Thanks anyway.

@jankatins
Copy link
Contributor Author

Re doc: I think it should be possible to use a decorator to generate/append the documentation about the aes, defaults, stats and so on.

re translations. I don't know what you mean. :-) Currently that's well defined: the first color is the color aestetic, the second color a variable name in the data frame. As such I would think that transform could be done by simple moving the def _apply_transforms(data, aes) function into the geom (or into a helper module which is then usable in ggplot (for the default data) and each geom (if it has aes/data). In the end each geom instance has a data property which either points to the same dataframe as ggplot.data (no geom specifc aes/data) or a new/copy DataFrame (a new aes/data)

Re legends: legends are currently in a state of "works but also need lots of love". ggplot2 has a way to disable/enable legends ("guides") and merge them (color + shape set to the same variable -> we show two legends, ggplot2 only one), ggplot can move them around,...

@has2k1
Copy link

has2k1 commented Mar 17, 2014

About translations, the potential conflict comes from the order in which they are carried out.

Consider these two,

{'color': 'edgecolor', 'fill': 'color'}
{'fill': 'color', 'color': 'edgecolor'}

It is the same dictionary but if renaming is done in the order of the second, the effective rename is

{'fill': 'edgecolor'}

@jankatins
Copy link
Contributor Author

I don't understand: if that's the aes (as in aes("color"='edgecolor', 'fill'='color')) and data is a pandas.DataFrame with two variables edgecolor and color then geom.data (or whatever the property will be called) will end up with a dataframe with two variables color (the former edgecolor) and fill (the former color variable). See https://github.com/yhat/ggplot/blob/master/ggplot/ggplot.py#L522, only that the resulting dataframe will not contain variables other than aes ones (basically as the first step don't data = data.copy() but only use the index to create a new one).

layer.transform_aes() would do:

# start with the copy of the default values for that geom
_aes = self.geom.default_aes.copy()
# Update it with the values n ggplot object:
# -> ggplot(aes(color="color_var")) -> color would be updatet, if it is 
# also in the default vars for that geom
for key, val in _aes:
    if key in self.ggplot.aes:
        _aes[key] = self.ggplot.aes[key]
# as a last step do the overwrites from the geom
if self.geom.aes:
    # must be all valid aes for that geom -> test in geom.__init__()
   _aes.update(self.geom.aes)
# utils is the former ggplot.py#_apply_transforms()
_data = self.ggplot.data if not self.geom.data else self.geom.data
self.data = utils.transform_aes(_aes, _data)

This is without caching (if no data or aes is in the geom, we could reuse self.ggplot.data but then we would have more than the needed variables in there).

The next step would then be to layer.transform_stats(), which would call self.data = self.stat.transform(self.data)

@has2k1
Copy link

has2k1 commented Mar 17, 2014

Sorry for the confusion,

I am referring to the translation declarations that a geom would declare.

TRANSLATIONS = {'color': 'edgecolor', 'fill': 'color'}

# same as above  but rearranged to show the edge case
# if renaming happens in this order.
TRANSLATIONS = {'fill': 'color', 'color': 'edgecolor'}

# solution would be an `OrderedDict` but the renaming function
# would have to be aware of this.
TRANSLATIONS = OrderedDict([('color', 'edgecolor'), ('fill', 'color')])

The geom would then expect that the plot_data passed to it has names compatible with matplolib.

Otherwise, the geoms are littered with duplicate code that does the renaming, see, the
translations achieve this.

On the issue related to aesthetics and transforms, have you thought about computed aesthetics, i.e

aes(y='..density..')

I will put up a gist for the structure of thegeoms and stats to cross-check with you before I start making the changes.

@jankatins
Copy link
Contributor Author

Ok, now I understand :-) I think that's one place which I would not abstract away.

I thing the "API" functions should take the data as produced from the aes/stats and the last step in that function should be to "translate" that into the mpl code. So splitting data into multiple series to plot different lines and renaming variables would go there. as well as constructing a dict from both the data and the other parameter (passed in via ax.plot(**variables)). If there is too much code duplication I think a helper function would be better.

..density.. is one of the interesting cases. I think it would be easiest if all variable names which satisfy r"^\\.\\.([a-zA-z._]+)\\.\\.$" (from ggplot2) should be filtered out in the transform_aes() call and should be added as a additional list to the transform_stats() call. transform_stats() would then have a dict("..density..":density_function,...) and would add this variables to the dataframe. ggplot also removes the dots, so the variable names get shorten to 'density' in legend, labels and so on.

@jankatins
Copy link
Contributor Author

BTW: this is the layer code in ggplot2: https://github.com/hadley/ggplot2/blob/master/R/layer.r and this is where it all comes together: https://github.com/hadley/ggplot2/blob/master/R/plot-build.r#L14

@has2k1
Copy link

has2k1 commented Mar 19, 2014

Makes sense, I like the idea of helper functions not being automatically applied, less magic stuff.

Here is the "roadmap" I will be following to get the geoms and stats in shape.

https://gist.github.com/has2k1/9637948

stats will declare the columns they create in CREATE. geom_position is how the pieces would come together. I have also included the positions, I don't know if you are including their adjustments in the layer / plotting building code that shouldn't affect the position classes.

There is still the issue of legends and limits, and as I understand that information will somehow be passed back from geom.plot_layer since that is where the final "groupbys" are.

@jankatins
Copy link
Contributor Author

geoms

I think DEFAULT_AES needs to be a dict (default = default values for each aes? In geom_point you actually use a dict :-)) and VALID_AES must be kept (as set()). I would initialize PARAMS as None, so that it needs to be set in each instantiated object (adding to the set will keep the object, which is then share between each object). I would also rename PARAMS as params, to be in line with aes and data. Maybe add a DEFAULT_PARAMS = dict(...) Idea: All class variables in CAPS, only the current set of data, aes mappings and params belong to each object.

I still think that the "translation" part should not become API or visible outside of the final plot() call, so I would (re)move the TRANSLATIONS variable to the plot() function.

I don't like the idea of a self._groupby() helper: I think

for key, _data in data.groupby("color"):
   _data = _data.to_dict()
   ax.plot(**_data) 

is actually much more readable?

One interesting thing is the handling of position: ggplot2 does the complete drawing (every plot is assembled from lines, boxes,...) but we are at a higher level, as we use matplotlib function for each type. So in some cases we transform x/y (adding jitter, -> which could belong to the layer) but in some other cases it means we have to use different plotting functions (bar charts: stack vs next to each other? ggplot2 would transform the x-y coordinates of the boxes...). I haven't look into what needs to handled here, so I think for now we should leave position out of the picture for this refactoring and use it as a parameter to the geom which does jittering in the plot function (as it is now). If this refactoring is one this can be revisited.

Stats

I don't think stats needs self.data and self.aes. Both are thingies of the geom or better the layer, the stat is only to map data (of the geom/layer) from one form into another. It needs input from the geom/layer, but I think that's better passed in via the method parameter (which you actually do in the stat_indentity(self, data) method :-) ). What stats needs is a DEFAULT_GEOM, so that __radd__() can construct such a thing and add that to the ggplot object:

def __radd__(gg):
   if self.DEFAULT_GEOM is None:
       raise Exception("no default geom associated with this statistics")
   geom = self.DEFAULT_GEOM(stats=self, aes(x=..., y=...))
   return gg + geom

Layer

This is currently missing:

class layer(object):
    gg = None # the ggplot handle, which this layer belongs to
    geom = None # the geom which will plot
    statistics = None # data transformation

    def __init__(geom, stats):
        #...assign...
        # geom.__radd__(gg) would then do 
        #      gg = deepcopy(gg)
        #      l = layer(self, self.stats)
        #      gg.layers.append(l)
        #      l.gg = gg

    def compute_data():
         # assemble the final aes mapping
         aes = self.geom.DEFAULT_AES.copy()
         # ... and so on: see above...
         # assemble every *data* which is needed for the plotting in one dataframe
         # the former ggplot._apply_transforms(...)
         data, legends = self.compute(self.geom.data or self.gg.data, aes)
         # do statistics transformations
         data = self.stats.compute(data)
         self.data = data
         self.legends = legends # or return? no idea...

   def plot(axis):
      if self.gg.facets:
             plot_params = zip(self.data.groupby(self.gg.facets), axis)
      else:
             plot_params = [((None, self.data), axis[0])]  
      for key, data, ax in plot_params: # not sure if unpacking work as it is actually `(key, data), ax`
           self.geom.plot_layer(data, axis)

legends and limits

ggplot2 has a step between compute_data and plot: it "trains" the limits and so on. Maybe we could use that as well?

legends: see above, I think that should be done somewhere in the layer.compute().

Limits is again tricky, similar to "position": it also depends on position (if it is stacked you get other y limits than simply using max(y) :-/ ) Not sure yet where it is best handled... Maybe also keep that as is in this refactoring.

It would be really interesting to check what happens if you use 'geom_point' and 'position_dodge' (or whatever...) in ggplot2 together... I will have to try that and see what happens :-)

@has2k1
Copy link

has2k1 commented Mar 19, 2014

Great input. I made some modifications to that reflection and also correct a few slip ups.

I am following the convention of all class variables as caps, though I did not realise that the TRANSLATIONS leak out into the API territory -- very undesirable. Also, as the valid
aesthetics can be extracted from the default aesthetics a (dict) and required aesthetics a (set) stating them would create a duplication.

About the stats having the data and aes variables, I still can't see how to avoid them. A stat created e.g stat_smooth(df, aes(x='x', y='y'), method='lm') would
have to hold the data and the aesthetics somewhere until they are passed on to the geom
specified in stat.DEFAULT_PARAMS['geom'] or stat.params['geom']. The other option is putting them in the stat.params.

I had imagined that both stats and geoms would have the form

def __radd__(self, gg):
    l = layer()
    # add stuff geom, stat, data, aes to layer
    gg.layers.append(l)

but if stat creates a geom then something like

def __radd__(gg):
   if self.DEFAULT_GEOM is None:
       raise Exception("no default geom associated with this statistics")
   geom = self.DEFAULT_GEOM(data=self.data, self.aes, stat=self)
   return gg + geom

For the positions, I'm only doing them to get proper jittering for the geom_point, but I got carried away when I saw that they are a simple concept in ggplot2. As per your query on position='dodge' for geom_point, nothing happens. Positions choose whether to apply themselves based on what aesthetics are mapped to the 'x' and 'y' scales. They choose from the following.

X = {'x', 'xmin', 'xmax', 'xend', 'xintercept'}
Y = {'y', 'ymin', 'xmax', 'yend', 'yintercept'}

So position='dodge' only cares about xmin and xmax and it creates them if they are missing. But geom_point uses x and y to do the potting. What is interesting though is what happens when you give position='jitter' to geom_bar, the world stays intact though.

@jankatins
Copy link
Contributor Author

Ok, I didn't see the stats_...(data, aes) case. Is that possible with ggplot2? I think it would be more consistent to put that into a hidden variable (_data , _aes or even _cache_aes), as it is not the same as geom.data.

Re position: when is ggplot2 actually applying position stuff? Somehow it look like it is in the geom? If so that should become a property of the geom, which would need to do some proper calling.

right now I can't think about anything else which is missing. I think we should go forward now: can you prepare a some more commits? One easy one would be to change the whole thing to let geom.plot_layer() accept Dataframes and change the 'one call per line' into 'one call per geom, do the line thingy groupby in geom.plot_layer' (this should still let all unittests pass). From there the next change would be the rest of the refactoring.

@has2k1
Copy link

has2k1 commented Mar 20, 2014

The positions are adjusted during the plot building, somewhere after the stats are
computed. In ggplot2 the user interface functions for stats and geoms i.e stat_abc and geom_abc are just helper functions to create a layer. So not even geom.data (or rather geom$data) exists, only layer.data, likewise for the aesthetic mappings and the position.

Comparing with ggplot2 layer state information

geom <- NULL          # a geom object with no state information 
geom_params <- NULL   # has the "manual aesthetics"
stat <- NULL          # a stat object with no state information
stat_params <- NULL   # has the "manual aesthetics"
data <- NULL          # from either the geom_xxx or stat_xxx
mapping <- NULL       # from either the geom_xxx or stat_xxx
position <- NULL      # from either the geom_xxx or stat_xxx
params <- NULL        # extra stuff
inherit.aes <- FALSE  # i haven't figured out how it is used

The ggplot2 stats and geoms are functional and keep no state. The
stat$compute and geom$draw functions are passed layer$stat_params
and layer$geom_params arguments respectively.

My picture of our current vision

layer.geom                    # object with plot_layer(), uses self.params
layer.geom.params             # args to geom_xxx excluding data, aes, manual_aes
layer.stat                    #object with compute(), uses self.params
layer.stat.params             # args to stat_xxx excluding data, aes, manual_aes
layer.geom.data               # from geom_xxx or stat_xxx
layer.geom.aes                # from geom_xxx or stat_xxx
layer.geom.manual_aes         # from geom_xxx or stat_xxx
layer.geom.params['position'] # from geom_xxx or stat_xxx
layer.params                  # ??

Our stats are stripped/hidden of some of their state information, i.e aes and data.

Whereas I think mimicking this aspect of ggplot2 (within reason of course) would not
involve much from what we are going for, I have been unable to think about
the potential consequences of it. My independent judgement has been compromised.

On the way forward, that would work.

@jankatins
Copy link
Contributor Author

I think we should go with this. The only other way I can think of is going with the ggplot2 way, with stat_* and geom_* beeng functions which produce layers and layer.geom (differently named) being a simple pointer to the 'plot_layer' function. But for my taste, passing around objects is nicer than passing around functions, so I would go for the variant like you described it above.

-> I thing further discussion can only happen directly with the code.

@has2k1
Copy link

has2k1 commented Mar 20, 2014

I have got geoms accepting dataframes, on this branch. I have replayed your commits in #235 onto current master, then my commit on top.

I will adapt some of the changes from #246 and clean up the duplication. The tests should still pass after that. While I have made changes to ggplot.py, I will not need to do so again.

How should we do this?

  • A PR
  • Parallel PRs, where I rebase my contributions onto your changes so that it's always a fast-forward.
  • ??

@jankatins
Copy link
Contributor Author

Just use your current PR #246 (or a branch which is based on that, if you want to play save). In the end that can be merged.

Yikes, isn't there any case where matplotlib accepts a rect + color instead of plotting each rect individually if it has a different color? Seems that I missed something here :-/ This code duplication is really awfull, sorry... Seems that we should declare a grouping set, which lists all grouping variables, and then pull the groupby code into geom? What do you think?

@has2k1
Copy link

has2k1 commented Mar 20, 2014

Yes the code duplication is sickening. For the grouping a single helper function should do. Each geom can control the groups using its own grouping set/list, [See](https://github.com/has2k1/ggplot/blob/refactor-layers-like ggplot2/ggplot/geoms/geom_point.py#L14), then call the helper e.g self._get_grouped_data(data, groups).

That way the geom is still in control, and well, if we get tired of seeing even those 3 or so lines everywhere once the project is a little more mature, then they can be tucked away.

A single grouping set as was done using ggplot.DISCRETE_AES has its own problems. Depending on how the matplotlib plot commands are used you could in some cases do without grouping by some of the otherwise discrete aesthetics, especially the color. So, for the rectangles color can be a list as ax.bar() does accept it.

I haven't thought about how the group parameter (once introduced) would affect such a set up,
stuff would probably be handled outside the geom.

jankatins added a commit that referenced this issue Mar 31, 2014
WIP: Refactor layers like ggplot2

Closes: #221
Closes: #160
@jankatins
Copy link
Contributor Author

This is merged in.

@jankatins
Copy link
Contributor Author

As the layer refactoring itself is not done, reopen.

@jankatins
Copy link
Contributor Author

@has2k1 has2k1 mentioned this issue Aug 18, 2014
11 tasks
@glamp glamp closed this as completed Jun 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants