Select data from external data frame for aesthetics? #262

Closed
naught101 opened this issue Apr 4, 2014 · 6 comments

@naught101

Here is some R code, to explain what I want to do:

library(ggplot2)
# Create a dataframe that has a column mean per species:
iris_types <- by(iris[, c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width')], iris$Species, colMeans)
iris_types_df <- do.call(rbind, iris_types)
# Plot the original data, using the species averages as aesthetics:
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, colour=iris_types_df[iris$Species,'Petal.Length'])) + geom_point()

Which results in:
[image: ggplot]

Is there a way to do the above with python ggplot? I tried doing the same, but it gives an error:

from pandas import rpy
from ggplot import *
iris = rpy.load_data('iris')
iris_types_df = iris.groupby('Species').mean()
ggplot(iris, aes(x='Sepal.Length', y='Sepal.Width', 
                 colour=iris_types_df.ix[iris.Species,'Petal.Length'])) + geom_point()
...
TypeError: 'Series' objects are mutable, thus they cannot be hashed

Obviously, this is a pretty useless example, and I could have just put the means in the original data frame. But some of the data I'm plotting is really big, so putting a handful of cluster means into the original data frame leads to massive data redundancy, requiring lots of extra memory.

@has2k1

has2k1 commented Apr 4, 2014

To break it down, there are two issues hidden in this: the same objective cannot currently be accomplished by either

ggplot(iris, aes(x='Sepal.Length', y='Sepal.Width', color=[5]*len(iris))) + geom_point()

or

ggplot(iris, aes(x='Sepal.Length', y='Sepal.Width')) + geom_point(aes(color=[5]*len(iris)))

There is some refactoring going on; once it concludes, both sides of this issue should be either resolved or more straightforward to fix.

@naught101
Author

So, the two issues are that:

  1. aesthetics should accept a list (or Series?) with length equal to the number of rows in the DataFrame, (as opposed to a string specifying a column) and
  2. ? that the subsetting is difficult if you're trying to do it via a string?

Is #252 the refactoring you're talking about?

@has2k1

has2k1 commented Apr 8, 2014

Yes for 1: as long as aesthetics accept a list, or more generally an array-like, your situation should work. The 2nd one has to do with the current state, in that all aes() mappings have to be put in the ggplot call.

#252 is done and the continuation is #266. However, I think it should end up being covered by the refactoring done by @JanSchulz. Somewhere in the comments at #221 is some structure of the refactorings although this issue isn't mentioned explicitly.

@jankatins
Contributor

The "accept a list of values" should be easy: that's another if-case in ggplot.ggplot._apply_transforms().
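
A minimal sketch of what such an extra case could look like. The helper name `resolve_aesthetic` is hypothetical and is not part of ggplot; the real `_apply_transforms()` is not shown in this thread, so this only illustrates the idea: a string is looked up as a column, while a list-like value is used directly if it has one entry per row.

```python
import pandas as pd


def resolve_aesthetic(data, value):
    """Hypothetical helper: turn one aes() value into a per-row Series.

    Illustration only; this is not ggplot's actual implementation.
    """
    # Existing behaviour: a string names a column of the data frame.
    if isinstance(value, str):
        return data[value]
    # Proposed extra case: a list, array or Series with one value per row.
    if len(value) != len(data):
        raise ValueError(
            "aesthetic has %d values, but data has %d rows"
            % (len(value), len(data))
        )
    # Align by position: drop any index the values carried and reuse
    # the index of the data frame.
    return pd.Series(list(value), index=data.index)


data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
by_name = resolve_aesthetic(data, "x")                 # column lookup
by_values = resolve_aesthetic(data, [5] * len(data))   # list of values
```

Aligning by position rather than by index is a design choice made for this sketch; a Series built by looking up group means carries the group labels as its index, so index-based alignment against the data frame would not match.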

@jankatins
Contributor

This will probably be handled in #285.

@jankatins
Contributor

Fun example:

from pandas import rpy
from ggplot import *
iris = rpy.load_data('iris')
iris_types_df = iris.groupby('Species').mean()
ggplot(iris, aes(x='Sepal.Length', y='Sepal.Width', 
                 colour=iris_types_df.ix[iris.Species,'Petal.Length'])) + geom_point()

In that case len(iris) > len(iris_types_df), but len(iris_types_df.ix[iris.Species,'Petal.Length']) == len(iris).

This is the case from @has2k1, where one specifies a series as a mapping (aes(..., color=[....])), and it will be handled in #285.
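
The length relationship can be checked on a small stand-in for `iris`. The values below are made up for illustration and are not the real data set, and the lookup uses `Series.map` rather than `.ix`, which is a substitution made for this sketch:

```python
import pandas as pd

# Small made-up stand-in for iris; the values are illustrative only.
iris = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica", "virginica", "virginica"],
    "Petal.Length": [1.0, 2.0, 5.0, 6.0, 7.0],
})

# One row per species, indexed by species name.
iris_types_df = iris.groupby("Species").mean()

# Look up each row's species mean; the result has one value per row of iris.
colour = iris["Species"].map(iris_types_df["Petal.Length"])

print(len(iris), len(iris_types_df), len(colour))  # prints: 5 2 5
```

The looked-up Series is as long as the original data frame even though the table of means has only one row per species, which is why it can serve as a per-row aesthetic.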
