Subsetting data in plot #116

Closed
kevindavenport opened this issue Dec 4, 2013 · 8 comments · Fixed by #124

Comments

@kevindavenport

In R I can plot additional points based on some other criteria using the R subset command as follows:

%%R -i DF_diff_xy # -i passes this Python object into R
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(DF_diff_xy)
plot = ggplot(df, aes(x = X, y = Y)) + 
geom_point(alpha = .8, color = 'dodgerblue',size = 5) +
geom_point(data=subset(df, Y >= 6.7 | X >= 4), color = 'red',size = 6) +
theme(axis.text.x = element_text(size= rel(1.5),angle=90, hjust=1)) +
ggtitle('Distance Pairs with outliers highlighted in red')
print(plot)

In Python, I thought I could pass a row slice of the dataframe to a second geom to highlight those points, like so:

from ggplot import *

ggplot(DF_diff_xy, aes(x = 'X', y ='Y')) + \
    geom_point(alpha=1, size=100, color='dodgerblue') + \
    geom_point(data = DF_diff_xy[:1], alpha=1, color='black')

This didn't work, however. Any ideas?

Thanks,
Kevin Davenport
http://kldavenport.com

@jankatins
Contributor

Currently it is not possible to specify data per geom.

@glamp Currently the ggplot._get_layers() method is not really the equivalent of ggplot2's "layer": the real layer information is in ggplot.geoms. But because the data is set in ggplot._get_layers(...) and the geoms are iterated afterwards, the geom (the real "layer") can't set its own dataset. I would suggest changing the iteration to:

for geom in geoms:
    _data = geom.data or self.data
    for sub_layer in self._get_layers(_data):
         [...]

geom.__init__() would then pop data from the args and save it to geom.data.
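
A minimal sketch of the per-geom fallback suggested above, using plain Python objects rather than the actual ggplot classes. The `Geom` and `Plot` classes and the `resolve_layers` method are hypothetical stand-ins, not the library's API. One caveat: with a pandas DataFrame, `geom.data or self.data` would raise a `ValueError` because a DataFrame has no unambiguous truth value, so the sketch compares against `None` explicitly.

```python
class Geom:
    """Hypothetical stand-in for a geom that may carry its own dataset."""

    def __init__(self, name, **kwargs):
        self.name = name
        # Pop 'data' from the keyword arguments; None means "use the plot's data".
        self.data = kwargs.pop("data", None)
        self.params = kwargs


class Plot:
    """Hypothetical stand-in for the top-level plot object."""

    def __init__(self, data):
        self.data = data
        self.geoms = []

    def __add__(self, geom):
        # Kept simple for the sketch: the plot is mutated and returned.
        self.geoms.append(geom)
        return self

    def resolve_layers(self):
        """Pair each geom with the dataset it should draw."""
        layers = []
        for geom in self.geoms:
            # Explicit None check instead of `geom.data or self.data`.
            data = self.data if geom.data is None else geom.data
            layers.append((geom.name, data))
        return layers


# Made-up points for illustration.
points = [{"X": 1, "Y": 2}, {"X": 5, "Y": 7}]
outliers = [p for p in points if p["Y"] >= 6.7 or p["X"] >= 4]

plot = (
    Plot(points)
    + Geom("base", color="dodgerblue")
    + Geom("highlight", data=outliers, color="red")
)
layers = plot.resolve_layers()
```

Here the "base" layer falls back to the plot's data, while the "highlight" layer draws only the subset it was given.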

@jankatins
Contributor

Ok, it's (of course... :-/ ) not that easy: when you do that (and transform the geom's data with the aes like ggplot.__init__() does; geom-specific aes mapping was also not implemented yet), there is an error because the plotting code assumes that there are some "assigned colors", but as this is a new dataset, there aren't any. So this also needs a look into how colors and so on are assigned...

One way would be to refactor the assign_*(gg) functions into build_*_mapping(data, aes, legend, gg), which would set the needed columns in data based on the passed-in aes and gg (only manual color mapping and so on...). But that's for tomorrow...
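
To illustrate the refactoring idea, here is a rough sketch of what a `build_color_mapping` function could look like when it works on whatever dataset it is handed instead of on the plot object. The function name follows the `build_*_mapping` pattern proposed above, but the simplified signature (no `gg` argument), the palette, and the `color_mapping` key are assumptions made for illustration, not the code that was eventually merged.

```python
# Hypothetical palette; the real library chooses its own colors.
PALETTE = ["red", "green", "blue"]


def build_color_mapping(data, aes, legend):
    """Return a copy of `data` with a 'color_mapping' entry per row.

    data   -- list of dicts (stand-in for a DataFrame)
    aes    -- dict of aesthetic mappings, e.g. {"color": "group"}
    legend -- dict updated in place with value -> color assignments
    """
    column = aes.get("color")
    if column is None:
        # No color aesthetic: nothing to assign, return copies of the rows.
        return [dict(row) for row in data]

    assigned = legend.setdefault("color", {})
    out = []
    for row in data:
        value = row[column]
        # Reuse a color already given to this value so layers stay consistent.
        if value not in assigned:
            assigned[value] = PALETTE[len(assigned) % len(PALETTE)]
        out.append(dict(row, color_mapping=assigned[value]))
    return out


# Two layers with different datasets share one legend.
legend = {}
aes = {"color": "group"}
base = build_color_mapping(
    [{"X": 1, "group": "a"}, {"X": 2, "group": "b"}], aes, legend
)
extra = build_color_mapping(
    [{"X": 3, "group": "b"}, {"X": 4, "group": "c"}], aes, legend
)
```

Because the mapping is built from the data passed in, a geom with its own dataset gets its colors assigned as well, and the shared `legend` keeps a value such as "b" on the same color in both layers.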

@jankatins jankatins mentioned this issue Dec 10, 2013
@jankatins
Contributor

This can be closed

@kevindavenport
Author

Awesome Jan, thank you for your contribution. I think I can update http://kldavenport.com/mahalanobis-distance-and-outliers/ now :)

@jankatins
Contributor

Let's see if it works for you :-)

@jankatins
Contributor

Downloaded your ipynb and ran it here: it works :-)

@jankatins
Contributor

Just for reference, here are the changes I had to make:

# needed because in the latest pandas, Series are no longer numpy arrays...
# see https://github.com/pydata/pandas/issues/5698
xydata = DF_diff_xy.values
xycols = DF_diff_xy.columns
--
%%R -i xydata,xycols # -i passes these Python objects into R
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(xydata)
names(df) <- c(xycols)
plot = ggplot(df, aes(x = X, y = Y)) + 
geom_point(alpha = .8, color = 'dodgerblue',size = 5) +
geom_point(data=subset(df, Y >= 6.7 | X >= 4), color = 'red',size = 6) +
theme(axis.text.x = element_text(size= rel(1.5),angle=90, hjust=1)) +
ggtitle('Distance Pairs with outliers highlighted in red')
print(plot)
--
from ggplot import *

ggplot(DF_diff_xy, aes(x = 'X', y ='Y')) + \
    geom_point(alpha=1, size=100, color='dodgerblue') + \
    geom_point(data = DF_diff_xy[(DF_diff_xy.Y >= 6.7) | (DF_diff_xy.X >= 4)],alpha=1, size = 100, color='red')  
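
The row selection passed to `data` above is a pandas boolean mask, which plays the role of R's `subset(df, Y >= 6.7 | X >= 4)`. A small sketch with made-up values (not the data from the notebook) shows the filter on its own; the variable names are only for illustration.

```python
import pandas as pd

# Made-up values for illustration; not the data from the notebook.
df = pd.DataFrame({"X": [1.0, 4.5, 2.0, 3.0], "Y": [2.0, 3.0, 7.1, 6.0]})

# Each comparison needs its own parentheses because `|` binds tighter than `>=`.
mask = (df.Y >= 6.7) | (df.X >= 4)
outliers = df[mask]

# DataFrame.query expresses the same filter as a string.
same = df.query("Y >= 6.7 or X >= 4")
```

Either form yields a DataFrame containing only the rows to highlight, which can then be handed to the second `geom_point`.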

@kevindavenport
Author

Just tried it, works perfectly! Will start updating my blog post now :)
