
Allow fun in stat_summary_2d to return data.frame #2519

@baffelli


I'm working with very dense time series (in fact, very dense series of images), and I want to plot the spatial/temporal variability of certain parameters in a 2D histogram-like object. Since I have too many points, I don't want to use geom_raster or geom_tile directly; I'd rather bin the points into larger units first.
This is of course possible with stat_summary_2d; however, I'd like more flexibility in controlling how the statistics are mapped to aesthetics.
As an example, I'll use a dataset from gstat. Here I plot the mean PM10 concentration measured by several stations against time and station altitude, binning time into 7-day units and station altitude into units of 100:

library(gstat)
library(sp)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(tidyverse)

data("DE_RB_2005")

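# Summarise a numeric vector by its mean and standard deviation
msd <- function(x) {
  data.frame(mean = mean(x), sd = sd(x))
}

# Convert the STFDF object to a data frame and re-parse time as a Date
DE_RB_2005.df <- as.data.frame(DE_RB_2005) %>% mutate(time = ymd(time))

# Bin widths: 7 days along time, 100 along station_altitude
bins <- c(7, 100)

p.base <- ggplot(DE_RB_2005.df, aes(x = time, y = station_altitude, z = PM10))

# This works: the per-bin mean of PM10 is mapped to fill
p1 <- p.base + geom_raster(stat = "summary_2d", aes(fill = ..value..), binwidth = bins)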

p1

This first approach works as expected.
Now suppose that stat_summary_2d accepted a function such as msd, which takes a numeric vector and returns a data frame, and that the computed ("secret double-dot") variables ..mean.. and ..sd.. were taken from the data frame returned when msd is called on each bin. This would make it possible to display several properties of the PM10 distribution within each bin at once, by mapping these variables to different aesthetics:

 
#This does not work, would require fun to accept data.frame as output
p2 <- p.base + geom_raster(stat="summary_2d", fun=msd, aes(fill =..mean.., alpha=..sd..))

p2
#> Warning: Computation failed in `stat_summary2d()`:
#> replacement has 1302 rows, data has 651

This does not work. Since 1302 is exactly twice 651, I presume the returned data frame is flattened, with both columns appended into a single vector of twice the expected length.
Would it be possible (and reasonably easy) to change stat_summary_2d so that fun may return a data frame, with each of its columns exposed as a double-dot variable?
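The row counts in the warning (1302 = 2 × 651) are consistent with the per-bin results being collected with something like `unlist(lapply(...))`, which turns every one-row data frame into two separate numbers. A minimal base-R illustration, assuming that kind of collection step (the grouping below is made up and merely stands in for the bins); stacking the rows with `do.call(rbind, ...)` instead keeps one row per group:

```r
# Summary function from the example above: one-row data frame per group
msd <- function(x) data.frame(mean = mean(x), sd = sd(x))

# Made-up data split into three groups, standing in for three bins
groups <- split(c(1, 2, 3, 4, 5, 6), c(1, 1, 2, 2, 3, 3))

# Flattening: each one-row data frame contributes two values (mean and sd)
flat <- unlist(lapply(groups, msd))
length(flat)  # 6, i.e. twice the number of groups

# Row-binding instead keeps one row per group, with columns mean and sd
stacked <- do.call(rbind, lapply(groups, msd))
nrow(stacked)  # 3
```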
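Until something like this exists, one workaround is to do the binning outside ggplot2 and let the data-frame-returning function summarise each cell, so that every returned column becomes an ordinary variable. A minimal base-R sketch; `summarise_bins_df` is a hypothetical helper, and its cells start at multiples of the bin width, which is an assumption that need not match the bin boundaries stat_summary_2d chooses:

```r
# Summary function from the example above: one-row data frame per call
msd <- function(x) data.frame(mean = mean(x), sd = sd(x))

# Hypothetical helper (not a ggplot2 function): bin x and y into cells of
# width binwidth[1] and binwidth[2], apply `fun` to the z values of each
# non-empty cell, and bind the resulting one-row data frames together.
# Cells start at multiples of the bin width (an assumption); xbin and ybin
# hold the cell centres.
summarise_bins_df <- function(x, y, z, binwidth, fun) {
  xbin <- (floor(as.numeric(x) / binwidth[1]) + 0.5) * binwidth[1]
  ybin <- (floor(y / binwidth[2]) + 0.5) * binwidth[2]
  cells <- split(data.frame(xbin, ybin, z), list(xbin, ybin), drop = TRUE)
  rows <- lapply(cells, function(cell) {
    cbind(cell[1, c("xbin", "ybin")], fun(cell$z))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Toy input: four points falling into three cells
binned <- summarise_bins_df(
  x = c(0, 1, 8, 9), y = c(0, 0, 0, 150), z = c(1, 3, 5, 7),
  binwidth = c(7, 100), fun = msd
)
binned
```

Applied to DE_RB_2005.df, the result could then be plotted with, for example, `ggplot(binned, aes(xbin, ybin, fill = mean, alpha = sd)) + geom_tile()`, converting `xbin` back with `as.Date(xbin, origin = "1970-01-01")` if a date axis is wanted.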

The "orthogonal" operation would also be very useful: selecting multiple variables for z and applying fun to each of them, similar to dplyr::summarise_all:

# This is crazy and definitely does not work
p3 <- ggplot(DE_RB_2005.df,
             aes(x = time, y = station_altitude, z = c("PM10", "annual_mean_PM10"))) +
  geom_raster(stat = "summary_2d", fun = msd,
              aes(fill = ..value_PM10.., alpha = ..value_annual_mean_PM10..))

p3
#> Error in FUN(X[[i]], ...): object '..value_PM10..' not found

As expected, this does not work at present; I suppose supporting it would require tidy evaluation or something similar.
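For the multi-variable case, a similar workaround is to summarise several z columns per cell before plotting. A minimal base-R sketch using `aggregate()`, which applies `fun` to every selected column within each cell (in current dplyr the same idea is usually written as `summarise(across(...))`, the successor of summarise_all). `summarise_bins_multi` is hypothetical, its cells again start at multiples of the bin width by assumption, and the `value_` prefix merely mirrors the names proposed above:

```r
# Hypothetical helper: bin columns x and y of `data`, then apply `fun`
# separately to each column named in `zs` within every non-empty cell.
# Cells start at multiples of the bin width; xbin/ybin are cell centres.
summarise_bins_multi <- function(data, x, y, zs, binwidth, fun = mean) {
  xbin <- (floor(as.numeric(data[[x]]) / binwidth[1]) + 0.5) * binwidth[1]
  ybin <- (floor(data[[y]] / binwidth[2]) + 0.5) * binwidth[2]
  out <- aggregate(data[zs], by = list(xbin = xbin, ybin = ybin), FUN = fun)
  # Name the summaries value_<column>, e.g. value_PM10
  names(out)[-(1:2)] <- paste0("value_", zs)
  out
}

# Toy input with two measured variables, a and b
toy <- data.frame(t = c(0, 1, 8), alt = c(0, 0, 0),
                  a = c(1, 3, 5), b = c(10, 20, 30))
res <- summarise_bins_multi(toy, "t", "alt", c("a", "b"), binwidth = c(7, 100))
res
```

Each `value_*` column can then be mapped to its own aesthetic, e.g. `aes(fill = value_PM10, alpha = value_annual_mean_PM10)` with geom_tile().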
