Stacking for geom_area doesn't properly handle missing entries #280

Closed
wch opened this Issue Dec 7, 2011 · 4 comments

Collaborator

wch commented Dec 7, 2011

A stacked area graph requires an entry at every x for every group. If, for a given x value, one group has no row in the data frame, the graph behaves as though that group's y value at that x is zero.

The plots produced by this code illustrate it better:

library(ggplot2)

dat <- data.frame(
        g=rep(LETTERS[1:3], each=4),
        x=rep(1:4, 3),
        y=rep(3:14))

# Remove row with g=B, x=3 
dat <- dat[-7,] 
dat

# Lines all look straight
ggplot(dat, aes(x=x, y=y, colour=g)) + geom_line()

# With a stacked area graph, there's a dip at x=3 
ggplot(dat, aes(x=x, y=y, fill=g)) + geom_area()

Test code:

library(testthat)
library(ggplot2)

test_that("Stacked area graph interpolates missing values", {
  dat <- data.frame(
           g=rep(LETTERS[1:3], each=4),
           x=rep(1:4, 3),
           y=rep(3:14))

  # Remove row with g=B, x=3 
  dat <- dat[-7,] 

  p <- ggplot_build(ggplot(dat, aes(x=x, y=y, fill=g)) + geom_area())

  topgroup_y <- with(p$data[[1]], y[x==3 & group==3] )
  expect_equal(topgroup_y, 27)  
})

I think fixing this would require some interpolation. Perhaps it's better left for the planned large changes to the stacking code?
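As a point of reference, the interpolation can be done by hand before plotting with base R's `approx()`. This is a sketch under stated assumptions, not something ggplot2 provides: the helper name `interpolate_groups` and its column-name defaults are hypothetical. It linearly interpolates each group onto the union of all x values, and drops points that fall outside a group's own x range instead of extrapolating.

```r
# Hypothetical helper (not part of ggplot2): linearly interpolate each
# group's y onto the union of all x values, so every group has an entry
# at every x before stacking.
interpolate_groups <- function(df, x = "x", y = "y", group = "g") {
  all_x <- sort(unique(df[[x]]))
  pieces <- lapply(split(df, df[[group]]), function(d) {
    # approx() does linear interpolation; rule = 1 returns NA outside the
    # group's own x range rather than extrapolating.
    yi <- approx(d[[x]], d[[y]], xout = all_x, rule = 1)$y
    out <- data.frame(all_x, yi, d[[group]][1])
    names(out) <- c(x, y, group)
    out
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  # Drop rows that could not be interpolated (outside a group's x range).
  res[!is.na(res[[y]]), ]
}
```

With the example data above, `ggplot(interpolate_groups(dat), aes(x=x, y=y, fill=g)) + geom_area()` gives group B a value of 9 at x=3, so the top of the stack at x=3 is 27, the value the test expects. Interpolating onto the union of x values is one way to handle data whose x values do not align across groups, but linear interpolation is only one possible choice.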

Collaborator

kohske commented Dec 7, 2011

Just a note: although this would be useful in some cases, I don't think this kind of automatic interpolation is a good idea.
The purpose of visualization is to inspect visually what the data look like.
But automatic interpolation would make users overlook the missing values.
Furthermore, there is no particular reason to apply linear interpolation. Why not smoothing, or some other filtering?
So, in my view, interpolation should be done by the user, by hand.
Or, at least, an explicit mechanism such as stat_interpolate should be provided, though maybe that is beyond the scope of "plotting."
Another option is simply to raise an error or a warning when missing values are detected.
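A user-side version of the warning approach can be sketched as follows, assuming x values are discrete and are expected to match exactly across groups. The helper `check_stack_alignment` is hypothetical, not a ggplot2 function: it cross-tabulates groups against x values and warns about every empty cell, i.e. every group that lacks a row at an x value where some other group has data.

```r
# Hypothetical helper (not part of ggplot2): warn when some group lacks a
# row at an x value that other groups have, since a stacked area would
# treat those gaps as zero.
check_stack_alignment <- function(df, x = "x", group = "g") {
  # Rows are groups, columns are the x values present anywhere in df.
  counts <- table(df[[group]], df[[x]])
  missing <- which(counts == 0, arr.ind = TRUE)
  if (nrow(missing) > 0) {
    labels <- paste0(group, "=", rownames(counts)[missing[, 1]],
                     ", ", x, "=", colnames(counts)[missing[, 2]])
    warning("Missing entries for stacking: ",
            paste(labels, collapse = "; "), call. = FALSE)
  }
  # Return the (group, x) index pairs of the empty cells invisibly.
  invisible(missing)
}
```

Calling `check_stack_alignment(dat)` on the example data above warns about `g=B, x=3` and leaves the decision of how to fill the gap to the user.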

Owner

hadley commented Dec 7, 2011

Yeah, I'm totally with @kohske on this one. It gets even more complicated if you consider longitudinal data, where possibly none of the time points align.

But I think it's worth having some tool that will do this, just not automatically. Something to consider for 1.0.

Collaborator

wch commented Dec 7, 2011

That makes sense. I think it would be a good idea to have an informative warning message so that people know how to deal with the issue if they encounter it.

Owner

hadley commented Feb 24, 2014

This sounds like a great feature, but unfortunately we don't currently have the development bandwidth to support it. If you'd like to submit a pull request that implements this feature, please follow the instructions in the development vignette.

hadley closed this Feb 24, 2014
