geom_area with holes in the negative y-range #2802

Closed
PabloRMira opened this issue Aug 4, 2018 · 12 comments

@PabloRMira

PabloRMira commented Aug 4, 2018

As I already described here

https://stackoverflow.com/questions/51656490/holes-with-geom-area-ggplot2-for-negative-y-range

there is an issue with geom_area(). When geom_area() is used with negative values, holes appear in the negative y-range. Interestingly, this does not happen in the positive y-range. Here is a reproducible example that illustrates what I mean:

library(ggplot2)
library(data.table)

data <- data.table(index = c(0, 1), 
                   x1 = c(-1, -1.5), 
                   x2 = c(-1, 0), 
                   x3 = c(0, -1))
mdt <- melt(data, id.vars = "index")

print(data)
#>    index   x1 x2 x3
#> 1:     0 -1.0 -1  0
#> 2:     1 -1.5  0 -1

# Negative range: Holes
ggplot(data = mdt, aes(x = index, y = value, fill = variable)) +
  geom_area(position = "stack")

# Positive range: No holes
ggplot(data = mdt, aes(x = index, y = abs(value), fill = variable)) +
  geom_area(position = "stack")

Created on 2018-08-04 by the reprex package (v0.2.0).

In the first plot, the problem is that x2 changes from -1 to 0 while x3 changes from 0 to -1, which produces a hole. But again, this only seems to happen in the negative y-range. As you can see in the second plot, geom_area() stacks the same values without any holes between the areas when they are positive. Hence my suspicion that this may be a bug.

Thank you in advance for your help!

EDIT: As pointed out by @smouksassi (many thanks for this!), a quick-and-dirty workaround is to replace the zeros with a very small negative value, as here:

library(ggplot2)
library(data.table)

data <- data.table(index = c(0, 1),
                   x1 = c(-1, -1.5),
                   x2 = c(-1, -1e-36), # <- Changed!
                   x3 = c(-1e-36, -1)) # <- Changed!
mdt <- melt(data, id.vars = "index")

print(data)
#>    index   x1     x2     x3
#> 1:     0 -1.0 -1e+00 -1e-36
#> 2:     1 -1.5 -1e-36 -1e+00

# Negative range: No holes anymore
ggplot(data = mdt) +
  geom_area(aes(x = index, y = value, fill = variable), position = "stack")

Created on 2018-08-04 by the reprex package (v0.2.0).


@PabloRMira changed the title from "geom_area with holes on the negative y-axis" to "geom_area with holes in the negative y-range" on Aug 4, 2018
@smouksassi

This is because we are mixing negative values and zeros (which are considered positive?).

The issue goes away if you replace the zero y values with a small negative value, e.g. -0.01.
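To make the zero issue concrete, here is a minimal base-R sketch of sign-split stacking at a single x position. It follows the rule clauswilke states further down in this thread (positive values stacked up, negative values stacked down) and assumes, as suspected above, that zeros are stacked together with the positive values. `stack_by_sign()` is a hypothetical helper, not ggplot2's internal code, and it stacks groups in row order, which may differ from the order ggplot2 actually uses.

```r
# Hypothetical helper (not ggplot2 internals): stack the values observed at
# one x position. Values with y < 0 go into a downward stack; values with
# y >= 0 (zeros included, by assumption) go into an upward stack.
# Groups are stacked in row order.
stack_by_sign <- function(group, y) {
  neg <- y < 0
  # Running top of the upward stack (negative entries contribute nothing)
  pos_top <- cumsum(ifelse(neg, 0, y))
  # Running bottom of the downward stack (non-negative entries contribute nothing)
  neg_bottom <- cumsum(ifelse(neg, y, 0))
  ymin <- ifelse(neg, neg_bottom, pos_top - y)
  ymax <- ifelse(neg, neg_bottom - y, pos_top)
  data.frame(group = group, y = y, ymin = ymin, ymax = ymax)
}

# The two x positions from the first reprex (index = 0 and index = 1)
stack_by_sign(c("x1", "x2", "x3"), c(-1, -1, 0))
stack_by_sign(c("x1", "x2", "x3"), c(-1.5, 0, -1))
```

In this sketch, x2 occupies [-2, -1] at index = 0 but the zero-height band [0, 0] at index = 1, while x3 moves from [0, 0] to [-2.5, -1.5]. Since geom_area() joins each band with straight lines between the two x positions, these jumps across the sign split leave gaps and overlaps between the areas. Replacing the zeros with a tiny negative value keeps every entry in the downward stack, which is why the workaround closes the holes.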

@ptoche

ptoche commented Aug 4, 2018

Possibly related: #2803

@clauswilke
Member

I think we need to step back for a second and have a discussion about what stacking with negative areas is supposed to mean and do. It is not clear to me, and I'd say it's not well defined a priori. The current code seems to work (mostly) when all areas are negative, and it interprets that to mean that the stacking should go down rather than up. But when we mix positive and negative values, what should happen? Stacking implies that there should be no holes and no overlapping areas, but a mix of positive and negative numbers can create holes, overlaps, or both.

@ptoche

ptoche commented Aug 4, 2018

Fair question. My understanding of stacking with negative values is what the first graph of #2803 does (the old behaviour), that is, subtracting from the current height. That first chart is a reproduction of a chart made with other software and published in a book, which suggests that it's the expected behaviour for some people at least. Also, the reprex for #2803 (current behaviour) cannot possibly show the expected behaviour.

@clauswilke
Member

The current code stacks positive values up and negative values down. It works fine when each series is entirely positive or entirely negative. However, when a series contains both positive and negative values, things get wonky. I'm still not sure how the second example below should be treated.

library(ggplot2)
df <- data.frame(
  id = rep(letters[1:4], each = 3),
  x = rep(1:3, 4),
  y = c(1, 2, 1, 2, 2, 3, -1, -2, -2, -3, -1, -2)
)

ggplot(df, aes(x, y, fill = id)) +
  geom_area(position = "stack")

df <- data.frame(
  id = rep(letters[1:4], each = 3),
  x = rep(1:3, 4),
  y = c(1, 2, 1, 2, 2, 3, -1, 1, -1, -3, -1, -2)
)

ggplot(df, aes(x, y, fill = id)) +
  geom_area(position = "stack")

Created on 2018-08-04 by the reprex package (v0.2.0).

@clauswilke
Member

@ptoche I'm not convinced the behavior you describe would be appropriate for position_stack(). In my mind, stacked areas should not overlap. One could write an alternative position adjustment that behaves the way you describe, and maybe call it position_accumulate() (or more originally, position_stack2() :-) ).
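For comparison, here is a minimal base-R sketch of the "subtract from the current height" semantics that ptoche describes, i.e. the kind of thing a separate position adjustment might compute. `accumulate_stack()` is a hypothetical helper: it is not the old ggplot2 code, and `position_accumulate()` / `position_stack2()` mentioned above do not exist. It only computes band limits for the values at a single x position, in row order.

```r
# Hypothetical "accumulate" semantics (not part of ggplot2): a single running
# total per x position, whatever the sign. A negative value moves the running
# height back down, so its band can overlap bands stacked before it.
accumulate_stack <- function(group, y) {
  top <- cumsum(y)      # running height after adding each group
  bottom <- top - y     # running height before adding each group
  data.frame(group = group, y = y,
             ymin = pmin(bottom, top), ymax = pmax(bottom, top))
}

# x = 1 in the second example: a = 1, b = 2, c = -1, d = -3
accumulate_stack(c("a", "b", "c", "d"), c(1, 2, -1, -3))
```

Here c covers [2, 3], which lies inside b's band [1, 3], and d covers [-1, 2], overlapping everything stacked before it: exactly the overlap that, in clauswilke's view, position_stack() should not produce.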

For the current position_stack(), maybe there should be a warning for data series containing a mix of positive and negative numbers?
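A warning like this could be prototyped outside ggplot2 with a small check such as the one below. `mixed_sign_groups()` and the warning text are hypothetical. Under the assumption, raised earlier in the thread, that zeros are stacked with the positive values, it also flags series that mix negatives with zeros, such as x2 and x3 in the original report.

```r
# Hypothetical check (not part of ggplot2): list the series whose values fall
# on both sides of the sign split, i.e. contain some y < 0 and some y >= 0.
mixed_sign_groups <- function(group, y) {
  by_group <- split(y, group)
  mixed <- vapply(by_group, function(v) any(v < 0) && any(v >= 0), logical(1))
  names(by_group)[mixed]
}

# Second example from above: only series "c" mixes signs
df <- data.frame(
  id = rep(letters[1:4], each = 3),
  x = rep(1:3, 4),
  y = c(1, 2, 1, 2, 2, 3, -1, 1, -1, -3, -1, -2)
)
bad <- mixed_sign_groups(df$id, df$y)
if (length(bad) > 0) {
  warning("stacked areas may show holes or overlaps for series mixing ",
          "negative and non-negative values: ", paste(bad, collapse = ", "))
}
```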

@PabloRMira
Author

@clauswilke: a practical way to avoid the overlaps and holes for mixed series is to split each series into its positive and negative parts, as shown below:

library(ggplot2)

df <- data.frame(
  id = rep(letters[1:4], each = 3),
  x = rep(1:3, 4),
  y = c(1, 2, 1, 2, 2, 3, -1, 1, -1, -3, -1, -2)
)

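# Positive layer: keep y >= 0; negative entries become 0 (empty bands stacked upward)
df$positive <- ifelse(df$y >= 0, df$y, 0)
# Negative layer: keep y < 0; other entries become a tiny negative value so
# that every entry in this layer is stacked downward (a true 0 would go up)
df$negative <- ifelse(df$y < 0, df$y, -1e-36)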

ggplot(df) +
  geom_area(aes(x=x, y=positive, fill=id)) +
  geom_area(aes(x=x, y=negative, fill=id))

Created on 2018-08-04 by the reprex package (v0.2.0).

@hadley
Member

hadley commented Aug 4, 2018

Given the discussion, it appears to me that the stacking behaviour is correct for simple cases (all positive or all negative) and it's not clear what should happen for mixed cases. For that reason, I think this is out of scope for ggplot2: a fuller implementation would be more appropriate in an extension package.

@hadley hadley closed this as completed Aug 4, 2018
@ptoche

ptoche commented Aug 4, 2018

@clauswilke, yes, I think your suggestion is reasonable. One could just pull the old code out and name it position_stack2() or position_cumulate() or something along these lines. The chart I show in #2803 is part of a replication project, so it's a bit of a shame to lose the ability to reproduce that series of charts. A warning along the lines of "The current implementation of position_stack() only handles strictly positive or strictly negative values" would also help, since this issue is certainly going to bite people every once in a while.

@clauswilke
Member

@ptoche I've thought about adding a warning, but the current behavior is completely fine for other geoms (see below), and it is properly documented. So I agree with @hadley that there's nothing to be done in the ggplot2 codebase itself.

library(ggplot2)
df <- data.frame(
  id = rep(letters[1:4], each = 3),
  x = rep(1:3, 4),
  y = c(1, 2, 1, 2, 2, 3, -1, 1, -1, -3, -1, -2)
)

ggplot(df, aes(x, y, fill = id)) +
  geom_col(position = "stack")

Created on 2018-08-04 by the reprex package (v0.2.0).

@lock

lock bot commented Jun 13, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jun 13, 2019