Default ECDF behavior is not empirical #1467
Comments
Even the step has two unneeded points, though, at about (-2000, 0) and (23700, 1). Wouldn't more rational endpoints be (min(x), 0) and (max(x), 1)? I don't see what the horizontal lines at the top and bottom add.
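As a quick check of what the ECDF itself does at the extremes of the data, here is a minimal base R sketch. It assumes the same x = exp(1:10) sample that is used for the plots later in this thread; the name Fn is only illustrative.

```r
# Base R only; Fn is an illustrative name for the fitted ECDF function.
x <- exp(1:10)
Fn <- ecdf(x)

below_min <- Fn(min(x) - 1)  # 0: no observations lie below the minimum
at_min    <- Fn(min(x))      # 0.1: one of the ten observations is <= min(x)
at_max    <- Fn(max(x))      # 1: every observation is <= max(x)

print(c(below_min, at_min, at_max))
```

So the function is 0 everywhere to the left of min(x) and 1 everywhere from max(x) onwards; a step drawn from (min(x), 0) to (max(x), 1) already shows every jump.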
While I personally have no problem with that (it's not wrong, and base R does the same, although with a different visualization), I see what you mean. This can certainly be addressed with a few changes. If the devs don't want it implemented, you can create a custom stat:
stat_myecdf <- function(mapping = NULL, data = NULL, geom = "step",
                        position = "identity", n = NULL, na.rm = FALSE,
                        show.legend = NA, inherit.aes = TRUE, direction = "vh", ...) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatMyecdf,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      n = n,
      na.rm = na.rm,
      direction = direction,
      ...
    )
  )
}
StatMyecdf <- ggproto("StatMyecdf", Stat,
  compute_group = function(data, scales, n = NULL) {
    # If n is NULL, use raw values; otherwise interpolate
    if (is.null(n)) {
      # Don't understand why, but this version needs to sort the values
      xvals <- sort(unique(data$x))
    } else {
      xvals <- seq(min(data$x), max(data$x), length.out = n)
    }
    y <- ecdf(data$x)(xvals)
    # Repeat the largest x and prepend y = 0, so the step starts at
    # (min(x), 0) and ends at (max(x), 1)
    x1 <- max(xvals)
    y0 <- 0
    data.frame(x = c(xvals, x1), y = c(y0, y))
  },
  default_aes = aes(y = ..y..),
  required_aes = c("x")
)
then any of:
ggplot(data = data.frame(x = exp(1:10)), aes(x)) + geom_step(stat = "myecdf", direction = "vh")
ggplot(data = data.frame(x = exp(1:10)), aes(x)) + stat_myecdf()
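To see which points the custom stat hands to the step geom, the core of compute_group can be run on its own in base R. This is only a sketch of that one step: the small sample x and the name steps are made up for illustration, and ggplot2 is not needed.

```r
# Illustrative sample with a tie and out-of-order values (not from the thread).
x <- c(3, 1, 2, 2)

xvals <- sort(unique(x))   # 1 2 3
y <- ecdf(x)(xvals)        # 0.25 0.75 1.00

# Same construction as in compute_group: repeat the largest x, prepend y = 0.
steps <- data.frame(x = c(xvals, max(xvals)), y = c(0, y))
print(steps)
#   x    y
# 1 1 0.00
# 2 2 0.25
# 3 3 0.75
# 4 3 1.00
```

Each y is paired with the next x, which only makes sense when xvals is in increasing order. With direction = "vh" the geom moves vertically first, so the curve rises to 0.25 at x = 1, to 0.75 at x = 2, and to 1 at x = 3, with no segments outside [min(x), max(x)].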
I've found that when plotting an ECDF with ggplot, the two points at y=0 and y=1 are undesirable and frankly arbitrary. This is especially true for distributions with certain properties (e.g., strictly positive or only in a specific range). Is there a way to get around this default behavior, or any plans to change it?
Example: