Default ECDF behavior is not empirical #1467

Closed
BrandanW opened this issue Jan 4, 2016 · 3 comments

@BrandanW BrandanW commented Jan 4, 2016

I've found that when plotting an ecdf with ggplot, the two points at y=0 and y=1 are undesirable and frankly arbitrary. This is especially true for distributions with certain properties (eg, strictly positive or only in a specific range). Is there a way to get around this default behavior, or any plans to change it?

Example:

ggplot(data.frame(x = exp(1:10)),
       aes(x)) +
  geom_line(stat = "ecdf")

[image: rplot]

@felasa felasa commented Jan 7, 2016

I believe stat_ecdf is meant to be used with geom_step:

ggplot(data.frame(x = exp(1:10)),  aes(x)) +
 stat_ecdf()

or

ggplot(data.frame(x = exp(1:10)),  aes(x)) +
 geom_step(stat="ecdf")

Both produce this:

[image]

@BrandanW BrandanW commented Jan 7, 2016

Even the step has two unneeded points, though, at about (-2000, 0) and (23700, 1). Wouldn't more rational endpoints be (min(x), 0) and (max(x), 1)? I don't see what the horizontal lines at top and bottom add.
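
The endpoints under discussion, a curve that starts at (min(x), 0) and ends at (max(x), 1), can be sketched in base R without ggplot2. This is only an illustration using the data from the example above; `steps` is a made-up name, and the layout assumes the points will be drawn as a horizontal-then-vertical step.

```r
# Same data as the example at the top of the issue
x <- exp(1:10)

# stats::ecdf() returns the empirical CDF as a step function
Fn <- ecdf(x)

# Evaluate it at the sorted unique observations
xs <- sort(unique(x))

# Prepend a single point at (min(x), 0); the last point is already (max(x), 1)
steps <- data.frame(
  x = c(xs[1], xs),
  y = c(0, Fn(xs))
)
```

Drawing `steps` with `plot(steps, type = "s")` or `geom_step()` should give a curve that rises from 0 at min(x) and reaches 1 at max(x), with no horizontal segments beyond the data.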

@felasa felasa commented Jan 7, 2016

While I personally have no problem with that (it's not wrong, and base R does the same, although with a different visualization), I see what you mean.

This can certainly be addressed with a few changes. If the devs don't want it implemented, you can create a custom stat:

stat_myecdf <- function(mapping = NULL, data = NULL, geom = "step",
                      position = "identity", n = NULL, na.rm = FALSE,
                      show.legend = NA, inherit.aes = TRUE, direction="vh", ...) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatMyecdf,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      n = n,
      na.rm = na.rm,
      direction=direction,
      ...
    )
  )
}

StatMyecdf <- ggproto("StatMyecdf", Stat,
                    compute_group = function(data, scales, n = NULL) {

                      # If n is NULL, use raw values; otherwise interpolate
                      if (is.null(n)) {
                        # Sort the unique values so the shifted x/y pairing below
                        # and the step path both follow increasing x
                        xvals <- sort(unique(data$x))
                      } else {
                        xvals <- seq(min(data$x), max(data$x), length.out = n)
                      }

                      y <- ecdf(data$x)(xvals)
                      # Prepend y = 0 and repeat max(x): with direction = "vh" the
                      # curve then runs from (min(x), 0) to (max(x), 1)
                      x1 <- max(xvals)
                      y0 <- 0
                      data.frame(x = c(xvals, x1), y = c(y0, y))
                    },

                    default_aes = aes(y = ..y..),

                    required_aes = c("x")
)

Then either of:

ggplot(data=data.frame(x = exp(1:10)), aes(x)) + geom_step(stat="myecdf")
ggplot(data=data.frame(x = exp(1:10)), aes(x)) + stat_myecdf()

[image]

@hadley hadley closed this in 2f270c0 Jan 26, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 19, 2018
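
For readers on current ggplot2 releases: `stat_ecdf()` has a `pad` argument (default `TRUE`) that controls the extra points at (-Inf, 0) and (Inf, 1). Whether the closing commit is what introduced it is an assumption here; the thread itself does not say. A minimal sketch, assuming ggplot2 is installed:

```r
library(ggplot2)

df <- data.frame(x = exp(1:10))

# pad = FALSE drops the padding points at -Inf and Inf,
# so the curve spans only min(x) to max(x)
p <- ggplot(df, aes(x)) +
  stat_ecdf(pad = FALSE)

# Inspect the points the stat computed; all x values should be finite
ecdf_points <- layer_data(p)
range(ecdf_points$x)
```

With `pad = FALSE` the first point sits at (min(x), F(min(x))) rather than (min(x), 0), so a custom stat like the one above is still needed if the curve should start at zero.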