-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
geom_density()
for bounded data
#3387
Comments
Here are more appropriate plots with plot limits completely representing current behavior: library(ggplot2)
set.seed(101)
ggplot(data.frame(x = runif(100)), aes(x)) +
geom_density() +
geom_density(color = "blue", bounds = c(0, 1)) +
stat_function(data = data.frame(x = c(0, 1)), fun = dunif, color = "red") ggplot(data.frame(x = rexp(100)), aes(x)) +
geom_density() +
geom_density(color = "blue", bounds = c(0, Inf)) +
stat_function(data = data.frame(x = c(0, 5)), fun = dexp, color = "red") Created on 2019-07-03 by the reprex package (v0.3.0) |
Sorry, a naive question: if |
Not a naive question at all! Arguments Here is an example. Black line demonstrates "raw" estimation, that goes outside of [0; 1] segment (which, we know should never happen). Blue line is a result with library(ggplot2)
set.seed(42)
x <- runif(100)
den_default <- as.data.frame(density(x)[c("x", "y")])
den_cut <- as.data.frame(density(x, from = 0, to = 1)[c("x", "y")])
ggplot(mapping = aes(x, y)) +
geom_line(data = den_cut, color = "blue", size = 2) +
geom_line(data = den_default, color = "black") Created on 2020-04-30 by the reprex package (v0.3.0) |
ok, thanks for the explanation!! |
Hello @echasnovski, This proposed function looks really cool and useful. If you don't mind me asking, could you share your code that created it? Thanks. Edit: Fantastic. Thank you so much. |
Thanks, @spspitze. I decided to make PR (#4013) for this feature to make it more publicly accessible.
# install.packages("remotes")
remotes::install_github("echasnovski/ggplot2", ref="geom_density-bounds")
|
Hi @echasnovski - thanks very much for this: I used your fork to good effect for a plot I was making, and also this issue made me aware of how the default clipped Reading further into the subject, and for anyone else interested, I found the evmix package, and trying the examples for I suspect ultimately it will be necessary to do the computations for this outside of ggplot2 (perhaps one can use these pre-calculated densities with |
This issue was fixed by #4013 |
Currently
geom_density()
directly usesdensity()
to compute kernel density estimation. Due to its nature, it has "extending property": output density can imply that values outside the range of input data are possible (with default "gaussian" kernel). This is a realistic practical setup, but there are cases when this is not true and data is bounded (for example, when only positive values are possible). It can be a good idea to support kernel density estimation on this type of bounded data with newbounds
argument ofgeom_density()
.In my opinion, one of the methods most easiest to implement, understand, and teach, is "reflection" method. There is one possible description on this page. Basically, boundary correction is done by doing "standard" kernel density estimation first, and then "reflecting" tails outside of desired interval to be inside. Densities inside and outside of desired interval are added together in "symmetric fashion":
d(x) = d_f(x) + d_f(l - (x-l)) + d_f(r + (r-x))
, whered_f
is density of input,d
is density of output,l
andr
are left and right edges of desired interval.I made some quick and dirty changes to ggplot2 for demonstration.
stat_density()
getsbounds
argument with default value ofc(-Inf, Inf)
. Here are some examples of proposed functionality:The text was updated successfully, but these errors were encountered: