Small cleanup of stat density #554

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
2 participants
@jiho
Contributor

jiho commented May 11, 2012

To keep the code clear and the documentation in line with the code.

jiho added some commits May 11, 2012

Allow density to expand scale bounds
In the previous implementation, density was estimated only within the
range of the scale, which is determined by default by the range of the
data. So, to see the effect of not trimming the data, the user had to
explicitly expand the scale.

In the new implementation, with trim = FALSE, the density is indeed
estimated up to 3 bandwidths after the bounds of the data, and this
expands the scale. But trim = TRUE by default (as was said in the
documentation but not respected in the code) so the final aspect should
stay the same in most circumstances.
@jiho

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho May 11, 2012

Contributor

The most important change is explained in defail in the commit message of 538d75a

Contributor

jiho commented May 11, 2012

The most important change is explained in defail in the commit message of 538d75a

@hadley

This comment has been minimized.

Show comment
Hide comment
@hadley

hadley May 11, 2012

Member

I'm pretty sure you have to explicitly supply the range of the density, otherwise stacked density don't work because they're not estimated on the same points.

Member

hadley commented on R/stat-density.r in 538d75a May 11, 2012

I'm pretty sure you have to explicitly supply the range of the density, otherwise stacked density don't work because they're not estimated on the same points.

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho May 11, 2012

Contributor

hum, probably indeed. But in that case, should we extend this range by cut bandwidth beyond the range of the full dataset, to allow the "normal" behaviour of density?
Otherwise, in the current implementation, there's no difference between trim=TRUE or FALSE (and this probably explains the discrepancy between the documentation and the code regarding the default value of trim)

Contributor

jiho replied May 11, 2012

hum, probably indeed. But in that case, should we extend this range by cut bandwidth beyond the range of the full dataset, to allow the "normal" behaviour of density?
Otherwise, in the current implementation, there's no difference between trim=TRUE or FALSE (and this probably explains the discrepancy between the documentation and the code regarding the default value of trim)

This comment has been minimized.

Show comment
Hide comment
@hadley

hadley May 11, 2012

Member

Hmmm, I don't think that will work either because we're not setting the bandwidth and so it will vary by group. Maybe trim should just be deprecated?

Alternatively, we could switch to some other density method that would work with all the data simultaneously.

Member

hadley replied May 11, 2012

Hmmm, I don't think that will work either because we're not setting the bandwidth and so it will vary by group. Maybe trim should just be deprecated?

Alternatively, we could switch to some other density method that would work with all the data simultaneously.

This comment has been minimized.

Show comment
Hide comment
@jiho

jiho May 11, 2012

Contributor

Or it could be computed with bw.nrd and then passed to density(); but that probably adds another layer of computation and complexity. On the other hand, I can't remember a situation where I needed trim indeed.

If trim is suppressed, the from and to arguments must be reinstated and there probably should be a message in the help indicating that, in that case, the area under the curve is not actually 1 (since this is what you would expect from a density estimation).

Contributor

jiho replied May 11, 2012

Or it could be computed with bw.nrd and then passed to density(); but that probably adds another layer of computation and complexity. On the other hand, I can't remember a situation where I needed trim indeed.

If trim is suppressed, the from and to arguments must be reinstated and there probably should be a message in the help indicating that, in that case, the area under the curve is not actually 1 (since this is what you would expect from a density estimation).

This comment has been minimized.

Show comment
Hide comment
@hadley

hadley May 12, 2012

Member

Maybe we should move to a model more like stat_bin - i.e. compute the bandwidth based on the complete data, and output a message saying what was used. Then trim could actually work because there would be a common bandwidth across all groups.

Member

hadley replied May 12, 2012

Maybe we should move to a model more like stat_bin - i.e. compute the bandwidth based on the complete data, and output a message saying what was used. Then trim could actually work because there would be a common bandwidth across all groups.

@hadley

This comment has been minimized.

Show comment
Hide comment
@hadley

hadley Aug 3, 2012

Member

See discussion of #595 - any fix is going to require backward incompatible changes.

Member

hadley commented Aug 3, 2012

See discussion of #595 - any fix is going to require backward incompatible changes.

@hadley

This comment has been minimized.

Show comment
Hide comment
@hadley

hadley Feb 24, 2014

Member

Unfortunately that change would break existing code so isn't suitable for inclusion in a very mature package like ggplot2.

Member

hadley commented Feb 24, 2014

Unfortunately that change would break existing code so isn't suitable for inclusion in a very mature package like ggplot2.

@hadley hadley closed this Feb 24, 2014

@jiho jiho deleted the jiho:fix/clean-stat-density branch Mar 17, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment