Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 0.2.0 #25

Merged
merged 5 commits into from
May 7, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
.Rhistory
.RData
.Ruserdata
revdep
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: kde1d
Type: Package
Title: Univariate Kernel Density Estimation
Version: 0.1.2
Version: 0.2.0
Authors@R: c(
person("Thomas", "Nagler",, "mail@tnagler.com", role = c("aut", "cre")),
person("Thibault", "Vatter",, "thibault.vatter@gmail.com", role = c("aut"))
Expand Down
17 changes: 17 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,20 @@
# kde1d 0.2.0

NEW FEATURES

* improved stability of density estimates near a boundary (#21).

BUG FIXES

* consistent behavior when `dkde1d()` and `pkde1d()` are called with
non-`ordered` input although data are discrete (#19).

* fixed bug in computation of kernel density estimates (#20).

* adapt minimum `bw` allowed for discrete data to truncated Gaussian kernel
(#20).


# kde1d 0.1.2

NEW FEATURES
Expand Down
60 changes: 6 additions & 54 deletions R/kde1d.R
Original file line number Diff line number Diff line change
Expand Up @@ -50,33 +50,31 @@
#' Statistical Society, Series B, 53, 683–690.
#'
#' @examples
#' ## For reproducibility
#' set.seed(0)
#'
#' ## unbounded data
#' x <- rnorm(100) # simulate data
#' x <- rnorm(500) # simulate data
#' fit <- kde1d(x) # estimate density
#' dkde1d(1000, fit) # evaluate density estimate
#' dkde1d(0, fit) # evaluate density estimate
#' summary(fit) # information about the estimate
#' plot(fit) # plot the density estimate
#' curve(dnorm(x), add = TRUE, # add true density
#' col = "red")
#'
#' ## bounded data, log-linear
#' x <- rgamma(100, shape = 1) # simulate data
#' x <- rgamma(500, shape = 1) # simulate data
#' fit <- kde1d(x, xmin = 0, deg = 1) # estimate density
#' dkde1d(1000, fit) # evaluate density estimate
#' dkde1d(seq(0, 5, by = 1), fit) # evaluate density estimate
#' summary(fit) # information about the estimate
#' plot(fit) # plot the density estimate
#' curve(dgamma(x, shape = 1), # add true density
#' add = TRUE, col = "red",
#' from = 1e-3)
#'
#' ## discrete data
#' x <- rbinom(100, size = 5, prob = 0.5) # simulate data
#' x <- rbinom(500, size = 5, prob = 0.5) # simulate data
#' x <- ordered(x, levels = 0:5) # declare as ordered
#' fit <- kde1d(x) # estimate density
#' dkde1d(2, fit) # evaluate density estimate
#' dkde1d(sort(unique(x)), fit) # evaluate density estimate
#' summary(fit) # information about the estimate
#' plot(fit) # plot the density estimate
#' points(ordered(0:5, 0:5), # add true density
Expand Down Expand Up @@ -109,49 +107,3 @@ kde1d <- function(x, xmin = NaN, xmax = NaN, mult = 1, bw = NA, deg = 2) {
class(fit) <- "kde1d"
fit
}

#' check and pre-process arguments passed to kde1d()
#' @noRd
check_arguments <- function(x, mult, xmin, xmax, bw, deg) {
stopifnot(NCOL(x) == 1)

if (!is.ordered(x) & is.factor(x))
stop("Factors not allowed; use kdevine::kdevine() or cctools::cckde().")

stopifnot(mult > 0)

if (is.ordered(x) & (!is.nan(xmin) | !is.nan(xmax)))
stop("xmin and xmax are not meaningful for x of type ordered.")

if (!is.nan(xmax) & !is.nan(xmin)) {
if (xmin > xmax)
stop("xmin is larger than xmax.")
if (any(x < xmin) || any(x > xmax))
stop("Not all data are contained in the interval [xmin, xmax].")
} else if (!is.nan(xmin)) {
if (any(x < xmin))
stop("Not all data are larger than xmin.")
} else if (!is.nan(xmax)) {
if (any(x > xmax))
stop("Not all data are samller than xmax.")
}

if (!(deg %in% 0:2))
stop("deg must be either 0, 1, or 2.")
}

#' adjusts observations and evaluation points for boundary effects
#' @importFrom stats qnorm
#' @noRd
boundary_transform <- function(x, xmin, xmax) {
if (!is.nan(xmin) & !is.nan(xmax)) { # two boundaries
x <- qnorm((x - xmin) / (xmax - xmin + 1e-1))
} else if (!is.nan(xmin)) { # left boundary
x <- log(x - xmin + 1e-3)
} else if (!is.nan(xmax)) { # right boundary
x <- log(xmax - x + 1e-3)
}

x
}

51 changes: 31 additions & 20 deletions R/kde1d_methods.R
Original file line number Diff line number Diff line change
Expand Up @@ -38,13 +38,21 @@ dkde1d <- function(x, obj) {
if (!is.ordered(x))
stopifnot(!is.factor(x))

x <- expand_as_numeric(x)
fhat <- dkde1d_cpp(x, obj)
# adjust grid to stabilize estimates
rng <- diff(range(obj$grid_points))
if (!is.nan(obj$xmin))
obj$grid_points[1] <- obj$xmin - 0.1 * rng
if (!is.nan(obj$xmax))
obj$grid_points[length(obj$grid_points)] <- obj$xmax + 0.1 * rng

if (length(obj$jitter_info$i_disc) == 1 & !is.ordered(x))
x <- ordered(x, obj$jitter_info$levels$x)

fhat <- dkde1d_cpp(expand_as_numeric(x), obj)

if (length(obj$jitter_info$i_disc) == 1) {
# for discrete variables we can normalize
x_all_num <- expand_as_numeric(as.ordered(obj$jitter_info$levels$x))
f_all <- dkde1d_cpp(x_all_num, obj)
f_all <- dkde1d_cpp(seq_along(obj$jitter_info$levels$x) - 1, obj)
fhat <- fhat / sum(f_all)
}

Expand All @@ -60,15 +68,14 @@ pkde1d <- function(q, obj) {
if (!is.ordered(q))
stopifnot(!is.factor(q))

q <- expand_as_numeric(q)

if (length(obj$jitter_info$i_disc) != 1) {
p <- pkde1d_cpp(q, obj)
} else {
# for discrete variables we have to add the missing probability mass
x_all_num <- expand_as_numeric(as.ordered(obj$jitter_info$levels$x))
f_all <- dkde1d(x_all_num, obj)
p <- sapply(q, function(y) sum(f_all[x_all_num <= y]))
if (!is.ordered(q))
q <- ordered(q, obj$jitter_info$levels$x)
x_all <- as.ordered(obj$jitter_info$levels$x)
p_all <- dkde1d(x_all, obj)
p <- sapply(q, function(y) sum(p_all[x_all <= y]))
p <- pmin(pmax(p, 0), 1)
}

Expand All @@ -81,19 +88,17 @@ pkde1d <- function(q, obj) {
qkde1d <- function(p, obj) {
stopifnot(all(na.omit(p) > 0.0) & all(na.omit(p) < 1.0))
if (length(obj$jitter_info$i_disc) != 1) {
## qkde1d_cpp for continuous variables
q <- qkde1d_cpp(p, obj)
} else {
## for discrete variables compute quantile from the density
x_all_num <- expand_as_numeric(as.ordered(obj$jitter_info$levels$x))
x_all <- as.ordered(obj$jitter_info$levels$x)

# pdf at all possible values of x
dd <- dkde1d(x_all_num, obj)
pp <- c(cumsum(dd)) / sum(dd)
pp <- pkde1d(x_all, obj)

# generalized inverse
q <- x_all_num[vapply(p, function(y) which(y <= pp)[1], integer(1))]
q <- ordered(obj$jitter_info$levels$x[q + 1],
levels = obj$jitter_info$levels$x)
q <- x_all[vapply(p, function(y) which(y <= pp)[1], integer(1))]
}

q
Expand Down Expand Up @@ -156,13 +161,19 @@ plot.kde1d <- function(x, ...) {
ev <- ordered(x$jitter_info$levels$x,
levels = x$jitter_info$levels$x)
plot_type <- "h" # for discrete variables, use a histrogram
x$values <- dkde1d(ev, x)
x$grid_points <- ev
} else {
# adjust grid if necessary
ev <- x$grid_points
if (!is.nan(x$xmin))
ev[1] <- x$xmin
if (!is.nan(x$xmax))
ev[length(ev)] <- x$xmax
}
vals <- dkde1d(ev, x)

pars <- list(
x = x$grid_points,
y = x$values,
x = ev,
y = vals,
type = plot_type,
xlab = "x",
ylab = "density",
Expand Down
53 changes: 53 additions & 0 deletions R/tools.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
#' check if data violate boundaries
#' @noRd
check_boundary_violations <- function(x, xmin, xmax) {
if (!is.nan(xmax) & !is.nan(xmin)) {
if (any(x < xmin) || any(x > xmax))
stop("Not all data are contained in the interval [xmin, xmax].")
} else if (!is.nan(xmin)) {
if (any(x < xmin))
stop("Not all data are larger than xmin.")
} else if (!is.nan(xmax)) {
if (any(x > xmax))
stop("Not all data are samller than xmax.")
}
}

#' check and pre-process arguments passed to kde1d()
#' @noRd
check_arguments <- function(x, mult, xmin, xmax, bw, deg) {
stopifnot(NCOL(x) == 1)

if (!is.ordered(x) & is.factor(x))
stop("Factors not allowed; use kdevine::kdevine() or cctools::cckde().")

stopifnot(mult > 0)

if (is.ordered(x) & (!is.nan(xmin) | !is.nan(xmax)))
stop("xmin and xmax are not meaningful for x of type ordered.")

if (!is.nan(xmax) & !is.nan(xmin)) {
if (xmin > xmax)
stop("xmin is larger than xmax.")
}
check_boundary_violations(x, xmin, xmax)

if (!(deg %in% 0:2))
stop("deg must be either 0, 1, or 2.")
}

#' adjusts observations and evaluation points for boundary effects
#' @importFrom stats qnorm
#' @noRd
boundary_transform <- function(x, xmin, xmax) {
if (!is.nan(xmin) & !is.nan(xmax)) { # two boundaries
x <- qnorm((x - xmin) / (xmax - xmin + 1e-1))
} else if (!is.nan(xmin)) { # left boundary
x <- log(x - xmin + 1e-3)
} else if (!is.nan(xmax)) { # right boundary
x <- log(xmax - x + 1e-3)
}

x
}

6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ devtools::install_github("tnagler/kde1d")
``` r
x <- rnorm(100) # simulate data
fit <- kde1d(x) # estimate density
dkde1d(1000, fit) # evaluate density estimate
dkde1d(0, fit) # evaluate density estimate
summary(fit) # information about the estimate
plot(fit) # plot the density estimate
curve(dnorm(x), add = TRUE, # add true density
Expand All @@ -48,7 +48,7 @@ curve(dnorm(x), add = TRUE, # add true density
``` r
x <- rgamma(100, shape = 1) # simulate data
fit <- kde1d(x, xmin = 0, deg = 1) # estimate density
dkde1d(1000, fit) # evaluate density estimate
dkde1d(seq(0, 5, by = 1), fit) # evaluate density estimate
summary(fit) # information about the estimate
plot(fit) # plot the density estimate
curve(dgamma(x, shape = 1), # add true density
Expand All @@ -61,7 +61,7 @@ curve(dgamma(x, shape = 1), # add true density
x <- rbinom(100, size = 5, prob = 0.5) # simulate data
x <- ordered(x, levels = 0:5) # declare as ordered
fit <- kde1d(x) # estimate density
dkde1d(2, fit) # evaluate density estimate
dkde1d(sort(unique(x)), fit) # evaluate density estimate
summary(fit) # information about the estimate
plot(fit) # plot the density estimate
points(ordered(0:5, 0:5), # add true density
Expand Down
25 changes: 2 additions & 23 deletions cran-comments.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,13 @@
## New submission after package has been archived

Problem 1: ASAN failures
Solution: Fixed heap buffer overflows on clang-ASAN related to max-comparison
of a size_t and a negative number.

Problem 2: Failed without long doubles
Solution: Fixed cdf calculations for discrete data which made /tests fail.

Problem 3: Threw a C++ exception under valgrind (fatal to the R process)
Solution: This is a known issue of valgrind + boost; fixed by adding
'#define BOOST_MATH_PROMOTE_DOUBLE_POLICY false' in /inst/include/stats.hpp
(as suggested in http://svn.boost.org/trac10/ticket/10005).

We are sorry for the delay in fixing these issues, reproducing the bugs reported
by clang-ASAN was way harder than expected.

## Test environments
* Debian Linux with clang-4.0-ASAN enabled on rocker/r-devel-ubsan-clang (devel)
* Ubuntu 16.04 with clang-7.0.0-ASAN enabled (devel)
* Ubuntu 16.04 without long double support (devel)
* Ubuntu 16.04 with valgrind (devel)
* local OS X install (release)
* ubuntu 12.04 on travis-ci (oldrel, release, devel)
* ubuntu 12.04 on travis-ci (release, devel)
* win-builder (devel)
* Windows Server 2012 R2 x64 + x86 (release)

## R CMD check results

0 errors | 0 warnings | 0 notes

## Reverse dependencies

This is a new release, so there are no reverse dependencies.
There are no reverse dependencies.
2 changes: 1 addition & 1 deletion docs/LICENSE-text.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/authors.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions docs/index.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.