-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strange behaviour of geom_smooth and splines::ns #2426
Comments
In the future, if's useful to have such issues as a self-contained reprex (short for minimal reproducible example). One of the best features is that it deals with generating and uploading the plots for you, so it's especially useful for graphics-related issues. If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful. library(splines)
library(ggplot2)
set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]
ggplot(dat, aes(wind, claim)) + geom_point() +
geom_smooth(
method = "glm",
method.args = list(family = binomial),
formula = y ~ ns(x, df = 5)
) +
geom_smooth(
method = "glm",
method.args = list(family = binomial),
formula = y ~ splines::ns(x, df = 5),
) Created on 2018-01-28 by the reprex package (v0.1.1.9000). In the one below, I've separated out the two methods so it's clear which is which. library(splines)
library(ggplot2)
set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]
ggplot(dat, aes(wind, claim)) + geom_point() +
geom_smooth(
method = "glm",
method.args = list(family = binomial),
formula = y ~ ns(x, df = 5)
) +
ggtitle("y ~ ns edition") ggplot(dat, aes(wind, claim)) + geom_point() +
geom_smooth(
method = "glm",
method.args = list(family = binomial),
formula = y ~ splines::ns(x, df = 5),
) +
ggtitle("splines::ns edition") Created on 2018-01-28 by the reprex package (v0.1.1.9000). |
Even better is to draw them in one plot with labels: ggplot(dat, aes(wind, claim)) +
geom_point() +
geom_smooth(
method = "glm",
method.args = list(family = binomial),
formula = y ~ splines::ns(x, df = 5),
aes(color = "splines::ns"),
se = FALSE
) +
geom_smooth(
method = "glm",
method.args = list(family = binomial),
formula = y ~ ns(x, df = 5),
aes(color = "ns"),
se = FALSE
) It is utterly mystifying to me why these should be returning different results. |
If I instrument the code to print the model, I see: (Intercept) splines::ns(x, df = 5)1 splines::ns(x, df = 5)2 splines::ns(x, df = 5)3
0.5194712 -0.8687737 -0.6803954 4.0838947
splines::ns(x, df = 5)4 splines::ns(x, df = 5)5
2.3908674 4.1564128
(Intercept) ns(x, df = 5)1 ns(x, df = 5)2 ns(x, df = 5)3 ns(x, df = 5)4 ns(x, df = 5)5
0.5194712 -0.8687737 -0.6803954 4.0838947 2.3908674 4.1564128 So it certainly looks like the fitted model is identical. |
Hmmm, this appears to be a bug in glm? library(splines)
set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]
m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)
unname(coef(m1))
#> [1] 0.5194712 -0.8687737 -0.6803954 4.0838947 2.3908674 4.1564128
unname(coef(m2))
#> [1] 0.5194712 -0.8687737 -0.6803954 4.0838947 2.3908674 4.1564128
newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
unname(predict(m1, newdata = newdf))
#> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
unname(predict(m2, newdata = newdf))
#> [1] 0.5194712 -0.5666554 -0.1731268 2.8134844 3.9295814 Created on 2018-04-27 by the reprex package (v0.2.0). |
This doesn't seem to be a bug in ggplot2, and I've posted to r-devel to see if we can learn more. |
Thanks for your efforts!
… On 27 Apr 2018, at 11:25 pm, Hadley Wickham ***@***.***> wrote:
Closed #2426 <#2426>.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub <#2426 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJdwfGdVeL67t7CkepVzALEEzLkKpJrOks5tsxxagaJpZM4Rv4oG>.
|
To close this out, it's definitely a bug in |
And for anyone’s interested, here’s the thread on r-devel where the bug is discussed: https://stat.ethz.ch/pipermail/r-devel/2018-April/075889.html |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
The following example produces two different smooths where I believe they should be the same.
(This is a simulation based on a rather larger real data set)
The only difference in the geom_smooth is the use of splines::ns versus ns
If using R 3.4.3 on windows and ggplot2 version 2.2.1
The text was updated successfully, but these errors were encountered: