Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behaviour of geom_smooth and splines::ns #2426

Closed
glennstone opened this issue Jan 28, 2018 · 9 comments
Closed

Strange behaviour of geom_smooth and splines::ns #2426

glennstone opened this issue Jan 28, 2018 · 9 comments

Comments

@glennstone
Copy link

The following example produces two different smooths where I believe they should be the same.
(This is a simulation based on a rather larger real data set)

The only difference in the geom_smooth is the use of splines::ns versus ns

If using R 3.4.3 on windows and ggplot2 version 2.2.1

require(splines)
require(ggplot2)

set.seed(12345)
dat = data.frame(claim=rbinom(1000,1,0.5))
mns = c(3.4,3.6)
sds = c(0.24, 0.35)
dat$wind = exp(rnorm(nrow(dat), mean=mns[dat$claim+1], sd=sds[dat$claim+1]))
dat = dat[order(dat$wind),]

ggplot(dat, aes(wind,claim))+geom_point()+
  geom_smooth(method = "glm",
              method.args = list(family = binomial), 
              formula = y ~ splines::ns(x, df=5))+
  geom_smooth(method = "glm",
              method.args = list(family = binomial), 
              formula = y ~ ns(x, df=5))
@batpigandme
Copy link
Contributor

In the future, if's useful to have such issues as a self-contained reprex (short for minimal reproducible example). One of the best features is that it deals with generating and uploading the plots for you, so it's especially useful for graphics-related issues.

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

library(splines)
library(ggplot2)

set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]

ggplot(dat, aes(wind, claim)) + geom_point() +
  geom_smooth(
    method = "glm",
    method.args = list(family = binomial),
    formula = y ~ ns(x, df = 5)
  ) +
  geom_smooth(
    method = "glm",
    method.args = list(family = binomial),
    formula = y ~ splines::ns(x, df = 5),
  )

Created on 2018-01-28 by the reprex package (v0.1.1.9000).

In the one below, I've separated out the two methods so it's clear which is which.

library(splines)
library(ggplot2)

set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]

ggplot(dat, aes(wind, claim)) + geom_point() +
  geom_smooth(
    method = "glm",
    method.args = list(family = binomial),
    formula = y ~ ns(x, df = 5)
  ) +
  ggtitle("y ~ ns edition")

ggplot(dat, aes(wind, claim)) + geom_point() +
  geom_smooth(
    method = "glm",
    method.args = list(family = binomial),
    formula = y ~ splines::ns(x, df = 5),
  ) +
  ggtitle("splines::ns edition")

Created on 2018-01-28 by the reprex package (v0.1.1.9000).

@hadley
Copy link
Member

hadley commented Apr 27, 2018

Even better is to draw them in one plot with labels:

ggplot(dat, aes(wind, claim)) + 
  geom_point() +
  geom_smooth(
    method = "glm",
    method.args = list(family = binomial),
    formula = y ~ splines::ns(x, df = 5),
    aes(color = "splines::ns"),
    se = FALSE
  ) +
  geom_smooth(
    method = "glm",
    method.args = list(family = binomial),
    formula = y ~ ns(x, df = 5),
    aes(color = "ns"),
    se = FALSE
  )

It is utterly mystifying to me why these should be returning different results.

@hadley
Copy link
Member

hadley commented Apr 27, 2018

If I instrument the code to print the model, I see:

            (Intercept) splines::ns(x, df = 5)1 splines::ns(x, df = 5)2 splines::ns(x, df = 5)3 
              0.5194712              -0.8687737              -0.6803954               4.0838947 
splines::ns(x, df = 5)4 splines::ns(x, df = 5)5 
              2.3908674               4.1564128 

   (Intercept) ns(x, df = 5)1 ns(x, df = 5)2 ns(x, df = 5)3 ns(x, df = 5)4 ns(x, df = 5)5 
     0.5194712     -0.8687737     -0.6803954      4.0838947      2.3908674      4.1564128 

So it certainly looks like the fitted model is identical.

@hadley
Copy link
Member

hadley commented Apr 27, 2018

Hmmm, this appears to be a bug in glm?

library(splines)

set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = sds[dat$claim + 1]))
dat <- dat[order(dat$wind), ]

m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)

unname(coef(m1))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
unname(coef(m2))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128

newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
unname(predict(m1, newdata = newdf))
#> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
unname(predict(m2, newdata = newdf))
#> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814

Created on 2018-04-27 by the reprex package (v0.2.0).

@hadley
Copy link
Member

hadley commented Apr 27, 2018

This doesn't seem to be a bug in ggplot2, and I've posted to r-devel to see if we can learn more.

@hadley hadley closed this as completed Apr 27, 2018
@glennstone
Copy link
Author

glennstone commented Apr 28, 2018 via email

@hadley
Copy link
Member

hadley commented May 1, 2018

To close this out, it's definitely a bug in glm(), and it'll be fixed in the next version of R.

@huftis
Copy link
Contributor

huftis commented May 1, 2018

And for anyone’s interested, here’s the thread on r-devel where the bug is discussed: https://stat.ethz.ch/pipermail/r-devel/2018-April/075889.html

@lock
Copy link

lock bot commented Oct 28, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Oct 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants