Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation for naturalSpline() does not match journal article #17

Closed
mclements opened this issue Dec 23, 2022 · 2 comments
Closed

Comments

@mclements
Copy link

mclements commented Dec 23, 2022

Great package.

As a possible bug report, I believe that splines2::naturalSpline() does not follow Equation (11) in the Appendix to https://doi.org/10.6339/21-JDS1020. Which is correct - the published formula or the implementation?

I wrote some quick R code to re-implement naturalSpline(), including a flag to use the published formula or, otherwise, to use the implementation in naturalSpline.h:

Hmat = function(B, published=FALSE, standardise=TRUE) {
    Boundary.knots = attr(B,"Boundary.knots")
    knots = attr(B,"knots")
    Aknots = sort(c(rep(Boundary.knots,each=4), knots))
    m = length(knots)
    if (m==0) {
        Ht = matrix(c(3:0,0:3)+0.0,2,4,TRUE)
    } else {
        C = t(splineDesign(x=Boundary.knots,knots=Aknots,derivs=c(2L,2L)))
        p = nrow(C)
        if (m==1)
            Ht = matrix(c(-C[2,1]/C[1,1], 1, 0,0,0,
                          0,-C[3,1]/C[2,1],1,-C[p-2,2]/C[p-1,2],0,
                          0,0,0,1,-C[p-1,2]/C[p,2]), 3,5,TRUE)
        if (m>1) {
            nrow = m+2 # for Ht; spline_df_ variable in NaturalSplines.h
            ncol = m+4
            Ht = matrix(0,nrow,ncol)
            if (published) {
                Ht[,2:(ncol-1)] = diag(nrow)
                Ht[1,1:3] <- Ht[2,2] <- Ht[nrow-1,ncol-1] <- Ht[nrow,ncol-(0:2)] <- 1
                Ht[2,3] = -C[2,1]/C[3,1]
                Ht[nrow-1,ncol-2] = -C[p-1,2]/C[p-2,2]
            } else { # as implemented
                Ht[1,1:3] <- Ht[3,2] <- Ht[4,nrow+1] <- Ht[2,nrow+2-(0:2)] <- 1
                Ht[3,3] = -C[2,1]/C[3,1]
                Ht[4,nrow] = -C[p-1,2]/C[p-2,2]
                if (m>2) {
                    for (index in 4:(nrow-1))
                        Ht[index+1,index] = 1
                }
            }
        }
    }
    if (standardise)
        Ht = Ht / rowSums(Ht)
    return(t(Ht))
}
naturalSpline2 = function(x,df, intercept=TRUE, ...) {
    stopifnot(intercept)
    B = bs(x, df=df+2, intercept=TRUE)
    B %*% Hmat(B, ...)
}

As a check, naturalSpline2 and naturalSpline give similar answers when published=FALSE:

x = seq(0,1,length=11)
for (df in 2:6)
     print(range(naturalSpline2(x,df=df,published=FALSE) - naturalSpline(x,df=df,intercept=TRUE)))
[1] 0 0
[1] 0 0
[1] 0 0
[1] 0 0
[1] 0.000000e+00 6.938894e-18

However, the H matrix is very different from the form of the published formula and from my implementation of the published formula:

t(Hmat(bs(x, df=7, intercept=TRUE), published=FALSE))
          [,1]      [,2]      [,3] [,4]      [,5]      [,6]      [,7]
[1,] 0.3333333 0.3333333 0.3333333    0 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.0000000 0.0000000    0 0.3333333 0.3333333 0.3333333
[3,] 0.0000000 0.2500000 0.7500000    0 0.0000000 0.0000000 0.0000000
[4,] 0.0000000 0.0000000 0.0000000    0 0.7500000 0.2500000 0.0000000
[5,] 0.0000000 0.0000000 0.0000000    1 0.0000000 0.0000000 0.0000000
t(Hmat(bs(x, df=7, intercept=TRUE), published=TRUE,standardise=FALSE))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    0    0    0    0
[2,]    0    1    3    0    0    0    0
[3,]    0    0    0    1    0    0    0
[4,]    0    0    0    0    3    1    0
[5,]    0    0    0    0    1    1    1
t(Hmat(bs(x, df=7, intercept=TRUE), published=TRUE))
          [,1]      [,2]      [,3] [,4]      [,5]      [,6]      [,7]
[1,] 0.3333333 0.3333333 0.3333333    0 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.2500000 0.7500000    0 0.0000000 0.0000000 0.0000000
[3,] 0.0000000 0.0000000 0.0000000    1 0.0000000 0.0000000 0.0000000
[4,] 0.0000000 0.0000000 0.0000000    0 0.7500000 0.2500000 0.0000000
[5,] 0.0000000 0.0000000 0.0000000    0 0.3333333 0.3333333 0.3333333

Is this a bug or a feature?

Sincerely, Mark.

@wenjie2wang
Copy link
Owner

Thanks for bringing this to my attention! It is a great question.

I reviewed the paper and the code I wrote. In fact, I did the implementation before writing the paper. The arrangement of the H matrix was purely for ease of coding. The resulting spline basis matrices from the code and the paper are different only in terms of the column arrangement and thus should be equivalent for modeling purposes.

I will update the underlying implementation so that it matches the paper. Thanks again for this issue.

@mclements
Copy link
Author

Nice! I perhaps should have seen the column re-arrangement from the example -- although it was less obvious from the code.

Thank you for the prompt reply. You can close the issue:).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants