Implementation for naturalSpline() does not match journal article #17

mclements · 2022-12-23T16:00:45Z

Great package.

As a possible bug report, I believe that splines2::naturalSpline() does not follow Equation (11) in the Appendix to https://doi.org/10.6339/21-JDS1020. Which is correct - the published formula or the implementation?

I wrote some quick R code to re-implement naturalSpline(), including a flag to use the published formula or, otherwise, to use the implementation in naturalSpline.h:

Hmat = function(B, published=FALSE, standardise=TRUE) {
    Boundary.knots = attr(B,"Boundary.knots")
    knots = attr(B,"knots")
    Aknots = sort(c(rep(Boundary.knots,each=4), knots))
    m = length(knots)
    if (m==0) {
        Ht = matrix(c(3:0,0:3)+0.0,2,4,TRUE)
    } else {
        C = t(splineDesign(x=Boundary.knots,knots=Aknots,derivs=c(2L,2L)))
        p = nrow(C)
        if (m==1)
            Ht = matrix(c(-C[2,1]/C[1,1], 1, 0,0,0,
                          0,-C[3,1]/C[2,1],1,-C[p-2,2]/C[p-1,2],0,
                          0,0,0,1,-C[p-1,2]/C[p,2]), 3,5,TRUE)
        if (m>1) {
            nrow = m+2 # for Ht; spline_df_ variable in NaturalSplines.h
            ncol = m+4
            Ht = matrix(0,nrow,ncol)
            if (published) {
                Ht[,2:(ncol-1)] = diag(nrow)
                Ht[1,1:3] <- Ht[2,2] <- Ht[nrow-1,ncol-1] <- Ht[nrow,ncol-(0:2)] <- 1
                Ht[2,3] = -C[2,1]/C[3,1]
                Ht[nrow-1,ncol-2] = -C[p-1,2]/C[p-2,2]
            } else { # as implemented
                Ht[1,1:3] <- Ht[3,2] <- Ht[4,nrow+1] <- Ht[2,nrow+2-(0:2)] <- 1
                Ht[3,3] = -C[2,1]/C[3,1]
                Ht[4,nrow] = -C[p-1,2]/C[p-2,2]
                if (m>2) {
                    for (index in 4:(nrow-1))
                        Ht[index+1,index] = 1
                }
            }
        }
    }
    if (standardise)
        Ht = Ht / rowSums(Ht)
    return(t(Ht))
}
naturalSpline2 = function(x,df, intercept=TRUE, ...) {
    stopifnot(intercept)
    B = bs(x, df=df+2, intercept=TRUE)
    B %*% Hmat(B, ...)
}

As a check, naturalSpline2 and naturalSpline give similar answers when published=FALSE:

x = seq(0,1,length=11)
for (df in 2:6)
     print(range(naturalSpline2(x,df=df,published=FALSE) - naturalSpline(x,df=df,intercept=TRUE)))
[1] 0 0
[1] 0 0
[1] 0 0
[1] 0 0
[1] 0.000000e+00 6.938894e-18

However, the H matrix is very different from the form of the published formula and from my implementation of the published formula:

t(Hmat(bs(x, df=7, intercept=TRUE), published=FALSE))
          [,1]      [,2]      [,3] [,4]      [,5]      [,6]      [,7]
[1,] 0.3333333 0.3333333 0.3333333    0 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.0000000 0.0000000    0 0.3333333 0.3333333 0.3333333
[3,] 0.0000000 0.2500000 0.7500000    0 0.0000000 0.0000000 0.0000000
[4,] 0.0000000 0.0000000 0.0000000    0 0.7500000 0.2500000 0.0000000
[5,] 0.0000000 0.0000000 0.0000000    1 0.0000000 0.0000000 0.0000000
t(Hmat(bs(x, df=7, intercept=TRUE), published=TRUE,standardise=FALSE))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    0    0    0    0
[2,]    0    1    3    0    0    0    0
[3,]    0    0    0    1    0    0    0
[4,]    0    0    0    0    3    1    0
[5,]    0    0    0    0    1    1    1
t(Hmat(bs(x, df=7, intercept=TRUE), published=TRUE))
          [,1]      [,2]      [,3] [,4]      [,5]      [,6]      [,7]
[1,] 0.3333333 0.3333333 0.3333333    0 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.2500000 0.7500000    0 0.0000000 0.0000000 0.0000000
[3,] 0.0000000 0.0000000 0.0000000    1 0.0000000 0.0000000 0.0000000
[4,] 0.0000000 0.0000000 0.0000000    0 0.7500000 0.2500000 0.0000000
[5,] 0.0000000 0.0000000 0.0000000    0 0.3333333 0.3333333 0.3333333

Is this a bug or a feature?

Sincerely, Mark.

The text was updated successfully, but these errors were encountered:

wenjie2wang · 2022-12-23T16:47:53Z

Thanks for bringing this to my attention! It is a great question.

I reviewed the paper and the code I wrote. In fact, I did the implementation before writing the paper. The arrangement of the H matrix was purely for ease of coding. The resulting spline basis matrices from the code and the paper are different only in terms of the column arrangement and thus should be equivalent for modeling purposes.

I will update the underlying implementation so that it matches the paper. Thanks again for this issue.

mclements · 2022-12-23T17:35:19Z

Nice! I perhaps should have seen the column re-arrangement from the example -- although it was less obvious from the code.

Thank you for the prompt reply. You can close the issue:).

wenjie2wang closed this as completed in c6a875b Dec 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementation for naturalSpline() does not match journal article #17

Implementation for naturalSpline() does not match journal article #17

mclements commented Dec 23, 2022 •

edited

Loading

wenjie2wang commented Dec 23, 2022

mclements commented Dec 23, 2022

Implementation for naturalSpline() does not match journal article #17

Implementation for naturalSpline() does not match journal article #17

Comments

mclements commented Dec 23, 2022 • edited Loading

wenjie2wang commented Dec 23, 2022

mclements commented Dec 23, 2022

mclements commented Dec 23, 2022 •

edited

Loading