noweb/zph.Rnw

\section{The cox.zph function}
The simplest test of proportional hazards is to use a time dependent
coefficient $\beta(t) = a + bt$.
Then $\beta(t) x = ax + b*(tx)$, and the extended coefficients $a$ and $b$
can be obtained from a Cox model with an extra 'fake' covariate $tx$.
More generally, replace $t$ with some function $g(t)$, which gives rise to
an entire family of tests.
An efficient assessment of this extended model can be done using a score
test.
\begin{itemize}
  \item Augment the original variables $x_1, \ldots x_k$ with $k$ new ones
$g(t)x_1, \ldots, g(t)x_k$
  \item Compute the first and second derivatives $U$ and $H$ of the Cox model
at the starting estimate of $(\hat\beta, 0)$; prior covariates at their
prior values, and the new covariates at 0.  No iteration is done.
This can be done efficiently with a modified version of the primary C routines
for coxph.
  \item By design, the first $k$ elements of $U$ will be zero. Thus the 
first iteration of the new coefficients, and the score tests for them, are
particularly easy.  
\end{itemize}

The information or Hessian matrix for a Cox model is 
$$ \sum_{j \in deaths} V(t_j)  = \sum_jV_j$$
where $V_j$ is the variance matrix of the weighted covariate values, over
all subjects at risk at time $t_j$.
Then the expanded information matrix for the score test is
\begin{align*}
  H &= \left(\begin{array}{cc}  H_1 & H_2 \\ H_2' & H_3 \end{array} \right) \\
  H_1 &= \sum V(t_j) \\
  H_2 &= \sum V(t_j) g(t_j) \\
  H_3 &= \sum V(t_j) g^2(t_j)
\end{align*}
The inverse of the matrix will be more numerically stable if $g(t)$ is centered
at zero, and this does not change the test statistic.
In the usual case $V(t)$ is close to constant in time --- the variance of
$X$ does not change rapidly --- and then $H_2$ is approximately zero.
The original cox.zph used an approximation, which is to assume that
$V(t)$ is exactly constant.
In that case $H_2=0$ and $H_3= \sum V(t_j) \sum g^2(t_j)$ and the test
is particularly easy to compute.
This assumption of identical components can fail badly for models with a
covariate by strata interaction, and for some models with covariate
dependent censoring.
Multi-state models finally forced a change.

The newer version of the routine has two separate tracks: for the formal test
and another for the residuals.

<<cox.zph>>=
cox.zph <- function(fit, transform='km', terms=TRUE, singledf =FALSE, 
                    global=TRUE) {
    Call <- match.call()
    if (!inherits(fit, "coxph") && !inherits(fit, "coxme")) 
        stop ("argument must be the result of Cox model fit")
    if (inherits(fit, "coxph.null"))
	stop("there are no score residuals for a Null model")
    if (!is.null(attr(terms(fit), "specials")[["tt"]]))
        stop("function not defined for models with tt() terms")

    if (inherits(fit, "coxme")) {
        # drop all mention of the random effects, before getdata
        fit$formula <- fit$formula$fixed
        fit$call$formula <- fit$formula
     }

    cget <- coxph.getdata(fit, y=TRUE, x=TRUE, stratax=TRUE, weights=TRUE)
    y <- cget$y
    ny <- ncol(y)
    event <- (y[,ny] ==1)
    if (length(cget$strata)) 
        istrat <- as.integer(cget$strata) - 1L # number from 0 for C
    else istrat <- rep(0L, nrow(y))

    # if terms==FALSE the singledf argument is moot, but setting a value
    #   leads to a simpler path through the code
    if (!terms) singledf <- FALSE 
    
    <<zph-setup>>
    <<zph-transform>>
    <<zph-terms>>
    <<zph-schoen>>

    rval$transform <- tname
    rval$call <- Call
    class(rval) <- "cox.zph"
    return(rval)
}

print.cox.zph <- function(x, digits = max(options()$digits - 4, 3),
                          signif.stars=FALSE, ...)  {
    invisible(printCoefmat(x$table, digits=digits, signif.stars=signif.stars, 
                           P.values=TRUE, has.Pvalue=TRUE, ...))
}
@ 

The user can use $t$ or $g(t)$ as the multiplier of the covariates.
The default is to use the KM, only because that seems to be best at
avoiding edge cases.

<<zph-transform>>=
times <- y[,ny-1]
if (is.character(transform)) {
    tname <- transform
    ttimes <- switch(transform,
                     'identity'= times,
                     'rank'    = rank(times),
                     'log'     = log(times),
                     'km' = {
                         temp <- survfitKM(factor(rep(1L, nrow(y))),
                                           y, se.fit=FALSE)
                         # A nuisance to do left continuous KM
                         indx <- findInterval(times, temp$time, left.open=TRUE)
                         1.0 - c(1, temp$surv)[indx+1]
                     },
                     stop("Unrecognized transform"))
	}
    else {
	tname <- deparse(substitute(transform))
        if (length(tname) >1) tname <- 'user'
	ttimes <- transform(times)
	}
    gtime <- ttimes - mean(ttimes[event]) 

    # Now get the U, information, and residuals
    if (ny==2) {
        ord <- order(istrat, y[,1]) -1L
        resid <- .Call(Czph1, gtime, y, X, eta,
                        cget$weights, istrat, fit$method=="efron", ord)
    }
    else {
        ord1 <- order(-istrat, -y[,1]) -1L   # reverse time for zph2
        ord  <- order(-istrat, -y[,2]) -1L
        resid <- .Call(Czph2, gtime, y, X, eta,
                        cget$weights, istrat, fit$method=="efron", 
                        ord1, ord)
    }
@

The result has a score vector of length $2p$ where $p$ is the number of
variables and an information matrix that is $2p$ by $2p$.
This is done with C code that
is a simple variation on iteration 1 for a coxph model.

If \code{singledf} is TRUE then treat each term as a single degree of
freedom test, otherwise as a multi-degree of freedom.
If terms=FALSE test each covariate individually.
If all the variables are univariate this is a moot point.
The survival routines return Splus style assign components, that is a list
with one element per term, each element an integer vector of coefficient
indices.

The asgn vector is our main workhorse: loop over asgn to process term by
term.
\begin{itemize}
  \item if term=FALSE, set make a new asgn with one coef per term
  \item if a coefficient is NA, remove it from the relevant asgn vector
  \item frailties and penalized coxme coefficients are ignored: remove
    their element from the asgn list
\end{itemize} 
   
For random effects models, including both frailty and coxme results, the
random effect is included in the linear.predictors component of the 
fit.  This allows us to do score tests for the other terms while effectively
holding the random effect fixed.

If there are any NA coefficients these are redundant variables.  It's
easiest to simply get rid of them at the start by fixing up X, varnames,
asgn, and nvar.
<<zph-setup>>=
eta <- fit$linear.predictors
X <- cget$x
varnames <- names(fit$coefficients)
nvar <- length(varnames)

if (!terms) {
    # create a fake asgn that has one value per coefficient
    asgn <- as.list(1:nvar)
    names(asgn) <- names(fit$coefficients)
}
else if (inherits(fit, "coxme")) {
    asgn <- attrassign(cget$x, terms(fit))
    # allow for a spelling inconsistency in coxme, later fixed
    if (is.null(fit$linear.predictors)) 
        eta <- fit$linear.predictor
    fit$df <- NULL  # don't confuse later code
}
else   asgn <- fit$assign
    
if (!is.list(asgn)) stop ("unexpected assign component")

frail <- grepl("frailty(", names(asgn), fixed=TRUE)
if (any(frail)) {
    dcol <- unlist(asgn[frail])    # remove these columns from X
    X <- X[, -dcol, drop=FALSE]
    asgn <- asgn[!frail]
    # frailties don't appear in the varnames, so no change there
}
nterm <- length(asgn)
termname <- names(asgn)

if (any(is.na(fit$coefficients))) {
    keep <- !is.na(fit$coefficients)
    varnames <- varnames[keep]
    X <- X[,keep]

    # fix up assign 
    new <- unname(unlist(asgn))[keep] # the ones to keep
    asgn <- sapply(asgn, function(x) {
        i <- match(x, new, nomatch=0)
        i[i>0]})
    asgn <- asgn[sapply(asgn, length)>0]  # drop any that were lost
    termname <- names(asgn)
    nterm <- length(asgn)   # asgn will be a list
    nvar <- length(new)
}
@ 

The zph1 and zph2 functions do not consider penalties, so we need to add
those back in after the call. 
Nothing needs to be done wrt the first derivative: we already ignore the
first ncoef elements of the returned first derivative (u) vector, which would
have had a penalty.  The second portion of u is for beta=0, and all of the
penalties that currently are implemented have first derivative 0 at 0.
For the second derivative, the current penalties (frailty, rigde, pspline) have
a second derivative penalty that is independent of beta-hat.  
The coxph result contains the numeric value of the penalty at the solution,
and we use a score test that would penalize the new time*pspline() term in
the same way as the pspline term was penalized.

If no coefficients were missing then allvar will be 1:n, otherwise it
will have holes.  

<<zph-terms>>=                   
test <- double(nterm+1)
df   <- rep(1L, nterm+1)
u0 <- rep(0, nvar)
if (!is.null(fit$coxlist2)) { # there are penalized terms
    pmat <- matrix(0., 2*nvar, 2*nvar) # second derivative penalty
    pmat[1:nvar, 1:nvar] <- fit$coxlist2$second
    pmat[1:nvar + nvar, 1:nvar + nvar] <- fit$coxlist2$second
    imatr <- resid$imat + pmat
}
else imatr <- resid$imat

for (ii in 1:nterm) {
    jj <- asgn[[ii]]
    kk <- c(1:nvar, jj+nvar)
    imat <- imatr[kk, kk]
    u <- c(u0, resid$u[jj+nvar])
    if (singledf && length(jj) >1) {
        vv <- solve(imat)[-(1:nvar), -(1:nvar)]
        t1 <- sum(fit$coef[jj] * resid$u[jj+nvar])
        test[ii] <- t1^2 * (fit$coef[jj] %*% vv %*% fit$coef[jj])
        df[ii] <- 1
    }
    else {
        test[ii] <- drop(solve(imat,u) %*% u)
        if (is.null(fit$df)) df[ii] <- length(jj)
        else df[ii] <- fit$df[ii]
    }
}

#Global test
if (global) {
    u <- c(u0, resid$u[-(1:nvar)])
    test[nterm+1] <- solve(imatr, u) %*% u
    if (is.null(fit$df))  df[nterm+1]   <- nvar
    else df[nterm+1] <- sum(fit$df)

    tbl <- cbind(test, df, pchisq(test, df, lower.tail=FALSE))
    dimnames(tbl) <- list(c(termname, "GLOBAL"), c("chisq", "df", "p"))
}
else {
    tbl <- cbind(test, df, pchisq(test, df, lower.tail=FALSE))[1:nterm,, drop=FALSE]
    dimnames(tbl) <- list(termname, c("chisq", "df", "p"))
}

# The x, y, residuals part is sorted by time within strata; this is
#  what the C routine zph1 and zph2 return
indx <- if (ny==2) ord +1 else rev(ord) +1  # return to 1 based subscripts
indx <- indx[event[indx]]                   # only keep the death times
rval <- list(table=tbl, x=unname(ttimes[indx]), time=unname(y[indx, ny-1]))
if (length(cget$strata)) rval$strata <- cget$strata[indx]
@ 

The matrix of scaled Schoenfeld residuals is created one stratum at a
time. 
The ideal for the residual $r(t_i)$, contributed by an event for subject
$i$ at time $t_i$ is to use $r_iV^{-1}(t_i)$, the inverse of the  variance 
matrix of $X$ at that time and for the relevant stratum.
What is returned as \code{resid\$imat} is $\sum_i V(t_i)$.
One option would have been to return all the individual $\hat V_i$ matrices,
but that falls over when the number at risk is too small and it cannot
be inverted.
Option 2 would be to use a per stratum averge of the $V_i$, but that falls
flat for models with a large number of strata, a nested case-control model
for instance. 
We take a different average that may not be the best, but seems to be
good enough and doesn't seem to fail.
\begin{enumerate}
  \item The \code{resid\$used} matrix contains the number of deaths for
    each strata (row) that contributed to the sum for each variable (column).
    The value is either 0 or the number of events in the stratum, zero for those
    variables that are constant within the stratum.  From this we can get the
    number of events that contributed to each element of the \code{imat} total.
    Dividing by this gives a per-element average \code{vmean}.  
  \item For a given stratum, some of the covariates may have been unused.  For
    any of those set the scaled Schoenfeld residual to NA, and use the other
    rows/columns of the \code{vmean} matrix to scale the rest.
\end{enumerate}
Now if some variable $x_1$ has a large variance at some time points and a
small variance at others, or a large variance in one stratum and a small
variance in another, the above smoothing won't catch that subtlety.
However we expect such an issue to be rare. 
The common problem of strata*covariate interactions is the target of the
above manipulations.

<<zph-schoen>>=
# Watch out for a particular edge case: there is a factor, and one of the
#   strata happens to not use one of its levels.  The element of resid$used will
#   be zero, but it really should not.
used <-resid$used
for (i in asgn) {
    if (length(i) > 1 && any(used[,i] ==0)) 
        used[,i] <- apply(used[,i,drop=FALSE], 1, max)
}
    
# Make the weight matrix
wtmat <- matrix(0, nvar, nvar)
for (i in 1:nrow(used))
    wtmat <- wtmat + outer(used[i,], used[i,], pmin)
# with strata*covariate interactions (multi-state models for instance) the
#  imatr matrix will be block diagonal.  Don't divide these off diagonal zeros
#  by a wtmat value of zero.
vmean <- imatr[1:nvar, 1:nvar, drop=FALSE]/ifelse(wtmat==0, 1, wtmat)

sresid <- resid$schoen
if (terms && any(sapply(asgn, length) > 1)) { # collase multi-column terms
    temp <- matrix(0, ncol(sresid), nterm)
    for (i in 1:nterm) {
        j <- asgn[[i]]
        if (length(j) ==1) temp[j, i] <- 1
        else temp[j, i] <- fit$coefficients[j]
    }

    sresid <- sresid %*% temp
    vmean <- t(temp) %*% vmean %*% temp
    used <- used[, sapply(asgn, function(x) x[1]), drop=FALSE]
}

dimnames(sresid) <- list(signif(rval$time, 4), termname)

# for each stratum, rescale the Schoenfeld residuals in that stratum
sgrp <- rep(1:nrow(used), apply(used, 1, max))
for (i in 1:nrow(used)) {
    k <- which(used[i,] > 0)
    if (length(k) >0)  { # there might be no deaths in the stratum
        j <- which(sgrp==i)
        if (length(k) ==1) sresid[j,k] <- sresid[j,k]/vmean[k,k]
        else sresid[j, k] <- t(solve(vmean[k, k], t(sresid[j, k, drop=FALSE])))
        sresid[j, -k] <- NA
    }
} 

# Add in beta-hat.  For a term with multiple columns we are testing zph for
#  the linear predictor X\beta, which always has a coefficient of 1
for (i in 1:nterm) {
    j <- asgn[[i]]
    if (length(j) ==1) sresid[,i] <- sresid[,i] + fit$coefficients[j]
    else sresid[,i] <- sresid[,i] +1
}

rval$y <- sresid
rval$var <- solve(vmean)  
@ 

<<cox.zph>>=
"[.cox.zph" <- function(x, ..., drop=FALSE) {
    i <- ..1
    if (!is.null(x$strata)) {
        y2 <- x$y[,i,drop=FALSE]
        ymiss <- apply(is.na(y2), 1, all)
        if (any(ymiss)) {
            # some deaths played no role in these coefficients
            #  due to a strata * covariate interaction, drop unneeded rows
            z<- list(table=x$table[i,,drop=FALSE], x=x$x[!ymiss], 
                     time= x$time[!ymiss], 
                     strata = x$strata[!ymiss],
                     y = y2[!ymiss,,drop=FALSE],
                     var=x$var[i,i, drop=FALSE], 
                     transform=x$transform, call=x$call)
            }
        else z<- list(table=x$table[i,,drop=FALSE], x=x$x, time= x$time, 
                      strata = x$strata,
                      y = y2,  var=x$var[i,i, drop=FALSE], 
                      transform=x$transform, call=x$call)
    }
    else
        z<- list(table=x$table[i,,drop=FALSE], x=x$x, time= x$time, 
                 y = x$y[,i,drop=FALSE],
                 var=x$var[i,i, drop=FALSE],
                 transform=x$transform, call=x$call)
    class(z) <- class(x)
    z
}
@