Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upGitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign up
We have relied on functions
as.mlm.ccaandas.mlm.rdato castccaandrdaresult objects to multiple linear model objects (class"mlm"), and to find influence statistics via these usingmlmand univariatelmmethods. I started to look at implementing these statistics directly. First I did this on intellectual curiosity, but then I understood that we can develop more adequate tools with dedicated methods (and I also found out that some of the stats method gave wrong results for"mlm"objects in R).Basically we need only three (or two) basic statistics to derive the main influence statistics:
Hatvalues giving the leverage of each point. These are defined only by constraints, and the dependent (community) data has no influence. The hatvalues are also similar in univariate
lm. So these are easy and non-problematic.Raw residuals. Basically these are differences of observed and fitted values, but this concept is far from clear in constrained ordination. Currently I have implemented two alternatives:
type = "response"uses the residuals of species after fitting constraints, andtype = "canoco"uses the differences of WA and LC scores. The problem with response residuals is that they have the same dimensions as the original community data which can be too much for intuitive use. Further, they have no clear correspondence to ordination axes, but they are only based on all constraints. The canoco residuals are given for each axis and are much more manageable, but I fail to see what makesmodel$CCA$wa - model$CCA$uto be meaningful residuals. This difference is in no way related to residual variation after constrained ordination.Residual standard deviation or
sigma. This is actually not completely independent statistic, but it should be consistent with raw residuals (and I have now implemented this so that it will be derived from the sum of squared raw residuals). For response residuals, this is based on column sums of squared residual community matrixcolSums(ordiYbar(model, "CA")^2)or with old"cca" class fromcolSums(model$CA$Xbar^2). For canoco residuals, the sigma is based oncolSums(model$CCA$wa^2) - 1and has no clear relation to residual variation (but it has a very simple relationship to species-environment correlation) -- it is not even monotonic with constrained eigenvalues.Branch
influence-ccahas functionshatvalues.cca,hatvalues.rdaandsigma.ccawhich are all based on R standard generic functions mainly working withlmobjects. There is no dedicated function for raw residuals (when writing this), but these are extracted within specific functions. Based on these, I have now implementedrstandard.cca,rstudent.ccaandcooks.distance.ccawhich all work both withccaandrdaresults, and all have option to selecttype = "response"or"canoco". What we would really need is an option that combines good sides of these responses: gives influence statistics for axis (or cumulative axes) like"canoco", but uses meaningful residual error like"response".I haven't implemented
dfbetaanddfbetasfunctions. These produce so huge results even for univariate cases that they would be unmanageable and unusable in multivariate cases.I have also implemented method function
vcov.cca(usingSSD.cca) which can be used to find the covariance matrix and standard errors of coefficients of constraints. When writing this, I have only optiontype = "canoco", but adding"response"should be as simple as adding choice for raw residuals inSSD.cca. These also produce large matrices which can be difficult to inspect. However, the diagonal of this matrix contains the residual variances of coefficients, and the the same t-values for coefficients assummary(as.mlm(model))will be given bycoef(model)/sqrt(diag(vcov(model))). However, these are based on"canoco"style sigma, which seems to make them rather meaningless. We really need a bettersigma.If these functions are merged into
master, there really is no longer need foras.mlmmethods and we could deprecate them. I think all usable statistics can be found with the proposed new functions, and we can also easily provide better alternatives than the inbuilt"canoco"type ofas.mlm. Development of new properties is also easier with the proposed structure than relying on casting to"mlm"objects.