bug in rda formula vegan 2.4-1 #200

Closed
GZikalala opened this issue Sep 26, 2016 · 9 comments
@GZikalala GZikalala commented Sep 26, 2016

Hi,

I have a problem running the rda function (vegan version 2.4-1):

vegan::rda(Y ~ X1 + X2 + X3+ X4, data= Env_1dmod)

"Error in formula.default(object, env = baseenv()) : invalid formula"

Help will be much appreciated.

Thanks.

Gugu

@gavinsimpson gavinsimpson commented Sep 26, 2016

I can't replicate this locally:

> library("vegan")
Loading required package: permute
Loading required package: lattice
This is vegan 2.4-1
> data(dune.env, dune)
> vegan::rda(dune ~ Use + Management + Moisture + Manure, data = dune.env)
Call: rda(formula = dune ~ Use + Management + Moisture + Manure, data =
dune.env)

              Inertia Proportion Rank
Total         84.1237     1.0000     
Constrained   60.8358     0.7232   11
Unconstrained 23.2879     0.2768    8
Inertia is variance 
Some constraints were aliased because they were collinear (redundant)

Eigenvalues for constrained axes:
  RDA1   RDA2   RDA3   RDA4   RDA5   RDA6   RDA7   RDA8   RDA9  RDA10  RDA11 
22.371 15.794  7.038  4.029  3.503  2.527  2.102  1.657  0.943  0.582  0.289 

Eigenvalues for unconstrained axes:
  PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8 
6.857 5.284 3.853 2.583 2.398 1.120 0.674 0.518

My suggestion is to retry in a clean session. I assume you have already loaded the package via library("vegan"); if not, try that and see if it works. It looks like the correct formula processing code is not being dispatched, and your use of vegan::rda might suggest you haven't loaded the package yet.

@jarioksa jarioksa commented Sep 27, 2016

I have tried several ways of abusing the rda command, but I haven't been able to reproduce your error. Please type traceback() after getting the error and show us the returned path to the error.

@jarioksa jarioksa commented Sep 27, 2016

OK. Now I can reproduce this problem:

library(vegan)
data(dune, dune.env)
notadataframe <- data.matrix(dune.env) # change from data.frame to matrix
rda(dune ~ A1 + Moisture, notadataframe) # data *must* be a data.frame
# Error in formula.default(object, env = baseenv()) : invalid formula

Is your Env_1dmod a matrix instead of a data.frame? What does class(Env_1dmod) say?

The data must be a data frame. You can cast it to the correct form using as.data.frame(Env_1dmod), or you can even give a matrix as a term in the formula:

rda(dune ~ notadataframe)

but then all variables are used and will be interpreted as continuous (which would be a mistake in this case, as some obviously are factors).
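The check and the conversion suggested above can be sketched in base R. This is only an illustration: `env_mat` is a hypothetical stand-in for `Env_1dmod`, and the column names and values are made up. Note that converting a numeric matrix gives numeric columns only, so any variable that should be a factor has to be re-created by hand.

```r
# Hypothetical stand-in for Env_1dmod: a numeric matrix, not a data frame
env_mat <- cbind(X1 = c(1.2, 3.4, 2.2, 5.1), X2 = c(1, 2, 1, 2))

class(env_mat)          # reports a matrix class
is.data.frame(env_mat)  # FALSE

# Cast to a data frame before passing it as `data =`
env_df <- as.data.frame(env_mat)
is.data.frame(env_df)   # TRUE

# All columns are numeric after the cast; re-create factors explicitly
# (assumption for this sketch: X2 is really a categorical variable)
env_df$X2 <- factor(env_df$X2)
str(env_df)
```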

Interestingly, lm gives a more informative error message from the model.frame.default function, which is where rda is also heading:

lm(A1 ~ Moisture, notadataframe)
#Error in model.frame.default(formula = A1 ~ Moisture, data = notadataframe,  : 
# 'data' must be a data.frame, not a matrix or an array

@gavinsimpson, we probably need to do something about this.

@jarioksa jarioksa added the not-a-bug label Sep 27, 2016

@jarioksa jarioksa commented Sep 27, 2016

Giving data= as a matrix instead of a data.frame is a user error, but the vegan error message should be informative. The difference between a data.frame and a matrix is not at all obvious to a user: it cannot be seen when the object is printed, and must be specifically inspected. The error message should guide the user to that inspection and to sorting out the data. Most vegan functions fail, and the error messages can be really confusing. Here are the cases I have seen:

OK: 'data' must be a data.frame, not a matrix or an array

  • gdispweight
  • ordisurf
  • varpart

Bad: Error in formula.default(object, env = baseenv()) : invalid formula

  • all cca, rda, capscale, dbrda, adonis2 functions that use ordiGetData. This may also concern some of their support functions, but I haven't checked those. Update on 6/10/16: These are fixed with PR #202 that removes ordiGetData.

Bad: Error in eval(lhs, data, parent.frame()) : numeric 'envir' arg not of length one

  • adipart, hiersimu and multipart, which all use hierParseFormula
  • old adonis: the mechanism is the same as in bioenv: we try to evaluate the lhs. The current approach allows the lhs to be a variable given in data. We broke this in adonis2 and it was raised as issue #156. If we want to keep this behaviour (= find LHS from data) we need to find another solution than in adonis2.
  • bioenv: the error emerges when we try to evaluate the lhs (dependent data) as comm <- formula[[2]]; comm <- eval(comm, data, parent.frame()). The second argument of eval() can be a number referring to parent frames, but a matrix is taken as a vector of numbers and hence we get the error message. The current approach should make it possible to have the lhs in data as well, and changing this would break those applications that did so. See the comment on adonis.
  • envfit: the mechanism is the same as in bioenv.
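The eval() mechanism described for bioenv can be reproduced in base R without vegan. The following is a minimal sketch with made-up objects (`y`, `as_df`, `as_mat` are illustrative names): with a data frame, a name that is not a column is looked up in the enclosing environment, whereas a matrix is treated as a numeric frame reference and rejected.

```r
y <- c(1, 2, 3)                  # response that lives outside `data`
as_df  <- data.frame(a = 1:3)    # a proper data frame
as_mat <- data.matrix(as_df)     # same content, but a matrix

# Data frame: `y` is not a column, so eval() falls back to the
# enclosing environment and finds it there
ok <- eval(quote(y), as_df)

# Matrix: eval() interprets a numeric `envir` as a frame number and
# fails because it is not of length one
err <- tryCatch(eval(quote(y), as_mat), error = function(e) e)
conditionMessage(err)
```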

Dubious: Looks like a matrix would work in place of a data.frame (but these are lattice graphics and therefore special)

  • ordicloud, ordixyplot

@GZikalala GZikalala commented Sep 27, 2016

@jarioksa

works super! thank you so much.

@jarioksa jarioksa commented Oct 7, 2016

The original problem was a user error, but a very easy one to make, since it is impossible to see from the printed output whether a matrix-like object is a data.frame or a matrix. Only a data.frame is accepted, but the error message was confusing. It would be easy to fix the message: just add a test. However, I think the confusing error message is just a symptom and we need to check the ways formulae are handled in vegan. We have no consistent way of handling formulae; it seems that each function has invented its own hack, perhaps after peeking at a randomly selected function. I already found some unnecessary-looking kludge from 2004, and I think I now have a simpler and better way of handling formulae in constrained ordination & friends (PR #202). As a side effect, we also got the expected informative error message.
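For illustration only, the "just add a test" option might look like the sketch below. `check_data_frame` is a hypothetical helper written for this example; it is not part of vegan and not what PR #202 does.

```r
# Hypothetical guard, not vegan code: fail early with a clear message
check_data_frame <- function(data) {
  if (!is.data.frame(data)) {
    stop("'data' must be a data.frame, not a ", class(data)[1],
         call. = FALSE)
  }
  invisible(data)
}

good <- data.frame(x = 1:3)
bad  <- data.matrix(good)

check_data_frame(good)  # passes silently
msg <- tryCatch(check_data_frame(bad),
                error = function(e) conditionMessage(e))
msg
```

A guard like this only improves the message; it leaves the inconsistent formula handling untouched, which is the reason given above for preferring a deeper fix.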

Currently all bad error messages are of the same type: we try to evaluate the left-hand-side of a formula with data:

eval(formula[[2]], data)

In all these cases the LHS should be an independent matrix-like object or dissimilarities that are not in data; we still find them because eval falls back to the enclosing environment. However, if data is a matrix, we get an error. A more natural evaluation would be:

eval(formula[[2]], environment(formula), enclos = .GlobalEnv) # or enclos = parent.frame()

which really looks for the LHS where it should be. However, a side effect of the original evaluation is that the formula works when the LHS is a univariate response in data. This is not documented, and it is even against the documentation, but @gavinsimpson raised this as issue #156 with adonis2. Now the point is whether we should keep the undocumented behaviour or switch to the new evaluation. Even if we switch to the new evaluation scheme, it is still possible to use univariate responses (whether they make sense or not), but they cannot be found from data.
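The difference between the two evaluation schemes can be seen in a small base R sketch. The helpers `lhs_old` and `lhs_new` and the objects `y`, `dat` and `resp` are hypothetical names for this example; `lhs_old` uses the formula's environment as the enclosure rather than parent.frame(), which is an assumption made to keep the sketch self-contained.

```r
y   <- matrix(1:6, nrow = 3)                        # response outside `data`
dat <- data.frame(resp = c(2.5, 1.0, 4.2), x = 1:3) # response inside `data`

f_outside <- y ~ x
f_inside  <- resp ~ x

# Original scheme: look in `data` first, then in the enclosure
lhs_old <- function(formula, data) {
  eval(formula[[2]], data, environment(formula))
}
# Alternative scheme: look only where the formula was created
lhs_new <- function(formula) {
  eval(formula[[2]], environment(formula))
}

# Both schemes find a response that lives outside `data`
old_outside <- lhs_old(f_outside, dat)
new_outside <- lhs_new(f_outside)

# Only the original scheme finds a response stored in `data`
old_inside <- lhs_old(f_inside, dat)
new_inside <- tryCatch(lhs_new(f_inside), error = function(e) e)
```

Here `new_inside` is an error object because `resp` exists only as a column of `dat`, which is the undocumented behaviour that would be lost by switching.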

I already changed envfit. Its only legitimate usage is that the LHS is an ordination result that should not be in data (except in the insane ggplot system that mixes data with its presentation). I think that it is also safe to change adipart, multipart and hiersimu of @psolymos because they deal with diversity partitioning, and data would allow only a one-species community with no useful diversity measure. I am not sure if anybody has used bioenv for univariate responses with data, but that is possible. However, I'm inclined to change that, too. NB, you still need special handling of the LHS in bioenv (such as as.matrix(y) ~ ...), because a univariate response is dropped to a vector by default and will fail in the dissimilarity index. However, I think I'll leave adonis as it is now.

@jarioksa jarioksa commented Oct 11, 2016

This was a user error, and the vegan problem was that the error message was unclear. The real underlying problem was that the handling of formulae was inconsistent and sometimes sloppy in vegan. The formula handling was improved in pull requests #202 and #204 and, as a side effect, the message on the user error was fixed and is now informative: 'data' must be a data.frame, not a matrix or an array.

@jarioksa jarioksa closed this Oct 11, 2016

@GZikalala GZikalala commented Oct 12, 2016

Again, thank you for the response; it was more my error.

@jarioksa jarioksa commented Oct 12, 2016

No worries: it is an easy error to make (and we all have made it several times). The error message was not helpful, and that was our problem.

jarioksa added a commit that referenced this issue Nov 9, 2016