New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using anova.ccabyterm within function? #16

Closed
EDiLD opened this Issue Sep 24, 2013 · 7 comments

Comments

Projects
None yet
3 participants
@EDiLD
Contributor

EDiLD commented Sep 24, 2013

I am trying to use anova.ccabyterm (but this also concerns *byaxis and *bymargin) within a function.

Example
For example using the pyrifos data and to run a RDA for every week, I could do something along these lines:

data(pyrifos)
week <- gl(11, 12, labels=c(-4, -1, 0.1, 1, 2, 4, 8, 12, 15, 19, 24))
dose <- factor(rep(c(0.1, 0, 0, 0.9, 0, 44, 6, 0.1, 44, 0.9, 0, 6), 11))

# RDA per week
out <- NULL
for (i in levels(week)) {
  out[[i]] <- anova(rda(pyrifos[week == i, ] ~ dose[week==i]), by = "terms")
}
# works!

However wrapping into a function breaks:

foo2 <- function(response, treatment, time){
  out <- NULL
  for (i in levels(time)) {
    out[[i]] <- anova(rda(response[time == i, ] ~ treatment[time==i]), by = 'terms')
  }
  return(out)
}
foo2(response = pyrifos, treatment = dose, time = week)
#  Error in eval(expr, envir, enclos) : object 'response' not found 

The error throws along the lines of model.frame.cca. Any Ideas?

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Sep 27, 2013

Contributor

Not very good ideas. This is a scoping issue and scoping issues are tricky. We have slowly built up these routines so that they often work and may even work in loops or within functions. We have even some tests of scoping in vegan. However, this seems to be too deep. These are the items within model.frame.cca:

]> ls()
[1] "call"    "formula" "m"    

In addition, the model.frame.cca function looks at the parent.frame and there you have:

> ls(envir=parent.frame(1))
 [1] "adj"    "call"   "chi"    "df"     "n0"     "ntrm"   "object" "pchi"  
 [9] "sim"    "step"   "trm"    "trmlab"

No response here and therefore you get the error. You must go up the parents to find response:

> ls(envir=parent.frame(2))
[1] "alpha"    "beta"     "by"       "mf"       "object"   "perm.max" "step"    
> ls(envir=parent.frame(3))
[1] "i"         "out"       "response"  "time"      "treatment"

The scoping issues should be made cleaner. If you have an idea, that's fine. I can only say that you cannot do what you try to achieve.

Contributor

jarioksa commented Sep 27, 2013

Not very good ideas. This is a scoping issue and scoping issues are tricky. We have slowly built up these routines so that they often work and may even work in loops or within functions. We have even some tests of scoping in vegan. However, this seems to be too deep. These are the items within model.frame.cca:

]> ls()
[1] "call"    "formula" "m"    

In addition, the model.frame.cca function looks at the parent.frame and there you have:

> ls(envir=parent.frame(1))
 [1] "adj"    "call"   "chi"    "df"     "n0"     "ntrm"   "object" "pchi"  
 [9] "sim"    "step"   "trm"    "trmlab"

No response here and therefore you get the error. You must go up the parents to find response:

> ls(envir=parent.frame(2))
[1] "alpha"    "beta"     "by"       "mf"       "object"   "perm.max" "step"    
> ls(envir=parent.frame(3))
[1] "i"         "out"       "response"  "time"      "treatment"

The scoping issues should be made cleaner. If you have an idea, that's fine. I can only say that you cannot do what you try to achieve.

@EDiLD

This comment has been minimized.

Show comment
Hide comment
@EDiLD

EDiLD Sep 27, 2013

Contributor

Thanks Jari for your comment.

Contributor

EDiLD commented Sep 27, 2013

Thanks Jari for your comment.

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Sep 27, 2013

Contributor

The following seems to work:

Add a data frame with rhs variables in your global environment

> mydf <- data.frame(treatment = dose, time = week)

Then edit your foo2 to use this data frame

foo2 <-
function(response, treatment, time){
  out <- NULL
  for (i in levels(time)) {
    out[[i]] <- anova(rda(response[time == i, ] ~ treatment[time==i], data=mydf), by = 'terms')
  }
  return(out)
}

NB the data=mydf in the rda() call.

The problem really seems to be in anova.ccabyterm which cannot find the variables if they are not given in data= argument. This should be fixed, but I hate fixing scoping issues so that I do not promise to do this.

Contributor

jarioksa commented Sep 27, 2013

The following seems to work:

Add a data frame with rhs variables in your global environment

> mydf <- data.frame(treatment = dose, time = week)

Then edit your foo2 to use this data frame

foo2 <-
function(response, treatment, time){
  out <- NULL
  for (i in levels(time)) {
    out[[i]] <- anova(rda(response[time == i, ] ~ treatment[time==i], data=mydf), by = 'terms')
  }
  return(out)
}

NB the data=mydf in the rda() call.

The problem really seems to be in anova.ccabyterm which cannot find the variables if they are not given in data= argument. This should be fixed, but I hate fixing scoping issues so that I do not promise to do this.

@gavinsimpson

This comment has been minimized.

Show comment
Hide comment
@gavinsimpson

gavinsimpson Sep 27, 2013

Contributor

If the intention is to recreate the data, we need to know the calling environment at the point model.frame() is called, which would usually be the global environment but in the case Eduard describes. I haven't looked in detail, but is the environment we need stored somewhere in the terms information stored in the object?

Contributor

gavinsimpson commented Sep 27, 2013

If the intention is to recreate the data, we need to know the calling environment at the point model.frame() is called, which would usually be the global environment but in the case Eduard describes. I haven't looked in detail, but is the environment we need stored somewhere in the terms information stored in the object?

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Sep 27, 2013

Contributor

The environment where the formula was created is environment(formula$terms). However, knowing this did not help me.

If you look at the log of anova.ccabyterm.R you see that almost all commits handle scoping issues. This function is cursed. The error crops out in model.frame.cca, but the question should be why call that function? All anova.ccaby*.R functions use a little bit different tricks in getting data with different scoping, and it seems that I made wrong decisions here at some stage. Changing this would need a lot of testing and experimenting, and I am not very motivated to do so.

The scoping is multifaceted. You should be able to find the data when it was given in data=, or was in the working environment (globalenv()), or was in an attached data frame, or in any mixture of these. Further, it should work when function is called directly, or when it is in a loop or [l]apply or embedded in a function. Now it was embedded rather deep in a loop in a function. Moreover, the rhs of the formula (dependent community data frame) will never be a part of data=, and may be found in different environment than the lhs of the formula. We have added tests to catch breaking some of these cases, and for a reason.

A basic problem here is that I started developing cca/rda functions long time ago (in 2001), and I did not know very well the R internals at that time (and I think I still lack knowledge in the gory entrails of R). When we hit a scoping issue, we added one new level to catch those cases. It could have been better to to redesign the interface instead of adding ad hoc over ad hoc. I have a feeling that the whole cca/rda/capscale/anova.cca could be made much simpler, but that would mean a complete redesign and breakage all over. All that ordiGetData could perhaps be removed, and ordiParseFormula made much simpler.

Contributor

jarioksa commented Sep 27, 2013

The environment where the formula was created is environment(formula$terms). However, knowing this did not help me.

If you look at the log of anova.ccabyterm.R you see that almost all commits handle scoping issues. This function is cursed. The error crops out in model.frame.cca, but the question should be why call that function? All anova.ccaby*.R functions use a little bit different tricks in getting data with different scoping, and it seems that I made wrong decisions here at some stage. Changing this would need a lot of testing and experimenting, and I am not very motivated to do so.

The scoping is multifaceted. You should be able to find the data when it was given in data=, or was in the working environment (globalenv()), or was in an attached data frame, or in any mixture of these. Further, it should work when function is called directly, or when it is in a loop or [l]apply or embedded in a function. Now it was embedded rather deep in a loop in a function. Moreover, the rhs of the formula (dependent community data frame) will never be a part of data=, and may be found in different environment than the lhs of the formula. We have added tests to catch breaking some of these cases, and for a reason.

A basic problem here is that I started developing cca/rda functions long time ago (in 2001), and I did not know very well the R internals at that time (and I think I still lack knowledge in the gory entrails of R). When we hit a scoping issue, we added one new level to catch those cases. It could have been better to to redesign the interface instead of adding ad hoc over ad hoc. I have a feeling that the whole cca/rda/capscale/anova.cca could be made much simpler, but that would mean a complete redesign and breakage all over. All that ordiGetData could perhaps be removed, and ordiParseFormula made much simpler.

@EDiLD

This comment has been minimized.

Show comment
Hide comment
@EDiLD

EDiLD Sep 30, 2013

Contributor

Thanks Jari,

I just noticed this behaviour casually - no need to fix, since using the data-argument is the prefered way.

Contributor

EDiLD commented Sep 30, 2013

Thanks Jari,

I just noticed this behaviour casually - no need to fix, since using the data-argument is the prefered way.

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Feb 27, 2015

Contributor

PR #104 seems to fix this.

Contributor

jarioksa commented Feb 27, 2015

PR #104 seems to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment