next set of models #35
I think that this is the right place to ask: What about:
As a lower priority, what about more complex cases that are combinations of other model types:
So far Perhaps Max thinks about this differently than I do, but since those models typically don't have lots of hyperparameters / any and don't get cross-validated all that often, I think of them as living in a slightly different universe that isn't as central to |
What about mgcv::gam / mgcv::bam? I don't think g If I'd like to contribute, what would be the best starting point?
To some extent, I'd like to organize these models on a more mechanistic basis. For example, if you were doing PK modeling, we could have This could end up generating a ton of different models, but I think this is better than a general That's my current thinking right now.
Similarly, having model functions for specific types of experiments/analysis. Suppose you have a simple 1-level repeated measures design (e.g. replicate data points per drug or longitudinal data for a patient over time). I can think of a few different approaches to this analysis: standard mixed effects (e.g. These models are so general, I don't think that a general wrapper would be a lot of good. |
These are more difficult in some ways. How do we parameterize the smoothness functions since they can be different for each term? We could assume a common df for each term but that is pretty restricting. This would enable the
Definitely not. We'd have to have a different model specification function.
That's hard to say. The If you want, focus on this vignette and try implementing a model. Give us some feedback on where the pain was or if the documentation is unclear. Some of the modeling functions above are pretty vanilla (e.g. Poisson regression, ordinal regression, discriminant analysis). |
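To make the smoothness question above concrete, here is a rough sketch using mgcv directly (not a parsnip interface). It contrasts a single shared basis dimension with a separate one per term. The `mtcars` data and the particular `k` values are arbitrary choices for illustration; `k` only caps the flexibility of each smooth, so it is a stand-in for the "common df" idea rather than an exact match.

```r
library(mgcv)  # recommended package that ships with most R installations

# Option 1: one shared basis dimension for every smooth term.
common_k <- 5
fit_common <- gam(mpg ~ s(hp, k = common_k) + s(wt, k = common_k), data = mtcars)

# Option 2: a different basis dimension for each term.
fit_per_term <- gam(mpg ~ s(hp, k = 4) + s(wt, k = 8), data = mtcars)

# Estimated effective degrees of freedom for each smooth term.
summary(fit_common)$edf
summary(fit_per_term)$edf
```

The second form is the one that is awkward to express as a single tuning parameter, which is the difficulty described above.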
@topepo, the way I had thought about the fit of As an example, I'd like to be able to do something like the following (I know that this isn't the current parsnip interface):
nonlinear_reg(mode="regression") %>%
  model_data(my_data) %>%
  set_engine("nlme") %>%
  equation(response="y", predictor="e0 + emax*x/(ec50+x)") %>%
  random_var(e0="subjectid") %>%
  var_start(e0=10, emax=50, ec50=30) %>%
  fit()
That way, we would be able to build up "equation" libraries that map to With this, parsnip as a standardized interface to mixed effects models would be very useful as a base for libraries of models that implement domain-specific model equations (like
nonlinear_reg(mode="regression") %>%
  model_data(my_data) %>%
  set_engine("nlme") %>%
  equation(response="y", predictor="e0 + emax*x/(ec50+x)") %>%
  model_substitute(e0="e0_base + (age-50)*e0_age") %>%
  random_var(e0_base="subjectid") %>%
  var_start(e0=10, emax=50, ec50=30) %>%
  fit()
which could yield an equation like:
y = (e0_base + (age-50)*e0_age) + emax*x/(ec50+x)
(I've done some initial playing around with formula substitution in https://github.com/billdenney/formulops, but this was more for me to play with than intended to release.)
That would introduce 5 or 6 new functions just to fit one type of model. I'd like to avoid that. I think that the formula could be supplied (even if it has covariates) to We are currently figuring out how to build random effect-related model terms.
I understand not wanting 5-6 new functions unless they generalize to many model types. And, for figuring out how to build random effect-related model terms, that seems like it could make most of the needs fixed. With a generalized way to define random effect-related model terms, I could imagine something like the following:
model_fixed_effect <-
  linear_reg() %>%
  set_engine("lm") %>%
  fit(
    Sepal.Length ~ Sepal.Width,
    data = iris
  )
model_mixed_effect <-
  linear_reg() %>%
  set_engine("lme4") %>%
  fit(
    Sepal.Length ~ Sepal.Width + (1 | Species),
    data = iris
  )
And, ideally, I could use the same equation form with either the "lm" or "lme4" engine, but the "lm" engine version would give a warning that the random effect parts are ignored:
model_fixed_effect <-
  linear_reg() %>%
  set_engine("lm") %>%
  fit(
    Sepal.Length ~ Sepal.Width + (1 | Species),
    data = iris
  )
# Gives warning like: random effect on `1` is ignored with "lm" engine.
model_mixed_effect <-
  linear_reg() %>%
  set_engine("lme4") %>%
  fit(
    Sepal.Length ~ Sepal.Width + (1 | Species),
    data = iris
  )
With this general idea, the only added function required would be As an aside, I like the
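The "warn and ignore" behaviour proposed above could be sketched in base R roughly as follows. `drop_random_effects()` is a hypothetical helper written for this illustration and is not part of parsnip; it assumes random-effect terms are exactly those whose label contains `|`.

```r
# Hypothetical helper (not a parsnip function): remove random-effect terms such
# as (1 | Species) from a formula and warn about each term that was dropped.
drop_random_effects <- function(formula, engine = "lm") {
  labels <- attr(terms(formula), "term.labels")
  is_random <- grepl("|", labels, fixed = TRUE)
  for (term in labels[is_random]) {
    warning(
      "random effect `", term, "` is ignored with \"", engine, "\" engine.",
      call. = FALSE
    )
  }
  fixed <- labels[!is_random]
  if (length(fixed) == 0) fixed <- "1"  # fall back to an intercept-only model
  reformulate(fixed, response = deparse(formula[[2]]))
}

# The same formula could then be handed to a fixed-effects-only fit.
fixed_formula <- drop_random_effects(Sepal.Length ~ Sepal.Width + (1 | Species))
model_fixed_effect <- lm(fixed_formula, data = iris)
```

Whether silently dropping terms is desirable is a design question; an error might be safer than a warning when the grouping structure matters for inference.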
I see Poisson regression included in @topepo's original list. Are you still planning to provide support for it in the future? Are other count regression models like those in the list below being considered as well? Thanks!
Count Regression Models
@tiernanmartin Well timed: @billdenney Getting back to some of your questions... I'll open up a package that adds engines to different packages for |
Thanks for the Poisson support! Echoing @tiernanmartin - any plans for the Negative Binomial in the future? |
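For context on the negative binomial request, this is roughly what the two fits look like when called directly, outside of parsnip. The `warpbreaks` data set and the model formula are arbitrary choices for illustration; `glm.nb()` comes from the MASS package.

```r
library(MASS)  # recommended package; provides glm.nb()

# Poisson regression with base R.
pois_fit <- glm(breaks ~ wool + tension, family = poisson(), data = warpbreaks)

# Negative binomial regression, which estimates an extra dispersion parameter.
nb_fit <- glm.nb(breaks ~ wool + tension, data = warpbreaks)

# Pearson dispersion statistic for the Poisson fit; values well above 1
# suggest overdispersion, the usual motivation for a negative binomial model.
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
dispersion

# Estimated negative binomial dispersion parameter.
nb_fit$theta
```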
Hello, is parsnip considering a tidymodels implementation of Bayesian Additive Regression Trees (BART) sometime down the line?
@SangeetM That's a good idea, especially since there is now a version that doesn't use Feel free to put a PR into The thing to avoid is needing to make a wrapper function to fit the model. This means that we'd have to add it as a formal dependency to |
@topepo thanks for the suggestion. Earlier I was looking into bartMachine which was present in the |
@topepo We've just seen our first request for modeltime to add GAMs. Chances are it should probably be a broad package, but we could get something started for regression. business-science/modeltime#71 |
I'm all for that. We should probably have the main model definition live in
Any ideas for this? Assuming the same smoothing term across predictors is probably better than nothing but not very satisfying. Davis and I have discussed the possibility of having |
@topepo We've started very early strategizing a new
Time Series Results
GAMs are showing very strong promise for time series in the testing we have performed.
Implementation and Tuning
We will get back to you on the 2nd point. We need to think about this.
@topepo Here's what I'm thinking regarding the progression of a package.
@topepo Do you have any plans for wrappers to create a multinomial classification? I mean something where you can just supply any binary classification algorithm and then specify something like "one vs. one" or "one vs. all". Maybe I am overlooking something, but so far I can only see multinomial support for a handful of specified algorithms. This isn't something I currently need, but I had lots of use cases for it in the past where I wanted to benchmark several algorithms, didn't find any consistent interface in the R world, and so just moved over to using a specific algorithm, e.g. XGBoost, through its own interface directly.
We might put those in the probably package but it is not high on the priority list. For tidymodels, that is most functional when we have post-processing of model outputs. |
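As a rough illustration of the "one vs. all" idea, here is a minimal base R sketch that wraps a binary logistic regression. `fit_one_vs_rest()` and `predict_one_vs_rest()` are hypothetical names invented for this example; nothing like them exists in parsnip or probably as far as this thread indicates. The sketch assumes the outcome is a factor on the left-hand side of the formula.

```r
# Hypothetical helper: fit one binary logistic regression per class,
# each trained to separate that class from all the others.
fit_one_vs_rest <- function(formula, data) {
  outcome <- all.vars(formula)[1]
  classes <- levels(data[[outcome]])
  fits <- lapply(classes, function(cl) {
    binary_data <- data
    binary_data[[outcome]] <- as.integer(data[[outcome]] == cl)
    glm(formula, data = binary_data, family = binomial())
  })
  names(fits) <- classes
  fits
}

# Hypothetical helper: score new data with every binary model and
# return the class whose model gives the highest probability.
predict_one_vs_rest <- function(fits, new_data) {
  probs <- do.call(
    cbind,
    lapply(fits, predict, newdata = new_data, type = "response")
  )
  colnames(probs)[max.col(probs, ties.method = "first")]
}

# glm() warns on iris because one class is perfectly separable;
# the warnings are suppressed here only to keep the example quiet.
ovr_fits <- suppressWarnings(fit_one_vs_rest(Species ~ ., data = iris))
ovr_pred <- suppressWarnings(predict_one_vs_rest(ovr_fits, iris))
table(predicted = ovr_pred, observed = iris$Species)
```

Note that the per-class probabilities are not calibrated against each other, which is one reason post-processing of model outputs matters for this kind of wrapper.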
Is Negative binomial support added in the parsnip package? |
Not yet but feel free to make a PR for the |
Just ran across this issue while looking for a way to do ordinal logistic regression in |
@AmeliaMN I did start a prototype repo for this a while back and you can probably give me some feedback. The main issue that I had was about how to organize the functions. In parsnip we try to have the main model functions describe the structural aspects of the model (e.g.
and so on. My thinking is that people would probably want to look at the parallel assumption (assuming they have the right design for that) and tuning over the How would you like to see these types of model organized? The other main issue is the high degree of heterogeneity and other issues with some of the packages. glmnetcr and ordinalNet are interesting but not easy to productionize given how they do the regularization. I'd also try to add ordinalForest and party models for trees. Finally, there are some good brms models. I've mostly stayed away from those since the compiler requirement seems problematic for a lot of users. Also, I think that the model would need to be compiled in every resample :groan: |
Closing this issue. For new model requests, please start new issues. Thanks for all of the feedback! |
kknn package
decision_tree via rpart, C5.0, spark (others?)
kernlab)
glmnet and spark
earth package
klaR, spark?)
rpart version and side package)
linear_reg)
multilevelmod package)
censored package)
👆 Already in parsnip or adjacent package
👇 Working on or thinking about