Differentiating between models, estimators and engines #54

I think I can finally translate the ideas from the modeling abstraction essay (a separate doc that grew out of #19) into `parsnip` terms. Some concepts to start:

- A *model* is a family of probability distributions or functions. That is, a model is a *set*.
- An *estimator* is a way to calculate the parameters of a model from a dataset. Note that hyperparameters are most often properties of estimators.
- The resulting estimates are a *fit* (I think @topepo often refers to this as a sub-model). A fit is an *element* of the model.
- There are often multiple algorithms and implementations of the same estimator. In this case, using `parsnip` terminology, each implementation is a different *engine*.
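To make the model / estimator / fit distinction concrete, here is a small base R sketch. It assumes the linear model family `y = X %*% beta + error` on a few `mtcars` columns and applies two estimators to the same data: OLS and ridge regression with a fixed `lambda`. The function names `ols_estimator()` and `ridge_estimator()` are hypothetical and only for illustration. Each estimator returns a different coefficient vector, i.e. a different element of the same model, and `lambda` lives on the ridge estimator rather than on the model.

```r
# Hypothetical illustration in base R: one model (the linear model family),
# two estimators, two different fits.

# Design matrix with an intercept column, plus a response, from mtcars
X <- cbind(intercept = 1, as.matrix(mtcars[, c("wt", "disp", "cyl")]))
y <- mtcars$hp

# Estimator 1: ordinary least squares. It has no hyperparameters.
ols_estimator <- function(X, y) {
  drop(solve(crossprod(X), crossprod(X, y)))
}

# Estimator 2: ridge regression. `lambda` is a hyperparameter of the
# estimator, not of the linear model. For simplicity this sketch
# penalizes every coefficient, including the intercept.
ridge_estimator <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

# Each call returns a fit: one particular coefficient vector, i.e. one
# element of the set of linear models.
ols_fit <- ols_estimator(X, y)
ridge_fit <- ridge_estimator(X, y, lambda = 10)
rbind(ols = ols_fit, ridge = ridge_fit)
```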

## Estimators are typically implicit

- `lm` specifies the OLS estimator for the linear model
- `glmnet` specifies the elastic net estimator for the linear model
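A quick way to see that the estimator is implied by the function you call, while the algorithm is a separate choice, is to compute the same OLS estimates three ways in base R. This is a sketch on `mtcars`; treating `lm()`, `qr.solve()` and the normal equations as three "engines" of one OLS estimator is an illustrative framing, not `parsnip` behavior. None of the calls names the estimator explicitly.

```r
# Three implementations ("engines") of the same OLS estimator in base R.
# None of them names the estimator: OLS is implied by the function choice.
X <- cbind(intercept = 1, as.matrix(mtcars[, c("wt", "disp", "cyl")]))
y <- mtcars$hp

engine_lm     <- coef(lm(hp ~ wt + disp + cyl, data = mtcars))  # QR inside lm()
engine_qr     <- qr.solve(X, y)                                  # QR called directly
engine_normal <- drop(solve(crossprod(X), crossprod(X, y)))      # normal equations

# Same estimator, so the estimates agree up to floating-point error
all.equal(unname(engine_lm), unname(engine_qr))
all.equal(unname(engine_lm), unname(engine_normal))
```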

## Estimator selection should be explicit

Something along the lines of:

```r
ols_hc1_fit <- linear_reg() %>% 
  linear_estimator(coefs = "ols", coef_covariance = "HC1") %>% 
  fit_xy(
    x = ...,
    y = ...,
    engine = "lm_robust"
  )
```

Perhaps `linear_reg()` isn't necessary here, but it feels the most explicit and low-level to me. In particular, I think it's important to select an estimator explicitly, rather than leaving it implicit in `engine`. Not all estimators are created equal.
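As a rough illustration of what "explicit estimator selection" could mean in practice, here is a base R sketch in which both the coefficient estimator and the covariance estimator are arguments. `fit_linear()` is a hypothetical function, not part of `parsnip` or `estimatr`; it supports only OLS coefficients with either the classical covariance or the HC1 sandwich covariance, `n / (n - k) * (X'X)^-1 X' diag(e^2) X (X'X)^-1`. The returned object records which estimator produced it.

```r
# Hypothetical sketch: the estimator is an explicit argument rather than a
# side effect of the engine. `fit_linear()` is not a parsnip function.
fit_linear <- function(X, y, coefs = "ols",
                       coef_covariance = c("classical", "HC1")) {
  if (coefs != "ols") stop("only coefs = \"ols\" is sketched here")
  coef_covariance <- match.arg(coef_covariance)

  n <- nrow(X)
  k <- ncol(X)
  XtX_inv <- solve(crossprod(X))
  beta <- drop(XtX_inv %*% crossprod(X, y))
  resid <- drop(y - X %*% beta)

  coef_vcov <- switch(
    coef_covariance,
    # Classical covariance: sigma^2 * (X'X)^-1
    classical = sum(resid^2) / (n - k) * XtX_inv,
    # HC1 sandwich: n / (n - k) * (X'X)^-1 X' diag(e^2) X (X'X)^-1
    HC1 = {
      meat <- crossprod(X * resid)  # row i of X scaled by resid[i]
      n / (n - k) * XtX_inv %*% meat %*% XtX_inv
    }
  )

  # Record the estimator alongside the estimates it produced
  list(coefficients = beta, vcov = coef_vcov,
       estimator = list(coefs = coefs, coef_covariance = coef_covariance))
}

X <- cbind(intercept = 1, as.matrix(mtcars[, c("wt", "disp", "cyl")]))
y <- mtcars$hp
ols_hc1_fit <- fit_linear(X, y, coefs = "ols", coef_covariance = "HC1")
sqrt(diag(ols_hc1_fit$vcov))  # HC1 standard errors
```

Keeping the covariance estimator as its own argument means the same point estimates can be paired with different inferential assumptions, which is exactly the information that gets lost when the estimator is implicit in the engine.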

## Different estimators should have informative subclasses

Currently, `parsnip` always produces a `model_fit` object:

```r
ols <- linear_reg() %>% 
  fit(hp ~ ., data = mtcars, engine = "lm")

class(ols)
#> [1] "model_fit"
```

I'm strongly of the opinion that `ols` should have subclasses that indicate:
- the `model_fit` was estimated using ordinary least squares
- the `model_fit` object contains a single fit/submodel, as opposed to a set of fits/submodels

Without this differentiation, I don't think it's possible to meaningfully define inference methods on `ols`. Consider the following methods, all for the linear model:

- `plot_lasso_path()` only makes sense for a set of fits from the LASSO estimator
- `coef_standard_errors()` makes sense for a fit from the OLS estimator but not the LASSO estimator
- `interpret_coefficients()` should have different behavior for an OLS fit and a GEE fit
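Here is a minimal S3 sketch of how informative subclasses would let methods like these dispatch correctly. The constructor `new_model_fit()` and the class names `ols_fit`, `lasso_fit`, `single_fit` and `path_fit` are hypothetical; `coef_standard_errors()` is the method name proposed above, implemented here only for OLS. The LASSO object is an empty stand-in with no real fit inside, used purely to show that dispatch refuses it.

```r
# Hypothetical constructor: prepend an estimator subclass and a
# single-vs-path subclass ahead of "model_fit".
new_model_fit <- function(fit, estimator, multiplicity = c("single", "path")) {
  multiplicity <- match.arg(multiplicity)
  structure(
    list(fit = fit),
    class = c(paste0(estimator, "_fit"), paste0(multiplicity, "_fit"), "model_fit")
  )
}

# Generic proposed in the issue; only OLS fits get a method in this sketch
coef_standard_errors <- function(object, ...) UseMethod("coef_standard_errors")

coef_standard_errors.ols_fit <- function(object, ...) {
  # Classical OLS standard errors from the underlying lm object
  sqrt(diag(vcov(object$fit)))
}

coef_standard_errors.default <- function(object, ...) {
  stop("standard errors are not defined for class ",
       paste(class(object), collapse = "/"), call. = FALSE)
}

ols <- new_model_fit(lm(hp ~ ., data = mtcars), estimator = "ols")
class(ols)                  # "ols_fit" "single_fit" "model_fit"
coef_standard_errors(ols)

# Stand-in for a LASSO path (no real fit inside): dispatch falls to the
# default method and errors instead of returning misleading numbers.
lasso_path <- new_model_fit(NULL, estimator = "lasso", multiplicity = "path")
try(coef_standard_errors(lasso_path))
```

Because `model_fit` stays last in the class vector, anything currently written for `model_fit` would still apply, while estimator-specific methods can opt in only where they are valid.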
