# Function Interfaces
We distinguish between "top-level" (or "user-facing") APIs and "low-level" (or "computational") APIs. The former are the interface between the users of the function (with their needs) and the code that does the estimation/training activities.
<div id="decouple-back"></div>
When creating model objects, the computational code that fits the model should be **decoupled** from the interface code used to specify the model. This allows different interfaces to specify the model and then pass common data structures on to the computational code. [{note}](#decouple)
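
As a rough illustration of this decoupling, the sketch below uses base R only. The names `ols`, `ols_impl`, and the `coefs` element are hypothetical and chosen for this example; the point is that both interfaces build the same matrix/vector pair and hand it to a single computational function.

```r
# Hypothetical low-level computational function: it only sees a numeric
# matrix and a numeric vector, and knows nothing about formulas.
ols_impl <- function(x, y) {
  qr_fit <- qr(cbind(`(Intercept)` = 1, x))
  list(coefs = qr.coef(qr_fit, y))
}

# Hypothetical user-facing generic; dispatch happens on the first argument.
ols <- function(x, ...) {
  UseMethod("ols")
}

# Data frame interface: validate inputs, encode predictors, pass them on.
ols.data.frame <- function(x, y, ...) {
  if (!is.numeric(y)) {
    stop("`y` should be a numeric vector.", call. = FALSE)
  }
  # model.matrix() creates dummy variables for factors; drop its intercept
  # column because ols_impl() adds its own.
  x_mat <- model.matrix(~ ., data = x)[, -1, drop = FALSE]
  res <- ols_impl(x_mat, y)
  class(res) <- "ols"
  res
}

# Formula interface: a different way to specify the same model.
ols.formula <- function(formula, data, ...) {
  mf <- model.frame(formula, data)
  x_mat <- model.matrix(attr(mf, "terms"), mf)[, -1, drop = FALSE]
  res <- ols_impl(x_mat, model.response(mf))
  class(res) <- "ols"
  res
}

# Both interfaces end up in the same computational code.
fit_xy <- ols(mtcars[, c("wt", "hp")], mtcars$mpg)
fit_f  <- ols(mpg ~ wt + hp, data = mtcars)
```

Because the fitting code lives in one place, adding another interface (for example, one based on recipes) would only require a new method that produces the same `x`/`y` structures.
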
## User-Facing Functions
* At a minimum, the top-level model function should be a generic with methods for data frames, formulas, and possibly recipes. These methods trap bad arguments with informative errors, format/encode the data, and then pass it along to the lower-level computational code.
* Do not require users to create dummy variables from their categorical predictors. Provide a formula and/or recipe interface to your model to do this (see the next item). Other methods should raise an error when qualitative data are supplied but cannot be used by the underlying code.
* These top-level functions should use proper S3 dispatch (e.g., dispatch appropriately to `foo.data.frame`, `foo.formula`, and so on).
<div id="user-appropriate-back"></div>
* Only **user-appropriate data structures** should be accepted by the user-facing function. The underlying computational code should make the necessary transformations to computationally appropriate formats/encodings. [{note}](#user-appropriate)
<div id="top-level-back"></div>
* **Design the top-level code for humans**. This includes using sensible defaults and protecting against common errors. Design your top-level interface code so that people will not hate you. [{note}](#top-level)
* Parameters that users will commonly modify should be main arguments to the top-level function. Others, especially those that control computational aspects of the fit, should be contained in a `control` object.
<div id="common-param-back"></div>
* If your model passes `...` to another modeling function, consider the names of your top-level function arguments to avoid conflicts with the argument names of the underlying function. [{note}](#common-param)
<div id="dot-check-back"></div>
* If possible, check to see if the arguments passed to `...` are real arguments to the function(s) receiving the `...`. [{note}](#dot-check)
<div id="standardized-back"></div>
* Common arguments should use standardized names. [{note}](#standardized)
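
Two of the items above, the `control` object and the check on `...`, can be sketched in base R as follows. This is only one possible approach: `control_foo`, `check_dots`, `foo`, and `foo_impl` are hypothetical names, and the weighted ridge computation is a stand-in for whatever the real low-level code does.

```r
# Hypothetical control constructor for arguments that govern computation
# rather than the model itself.
control_foo <- function(tol = 1e-8, verbose = FALSE) {
  stopifnot(
    is.numeric(tol), length(tol) == 1, tol > 0,
    is.logical(verbose), length(verbose) == 1
  )
  structure(list(tol = tol, verbose = verbose), class = "control_foo")
}

# Check that everything passed via `...` is a named, real argument of the
# function that will receive it.
check_dots <- function(dots, fn) {
  ok <- names(formals(fn))
  # If the receiver itself has `...`, any name is acceptable.
  if (length(dots) == 0 || "..." %in% ok) {
    return(invisible(TRUE))
  }
  nms <- names(dots)
  if (is.null(nms) || any(nms == "")) {
    stop("All arguments passed via `...` must be named.", call. = FALSE)
  }
  bad <- setdiff(nms, ok)
  if (length(bad) > 0) {
    stop(
      "Unknown argument(s) passed via `...`: ",
      paste0("`", bad, "`", collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical low-level function that receives the `...`.
foo_impl <- function(x, y, penalty, weights = rep(1, nrow(x)), control) {
  if (control$verbose) message("Solving the penalized normal equations")
  xw <- x * weights  # multiplies row i of `x` by weights[i]
  lhs <- crossprod(xw, x) + diag(penalty, ncol(x))
  drop(solve(lhs, crossprod(xw, y), tol = control$tol))
}

# `penalty` is a main argument because users will often change it; the
# computational settings live in `control`.
foo <- function(x, y, penalty = 0, control = control_foo(), ...) {
  if (!inherits(control, "control_foo")) {
    stop("`control` should be created by `control_foo()`.", call. = FALSE)
  }
  dots <- list(...)
  check_dots(dots, foo_impl)
  do.call(foo_impl, c(list(x = x, y = y, penalty = penalty, control = control), dots))
}
```

With this in place, a misspelled argument such as `foo(x, y, wieghts = w)` fails immediately with a message naming the offending argument instead of being silently ignored or producing an obscure error further down.
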
## Computational Functions
* A test set should never be required when fitting a model.
* If internal resampling is involved in the fitting process, there is a strong preference for using `tidymodels` infrastructure so that a common interface (and set of choices) can be used. If this cannot be done (e.g., the resampling occurs in C code), there should be some ability to pass in integer values that define the resamples. In this way, the internal sampling is reproducible. The same is true for other infrastructure (e.g., `yardstick` for performance measures).
* When possible, do not reimplement computations that have been done well elsewhere (tidy or not). For example, kernel methods should use the infrastructure in `kernlab`, exponential family distribution computations should use those in `?family` etc.
* For modeling packages that use random numbers, setting the seed in R should control how random numbers are generated internally. At worst, a random number seed for non-R code (e.g. C, Java) should be an argument to the main modeling function.
* Computational code should (almost) always use `X[ , , drop = FALSE]` when subsetting matrices (and non-tibble data frames) to make sure that the 2D structure is maintained.
* Use `try()` or similar methods (e.g., `tryCatch()`) to catch otherwise uncontrolled errors so that intelligible error messages can be returned.
<div id="calls-back"></div>
* Using the call object for post-estimation activities is discouraged. [{note}](#calls)
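
The `drop = FALSE` and error-handling items above can be illustrated with a short base R sketch. The helper `safe_solve` and its error message are hypothetical and only show the general pattern of catching a low-level failure and re-signaling it in terms the user can act on.

```r
# Selecting a single column drops a matrix to a vector by default.
x <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
dim(x[, "a"])                # NULL: the 2D structure is lost
dim(x[, "a", drop = FALSE])  # 3 1: still a matrix

# Hypothetical wrapper that converts a low-level failure into an error that
# describes the problem at the modeling level.
safe_solve <- function(xtx, xty) {
  tryCatch(
    solve(xtx, xty),
    error = function(e) {
      stop(
        "The model could not be fit; the predictors may be linearly ",
        "dependent. Original error: ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}
```

Keeping the original condition message in the new error preserves the low-level detail for debugging while the leading sentence tells the user what to look at.
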