
doc updates for step_woe

topepo committed Jul 10, 2019
1 parent 6ded626 commit 6323e8de45032fe2381783c531d57075136bfb24
@@ -5,11 +5,13 @@ S3method(bake,step_lencode_bayes)
S3method(bake,step_lencode_glm)
S3method(bake,step_lencode_mixed)
S3method(bake,step_umap)
S3method(bake,step_woe)
S3method(prep,step_embed)
S3method(prep,step_lencode_bayes)
S3method(prep,step_lencode_glm)
S3method(prep,step_lencode_mixed)
S3method(prep,step_umap)
S3method(prep,step_woe)
S3method(print,step_embed)
S3method(print,step_lencode_bayes)
S3method(print,step_lencode_glm)
@@ -20,12 +22,16 @@ S3method(tidy,step_lencode_bayes)
S3method(tidy,step_lencode_glm)
S3method(tidy,step_lencode_mixed)
S3method(tidy,step_umap)
S3method(tidy,step_woe)
export(add_woe)
export(dictionary)
export(embed_control)
export(step_embed)
export(step_lencode_bayes)
export(step_lencode_glm)
export(step_lencode_mixed)
export(step_umap)
export(step_woe)
export(tidy)
export(tidy.step_embed)
export(tidy.step_lencode_bayes)
@@ -71,6 +77,7 @@ importFrom(recipes,rand_id)
importFrom(recipes,sel2char)
importFrom(recipes,step)
importFrom(recipes,terms_select)
importFrom(rlang,"!!")
importFrom(rlang,set_names)
importFrom(rstanarm,stan_glmer)
importFrom(stats,as.formula)
@@ -88,6 +95,7 @@ importFrom(tibble,rownames_to_column)
importFrom(tidyr,gather)
importFrom(utils,capture.output)
importFrom(utils,globalVariables)
importFrom(utils,stack)
importFrom(uwot,umap)
importFrom(uwot,umap_transform)
importFrom(withr,with_seed)
@@ -1,6 +1,9 @@
# `embed` 0.0.3

## New Steps

* `step_umap()` was added for both supervised and unsupervised encodings.
* `step_woe()` was added to create weight of evidence encodings.


# `embed` 0.0.2
@@ -84,10 +84,11 @@
#'
#' woe_models <- prep(rec, training = credit_tr)
#'
#' woe_te <- bake(woe_models, new_data = credit_te)
#'
#' head(woe_te)
#' tidy(rec, number = 1)
#' # the encoding:
#' bake(woe_models, new_data = credit_te %>% slice(1:5), starts_with("woe"))
#' # the original data
#' credit_te %>% slice(1:5) %>% dplyr::select(Job, Home)
#' # the details:
#' tidy(woe_models, number = 1)
#'
#' # Example of custom dictionary + tweaking
@@ -15,6 +15,8 @@ The steps for categorical predictors are:

* `step_embed` uses `keras::layer_embedding` to translate the original _C_ factor levels into a set of _D_ new variables (< _C_). The model fitting routine optimizes which factor levels are mapped to each of the new variables as well as the corresponding regression coefficients (i.e., neural network weights) that will be used as the new encodings.

* `step_woe` creates new variables based on weight of evidence (WoE) encodings of categorical predictors with respect to a binary outcome.
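
To make the weight of evidence idea concrete, here is a minimal base-R sketch for one categorical predictor and a two-class outcome. For each level _c_, it computes WoE(_c_) = log(P(x = _c_ | first outcome class) / P(x = _c_ | second outcome class)). The helper `woe_table()` and its `laplace` argument are hypothetical illustrations, not the internals of `step_woe()`; the sign convention and smoothing that `embed` uses may differ.

```r
# Hypothetical helper illustrating weight of evidence (WoE); this is NOT
# the embed package's implementation.
woe_table <- function(x, y, laplace = 1e-6) {
  x <- as.factor(x)
  y <- as.factor(y)
  stopifnot(nlevels(y) == 2)
  # Rows are predictor levels; columns follow the outcome's factor level order
  counts <- table(x, y)
  # Distribution of predictor levels within each outcome class, with a small
  # additive term so that empty cells do not produce log(0)
  p1 <- (counts[, 1] + laplace) / sum(counts[, 1] + laplace)
  p2 <- (counts[, 2] + laplace) / sum(counts[, 2] + laplace)
  data.frame(
    level = rownames(counts),
    woe = unname(log(p1 / p2)),
    stringsAsFactors = FALSE
  )
}

x <- c("a", "a", "b", "b", "b", "c")
y <- c("yes", "no", "yes", "yes", "no", "no")
res <- woe_table(x, y)
print(res)
```

In this toy data, level `a` is equally common in both classes, so its WoE is about 0. Level `c` never occurs with `"yes"`, so without the additive `laplace` term its WoE would be infinite. That is why some form of smoothing is usually needed in practice.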

For numeric predictors:

* `step_umap` uses a nonlinear transformation similar to t-SNE, but unlike t-SNE, the fitted transformation can be applied to project new data. Both supervised and unsupervised methods can be used.
@@ -23,9 +25,10 @@ Some references for these methods are:

* Chollet F and Allaire JJ (2018) [_Deep Learning with R_](https://www.manning.com/books/deep-learning-with-r), Manning
* Guo C and Berkhahn F (2016) "[Entity Embeddings of Categorical Variables](https://arxiv.org/abs/1604.06737)"
* Micci-Barreca D (2001) "[A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=)," ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
* Zumel N and Mount J (2017) "[`vtreat`: a `data.frame` Processor for Predictive Modeling](https://arxiv.org/abs/1611.09477)"
* McInnes L and Healy J (2018) [UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction](https://arxiv.org/abs/1802.03426)
* Good IJ (1985) "[Weight of evidence: A brief survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=)," Bayesian Statistics, 2, 249-270.


