Merge pull request #621 from weirubin/master
minor typos fix for bookdown
topepo committed Mar 27, 2017
2 parents 17ec2d9 + b3c4693 commit 25f057d
Showing 3 changed files with 13 additions and 9 deletions.
4 changes: 2 additions & 2 deletions bookdown/02-PreProcessing.Rmd
@@ -32,7 +32,7 @@ library(earth)
data(etitanic)
```

-For example, the `etitanic` data set in the [`earth`](http://cran.r-project.org/web/packages/earth/index.html) package includes two factors: `r I(paste(levels(etitanic$pclass), collapse = ", "))`) and <code>sex</code> (with levels `r I(paste(levels(etitanic$sex), sep = "", collapse = ", "))`). The base R function `model.matrix` would generate the following variables:
+For example, the `etitanic` data set in the [`earth`](http://cran.r-project.org/web/packages/earth/index.html) package includes two factors: `pclass` (passenger class, with levels `r I(paste(levels(etitanic$pclass), collapse = ", "))`) and `sex` (with levels `r I(paste(levels(etitanic$sex), sep = "", collapse = ", "))`). The base R function `model.matrix` would generate the following variables:

```{r pp_dummy1}
library(earth)
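## (Illustrative sketch only; the chunk's actual body lies outside this hunk
##  and the caret package is assumed to be attached.)
## `model.matrix` encodes each factor with one column fewer than its levels:
head(model.matrix(survived ~ ., data = etitanic))
## caret's `dummyVars` builds a full set of indicator columns instead:
dummies <- dummyVars(survived ~ ., data = etitanic)
head(predict(dummies, newdata = etitanic))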
@@ -99,7 +99,7 @@ descrCor <- cor(filteredDescr)
highCorr <- sum(abs(descrCor[upper.tri(descrCor)]) > .999)
```

-For the previous MDRR data, there are r`I(highCorr)` descriptors that are almost perfectly correlated (|correlation| &gt; 0.999), such as the total information index of atomic composition (`IAC`) and the total information content index (neighborhood symmetry of 0-order) (`TIC0`) (correlation = 1). The code chunk below shows the effect of removing descriptors with absolute correlations above 0.75.
+For the previous MDRR data, there are `r I(highCorr)` descriptors that are almost perfectly correlated (|correlation| &gt; 0.999), such as the total information index of atomic composition (`IAC`) and the total information content index (neighborhood symmetry of 0-order) (`TIC0`) (correlation = 1). The code chunk below shows the effect of removing descriptors with absolute correlations above 0.75.

```{r pp_corr2}
descrCor <- cor(filteredDescr)
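## (Illustrative continuation; the rest of this chunk lies outside the hunk
##  and the variable names below are assumptions.)
## caret's `findCorrelation` flags the columns to drop at the 0.75 cutoff
## mentioned in the text above:
highlyCorDescr <- findCorrelation(descrCor, cutoff = .75)
filteredDescr  <- filteredDescr[, -highlyCorDescr]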
16 changes: 10 additions & 6 deletions bookdown/04-Basic.Rmd
@@ -7,6 +7,7 @@ library(mlbench)
library(kernlab)
library(pROC)
library(plyr)
+library(caret)
```

# Model Training and Tuning
@@ -185,9 +186,7 @@ cat(paste(text2, collapse = "\n"))
```


-If there are missing values in the training set, PCA and ICA models only use complete samples.
-
-Another option is to use a random sample of possible tuning parameter combinations, i.e. "random search"[(pdf)](www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf). This functionality is described on [this page](random-hyperparameter-search.html).
+Another option is to use a random sample of possible tuning parameter combinations, i.e. "random search"[(pdf)](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf). This functionality is described on [this page](random-hyperparameter-search.html).

To use a random search, use the option `search = "random"` in the call to `trainControl`. In this situation, the `tuneLength` parameter defines the total number of parameter combinations that will be evaluated.
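As a minimal sketch (the object names, resampling scheme and model are illustrative, not taken from the chapter), random search could be requested as:

```r
## Assumes caret is attached and `training` is a data frame whose two-level
## factor outcome is named `Class`.
fitControl <- trainControl(method = "cv", number = 5, search = "random")

set.seed(825)
randomFit <- train(Class ~ ., data = training,
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   trControl = fitControl,
                   tuneLength = 8)   # evaluate 8 randomly sampled (sigma, C) pairs
```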

@@ -354,7 +353,7 @@ The main issue with these functions is related to ordering the models from simpl
## Extracting Predictions and Class Probabilities


-As previously mentioned, objects produced by the `train` function contain the "optimized" model in the `finalModel` sub-object. Predictions can be made from these objects as usual. In some cases, such as `pls` or `gbm` objects, additional parameters from the optimized fit may need to be specified. In these cases, the `train` objects uses the results of the parameter optimization to predict new samples. For example, if predictions were create using `predict.gbm`, the user would have to specify the number of trees directly (there is no default). Also, for binary classification, the predictions from this function take the form of the probability of one of the classes, so extra steps are required to convert this to a factor vector. `predict.train` automatically handles these details for this (and for other models).
+As previously mentioned, objects produced by the `train` function contain the "optimized" model in the `finalModel` sub-object. Predictions can be made from these objects as usual. In some cases, such as `pls` or `gbm` objects, additional parameters from the optimized fit may need to be specified. In these cases, the `train` objects uses the results of the parameter optimization to predict new samples. For example, if predictions were created using `predict.gbm`, the user would have to specify the number of trees directly (there is no default). Also, for binary classification, the predictions from this function take the form of the probability of one of the classes, so extra steps are required to convert this to a factor vector. `predict.train` automatically handles these details for this (and for other models).

Also, there are very few standard syntaxes for model predictions in R. For example, to get class probabilities, many `predict` methods have an argument called `type` that is used to specify whether the classes or probabilities should be generated. Different packages use different values of `type`, such as `"prob"`, `"posterior"`, `"response"`, `"probability"` or `"raw"`. In other cases, completely different syntax is used.
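A short sketch of the unified interface (it assumes `gbmFit3` and a held-out data frame `testing` exist from earlier, elided chunks, and that class probabilities were enabled in `trainControl`):

```r
## Class predictions come back as a factor; predict.train reuses the optimal
## tuning parameters, so no n.trees value has to be supplied by hand.
predict(gbmFit3, newdata = head(testing))

## Class probabilities via the single, common syntax type = "prob":
predict(gbmFit3, newdata = head(testing), type = "prob")
```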

@@ -383,7 +382,7 @@ There are several [`lattice`](http://cran.r-project.org/web/packages/lattice/ind

For example, the following statements create a density plot:

-```{r 4,echo=FALSE,fig.width=7,fig.height=4}
+```{r 4,echo=TRUE,fig.width=7,fig.height=4}
trellis.par.set(caretTheme())
densityplot(gbmFit3, pch = "|")
```
@@ -395,7 +394,7 @@ Note that if you are interested in plotting the resampling results across multip
### Between-Models


-The [`caret`](http://cran.r-project.org/web/packages/caret/index.html) package also includes functions to characterize the differences between models (generated using `train`, `sbf` or `rfe`) via their resampling distributions. These functions are based on the work of [Hothorn et al. (2005)](http://www.stat.uni-muenchen.de/~leisch/papers/Hothorn+Leisch+Zeileis-2005.pdf) and [Eugster et al (2008)](http://epub.ub.uni-muenchen.de/10604/1/tr56.pdf).
+The [`caret`](http://cran.r-project.org/web/packages/caret/index.html) package also includes functions to characterize the differences between models (generated using `train`, `sbf` or `rfe`) via their resampling distributions. These functions are based on the work of [Hothorn et al. (2005)](https://homepage.boku.ac.at/leisch/papers/Hothorn+Leisch+Zeileis-2005.pdf) and [Eugster et al (2008)](http://epub.ub.uni-muenchen.de/10604/1/tr56.pdf).

First, a support vector machine model is fit to the Sonar data. The data are centered and scaled using the `preProc` argument. Note that the same random number seed is set prior to the model that is identical to the seed used for the boosted tree model. This ensures that the same resampling sets are used, which will come in handy when we compare the resampling profiles between models.
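A sketch of that kind of fit (the seed value and tuning choices are illustrative, and `training` and `fitControl` are assumed to come from earlier, elided chunks):

```r
set.seed(825)   # reuse the seed that preceded the boosted tree fit (value illustrative)
svmFit <- train(Class ~ ., data = training,
                method = "svmRadial",
                trControl = fitControl,
                preProc = c("center", "scale"),   # center and scale the predictors
                tuneLength = 8,
                metric = "ROC")   # assumes fitControl computes ROC via twoClassSummary
```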

@@ -438,6 +437,11 @@ Note that, in this case, the option `resamples = "final"` should be user-defined
There are several lattice plot methods that can be used to visualize the resampling distributions: density plots, box-whisker plots, scatterplot matrices and scatterplots of summary statistics. For example:

```{r train_resample_box,fig.width=9,fig.height=4}
+theme1 <- trellis.par.get()
+theme1$plot.symbol$col = rgb(.2, .2, .2, .4)
+theme1$plot.symbol$pch = 16
+theme1$plot.line$col = rgb(1, 0, 0, .7)
+theme1$plot.line$lwd <- 2
trellis.par.set(theme1)
bwplot(resamps, layout = c(3, 1))
```
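Other `lattice` methods work on the same object, for example (assuming `resamps` is a `resamples` object and that an ROC metric was computed during resampling):

```r
dotplot(resamps, metric = "ROC")   # per-model summaries for a single metric
splom(resamps)                     # scatterplot matrix of paired resampling results
```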
2 changes: 1 addition & 1 deletion bookdown/08-Parallel.Rmd
@@ -23,7 +23,7 @@ include_graphics('premade/parallel.png', dpi = NA)

Note that some models, especially those using the [`RWeka`](http://cran.r-project.org/web/packages/RWeka/index.html) package, may not be able to be run in parallel due to the underlying code structure.

-`train`, `rfe`, `sbf`, `bag` and `avNNet` were given an additional argument in their respective control files called `allowParallel` that defaults to `TRUE`. When `TRUE`, the code will be executed in parallel if a parallel backend (e.g. **doMC**) is registered. When `allowParallel`` = FALSE`, the parallel backend is always ignored. The use case is when `rfe` or `sbf` calls `train`. If a parallel backend with *P* processors is being used, the combination of these functions will create *P*^2^ processes. Since some operations benefit more from parallelization than others, the user has the ability to concentrate computing resources for specific functions.
+`train`, `rfe`, `sbf`, `bag` and `avNNet` were given an additional argument in their respective control files called `allowParallel` that defaults to `TRUE`. When `TRUE`, the code will be executed in parallel if a parallel backend (e.g. **doMC**) is registered. When `allowParallel = FALSE`, the parallel backend is always ignored. The use case is when `rfe` or `sbf` calls `train`. If a parallel backend with *P* processors is being used, the combination of these functions will create *P*^2^ processes. Since some operations benefit more from parallelization than others, the user has the ability to concentrate computing resources for specific functions.
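For instance (a sketch only; the backend choice and worker count are illustrative), one might register a backend and keep the inner `train` calls sequential so that only `rfe` uses the workers:

```r
library(doParallel)
cl <- makePSOCKcluster(4)   # worker count chosen for illustration
registerDoParallel(cl)

## The train() calls made inside rfe() stay sequential:
innerCtrl <- trainControl(method = "cv", number = 5, allowParallel = FALSE)

## ... run rfe()/train() here ...

stopCluster(cl)
```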

One additional "trick" that `train` exploits to increase computational efficiency is to use sub-models; a single model fit can produce predictions for multiple tuning parameters. For example, in most implementations of boosted models, a model trained on *B* boosting iterations can produce predictions for models for iterations less than *B*. Suppose a `gbm` model was tuned over the following grid:

