Question about preProcOptions in trainControl #335
No leakage! =] All pre-processing is applied on the resampled version of the data (e.g. the 90% in 10-fold CV) and then those calculations are applied to the holdouts (the remaining 10%) with no re-calculation. For example, if you use 50 bootstrap samples, the pre-processing that you choose is applied 50 separate times (plus once more for the entire data set at the very end). Also, if any subsampling is done to deal with a class imbalance, that is also done within each resample. There is a programmatic choice to do the pre-processing before or after the subsampling.
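The fold-wise recipe described above can be sketched in plain NumPy (a hedged illustration of the idea, not caret's internal code; the data, fold layout, and centering/scaling step are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # toy data, 100 rows

n_folds = 10
folds = np.array_split(np.arange(len(X)), n_folds)

for holdout_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(X)), holdout_idx)
    # Estimate the pre-processing (here: centering/scaling) on the
    # 90% analysis set only -- once per resample, as described above.
    mu = X[train_idx].mean(axis=0)
    sigma = X[train_idx].std(axis=0)
    # Apply those same statistics to the 10% holdout; nothing is
    # re-estimated from the holdout rows.
    X_holdout = (X[holdout_idx] - mu) / sigma
```

The key point is that `mu` and `sigma` are recomputed inside every resample, so the holdout rows never influence the transform that is applied to them.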
Thanks for the explanation! What if I want to use t-SNE for preprocessing? I didn't see that as an option.
It's not an option because none of the R packages for t-SNE can make predictions on new data (as far as I know). If you know how to do it, feel free to make a pull request to add it to preProcess.
It looks like they're adding Barnes-Hut t-SNE to scikit-learn, which can predict on new data. Someone should port this to R!
You are right. Is there any way to port it to R?
It turns out there's already an R implementation in the Rtsne package. I opened a feature request for a predict method. We'll see what happens! |
"There is a programmatic choice to do the pre-processing before or after the subsampling."
I wonder how preprocessing is handled in the training process (i.e., in the train function). If the method in trainControl is "cv", is preprocessing applied to the whole training data, or only to the analysis portion of each cross-validation split? I was doing preprocessing, such as PCA, on the whole training set before running train. I think that may lead to data leakage in the subsequent cross-validation step.
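The leakage concern in the last comment can be made concrete with a small NumPy sketch (an assumed toy setup, not the poster's actual data): when the transform's statistics are estimated on the full training set, the holdout rows influence the transform that is later applied to them.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X[45:] += 10.0                 # the rows that will serve as the CV holdout are shifted

train = np.arange(0, 45)

# Leaky: statistics estimated on ALL rows, including the future holdout.
leaky_mean = X.mean(axis=0)

# Leakage-free: statistics estimated on the analysis rows only,
# which mirrors what happens when the pre-processing is done
# inside each resample.
clean_mean = X[train].mean(axis=0)

# The shifted holdout rows have pulled the leaky mean away from the
# training-only mean -- holdout information has leaked into the transform.
print(np.abs(leaky_mean - clean_mean).max())
```

Passing preProcess to train() (rather than transforming the data beforehand) avoids exactly this: the transform is re-estimated within each resample, so the holdout never contributes to it.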