sparse matrices and train #31

topepo · 2014-07-21T17:22:39Z

See here.

This should be feasible now that x and y are carried along separately through train.

zachmayer · 2014-07-21T17:52:57Z

This is an excellent idea

topepo · 2014-07-21T19:42:22Z

Feel free to take a whack at it =]

topepo · 2014-07-28T18:23:45Z

I didn't hear anything from you, so I've checked in some changes to make
this happen. I'm testing them now if you want to give it a try.

The changes were pretty minor but there were a few complications.
Pre-processing via caret might not work, since we can't guarantee that certain
operations will work with non-matrix/data.frame classes. For example, I
don't see a complete.cases() method for the sparse matrix class from the
Matrix package. There are some (small) holes like that to fill. So far, the
regression tests look fine, but I'll also check them against the ones
generated by the last caret version.
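
For readers hitting the complete.cases() gap mentioned above, here is a minimal sketch of one possible stand-in for sparse matrices. The helper name `sparse_complete_cases` is hypothetical (it is not part of caret or Matrix), and this is not necessarily how caret handles the problem internally; it only relies on `is.na()` and `rowSums()` working on Matrix classes.

```r
library(Matrix)

# Hypothetical helper (not from caret or Matrix): TRUE for rows with no NA.
sparse_complete_cases <- function(x) {
  # is.na() on a sparse matrix yields a sparse logical pattern;
  # rowSums() then counts the NA entries in each row.
  Matrix::rowSums(is.na(x)) == 0
}

# Toy 3 x 4 sparse matrix with one missing value in the first row.
m <- Matrix(c(1, 0, NA, 0,
              0, 2, 0,  0,
              0, 0, 0,  3),
            nrow = 3, byrow = TRUE, sparse = TRUE)

keep <- sparse_complete_cases(m)
m_complete <- m[keep, , drop = FALSE]  # drops the row containing NA
```

The subsetting step keeps the result sparse, so nothing here requires as.matrix().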

topepo · 2014-07-29T14:52:20Z

Regression tests between the current devel version and 6.0-30 show no differences in the model results between versions. I've added a test case in the glmnet test file, but other tests would be good. I'm also testing out string kernels for SVM models.

zachmayer · 2014-07-29T15:12:59Z

Thanks for grabbing this; a Kaggle competition has been eating up all my free time lately =D

topepo · 2014-08-08T21:41:02Z

As far as I can tell, things work fine. I'm going to close this.

leakyMirror · 2014-11-28T20:28:35Z

How can I use a sparse matrix with caret and glmnet for a classification problem? My response is binary (0/1), and the train function says that this is actually a regression problem :) If I set metric to RMSE, it throws a warning:
The training data could not be converted to a data frame for saving

My data is converted to a sparse matrix with the Matrix package.

topepo · 2014-11-29T21:44:09Z

I think the first issue is the format of your outcome. Make it a factor with levels that are valid R variable names (e.g. "yes"/"no") so that train knows that you are doing classification.

If you are still having issues after that, send us an example that we can test with.

I'm going to be out of commission from shoulder surgery starting Monday so you might not get a response back from me in the short term.
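
The factor advice above could be sketched roughly as follows. The toy data, the variable names, and the choice of `returnData = FALSE` (used here on the assumption that it skips the data-frame conversion behind the warning quoted earlier) are all illustrative assumptions, not something confirmed in this thread. The train() call is guarded so the snippet still runs when caret or glmnet is not installed.

```r
library(Matrix)

set.seed(1)
# Hypothetical toy data: 200 samples, 20 mostly-zero predictors.
n <- 200
p <- 20
x_dense <- matrix(rbinom(n * p, 1, 0.1) * rnorm(n * p), nrow = n)
colnames(x_dense) <- paste0("V", seq_len(p))  # train() expects column names
x_sparse <- Matrix(x_dense, sparse = TRUE)    # stored as a dgCMatrix

y01 <- rbinom(n, 1, 0.5)                      # numeric 0/1 outcome

# Recode 0/1 as a factor whose levels are valid R names, so that
# train() treats the problem as classification rather than regression.
y <- factor(ifelse(y01 == 1, "yes", "no"), levels = c("no", "yes"))

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("glmnet", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "cv", number = 3,
                              returnData = FALSE)  # assumption: avoids saving x
  fit <- caret::train(x = x_sparse, y = y, method = "glmnet",
                      trControl = ctrl)
  print(fit$modelType)  # check which mode train() chose
}
```

Leaving the outcome as numeric 0/1 is what makes train() pick regression; the factor recode is the only change needed on the outcome side.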

zachmayer · 2014-11-29T22:10:31Z

Yikes! Good luck with the surgery Max!

@leakyMirror If you post a reproducible example (twoClassSim is super useful here) I can help you debug it.

pgswartz · 2015-03-19T20:01:34Z

Thanks for all your work on this package Max and zachmayer! Caret's great.

I've also been trying to use sparse matrices with the output from the train method and am similarly running into forced conversion of my sparse matrices into non-sparse matrices. I'm running caret_6.0-41 and R version 3.1.2 (2014-10-31). If I run:

predict(trainLassoModel, newdata=testSparseMatrix)

this is slightly slower than

predict(trainLassoModel, newdata=testNonSparseMatrix)

due to, based on profiling, the use of as.matrix in caret's implementation of predict. However, the following is very fast:

predict(trainLassoModel$finalModel, newx=testSparseMatrix)

due to, as I understand it, predict being called from the 'glmnet' package rather than 'caret'. (I know this last formulation will not carry out preprocessing, etc.).

Am I doing something wrong, or is predict(trainObj) still coercing newdata to a non-sparse form? I did not train on a sparse matrix; I am only using one at prediction time.

Please let me know if I can provide a toy example and thanks again for the excellent package!
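
As a rough illustration of the two prediction paths compared above, the sketch below fits a small glmnet model through train() on a dense matrix and then predicts both ways. The data and object names are made up for the example. Passing `s = fit$bestTune$lambda` is an assumption about how to get a single prediction column from glmnet's own predict method (without `s`, glmnet returns one column per lambda). As noted above, the direct path skips any caret pre-processing.

```r
library(Matrix)

set.seed(2)
# Hypothetical toy regression data with mostly-zero predictors.
n <- 150
p <- 30
x <- matrix(rbinom(n * p, 1, 0.1) * rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("V", seq_len(p))))
y <- as.numeric(x %*% rnorm(p) + rnorm(n))
x_sparse <- Matrix(x, sparse = TRUE)  # sparse copy used only at prediction time

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("glmnet", quietly = TRUE)) {
  fit <- caret::train(x = x, y = y, method = "glmnet",
                      trControl = caret::trainControl(method = "cv", number = 3))

  # Path 1: caret's predict method for train objects.
  p_caret <- predict(fit, newdata = x)

  # Path 2: glmnet's own predict method on the final model, fed the
  # sparse matrix directly and evaluated at the tuned lambda.
  p_glmnet <- predict(fit$finalModel, newx = x_sparse,
                      s = fit$bestTune$lambda)

  # Compare the two sets of predictions.
  print(all.equal(as.numeric(p_caret), as.numeric(p_glmnet)))
}
```

Timing each path with system.time() on a realistically sized matrix is the simplest way to see whether the coercion reported above still matters in a given caret version.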

dselivanov · 2016-08-04T14:28:47Z

@zachmayer @topepo sorry for crossposting. Can you have a look at what's going wrong here: text2vec-classification-with-caret-problems. How do I feed sparse matrices (dgCMatrix) to caret routines?

randomgambit · 2016-10-27T21:07:02Z

Hello everyone,

I just discovered this amazing caret package. Incredible work! Following up on what has been said here, can I expect caret to work with sparse matrices? I am getting sparse matrices from quanteda after some text-classification. In other words, if I transform my sparse matrix using as.matrix, my computer will burst into flames....

Thanks!!

randomgambit · 2016-11-09T00:53:31Z

hello @topepo, can you please confirm caret can finally work with sparse matrices and glmnet? thanks!!

topepo added the enhancement label Jul 21, 2014

topepo added the help wanted label Jul 21, 2014

topepo closed this as completed Aug 8, 2014
