sparse matrices and train #31

Closed
topepo opened this issue Jul 21, 2014 · 13 comments

Comments

@topepo
Owner

topepo commented Jul 21, 2014

See here.

This should be feasible now that x and y are carried along separately through train.

@zachmayer
Collaborator

This is an excellent idea

@topepo
Owner Author

topepo commented Jul 21, 2014

Feel free to take a whack at it =]

@topepo
Owner Author

topepo commented Jul 28, 2014

I didn't hear anything from you, so I've checked in some changes to make
this happen. I'm testing them now if you want to give it a try.

The changes were pretty minor but there were a few complications.
Pre-processing via caret might not work since we can't guarantee that certain
operations will work with non matrix/data.frame classes. For example, I
don't see a complete.cases() method for the sparse matrix classes from the
Matrix package. There are some (small) holes like that to fill. So far, the
regression tests look fine but I'll also check them against the ones
generated by the last caret version.
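
To illustrate the kind of hole mentioned above, here is a minimal sketch of a complete.cases()-style check that avoids densifying the matrix. The helper name `sparse_complete_cases` is hypothetical (it is not part of caret or Matrix), and the sketch assumes a general, non-symmetric numeric sparse matrix whose NA values are stored explicitly in the `x` slot.

```r
library(Matrix)

# Hypothetical helper, not part of caret or Matrix.
# Returns TRUE for rows that contain no NA, without converting to dense.
sparse_complete_cases <- function(m) {
  m <- as(m, "CsparseMatrix")      # column-compressed storage: slots i, p, x
  stopifnot(.hasSlot(m, "x"))      # pattern matrices have no stored values
  # m@i holds 0-based row indices of the stored entries
  na_rows <- unique(m@i[is.na(m@x)]) + 1L
  keep <- rep(TRUE, nrow(m))
  keep[na_rows] <- FALSE
  keep
}

# Toy example: the stored entry in row 2 is NA
m <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 1),
                  x = c(1, NA, 3), dims = c(3, 2))
ok <- sparse_complete_cases(m)
m_complete <- m[ok, , drop = FALSE]  # still a sparse matrix
```

Only stored entries are inspected, so the cost scales with the number of non-zeros rather than with the full dimensions.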

@topepo
Owner Author

topepo commented Jul 29, 2014

Regression tests between the current devel version and 6.0-30 show no differences in the model results. I've added a test case in the glmnet test file but other tests would be good. I'm also testing out string kernels for SVM models.

@zachmayer
Collaborator

Thanks for grabbing this. A Kaggle competition has been eating up all my free time lately =D

@topepo
Owner Author

topepo commented Aug 8, 2014

As far as I can tell, things work fine. I'm going to close this.

@topepo topepo closed this as completed Aug 8, 2014
@leakyMirror

How can I use a sparse matrix with caret and glmnet for a classification problem? My response is binary (0/1), and the train function says that this is actually a regression problem :) If I set metric to RMSE, it throws a warning:
The training data could not be converted to a data frame for saving

My data is converted to a sparse matrix with the Matrix package.

@topepo
Owner Author

topepo commented Nov 29, 2014

I think the first issue is the format of your outcome. Make it a factor with levels that are valid R variable names (e.g. "yes"/"no") so that train knows that you are doing classification.
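
A minimal sketch of that conversion, using only base R. The vector `y01` is a hypothetical stand-in for the 0/1 response described in the question.

```r
# Hypothetical 0/1 outcome standing in for the response in the question
y01 <- c(0, 1, 1, 0, 1, 0)

# Recode as a factor whose levels are syntactically valid R names
y <- factor(y01, levels = c(0, 1), labels = c("no", "yes"))

# make.names() leaves valid names unchanged, so this confirms the levels are fine
stopifnot(identical(make.names(levels(y)), levels(y)))

# The raw codes would not pass the same check:
make.names(c("0", "1"))  # "X0" "X1"
```

The resulting `y` would then be passed as the `y` argument of train alongside the sparse `x`.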

If you are still having issues after that, send us an example that we can test with.

I'm going to be out of commission from shoulder surgery starting Monday so you might not get a response back from me in the short term.

@zachmayer
Collaborator

Yikes! Good luck with the surgery Max!

@leakyMirror If you post a reproducible example (twoClassSim is super useful here) I can help you debug it.

@pgswartz

Thanks for all your work on this package Max and zachmayer! Caret's great.

I've also been trying to use sparse matrices with the output from the train method and am similarly running into forced conversion of my sparse matrices into non-sparse matrices. I'm running caret_6.0-41 and R version 3.1.2 (2014-10-31). If I run:

predict(trainLassoModel, newdata=testSparseMatrix)

this is slightly slower than

predict(trainLassoModel, newdata=testNonSparseMatrix)

Based on profiling, this is due to the use of as.matrix in caret's implementation of predict. However, the following is very fast:

predict(trainLassoModel$finalModel, newx=testSparseMatrix)

This is, as I understand it, because predict is dispatched to the 'glmnet' package rather than 'caret'. (I know this last formulation will not carry out preprocessing, etc.)
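
For reference, a minimal sketch of the sparse-in, sparse-out path using glmnet directly, with no caret involved. The data are simulated and all object names are hypothetical. Note that the lambda is chosen by hand here; with a train object, the tuned value would presumably come from `trainLassoModel$bestTune$lambda`, which is an assumption about that particular fit.

```r
library(Matrix)
library(glmnet)

set.seed(1)
n <- 200
p <- 50

# Simulated sparse predictors (dgCMatrix) with column names
x <- rsparsematrix(n, p, density = 0.1)
colnames(x) <- paste0("v", seq_len(p))

# Simulated two-class outcome driven by the first two columns
beta <- c(2, -2, rep(0, p - 2))
y <- factor(ifelse(as.numeric(x %*% beta) + rnorm(n) > 0, "yes", "no"))

train_idx <- 1:150

# glmnet accepts the sparse matrix as-is for fitting
fit <- glmnet(x[train_idx, ], y[train_idx], family = "binomial")

# Pick one penalty value from the fitted path (arbitrary choice for this sketch)
lambda <- fit$lambda[min(10, length(fit$lambda))]

# newx stays sparse; no as.matrix() conversion is needed
pred <- predict(fit, newx = x[-train_idx, ], s = lambda, type = "class")
```

Without `s`, predict on a glmnet fit returns one column per lambda on the path, so the penalty should be set explicitly when calling it on `finalModel`.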

Am I doing something wrong, or is predict(trainObj) still coercing newdata to a non-sparse form? I did not train on a sparse matrix; I am only using one at prediction time.

Please let me know if I can provide a toy example and thanks again for the excellent package!

@dselivanov

@zachmayer @topepo sorry for crossposting. Can you have a look at what's going wrong here: text2vec-classification-with-caret-problems. How do I feed sparse matrices (dgCMatrix) to caret routines?

@randomgambit

randomgambit commented Oct 27, 2016

Hello everyone,

I just discovered this amazing caret package. Incredible work! Following up on what has been said here, can I expect caret to work with sparse matrices? I am getting sparse matrices from quanteda after some text-classification. In other words, if I transform my sparse matrix using as.matrix, my computer will burst into flames....

Thanks!!

@randomgambit

Hello @topepo, can you please confirm that caret can now work with sparse matrices and glmnet? Thanks!!
