Add image explanation #35

Closed
thomasp85 opened this issue Sep 21, 2017 · 17 comments

Comments

@thomasp85
Owner

This should be the focus (beyond bug fixes) for the next version. It will bring the R version on par with the Python one.

One of the biggest challenges here is the superpixel segmentation - a crude implementation in R can be seen here

Another possible challenge is general memory usage - image data is much larger so the permutations will take both time and space.

Off the top of my head, I believe the input should be image files rather than in-memory images. We can then provide a preprocessor function, as with text analysis, to let the user get the image data into the format their model needs. This will solve the memory issue of permutations, as well as the fact that there seems to be no common image class with widespread use in modeling in R.
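The memory argument can be illustrated with a small sketch. It is written in Python purely for illustration and is not the lime implementation: it assumes each permutation is represented as an on/off vector over superpixels and that the perturbed image is built only when the model needs it. The names `permutation_masks` and `apply_mask`, and the list-of-lists image format, are hypothetical.

```python
import random
from typing import Iterator, List, Sequence

Image = List[List[float]]   # grayscale image as rows of pixel values (assumed format)
Labels = List[List[int]]    # superpixel id for each pixel


def permutation_masks(n_superpixels: int, n_permutations: int,
                      seed: int = 0) -> Iterator[List[int]]:
    """Yield one on/off vector per permutation instead of storing them all."""
    rng = random.Random(seed)
    for _ in range(n_permutations):
        yield [rng.randint(0, 1) for _ in range(n_superpixels)]


def apply_mask(image: Image, labels: Labels, mask: Sequence[int],
               fill: float = 0.0) -> Image:
    """Return a copy of `image` with switched-off superpixels set to `fill`."""
    return [
        [pixel if mask[label] else fill
         for pixel, label in zip(img_row, lab_row)]
        for img_row, lab_row in zip(image, labels)
    ]
```

With this layout only the small mask vectors need to be kept for fitting the explanation; each perturbed image can be created, passed through the user's preprocessor and model, and discarded.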

I believe the magick package should provide all the infrastructure needed for the lime side of things...

@pommedeterresautee you are free to comment and suggest things for this - I'll take on the implementation myself.

@thomasp85
Owner Author

Wow - the superpixel implementation is really slow, mainly due to k-means being slow with the required number of clusters. I'll try to play with some alternative clustering algorithms to see if it can be sped up...

@thomasp85
Owner Author

Hmm... An alternative algorithm to SLIC is Grid Seams (http://ieeexplore.ieee.org/document/6816834/) - a C++ implementation is available for inspiration: https://github.com/richarddlu/grid_seams

@thomasp85
Owner Author

Another possibility is to improve upon SLIC. The main bottleneck is the k-means algorithm, but it seems there's no need to consider all pixels simultaneously, as we know that pixels belonging to the same cluster are located spatially close to each other...
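A minimal sketch of that idea, in Python for illustration only (not the code that ended up in lime): each cluster centre is compared only against pixels in a window of roughly 2 × `step` around it, which is how SLIC itself limits the search. The function name `assign_local`, the grayscale list-of-lists image, and the default `compactness` are assumptions made for the example.

```python
import math
from typing import List, Tuple

Center = Tuple[float, float, float]   # (row, col, intensity)


def assign_local(image: List[List[float]], centers: List[Center],
                 step: int, compactness: float = 10.0) -> List[List[int]]:
    """One assignment pass: each centre only scans its local window."""
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    best = [[math.inf] * width for _ in range(height)]
    for k, (cr, cc, ci) in enumerate(centers):
        # Restrict the search to pixels within `step` of the centre.
        r0, r1 = max(0, int(cr) - step), min(height, int(cr) + step + 1)
        c0, c1 = max(0, int(cc) - step), min(width, int(cc) + step + 1)
        for r in range(r0, r1):
            for c in range(c0, c1):
                d_color = image[r][c] - ci
                d_space = math.hypot(r - cr, c - cc)
                # SLIC-style distance: colour term plus scaled spatial term.
                dist = math.hypot(d_color, compactness * d_space / step)
                if dist < best[r][c]:
                    best[r][c] = dist
                    labels[r][c] = k
    return labels
```

The work per pass then scales with the number of pixels rather than with pixels × clusters, which is where plain k-means spends its time when the number of clusters is large.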

@pommedeterresautee
Contributor

I have not tried anything applied to images and Lime, but I am currently playing with dbscan on another project. Would it make things better? (Performance is very good and there is only one required hyperparameter, which can be the minimal size of a block, very handy.)

@thomasp85
Owner Author

Tried dbscan on a whim - crashed my R session for some reason... I’ll keep looking into it

@thomasp85
Owner Author

Got dbscan to work - not a good fit for this as the eps argument is not intuitive for image dimensions
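For contrast, SLIC's main parameter does map directly onto image dimensions: given a desired number of superpixels, the initial grid spacing follows from the pixel count. A small Python illustration (the function name `grid_step` is hypothetical):

```python
import math


def grid_step(height: int, width: int, n_superpixels: int) -> float:
    """Approximate side length, in pixels, of one superpixel on a regular grid."""
    return math.sqrt(height * width / n_superpixels)
```

A density threshold such as eps has no comparably direct link to the width and height of the image, which is one way to read the comment above.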

@pommedeterresautee
Contributor

For Grid seams, have you tried with O3 flag?

@thomasp85
Owner Author

Haven't tried Grid seams at all - currently implementing SLIC in Rcpp

@expectopatronum

Hi @thomasp85, is there any way I could contribute to this issue? (I'll also gladly create test cases or, later on, a demo notebook/tutorial, since I have no experience with Rcpp...)

@thomasp85
Owner Author

If you have an image classifier and some test images to share it would help greatly - I already have most of the framework set up, but as I don’t really do any image classification myself it’s hard to make a proper test

@expectopatronum

I'm thinking about training a classifier using caret (probably random forest) on the fashion-mnist dataset. Is that ok?

@thomasp85
Owner Author

I think we'll probably need some larger, more complex images? - have you seen the article and the use cases they have there?

But then again, I know next to nothing about image classification...

@expectopatronum

Yes I saw it, but so far I couldn't figure out which dataset he was using. He also has an example using MNIST data. Do you know of any more interesting image datasets? Maybe the CIFAR-10 dataset is more interesting?

@expectopatronum

I managed to train a tensorflow model on the CIFAR dataset, because with caret it never finished (I tried random forest and neural nets), maybe the data is too big. The model performance is not good yet but I was able to define predict_model and model_type in a way that it is recognized by explain. I'm pretty sure predict_model could/should be improved. But if you're interested I can show it to you and you can use the model for testing. Meanwhile I can try to improve the accuracy of the classifier.

@AdamSpannbauer

AdamSpannbauer commented Feb 12, 2018

Hi. I installed the image branch to see how its current state would work with one of the pretrained models shipped with keras.

I ran into an error (which might be totally expected in the current state of the dev), and I thought I'd share my script here in case it aids the dev process at all. The script I used is here, where I attempted to use lime to explain predictions created by the pretrained VGG19 imagenet weights.

I created the 'image_explainer' object w/o error (I'm not sure how to verify if it was created correctly). There was an error when calling lime:::explain.imagefile: Error in `*tmp*`[4, , ] : subscript out of bounds. The error occurs in line 59 of image.R; the first dimension value is 3 when running with the image_explainer created in my script.

Without digging further, I changed the index in that line from 4 to 3 to see if it would work by dumb luck. It then ran into an error in line 71, where it seems the perm matrix dimensions are transposed from what they should be for the colnames assignment to work.

Not sure if any of this helps, but I'd be more than happy to help if there's any additional exploration needed for the dev of the image features in this version of lime.


Edit: I've realized that some issues I'm running into might stem from creating the explainer with only one observation. I will recreate with more observations and see if the issues persist.

@thomasp85
Owner Author

Sorry about being slow on this - I'll begin working on lime again soon and will get back to you on the image explanation part

@AdamSpannbauer

No worries. Just wanted to share in case it helped dev in any way. I was/am excited to play with the new functionality in the R version; no rush on the dev process. Great work so far.
