Extending Graph regularization to images? #87
Comments
Hi Sayak, thanks for your interest! Can you say more about what you mean by extending graph regularization? What are you thinking of demonstrating, and what dataset would it be for?
Hi Arjun. I should have been clearer. I meant the graph regularization example that we have for text classification. I am thinking of using similar methods on an image dataset (let's say the Flowers dataset). Some brief pointers:
Let me know if anything is unclear.
Ah I see, that makes sense. We'll discuss this with the rest of our group and get back to you later this week.
Do you have a dataset in mind that encodes a natural/organic graph, perhaps something like a co-occurrence graph? We believe that using an orthogonal source of similarity, and not just inferring it from embeddings, will be much more valuable for graph regularization, so it'd be great to demonstrate that if possible. Another option might be to create 'perturbed' versions of images and use them as neighbors for graph regularization to improve the stability/robustness of the model. Let us know what you think.
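As a rough illustration of the perturbed-neighbor idea, here is a hypothetical sketch (not NSL code): each perturbed copy of an image could serve as a graph neighbor of the original example. The function name and the specific perturbations (flip, shift, noise) are stand-ins for the crops, rotations, and blurs discussed in the thread.

```python
import numpy as np

def make_perturbed_neighbors(image, num_neighbors=3, seed=0):
    """Generate simple model-agnostic perturbations of an image.

    Each perturbed copy can act as a graph neighbor of the original
    example for graph regularization.
    """
    rng = np.random.default_rng(seed)
    neighbors = []
    # Horizontal flip.
    neighbors.append(image[:, ::-1])
    # Shift right by 2 pixels, padding with zeros.
    shifted = np.zeros_like(image)
    shifted[:, 2:] = image[:, :-2]
    neighbors.append(shifted)
    # Small additive Gaussian noise, clipped to the valid [0, 1] range.
    noisy = np.clip(image + rng.normal(0, 0.05, image.shape), 0.0, 1.0)
    neighbors.append(noisy)
    return neighbors[:num_neighbors]

image = np.full((8, 8, 3), 0.5)
nbrs = make_perturbed_neighbors(image)
print(len(nbrs), nbrs[0].shape)  # 3 neighbors, each shaped like the input
```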
I don't have that kind of dataset in mind. Since graph regularization was demonstrated on the IMDB dataset in the text classification example, I was more inclined toward using a dataset of natural images.
It's already kind of covered implicitly in the Adversarial Regularization example, isn't it?
Ideally, our goal should be to demonstrate some wins of using graph regularization in this tutorial. Here are two options for doing that; there could be more:

Option 1: Use a complex pre-trained model to generate image embeddings and then use embedding similarity for graph building. Then, for the classification task, use a simple(r) model. The hope here is that using the graph will yield some improvement for the classification model because of the more powerful model used to generate the embeddings. Let us know if you have other ideas here.

Option 2: Evaluate the robustness/stability of a model using image perturbations. The additional perturbed examples can be used as augmented training data or as neighbors for graph regularization. Note that this is different from the adversarial regularization example because here we'd be generating model-agnostic image perturbations -- for example, cropped, shifted, rotated, and blurred versions of images. We have some work underway along this thread and could potentially collaborate on it if you're interested.
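The graph-building step in option 1 can be pictured with a minimal sketch: threshold pairwise cosine similarity of the embeddings to get an edge list. This mirrors the idea behind NSL's graph builder but is purely illustrative, not the NSL API; the function name and threshold are assumptions.

```python
import numpy as np

def build_similarity_graph(embeddings, threshold=0.8):
    """Return (i, j, similarity) edges for pairs above the threshold.

    Cosine similarity between L2-normalized embeddings decides which
    pairs of examples become graph neighbors.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    edges = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                edges.append((i, j, float(sims[i, j])))
    return edges

emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
edges = build_similarity_graph(emb, threshold=0.8)
print(edges)  # a single edge between examples 0 and 1
```

Raising the threshold trades graph coverage for neighbor quality, which is the knob discussed later in this thread.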
I am interested in collaborating on both. Sounds really interesting. The first idea you mentioned is precisely what I had in mind; I probably did not convey that well enough. Let me know the best possible way to start this off. I have the bandwidth to work on both cases.
Sounds good. For option 1, feel free to put together what you had in mind and send us a PR. We can plan to have it under https://github.com/tensorflow/neural-structured-learning/tree/master/neural_structured_learning/examples/notebooks. For option 2, we'll discuss with the rest of the team to see how we can go about this and circle back.
Alright. For option 1, here's what I have in mind:
For option 2, that sounds good to me. I have experience working with those kinds of perturbations and corruptions. I recently worked on assessing the robustness of Vision Transformers against these perturbations and corruptions.
@arjung I started putting together a notebook for option 1. After experimenting for a while, I am seeing that not all images have neighbors. This is likely due to the embeddings generated by the pre-trained model I am using, and also to the hyperparameters I am using while constructing the graph. To elaborate, here's an example of an entry that does not have any neighbors:
Here's one that does have neighbors:
How should we handle this situation? Here's the Colab Notebook for full reproducibility. Note that the pre-trained model I used to extract the embeddings (BiT-m-r50x1) yields a vector of shape (1, 2048) (considering we have only one image). I further reduced this to a vector of shape (1, 128) with random projection. The flowers dataset has only about 3600 examples in total, categorized into 5 classes somewhat equally. The number of samples might be an issue, but I still wanted to know your thoughts. Let me know if anything is unclear.
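The random-projection step mentioned above can be sketched as follows (a minimal illustration with NumPy, assuming a Gaussian projection matrix; the notebook may use a different projection):

```python
import numpy as np

# Reduce 2048-d embeddings (e.g. from BiT-m-r50x1) to 128-d with a
# random Gaussian projection. Fixing the seed keeps the projection
# consistent across the whole dataset, which matters for graph building.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(10, 2048))            # 10 example embeddings
projection = rng.normal(size=(2048, 128)) / np.sqrt(128)
reduced = embeddings @ projection
print(reduced.shape)  # (10, 128)
```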
Thanks for putting together an initial version of the colab quickly. I took a quick peek at it and here are a couple of comments, which I think should address your question.
The actual default value of this neighbor feature doesn't matter, because the corresponding neighbor weight is set to 0 -- this edge won't contribute to the graph regularization term. The shape has to be compatible with the value in the original example, though. See https://www.tensorflow.org/neural_structured_learning/tutorials/graph_keras_mlp_cora#load_train_and_test_data for how this is done in a different example.
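A small self-contained sketch of why the default value is needed: examples with no neighbors are simply missing the neighbor feature, and parsing fails unless the feature spec supplies a default of a compatible shape. The feature names below are hypothetical, and a scalar int feature stands in for the image feature to keep the example tiny.

```python
import tensorflow as tf

example_with_nbr = tf.train.Example(features=tf.train.Features(feature={
    'id': tf.train.Feature(int64_list=tf.train.Int64List(value=[0])),
    'NL_nbr_0_id': tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
}))
example_without_nbr = tf.train.Example(features=tf.train.Features(feature={
    'id': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
}))

feature_spec = {
    'id': tf.io.FixedLenFeature((), tf.int64),
    # The default kicks in for examples with no neighbors; since the
    # matching neighbor weight is 0, the dummy edge contributes nothing
    # to the graph regularization term.
    'NL_nbr_0_id': tf.io.FixedLenFeature((), tf.int64, default_value=-1),
}

for ex in (example_with_nbr, example_without_nbr):
    parsed = tf.io.parse_single_example(ex.SerializeToString(), feature_spec)
    print(int(parsed['NL_nbr_0_id']))  # 7, then the default -1
```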
Thank you, @arjung! I added the following:

```python
feature_default_value = tf.zeros((IMG_SIZE, IMG_SIZE, 3))
feature_default_value = tf.strings.as_string(feature_default_value, precision=2)
feature_spec[nbr_feature_key] = tf.io.FixedLenFeature([], tf.string,
                                                      default_value=feature_default_value)
```

I also reduced the
With this, when I try to parse the TFRecords (built with NSL augmentation), I run into:
Now when I prefixed the shapes like so (following the tutorial), I got another error:

```python
feature_default_value = tf.zeros((IMG_SIZE, IMG_SIZE, 3))
feature_default_value = tf.strings.as_string(feature_default_value, precision=2)
feature_spec = {
    'image': tf.io.FixedLenFeature([IMG_SIZE, IMG_SIZE, 3], tf.string,
                                   default_value=feature_default_value),
    'label': tf.io.FixedLenFeature((), tf.int64, default_value=-1),
}
...
feature_spec[nbr_feature_key] = tf.io.FixedLenFeature([IMG_SIZE, IMG_SIZE, 3], tf.string,
                                                      default_value=feature_default_value)
```

Issue:
You can refer to the same Colab Notebook mentioned here in case you want to take a look.
As the error indicates,
Will look into it. But I think I have already tried the next thing. When the feature_spec of the images is not specified with shapes ([], i.e., rank 0), it results in:
I indicated this in the first part of my previous comment. |
The above code will not work because for the neighbor feature, you're specifying the shape as rank 0 but then specifying a default value with shape [img_size, img_size, 3]. |
Yes, that is why I tried the other one but then ran into the rank issue. Any approach you can think of to mitigate it? I understand we need to deal with the shape requirements, but it's not immediately clear to me how we could do that here.
One option is to specify the default value as a JPEG-encoded string with rank 0. Your code already handles decoding from JPEG to integer tensors after
Alternatively, if you don't need the back-and-forth JPEG conversion, you can serialize the examples (the output of augmentation) to contain an int64_list for the 'image' feature -- this is the format of the feature in the dataset to begin with. If you do that, then you can specify the default value for the 'image' feature as an integer tensor with shape [384, 384, 3].
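The second suggestion above can be sketched end to end. This is a hedged illustration, not the notebook's actual code: a 4x4x3 image stands in for the real 384x384x3 images, and the neighbor feature name is hypothetical. Storing the image as an int64_list lets the neighbor feature's default be a plain integer tensor of the same rank and shape, which avoids the rank mismatch entirely.

```python
import numpy as np
import tensorflow as tf

IMG = (4, 4, 3)  # stand-in for (384, 384, 3)
image = np.arange(np.prod(IMG), dtype=np.int64).reshape(IMG)

# Serialize the image as an int64_list, its native format in the dataset.
example = tf.train.Example(features=tf.train.Features(feature={
    'image': tf.train.Feature(
        int64_list=tf.train.Int64List(value=image.ravel())),
}))

feature_spec = {
    'image': tf.io.FixedLenFeature(IMG, tf.int64),
    # Hypothetical neighbor feature: its default has the SAME rank and
    # shape as the real feature, used when an example has no neighbors.
    'NL_nbr_0_image': tf.io.FixedLenFeature(
        IMG, tf.int64, default_value=np.zeros(IMG, np.int64)),
}
parsed = tf.io.parse_single_example(example.SerializeToString(), feature_spec)
print(tuple(parsed['image'].shape))  # (4, 4, 3)
```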
Thanks. The code now works. Here's the Colab. My base model is:
I think we can agree that it is much simpler to deal with 224x224x3 images, but I just wanted to get something up and running quickly. In this case, I saw that graph regularization did play an important part: over five runs, I was able to squeeze out at least a 1-2% improvement over the base model. I have also added a visualization snippet to give folks deeper insight into the neighbors formed by NSL: Let me know if I should proceed toward including the text pieces on this, and any additional feedback you may have.
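A neighbor-visualization helper of the kind described above could look roughly like this (a hypothetical sketch; the data structures and function name are assumptions, not the notebook's actual code):

```python
def neighbors_of(edge_list, example_id):
    """Return (neighbor_id, weight) pairs for one example.

    edge_list holds undirected (src, dst, weight) tuples from the
    similarity graph; the result can drive a side-by-side plot of an
    image and its graph neighbors.
    """
    nbrs = []
    for src, dst, weight in edge_list:
        if src == example_id:
            nbrs.append((dst, weight))
        elif dst == example_id:
            nbrs.append((src, weight))
    # Strongest neighbors first.
    return sorted(nbrs, key=lambda pair: -pair[1])

edges = [(0, 1, 0.91), (0, 2, 0.86), (1, 3, 0.70)]
print(neighbors_of(edges, 0))  # [(1, 0.91), (2, 0.86)]
```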
Thanks. I've been a bit busy over the past 2 weeks and haven't had a chance to look into this. Will get to it over the next few days.
I appreciate that. Thanks.
Looks generally good, Sayak! I'll take another look once you send the PR. Please add sufficient documentation, doc strings, etc. A few comments/suggestions for now though regarding the results:
Can you try increasing the graph regularization multiplier and the number of epochs? Another thing to potentially experiment with is the similarity threshold for the graph. If you increase it to > 0.65, is there a difference in the final model quality?
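The threshold sweep suggested above can be illustrated with a small sketch (hypothetical data, not the flowers embeddings): raising the similarity threshold prunes edges, so fewer but more reliable neighbors feed the graph regularization term.

```python
import numpy as np

# Random unit embeddings stand in for the real, model-derived ones.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
sims = emb @ emb.T
np.fill_diagonal(sims, 0.0)  # ignore self-similarity

# Higher thresholds keep strictly fewer (or equal) undirected edges.
for threshold in (0.55, 0.65, 0.75):
    n_edges = int(np.sum(sims >= threshold) // 2)
    print(threshold, n_edges)
```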
Yes, I didn't train it to completion. I wanted to just briefly run the models to ensure they are working.
What should we expect to see there? That would give me a good idea as I experiment further.
I'll experiment with all of these and report back. Thanks, Arjun!
@arjung here are some observations from the recent set of experiments I conducted:
Some points to note:
Here's the Colab Notebook (BiT) where all of these are reflected. Let me know how you would like me to proceed or if anything is unclear.
What I meant by this is that the training accuracy is not on par with what we get with graph regularization. Without graph regularization, the training accuracy stays at 89%. With graph regularization incorporated, the model reaches substantially higher training as well as validation accuracy. This is evident in both the notebooks I mentioned in my previous comment.
1024 (DenseNet121) and 2048 (BiT-ResNet) dimensions did not seem very practical when serializing the embeddings. I will do a small ablation over the embedding dimensionality and update the results in the next iteration.
Updates. Looks like the discrepancy between the training and validation performance can be easily fixed by reducing the batch size. Also, here is a short ablation study:
The table above reports the validation top-1 accuracies with and without graph regularization under different reduced embedding dimensionalities. Notice that after 256-d the performance trade-off reverses. Maybe a bit more tinkering with hyperparameters like
Thanks. Are these the embedding dimensionality values used to build the graph, used in the input layer of the classifier, or both? You mentioned you were going to try a larger BiT model too; is that still part of your plan? In general, I think all of these experiments are useful, and it would be great to have the findings summarized at the end of the colab.
For building the graph. Images go directly to the subsequent classifier.
Sure, I will do that. Do you think we now have good ground to start working on the tutorial based on the notebook? If so, I can work on it and have a PR ready.
Yeah definitely, please go ahead. Thanks!
Hi folks.
I am willing to work on a tutorial that shows how to extend the graph regularization example, in the same way it's done for text-based problems. Is there scope for this tutorial inside this repo?