
Code Error #7

Closed
AmingWu opened this issue Jun 16, 2019 · 15 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments


AmingWu commented Jun 16, 2019

Hello,
When I run python main.py --config ./config/Places_LT/stage_2_meta_embedding.py, there is an error.

File "./models/MetaEmbeddingClassifier.py", line 33, in forward
dist_cur = torch.norm(x_expand - centroids_expand, 2, 2)
RuntimeError: The size of tensor a (365) must match the size of tensor b (122) at non-singleton dimension 1

Here, I print the shape of x_expand and centroids_expand.

torch.Size([86, 365, 512])
torch.Size([86, 122, 512])

Could you give some advice to solve this problem?
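For reference, the mismatch can be reproduced outside the repo. The shapes below are copied from the traceback; NumPy is used in place of torch here (both apply the same elementwise shape rule at dimension 1, where 365 and 122 disagree):

```python
import numpy as np

# Shapes taken from the report above.
x_expand = np.zeros((86, 365, 512))
centroids_expand = np.zeros((86, 122, 512))

try:
    # The subtraction inside torch.norm(x_expand - centroids_expand, 2, 2)
    # is what actually fails: dimension 1 (365 vs 122) cannot broadcast.
    dist_cur = np.linalg.norm(x_expand - centroids_expand, axis=2)
except ValueError as err:
    print("shape mismatch:", err)
```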


AmingWu commented Jun 16, 2019

Ok, I have solved this problem.

@xavieryxie

> When I run python main.py --config ./config/Places_LT/stage_2_meta_embedding.py, there is an error. [...] Could you give some advice to solve this problem?

I also met this problem, and it is odd that I get different errors when I run python main.py multiple times. Could you tell me how you solved the problem? Thanks.


AmingWu commented Jun 17, 2019

Use a single GPU. For example, CUDA_VISIBLE_DEVICES=0 python main.py --config ./config/Places_LT/stage_2_meta_embedding.py

@xavieryxie

> Use a single GPU. For example, CUDA_VISIBLE_DEVICES=0 python main.py --config ./config/Places_LT/stage_2_meta_embedding.py

Which Python version do you use, 2.7 or 3.5? I still get the same error; it is a little weird.

@xavieryxie

> Use a single GPU. For example, CUDA_VISIBLE_DEVICES=0 python main.py --config ./config/Places_LT/stage_2_meta_embedding.py

The problem has been solved. Thanks for your advice.


AmingWu commented Jun 17, 2019

OK. When you have trained the model on the Places365 dataset, could you share your results with me?

@xavieryxie

OK, no problem.


zhmiao (owner) commented Jun 17, 2019

@AmingWu @onexxp Thank you very much for asking. The problem you encountered is caused by multi-GPU use; we have hit it as well. PyTorch splits the batch across the available GPUs, so the calculations in the code can break because they assume a fixed batch size (e.g., with 2 GPUs and batch_size=256, each GPU most likely receives only 128 samples, while the rest of the code expects 256). We did not prepare the code to be compatible with multi-GPU training/testing. We are sorry about this; it may take some extra effort to make it work.
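The splitting described above can be sketched in plain Python. This is only a simplification of how nn.DataParallel scatters the batch dimension (it chunks it into near-equal pieces); chunk_batch is a hypothetical helper, not code from this repo:

```python
import math

def chunk_batch(batch_size, n_gpus):
    """Mimic chunking on the batch dimension: near-equal chunks,
    the last one possibly smaller."""
    step = math.ceil(batch_size / n_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(step, remaining))
        remaining -= sizes[-1]
    return sizes

# With 2 GPUs and batch_size=256, each replica sees only 128 samples,
# so any forward() logic that hard-codes 256 will fail.
print(chunk_batch(256, 2))  # [128, 128]
print(chunk_batch(256, 1))  # [256] -- why CUDA_VISIBLE_DEVICES=0 avoids the error
```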

@zhmiao added the enhancement and question labels on Jun 17, 2019
@xavieryxie

> The problem you have encountered was caused by the use of multi-GPU. [...]

Thanks for your answer and the awesome work. I ran into another problem when running the code with Python 3.5: when the models are initialized, the feat/classifier parameters lose their order because they are defined in a plain dict, and the code fails. I changed it to an OrderedDict() and it works. I don't know whether I am the only one hitting this; just a small question.
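For readers on Python < 3.7, where plain dicts do not guarantee insertion order, the fix described above looks roughly like this. The key names are illustrative, not necessarily the repo's exact ones:

```python
from collections import OrderedDict

# Before (order-less on Python 3.5): networks = {...} as a plain dict.
# After: an OrderedDict preserves the order the modules were registered in,
# so iterating over it (e.g., to load weights or build optimizers) is stable.
networks = OrderedDict()
networks['feat_model'] = 'feature extractor'          # placeholder value
networks['classifier'] = 'meta-embedding classifier'  # placeholder value

assert list(networks.keys()) == ['feat_model', 'classifier']
```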

@xavieryxie

> OK. When you have trained the model on the Places365 dataset, could you share your results with me?

Have you trained the model? I used the default parameters, but my result seems a little lower than the paper reports: Many_shot_accuracy_top1: 0.412, Median_shot_accuracy_top1: 0.369, Low_shot_accuracy_top1: 0.218 on the closed set.


AmingWu commented Jun 18, 2019

My result is lower than yours.


AmingWu commented Jun 28, 2019

Hello, for Places_LT the open-set size is 6,600, but when I run the open-set test I find the number is 43,100. Why?


AmingWu commented Jun 28, 2019

Hello, I have understood your setting.


zhmiao (owner) commented Jul 31, 2019

@AmingWu As mentioned here: #17 (comment), we think we have found why the inference results are a little lower than reported. We will fix this ASAP. Thank you very much.


zhmiao (owner) commented Aug 5, 2019

@AmingWu #17 (comment)

@zhmiao zhmiao closed this as completed Aug 26, 2019