Do evaluation with gloo backend, and only on process 0 #6

Open · wants to merge 1 commit into base: master

Conversation

@Nanne commented Nov 8, 2020

Thanks for publishing this great code base! While playing around with it we ran into issues evaluating on large databases, because the extract_features() function requires (due to the NCCL backend) that the data be sent via the GPU.

Additionally, by relying on the NCCL backend (which doesn't support gather() or point-to-point communication), the data was replicated across all processes, which caused problems even when it did fit on the GPU (the redundant copies then caused CPU memory issues). It also meant that the evaluation was actually performed N times (where N = world_size).

To fix these issues I've rewritten the extract_features() function to use the gloo backend for gathering all the data. This way the data doesn't have to go via the GPU and can stay in CPU memory once extracted; it also makes it possible to gather everything on process 0 and then do the evaluation only once.
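Roughly, the gathering now works like the sketch below (a simplified illustration of the idea rather than the actual diff; names such as `model` and `data_loader` are placeholders, and uneven shard sizes are not handled here):

```python
import torch
import torch.distributed as dist

def extract_features(model, data_loader):
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    # A separate gloo group lets us gather CPU tensors even though
    # training itself runs on the NCCL backend.
    cpu_group = dist.new_group(backend='gloo')

    # Each process extracts features for its shard and keeps them in CPU memory.
    local_feats = []
    with torch.no_grad():
        for imgs, *_ in data_loader:
            local_feats.append(model(imgs.cuda()).cpu())
    local_feats = torch.cat(local_feats, dim=0)

    # gather() (unlike all_gather) sends everything to a single destination,
    # so only process 0 ends up holding the full feature matrix.
    # (Assumes every shard has the same size; padding/trimming is omitted.)
    gathered = [torch.empty_like(local_feats) for _ in range(world_size)] if rank == 0 else None
    dist.gather(local_feats, gather_list=gathered, dst=0, group=cpu_group)

    return torch.cat(gathered, dim=0) if rank == 0 else None
```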

I didn't thoroughly compare, but I think it's now always better to use --sync_gather. Either way, both cases should be faster and use less memory than before.

(I didn't want to take up too many GPUs with the CVPR deadline so close, so I only tested with 2 GPUs, but it should scale to a larger world_size.)

@Nanne (Author) commented Nov 8, 2020

Oops, it seems extract_features() is also used in net_vlad_img.py and netvlad_img_sfrs.py, and I think my changes will break the code there.

It looks like the training code can be fairly easily adjusted to use my proposed extract_features(), but I don't have time to do (and test) that now. It seems worthwhile (and easy enough) to make these changes though, and it should also reduce evaluation time and memory usage during training.

Actually, it will also fail if PCA is used during evaluation, but that should probably also happen only on process 0, and not on all processes. Sorry, I will update the PR to make it work for all places where extract_features() is called.
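In other words, something along these lines (a hypothetical sketch; `pca` and `evaluate` are placeholders, not the actual names in this repo):

```python
import torch.distributed as dist

features = extract_features(model, data_loader)  # returns None on ranks other than 0

# Apply PCA and run the evaluation only on process 0; the other ranks just wait.
if dist.get_rank() == 0:
    features = pca.transform(features)  # `pca` is a placeholder
    results = evaluate(features)        # `evaluate` is a placeholder
dist.barrier()
```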

@yxgeee (Owner) commented Nov 9, 2020

Thanks a lot for your contribution! The large GPU and CPU memory requirements are indeed a significant issue in this codebase, and you've provided quite a good solution. I will look into your code and adjust the training part to match it.
