Do evaluation with gloo backend, and only on process 0 #6
Thanks for publishing this great code base! While playing around with it, we ran into issues evaluating on large databases: because of the NCCL backend, the `extract_features()` function requires that the data is sent via the GPU. Additionally, by relying on the NCCL backend (which doesn't support `gather()` or point-to-point communication), the data was actually shared among all processes. This caused us issues even when the data did fit on the GPU, because the redundant copies then caused CPU memory issues. It also meant that the evaluation was actually performed N times (where N = world_size).

To fix these issues, I've rewritten the `extract_features()` function to use the gloo backend for gathering all the data. This way the data doesn't have to go via the GPU; once extracted, it can stay in CPU memory. It also makes it possible to gather everything on process 0 and then do the evaluation only once.

I didn't thoroughly compare, but I think it's now always better to use `--sync_gather`. Either way, both cases should be faster and use less memory than before.
(I didn't want to take too many GPUs with the CVPR deadline so close, so I only tested with 2 GPUs, but it should scale to a larger world_size.)
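
For readers who want the idea without reading the diff, here is a minimal sketch of the gathering pattern described above. It is not the code in this PR. The helper names `gather_on_rank0` and `run_single_process_demo` are hypothetical, and the sketch assumes every process holds a feature tensor of the same shape. The demo runs in a single process with world_size = 1 so it can be tried without GPUs; in a real run the default process group would typically be NCCL and only the extra group would use gloo.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def gather_on_rank0(local_feats, cpu_group):
    """Collect same-shaped feature tensors on rank 0 through a gloo group.

    Hypothetical helper: returns the concatenated features on rank 0 and
    None on every other rank, so evaluation can run once on rank 0.
    """
    # Keep the features in CPU memory; gloo operates on CPU tensors.
    local_feats = local_feats.detach().cpu().contiguous()
    world_size = dist.get_world_size(group=cpu_group)

    if dist.get_rank() == 0:
        # Only the destination rank allocates receive buffers.
        buffers = [torch.empty_like(local_feats) for _ in range(world_size)]
        dist.gather(local_feats, gather_list=buffers, dst=0, group=cpu_group)
        return torch.cat(buffers, dim=0)

    # All other ranks only send their shard.
    dist.gather(local_feats, dst=0, group=cpu_group)
    return None


def run_single_process_demo():
    """Single-process demo (world_size = 1), for illustration only."""
    init_file = os.path.join(tempfile.mkdtemp(), "dist_init")
    dist.init_process_group(
        backend="gloo",  # a real training run would usually use "nccl" here
        init_method=f"file://{init_file}",
        rank=0,
        world_size=1,
    )
    try:
        # Separate gloo group used only for moving CPU tensors around.
        cpu_group = dist.new_group(backend="gloo")
        local_feats = torch.arange(32, dtype=torch.float32).reshape(4, 8)
        return gather_on_rank0(local_feats, cpu_group)
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    feats = run_single_process_demo()
    print(feats.shape)
```

With more than one process, each rank would call `gather_on_rank0` with its own shard, and only rank 0 would go on to run the evaluation on the returned tensor.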