Parallelize each level of BFS in `GetMatchingPaths` #44310

firejq · 2020-10-25T16:41:00Z

This is a PR from the JIZHI Team and the TaiJi AI platform at Tencent.

I previously submitted PR #44269, which parallelizes `tf.gfile.Glob` in Python by using multiple threads across multiple path patterns. This PR optimizes `GetMatchingPaths` in C++ by introducing parallel BFS (PBFS): each level of the directory tree is processed in parallel, with an additional queue collecting the directories for the next level. The optimization brings a considerable performance improvement once the number of files to match is large enough.
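A level-synchronous parallel BFS of the kind described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in this PR: it walks a hypothetical in-memory `Tree` instead of a real filesystem, spawns one `std::thread` per directory where a real implementation would use a thread pool, and uses `std::mutex` where TensorFlow code would use its own mutex wrapper. `ParallelBfsListAll` is a hypothetical name.

```cpp
#include <algorithm>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical in-memory directory tree: maps a directory path to the full
// paths of its immediate children. A path with no entry is treated as a file.
using Tree = std::map<std::string, std::vector<std::string>>;

// Level-synchronous parallel BFS: all directories of the current level are
// expanded concurrently, and newly found subdirectories are appended to a
// shared next-level queue guarded by a mutex.
std::vector<std::string> ParallelBfsListAll(const Tree& tree,
                                            const std::string& root) {
  std::vector<std::string> results;
  std::vector<std::string> level = {root};
  std::mutex mu;  // Guards `results` and `next_level`.
  while (!level.empty()) {
    std::vector<std::string> next_level;
    std::vector<std::thread> workers;
    workers.reserve(level.size());
    for (const std::string& dir : level) {
      // One worker per directory in this level (a thread pool in practice).
      workers.emplace_back([&tree, &dir, &mu, &results, &next_level] {
        auto it = tree.find(dir);
        if (it == tree.end()) return;  // Not a directory.
        for (const std::string& child : it->second) {
          std::lock_guard<std::mutex> lock(mu);
          results.push_back(child);
          if (tree.count(child) > 0) next_level.push_back(child);
        }
      });
    }
    // Barrier: the next level starts only after this level is fully expanded.
    for (std::thread& t : workers) t.join();
    level.swap(next_level);
  }
  // Worker interleaving is nondeterministic, so sort for a stable result.
  std::sort(results.begin(), results.end());
  return results;
}
```

The per-level barrier is what keeps this a BFS: concurrency happens only among siblings at the same depth, and the shared next-level queue is the only structure that needs locking besides the result list.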

Since the change is different in content, I put it in a separate PR. @mihaimaruseac Could you please take a look at this? Thanks for your review!

firejq · 2020-10-26T12:04:58Z

@mihaimaruseac Sorry for the careless compilation problems in the previous commit. I have already fixed them. Could you please take a look and share any suggestions?

mihaimaruseac · 2020-10-28T17:56:15Z

First, the CI could crash because the PR was approved by @SuperSaiyan-God, who has no approval rights. @SuperSaiyan-God, please don't spam-approve PRs; it goes against the code of conduct.

Second, I see you are using <mutex>, but TF has its own mutex wrapper. Can you use TF's synchronization primitives instead of those from the C++ lib?

Finally, this will also need to be replicated on the filesystem plugins side.

firejq · 2020-10-28T20:10:06Z

@mihaimaruseac Thanks for the advice! As you suggested, I have switched from the C++ standard library mutex to TF's mutex wrapper in 299062c.

But sorry, I didn't quite understand your last point. What exactly needs to be replicated? Could you please explain in more detail? Thanks!

vnghia · 2020-10-29T15:39:03Z

> Finally, this will also need to be replicated on the filesystem plugins side.

@mihaimaruseac
I don't think this is necessary. For all three cloud filesystems and POSIX, `ops_->get_matching_paths == nullptr`, so they fall back to `internal::GetMatchingPaths`:

tensorflow/tensorflow/c/experimental/filesystem/modular_filesystem.cc, lines 191 to 195 in 79cdd95:

```cpp
Status ModularFileSystem::GetMatchingPaths(const std::string& pattern,
                                           TransactionToken* token,
                                           std::vector<std::string>* result) {
  if (ops_->get_matching_paths == nullptr)
    return internal::GetMatchingPaths(this, Env::Default(), pattern, result);
  // ... (remainder of the function not shown)
```

Besides, if a filesystem provides its own `get_matching_paths`, it will likely be quite different from `internal::GetMatchingPaths`, so the change would not apply there either.
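The fallback described here is a null-check dispatch: if a plugin leaves its matching hook unset, the shared generic implementation (and therefore this PR's parallel BFS) is used. A minimal sketch under that reading follows; `FilesystemOps`, `GenericGetMatchingPaths`, and `GetMatchingPathsWithFallback` are hypothetical stand-ins, not TensorFlow's actual plugin types or signatures.

```cpp
#include <string>
#include <vector>

using Paths = std::vector<std::string>;

// Hypothetical plugin function table: a plugin may leave `get_matching_paths`
// as nullptr to request the generic, shared implementation.
struct FilesystemOps {
  Paths (*get_matching_paths)(const std::string& pattern) = nullptr;
};

// Hypothetical stand-in for the shared generic matcher.
Paths GenericGetMatchingPaths(const std::string& pattern) {
  return {"generic:" + pattern};
}

// Prefer the plugin's own matcher; otherwise fall back to the generic one.
Paths GetMatchingPathsWithFallback(const FilesystemOps& ops,
                                   const std::string& pattern) {
  if (ops.get_matching_paths == nullptr) return GenericGetMatchingPaths(pattern);
  return ops.get_matching_paths(pattern);
}
```

Under this pattern, optimizing the generic function benefits every plugin that does not override the hook, while plugins with their own matcher are unaffected.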

firejq · 2020-11-02T02:29:09Z

@mihaimaruseac Could you please check this again?

mihaimaruseac · 2020-11-02T17:45:38Z

Do you have some benchmarks for the implementation?

firejq · 2020-11-03T08:08:57Z

> Do you have some benchmarks for the implementation?

@mihaimaruseac Thanks for moving this PR forward.

I have benchmarked two file systems separately: the local POSIX file system and a remote HDFS file system. For each file system, I used 4 different directories, the first of which is empty in order to measure the fixed overhead of the operation itself.

Each test was repeated 10 times and the average is reported. All benchmark results are as follows:

System environment: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, 80 threads

Local POSIX file system:

- d1: an empty directory
- d2: 7,601 directories, 76,354 files
- d3: 64,520 directories, 905,496 files
- d4: 113,322 directories, 1,317,540 files

| Directory | baseline (ms) | opt (ms) |
|---|---:|---:|
| d1 | 448 | 475 |
| d2 | 4,257 | 3,258 |
| d3 | 27,527 | 14,128 |
| d4 | 133,951 | 74,653 |

Remote HDFS file system:

- d1: an empty directory
- d2: 3 directories, 6,000 files
- d3: 10 directories, 35,896 files
- d4: 10 directories, 63,212 files

| Directory | baseline (ms) | opt (ms) |
|---|---:|---:|
| d1 | 2,166 | 2,234 |
| d2 | 955,592 | 104,478 |
| d3 | 5,236,644 | 648,057 |
| d4 | 8,102,354 | 1,002,565 |

P.S. "baseline" is the implementation on the current TensorFlow master, and "opt" is my parallelized implementation.
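The thread does not include the benchmark harness itself. As a minimal sketch of the stated methodology (run each case 10 times and average the wall-clock time), a hypothetical `MeanDurationMs` helper could look like this; in a real benchmark, `fn` would wrap the glob call for one directory pattern.

```cpp
#include <chrono>
#include <functional>

// Runs `fn` `repeats` times and returns the mean wall-clock duration in
// milliseconds, measured with a monotonic clock.
double MeanDurationMs(const std::function<void()>& fn, int repeats = 10) {
  using Clock = std::chrono::steady_clock;
  double total_ms = 0.0;
  for (int i = 0; i < repeats; ++i) {
    const auto start = Clock::now();
    fn();
    const auto end = Clock::now();
    total_ms += std::chrono::duration<double, std::milli>(end - start).count();
  }
  return total_ms / repeats;
}
```

`steady_clock` is used rather than `system_clock` so that wall-clock adjustments during a long run (the HDFS cases take minutes) cannot distort the measurement.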

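According to the benchmark results, the parallel optimization is clearly effective on this machine: for the non-empty directories it is roughly 8 to 9 times faster on remote HDFS and roughly 1.3 to 1.9 times faster on the local POSIX file system, while the fixed overhead measured on the empty directory is only slightly higher (about 3 to 6%).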
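The speedup for each case is the baseline duration divided by the opt duration. It can be recomputed from the benchmark tables with a small illustrative snippet; `Row`, `kResults`, and `Speedup` are hypothetical names, and the values are copied from the tables.

```cpp
#include <string>
#include <vector>

// One row of the benchmark tables: mean durations in milliseconds.
struct Row {
  std::string fs;
  std::string dir;
  double baseline_ms;
  double opt_ms;
};

// Values copied from the local POSIX and remote HDFS tables.
const std::vector<Row> kResults = {
    {"posix", "d1", 448, 475},       {"posix", "d2", 4257, 3258},
    {"posix", "d3", 27527, 14128},   {"posix", "d4", 133951, 74653},
    {"hdfs", "d1", 2166, 2234},      {"hdfs", "d2", 955592, 104478},
    {"hdfs", "d3", 5236644, 648057}, {"hdfs", "d4", 8102354, 1002565},
};

// Speedup = baseline / opt; values above 1.0 mean the parallel version is faster.
double Speedup(const Row& r) { return r.baseline_ms / r.opt_ms; }
```

The larger gain on HDFS is consistent with each directory listing there being a high-latency remote call, so overlapping many listings per BFS level hides more latency than on a local disk.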

firejq · 2020-11-03T11:50:01Z

Hi, @mihaimaruseac @gbaned

There seem to be some compilation errors unrelated to the code I modified. How should I resolve them so that the CI passes?

vnghia · 2020-11-03T11:56:14Z

> There seem to be some compilation errors

The errors are not your fault (they are caused by other parts of the baseline), so I think you do not need to do anything.

firejq · 2020-11-03T12:16:13Z

> The errors are not your fault (they are caused by other parts of the baseline), so I think you do not need to do anything.

@vnvo2409 Okay, thanks for your reply! :)

mihaimaruseac · 2020-11-03T19:46:52Z

This is awesome. Thank you

Parallelize each level of BFS in GetMatchingPaths

0f22b28

google-ml-butler bot added the size:M CL Change Size: Medium label Oct 25, 2020

google-cla bot added the cla: yes label Oct 25, 2020

gbaned self-assigned this Oct 26, 2020

gbaned added the comp:core issues related to core part of tensorflow label Oct 26, 2020

gbaned added this to Assigned Reviewer in PR Queue via automation Oct 26, 2020

gbaned requested a review from mihaimaruseac October 26, 2020 11:20

Fix some compiling problems

cf665bb

papaaannn approved these changes Oct 28, 2020


google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Oct 28, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Oct 28, 2020

gbaned removed the ready to pull PR ready for merge process label Oct 28, 2020

firejq mentioned this pull request Oct 28, 2020

Avoid unnecessary for cycle overhead for tf.gfile.Glob #44268

Closed

Use tf mutex wrapper instead of C++ lib

299062c

mihaimaruseac approved these changes Nov 2, 2020


google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Nov 2, 2020

PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Nov 2, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Nov 2, 2020

copybara-service bot merged commit ce153b2 into tensorflow:master Nov 3, 2020

PR Queue automation moved this from Approved by Reviewer to Merged Nov 3, 2020

mihaimaruseac mentioned this pull request Nov 3, 2020

Add parallel matching for tf.gfile.Glob #44269

Closed

firejq deleted the patch-4 branch November 10, 2020 06:39
