Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize each level of BFS in GetMatchingPaths #44310

Merged
merged 3 commits into from Nov 3, 2020

Conversation

firejq
Copy link
Contributor

@firejq firejq commented Oct 25, 2020

This is a PR from JIZHI Team & TaiJi AI platform in Tencent.

I have tried to submit a PR #44269 which parallelize the tf.gfile.Glob in Python, by multi-threaeds against multiple path patterns. This PR is going to optimize the GetMatchingPaths in C++, which brings PBFS into the progress, parallelizes every level of tree by a additional queue. The optimizion can bring considerable performance improvement when the number of files to match reaches a certain level.

Since the content of the change is different, I put this in another PR. @mihaimaruseac Could you please task a look at this? Thanks for your review!

@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Oct 25, 2020
@google-cla google-cla bot added the cla: yes label Oct 25, 2020
@gbaned gbaned self-assigned this Oct 26, 2020
@gbaned gbaned added the comp:core issues related to core part of tensorflow label Oct 26, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Oct 26, 2020
@firejq
Copy link
Contributor Author

firejq commented Oct 26, 2020

@mihaimaruseac Sorry for some careless compiling problems in the previous commit.. I have fixed them already. Could you please take a look and give any suggestions?

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Oct 28, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Oct 28, 2020
@gbaned gbaned removed the ready to pull PR ready for merge process label Oct 28, 2020
@mihaimaruseac
Copy link
Collaborator

First thing, the CI could crash because the Pr was approved by @SuperSaiyan-God who has no approval rights. @SuperSaiyan-God , please don't spam approve PRs, it goes against the code of conduct.

Second, I see you are using <mutex>, but TF has its own mutex wrapper. Can you use TF's synchronization primitives instead of those from the C++ lib?

Finally, this will also need to be replicated on the filesystem plugins side.

@firejq
Copy link
Contributor Author

firejq commented Oct 28, 2020

@mihaimaruseac Thanks for your advising! As you mentioned, I have turned mutex lib from C++ lib to TF's mutex wrapper in 299062c.

But I’m sorry I didn’t figure out the specific meaning of your last point... What do you mean that needs to be replicated, could you please explain it in more detail? Thanks!

@vnghia
Copy link
Contributor

vnghia commented Oct 29, 2020

Finally, this will also need to be replicated on the filesystem plugins side.

@mihaimaruseac
I think this is not the case. Because all 3 cloud filesystems and posix, we have ops_->get_matching_paths == nullptr and it will fall back on this function internal::GetMatchingPaths

Status ModularFileSystem::GetMatchingPaths(const std::string& pattern,
TransactionToken* token,
std::vector<std::string>* result) {
if (ops_->get_matching_paths == nullptr)
return internal::GetMatchingPaths(this, Env::Default(), pattern, result);

Beside, if there is a filesystem that has its own get_matching_paths, I think it will be quite different from internal::GetMatchingPaths so it is not the case either.

@firejq
Copy link
Contributor Author

firejq commented Nov 2, 2020

@mihaimaruseac Could you please check this again?

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Nov 2, 2020
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Nov 2, 2020
@mihaimaruseac
Copy link
Collaborator

Do you have some benchmarks for the implementation?

@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Nov 2, 2020
@firejq
Copy link
Contributor Author

firejq commented Nov 3, 2020

Do you have some benchmarks for the implementation?

@mihaimaruseac Thanks for leading the progress of this pr.

I haved made benchmarks against two file systems separately, those are local posix file system and remote hdfs file system. For each file system, I used 4 different directories for benchmarking, the first of which is an empty directory to measure the fixed overhead of the operation itself.

Each test is repeated 10 times and the average is taken as the result. The all benchmark results are as follows:

System environment: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 80 threads

Local POSIX file system:

  • d1: an empty directory
  • d2: 7601 directories, 76354 files
  • d3: 64520 directories, 905496 files
  • d4: 113322 directories, 1317540 files
version patterns Duration (ms)
baseline d1 448
baseline d2 4,257
baseline d3 27,527
baseline d4 133,951
opt d1 475
opt d2 3,258
opt d3 14,128
opt d4 74,653

Remote HDFS file system:

  • d1: an empty directory
  • d2: 3 directories, 6000 files
  • d3: 10 directories, 35896 files
  • d4: 10 directories, 63212 files
version patterns Duration (ms)
baseline d1 2,166
baseline d2 955,592
baseline d3 5,236,644
baseline d4 8,102,354
opt d1 2,234
opt d2 104,478
opt d3 648,057
opt d4 1,002,565

P.S. The baseline is the implement of current tensorflow master and the opt is the implement of my parallelized version.

According to the results of the benchmarks, the effect of parallel optimization is quite obvious which approximately
speeded up 8 times in the current machine environment.

@firejq
Copy link
Contributor Author

firejq commented Nov 3, 2020

Hi, @mihaimaruseac @gbaned

There seems to be some compilation errors that have nothing to do with the code I modified, and I would like to ask for advice how I should solve them to make the ci process run through?

@vnghia
Copy link
Contributor

vnghia commented Nov 3, 2020

There seems to be some compilation errors

The errors are not your fault ( They are caused by the some other parts of the baseline ) so I think you do not need to do anything.

@firejq
Copy link
Contributor Author

firejq commented Nov 3, 2020

The errors are not your fault ( They are caused by the some other parts of the baseline ) so I think you do not need to do anything.

@vnvo2409 Okay, Thanks for your reply! : )

@copybara-service copybara-service bot merged commit ce153b2 into tensorflow:master Nov 3, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Nov 3, 2020
@mihaimaruseac
Copy link
Collaborator

This is awesome. Thank you

@firejq firejq deleted the patch-4 branch November 10, 2020 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes comp:core issues related to core part of tensorflow ready to pull PR ready for merge process size:M CL Change Size: Medium
Projects
PR Queue
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

6 participants