Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leverage parallel capabilities #142

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

thecotne
Copy link

@thecotne thecotne commented Feb 4, 2022

not sure if this does what it should be doing but tests are passing

fixes #112

@thecotne thecotne changed the title Leverage parallel capabilities fixes #112 Leverage parallel capabilities Feb 4, 2022
@sylvestre
Copy link
Sponsor Contributor

Sweet
Could you please add a BENCHMARKING.md file like https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md with some examples?

Did you see some performance wins?

@codecov
Copy link

codecov bot commented Feb 4, 2022

Codecov Report

Merging #142 (836d06c) into main (58070d6) will increase coverage by 0.13%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
+ Coverage   52.98%   53.12%   +0.13%     
==========================================
  Files          24       24              
  Lines        4756     4757       +1     
  Branches     1548     1553       +5     
==========================================
+ Hits         2520     2527       +7     
+ Misses       1687     1675      -12     
- Partials      549      555       +6     
Impacted Files Coverage Δ
src/find/matchers/mod.rs 62.54% <ø> (ø)
src/find/mod.rs 63.33% <100.00%> (+1.53%) ⬆️
src/xargs/mod.rs 77.46% <0.00%> (-2.08%) ⬇️
src/lib.rs 36.36% <0.00%> (+0.49%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58070d6...836d06c. Read the comment docs.

@sylvestre
Copy link
Sponsor Contributor

Seems that it broke some BFS tests:

2022-02-04T08:16:27.7485162Z /home/runner/work/findutils/findutils/findutils/target/release/find basic \( -name foo -type d -o -name bar -a -type f \) -print \, \! -empty -type f -print 
2022-02-04T08:16:27.7548222Z test_print0
2022-02-04T08:16:27.7552535Z /home/runner/work/findutils/findutils/findutils/target/release/find basic/a basic/b -print0 
2022-02-04T08:16:27.7567667Z thread '<unnamed>' panicked at 'already borrowed: BorrowMutError', src/find/matchers/printer.rs:42:52
2022-02-04T08:16:27.7568120Z note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2022-02-04T08:16:27.7585129Z test_print0 failed!

@thecotne
Copy link
Author

thecotne commented Feb 4, 2022

i ran some benchmarks and this branch is slightly slower then main and both are faster then gnu find

$ hyperfine --warmup 3 'find ~ -name yarn.lock' './find ~ -name yarn.lock' './find-par ~ -name yarn.lock'
Benchmark 1: find ~ -name yarn.lock
  Time (mean ± σ):     798.2 ms ±   9.0 ms    [User: 332.7 ms, System: 465.3 ms]
  Range (min … max):   783.5 ms … 809.0 ms    10 runs

Benchmark 2: ./find ~ -name yarn.lock
  Time (mean ± σ):     608.6 ms ±  11.1 ms    [User: 236.1 ms, System: 372.5 ms]
  Range (min … max):   590.4 ms … 621.0 ms    10 runs

Benchmark 3: ./find-par ~ -name yarn.lock
  Time (mean ± σ):     618.9 ms ±   7.5 ms    [User: 224.5 ms, System: 395.0 ms]
  Range (min … max):   606.6 ms … 629.4 ms    10 runs

Summary
  './find ~ -name yarn.lock' ran
    1.02 ± 0.02 times faster than './find-par ~ -name yarn.lock'
    1.31 ± 0.03 times faster than 'find ~ -name yarn.lock'

$ hyperfine --warmup 3 'find ~ -name *.json' './find ~ -name *.json' './find-par ~ -name *.json'
Benchmark 1: find ~ -name *.json
  Time (mean ± σ):     836.2 ms ±   6.4 ms    [User: 374.1 ms, System: 461.8 ms]
  Range (min … max):   826.4 ms … 847.3 ms    10 runs

Benchmark 2: ./find ~ -name *.json
  Time (mean ± σ):     723.5 ms ±   9.4 ms    [User: 313.4 ms, System: 410.1 ms]
  Range (min … max):   710.3 ms … 735.8 ms    10 runs

Benchmark 3: ./find-par ~ -name *.json
  Time (mean ± σ):     741.5 ms ±   5.4 ms    [User: 339.3 ms, System: 402.8 ms]
  Range (min … max):   735.1 ms … 752.0 ms    10 runs

Summary
  './find ~ -name *.json' ran
    1.02 ± 0.02 times faster than './find-par ~ -name *.json'
    1.16 ± 0.02 times faster than 'find ~ -name *.json'
  • ./find is build from main branch
  • ./find-par is build from this branch
  • and find is gnu find

@sylvestre
Copy link
Sponsor Contributor

slower, even on directories with plenty of files?

@thecotne
Copy link
Author

thecotne commented Feb 4, 2022

730550 files to be exact

@mcharsley
Copy link
Contributor

mcharsley commented Feb 4, 2022 via email

@thecotne
Copy link
Author

thecotne commented Feb 5, 2022

i am testing on SSD

@sylvestre
Copy link
Sponsor Contributor

ok, so, is there an advantage merging this ? (besides the fun of it ? :)

@thecotne
Copy link
Author

thecotne commented Feb 5, 2022

no. in current state this branch is slightly more complex and slightly less efficient version so there is no point in merging it.

if you can give any advice on possible problems/solutions and/or what to investigate etc. to actually improve performance using parallelism i can try to do it. if not we can close this PR i guess

@tavianator
Copy link
Contributor

I believe there's no performance benefit because with walkdir, all the filesystem traversal happens on a single thread. par_iter() then splits up the processing work between multiple threads, but the bottleneck is the traversal which remains single-threaded.

You could try the ignore crate which natively supports parallel traversal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Leverage parallel capabilities
4 participants