uu-tail performance drop when reading from stdin #3842

Closed
Joining7943 opened this issue Aug 18, 2022 · 7 comments · Fixed by #3845 or #3874

@Joining7943
Contributor

Joining7943 commented Aug 18, 2022

I'm new to Rust and I like this project, so I sometimes use parts of it as a reference for my Rust learning projects. During tests I noticed that the uutils version of tail is blazingly fast when reading large files from disk, up to 15x faster than the GNU version. My test file, tests/inputs/bigger.txt, is ~500MB of random text with 10_000_000 lines, none of them very long. When it comes to reading from stdin, however, the performance drops significantly. Please note that I ran these benchmarks only to get a first impression of the relative performance differences between the tested programs. Here's a quick overview of the test file tests/inputs/bigger.txt, tail and uu-tail:

❯ wc --lines --words --bytes --chars --max-line-length tests/inputs/bigger.txt
 10000000 105001050 577418760 577418760       101 tests/inputs/bigger.txt
❯ tail --version
tail (GNU coreutils) 9.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Ian Lance Taylor,
and Jim Meyering.
❯ ~/workspace/external/uutils/coreutils/target/release/tail --version
/home/lenny/workspace/external/uutils/coreutils/target/release/tail 0.0.14
Benchmark of {tail,uu-tail} -n +{10,1000,100000,10000000} tests/inputs/bigger.txt in which uu-tail is faster than gnu's tail. However there is a performance drop of uu-tail running with `-n +10_000_000` at the end of the benchmark test run.
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n +{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n +10 tests/inputs/bigger.txt
  Time (mean ± σ):     313.9 ms ±   3.4 ms    [User: 61.7 ms, System: 251.6 ms]
  Range (min … max):   310.9 ms … 322.5 ms    10 runs

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt
  Time (mean ± σ):      20.8 ms ±   0.7 ms    [User: 2.9 ms, System: 19.7 ms]
  Range (min … max):    20.2 ms …  24.5 ms    111 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (23.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 3: tail -n +1000 tests/inputs/bigger.txt
  Time (mean ± σ):     317.6 ms ±   4.7 ms    [User: 62.6 ms, System: 254.4 ms]
  Range (min … max):   312.0 ms … 329.3 ms    10 runs

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt
  Time (mean ± σ):      21.0 ms ±   1.8 ms    [User: 2.5 ms, System: 20.1 ms]
  Range (min … max):    20.1 ms …  34.1 ms    115 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n +100000 tests/inputs/bigger.txt
  Time (mean ± σ):     318.0 ms ±   4.9 ms    [User: 69.6 ms, System: 248.0 ms]
  Range (min … max):   313.2 ms … 330.3 ms    10 runs

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt
  Time (mean ± σ):      25.6 ms ±   0.5 ms    [User: 5.2 ms, System: 21.4 ms]
  Range (min … max):    24.9 ms …  27.8 ms    97 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n +10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     214.3 ms ±   3.5 ms    [User: 125.7 ms, System: 88.3 ms]
  Range (min … max):   212.2 ms … 222.8 ms    13 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     371.8 ms ±   5.4 ms    [User: 284.7 ms, System: 86.6 ms]
  Range (min … max):   366.4 ms … 379.4 ms    10 runs

Summary
  '~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt' ran
    1.01 ± 0.09 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt'
    1.23 ± 0.05 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt'
   10.28 ± 0.37 times faster than 'tail -n +10000000 tests/inputs/bigger.txt'
   15.06 ± 0.51 times faster than 'tail -n +10 tests/inputs/bigger.txt'
   15.24 ± 0.54 times faster than 'tail -n +1000 tests/inputs/bigger.txt'
   15.26 ± 0.54 times faster than 'tail -n +100000 tests/inputs/bigger.txt'
   17.84 ± 0.63 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt'

Now with -n -{values} instead of -n +{values}, the GNU version and the uu-tail version are pretty close.

Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} tests/inputs/bigger.txt
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 tests/inputs/bigger.txt
  Time (mean ± σ):       0.4 ms ±   0.3 ms    [User: 0.7 ms, System: 0.5 ms]
  Range (min … max):     0.0 ms …   3.1 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt
  Time (mean ± σ):       0.5 ms ±   0.5 ms    [User: 0.7 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   4.5 ms    714 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Benchmark 3: tail -n -1000 tests/inputs/bigger.txt
  Time (mean ± σ):       0.5 ms ±   0.3 ms    [User: 0.7 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   3.0 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt
  Time (mean ± σ):       0.7 ms ±   0.4 ms    [User: 0.8 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   4.9 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n -100000 tests/inputs/bigger.txt
  Time (mean ± σ):       5.5 ms ±   0.7 ms    [User: 3.5 ms, System: 3.5 ms]
  Range (min … max):     4.9 ms …   9.7 ms    295 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt
  Time (mean ± σ):       7.0 ms ±   0.6 ms    [User: 5.9 ms, System: 2.0 ms]
  Range (min … max):     5.7 ms …  10.6 ms    275 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n -10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     546.3 ms ±   6.3 ms    [User: 207.4 ms, System: 338.1 ms]
  Range (min … max):   537.4 ms … 554.5 ms    10 runs

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     499.1 ms ±   4.2 ms    [User: 416.0 ms, System: 82.4 ms]
  Range (min … max):   495.6 ms … 506.0 ms    10 runs

Summary
  'tail -n -10 tests/inputs/bigger.txt' ran
    1.08 ± 1.03 times faster than 'tail -n -1000 tests/inputs/bigger.txt'
    1.16 ± 1.33 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt'
    1.61 ± 1.45 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt'
   13.06 ± 8.12 times faster than 'tail -n -100000 tests/inputs/bigger.txt'
   16.74 ± 10.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt'
 1185.55 ± 722.84 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt'
 1297.59 ± 791.22 times faster than 'tail -n -10000000 tests/inputs/bigger.txt'

But when it comes to reading from stdin, uu-tail is significantly slower than the GNU version:

Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} - < tests/inputs/bigger.txt
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} - < tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 - < tests/inputs/bigger.txt
  Time (mean ± σ):       0.1 ms ±   0.2 ms    [User: 0.5 ms, System: 0.2 ms]
  Range (min … max):     0.0 ms …   1.7 ms    647 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: The first benchmarking run for this command was significantly slower than the rest (0.8 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt
  Time (mean ± σ):     611.1 ms ±   8.4 ms    [User: 500.8 ms, System: 109.8 ms]
  Range (min … max):   596.3 ms … 621.8 ms    10 runs

Benchmark 3: tail -n -1000 - < tests/inputs/bigger.txt
  Time (mean ± σ):       0.1 ms ±   0.1 ms    [User: 0.5 ms, System: 0.2 ms]
  Range (min … max):     0.0 ms …   0.8 ms    779 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: The first benchmarking run for this command was significantly slower than the rest (0.0 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt
  Time (mean ± σ):     671.2 ms ±   9.1 ms    [User: 559.4 ms, System: 111.3 ms]
  Range (min … max):   662.4 ms … 692.5 ms    10 runs

Benchmark 5: tail -n -100000 - < tests/inputs/bigger.txt
  Time (mean ± σ):       5.6 ms ±   0.3 ms    [User: 3.7 ms, System: 2.3 ms]
  Range (min … max):     5.1 ms …   6.6 ms    287 runs

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt
  Time (mean ± σ):      1.196 s ±  0.013 s    [User: 1.011 s, System: 0.184 s]
  Range (min … max):    1.184 s …  1.226 s    10 runs

Benchmark 7: tail -n -10000000 - < tests/inputs/bigger.txt
  Time (mean ± σ):     552.2 ms ±   6.9 ms    [User: 223.1 ms, System: 328.0 ms]
  Range (min … max):   544.2 ms … 566.8 ms    10 runs

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt
  Time (mean ± σ):      8.179 s ±  0.160 s    [User: 3.442 s, System: 4.726 s]
  Range (min … max):    8.025 s …  8.533 s    10 runs

Summary
  'tail -n -1000 - < tests/inputs/bigger.txt' ran
    1.97 ± 4.73 times faster than 'tail -n -10 - < tests/inputs/bigger.txt'
   83.11 ± 140.53 times faster than 'tail -n -100000 - < tests/inputs/bigger.txt'
 8259.42 ± 13958.62 times faster than 'tail -n -10000000 - < tests/inputs/bigger.txt'
 9140.98 ± 15448.56 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt'
10040.23 ± 16968.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt'
17885.66 ± 30226.98 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt'
122339.26 ± 206764.49 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt'

I hope these benchmarks help in figuring out the problem, and that I haven't done anything wrong on my side.

@tertsdiepraam
Member

tertsdiepraam commented Aug 18, 2022

Thanks! These are excellent benchmarks. After looking into it, I'm not surprised it's slow and I'm curious how GNU does it so quickly.

First, a (big) difference between files and stdin is to be expected: for a file, we can just seek to the end of the file and find the data to display from there. For stdin, we cannot seek, so we have to read through all of the data while remembering the last count lines we have seen. When we get to the end, we display those last count lines. We currently do this using a ring buffer in this line:

Ok(RingBuffer::from_iter(iter.map(|r| r.unwrap()), count).data)

With the ringbuffer defined here: https://github.com/uutils/coreutils/blob/main/src/uucore/src/lib/features/ringbuffer.rs
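
To make the line-by-line approach concrete, here is a minimal sketch of the idea using only the standard library. It is not the uucore `RingBuffer` linked above; the function name `last_lines` is made up for illustration, and a plain `VecDeque` stands in for the ring buffer.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead};

/// Hypothetical helper: return the last `count` lines of `reader`.
/// At most `count` lines are held in memory at any time.
fn last_lines<R: BufRead>(reader: R, count: usize) -> io::Result<Vec<String>> {
    if count == 0 {
        return Ok(Vec::new());
    }
    let mut ring: VecDeque<String> = VecDeque::new();
    for line in reader.lines() {
        // Every line is decoded and allocated as its own String,
        // even though almost all of them are thrown away again.
        let line = line?;
        if ring.len() == count {
            ring.pop_front();
        }
        ring.push_back(line);
    }
    Ok(ring.into_iter().collect())
}
```

The per-line allocation and UTF-8 validation in the loop is the kind of overhead that adds up on an input with ten million lines.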

This is of course not particularly fast. One thing we might try is to scan in blocks bigger than single lines until we have a buffer that holds at least count lines, and then print the last count lines from that buffer. So, we keep track of buffers: VecDeque<(String, usize)>, where each usize is the number of lines in its block and the total of the usizes must be larger than count; we can remove the block at the start of buffers whenever count < total - buffers[0].1.
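
The block-based idea could be sketched roughly as follows. This is only an illustration of the suggestion, not the code that was eventually merged: the name `tail_lines_chunked` and the 64 KiB `BLOCK_SIZE` are assumptions, and the blocks are kept as raw bytes (`Vec<u8>`) rather than `String` so that no UTF-8 validation is needed.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Assumed block size for this sketch; not taken from the uutils source.
const BLOCK_SIZE: usize = 64 * 1024;

/// Hypothetical helper: return the bytes of the last `count` lines of `reader`.
fn tail_lines_chunked<R: Read>(mut reader: R, count: usize) -> io::Result<Vec<u8>> {
    if count == 0 {
        return Ok(Vec::new());
    }
    // Each entry is (block bytes, number of newlines in that block).
    let mut blocks: VecDeque<(Vec<u8>, usize)> = VecDeque::new();
    let mut total = 0usize;
    loop {
        let mut block = Vec::with_capacity(BLOCK_SIZE);
        let n = reader
            .by_ref()
            .take(BLOCK_SIZE as u64)
            .read_to_end(&mut block)?;
        if n == 0 {
            break;
        }
        let newlines = block.iter().filter(|&&b| b == b'\n').count();
        total += newlines;
        blocks.push_back((block, newlines));
        // Drop leading blocks while the rest still holds more than `count` newlines.
        while let Some(front) = blocks.front().map(|b| b.1) {
            if count < total - front {
                total -= front;
                blocks.pop_front();
            } else {
                break;
            }
        }
    }
    // Concatenating is simple but copies the data; a real implementation
    // would likely print directly from the retained blocks.
    let data: Vec<u8> = blocks.into_iter().flat_map(|(b, _)| b).collect();
    // With a trailing newline, the wanted lines start after the
    // (count + 1)-th newline from the end; otherwise after the count-th.
    let needed = if data.last() == Some(&b'\n') {
        count.saturating_add(1)
    } else {
        count
    };
    let start = data
        .iter()
        .enumerate()
        .rev()
        .filter(|&(_, b)| *b == b'\n')
        .nth(needed - 1)
        .map_or(0, |(i, _)| i + 1);
    Ok(data[start..].to_vec())
}
```

Only newlines are counted while reading, so the per-line work shrinks to a byte scan, and lines are split out only from the few blocks that survive to the end.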

@Joining7943
Contributor Author

@tertsdiepraam glad I can help out :)

jhscheer added a commit to jhscheer/coreutils that referenced this issue Aug 18, 2022
This fixes a bug where calling `tail - < file.txt` would result
in invoking `unbounded_tail()`.
However, it is a stdin redirect to a seekable regular file and
therefore `bounded_tail` should be invoked as if `tail file.txt` had
been called.
@jhscheer
Contributor

Hi, thanks for doing the benchmarks and reporting this issue.

What @tertsdiepraam wrote is of course true, however in this case you uncovered a bug in our tail implementation. The problem is not how we handle tailing stdin but with the stdin redirect from the shell.
See the fix for more details.
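
As a rough illustration of the distinction (not the code from the actual fix), a Unix-only check could look like the sketch below. The `Strategy` enum and the helper names are invented for this example; the only thing taken from the thread is that a redirect from a regular file should go down the seek-based path and a real pipe should not.

```rust
use std::fs::File;
use std::io;
#[cfg(unix)]
use std::os::unix::io::AsFd;

/// Hypothetical stand-in for choosing between the seek-based and the
/// read-everything code paths.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Input is a regular file: seek to the end and work backwards.
    Bounded,
    /// Input is a pipe, terminal or other non-regular file: read it all.
    Unbounded,
}

/// Decide how to tail `file` based on its metadata.
fn choose_strategy(file: &File) -> io::Result<Strategy> {
    if file.metadata()?.is_file() {
        Ok(Strategy::Bounded)
    } else {
        Ok(Strategy::Unbounded)
    }
}

/// Duplicate the stdin descriptor into a `File` so its metadata can be
/// inspected. With `tail - < file.txt` this refers to file.txt itself;
/// with `cat file.txt | tail` it refers to a pipe.
#[cfg(unix)]
fn stdin_as_file() -> io::Result<File> {
    let fd = io::stdin().as_fd().try_clone_to_owned()?;
    Ok(File::from(fd))
}
```

Calling `choose_strategy(&stdin_as_file()?)` at startup would then pick the strategy for whatever the shell attached to stdin.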

With the fix, the benchmarks look more reasonable:

Summary
  'tail -n -10000 - < 505MB.txt' ran
    1.36 ± 0.25 times faster than 'target/release/tail -n -10000 - < 505MB.txt'
    5.16 ± 0.93 times faster than 'tail -n -100000 - < 505MB.txt'
    5.98 ± 1.19 times faster than 'target/release/tail -n -100000 - < 505MB.txt'
   32.12 ± 5.87 times faster than 'tail -n -1000000 - < 505MB.txt'
   37.17 ± 7.74 times faster than 'target/release/tail -n -1000000 - < 505MB.txt'

@tertsdiepraam
Member

Excellent!

If I understand correctly, this means that we're still slow when actually reading from stdin, right? For example, in a case like this:

cat 505MB.txt | tail -n -10000

As a side note, do you know why we're still slower in your benchmark even though we're faster in @Joining7943's first benchmark? It seems to me that there theoretically shouldn't be a difference between those two, since it's both just reading from a file.

@jhscheer
Contributor

If I understand correctly, this means that we're still slow when actually reading from stdin, right? For example, in a case like this:

cat 505MB.txt | tail -n -10000

Yes:

Summary
  'cat 505MB.txt | tail -n -10000' ran
    1.25 ± 0.07 times faster than 'cat 505MB.txt | tail -n -100000'
    1.58 ± 0.07 times faster than 'cat 505MB.txt | tail -n -1000000'
    1.80 ± 0.08 times faster than 'cat 505MB.txt | target/release/tail -n -10000'
    2.11 ± 0.09 times faster than 'cat 505MB.txt | target/release/tail -n -100000'
    4.52 ± 0.17 times faster than 'cat 505MB.txt | target/release/tail -n -1000000'

As a side note, do you know why we're still slower in your benchmark even though we're faster in @Joining7943's first benchmark? It seems to me that there theoretically shouldn't be a difference between those two, since it's both just reading from a file.

I don't know, but I agree, there shouldn't be a difference.
For reference, this is the benchmark on my system:

Summary
  'tail -n -10000 505MB.txt' ran
    1.30 ± 0.43 times faster than 'target/release/tail -n -10000 505MB.txt'
    5.66 ± 0.71 times faster than 'tail -n -100000 505MB.txt'
    6.80 ± 1.07 times faster than 'target/release/tail -n -100000 505MB.txt'
   40.35 ± 3.96 times faster than 'target/release/tail -n -1000000 505MB.txt'
   41.82 ± 5.63 times faster than 'tail -n -1000000 505MB.txt'

The testfile was created with:

tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w 100 | head -n 5000000 > 505MB.txt

@Joining7943
Contributor Author

I don't know, but I agree, there shouldn't be a difference. For reference, this is the benchmark on my system:

@jhscheer I think @tertsdiepraam meant my first benchmark

Benchmark of {tail,uu-tail} -n +{10,1000,100000,10000000} tests/inputs/bigger.txt in which uu-tail is faster than gnu's tail. However there is a performance drop of uu-tail running with -n +10_000_000 at the end of the benchmark test run.

with -n +{values} tests/inputs/bigger.txt (not with -n -{values}) when not reading from stdin.

To be sure, I reran the above benchmark with the file 505MB.txt producing approximately the same relative results.

benchmark with tail,uutail -n +{values} tests/inputs/505MB.txt
❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/uutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n +{values} tests/inputs/505MB.txt'
Benchmark 1: tail -n +10 tests/inputs/505MB.txt
  Time (mean ± σ):     277.7 ms ±   2.0 ms    [User: 59.1 ms, System: 218.1 ms]
  Range (min … max):   275.2 ms … 281.0 ms    10 runs

Benchmark 2: ~/workspace/external/uutils/uutils/target/release/tail -n +10 tests/inputs/505MB.txt
  Time (mean ± σ):      18.1 ms ±   0.5 ms    [User: 3.7 ms, System: 16.0 ms]
  Range (min … max):    17.8 ms …  22.2 ms    115 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (22.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 3: tail -n +1000 tests/inputs/505MB.txt
  Time (mean ± σ):     278.2 ms ±   1.3 ms    [User: 54.7 ms, System: 223.0 ms]
  Range (min … max):   276.2 ms … 280.3 ms    10 runs

Benchmark 4: ~/workspace/external/uutils/uutils/target/release/tail -n +1000 tests/inputs/505MB.txt
  Time (mean ± σ):      19.4 ms ±   1.9 ms    [User: 2.6 ms, System: 18.3 ms]
  Range (min … max):    17.9 ms …  34.8 ms    130 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n +100000 tests/inputs/505MB.txt
  Time (mean ± σ):     272.4 ms ±   1.5 ms    [User: 76.5 ms, System: 195.5 ms]
  Range (min … max):   270.5 ms … 275.3 ms    10 runs

Benchmark 6: ~/workspace/external/uutils/uutils/target/release/tail -n +100000 tests/inputs/505MB.txt
  Time (mean ± σ):      22.9 ms ±   0.3 ms    [User: 4.5 ms, System: 19.6 ms]
  Range (min … max):    22.1 ms …  24.7 ms    112 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n +10000000 tests/inputs/505MB.txt
  Time (mean ± σ):     124.5 ms ±   3.5 ms    [User: 51.7 ms, System: 72.6 ms]
  Range (min … max):   122.4 ms … 132.5 ms    22 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (131.5 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 8: ~/workspace/external/uutils/uutils/target/release/tail -n +10000000 tests/inputs/505MB.txt
  Time (mean ± σ):     188.6 ms ±   3.8 ms    [User: 113.5 ms, System: 74.7 ms]
  Range (min … max):   186.1 ms … 197.2 ms    15 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (195.9 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Summary
  '~/workspace/external/uutils/uutils/target/release/tail -n +10 tests/inputs/505MB.txt' ran
    1.07 ± 0.11 times faster than '~/workspace/external/uutils/uutils/target/release/tail -n +1000 tests/inputs/505MB.txt'
    1.27 ± 0.04 times faster than '~/workspace/external/uutils/uutils/target/release/tail -n +100000 tests/inputs/505MB.txt'
    6.88 ± 0.26 times faster than 'tail -n +10000000 tests/inputs/505MB.txt'
   10.41 ± 0.34 times faster than '~/workspace/external/uutils/uutils/target/release/tail -n +10000000 tests/inputs/505MB.txt'
   15.04 ± 0.40 times faster than 'tail -n +100000 tests/inputs/505MB.txt'
   15.34 ± 0.41 times faster than 'tail -n +10 tests/inputs/505MB.txt'
   15.36 ± 0.40 times faster than 'tail -n +1000 tests/inputs/505MB.txt'

Additionally, I also checked that both tail and uutail produce the same output without errors

for value in 10 1000 100000 10000000; do diff --suppress-common-lines -y <(tail -n +${value} tests/inputs/505MB.txt) <(~/workspace/external/uutils/uutils/target/release/tail -n +${value} tests/inputs/505MB.txt) || return 1; done
echo $?
0

to be sure no one is cheating.

If I understand correctly, this means that we're still slow when actually reading from stdin, right? For example, in a case like this:

cat 505MB.txt | tail -n -10000

Would you mind giving me a shot at writing a better-performing solution for piped stdin in the case where the check `if beginning {` is false, replacing the ringbuffer as you suggested?

@tertsdiepraam
Member

@jhscheer I think @tertsdiepraam meant my first benchmark

I think I made a mistake there. The second benchmark made more sense. Anyway it's still an interesting difference.

Would you mind giving me a shot

Please do!

Joining7943 pushed a commit to Joining7943/uutil-coreutils that referenced this issue Aug 23, 2022
Rewrite handling of stdin when it is piped and read input in chunks.

Fixes uutils#3842
Joining7943 added a commit to Joining7943/uutil-coreutils that referenced this issue Aug 23, 2022
Rewrite handling of stdin when it is piped and read input in chunks.

Fixes uutils#3842
Joining7943 added a commit to Joining7943/uutil-coreutils that referenced this issue Aug 24, 2022
Rewrite handling of stdin when it is piped and read input in chunks.

Fixes uutils#3842
sylvestre pushed a commit to jhscheer/coreutils that referenced this issue Sep 5, 2022
This fixes a bug where calling `tail - < file.txt` would result
in invoking `unbounded_tail()`.
However, it is a stdin redirect to a seekable regular file and
therefore `bounded_tail` should be invoked as if `tail file.txt` had
been called.
tertsdiepraam added a commit that referenced this issue Sep 7, 2022
sylvestre pushed a commit that referenced this issue Sep 9, 2022
Rewrite handling of stdin when it is piped and read input in chunks.

Fixes #3842