uu-tail performance drop when reading from stdin #3842

Joining7943 · 2022-08-18T11:33:37Z

I'm new to Rust and I like this project, so I sometimes use parts of it as a reference for my Rust learning projects. During tests I noticed that the uutils version of tail is blazingly fast when reading large files from disk, up to 15x faster than the GNU version, but when it comes to reading from stdin the performance drops significantly. My test file tests/inputs/bigger.txt is ~500MB of random text with 10_000_000 lines, none of them very long. Please note that I ran these benchmarks just to get a first impression of the relative performance differences between the tested programs. Here's a quick overview of the test file tests/inputs/bigger.txt, tail and uu-tail:

❯ wc --lines --words --bytes --chars --max-line-length tests/inputs/bigger.txt
 10000000 105001050 577418760 577418760       101 tests/inputs/bigger.txt

❯ tail --version
tail (GNU coreutils) 9.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Ian Lance Taylor,
and Jim Meyering.

❯ ~/workspace/external/uutils/coreutils/target/release/tail --version
/home/lenny/workspace/external/uutils/coreutils/target/release/tail 0.0.14

Benchmark of {tail,uu-tail} -n +{10,1000,100000,10000000} tests/inputs/bigger.txt in which uu-tail is faster than gnu's tail. However there is a performance drop of uu-tail running with `-n +10_000_000` at the end of the benchmark test run.

❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n +{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n +10 tests/inputs/bigger.txt
  Time (mean ± σ):     313.9 ms ±   3.4 ms    [User: 61.7 ms, System: 251.6 ms]
  Range (min … max):   310.9 ms … 322.5 ms    10 runs

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt
  Time (mean ± σ):      20.8 ms ±   0.7 ms    [User: 2.9 ms, System: 19.7 ms]
  Range (min … max):    20.2 ms …  24.5 ms    111 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (23.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 3: tail -n +1000 tests/inputs/bigger.txt
  Time (mean ± σ):     317.6 ms ±   4.7 ms    [User: 62.6 ms, System: 254.4 ms]
  Range (min … max):   312.0 ms … 329.3 ms    10 runs

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt
  Time (mean ± σ):      21.0 ms ±   1.8 ms    [User: 2.5 ms, System: 20.1 ms]
  Range (min … max):    20.1 ms …  34.1 ms    115 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n +100000 tests/inputs/bigger.txt
  Time (mean ± σ):     318.0 ms ±   4.9 ms    [User: 69.6 ms, System: 248.0 ms]
  Range (min … max):   313.2 ms … 330.3 ms    10 runs

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt
  Time (mean ± σ):      25.6 ms ±   0.5 ms    [User: 5.2 ms, System: 21.4 ms]
  Range (min … max):    24.9 ms …  27.8 ms    97 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n +10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     214.3 ms ±   3.5 ms    [User: 125.7 ms, System: 88.3 ms]
  Range (min … max):   212.2 ms … 222.8 ms    13 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     371.8 ms ±   5.4 ms    [User: 284.7 ms, System: 86.6 ms]
  Range (min … max):   366.4 ms … 379.4 ms    10 runs

Summary
  '~/workspace/external/uutils/coreutils/target/release/tail -n +10 tests/inputs/bigger.txt' ran
    1.01 ± 0.09 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +1000 tests/inputs/bigger.txt'
    1.23 ± 0.05 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +100000 tests/inputs/bigger.txt'
   10.28 ± 0.37 times faster than 'tail -n +10000000 tests/inputs/bigger.txt'
   15.06 ± 0.51 times faster than 'tail -n +10 tests/inputs/bigger.txt'
   15.24 ± 0.54 times faster than 'tail -n +1000 tests/inputs/bigger.txt'
   15.26 ± 0.54 times faster than 'tail -n +100000 tests/inputs/bigger.txt'
   17.84 ± 0.63 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n +10000000 tests/inputs/bigger.txt'

Now with -n -{values} instead of -n +{values} the gnu version and uu-tail version are pretty close.

Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} tests/inputs/bigger.txt

❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 tests/inputs/bigger.txt
  Time (mean ± σ):       0.4 ms ±   0.3 ms    [User: 0.7 ms, System: 0.5 ms]
  Range (min … max):     0.0 ms …   3.1 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt
  Time (mean ± σ):       0.5 ms ±   0.5 ms    [User: 0.7 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   4.5 ms    714 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Benchmark 3: tail -n -1000 tests/inputs/bigger.txt
  Time (mean ± σ):       0.5 ms ±   0.3 ms    [User: 0.7 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   3.0 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt
  Time (mean ± σ):       0.7 ms ±   0.4 ms    [User: 0.8 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   4.9 ms    803 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n -100000 tests/inputs/bigger.txt
  Time (mean ± σ):       5.5 ms ±   0.7 ms    [User: 3.5 ms, System: 3.5 ms]
  Range (min … max):     4.9 ms …   9.7 ms    295 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt
  Time (mean ± σ):       7.0 ms ±   0.6 ms    [User: 5.9 ms, System: 2.0 ms]
  Range (min … max):     5.7 ms …  10.6 ms    275 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n -10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     546.3 ms ±   6.3 ms    [User: 207.4 ms, System: 338.1 ms]
  Range (min … max):   537.4 ms … 554.5 ms    10 runs

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt
  Time (mean ± σ):     499.1 ms ±   4.2 ms    [User: 416.0 ms, System: 82.4 ms]
  Range (min … max):   495.6 ms … 506.0 ms    10 runs

Summary
  'tail -n -10 tests/inputs/bigger.txt' ran
    1.08 ± 1.03 times faster than 'tail -n -1000 tests/inputs/bigger.txt'
    1.16 ± 1.33 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 tests/inputs/bigger.txt'
    1.61 ± 1.45 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 tests/inputs/bigger.txt'
   13.06 ± 8.12 times faster than 'tail -n -100000 tests/inputs/bigger.txt'
   16.74 ± 10.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 tests/inputs/bigger.txt'
 1185.55 ± 722.84 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 tests/inputs/bigger.txt'
 1297.59 ± 791.22 times faster than 'tail -n -10000000 tests/inputs/bigger.txt'

But when it comes to reading from stdin, uu-tail is significantly slower than the GNU version:

Benchmark of {tail,uu-tail} -n -{10,1000,100000,10000000} - < tests/inputs/bigger.txt

❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/coreutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n -{values} - < tests/inputs/bigger.txt'
Benchmark 1: tail -n -10 - < tests/inputs/bigger.txt
  Time (mean ± σ):       0.1 ms ±   0.2 ms    [User: 0.5 ms, System: 0.2 ms]
  Range (min … max):     0.0 ms …   1.7 ms    647 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: The first benchmarking run for this command was significantly slower than the rest (0.8 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: ~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt
  Time (mean ± σ):     611.1 ms ±   8.4 ms    [User: 500.8 ms, System: 109.8 ms]
  Range (min … max):   596.3 ms … 621.8 ms    10 runs

Benchmark 3: tail -n -1000 - < tests/inputs/bigger.txt
  Time (mean ± σ):       0.1 ms ±   0.1 ms    [User: 0.5 ms, System: 0.2 ms]
  Range (min … max):     0.0 ms …   0.8 ms    779 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: The first benchmarking run for this command was significantly slower than the rest (0.0 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 4: ~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt
  Time (mean ± σ):     671.2 ms ±   9.1 ms    [User: 559.4 ms, System: 111.3 ms]
  Range (min … max):   662.4 ms … 692.5 ms    10 runs

Benchmark 5: tail -n -100000 - < tests/inputs/bigger.txt
  Time (mean ± σ):       5.6 ms ±   0.3 ms    [User: 3.7 ms, System: 2.3 ms]
  Range (min … max):     5.1 ms …   6.6 ms    287 runs

Benchmark 6: ~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt
  Time (mean ± σ):      1.196 s ±  0.013 s    [User: 1.011 s, System: 0.184 s]
  Range (min … max):    1.184 s …  1.226 s    10 runs

Benchmark 7: tail -n -10000000 - < tests/inputs/bigger.txt
  Time (mean ± σ):     552.2 ms ±   6.9 ms    [User: 223.1 ms, System: 328.0 ms]
  Range (min … max):   544.2 ms … 566.8 ms    10 runs

Benchmark 8: ~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt
  Time (mean ± σ):      8.179 s ±  0.160 s    [User: 3.442 s, System: 4.726 s]
  Range (min … max):    8.025 s …  8.533 s    10 runs

Summary
  'tail -n -1000 - < tests/inputs/bigger.txt' ran
    1.97 ± 4.73 times faster than 'tail -n -10 - < tests/inputs/bigger.txt'
   83.11 ± 140.53 times faster than 'tail -n -100000 - < tests/inputs/bigger.txt'
 8259.42 ± 13958.62 times faster than 'tail -n -10000000 - < tests/inputs/bigger.txt'
 9140.98 ± 15448.56 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10 - < tests/inputs/bigger.txt'
10040.23 ± 16968.30 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -1000 - < tests/inputs/bigger.txt'
17885.66 ± 30226.98 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -100000 - < tests/inputs/bigger.txt'
122339.26 ± 206764.49 times faster than '~/workspace/external/uutils/coreutils/target/release/tail -n -10000000 - < tests/inputs/bigger.txt'

I hope these benchmarks help in figuring out the problem and that I haven't done anything wrong on my side.

tertsdiepraam · 2022-08-18T13:38:22Z

Thanks! These are excellent benchmarks. After looking into it, I'm not surprised it's slow and I'm curious how GNU does it so quickly.

First, a (big) difference between files and stdin is to be expected: for a file, we can just seek to the end of the file and find the data to display from there. For stdin, we cannot seek, so we have to read through the data while remembering the last count lines we have seen. When we get to the end, we display those last count lines. We currently do this using a ring buffer in this line:

coreutils/src/uu/tail/src/tail.rs

Line 1464 in 88261f3

Ok(RingBuffer::from_iter(iter.map(|r| r.unwrap()), count).data)

With the ringbuffer defined here: https://github.com/uutils/coreutils/blob/main/src/uucore/src/lib/features/ringbuffer.rs
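
To make the line-by-line approach concrete, here is a minimal sketch of keeping only the last count lines of a non-seekable reader. It uses the standard library's `VecDeque` as a stand-in for the uucore `RingBuffer`, and `last_lines` is a hypothetical helper name, not a function from the uutils code base. Every line becomes its own `String` allocation, which may be part of why this path is slow on large inputs.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead};

/// Hypothetical helper (not the uutils implementation): read `reader`
/// line by line and return only the last `count` lines.
fn last_lines<R: BufRead>(reader: R, count: usize) -> io::Result<Vec<String>> {
    if count == 0 {
        return Ok(Vec::new());
    }
    // Stand-in for a ring buffer: holds at most `count` lines.
    let mut ring: VecDeque<String> = VecDeque::new();
    for line in reader.lines() {
        let line = line?; // one String allocation per input line
        if ring.len() == count {
            ring.pop_front(); // evict the oldest remembered line
        }
        ring.push_back(line);
    }
    Ok(ring.into_iter().collect())
}
```

Calling `last_lines(std::io::stdin().lock(), 10)` would give the behaviour of `tail -n 10` on piped input, under the assumption that the input is valid UTF-8.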

This is of course not particularly fast. One thing we might try is to first scan in bigger blocks than lines to get a buffer that has at least count lines, and then print the last count lines from that buffer. So we keep track of a buffers: VecDeque<(String, usize)>, where each usize is the number of lines in its block and the total of the usizes must be larger than count, and we can remove from the start of buffers if count < total - buffers[0].1.
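
The block-based idea could be sketched roughly as follows. This is an illustration under assumptions, not the code that was later merged: `tail_lines_chunked` and its `chunk_size` parameter are hypothetical, and the blocks are stored as `Vec<u8>` instead of `String` because a fixed-size read can split a multi-byte UTF-8 character.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read};

/// Hypothetical sketch: return the bytes of the last `count` lines of
/// `reader`, reading in blocks of `chunk_size` bytes.
fn tail_lines_chunked<R: Read>(
    mut reader: R,
    count: usize,
    chunk_size: usize,
) -> io::Result<Vec<u8>> {
    let chunk_size = chunk_size.max(1);
    // Each entry: (bytes of one block, number of b'\n' in that block).
    let mut buffers: VecDeque<(Vec<u8>, usize)> = VecDeque::new();
    let mut total = 0usize; // newlines across all kept blocks

    loop {
        let mut chunk = vec![0u8; chunk_size];
        let n = match reader.read(&mut chunk) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        chunk.truncate(n);
        let newlines = chunk.iter().filter(|&&b| b == b'\n').count();
        total += newlines;
        buffers.push_back((chunk, newlines));
        // Drop the oldest block while the rest still has more than `count` newlines.
        while buffers.front().map_or(false, |front| count < total - front.1) {
            if let Some((_, dropped)) = buffers.pop_front() {
                total -= dropped;
            }
        }
    }
    if count == 0 {
        return Ok(Vec::new());
    }

    let data: Vec<u8> = buffers.into_iter().flat_map(|(bytes, _)| bytes).collect();
    // A trailing newline terminates the last line; it does not start a new one.
    let end = if data.last() == Some(&b'\n') { data.len() - 1 } else { data.len() };
    let mut seen = 0usize;
    let mut start = 0usize;
    for i in (0..end).rev() {
        if data[i] == b'\n' {
            seen += 1;
            if seen == count {
                start = i + 1; // first byte after the count-th newline from the end
                break;
            }
        }
    }
    Ok(data[start..].to_vec())
}
```

The strict comparison `count < total - front.1` keeps at least count + 1 newlines in memory, which is enough to find the start of the last count lines whether or not the input ends with a newline.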

Joining7943 · 2022-08-18T15:09:12Z

@tertsdiepraam glad I can help out :)

This fixes a bug where calling `tail - < file.txt` would result in invoking `unbounded_tail()`. However, it is a stdin redirect to a seekable regular file and therefore `bounded_tail` should be invoked as if `tail file.txt` had been called.
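
As an illustration of the distinction this commit message draws, the following Unix-only sketch checks whether a handle such as stdin refers to a regular file and picks a strategy accordingly. The names `is_regular_file`, `Strategy` and `choose_strategy` are hypothetical and only mirror the `bounded_tail` / `unbounded_tail()` wording above; the actual fix may detect the redirect differently.

```rust
use std::fs::File;
use std::io;
use std::os::fd::AsFd;

/// Hypothetical helper (Unix-only): true if `handle` refers to a regular file,
/// as stdin does under `tail - < file.txt`; false for a pipe or socket.
fn is_regular_file<T: AsFd>(handle: &T) -> io::Result<bool> {
    // Duplicate the descriptor so dropping the File does not close the original.
    let owned = handle.as_fd().try_clone_to_owned()?;
    Ok(File::from(owned).metadata()?.is_file())
}

/// Hypothetical labels for the two code paths named in the commit message.
#[derive(Debug, PartialEq)]
enum Strategy {
    Bounded,   // seekable regular file: can scan backward from the end
    Unbounded, // pipe or similar: must read everything front to back
}

fn choose_strategy<T: AsFd>(input: &T) -> io::Result<Strategy> {
    if is_regular_file(input)? {
        Ok(Strategy::Bounded)
    } else {
        Ok(Strategy::Unbounded)
    }
}
```

With this sketch, `choose_strategy(&std::io::stdin())` would select the bounded path for `tail - < file.txt` and the unbounded path for `cat file.txt | tail`.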

jhscheer · 2022-08-18T20:23:59Z

Hi, thanks for doing the benchmarks and reporting this issue.

What @tertsdiepraam wrote is of course true, however in this case you uncovered a bug in our tail implementation. The problem is not how we handle tailing stdin but with the stdin redirect from the shell.
See the fix for more details.
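
For context on why a seekable input is so much cheaper, here is a rough sketch of the "seek to the end and find the data from there" idea mentioned earlier in the thread. `offset_of_last_lines` and `block_size` are hypothetical names chosen for illustration; this is not the uutils `bounded_tail` code.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical sketch: byte offset at which the last `count` lines of a
/// seekable input start, found by scanning backward in blocks.
fn offset_of_last_lines<F: Read + Seek>(
    file: &mut F,
    count: usize,
    block_size: usize,
) -> io::Result<u64> {
    let len = file.seek(SeekFrom::End(0))?;
    if count == 0 {
        return Ok(len); // nothing to print
    }
    let mut buf = vec![0u8; block_size.max(1)];
    let mut pos = len; // start of the region already scanned
    let mut seen = 0usize;
    while pos > 0 {
        let step = (buf.len() as u64).min(pos) as usize;
        pos -= step as u64;
        file.seek(SeekFrom::Start(pos))?;
        file.read_exact(&mut buf[..step])?;
        for (i, &byte) in buf[..step].iter().enumerate().rev() {
            let abs = pos + i as u64;
            // Skip non-newlines and the newline that terminates the final line.
            if byte != b'\n' || abs == len - 1 {
                continue;
            }
            seen += 1;
            if seen == count {
                return Ok(abs + 1); // the wanted lines start right after this newline
            }
        }
    }
    Ok(0) // fewer than `count` lines: print the whole input
}
```

After computing the offset, a caller would seek to it and copy the remainder to stdout, so only the tail end of the file is ever read when count is small.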

With the fix, the benchmarks look more reasonable:

Summary
  'tail -n -10000 - < 505MB.txt' ran
    1.36 ± 0.25 times faster than 'target/release/tail -n -10000 - < 505MB.txt'
    5.16 ± 0.93 times faster than 'tail -n -100000 - < 505MB.txt'
    5.98 ± 1.19 times faster than 'target/release/tail -n -100000 - < 505MB.txt'
   32.12 ± 5.87 times faster than 'tail -n -1000000 - < 505MB.txt'
   37.17 ± 7.74 times faster than 'target/release/tail -n -1000000 - < 505MB.txt'

tertsdiepraam · 2022-08-18T21:36:24Z

Excellent!

If I understand correctly, this means that we're still slow when actually reading from stdin, right? For example, in a case like this:

cat 505MB.txt | tail -n -10000

As a side note, do you know why we're still slower in your benchmark even though we're faster in @Joining7943's first benchmark? It seems to me that there theoretically shouldn't be a difference between those two, since it's both just reading from a file.

jhscheer · 2022-08-19T13:16:55Z

> If I understand correctly, this means that we're still slow when actually reading from stdin, right? For example, in a case like this:
> cat 505MB.txt | tail -n -10000

Yes:

Summary
  'cat 505MB.txt | tail -n -10000' ran
    1.25 ± 0.07 times faster than 'cat 505MB.txt | tail -n -100000'
    1.58 ± 0.07 times faster than 'cat 505MB.txt | tail -n -1000000'
    1.80 ± 0.08 times faster than 'cat 505MB.txt | target/release/tail -n -10000'
    2.11 ± 0.09 times faster than 'cat 505MB.txt | target/release/tail -n -100000'
    4.52 ± 0.17 times faster than 'cat 505MB.txt | target/release/tail -n -1000000'

> As a side note, do you know why we're still slower in your benchmark even though we're faster in @Joining7943's first benchmark? It seems to me that there theoretically shouldn't be a difference between those two, since it's both just reading from a file.

I don't know, but I agree, there shouldn't be a difference.
For reference, this is the benchmark on my system:

Summary
  'tail -n -10000 505MB.txt' ran
    1.30 ± 0.43 times faster than 'target/release/tail -n -10000 505MB.txt'
    5.66 ± 0.71 times faster than 'tail -n -100000 505MB.txt'
    6.80 ± 1.07 times faster than 'target/release/tail -n -100000 505MB.txt'
   40.35 ± 3.96 times faster than 'target/release/tail -n -1000000 505MB.txt'
   41.82 ± 5.63 times faster than 'tail -n -1000000 505MB.txt'

The testfile was created with:

tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w 100 | head -n 5000000 > 505MB.txt

Joining7943 · 2022-08-21T15:06:25Z

> I don't know, but I agree, there shouldn't be a difference. For reference, this is the benchmark on my system:

@jhscheer I think @tertsdiepraam meant my first benchmark

> Benchmark of {tail,uu-tail} -n +{10,1000,100000,10000000} tests/inputs/bigger.txt in which uu-tail is faster than gnu's tail. However there is a performance drop of uu-tail running with -n +10_000_000 at the end of the benchmark test run.

with -n +{values} tests/inputs/bigger.txt (not with -n -{values}) when not reading from stdin.

To be sure, I reran the above benchmark with the file 505MB.txt, which produced approximately the same relative results.

benchmark with tail,uutail -n +{values} tests/inputs/505MB.txt

❯ hyperfine --warmup 3 --output pipe -L prg tail,~/workspace/external/uutils/uutils/target/release/tail -L values 10,1000,100000,10000000 '{prg} -n +{values} tests/inputs/505MB.txt'
Benchmark 1: tail -n +10 tests/inputs/505MB.txt
  Time (mean ± σ):     277.7 ms ±   2.0 ms    [User: 59.1 ms, System: 218.1 ms]
  Range (min … max):   275.2 ms … 281.0 ms    10 runs

Benchmark 2: ~/workspace/external/uutils/uutils/target/release/tail -n +10 tests/inputs/505MB.txt
  Time (mean ± σ):      18.1 ms ±   0.5 ms    [User: 3.7 ms, System: 16.0 ms]
  Range (min … max):    17.8 ms …  22.2 ms    115 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (22.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 3: tail -n +1000 tests/inputs/505MB.txt
  Time (mean ± σ):     278.2 ms ±   1.3 ms    [User: 54.7 ms, System: 223.0 ms]
  Range (min … max):   276.2 ms … 280.3 ms    10 runs

Benchmark 4: ~/workspace/external/uutils/uutils/target/release/tail -n +1000 tests/inputs/505MB.txt
  Time (mean ± σ):      19.4 ms ±   1.9 ms    [User: 2.6 ms, System: 18.3 ms]
  Range (min … max):    17.9 ms …  34.8 ms    130 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 5: tail -n +100000 tests/inputs/505MB.txt
  Time (mean ± σ):     272.4 ms ±   1.5 ms    [User: 76.5 ms, System: 195.5 ms]
  Range (min … max):   270.5 ms … 275.3 ms    10 runs

Benchmark 6: ~/workspace/external/uutils/uutils/target/release/tail -n +100000 tests/inputs/505MB.txt
  Time (mean ± σ):      22.9 ms ±   0.3 ms    [User: 4.5 ms, System: 19.6 ms]
  Range (min … max):    22.1 ms …  24.7 ms    112 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 7: tail -n +10000000 tests/inputs/505MB.txt
  Time (mean ± σ):     124.5 ms ±   3.5 ms    [User: 51.7 ms, System: 72.6 ms]
  Range (min … max):   122.4 ms … 132.5 ms    22 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (131.5 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 8: ~/workspace/external/uutils/uutils/target/release/tail -n +10000000 tests/inputs/505MB.txt
  Time (mean ± σ):     188.6 ms ±   3.8 ms    [User: 113.5 ms, System: 74.7 ms]
  Range (min … max):   186.1 ms … 197.2 ms    15 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (195.9 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Summary
  '~/workspace/external/uutils/uutils/target/release/tail -n +10 tests/inputs/505MB.txt' ran
    1.07 ± 0.11 times faster than '~/workspace/external/uutils/uutils/target/release/tail -n +1000 tests/inputs/505MB.txt'
    1.27 ± 0.04 times faster than '~/workspace/external/uutils/uutils/target/release/tail -n +100000 tests/inputs/505MB.txt'
    6.88 ± 0.26 times faster than 'tail -n +10000000 tests/inputs/505MB.txt'
   10.41 ± 0.34 times faster than '~/workspace/external/uutils/uutils/target/release/tail -n +10000000 tests/inputs/505MB.txt'
   15.04 ± 0.40 times faster than 'tail -n +100000 tests/inputs/505MB.txt'
   15.34 ± 0.41 times faster than 'tail -n +10 tests/inputs/505MB.txt'
   15.36 ± 0.40 times faster than 'tail -n +1000 tests/inputs/505MB.txt'

Additionally, I also checked that both tail and uutail produce the same output without errors

❯ for value in 10 1000 100000 10000000; do diff --suppress-common-lines -y <(tail -n +${value} tests/inputs/505MB.txt) <(~/workspace/external/uutils/uutils/target/release/tail -n +${value} tests/inputs/505MB.txt) || return 1; done
❯ echo $?
0

to be sure no one is cheating.

> If I understand correctly, this means that we're still slow when actually reading from stdin, right? For example, in a case like this:
>
> cat 505MB.txt | tail -n -10000

Would you mind giving me a shot at writing a better-performing solution for piped stdin for the case where beginning

coreutils/src/uu/tail/src/tail.rs

Line 1461 in 38b6ce5

if beginning {

is false, replacing the ring buffer as you suggested?

tertsdiepraam · 2022-08-21T15:30:45Z

> @jhscheer I think @tertsdiepraam meant my first benchmark

I think I made a mistake there. The second benchmark made more sense. Anyway it's still an interesting difference.

> Would you mind giving me a shot

Please do!

jhscheer mentioned this issue Aug 18, 2022

tail: fix stdin redirect (#3842) #3845

Merged

tertsdiepraam added the U - tail label Aug 18, 2022

Joining7943 pushed a commit to Joining7943/uutil-coreutils that referenced this issue Aug 23, 2022

tail: improve performance of piped stdin

274348c

Rewrite handling of stdin when it is piped and read input in chunks. Fixes uutils#3842

Joining7943 mentioned this issue Aug 23, 2022

tail: improve performance when stdin is piped #3874

Merged

Joining7943 added a commit to Joining7943/uutil-coreutils that referenced this issue Aug 23, 2022

tail: improve performance of piped stdin

c282304

Rewrite handling of stdin when it is piped and read input in chunks. Fixes uutils#3842

Joining7943 added a commit to Joining7943/uutil-coreutils that referenced this issue Aug 24, 2022

tail: improve performance of piped stdin

fa51fe8

Rewrite handling of stdin when it is piped and read input in chunks. Fixes uutils#3842

tertsdiepraam closed this as completed in #3845 Sep 7, 2022

tertsdiepraam added a commit that referenced this issue Sep 7, 2022

Merge pull request #3845 from jhscheer/fix_3842

987479d

tail: fix stdin redirect (#3842)

sylvestre pushed a commit that referenced this issue Sep 9, 2022

tail: improve performance of piped stdin

2658f8a

Rewrite handling of stdin when it is piped and read input in chunks. Fixes #3842
