Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add C scanner infrastructure. #253

Closed
wants to merge 1 commit into from
Closed

Conversation

kevincox
Copy link

@kevincox kevincox commented Oct 27, 2016

This scanner allows low overhead reading of files. It can currently read
paths from an array of strings (good for ruby-based scanners) and a
terminated string (great for command executor scanners).

This also converts the find scanner and the git scanner to use the
string based infrastructure.

I'm probably going to convert the string -> Paths constructor to an FD based one. The string input is quite fast but it doesn't allow progress information or file limiting. File limiting could be done by the scanner with head but progress information would be difficult to do. It would also be faster as it can be done in parallel with the scanner itself.

If you wouldn't mind giving a general overview it would be appreciated. I'm running this right now and it appears to be working. I have tested the ruby, find and git scanners.

Highlights:

  • Ruby never has to parse the data (not allocations in the hot path)
  • Requires two allocations (backing string and matches array)
  • Matching requires allocations only if using the heap based approach.

TODO:

  • FD based rather then string based.
  • Max files work (error reporting)
  • See if I can push the wildignore filter into find (if it helps, likely not)
  • Code polish.

Fixes #249

autoload :FindFileScanner, 'command-t/scanner/file_scanner/find_file_scanner'
autoload :GitFileScanner, 'command-t/scanner/file_scanner/git_file_scanner'
autoload :RubyFileScanner, 'command-t/scanner/file_scanner/ruby_file_scanner'
autoload :WatchmanFileScanner, 'command-t/scanner/file_scanner/watchman_file_scanner'

def self.for_string s
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want this to be called new?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. We could use a more descriptive word than "string" though. Maybe for_scanner_type or something like that.

]
Dir.chdir @path do
CommandT::Paths.from_string `#{command}`, terminator,
drop: drop, # Drop leading ./
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(kevincox) move this comment to the drop method.

if @scan_dot_directories
''
else
'-not -path "*/.*/*" '
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(kevincox) make sure there are tests for this.

@@ -6,7 +6,7 @@

describe CommandT::Matcher do
def matcher(*paths)
scanner = OpenStruct.new(:paths => paths)
scanner = OpenStruct.new(:c_paths => CommandT::Paths.from_array(paths))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(kevincox) revert this (c_paths now does this conversion).

@kevincox kevincox force-pushed the c-scanner branch 2 times, most recently from 1cb57a4 to a51f3ce Compare October 29, 2016 13:57
@kevincox
Copy link
Author

Ok I like the way this looks. It relies on virtual memory to prevent reallocations and copies but I think that it will work on all modern OSes. That will need to be tested.

As for performance it is about 6x faster on my system. The most exciting part is that it is now "faster" then find $flags >/dev/null. So the scanning is no longer the bottleneck. In fact is is less the 4x slower then cat /tmp/filelist off a ramdisk so according to the numbers I remember this makes it much faster then git ls-files -c and locate.

Let me know if you have any quelms and I will try and address them.

@kevincox kevincox force-pushed the c-scanner branch 3 times, most recently from d1e7f4c to cc6610f Compare October 29, 2016 15:54
@kevincox kevincox changed the title [WIP] Add C scanner infrastructure. Add C scanner infrastructure. Oct 29, 2016
Copy link
Owner

@wincent wincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just gave this a superficial look over. Looks promising!

I'll give this a deeper look and start testing it locally, but in the meantime I've added some minor nit-ish feedback.

// Struct for representing a collection of matches.
typedef struct {
int len;
match_t matches[];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent by 4 spaces.


thread_count = NIL_P(threads_option) ? 1 : NUM2LONG(threads_option);
if (use_heap) {
heap_matches = malloc(thread_count * limit * sizeof(match_t));
heap_matches = malloc(sizeof(matches_t) + (thread_count*limit+1) * sizeof(match_t));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spaces around the * and + operators


long len = RARRAY_LEN(source);
long bufsize = sizeof(paths_t) + len * sizeof(match_t);
paths_t *paths = mmap(NULL, bufsize,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It relies on virtual memory to prevent reallocations and copies but I think that it will work on all modern OSes. That will need to be tested.

If we want to pre-check for the existence of mmap at build time we can add something to extconf.rb (it's going to be something like have_func or similar). But yes, we should have some fallback in place if it turns out that this isn't going to work on one of the major platforms.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can spin up a windows VM today and try to compile vim and see if this works. If you could do the mac test it would be appreciated. This will work on the BSDs and Linux for sure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have been a bit ambitious. I'm having trouble compiling even master, I probably have a borked gem development setup but I can compile json so I don't know what I'm doing wrong. Is command-t known to work on windows? If I feel brave I might give it another try tonight.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has worked historically, but I personally haven't tested it on Windows for a long, long time (years). It's one of those things where I make a best effort to not break things, but ultimately we need a Windows-using maintainer to step forward if we want to really be on top of the Windows side, and in the absence of that then we just have to rely on user reports of stuff working (or not).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to get some CI set up. There are free CI services for the big three platforms. Building and running the specs would help a lot even if not running vim.


def filter
if @wildignore
lambda {|p| !path_excluded? p }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after the {

end

def update
lambda {|count| progress_reporter.update count }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after the {

# Same in submodules
git submodule --quiet foreach --recursive '
git ls-files --exclude-standard -z |
awk "-vp=$path" "BEGIN{RS=\\"\\\\0\\";ORS =\\"\\\\0\\"}\\$0=p \\"#{File::SEPARATOR}\\" \\$0"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add a comment somewhere near here saying what this awk incantation does.

end

def scanner_failed status, result
return super if @nogit
puts status.inspect
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover debugging code?

autoload :FindFileScanner, 'command-t/scanner/file_scanner/find_file_scanner'
autoload :GitFileScanner, 'command-t/scanner/file_scanner/git_file_scanner'
autoload :RubyFileScanner, 'command-t/scanner/file_scanner/ruby_file_scanner'
autoload :WatchmanFileScanner, 'command-t/scanner/file_scanner/watchman_file_scanner'

def self.for_string s
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. We could use a more descriptive word than "string" though. Maybe for_scanner_type or something like that.

// order matters; we want this to be evaluated only after ruby.h
#ifdef HAVE_PTHREAD_H
#include <pthread.h> /* for pthread_create, pthread_join etc */
#endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is copy-pasta? No use of pthread_*, qsort, strncmp in this file. Haven't checked all the headers included here, but I bet many of these include statements can be deleted (heap.h, matcher.h etc).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, forgot to clean that up. FIxed.

// Licensed under the terms of the BSD 2-clause license.

#include <ruby.h>
#include <assert.h>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

@kevincox
Copy link
Author

Thanks for the look over, I have fixed all your comments.

@kevincox kevincox force-pushed the c-scanner branch 2 times, most recently from 493d853 to 3249756 Compare October 30, 2016 12:14
@sstallion
Copy link
Contributor

sstallion commented Nov 16, 2016

FWIW, I'd love to see this feature added. I work in a very large code base (approx. 1GB of source) and launching command-t usually takes around 10 seconds to build the cache from git ls-files.

@kevincox
Copy link
Author

Thanks @sstallion, I kept forgetting to ping this. I would be very interested if you can time the next branch as well as this patch (also let me know if it works for you :) ) as well as just whatever scanner you use piped into /dev/null.

If you aren't using the next branch I would expect most of the improvement to be from switching to it. But the should also provide some improvement over that.

@sstallion
Copy link
Contributor

Sure thing. What's the best way to test? Normally I just interact with
Command-T inside of Vim, but if there is a way to measure the scanner
directly I'd be happy to switch and give you some results of next versus
this PR.

On Wed, Nov 16, 2016 at 3:45 PM, Kevin Cox notifications@github.com wrote:

Thanks @sstallion https://github.com/sstallion, I kept forgetting to
ping this. I would be very interested if you can time the next branch as
well as this patch (also let me know if it works for you :) ) as well as
just whatever scanner you use piped into /dev/null.

If you aren't using the next branch I would expect most of the improvement
to be from switching to it. But the should also provide some improvement
over that.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#253 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFtLiKopgiM6ZswmpRN9304dGW2mA3udks5q-3mDgaJpZM4Ki67d
.

@kevincox
Copy link
Author

kevincox commented Nov 17, 2016 via email

@wincent
Copy link
Owner

wincent commented Nov 17, 2016

Just an FYI, I haven't forgotten this, but it is a fairly invasive change so wanted it to "bake" a while, with at least @kevincox running it locally, and hopefully some others. Still interested in getting this into next.

@kevincox
Copy link
Author

Totally reasonable to let this settle for a bit. For an update on my usage I'm using it at work every day on a modified Ubuntu 14.04 and at home most days on Arch Linux and haven't noticed any problems.

@sstallion If you could try running it as your daily driver the testing would be much appreciated :)

@sstallion
Copy link
Contributor

Sounds like a plan. I've been meaning to move over to next and retire my
fork anyway!

On Thu, Nov 17, 2016 at 4:45 AM, Kevin Cox notifications@github.com wrote:

Totally reasonable to let this settle for a bit. For an update on my usage
I'm using it at work every day on a modified Ubuntu 14.04 and at home most
days on Arch Linux and haven't noticed any problems.

@sstallion https://github.com/sstallion If you could try running it as
your daily driver the testing would be much appreciated :)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#253 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFtLiMLQwk3DWLXHWLh_dq9IorqaeGi6ks5q_DBBgaJpZM4Ki67d
.

@sstallion
Copy link
Contributor

sstallion commented Nov 17, 2016

All looks well so far, but I have noticed a compiler warning:

% make
compiling ext.c
compiling heap.c
compiling match.c
compiling matcher.c
compiling scanner.c
scanner.c:83:25: warning: implicit conversion from 'long long' to 'size_t' (aka 'unsigned long')
      changes value from 1099511627776 to 0 [-Wconstant-conversion]
    size_t buffer_len = 1099511627776; // 1TiB should be enough for anyone.
           ~~~~~~~~~~   ^~~~~~~~~~~~~
1 warning generated.
compiling watchman.c
linking shared-object ext.bundle

size_t is likely the wrong type to use here (i.e. dealing with 32-bit systems). You'll either want to use an unsigned long long or a uint64_t if you need a 64-bit unsigned integer.

@sstallion
Copy link
Contributor

sstallion commented Nov 17, 2016

In general, I'm seeing a 2-3x increase in scanning speed over next. Great work! I am however seeing some warning messages in Vim that I need to dig into. Would you be opposed to having me submit a PR to your branch to fix these issues?

@sstallion
Copy link
Contributor

sstallion commented Nov 17, 2016

Almost forgot:

git ls-files -cz --others > /dev/null  0.09s user 0.33s system 99% cpu 0.416 total

There's still a bit of overhead. Displaying the file list takes around 2 seconds with this PR. It's much snappier and having the spinner is a nice touch - it beats staring at my terminal for 5-10 seconds at a time waiting for a refresh 👍

@kevincox
Copy link
Author

kevincox commented Nov 17, 2016

@sstallion Thanks for catching that warning. I might have some work to do on a 32bit system since mmap takes a size_t so it does seem like the right type to use. I'll take a look at it or if you can submit a PR that would be great. It does sound like a real error.

Awesome, that speed improvement does look about right. Also consider that some of the overhead is from the first match (aka filter) (which is somewhat slower the subsequent matches since it does some precomputation which makes the next ones faster). Although as I say that it might be an interesting idea to move the mask generation into the scanner because then it can be done in parallel with the scanner executable. Since we are generally now speed limited by the external process this could completely hide that latency. I'll probably look into that in a future PR.

Also what OS are you running?

@sstallion
Copy link
Contributor

Great! This was tested on OS X El Capitan. I'll start digging in - so far I've only seen a couple of trivial problems.

@sstallion
Copy link
Contributor

This is strange. While looking into some odd behavior, I swapped back to the next branch and I'm now seeing this branch outperform this PR. What was taking 10 seconds before is now less than one. If I switch back to this PR, it goes up to about 2 seconds. Nothing in my environment has changed, which leads me to think that my initial test may not have been completely correct.

@kevincox
Copy link
Author

Hmm, that's interesting. Were your original tests with a hot cache? And what about the background CPU and RAM untilization? Also are you still using the same scanner? The Ruby scanner will be much slower then this version but shouldn't have a 10x difference between here and next. Git also does some clever things with the modified timestamps of files to speed up git ls-files -o but should be the same across runs unless you are constantly mutating the files.

If you can run bin/benchmarks/scanner.rb it would be helpful, as it puts in a little effort to produce consistent numbers. However it doesn't exist on next so you will have to git checkout $pr bin/ then change the benchmark script to hardcode the scanner you want to test.

Another think is to make sure you have recompiled the library when switching branches.

I'm very interested in why your performance just jumped for the next branch "without changing anything"

This scanner allows low overhead reading of files. It can currently read
paths from an array of strings (good for ruby-based scanners) and a
terminated string (great for command executor scanners).

This also converts the find scanner and the git scanner to use the
string based infrastructure.
wincent added a commit that referenced this pull request Jul 16, 2022
Takes the idea from this 2016 PR that never got merged:

    #253

of creating a large `mmap()`-backed allocation and having the scanner
read into it from an external process. This is by no means finished yet,
but I wanted to get it to the point where it runs, and it does run.

Key changes to the scanner:

- New `scanner_new_command()` function that takes a script command
  string and runs it, capturing the results. The output is stored in an
  `mmap()`-ed buffer, which is retained for the lifetime of the scanner.
  Associated `str_t` candidates also live in an `mmap()`-ed region.
  Their `contents` pointers point directly into the buffer, so there is
  no additional string copying.
- The existing `scanner_new_copy()` is updated to use an `mmap()`-backed
  allocation as well, just for simplicity (so I don't have to maintain
  book-keeping data-structures for `free()`-ing `malloc()`-ed memory
  blocks and `munmap()`-ing `mmap()`-ed ones). I will include benchmarks
  here to "prove" that this doesn't seem to have had any unanticipated
  performance effects.
- Deleted `scanner_new()` because it is was unused. As the only user of
  the `capacity` feature, I got rid of the `capacity` field. Because
  scanners use `mmap()` now, they can act as though they didn't have a
  capacity limit (technically, they do, but it is absurdly large and I
  can't see it being an issue in real workloads).
- Switched from `xmalloc()` to `xcalloc()` in a couple of places to
  ensure the `scanner_t` struct fields all get initialized to zero. This
  is important so that we don't end up trying to `munmap()`
  uninitialized fields like `buffer` when using the `scanner_new_copy()`
  form of scanner.
- The Watchman scanner continues to use `scanner_new_str()`, which is
  the non-copying version of the scanner. We don't want to call
  `munmap()` on those candidates, because they were allocated with
  `malloc(); I considered flagging this by setting `candidates_size` to
  zero, but given that there was no performance penalty detected by the
  benchmark run (see below), I opted to instead teach the Watchman
  scanner to use `xmap()` too.

Perf delta info follows. Nothing to see here. The benchmarks use
`scanner_new_copy()`, so we are indeed exercising the `mmap()`-ed code
path:

                      best    avg      sd     +/-     p    (best)    (avg)      (sd)     +/-   p
       pathological 0.28685 0.29972 0.11067 [-0.6%]      (0.28685) (0.29980) (0.11089) [-1.4%]
          command-t 0.23675 0.24726 0.08370 [-0.1%]      (0.23675) (0.24759) (0.08559) [-1.2%]
  chromium (subset) 1.63047 1.64562 0.10958 [+0.7%] 0.05 (0.42029) (0.43740) (0.19917) [+2.4%]
   chromium (whole) 1.88771 1.90492 0.09161 [+0.1%]      (0.30267) (0.31403) (0.07257) [+1.6%]
         big (400k) 2.91391 2.95510 0.14608 [+0.2%]      (0.45750) (0.47420) (0.07242) [+0.8%]
              total 6.96067 7.05262 0.29309 [+0.2%]      (1.70912) (1.77301) (0.28325) [+0.7%]

Differences from the PR:

- That one leans heavily on Ruby-level stuff (ie. both in the calling
  code, and in the C layer) whereas this one is "pure" C. For example,
  the `popen` call in the PR is made from Ruby, and lots of Ruby values
  are passed back and forth. There are calls back into the Ruby layer to
  do filtering that are not implemented here (and I rather like the idea
  of keep things in "pure" C, so they never be).
- There are lots of features in the PR that aren't implemented yet (and
  again, I haven't decided when/if I will implement them): things like
  control over the max files count, ability to pass in a `drop` value to
  skip over entries, ability to specify the field separator character etc.
wincent added a commit that referenced this pull request Jul 16, 2022
To reflect indirect contribution referenced in the parent commit.

Cross-referencing again:

    #253
@wincent
Copy link
Owner

wincent commented Jul 16, 2022

Just to "close" the loop here after many years 🤦, over on the pu branch I've been working on a basically from-scratch Command-T rewrite:

  • Targeting Neovim exclusively instead of Vim.
  • Plug-in code in Lua instead of Vimscript.
  • C library is pure POSIX with no references to Ruby VM, or even Lua.
  • Instead of calling in to C from Ruby, we call into C from LuaJIT using the FFI.

The basic idea is to reset the whole thing onto some more modern underpinnings. Compatibility with Windows or Vim, for example, aren't on the priority list at this stage — just getting the thing to work for simple cases is the focus (without necessarily even many configuration options; I just want to pick some "reasonable defaults" and see them work first).

My C is super rusty at this point (although it is becoming less so as I proceed with the rewrite), and while it is far from finished, it is showing a nice performance boost already. I started by porting the core matching/scoring code, then I ported the Watchman scanner. The numbers showed a roughly 2x speed-up on some of the metrics, even though the performance critical sections were still only going from C to C. Last night, I took the initial steps towards implementing the "C scanner infrastructure" proposed in this PR; I haven't tested it rigorously yet, and it doesn't have all the features or configuration options, but it works well enough to be considered a "blinking light" demo at least. This PR obviously won't apply to pu because the basis is drastically different (in addition to porting from "Ruby C" to POSIX C", I have done quite a bit of "clean up" and restructuring along the way), but what I did bring across is the idea of mmap()-ing a large slab and read()-ing into from an external process.

Here are some matcher numbers (I haven't set up scanner benchmarks yet, but I will). As always, CPU time is rendered without parens, wall-clock time (the actually interesting time) is rendered with parens. So, for example, we see average wall clock time for the "big" dataset go from 0.66s in Ruby to 0.47s in Lua. A smaller dataset like the "command-t" one goes from 0.48s in Ruby to 0.24s in Lua. The "pathological" dataset goes from 0.54s to 0.29s etc.

Ruby (main)

                    best    avg      sd     +/-   p   (best)    (avg)      (sd)     +/-   p
     pathological 0.53443 0.54394 0.00720 [-1.1%]   (0.53443) (0.54404) (0.00728) [-1.2%]
        command-t 0.47690 0.48567 0.00358 [-0.4%]   (0.47690) (0.48569) (0.00359) [-0.5%]
chromium (subset) 1.97640 1.98757 0.00640 [-0.1%]   (0.48688) (0.48923) (0.00199) [-0.3%]
 chromium (whole) 2.70771 2.72148 0.01165 [-0.4%]   (0.42343) (0.43254) (0.00931) [-3.1%]
       big (400k) 4.27406 4.31385 0.04007 [-0.4%]   (0.64053) (0.66157) (0.02466) [-1.5%]

Lua (pu)

                    best    avg      sd     +/-     p    (best)    (avg)      (sd)     +/-   p
     pathological 0.28685 0.29972 0.11067 [-0.6%]      (0.28685) (0.29980) (0.11089) [-1.4%]
        command-t 0.23675 0.24726 0.08370 [-0.1%]      (0.23675) (0.24759) (0.08559) [-1.2%]
chromium (subset) 1.63047 1.64562 0.10958 [+0.7%] 0.05 (0.42029) (0.43740) (0.19917) [+2.4%]
 chromium (whole) 1.88771 1.90492 0.09161 [+0.1%]      (0.30267) (0.31403) (0.07257) [+1.6%]
       big (400k) 2.91391 2.95510 0.14608 [+0.2%]      (0.45750) (0.47420) (0.07242) [+0.8%]
            total 6.96067 7.05262 0.29309 [+0.2%]      (1.70912) (1.77301) (0.28325) [+0.7%]

Next steps

pu is still a long way from being finished or usable; it builds, and you can test the scanning and the matching, but I haven't even built a proper UI for selecting results yet. So my next steps are going to be tidying a lots of loose ends (TODOs sprinkled all over the place), getting the UI to the point where you can actually use the Lua-powered version as a replacement for the Ruby-powered one, and then slowly evaluating which additional features/settings to bring forward. This PR is never going to see pu itself, but I still wanted to thank you, @kevincox, for submitting it — the ideas in the PR may live on in a new form, even if the original patch does not!

@kevincox
Copy link
Author

That is exciting 🎉

As for this patch I no longer work with a large repo so have stopped using it myself. So I'll close it, of course the code will be archived in case it is interesting to anyone.

If you want a second pair of eyes on the new patch feel free to ask for a review. But I'm afraid my C is growing rusty as well these days.

@kevincox kevincox closed this Jul 16, 2022
Padmamanickam added a commit to Padmamanickam/command-t that referenced this pull request Feb 6, 2023
Takes the idea from this 2016 PR that never got merged:

    wincent/command-t#253

of creating a large `mmap()`-backed allocation and having the scanner
read into it from an external process. This is by no means finished yet,
but I wanted to get it to the point where it runs, and it does run.

Key changes to the scanner:

- New `scanner_new_command()` function that takes a script command
  string and runs it, capturing the results. The output is stored in an
  `mmap()`-ed buffer, which is retained for the lifetime of the scanner.
  Associated `str_t` candidates also live in an `mmap()`-ed region.
  Their `contents` pointers point directly into the buffer, so there is
  no additional string copying.
- The existing `scanner_new_copy()` is updated to use an `mmap()`-backed
  allocation as well, just for simplicity (so I don't have to maintain
  book-keeping data-structures for `free()`-ing `malloc()`-ed memory
  blocks and `munmap()`-ing `mmap()`-ed ones). I will include benchmarks
  here to "prove" that this doesn't seem to have had any unanticipated
  performance effects.
- Deleted `scanner_new()` because it is was unused. As the only user of
  the `capacity` feature, I got rid of the `capacity` field. Because
  scanners use `mmap()` now, they can act as though they didn't have a
  capacity limit (technically, they do, but it is absurdly large and I
  can't see it being an issue in real workloads).
- Switched from `xmalloc()` to `xcalloc()` in a couple of places to
  ensure the `scanner_t` struct fields all get initialized to zero. This
  is important so that we don't end up trying to `munmap()`
  uninitialized fields like `buffer` when using the `scanner_new_copy()`
  form of scanner.
- The Watchman scanner continues to use `scanner_new_str()`, which is
  the non-copying version of the scanner. We don't want to call
  `munmap()` on those candidates, because they were allocated with
  `malloc(); I considered flagging this by setting `candidates_size` to
  zero, but given that there was no performance penalty detected by the
  benchmark run (see below), I opted to instead teach the Watchman
  scanner to use `xmap()` too.

Perf delta info follows. Nothing to see here. The benchmarks use
`scanner_new_copy()`, so we are indeed exercising the `mmap()`-ed code
path:

                      best    avg      sd     +/-     p    (best)    (avg)      (sd)     +/-   p
       pathological 0.28685 0.29972 0.11067 [-0.6%]      (0.28685) (0.29980) (0.11089) [-1.4%]
          command-t 0.23675 0.24726 0.08370 [-0.1%]      (0.23675) (0.24759) (0.08559) [-1.2%]
  chromium (subset) 1.63047 1.64562 0.10958 [+0.7%] 0.05 (0.42029) (0.43740) (0.19917) [+2.4%]
   chromium (whole) 1.88771 1.90492 0.09161 [+0.1%]      (0.30267) (0.31403) (0.07257) [+1.6%]
         big (400k) 2.91391 2.95510 0.14608 [+0.2%]      (0.45750) (0.47420) (0.07242) [+0.8%]
              total 6.96067 7.05262 0.29309 [+0.2%]      (1.70912) (1.77301) (0.28325) [+0.7%]

Differences from the PR:

- That one leans heavily on Ruby-level stuff (ie. both in the calling
  code, and in the C layer) whereas this one is "pure" C. For example,
  the `popen` call in the PR is made from Ruby, and lots of Ruby values
  are passed back and forth. There are calls back into the Ruby layer to
  do filtering that are not implemented here (and I rather like the idea
  of keep things in "pure" C, so they never be).
- There are lots of features in the PR that aren't implemented yet (and
  again, I haven't decided when/if I will implement them): things like
  control over the max files count, ability to pass in a `drop` value to
  skip over entries, ability to specify the field separator character etc.
Padmamanickam added a commit to Padmamanickam/command-t that referenced this pull request Feb 6, 2023
To reflect indirect contribution referenced in the parent commit.

Cross-referencing again:

    wincent/command-t#253
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants