@sisp sisp commented May 3, 2024

I've optimized LFS path filtering as we discussed in #338.

The first optimization implements the suggestion in #338 (comment) to short-cut a single-path include filter. Here are the tests for the regex that extracts the path prefix: https://regex101.com/r/wBjHf0/1 (the extra \n on regex101.com is only there to allow one test case per line; it isn't needed in the actual regex).

The second optimization combines the regex patterns derived from the Unix filename patterns into a single pre-compiled regex (their union) and matches each path against it, which is faster than matching against each Unix filename pattern individually. It also replaces the intermediate list materialization with a streaming filter.
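For readers skimming the thread, here is a minimal sketch of the idea behind the second optimization. It is not the code in this PR: the names `compile_union` and `filter_paths` are hypothetical, and it assumes the Unix filename patterns are translated with the standard library's `fnmatch.translate` on a recent Python 3, where each translated pattern carries its flags inline and so can be joined safely with `|`.

```python
import fnmatch
import re
from typing import Iterable, Iterator, Optional


def compile_union(patterns: Iterable[str]) -> Optional[re.Pattern]:
    """Combine Unix filename patterns into one compiled regex.

    Hypothetical helper: returns None when no patterns are given.
    """
    translated = [fnmatch.translate(p) for p in patterns]
    if not translated:
        return None
    # Wrap each translated pattern in a non-capturing group and join
    # them with "|" so a single match call tests all patterns at once.
    return re.compile("|".join(f"(?:{t})" for t in translated))


def filter_paths(
    paths: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> Iterator[str]:
    """Lazily yield paths that pass the include and exclude filters."""
    include_re = compile_union(include or [])
    exclude_re = compile_union(exclude or [])
    for path in paths:
        if include_re is not None and not include_re.match(path):
            continue
        if exclude_re is not None and exclude_re.match(path):
            continue
        # Yield one path at a time; no intermediate list is built.
        yield path
```

Because `filter_paths` is a generator, callers can consume matches as they are produced, and each path costs at most two regex matches regardless of how many patterns were supplied.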

As the two commits implement independent optimizations, I intend this PR to be rebase-merged without commit squashing.

Partially fixes #338.

@shcheklein shcheklein merged commit bd11bec into treeverse:main May 7, 2024
@sisp sisp deleted the lfs/collect-objects-perf branch May 7, 2024 18:04
Development

Successfully merging this pull request may close these issues.

lfs: internal _filter_paths() function is prohibitively slow