Match filename rather than paths #73

kimt33 · 2017-11-19T15:10:56Z

Split the given path into its components and compare each component
with the given pattern (re.split is used)
Add exception for double backslash (would cause problems with split)
Add tests and documentation
Little rearrangement

1. Split the given path into its components and compare each component with the given pattern (re.split is used) 2. Add exception for double backslash (would cause problems with split) 3. Add tests and documentation 4. Little rearrangement

codecov · 2017-11-19T15:18:18Z

Codecov Report

Merging #73 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #73      +/-   ##
==========================================
+ Coverage   98.13%   98.16%   +0.03%     
==========================================
  Files          17       17              
  Lines         642      655      +13     
  Branches       93       94       +1     
==========================================
+ Hits          630      643      +13     
  Misses         12       12

Impacted Files	Coverage Δ
cardboardlint/tests/test_common.py	`100% <100%> (ø)`	⬆️
cardboardlint/common.py	`100% <100%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed36bcb...aa56061. Read the comment docs.

tovrstra · 2017-11-20T09:45:50Z

Good idea but we need to work out a few more details.

Can you explain the purpose of the double backslash?

Some of the tests are not really in line with expected behavior on a bash terminal, e.g. I would expect
assert matches_filefilter('foo/a.py', ['+ *.py']) should fail when one tries to mimic the behavior of the glob function or bash. How do we want to deal with this? For example, it would be good to look at how rsync works because it has to handle to same problem, namely mimicking bash behavior, while also adding a few extra features where bash falls short for file filtering. See https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/rsync.1.html and scroll down to the section "INCLUDE/EXCLUDE PATTERN RULES".

The best way to implement this is to translate the pattern into one regular expression that works on the entire path name and gives a match or not. This is an example implementation of that idea: https://github.com/metagriffin/globre

tovrstra · 2017-11-20T09:47:42Z

P.S. When you put "Fixes #72" in the first message of your PR, that issue will get closed automatically when the PR is merged. This is better than putting it in the PR title because that only adds confusion, namely having two numbers appearing on the title line.

tovrstra · 2017-11-20T09:49:38Z

Oh, something we shouldn't forget: when we merge a change like this, we should do it at a moment where we can fix all the repositories that rely on the current behavior.

kimt33 · 2017-11-20T17:14:15Z

Can you explain the purpose of the double backslash?

I thought the backslash will cause some problems later on for someone that would use a character with a backslash. I mainly thought of it for people that would use spaces in the filename, like file\ name.py, but on second thought, this would probably get passed as a string so it would look more like file name.py. Still, if we allow backslash to be used, then the case of '\' + os.sep should be considered, which should be treated differently than just os.sep.

So then the question is to support it or not because these crazy names violate some sort of PEP or law of nature. I thought it should be supported because these names should be caught (hopefully) by some linter anyways and the functionality of the method is to filter out names with the given pattern (and not to judge the user for their unconventional naming practices).

The complexity introduced to the code wasn't that terrible (i.e. using regular expression) in exchange for slight generality, so I thought it was okay.

kimt33 · 2017-11-20T17:17:03Z

P.S. When you put "Fixes #72" in the first message of your PR, that issue will get closed automatically when the PR is merged. This is better than putting it in the PR title because that only adds confusion, namely having two numbers appearing on the title line.

Thank you, it does make things a bit ugly. I thought that adding the issue number will link the PR to the issue but that can be done somewhere in the text and not in the title.

kimt33 · 2017-11-21T15:06:49Z

I like the *.py format because it doesn't assume anything about the package structure (like gitignore). Normally, I would need to check all python files, and maybe skip over some files within certain directories. If I go up to three/four levels of subdirectory, then I would need to include multiple patterns that more or less do the same thing. And I would need to change them everytime I move something around.

I just thought that since statements like + *.py, + */*.py, + */*/*.py are impractical, we should use *.py to describe them all. If we use *.py format to search the ends of the path, then we should also be able to use test_*.py.

I just checked the rsync documentation and I think they do the same thing (I think):

if the pattern starts with a / then it is anchored to a particular spot in the hierarchy of files, otherwise it is matched against the end of the pathname.

Even the unanchored "sub/foo" would match at any point in the hierarchy where a "foo" was found within a directory named "sub".

kimt33 · 2017-11-30T06:18:01Z

@tovrstra I remembered the purpose of the double backslash. The code works by dividing up the given path (as a string) into the contributing directories (and subdirectories) and/or the filename. The path is "decomposed" by splitting it by the os.sep, which I'll denote / for the sake of argument. Then, /dir/sub/file should be divided into '', 'dir', 'sub', and 'file'.

However, if the directory name or the filename contains the os.sep, e.g. '2015/6/2.txt', then the code will misinterpret the name as a collection of subdirectories. Since this name will be passed in as 2015\/6\/2.txt, we use the regular expression to split using / only when it is not preceded by \.

However, if a directory name ends with a double backslash, e.g. why\\, then all the names that succeed it will be misinterpreted within this scheme. For example, the file not.txt in the directory why\\ will be described as why\\/not.txt. Since \/ are excluded, the code will interpret this as a single file rather than a file in a directory. To avoid these cases (which can be made even more complex with more backslashes), we raise an error if there are any double backslashes.

tovrstra · 2017-11-30T13:07:20Z

ok. I understand. Thanks for explaining.

kimt33 force-pushed the filefilter_path branch from ab47b53 to aa56061 Compare November 19, 2017 15:15

kimt33 changed the title ~~Match filename rather than paths (Issue #72)~~ Match filename rather than paths Nov 22, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Match filename rather than paths #73

Match filename rather than paths #73

kimt33 commented Nov 19, 2017

codecov bot commented Nov 19, 2017 •

edited

Loading

tovrstra commented Nov 20, 2017

tovrstra commented Nov 20, 2017

tovrstra commented Nov 20, 2017

kimt33 commented Nov 20, 2017

kimt33 commented Nov 20, 2017

kimt33 commented Nov 21, 2017 •

edited

Loading

kimt33 commented Nov 30, 2017 •

edited

Loading

tovrstra commented Nov 30, 2017

Match filename rather than paths #73

Are you sure you want to change the base?

Match filename rather than paths #73

Conversation

kimt33 commented Nov 19, 2017

codecov bot commented Nov 19, 2017 • edited Loading

Codecov Report

tovrstra commented Nov 20, 2017

tovrstra commented Nov 20, 2017

tovrstra commented Nov 20, 2017

kimt33 commented Nov 20, 2017

kimt33 commented Nov 20, 2017

kimt33 commented Nov 21, 2017 • edited Loading

kimt33 commented Nov 30, 2017 • edited Loading

tovrstra commented Nov 30, 2017

codecov bot commented Nov 19, 2017 •

edited

Loading

kimt33 commented Nov 21, 2017 •

edited

Loading

kimt33 commented Nov 30, 2017 •

edited

Loading