Skip to content

Fix #12821 (GUI: exclude folders in imported project) #7645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

glankk
Copy link
Collaborator

@glankk glankk commented Jul 6, 2025

No description provided.

@glankk glankk force-pushed the 12821 branch 3 times, most recently from f29f42d to 572dedf Compare July 6, 2025 07:00
@firewave
Copy link
Collaborator

firewave commented Jul 6, 2025

See #7540. I had some issues with the behavior which I have not outlined in that PR yet because the code contains a bug which made the tests all pass although that should not have.

Also std::regex has abysmal performance and no code should ever use it. Thus we should not rely on it's interface. See #6211 a wrapper for our existing regex. That is complete and working but has a memory leak. I think the leak should be ignored for now since it is intermediate code anyways. I had a local branch which replaces the implement with std::regex as another intermediate step to see if our wrapper wi8ll work with other implementation.

@@ -157,7 +157,7 @@ def test_gui_project_loads_relative_vs_solution_2(tmp_path):
def test_gui_project_loads_relative_vs_solution_with_exclude(tmp_path):
proj_dir = tmp_path / 'proj2'
shutil.copytree(__proj_dir, proj_dir)
create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b'])
create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b/'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dislike that we need to rely on some options ending with / so they are treated as a path. That also complicated things while working on my PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think the path matching is currently trying to do too much which makes the syntax weird and difficult to implement. Ideally I would like it to make it simpler but I've been hesitant to do so because I don't know what might break because of it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I would like it to make it simpler but I've been hesitant to do so because I don't know what might break because of it.

I dislike it also.. but it is good imho that you try to limit the effects for now.. it's already a big PR anyway.

@@ -450,7 +450,7 @@ class TestImportProject : public TestFixture {
project.fileSettings = {std::move(fs1), std::move(fs2)};

project.ignorePaths({"*foo", "bar*"});
ASSERT_EQUALS(2, project.fileSettings.size());
ASSERT_EQUALS(1, project.fileSettings.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this match less? That obviously looks like a bug.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because * was changed to match any number of characters, including zero. I don't really have a strong opinion on that, but that's how globs usually behave.

if (caseSensitive)
mRegex = std::regex(regex_string, std::regex_constants::extended);
else
mRegex = std::regex(regex_string, std::regex_constants::extended | std::regex_constants::icase);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to print the regular expression and its respective input with the --debug-ignore option.

@firewave
Copy link
Collaborator

firewave commented Jul 6, 2025

See also https://trac.cppcheck.net/ticket/12268. And if you work on a ticket please assign yourself to it in Trac.

@glankk
Copy link
Collaborator Author

glankk commented Jul 7, 2025

See #7540. I had some issues with the behavior which I have not outlined in that PR yet because the code contains a bug which made the tests all pass although that should not have.

Also std::regex has abysmal performance and no code should ever use it. Thus we should not rely on it's interface. See #6211 a wrapper for our existing regex. That is complete and working but has a memory leak. I think the leak should be ignored for now since it is intermediate code anyways. I had a local branch which replaces the implement with std::regex as another intermediate step to see if our wrapper wi8ll work with other implementation.

I only noticed you were also working on PathMatch after I made this, sorry about that.

I agree the standard regex is slow and I'd rather use pcre, but I didn't want to make it a hard dependency just for this. For the common use case of PathMatch, which is matching maybe a few hundred files or so against a handful of patterns, the performance penalty of using std::regex instead is on the order of milliseconds per run. Having a backend-agnostic regex wrapper sounds like a good idea though.

@firewave
Copy link
Collaborator

firewave commented Jul 7, 2025

I only noticed you were also working on PathMatch after I made this, sorry about that.

No problem. I started working on it thinking it would a low-hanging fruit which wasn't the case. And I forgot to tag it with the ticket. And the one you worked on was no properly triaged.

I agree the standard regex is slow and I'd rather use pcre, but I didn't want to make it a hard dependency just for this.

Right, PCRE is currently not a hard dependency. So PCRE2 might not be a good replacement.

For the common use case of PathMatch, which is matching maybe a few hundred files or so against a handful of patterns, the performance penalty of using std::regex instead is on the order of milliseconds per run.

That might depend on the scenario. If there might be thousands of files it could be slow. It has been ages since I tried using std::regex but it was it beyond unacceptable I doubt it was improved upon (it seems like it was completely abandoned and nobody has the balls to suggest to remove it from the standard). PCRE feels like a nop.

Having a backend-agnostic regex wrapper sounds like a good idea though.

I will try to get the regex refactoring in. The first step is awkward but with some help we might get it into better shape fast.

@firewave
Copy link
Collaborator

firewave commented Jul 7, 2025

With a glob implementation inline in PathMatch we now have two different implementations. The other is matchglob() in utils.cpp. We should only have a single one.

That is probably where my performance concerns might have come from since the latter is also used outside of path handling stuff.

@glankk
Copy link
Collaborator Author

glankk commented Jul 8, 2025

With a glob implementation inline in PathMatch we now have two different implementations. The other is matchglob() in utils.cpp. We should only have a single one.

That is probably where my performance concerns might have come from since the latter is also used outside of path handling stuff.

I mostly agree with this, for the three places where matchglob is currently used for something other than paths it makes sense to keep it that way, all other uses of matchglob should be changed to use PathMatch instead.

@firewave
Copy link
Collaborator

firewave commented Jul 8, 2025

I mostly agree with this, for the three places where matchglob is currently used for something other than paths it makes sense to keep it that way, all other uses of matchglob should be changed to use PathMatch instead.

I agree since it makes things more explicit. But I would still prefer if we only have a single implementation of the glob logic.

if (Path::isAbsolute(path))
p = Path::simplifyPath(path);
else
p = Path::simplifyPath(Path::getCurrentPath() + "/" + path);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling it would be good to create a member mCurrentPath that is used instead of the function call. It will be good for unit testing purposes. And avoids repeated getcwd system calls.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PathMatch already has mWorkingDirectories which also uses CWD. That should be configured from the outside. The implementation should best be filesystem-agnostic (not taking the flawed case-sensitivity handling into mind right now).

@glankk glankk force-pushed the 12821 branch 5 times, most recently from 417d477 to 692cb54 Compare July 10, 2025 13:25
@glankk glankk force-pushed the 12821 branch 2 times, most recently from 8819409 to 3348484 Compare July 11, 2025 12:39
@glankk glankk marked this pull request as ready for review July 11, 2025 16:01
@glankk
Copy link
Collaborator Author

glankk commented Jul 11, 2025

I've updated this to use PathMatch everywhere a pattern is matched against a file, so that file matching is consistent. While doing so I found that there are places where the performance of PathMatch is actually quite important (especially in suppressions), so I've opted to use a hand-written matcher instead of regex.

Another noteworthy change is that directories can now be matched without a trailing '/'. This has the awkward side-effect that 'foo.cpp' matches 'foo.cpp/bar.cpp' when 'foo.cpp' happens to be a directory, and 'foo/' matches 'foo' even if foo is not a directory (enforcing directories for such patterns would require a file system lookup), but I still think it's preferable.

Regarding matchglob, I think it makes sense to have a more general wildcard implementation in matchglob, and a separate implementation in PathMatch because I'm not convinced that the rules for path globs should be the same as for matching general wildcards in error id's and such. Path globs need to match on path component boundaries, and '*' and '?' do not match path separators. For matching path separators as well, '**' is used instead.

These changes fix #12821, #12268, and #13997. Possibly related: #13983.

{
const std::string& nativePath = toNativeSeparators(path);
return c == '/' || c == '\\';
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm techically in linux, backslash is not a path separator. you can create a directory or file with backslash in the name.

case '*': {
bool slash = false;
++s;
if (*s == '*') {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a change of logic that will probably mean that lots of users will have to adapt their configurations. We have documented that a single star matches path separators in some places I believe so it would make sense to look through documentation .. But overtime I have the feeling it is better to use **.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would make sense to look through documentation

I tried to quickly locate some such documentation and did not see it.

But adding a note about it in the releasenotes.txt wouldn't hurt at least.

*/
explicit PathMatch(std::vector<std::string> paths, bool caseSensitive = true);
enum class Mode : std::uint8_t {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ehm.. Writing a enum is better. But not sure about the names here. I think I'd prefer to not abbreviate. I don't have a clear idea about what names I would prefer instead of Mode,icase,scase.

static bool isRelativePattern(const std::string &pattern)
{
if (pattern.empty() || pattern[0] != '.')
return false;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spontanously I feel that file1.c should be a "relative pattern"?

@@ -157,7 +157,7 @@ def test_gui_project_loads_relative_vs_solution_2(tmp_path):
def test_gui_project_loads_relative_vs_solution_with_exclude(tmp_path):
proj_dir = tmp_path / 'proj2'
shutil.copytree(__proj_dir, proj_dir)
create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b'])
create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b/'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I would like it to make it simpler but I've been hesitant to do so because I don't know what might break because of it.

I dislike it also.. but it is good imho that you try to limit the effects for now.. it's already a big PR anyway.

@@ -3345,7 +3345,7 @@ class TestCmdlineParser : public TestFixture {
const char * const argv[] = {"cppcheck", "-isrc\\file.cpp", "src/file.cpp"};
ASSERT(!fillSettingsFromArgs(argv));
ASSERT_EQUALS(1, parser->getIgnoredPaths().size());
ASSERT_EQUALS("src/file.cpp", parser->getIgnoredPaths()[0]);
ASSERT_EQUALS("src\\file.cpp", parser->getIgnoredPaths()[0]);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho it is preferable to always use / as path separator internally. we don't have to ensure that backslash is handled properly everywhere then. would it cause problems to keep this behavior?

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants