Skip to content

Files/FileList: adding the same file twice should not increment FileList::$numFiles #1150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rodrigoprimo
Copy link
Contributor

Description

This PR fixes the FileList::__construct() and FileList::addFile() methods to ensure that, when there is an attempt to add the same file twice, the FileList::$numFiles variable is not incremented. This behavior was causing a fatal error when using the --parallel option and passing the same file twice (see #1113).

The code was already handling duplicates correctly in the sense that duplicated files were not added to FileList::$files. However, FileList::$numFiles was being incorrectly incremented.

This PR also includes some basic tests for FileList::__construct() and FileList::addFile() in separate commits.

Suggested changelog entry

Fixed: fatal error when using the --parallel option and passing the same file twice

Related issues/external references

Fixes #1113

Additional notes

There is some duplicated logic in FileList::__construct() and FileList::addFile(). I considered refactoring the duplicated code to a private method before adding the check that is added in this commit. I decided not to do it as there are some subtle differences between the logic in the two methods that make the refactor not straightforward.

FileList::__construct() always sets the value of an entry in the FileList::$files to null and the key to whatever is returned by SplFileInfo::getPathname(). While FileList::addFile() sets the value of an entry in the FileList::$files to null or an object passed to the method, and the key to the path passed to the method.

I decided to leave this refactor to remove the duplication to the future and focus this commit on fixing the issue with handling duplicated files. I believe it will be necessary to write additional tests for this class and also incorporate more defensive code.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
    • This change is only breaking for integrators, not for external standards or end-users.
  • Documentation improvement

PR checklist

  • I have checked there is no other PR open for the same change.
  • I have read the Contribution Guidelines.
  • I grant the project the right to include and distribute the code under the BSD-3-Clause license (and I have the right to grant these rights).
  • I have added tests to cover my changes.
  • I have verified that the code complies with the projects coding standards.
  • [Required for new sniffs] I have added XML documentation for the sniff.

Copy link
Member

@jrfnl jrfnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rodrigoprimo, nice work!

I've left some small comments and questions in various places, nothing really blocking.

Only real question I have is with regards to the solution applied in the third commit versus the suggestion I left in #1113 (comment)

I've now started wondering if $numFiles shouldn't be set just once at the end of the __construct() method via a count($this->files) instead of doing the incrementing whenever a new file is being added... ? I'd expect that would solve the issue too.

While I can see that, as the addFile() method is public, this wouldn't solve anything for addFile() when not called from the constructor, I do imagine the above suggestion could simplify the code change for the constructor itself.
Was there a particular reason why you went with a different solution ?

/**
* Tests for the \PHP_CodeSniffer\Files\FileList::addFile method.
*
* @copyright 2019-2025 PHPCSStandards Contributors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the "2019-2025" come from ? (here and elsewhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the line @copyright 2019-2024 PHPCSStandards Contributors from another file and, without thinking too much about it, updated it to 2025 since this is the current year.

It is not clear to me if the years apply to just this file or to the whole repository. If it is just this file, should it be?

@copyright 2025 PHPCSStandards Contributors

I can see that phpDocumentor leaves the format of the description of the @copyright tag to each individual project (https://docs.phpdoc.org/guide/references/phpdoc/tags/copyright.html). Maybe this is documented somewhere in this repo, and I missed it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me if the years apply to just this file or to the whole repository. If it is just this file, should it be?

Well, it's a bit messy in this repository, so generally speaking, you can assume it applies to the specific file. I still intend to look into streamlining this at some point, but there may be legal implications of doing that, so something I want to be careful with.

@copyright 2025 PHPCSStandards Contributors is what I would have expected for this new file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I opted to make this change in two separate commits (one for ConstructTest.php and FilterDouble.php, and another for AddFileTest.php) to hopefully make it easier when squashing these changes into the original commits.


$this->assertCount(count($expectedFiles), $fileList, 'File count mismatch');

$i = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the $i have value in the error messages ? I mean, the array is being sorted by key anyway and the $expectedFiles does not have explicit keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea is that it would help identify which file caused the assertion to fail by providing the index of the corresponding value in the $expectedFiles array. But I can see that this is not very straightforward. Perhaps it would be better to modify the error message to include the $filePath instead of $i??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea. Sounds more descriptive to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* Setup objects needed for the tests.
*
* @before
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@before or could/should this be @beforeClass ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a vague memory of us discussing that creating the Ruleset class in the tests has a performance impact. That is why I went with the mock of this class. I don't think I'm able to create a mock of Ruleset if I use @beforeClass.

That being said, I'm not sure what is better: creating an instance of ConfigDouble and a mock of Ruleset for every single test, or creating an instance of ConfigDouble and an instance of Ruleset once for all the tests.

Is this more or less what you had in mind with this question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering you need both for the data provider anyway.... I imagine there is some optimization which can now be applied....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure if I understand what you're hinting at, but I gave it a try and added two new commits (one for ConstructTest.php and another for AddFileTest.php), making both test classes use an initializeConfigAndRuleset() method that uses the @beforeClass tag.

For AddFileTest, the Ruleset class is still instantiated twice, as initializeConfigAndRuleset() needs to be called directly inside dataAddFile().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was hinting at was that both the a beforeClass method and the data provider need to be static methods - and need to write to static properties, so the beforeClass method could check if the properties had been initialized and if not, could create the objects and the data provider could call the beforeClass method to do the same and then in both the tests as well as the data provider, the statically stored objects can be re-used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional details, Juliette! I have now implemented this change.

'STDIN' => ['fileName' => 'STDIN'],
'Regular file with file object' => [
'fileName' => 'test1.php',
'fileObject' => new stdClass(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should create a File object, not a stdClass object. As things are, this will break in PHPCS 4.0 when the type declarations are added (addFile(string $path, ?File $file=null)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I added a commit creating an instance of the File class. To do that, I had to instantiate both ConfigDouble and Ruleset. Let me know if you had a different approach in mind.

@rodrigoprimo
Copy link
Contributor Author

Thanks for your review, @jrfnl!

Only real question I have is with regards to the solution applied in the third commit versus the suggestion I left in #1113 (comment)

I've now started wondering if $numFiles shouldn't be set just once at the end of the __construct() method via a count($this->files) instead of doing the incrementing whenever a new file is being added... ? I'd expect that would solve the issue too.

While I can see that, as the addFile() method is public, this wouldn't solve anything for addFile() when not called from the constructor, I do imagine the above suggestion could simplify the code change for the constructor itself.
Was there a particular reason why you went with a different solution ?

I did consider your suggestion, but noticed that it did not apply to addFile(). What I failed to realize, is that I could apply it only to the __construct() method. I have now added a new commit simplifying how FileList::$numFiles is calculated in the constructor.

Besides that, I addressed all the other comments. Could you please take another look when you get a chance?

Also, before merging this PR, let me know if you prefer to squash the commits yourself or if you would like me to do it.

@rodrigoprimo
Copy link
Contributor Author

Rebased without changes to fix the failing remark test. I forgot to do it earlier for this PR.

@jrfnl
Copy link
Member

jrfnl commented Jun 30, 2025

@rodrigoprimo As I'm going to need to look at the complete thing again before merging this, feel free to reorganize the commits already when you make your final changes.

This is just an initial set of tests. It does not fully cover the
FileList::__construct() method.
This is just an initial set of tests. It does not fully cover the
FileList::addFile() method.
…List::$numFiles`

This commit fixes the `FileList::__construct()` and
`FileList::addFile()` methods to ensure that, when there is an attempt
to add the same file twice, the `FileList::$numFiles` variable is not
incremented.

The code was already handling duplicates correctly in the sense that
duplicated files were not added to `FileList::$files`. However,
`FileList::$numFiles` was being incorrectly incremented.

There is some duplicated logic in `FileList::__construct()` and
`FileList::addFile()`. I considered refactoring the duplicated code to a
private method before adding the check that is added in this commit. I
decided not to do it as there are some subtle differences between the
logic in the two methods.

`FileList::__construct()` always sets the value of an entry in the
`FileList::$files` to `null` and the key to whatever is returned by
`SplFileInfo::getPathname()`. While `FileList::addFile()` sets the value
of an entry in the `FileList::$files` to `null` or an object passed to
the method and the key to the path passed to the method.

I decided to leave this refactor to remove the duplication to the future
and focus this commit on fixing the issue with handling duplicated
files.
@rodrigoprimo rodrigoprimo force-pushed the fix-file-list-duplicate-files branch from 185e384 to 5165510 Compare June 30, 2025 18:26
@rodrigoprimo
Copy link
Contributor Author

@rodrigoprimo As I'm going to need to look at the complete thing again before merging this, feel free to reorganize the commits already when you make your final changes.

Done, this PR is now ready for a final review. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fatal error when inadvertently passing the same file twice and using parallel execution
2 participants