This repository was archived by the owner on Oct 29, 2024. It is now read-only.

Conversation

Contributor

@artms artms commented Nov 3, 2017

When the Phabricator Jenkins plugin parses a coverage file, it checks each covered file on the slave to find which source directory contains it. This can be expensive if you have a large code base, a large number of covered source directories, or high latency between master and slave (e.g. on another coast); this actually happened with one project, where such a "check" was taking 1.5h to complete.
Instead of checking individual files, we collect all the needed files and make a bulk request. PathResolverChooseMultiCallable is serialized by Jenkins and sent to the slave, which does the check efficiently and returns the result. This might cause extreme memory usage for an extremely large code base; if that happens, the bulk request will have to be chunked.
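A minimal sketch of the bulk-resolution idea, assuming Jenkins' MasterToSlaveFileCallable / FilePath.act remoting API. The class name BulkChooseCallable and its exact shape are illustrative; the PR's actual implementation is PathResolverChooseMultiCallable:

    // Illustrative bulk resolver: one remoting round trip instead of one per file.
    // (BulkChooseCallable is a hypothetical name; the PR's class is PathResolverChooseMultiCallable.)
    import hudson.FilePath;
    import hudson.remoting.VirtualChannel;
    import jenkins.MasterToSlaveFileCallable;

    import java.io.File;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class BulkChooseCallable extends MasterToSlaveFileCallable<Map<String, String>> {
        private final List<String> sourceDirs; // candidate source roots, relative to the workspace
        private final List<String> fileNames;  // every file name mentioned in the coverage report

        BulkChooseCallable(List<String> sourceDirs, List<String> fileNames) {
            this.sourceDirs = sourceDirs;
            this.fileNames = fileNames;
        }

        @Override
        public Map<String, String> invoke(File workspace, VirtualChannel channel) throws IOException {
            // Runs on the slave: check every file against every candidate root locally.
            Map<String, String> resolved = new HashMap<String, String>();
            for (String fileName : fileNames) {
                for (String dir : sourceDirs) {
                    if (new File(new File(workspace, dir), fileName).isFile()) {
                        resolved.put(fileName, dir);
                        break;
                    }
                }
            }
            return resolved;
        }
    }

    // Master side: a single act() call resolves every file at once.
    // Map<String, String> roots = workspace.act(new BulkChooseCallable(sourceDirs, fileNames));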

@coveralls

Coverage Status

Coverage increased (+0.2%) to 90.799% when pulling ad43cd2 on artms:bulk_file_check into 6cfd526 on uber:master.

NodeList classes = entry.getKey();
List<String> sourceDirs = entry.getValue();

// Collect all filenames in coverage report
Contributor


We can actually make this a lot more efficient.

Phab already knows the files affected as part of the diff, in the same spirit as #151.

We can just collect file names from what's modified, check state only on those files, and ignore the rest of the files in the coverage report. That will mitigate any large memory concerns as well.
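Roughly what that suggestion looks like in code (a hypothetical sketch; how the set of changed files is obtained from the Phabricator differential is assumed here, not taken from the plugin):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    import org.apache.commons.io.FilenameUtils;
    import org.w3c.dom.NodeList;

    class CoverageFileFilter {
        private static final String NODE_FILENAME = "filename"; // Cobertura class attribute

        // Keep only coverage entries whose file name is touched by the diff.
        static List<String> fileNamesTouchedByDiff(NodeList classes, Set<String> changedFiles) {
            List<String> fileNames = new ArrayList<String>();
            for (int i = 0; i < classes.getLength(); i++) {
                String fileName = classes.item(i).getAttributes()
                        .getNamedItem(NODE_FILENAME).getTextContent();
                if (changedFiles.contains(FilenameUtils.getName(fileName))) {
                    fileNames.add(fileName);
                }
            }
            return fileNames;
        }
    }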

Contributor Author


@kageiit - if we check only modified files, won't we miss files which are covered by new tests added to the code base? E.g. if I simply add a new unit test which tests some real code without changing that code itself, we will not even calculate its line coverage?

Contributor

@kageiit kageiit Nov 3, 2017


We don't need to calculate the coverage for the main file. We just need aggregates to report back, which we can still do without checking whether the file exists. Processing line coverage for every line will still be useless, as we cannot show it in the UI.

List<String> sourceDirs = entry.getValue();

// Collect all filenames in coverage report
List<String> fileNames = new LinkedList<String>();


Drive-by comment: from what I've read recently, either ArrayList or ArrayDeque is almost always better than LinkedList; this is now checked by ErrorProne
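For reference, the change being asked for (illustrative): an ArrayList avoids LinkedList's per-element node allocations and has better locality when the list is iterated later.

    // ArrayList instead of LinkedList for the collected file names.
    List<String> fileNames = new ArrayList<String>();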

Contributor Author


Thanks, fixed

@coveralls

Coverage Status

Coverage increased (+0.2%) to 90.799% when pulling 8fc0ff7 on artms:bulk_file_check into 6cfd526 on uber:master.

Map<String, String> detectedSourceRoots = new PathResolver(workspace, sourceDirs).choose(fileNames);

// Loop over all files in the coverage report
for (int i = 0; i < classes.getLength(); i++) {
Contributor


can replace this for loop with for(fileName: fileNames)

Contributor Author

@artms artms Nov 7, 2017


We need the XML nodes:

            // Loop over all files in the coverage report
            for (int i = 0; i < classes.getLength(); i++) {
                Node classNode = classes.item(i);
                String fileName = classNode.getAttributes().getNamedItem(NODE_FILENAME).getTextContent();

                if (includeFileNames != null && !includeFileNames.contains(FilenameUtils.getName(fileName))) {
                    continue;
                }

                fileName = join(detectedSourceRoots.get(fileName), fileName);

                SortedMap<Integer, Integer> hitCounts = internalCounts.get(fileName);
                if (hitCounts == null) {
                    hitCounts = new TreeMap<Integer, Integer>();
                }

                NodeList children = classNode.getChildNodes();

Can you be more specific?

Contributor


I'm saying we are looping twice over the same things. There is an opportunity to DRY this up.

Since we already know the filenames we care about from the previous loop, we don't have to repeat the same logic again.

String detectedSourceRoot = new PathResolver(workspace, sourceDirs).choose(fileName);
fileName = join(detectedSourceRoot, fileName);
// Loop over all files which are needed for coverage report
for (int i = 0; i < fileNames.size(); i++) {
Contributor Author


@kageiit - now I'm not looping through the same nodes again, although it requires us to collect the childNodes during the first pass for this loop to work.
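A rough sketch of the resulting single-pass shape, reusing the names from the diff above; the parallel childNodes list that carries the cached nodes between the two phases is illustrative:

    // First pass over the XML: record both the file name and its child nodes,
    // so nothing has to be re-read from the DOM later.
    List<String> fileNames = new ArrayList<String>();
    List<NodeList> childNodes = new ArrayList<NodeList>();
    for (int i = 0; i < classes.getLength(); i++) {
        Node classNode = classes.item(i);
        String fileName = classNode.getAttributes().getNamedItem(NODE_FILENAME).getTextContent();
        if (includeFileNames != null && !includeFileNames.contains(FilenameUtils.getName(fileName))) {
            continue;
        }
        fileNames.add(fileName);
        childNodes.add(classNode.getChildNodes());
    }

    // One bulk remoting round trip resolves every source root at once.
    Map<String, String> detectedSourceRoots = new PathResolver(workspace, sourceDirs).choose(fileNames);

    // Second loop only touches the files we kept, using the cached child nodes.
    for (int i = 0; i < fileNames.size(); i++) {
        String fileName = join(detectedSourceRoots.get(fileNames.get(i)), fileNames.get(i));
        NodeList children = childNodes.get(i);
        // ... accumulate per-line hit counts from children for fileName ...
    }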

@coveralls

Coverage Status

Coverage increased (+0.2%) to 90.793% when pulling 8a2362c on artms:bulk_file_check into 6cfd526 on uber:master.

@kageiit kageiit merged commit 5f39fa8 into uber-archive:master Nov 10, 2017