Bulk file check on slaves #241
Conversation
…est recording when master<->slave has large latency (coast-to-coast)
NodeList classes = entry.getKey();
List<String> sourceDirs = entry.getValue();

// Collect all filenames in coverage report
We can actually make this a lot more efficient.
Phab already knows the files affected as part of the diff, in the same spirit as #151.
We can just collect the file names from what's modified, check state on only those files, and ignore the rest of the files in the coverage report. That will mitigate any large memory concerns as well.
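A rough sketch of that filtering, reusing the identifiers from this hunk (getAffectedFileNames() is a hypothetical helper standing in for however the plugin exposes the diff's file list, not actual plugin API):

Set<String> affectedFiles = new HashSet<String>(getAffectedFileNames()); // hypothetical helper

List<String> fileNames = new ArrayList<String>();
for (int i = 0; i < classes.getLength(); i++) {
    Node classNode = classes.item(i);
    String fileName = classNode.getAttributes().getNamedItem(NODE_FILENAME).getTextContent();
    // Skip coverage entries for files Phabricator will not show in the diff anyway
    if (!affectedFiles.contains(FilenameUtils.getName(fileName))) {
        continue;
    }
    fileNames.add(fileName);
}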
@kageiit - if we check only modified files, won't we miss files which are covered by new tests added to the code base? E.g. if I simply add a new unit test which exercises some real code without changing that code itself - we will not even calculate its line coverage?
We don't need to calculate the coverage for the main file. We just need aggregates to report back, which we can still do without checking whether the file exists. Processing line coverage for every line would still be useless, as we cannot show it in the UI.
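For example, the per-file aggregate can be computed straight from the coverage XML without ever resolving the file on disk; a minimal sketch, assuming the usual Cobertura layout of <class><lines><line hits="..."/></lines></class>:

long linesTotal = 0;
long linesCovered = 0;
NodeList children = classNode.getChildNodes();
for (int j = 0; j < children.getLength(); j++) {
    Node child = children.item(j);
    if (!"lines".equals(child.getNodeName())) {
        continue;
    }
    NodeList lines = child.getChildNodes();
    for (int k = 0; k < lines.getLength(); k++) {
        Node line = lines.item(k);
        if (!"line".equals(line.getNodeName())) {
            continue;
        }
        int hits = Integer.parseInt(line.getAttributes().getNamedItem("hits").getTextContent());
        linesTotal++;
        if (hits > 0) {
            linesCovered++;
        }
    }
}
// linesCovered and linesTotal are all we need to report an aggregate percentage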
List<String> sourceDirs = entry.getValue();

// Collect all filenames in coverage report
List<String> fileNames = new LinkedList<String>();
Drive-by comment: from what I've read recently, either ArrayList or ArrayDeque is almost always better than LinkedList; this is now checked by ErrorProne.
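The fix here is a one-line swap; ArrayList keeps its elements in a contiguous backing array, so append-and-iterate usage avoids LinkedList's per-element node allocations:

// Same List API, better locality than LinkedList for append-and-iterate usage
List<String> fileNames = new ArrayList<String>();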
Thanks, fixed
Map<String, String> detectedSourceRoots = new PathResolver(workspace, sourceDirs).choose(fileNames);

// Loop over all files in the coverage report
for (int i = 0; i < classes.getLength(); i++) {
You can replace this for loop with for (String fileName : fileNames).
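That is, the enhanced-for form, which works when only the names are needed:

for (String fileName : fileNames) {
    // ...
}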
We need the XML nodes:
// Loop over all files in the coverage report
for (int i = 0; i < classes.getLength(); i++) {
Node classNode = classes.item(i);
String fileName = classNode.getAttributes().getNamedItem(NODE_FILENAME).getTextContent();
if (includeFileNames != null && !includeFileNames.contains(FilenameUtils.getName(fileName))) {
continue;
}
fileName = join(detectedSourceRoots.get(fileName), fileName);
SortedMap<Integer, Integer> hitCounts = internalCounts.get(fileName);
if (hitCounts == null) {
hitCounts = new TreeMap<Integer, Integer>();
}
NodeList children = classNode.getChildNodes();
Can you be more specific?
I'm saying we are looping twice over the same things; there is an opportunity to DRY something.
Since we already know which filenames we care about from the previous loop, we don't have to repeat the same logic again. See the sketch below.
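One way to do that, sketched against the identifiers above: remember each filename's node during the first pass, so the second pass iterates the map instead of re-walking and re-filtering the NodeList:

Map<String, Node> nodesByFileName = new LinkedHashMap<String, Node>();
for (int i = 0; i < classes.getLength(); i++) {
    Node classNode = classes.item(i);
    String fileName = classNode.getAttributes().getNamedItem(NODE_FILENAME).getTextContent();
    if (includeFileNames != null && !includeFileNames.contains(FilenameUtils.getName(fileName))) {
        continue;
    }
    // First pass keeps the node itself, not just the name
    nodesByFileName.put(fileName, classNode);
}
// Second pass: for (Map.Entry<String, Node> e : nodesByFileName.entrySet()) { ... }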
String detectedSourceRoot = new PathResolver(workspace, sourceDirs).choose(fileName);
fileName = join(detectedSourceRoot, fileName);
// Loop over all files which are needed for coverage report
for (int i = 0; i < fileNames.size(); i++) {
@kageiit - now I'm not looping through the same data again, although it requires us to collect the childNodes up front for this loop to work.
When the phabricator-jenkins plugin parses a coverage file, it checks on the slave, for each covered file, which source directory contains that file. This can be expensive if you have a large code base, a large number of covered source directories, or large master<->slave latency (coast-to-coast); this actually happened with one project, where such a "check" was taking 1.5h to complete.
Instead of checking individual files, we collect all needed files and make one bulk request. PathResolverChooseMultiCallable is serialized by Jenkins and sent to be executed on the slave, which does the check efficiently there and returns the result. This might cause extreme memory usage for an extremely large code base; if that case happens, the bulk request will have to be chunked.
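A hedged sketch of the bulk-check idea using Jenkins' standard MasterToSlaveFileCallable (the actual PathResolverChooseMultiCallable in this PR may differ in detail): the whole file list crosses the channel once, the existence checks run locally on the slave, and a single map comes back.

import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import hudson.remoting.VirtualChannel;
import jenkins.MasterToSlaveFileCallable;

class ChooseMultiCallable extends MasterToSlaveFileCallable<Map<String, String>> {
    private final List<String> sourceDirs;
    private final List<String> fileNames;

    ChooseMultiCallable(List<String> sourceDirs, List<String> fileNames) {
        this.sourceDirs = sourceDirs;
        this.fileNames = fileNames;
    }

    @Override
    public Map<String, String> invoke(File workspace, VirtualChannel channel) {
        // Runs on the slave: every exists() here is a local disk hit,
        // not one master<->slave round trip per file.
        Map<String, String> detected = new HashMap<String, String>();
        for (String fileName : fileNames) {
            for (String sourceDir : sourceDirs) {
                if (new File(new File(workspace, sourceDir), fileName).exists()) {
                    detected.put(fileName, sourceDir);
                    break;
                }
            }
        }
        return detected;
    }
}

// Usage from the master (one remoting round trip total):
// Map<String, String> detectedSourceRoots = workspace.act(new ChooseMultiCallable(sourceDirs, fileNames));

If memory ever becomes a concern for huge file lists, chunking would just mean slicing fileNames into fixed-size sublists, issuing one act(...) per slice, and merging the returned maps.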