-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Fix: malformed brda records caused bad branching coverage #26342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
When does this actually occur?
So before the coverage merging runs, a single test has more than one line item for a given branch? What is creating that information and why? |
I have been unable to pinpoint why this happens in the first place. But here are the conditions I can use to reproduce it consistently:
We use exclusively the bazel built-in The submitted fix aims at solving this problem for the final merging of the .dat files. I am spending the day today trying to find how it occurs in the first place to see if I can fix it there too, but I have limited time to dedicate, so I hope this fix can be submitted independently of the upstream issue. Especially, given that one can use flags to replace the coverage mergers, I figure this needs to be fixed both for individual tests and for the final merging. |
I'll underline that the line 131 example above is taken from actual .dat files generated by Bazel and are not hypothetical. |
There is an issue where two different test targets instrument the same source file, but there's a mismatch in the branch data for some lines (the example I have in that issue is due to C++ template instantiation, but I believe there are other ways this can happen). But from what I understand, that's not the root of your problem. I would not expect to see duplicated "line, block, branch" within a single report. My understanding is that, in your case, a single test target is producing multiple branch data for a single "(line, branch, block)" combination? This of course would the trip up merging as per #12142. In which case, you should be able to identify a single test that produces an anomalous coverage report.
Can you run And which compiler version are you using? There are some differences between GCC versions, and Clang's coverage also works differently. |
While I can't share the source code, here is the (lightly) obfuscated coverage code for a single test (unmerged & copied straight out of the .dat):
We are using Since the data can get malformed due to upstreams problems (whether it is from the compiler, or the issue you mention), I think a bit of robustification is in order. Multiple BRDA records with the same unique id (line, block and branch number) should not be allowed. Especially since one cannot guarantee the upstream coverage generation. |
I believe I understand why this is happening. You have the same function listed three times in your report:
One is defined with I have actually encountered this before; I just forgot about this particular case. It is also, unfortunately, fundamentally intractable, being the same as the problem described in #12142. Your patch may solve your particular case, but it does not solve the general problem. I have a reproduction of the issue here: https://github.com/c-mita/bazel_issues/tree/master/lcov_issues/branch_merge
Executing a test with
The instantiation with an It also isn't immediately obvious to me which of those branches align with the ones in the "simple" case. So simply merging like this is not necessarily "correct". All that said, while there may not be a "correct" solution, there are varying degrees of "wrong" and I think Bazel should be "less wrong". At the very least, report merging should be a commutative and associative operation (the order in which merge reports should not matter) and this is currently not the case. I need to think and dig a little more before committing to an answer though. |
Glad to see that you have a good theory on what could cause this. As you point out, while this change "fixes" it for us, it might not fix it in all cases. This is a natural outcome of the fact that you cannot guarantee correct input data. However, to merge and keep records where the same (in lcov language) "unique" branch is reported multiple time is akin to making a hash map with duplicates of the same hash key pointing to different values. It is semantically and functionally broken. If our aim is to make a solution that guarantees branch merging to be correct in all cases, we need to work upstream: rewriting tests, code, setting compiler flags and making the organisation doing these things highly disciplined. You and I will not be able to solve this through code changes to Bazel. We can however do some mitigation. Here are some ideas:
Commutativeness |
Another note, @c-mita, when running coverage on the same files using cmake, gcov and gcovr, the merging works fine and we get the individual merged records, instead of duplicates like with bazels merging.
So while you and I agree that this pull request is not necessarily a "fix", given that the problem lies with the input data being merged. From the perspective of volvo developers, the bazel coverage solution is "incorrect" compared to their original experience. Leaving this unchanged should not be an option, and I hope we can find a way forward here. |
I have a change internally that simplifies the merging logic and, as a side effect, solves this problem too (it's very similar looking code). It shouldn't take too long to get submitted. |
Alrighty! I was looking forward to become a contributor to the bazel source, but the most important thing is that this is solved. Looking forward to seeing your change. |
@c-mita, how long do you think it will take until you get your changes submitted and merged? |
Coverage report merging should be consistent, no matter the order in which reports are merged (the merge operation should be commutative and associative). Dropping branches when the input reports disagree on the number of branches violated this (we used to drop the RHS, so the operation didn't commute). The new merging strategy is simply to combine the execution values for the same "(block, branch)" combination. If a combination doesn't exist for one report, it's simply added to the merged result. This may not still be "correct" for when the number of branches differs between reports, but it's arguably "less wrong" since it's at least now consistent. Moving this logic to the "addBranch" method on SourceFileCoverage also neatly handles the case where the same "block, branch" pair occurs more than once in a single coverage report for a file. This can occur legitimately when C++ templates are instantiated several times (see #26342 (comment) and #12142). To simplify the merging logic and avoid having to handle "BA" and "BRDA" style branches differently, we ensure that BranchCoverage always has a branch identifier set. This will just be a simple incrementing counter when parsing BA lines. A new test to ensure BA lines are correctly printed out has been added to verify the change to BranchCoverage doesn't break anything in that regard. Fixes #12142 PiperOrigin-RevId: 775577636 Change-Id: I1c23c89ca9c4477aeee7f4a7cca70612248c8a1c
4753fa0 should resolve the issue. I'll need to create a new release of the coverage tool for it to take effect normally. There's a bit more cleaning up I'd like to do, as well as some long standing performance issues. I'm not sure how much I'll get round to, but either way I should make a new release of the tool within the week. |
Looks very good. I really like replacing the merging function with just adding the branches per file. Makes it much simpler. A suggestion that could help improve your change even more: lines are currently used throughout the code as identifiers for branches. This is incorrect, the combination of line, branch and block is what makes a branch unique. Either one can remove using "line" entirely as input to the branch related functions (and thus simplify the code) or it should be replaces with a tuple or a hash of the id's (to make it correct). When you've uploaded the new version, could you pass me the hash and url, so I can point our internal Bazel to it while waiting for it to be included in the next release. |
You're suggesting changing the mapping from I can see some merit in that; I don't think we ever actually need to query branch data by line; we just need to walk the mapping in natural key order when printing out or merging. |
Yup, and this way bazels coverage management would be consistent with the way coverage tools treat branch coverage as well. |
@c-mita have you had a chance to publish the fixed coverageOutputRunner on the google mirrors yet? Also, will this be included in the next release of Bazel? |
While working on a large bazel migration project in Volvo Cars, I noticed that for merged coverage reports (i.e. `bazel coverage target1 target2 ... targetN) bazel would generate different branch coverage on each run. It would also be very inflated compared to running the same in our pre-bazel environment. This essentially made it unusable.
After much detective work it turns out that individual tests in bazel can create malformed BRDA records where the same unique branch (i.e. the same, line, block and branch) have multiple values. When this happens the coverageOutputGenerator simply adds all the incorrect records to the merged file. In the merging logic there are some checks that sometimes drops individual records based on the order in which the BRDA records have been merged. Given that bazel does not guarantee that order, this then leads to each run having different branch coverage.
E.g.
Bazel has generated this (counted as 6 branches)
BRDA:131,0,0,1
BRDA:131,0,1,1
BRDA:131,0,0,1
BRDA:131,0,1,0
BRDA:131,0,0,3
BRDA:131,0,1,1
Which should've been only this (counted as 2 branches):
BRDA:131,0,1,2
BRDA:131,0,0,5
This fix ensures that each branch is unique for a particular source file. Duplicates are now merged, ensuring that the tally is correct. I have verified that this works consistently on our large code base and added some regression tests to avoid this bug in the future.