TAP 7 Version 2: Test Case Specification #55
Based on the discussion occurring in theupdateframework#36 (concerns from several parties that it could be too hard to integrate and use the existing design, version 1), the version has been increased to 2.
Also updates Post-History to be more explicit.
I prefer this to the previous version, and a lot of my concerns in #36 and #48 have been well addressed. However, this still isn't capable of addressing these example cases:
This can't be expressed in the current metadata format, except by using an incorrect key.
This would require an implementation to check the hashes provided in the files against a calculated hash to determine that the hash was bad. This could be a source of errors and is much better expressed in the format I suggested, with a flag like
While I don't think we should transmit the key IDs at all (as noted in theupdateframework/rust-tuf#118), there also is no easy way to express this. The implementer would have to do a lookup on the ID and see that the key it resolves to is wrong.

There is an additional deficiency (one that I think I neglected to address earlier) in that there needs to be some sort of TUF features tag. An implementation can still be TUF without doing multi-repo support, so tests that rely on such features would need to be marked so that an automated tool could skip them. Any optional feature would need to be flagged.

Overall, I still think this is an indirect way of going about the testing process, and it will be simpler for implementers in all languages and formats to handle this when given a declarative DSL than when converting existing metadata into the desired format.
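As a sketch of the kind of declarative format being argued for in this thread (all field names here are hypothetical, not drawn from any TAP 7 draft), a test case could carry both a required-features tag and a directive flag, so an automated runner skips cases for unsupported optional features and never has to recompute hashes to notice that one is deliberately bad:

```python
# Hypothetical declarative test-case entries; the field names
# ("features", "directives", "expect") are illustrative only.
TEST_CASES = [
    {
        "name": "bad-targets-signature",
        "features": [],                      # core TUF: every implementation runs this
        "directives": {"invalidate_signature": "targets.json"},
        "expect": "failure",
    },
    {
        "name": "multi-repo-consensus",
        "features": ["tap4"],                # skipped unless TAP 4 is supported
        "directives": {},
        "expect": "success",
    },
]

def runnable_cases(supported_features):
    """Select only the cases whose required features the client supports."""
    return [c for c in TEST_CASES
            if set(c["features"]) <= set(supported_features)]
```

Under this sketch, a client without multi-repo support would call `runnable_cases([])` and run only the first case, while a TAP 4-capable client would run both.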
More thoughts. I actually think there shouldn't be the notion of a "control" for a test; there should just be independent tests. One case, "successfully updating a target given nested delegations," is the control case for some N tests that include incomplete path chains, wrong signatures, and a whole mess of other things. I think there should be many positive test cases where updates succeed in various configurations, but inherently linking them is more likely to be a cause of problems than of solutions.
Thanks for the feedback, @heartsucker!

**Controls for Test Cases**

Good point. A particular test case needn't have its own control. The test cases should include a variety of "positive" cases with expected successes, in part so that it is less likely that a failure case failed for some unrelated reason not covered by another test. That doesn't require an association between failure cases and success cases, though, yes. I'll correct the language in a few places to remove the implication that a particular test needs control cases, and make sure it says instead that positive cases have to be included in the test cases for this purpose. (EDIT: Fixed now, I think.)

**TUF Features Tag**

I think that's probably a good suggestion. I was previously thinking about having separate sets of test cases for different TUF features (and before this iteration, about dynamic test data that adapted to your configuration for the Tester). Flagging test cases with required TUF features (like TAP 4 support) is probably a better idea, though we'll still have to make sure there's enough redundancy in the list of test cases so that clients with and without some configuration will still be tested fully enough. I'll discuss this before making the change, though.

**Cases that this testing scheme can't address**

You're correct that because the test data defined in this incarnation of TAP 7 doesn't use behavior flags (like invalidate-this-signature-after-you-make-it), some things can't be tested. My impulse regarding the first case you raise (an invalid signature from the right key, as opposed to a signature from the wrong key or some foreign key) is to consider it the sort of thing that integration testing for the implementation should catch (and not necessarily something that TUF conformance testing need concern itself with), but I'm uncertain about that.

The second case you raise is more concerning from a TUF spec conformance perspective: listing the wrong hash for targets.json in snapshot.json is exactly the sort of thing whose effect on client behavior I would have liked to test. We'll have to think about this a bit more. These are things that at least some incarnations of the original (Tester) proposal would have handled.

There is a reason for my hesitation to include things like instructions ("invalidate this signature") in this scheme: though they were part of the original Tester design (the "instructions" argument to Wrapper functions in this section of v1 of the TAP), the more I thought about them, the more they seemed like a support and upkeep nightmare for a series of disconnected implementations. To the extent possible, I think we should avoid something like this. Now, if there prove to be critical conformance tests that we can't accomplish without something like how-to-reproduce instructions, then so be it, I guess.

**KeyIDs**

IIUC, this is an orthogonal issue: if we shift to using raw keys instead of key IDs (and maybe we should), that would be the form they'd take in keys.json as well. Either way, there seems to be another issue in your comment on this point that I'm not sure I understand, regarding an inference about a bad key ID. The way I imagine tests working, you convert the metadata provided, and you do not try to determine what keys to use for what roles through any kind of deduction: in delegations, you delegate to the keys indicated in the body of the delegation, and when signing, you sign with the key indicated by keys.json. At no point do you have to know whether the signature is intended to be the right one. (This assumes we don't end up going with a design that includes instruction flags; see the previous section.)
My gut feeling is that there will be a lot of redundancy because we can't make any assumptions about the code logic of a given implementation. There might be short circuits that happen for some pieces of metadata but not others (intentionally or otherwise). To check that a given role does not have threshold set to 0, we will have to check that every role and 1 level of delegations cause errors, thus
Realistically, those should only be 2 code paths (1 for top level, 1 for delegations), but we don't know that is how the code is written. Likewise with a given feature, we'd have a multiplier like
But then again, this is automated, so we should be able to write code that generates the
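Since the test data is generated programmatically, the redundancy described above can come from a cross product rather than from hand-written cases. A minimal sketch of that idea, with role and mutation names chosen purely for illustration:

```python
from itertools import product

# Illustrative role and mutation lists; these names are examples,
# not a fixed inventory from TAP 7.
TOP_LEVEL_ROLES = ["root", "timestamp", "snapshot", "targets"]
DELEGATED_ROLES = ["targets/a", "targets/a/b"]
MUTATIONS = ["zero_threshold", "expired", "bad_signature"]

def generate_cases():
    """Yield one expected-failure case per (role, mutation) pair, covering
    both top-level roles and at least one level of delegation, since we
    can't assume the implementation shares code paths between them."""
    for role, mutation in product(TOP_LEVEL_ROLES + DELEGATED_ROLES, MUTATIONS):
        yield {
            "name": f"{mutation}-{role.replace('/', '-')}",
            "role": role,
            "mutation": mutation,
            "expect": "failure",
        }
```

With six roles and three mutations this yields eighteen cases; adding one more mutation or delegation level grows the suite automatically, which is the multiplier effect described above without the hand-maintenance cost.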
My feeling is the conformance tester should be allowed to test anything that is in the spec. For example, the metadata might contain a size field, but the implementation might never check that the received metadata is the right size. Not doing so means the implementation isn't doing TUF correctly, and we might as well make that obvious. The conformance test is an integration test that checks that the TUF-related behavior of the implementation is correct.
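The size check mentioned here is a one-line comparison once the metadata bytes are in hand; a sketch of the behavior a conformance case would exercise (the function name is hypothetical, not from any implementation):

```python
def check_metadata_length(received: bytes, listed_length: int) -> None:
    """Reject metadata whose on-the-wire size differs from the length
    listed for it elsewhere (e.g. the length snapshot.json lists for
    targets.json). Raises ValueError on mismatch."""
    if len(received) != listed_length:
        raise ValueError(
            f"metadata is {len(received)} bytes, but {listed_length} was listed"
        )
```

A conformance case for this point would serve metadata whose actual length disagrees with the listed one and assert that the client refuses it.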
I don't follow how these would need support. The conformance tests will apply only to the parts of the spec that apply to every implementation. Because of this, something like "invalidate the hash" or "extend the target" will always apply.
Are you saying that the public part of the key is the key's ID? I'm not sure I follow here.
Right now the spec says "transmit a map of
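For background on the keyid question (context only, not a claim about what the spec should become): in the TUF specification, a keyid is the hex digest of the SHA-256 hash of the canonical JSON form of the public key object, so it is derived from the public key but is not the key itself. A sketch, using `json.dumps` with sorted keys and compact separators as a stand-in for true canonical JSON:

```python
import hashlib
import json

def compute_keyid(public_key: dict) -> str:
    # NOTE: real TUF implementations use canonical JSON (OLPC cjson);
    # sorted-key json.dumps with compact separators is only an
    # approximation for illustration.
    canonical = json.dumps(public_key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

This is why a test case can only signal "wrong key" indirectly: the client has to resolve the ID back to a key before it can notice anything is off.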
I think we need those instruction flags to provide some sort of checks to get some of the other behavior correct anyway. Having someone translate metadata and then still have to apply modifications based on another set of metadata seems like it would be error prone. I really do think that providing directives is the way to go.

With all of this said, I am less convinced that TAP 7 is feasible or useful, and I think the only way to say "this impl is TUF compliant" is through a third-party audit of the code. This comes back to the bigger problem in web programming of saying "is this webapp secure?" Sure, you could point some cracked-out version of metasploit at it and fire away to see if something breaks, but the real metric you care about is whether some expert pentester can break it or not. That's the real-world use case we care about with TUF: some elite hackers want to pwn a bunch of people's laptops, and they have $1m to spend and months to break it.

Since it seems like we're already conceding that some number of edge cases can't be tested at all given the current proposal, I say we throw it away. The edge cases really are what we care about. TUF is complex, and the simple parts are easy to get correct. I'd expect a given impl to have already written tests for the obvious simple things, as I suspect it is unlikely someone could write a TUF lib that works under ideal conditions without at least a few tests.
I think part of what there is a disagreement about here is the goal of TAP 7. It is not meant to say that anything which passes the tests must be TUF compliant. It is instead meant to give implementers an easier way to test TUF than to write all of their own tests from scratch. With this in mind, @heartsucker, do you see the value in TAP 7?
If this is really just "a useful way to test," then I suppose it could be merged in. However, if it's non-mandatory, then I feel like it shouldn't be in the spec itself, but instead in some ancillary document. I do think it's valuable to have a list of possible edge cases to test to help people write tests that get enough coverage.
I see, so instead of being a TAP, you'd just like us to have it as a less-formal facility for testers. I don't have a problem with this. Is anyone opposed to rejecting this as a TAP, but implementing it nonetheless?
Rewrite TAP 7 as a test data specification (instead of Tester application and wrapper specification) based on discussion occurring here: #36 (on version 1 of TAP 7) and #48 (on an intermediate version of TAP 7).
This is a response to concerns from several parties that it could be too hard to integrate and use the previous TAP 7 design (version 1).
Rather than expecting implementers to write a few functions in a Python wrapper to connect a Python testing module to their updaters, we've defined a format in which we will make test data (metadata, targets, and info on how these were generated) available for use in TUF conformance testing.
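To make the shift concrete, a published set of test data in this style might pair each case's metadata and targets with notes on how they were generated. The layout and field names below are purely hypothetical, not taken from the TAP text:

```python
# Hypothetical on-disk layout for one published test case in the
# "test data specification" style described above.
TEST_CASE_LAYOUT = {
    "cases/expired_timestamp/": {
        "keys.json": "keys to use when re-signing the converted metadata",
        "metadata/": ["root.json", "timestamp.json",
                      "snapshot.json", "targets.json"],
        "targets/": ["file1.txt"],
        "generation.json": {
            "description": "timestamp.json has an expiration in the past",
            "expect": "failure",
        },
    },
}
```

An implementer would convert the metadata under `metadata/` into their own wireline format, re-sign with the keys provided, and assert that their updater's outcome matches the `expect` field.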
We look forward to more feedback! @ecordell @heartsucker @hannesm @endophage (among others)