TAP 7 Version 2: Test Case Specification #55

Merged

Conversation

@awwad (Contributor) commented Jul 27, 2017

Rewrite TAP 7 as a test data specification (instead of a Tester application and wrapper specification), based on the discussion in #36 (on version 1 of TAP 7) and #48 (on an intermediate version of TAP 7).

This is a response to concerns from several parties that it could be too hard to integrate and use the previous TAP 7 design (version 1).

Rather than expecting implementers to write a few functions in a Python wrapper to connect a Python testing module to their updaters, we've defined a format in which we will make test data (metadata, targets, and info on how these were generated) available for use in TUF conformance testing.
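
For illustration only (these field names are made up, not the layout TAP 7 actually defines), a single test case in such a format might look roughly like this:

```python
# Hypothetical sketch of one test-case entry; every field name here is
# illustrative, not something TAP 7 itself defines.
test_case = {
    "name": "snapshot_expired",
    "description": "Client must refuse to update when the snapshot metadata "
                   "is past its expiration date.",
    "metadata_dir": "cases/snapshot_expired/metadata/",  # pre-generated metadata
    "targets_dir": "cases/snapshot_expired/targets/",    # target files to serve
    "keys_file": "cases/snapshot_expired/keys.json",     # keys used during generation
    "expected_result": "update_rejected",
}
```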

We look forward to more feedback! @ecordell @heartsucker @hannesm @endophage (among others)

@heartsucker (Contributor) commented Jul 27, 2017

I prefer this to the previous version, and a lot of my concerns in #36 and #48 have been well addressed. However, this still isn't capable of addressing these example cases:

> The targets metadata has an invalid signature from an authorized key.

This can't be expressed in the current metadata format; the closest it can get is using an incorrect key.

> The snapshot metadata lists an incorrect hash for the targets metadata.

This would require an implementation to compare the hashes provided in the files against a calculated hash to determine that the hash was bad. This could be a source of errors and is much better expressed in the format I suggested with a flag like incorrect_hash: true.

> The delegations object in the targets metadata contains a key with an incorrect key ID.

While I don't think we should transmit the key IDs at all (as noted in theupdateframework/rust-tuf#118), there also is not an easy way to express this. The implementer would have to do a lookup on the ID, see that keys.json does not contain such a key ID, and infer that a bad key ID was intended. Again, this could be solved more easily by an explicit bad_key_id: true in my proposed metadata format.
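
For comparison, a flag-based, declarative description along those lines might look roughly like this (only the incorrect_hash and bad_key_id flag names come from the cases above; everything else is illustrative):

```python
# Hypothetical declarative test description using behavior flags, in the
# spirit of the incorrect_hash / bad_key_id suggestions above. Only those
# two flag names come from the discussion; the rest is made up.
declarative_case = {
    "name": "snapshot_lists_bad_targets_hash",
    "snapshot": {
        "meta": {
            "targets.json": {"incorrect_hash": True},  # corrupt the recorded hash
        },
    },
    "targets": {
        "delegations": {
            "keys": [{"bad_key_id": True}],  # record a key ID that does not match the key
        },
    },
    "expected_result": "update_rejected",
}
```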

There is an additional deficiency (one that I think I neglected to address earlier): there needs to be some sort of TUF features tag. An implementation can still be TUF-conformant without multi-repo support, so tests that rely on such features would need to be marked so that an automated tool could skip them. Any optional feature would need to be flagged.
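
Such a tag could be as simple as a per-case list of required optional features that an automated runner checks before running the case; a minimal sketch (the field and feature names are hypothetical):

```python
# Hypothetical per-case feature requirements; an automated runner would skip
# any case whose required features the implementation does not claim to support.
test_case = {
    "name": "map_file_priority_ordering",
    "requires": ["tap4_multi_repo"],  # skip unless the implementation supports TAP 4
    "expected_result": "update_succeeds",
}
```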

Overall, I still think this is an indirect way of going about the testing process, and it will be simpler for implementers in all languages/formats to handle this when given a declarative DSL than when converting existing metadata into the desired format.

@heartsucker (Contributor)

More thoughts: I actually think there shouldn't be the notion of a "control" for a test. There should just be independent tests. One case, "successfully updating a target given nested delegations," is effectively the control case for some N tests that include incomplete path chains, wrong signatures, and a whole mess of other things.

I think there should be many positive test cases where updates succeed in various configurations, but I think that inherently linking them is more likely to be a cause of problems than solutions.

@awwad (Contributor, Author) commented Jul 27, 2017

Thanks for the feedback, @heartsucker!

Controls for Test Cases

Good point. A particular test case needn't have its own control. The test cases should include a variety of "positive" cases with expected successes, in part so that it is less likely that a failure case failed for some unrelated reason not covered by another test. That doesn't require an association between particular failure cases and success cases, though. I'll correct the language in a few places to remove the implication that a particular test needs control cases, and make sure it says instead that positive cases have to be included in the test cases for this purpose. (EDIT: Fixed now, I think.)

TUF Features Tag

I think that's probably a good suggestion. I was previously thinking about having separate sets of test cases for different TUF features (and, before this iteration, about dynamic test data that adapted to your configuration for the Tester). Flagging test cases with required TUF features (like TAP 4 support) is probably a better idea, though we'll still have to make sure there's enough redundancy in the list of test cases so that clients with and without some configuration will still be tested fully enough. I'll discuss this before making the change, though.

Cases that this testing scheme can't address

You're correct that, because the test data defined in this incarnation of TAP 7 doesn't use behavior flags (like invalidate-this-signature-after-you-make-it), some things can't be tested. My impulse regarding the first case you raise (an invalid signature from the right key, as opposed to a signature from the wrong key or some foreign key) is to consider it the sort of thing that integration testing for the implementation should catch (and not necessarily something that TUF conformance testing need concern itself with), but I'm uncertain about that. The second case you raise is more concerning from a TUF spec conformance perspective: listing the wrong hash for targets.json in snapshot.json is exactly the sort of thing I would have liked to test the client's behavior in response to. We'll have to think about this a bit more. These are things that at least some incarnations of the original (Tester) proposal would have handled.

There is a reason for my hesitation to include things like instructions ("invalidate this signature") in this scheme; though they were part of the original Tester design ("instructions" argument to Wrapper functions in this section of v1 of the TAP), the more I thought about them, the more they seemed like a support and upkeep nightmare for a series of disconnected implementations. To the extent possible, I think we should avoid something like this. Now, if there prove to be critical conformance tests that we can't accomplish without something like how-to-reproduce instructions, then so be it, I guess.

KeyIDs

IIUC, this is an orthogonal issue: if we shift to using raw keys instead of key IDs (and maybe we should), that would be the form they'd take in keys.json as well.

Either way, there seems to be another issue in your comment on this point that I'm not sure I understand, regarding an inference about a bad key ID. The way I imagine tests working, you convert the metadata provided, and you do not try to determine what keys to use for what roles through any kind of deduction: in delegations, you delegate to the keys indicated in the body of the delegation, and when signing, you sign with the key indicated by keys.json. At no point do you have to know if the signature is intended to be the right one. (This assumes we don't end up going with a design that includes instruction flags; see previous section.)
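
A rough sketch of that conversion step, assuming the implementation supplies its own encoding and signing callables (all names and the keys.json structure here are hypothetical, not part of the TAP):

```python
# Hypothetical conversion of provided test metadata into an implementation's
# own format. encode_role(rolename, provided_dict) and sign_role(encoded, key)
# are stand-ins for whatever the implementation actually uses.
import json
import os

def convert_test_case(case_dir, encode_role, sign_role):
    with open(os.path.join(case_dir, "keys.json")) as f:
        keys = json.load(f)  # structure of keys.json is assumed here

    converted = {}
    for rolename in ("root", "timestamp", "snapshot", "targets"):
        with open(os.path.join(case_dir, "metadata", rolename + ".json")) as f:
            provided = json.load(f)
        # Delegate to exactly the keys listed in the provided metadata, and
        # sign with exactly the keys that keys.json names for this role --
        # no deduction about whether those keys are the "right" ones.
        encoded = encode_role(rolename, provided)
        for key in keys.get(rolename, []):
            encoded = sign_role(encoded, key)
        converted[rolename] = encoded
    return converted
```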

@heartsucker (Contributor)

> we'll still have to make sure there's enough redundancy in the list of test cases

My gut feeling is that there will be a lot of redundancy, because we can't make any assumptions about the code logic of a given implementation. There might be short circuits that happen for some pieces of metadata but not others (intentionally or otherwise). To check that a given role does not have its threshold set to 0, we would have to check that every role, plus one level of delegations, causes an error, thus:

  • root_threshold_zero_test
  • targets_threshold_zero_test
  • timestamp_threshold_zero_test
  • snapshot_threshold_zero_test
  • mirror_threshold_zero_test
  • delegation_threshold_zero_test

Realistically, those should exercise only 2 code paths (1 for top-level roles, 1 for delegations), but we don't know that that is how the code is written.

Likewise with a given feature, we'd have a multiplier like

  1. Do all tests
  2. Do all tests again with the map file
  3. Do all tests with map file and with mirrors
  4. Do all tests with just mirrors

But then again, this is automated, so we should be able to write code that generates the N x M x O ... test cases. In effect, the conformance test should be seen as something like a fuzzer (granted, one without infinitely variable inputs).
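
A minimal sketch of generating such a matrix mechanically (the role, mutation, and configuration names are only examples):

```python
# Generate the combinatorial N x M x O test matrix described above; the
# specific names are illustrative, not a fixed list from the TAP.
from itertools import product

roles = ["root", "targets", "timestamp", "snapshot", "mirror", "delegation"]
mutations = ["threshold_zero", "expired", "bad_signature"]
configurations = ["plain", "map_file", "map_file_and_mirrors", "mirrors_only"]

test_cases = [
    "%s_%s_%s_test" % (role, mutation, config)
    for role, mutation, config in product(roles, mutations, configurations)
]
print(len(test_cases))  # 6 * 3 * 4 = 72 generated case names
```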

> is to consider it the sort of thing that integration testing for the implementation should catch (and not necessarily something that TUF conformance testing need concern itself with)

My feeling is that the conformance tester should be allowed to test anything that is in the spec. For example, an implementation's metadata might contain a size field for other metadata, yet the implementation might never check that the received metadata is the right size. Not doing so means the implementation isn't doing TUF correctly, and we might as well make that obvious. The conformance test is an integration test that checks that the TUF-related behavior of the implementation is correct.
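
As a minimal sketch of the kind of check being described (assuming a snapshot-style length field; the function name is made up):

```python
# Compare the number of bytes actually received against the length recorded
# in the referring metadata. Whether the spec demands an exact match or an
# upper bound is a separate question; the point is that the field is checked.
def check_recorded_length(received_bytes, recorded_length):
    if len(received_bytes) != recorded_length:
        raise ValueError(
            "received %d bytes, but the referring metadata records %d"
            % (len(received_bytes), recorded_length))
```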

> they seemed like a support and upkeep nightmare for a series of disconnected implementations

I don't follow how these would need support. The conformance tests will apply only to the parts of the spec that apply to every implementation. Because of this, something like "invalidate the hash" or "extend the target" will always apply.

> IIUC, this is an orthogonal issue: if we shift to using raw keys instead of key IDs (and maybe we should), that would be the form they'd take in keys.json as well.

Are you saying that the public part of the key is the key's ID? I'm not sure I follow here.

> Either way, there seems to be another issue in your comment on this point that I'm not sure I understand, regarding an inference about a bad key ID.

Right now the spec says "transmit a map of keyId -> pubKey and then check that the calculated keyId matches the transmitted keyId." I don't think this should be a MUST in the spec. It should say "if key IDs are transmitted as keys to a map of public keys, then clients MUST check that they match," but I think it's better to just transmit a list of public keys and derive the key ID from each of them. Anyway, if we assume that this is a MUST in the spec, then it should be tested, because failure to match key IDs to public keys is a vulnerability and therefore must be checked.
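
Deriving the ID from the key itself is cheap; a minimal sketch, using json.dumps with sorted keys as a stand-in for a real canonical-JSON encoder:

```python
# Derive a key ID from the public key itself rather than trusting a
# transmitted ID. The spec defines the KEYID as the SHA-256 hexdigest of the
# canonical JSON form of the key; sorted-key json.dumps approximates that here.
import hashlib
import json

def derive_keyid(public_key):
    canonical = json.dumps(public_key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```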

> This assumes we don't end up going with a design that includes instruction flags; see previous section.

I think we need those instruction flags anyway, to provide some sort of checks that get some of the other behavior correct. Having someone translate metadata and then still have to apply modifications based on another set of metadata seems like it would be error-prone. I really do think that providing directives is the way to go.

With all of this said, I am less convinced that TAP 7 is feasible or useful, and I think the only way to say "this impl is TUF compliant" is through a third-party audit of the code. This comes back to the bigger problem in web programming of answering "is this webapp secure?" Sure, you could point some cracked-out version of Metasploit at it and fire away to see if something breaks, but the real metric you care about is whether some expert pentester can break it or not. That's the real-world use case we care about with TUF: some elite hackers want to pwn a bunch of people's laptops, and they have $1m to spend and months to break it.

Since it seems like we're already conceding that some number of edge cases can't be tested at all given the current proposal, I say we throw it away. The edge cases really are what we care about. TUF is complex, and the simple parts are easy to get correct. I'd expect a given impl to have already written tests for the obvious, simple things, as I suspect it is unlikely someone could write a TUF lib that works under ideal conditions without at least a few tests.

@JustinCappos (Member)

I think part of the disagreement here is about the goal of TAP 7. It is not meant to say that anything which passes the tests must be TUF compliant. It is instead meant to give implementers an easier way to test TUF than writing all of their own tests from scratch.

With this in mind, @heartsucker do you see the value in TAP 7?

@heartsucker (Contributor)

If this is really just "a useful way to test," then I suppose it could be merged in. However, if it's non-mandatory, then I feel like it shouldn't be in the spec itself, but instead in some ancillary document. I do think it's valuable to have a list of possible edge cases to test to help people write tests that get enough coverage.

@JustinCappos (Member) commented Aug 2, 2017 via email

@awwad mentioned this pull request Aug 22, 2017
@vladimir-v-diaz merged commit 46d98fb into theupdateframework:master Aug 22, 2017