From fa36b9ea80e72f62492065917550a194306ab3b1 Mon Sep 17 00:00:00 2001
From: jugglinmike
Date: Wed, 30 Oct 2019 19:49:14 -0400
Subject: [PATCH] [docs] Add guide "Making a Testing Plan" (#19628)

---
 docs/writing-tests/index.md                 |   8 +-
 docs/writing-tests/making-a-testing-plan.md | 535 ++++++++++++++++++++
 2 files changed, 541 insertions(+), 2 deletions(-)
 create mode 100644 docs/writing-tests/making-a-testing-plan.md

diff --git a/docs/writing-tests/index.md b/docs/writing-tests/index.md
index 88e279b0ad128d..0e0d3f35b00a20 100644
--- a/docs/writing-tests/index.md
+++ b/docs/writing-tests/index.md
@@ -1,7 +1,10 @@
 # Writing Tests
 
-If you haven't already, it's strongly recommended to read
-the [introduction](../index) first, as it introduces the various test types.
+So you'd like to write new tests for WPT? Great! For starters, we recommend
+reading [the introduction](../index) to learn how the tests are organized and
+interpreted. You might already have an idea about what needs testing, but it's
+okay if you don't know where to begin. In either case, [the guide on making a
+testing plan](making-a-testing-plan) will help you decide what to write.
 
 There's also a load of [general guidelines](general-guidelines) that apply to all
 tests.
@@ -17,6 +20,7 @@ There's also a load of [general guidelines](general-guidelines) that apply to al
    file-names
    h2tests
    lint-tool
+   making-a-testing-plan
    manual
    reftest-tutorial
    reftests
diff --git a/docs/writing-tests/making-a-testing-plan.md b/docs/writing-tests/making-a-testing-plan.md
new file mode 100644
index 00000000000000..0f5673016c0502
--- /dev/null
+++ b/docs/writing-tests/making-a-testing-plan.md
@@ -0,0 +1,535 @@

# Making a Testing Plan

When contributing to a project as large and open-ended as WPT, it's easy to
get lost in the details. It can be helpful to start by making a rough list of
the tests you intend to write. That plan will let you anticipate how much work
will be involved, and it will help you stay focused once you begin.

Many people come to WPT with a general testing goal in mind:

- specification authors often want to write tests for new spec text
- browser maintainers often want to test new features or fixes to existing
  features
- web developers often want to test discrepancies between browsers that affect
  their web applications

(If you don't have any particular goal, we can help you get started. Check out
[the issues labeled with `type:missing-coverage` on
GitHub.com](https://github.com/web-platform-tests/wpt/labels/type%3Amissing-coverage).
Leave a comment if you'd like to get started with one, and don't hesitate to
ask clarifying questions!)

This guide will help you write a testing plan by:

1. showing you how to use the specifications to learn what kinds of tests will
   be most helpful
2. developing your sense for what *doesn't* need to be tested
3. demonstrating methods for figuring out which tests (if any) have already
   been written for WPT

The level of detail in useful testing plans can vary widely, from [a list of
specific
cases](https://github.com/web-platform-tests/wpt/issues/6980#issue-252255894),
to [an outline of important coverage
areas](https://github.com/web-platform-tests/wpt/issues/18549#issuecomment-522631537),
to [an annotated version of the specification under
test](https://rwaldron.github.io/webrtc-pc/). The appropriate fidelity depends
on your needs, so you can be as precise as you feel is helpful.
## Understanding the "testing surface"

Web platform specifications are instructions about how a feature should work.
They're critical for implementers to "build the right thing," but they are
also important for anyone writing tests. We can use the same instructions to
infer what kinds of tests would be likely to detect mistakes. Here are a few
common patterns in specification text and the kinds of tests they suggest.

### Input sources

Algorithms may accept input from many sources. Modifying the input is the most
direct way we can influence the browser's behavior and verify that it matches
the specifications. That's why it's helpful to be able to recognize different
sources of input.

```eval_rst
================ ==============================================================
Type of feature  Potential input sources
================ ==============================================================
JavaScript       parameters, `context object <https://dom.spec.whatwg.org/#context-object>`_
HTML             element content, attributes, attribute values
CSS              selector strings, property values, markup
================ ==============================================================
```

Determine which input sources are relevant for your chosen feature, and build
a list of values which seem worthwhile to test (keep reading for advice on
identifying worthwhile values). For features that accept multiple sources of
input, remember that the interaction between values can often produce
interesting results. Every value you identify should go into your testing
plan.

*Example:* This is the first step of the `Notification` constructor from [the
Notifications standard](https://notifications.spec.whatwg.org/#constructors):

> The `Notification(title, options)` constructor, when invoked, must run these
> steps:
>
> 1. If the [current global
>    object](https://html.spec.whatwg.org/multipage/webappapis.html#current-global-object)
>    is a
>    [ServiceWorkerGlobalScope](https://w3c.github.io/ServiceWorker/#serviceworkerglobalscope)
>    object, then [throw](https://heycam.github.io/webidl/#dfn-throw) a
>    `TypeError` exception.
> 2. Let *notification* be the result of [creating a
>    notification](https://notifications.spec.whatwg.org/#create-a-notification)
>    given *title* and *options*. Rethrow any exceptions.
>
> [...]

A thorough test suite for this constructor will include tests for the behavior
of many different values of the *title* parameter and the *options* parameter.
Choosing those values can be a challenge unto itself--see [Avoid excessive
breadth](#avoid-excessive-breadth) for advice.
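To make this concrete, here is a minimal testharness.js sketch of two such
input-value tests, assuming a window context; the chosen values are
illustrative, not a complete plan:

```js
test(() => {
  // A Symbol cannot be converted to a string, so the constructor
  // should reject it as a *title* with a TypeError.
  assert_throws_js(TypeError, () => new Notification(Symbol()));
}, "Notification constructor rejects a title that cannot be stringified");

test(() => {
  // `dir` is a WebIDL enum; values outside "auto"/"ltr"/"rtl" should
  // also produce a TypeError.
  assert_throws_js(TypeError, () => {
    new Notification("hello", { dir: "bogus" });
  });
}, "Notification constructor rejects an invalid `dir` option");
```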
### Browser state

The state of the browser may also influence algorithm behavior. Examples
include the current document, the dimensions of the viewport, and the entries
in the browsing history. Just like with direct input, a thorough set of tests
will likely need to control these values. Browser state is often more
expensive to manipulate (whether in terms of code, execution time, or system
resources), so you may want to design your tests to mitigate these costs (e.g.
by writing many subtests from the same state).

You may not be able to control all relevant aspects of the browser's state.
[The `type:untestable`
label](https://github.com/web-platform-tests/wpt/issues?q=is%3Aopen+is%3Aissue+label%3Atype%3Auntestable)
collects issues for web platform features which cannot be controlled in a
cross-browser way. You should still include such tests in your plan, both to
communicate your intention and to remind you when/if testing solutions become
available.

*Example:* In [the `Notification` constructor referenced
above](https://notifications.spec.whatwg.org/#constructors), the type of "the
current global object" is also a form of input. The test suite should include
tests which execute with different types of global objects.

### Branches

When an algorithm branches based on some condition, that's an indication of an
interesting behavior that might be missed. Your testing plan should have at
least one test that verifies the behavior when the branch is taken and at
least one more test that verifies the behavior when the branch is *not* taken.

*Example:* The following algorithm from [the HTML
standard](https://html.spec.whatwg.org/) describes how the
`localStorage.getItem` method works:

> The `getItem`(*key*) method must return the current value associated with
> the given *key*. If the given *key* does not exist in the list associated
> with the object then this method must return null.

This algorithm exhibits different behavior depending on whether or not an item
exists at the provided key. To test it thoroughly, we would write two tests:
one to verify that `null` is returned when there is no item at the provided
key, and another to verify that an item we previously stored is returned when
we call the method with its key.
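In testharness.js, those two tests might look roughly like the following
sketch (the key name is arbitrary):

```js
test(() => {
  // The branch is taken: no item exists at the provided key.
  localStorage.clear();
  assert_equals(localStorage.getItem("key"), null);
}, "getItem returns null when no item exists for the given key");

test(() => {
  // The branch is not taken: an item was previously stored.
  localStorage.clear();
  localStorage.setItem("key", "value");
  assert_equals(localStorage.getItem("key"), "value");
}, "getItem returns the value previously stored for the given key");
```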
### Sequence

Even without branching, the interplay between sequential algorithm steps can
suggest interesting test cases. If two steps have observable side-effects,
then it can be useful to verify they happen in the correct order.

Most of the time, step sequence is implicit in the nature of the
algorithm--each step operates on the result of the step that precedes it, so
verifying the end result implicitly verifies the sequence of the steps. But
sometimes, the order of two steps isn't particularly relevant to the result of
the overall algorithm. This makes it easier for implementations to diverge.

There are many common patterns where step sequence is observable but not
necessarily inherent to the correctness of the algorithm:

- input validation (when an algorithm verifies that two or more input values
  satisfy some criteria)
- event dispatch (when an algorithm
  [fires](https://dom.spec.whatwg.org/#concept-event-fire) two or more events)
- object property access (when an algorithm retrieves two or more property
  values from an object provided as input)

*Example:* The following text is an abbreviated excerpt of the algorithm that
runs during drag operations (from [the HTML
specification](https://html.spec.whatwg.org/multipage/dnd.html#dnd)):

> [...]
>
> 4. Otherwise, if the user ended the drag-and-drop operation (e.g. by
>    releasing the mouse button in a mouse-driven drag-and-drop interface), or
>    if the `drag` event was canceled, then this will be the last iteration.
>    Run the following steps, then stop the drag-and-drop operation:
>
>    1. If the [current drag
>       operation](https://html.spec.whatwg.org/multipage/dnd.html#current-drag-operation)
>       is "`none`" (no drag operation) [...] Otherwise, the drag operation
>       might be a success; run these substeps:
>
>       1. Let *dropped* be true.
>       2. If the [current target
>          element](https://html.spec.whatwg.org/multipage/dnd.html#current-target-element)
>          is a DOM element, [fire a DND
>          event](https://html.spec.whatwg.org/multipage/dnd.html#fire-a-dnd-event)
>          named `drop` at it; otherwise, use platform-specific conventions
>          for indicating a drop.
>       3. [...]
>
>    2. [Fire a DND
>       event](https://html.spec.whatwg.org/multipage/dnd.html#fire-a-dnd-event)
>       named `dragend` at the [source
>       node](https://html.spec.whatwg.org/multipage/dnd.html#source-node).
>    3. [...]

A thorough test suite will verify that the `drop` event is fired as specified,
and it will also verify that the `dragend` event is fired as specified. An
even better test suite will also verify that the `drop` event is fired
*before* the `dragend` event (sketched below).

In September of 2019, [Chromium accidentally changed the ordering of the
`drop` and `dragend`
events](https://bugs.chromium.org/p/chromium/issues/detail?id=1005747), and as
a result, real web applications stopped functioning. If there had been a test
for the sequence of these events, this regression could have been avoided.
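The ordering assertion itself can be sketched like this, assuming `source` and
`target` are the drag source and drop target, and that the drag-and-drop
operation is driven elsewhere (e.g. manually or via automation):

```js
async_test((t) => {
  const events = [];
  target.addEventListener("drop", t.step_func(() => {
    events.push("drop");
  }));
  source.addEventListener("dragend", t.step_func_done(() => {
    events.push("dragend");
    // Verify the relative order of the two events, not just that
    // each one fired.
    assert_array_equals(events, ["drop", "dragend"]);
  }));
  // ...perform the drag-and-drop operation here...
}, "the drop event is fired before the dragend event");
```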
When making your testing plan, be sure to look carefully for event dispatch
and the other patterns listed above. They won't always be as clear as the
"drag" example!

### Optional behavior

Specifications occasionally allow browsers discretion in how they implement
certain features. These allowances are described using [RFC
2119](https://tools.ietf.org/html/rfc2119) terms like "MAY" and "OPTIONAL".
Although browsers should not be penalized for declining to implement such
behavior, WPT accepts tests that verify the correctness of the browsers which
do. Be sure to [label the test as optional according to WPT's
conventions](file-names) so that people reviewing test results know how to
interpret failures.

*Example:* The algorithm underpinning
[`document.getElementsByTagName`](https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementsByTagName)
includes the following paragraph:

> When invoked with the same argument, and as long as *root*'s [node
> document](https://dom.spec.whatwg.org/#concept-node-document)'s
> [type](https://dom.spec.whatwg.org/#concept-document-type) has not changed,
> the same [HTMLCollection](https://dom.spec.whatwg.org/#htmlcollection)
> object may be returned as returned by an earlier call.

That statement uses the word "may," so even though it modifies the behavior of
the preceding algorithm, it is strictly optional. The test we write for it
should be designated accordingly.

It's important to read these sections carefully because the distinction
between "mandatory" behavior and "optional" behavior can be nuanced. In this
case, the optional behavior is never allowed once the document's type has
changed. That makes for a mandatory test: one that verifies browsers do *not*
return the same collection after the document's type changes.
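Returning to the optional half, the test might be sketched as follows, in a
file marked as optional per [WPT's file name conventions](file-names):

```js
test(() => {
  const first = document.getElementsByTagName("span");
  const second = document.getElementsByTagName("span");
  // Permitted but not required: the specification says the same
  // HTMLCollection *may* be returned for repeated identical calls.
  assert_equals(second, first);
}, "getElementsByTagName may return the same HTMLCollection");
```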
## Exercising Restraint

When writing conformance tests, choosing what *not* to test is sometimes just
as hard as finding what needs testing.

### Don't dive too deep

Algorithms are composed of other algorithms which are themselves defined in
terms of still more algorithms. It can be intimidating to consider
exhaustively testing one of those "nested" algorithms, especially when it is
shared by many different APIs.

In general, you should plan to write "surface tests" for the nested
algorithms. That means only verifying that they exhibit the basic behavior you
are expecting.

It's still important for the nested algorithms to be tested exhaustively
somewhere, but it's just as important to do so in a structured way. Reach out
to the test suite's maintainers to learn if and how they have already tested
those algorithms. In many cases, it's acceptable to test them exhaustively in
just one place (and maybe through a different API entirely) and to rely on
surface-level testing everywhere else. While it's always possible for more
tests to uncover new bugs, the chances may be slim. The time we spend writing
tests is highly valuable, so we have to be efficient!

*Example:* The following algorithm from [the DOM
standard](https://dom.spec.whatwg.org/) powers
[`document.querySelector`](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector):

> To **scope-match a selectors string** *selectors* against a *node*, run
> these steps:
>
> 1. Let *s* be the result of [parse a
>    selector](https://drafts.csswg.org/selectors-4/#parse-a-selector)
>    *selectors*.
> 2. If *s* is failure, then
>    [throw](https://heycam.github.io/webidl/#dfn-throw) a
>    "[`SyntaxError`](https://heycam.github.io/webidl/#syntaxerror)"
>    [DOMException](https://heycam.github.io/webidl/#idl-DOMException).
> 3. Return the result of [match a selector against a
>    tree](https://drafts.csswg.org/selectors-4/#match-a-selector-against-a-tree)
>    with *s* and *node*'s
>    [root](https://dom.spec.whatwg.org/#concept-tree-root) using [scoping
>    root](https://drafts.csswg.org/selectors-4/#scoping-root) *node*.

As described earlier in this guide, we'd certainly want to test the branch
regarding the parsing failure. However, there are many ways a string might
fail to parse--should we verify them all in the tests for
`document.querySelector`? What about `document.querySelectorAll`? Should we
test them all there, too?

The answers depend on the current state of the test suite: whether or not
tests for selector parsing exist and where they are located. That's why it's
best to confer with the people who are maintaining the tests.
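Whatever the outcome of that conversation, the surface test itself can stay
small. A sketch, using a single selector that is known to be invalid rather
than enumerating every way parsing can fail:

```js
test(() => {
  // "!!" is not a valid selector, so parsing fails and the
  // exception-throwing branch is taken.
  assert_throws_dom("SyntaxError", () => document.querySelector("!!"));
}, "querySelector throws SyntaxError for an unparseable selector");
```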
### Avoid excessive breadth

When the set of input values is finite, it can be tempting to test them all
exhaustively. When the set is very large, test authors can reduce repetition
by defining tests programmatically in loops.

But using advanced control flow techniques to dynamically generate tests can
actually *reduce* test quality. It may obscure the intent of the tests, since
readers have to mentally "unwind" the iteration to determine what is actually
being verified. The practice is also more susceptible to bugs, and these bugs
may not be obvious--they may not cause failures, and they may exercise fewer
cases than intended. Finally, tests authored using this approach often take a
relatively long time to complete, which puts a burden on the people who
collect test results in large numbers.

The severity of these drawbacks varies with the complexity of the generation
logic. For example, it would be pronounced in a test which conditionally made
different assertions within many nested loops. Conversely, the severity would
be low in a test which merely iterated over a list of values in order to make
the same assertions about each one. Recognizing when the benefits outweigh the
risks requires discretion, so once you understand the trade-off, use your best
judgement.

*Example:* We can see this consideration in the very first step of the
`Response` constructor from [the Fetch
standard](https://fetch.spec.whatwg.org/):

> The `Response(body, init)` constructor, when invoked, must run these steps:
>
> 1. If *init*["`status`"] is not in the range `200` to `599`, inclusive, then
>    [throw](https://heycam.github.io/webidl/#dfn-throw) a `RangeError`.
>
> [...]

This constructor accepts exactly 400 values for the "status." With [WPT's
testharness.js](./testharness), it's easy to dynamically create one test for
each value. But unless we have reason to believe that a browser may exhibit
drastically different behavior for some of those values (e.g. correctly
accepting `546` but incorrectly rejecting `547`), the complexity of testing
every case probably isn't warranted.

Instead, focus on writing declarative tests for specific values which are
novel in the context of the algorithm. For ranges like the one in this
example, testing the boundaries is a good idea: `200` and `599` should not
produce an error, while `199` and `600` should. Feel free to use what you know
about the feature to choose additional values. In this case, HTTP response
status codes are classified by their "hundreds" digit, so we might also want
to test a "3xx" value and a "4xx" value.
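A sketch of those boundary tests follows. The simple iteration over a flat
list of values is the low-severity kind described above, and a null body is
used so that "null body" statuses like `304` remain valid:

```js
[200, 304, 407, 599].forEach((status) => {
  test(() => {
    const response = new Response(null, { status });
    assert_equals(response.status, status);
  }, `status ${status} is accepted`);
});

[199, 600].forEach((status) => {
  test(() => {
    // Values outside the range 200-599 must produce a RangeError.
    assert_throws_js(RangeError, () => new Response(null, { status }));
  }, `status ${status} is rejected`);
});
```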
## Assessing coverage

It's very likely that WPT already has some tests for the feature (or at least
the specification) that you're interested in testing. In that case, you'll
want to learn what's already been done before starting to write new tests.
Understanding the design of the existing tests will let you avoid duplicating
effort, and it will also help you integrate your work more logically.

Even if the feature you're testing does *not* have any tests, you should still
keep these guidelines in mind. Sooner or later, someone else will want to
extend your work, so you ought to give them a good starting point!

### File names

The names of existing files and folders in the repository can help you find
tests that are relevant to your work. [This page on the design of
WPT](../test-suite-design) goes into detail about how files are generally laid
out in the repository.

Generally speaking, every conformance test is stored in a subdirectory
dedicated to the specification it verifies. The structure of these
subdirectories varies. Some organize tests into directories related to
algorithms or behaviors. Others have a more "flat" layout, where all tests are
listed together.

Whatever the case, test authors try to choose names that communicate the
behavior under test, so you can use them to make an educated guess about where
your tests should go.

*Example:* Imagine you wanted to write a test to verify that headers are made
immutable by the `Response.error` method defined in [the Fetch
standard](https://fetch.spec.whatwg.org). Here's the algorithm:

> The static `error()` method, when invoked, must run these steps:
>
> 1. Let *r* be a new [Response](https://fetch.spec.whatwg.org/#response)
>    object, whose
>    [response](https://fetch.spec.whatwg.org/#concept-response-response) is a
>    new [network error](https://fetch.spec.whatwg.org/#concept-network-error).
> 2. Set *r*'s [headers](https://fetch.spec.whatwg.org/#response-headers) to a
>    new [Headers](https://fetch.spec.whatwg.org/#headers) object whose
>    [guard](https://fetch.spec.whatwg.org/#concept-headers-guard) is
>    "`immutable`".
> 3. Return *r*.

In order to figure out where to write the test (and whether it's needed at
all), you can review the contents of the `fetch/` directory in WPT. Here's how
that looks on a UNIX-like command line:

    $ ls fetch
    api/                           data-urls/   range/
    content-encoding/              http-cache/  README.md
    content-length/                images/      redirect-navigate/
    content-type/                  metadata/    security/
    corb/                          META.yml     stale-while-revalidate/
    cors-rfc1918/                  nosniff/
    cross-origin-resource-policy/  origin/

This test is for a behavior directly exposed through the API, so we should
look in the `api/` directory:

    $ ls fetch/api
    abort/  cors/         headers/           policies/  request/    response/
    basic/  credentials/  idlharness.any.js  redirect/  resources/

And since this is a static method on the `Response` constructor, we would
expect the test to belong in the `response/` directory:

    $ ls fetch/api/response
    multi-globals/                   response-static-error.html
    response-cancel-stream.html      response-static-redirect.html
    response-clone.html              response-stream-disturbed-1.html
    response-consume-empty.html      response-stream-disturbed-2.html
    response-consume.html            response-stream-disturbed-3.html
    response-consume-stream.html     response-stream-disturbed-4.html
    response-error-from-stream.html  response-stream-disturbed-5.html
    response-error.html              response-stream-disturbed-6.html
    response-from-stream.any.js      response-stream-with-broken-then.any.js
    response-init-001.html           response-trailer.html
    response-init-002.html

There is a test file for the `error` method: `response-static-error.html`. We
can open it to decide whether the behavior is already covered. If not, then we
know where to [write the
test](https://github.com/web-platform-tests/wpt/pull/19601)!
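If the behavior turns out not to be covered, the new test might be sketched
like so. Per the Fetch standard, a `Headers` object whose guard is
"`immutable`" throws a `TypeError` when a modification is attempted:

```js
test(() => {
  const response = Response.error();
  // Any mutation attempt on immutable headers must throw.
  assert_throws_js(TypeError, () => {
    response.headers.append("x-test", "1");
  });
}, "Response.error() returns a response whose headers are immutable");
```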
### Failures on wpt.fyi

There are many behaviors that are difficult to describe in a succinct file
name. That's commonly the case with the low-level rendering details of CSS
specifications. Test authors may resort to generic number-based naming schemes
for their files, e.g. `feature-001.html`, `feature-002.html`, etc. This makes
it difficult to determine whether a test case exists judging only by the names
of files.

If the behavior you want to test is demonstrated by some browsers but not by
others, you may be able to use the *results* of the tests to locate the
relevant test.

[wpt.fyi](https://wpt.fyi) is a website which publishes the results of running
WPT in various browsers. Because most browsers pass most tests, the pass/fail
characteristics of the behavior you're testing can help you filter through a
large number of highly similar tests.

*Example:* Imagine you've found a bug in the way Safari renders the top CSS
border of HTML tables. By searching through directory names and file names,
you've determined the probable location for the test: the `css/CSS2/borders/`
directory. However, there are *three hundred* files that begin with
`border-top-`! None of the names mention the `table` element, so any one of
the files may already be testing the case you found.

Luckily, you also know that Firefox and Chrome do not exhibit this bug. You
could find such tests by visual inspection of the [wpt.fyi](https://wpt.fyi)
results overview, but [the website's "search" feature includes operators that
let you query for this information
directly](https://github.com/web-platform-tests/wpt.fyi/blob/master/api/query/README.md).
To find the tests whose names begin with `border-top-`, which pass in Chrome
and Firefox, and which fail in Safari, you could write [`border-top-
chrome:pass firefox:pass
safari:fail`](https://wpt.fyi/results/?label=master&label=experimental&aligned&q=border-top-%20safari%3Afail%20firefox%3Apass%20chrome%3Apass).
The results show that only three such tests exist:

- `border-top-applies-to-005.xht`
- `border-top-color-applies-to-005.xht`
- `border-top-width-applies-to-005.xht`

These may not describe the behavior you're interested in testing; the only way
to know for sure is to review their contents. However, this is a much more
manageable set to work with!

### Querying file contents

Some web platform features are enabled with a predictable pattern. For
example, HTML attributes follow a fairly consistent format. If you're
interested in testing a feature like this, you may be able to learn where your
tests belong by querying the contents of the files in WPT.

You may be able to perform such a search on the web. WPT is hosted on
GitHub.com, and [GitHub offers some basic functionality for querying
code](https://help.github.com/en/articles/about-searching-on-github). If your
search criteria are short and distinctive (e.g. all files containing
"querySelectorAll"), then this interface may be sufficient. However, more
complicated criteria may require [regular
expressions](https://www.regular-expressions.info/). For that, you can
[download the WPT
repository](https://web-platform-tests.org/writing-tests/github-intro.html)
and use [git](https://git-scm.com) to perform more powerful searches.

The following table lists some common search criteria and examples of how they
can be expressed using regular expressions:

```eval_rst
================================= ==================== ==================================
Criteria                          Example match        Example regular expression
================================= ==================== ==================================
JavaScript identifier references  ``obj.foo()``        ``\bfoo\b``
JavaScript string literals        ``x = "foo";``       ``(["'])foo\1``
HTML tag names                    ``<foo>``            ``<foo(\s|>|$)``
HTML attributes                   ``<div foo="bar">``  ``<[a-zA-Z][^>]*\sfoo(\s|>|=|$)``
CSS property names                ``style="foo: 4"``   ``([{;=\"']|\s|^)foo\s*:``
================================= ==================== ==================================
```

Bear in mind that searches like this are not necessarily exhaustive. Depending
on the feature, it may be difficult (or even impossible) to write a query that
correctly identifies all relevant tests. This strategy can give a helpful
guide, but the results may not be conclusive.

*Example:* Imagine you're interested in testing how the `src` attribute of the
`iframe` element works with `javascript:` URLs. Judging only from the names of
directories, you've found a lot of potential locations for such a test. You
also know many tests use `javascript:` URLs without describing that in their
name. How can you find where to contribute new tests?

You can design a regular expression that matches many cases where a
`javascript:` URL is assigned to the `src` property in HTML, and then use the
`git grep` command to query the contents of the `html/` directory:

    $ git grep -lE "src\s*=\s*[\"']?javascript:" html
    html/browsers/browsing-the-web/navigating-across-documents/javascript-url-query-fragment-components.html
    html/browsers/browsing-the-web/navigating-across-documents/javascript-url-return-value-handling.html
    html/dom/documents/dom-tree-accessors/Document.currentScript.html
    html/dom/self-origin.sub.html
    html/editing/dnd/target-origin/114-manual.html
    html/semantics/embedded-content/media-elements/track/track-element/cloneNode.html
    html/semantics/scripting-1/the-script-element/execution-timing/040.html
    html/semantics/scripting-1/the-script-element/execution-timing/080.html
    html/semantics/scripting-1/the-script-element/execution-timing/108.html
    html/semantics/scripting-1/the-script-element/execution-timing/109.html
    html/webappapis/dynamic-markup-insertion/opening-the-input-stream/document-open-cancels-javascript-url-navigation.html

You will still have to review the contents to know which (if any) are relevant
for your purposes, but compared to the 5,000 files in the `html/` directory,
this list is far more approachable!

## Writing the Tests

With a complete testing plan in hand, you now have a good idea of the scope of
your work. It's finally time to write the tests! There's a lot to say about
how this is done technically. To learn more, check out [the WPT "reftest"
tutorial](./reftest-tutorial) and [the testharness.js
tutorial](./testharness-tutorial).