This document describes how we're currently translating the process of filling out a web form into YAML, a human-readable data serialization format.
This schema is now fairly concrete in that it's deployed to several production systems. To suggest a change, please file a ticket in the issue tracker and allow time for it to be discussed with implementors before any changes are made.
You can jump to the examples if you just want a quick reference.
The top level of a member's contact schema includes only two fields:
bioguide is the member's Bioguide identifier.
contact_form is a nested hash of the pertinent details of successfully filling out and validating receipt of the member's contact form.
Contact form fields
method
The HTTP method used to submit the form, in all caps, most likely POST.
action
The URL the form should submit to. An empty string ('') can be used to represent the URL the form is located at, but otherwise this should be an absolute URL such as http://github.com/unitedstates/congress-contact, rather than /unitedstates/congress-contact, even if that is what appears in the form.
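As a rough illustration of method and action, the fragment below pairs a POST with an absolute submission URL. The URL is a hypothetical placeholder, not a real member form.

```yaml
contact_form:
  method: POST
  # Hypothetical absolute URL. An empty string ('') would instead mean
  # "submit to the URL the form is located at".
  action: "https://www.example.com/contact/submit"
```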
steps
A list of the steps that make up a successful submission of the form. Steps are a subset of Capybara methods, one of:
visit: The act of navigating to a given URL.
find: Locating a selector on the page, an indication that no further steps should be executed until the selector is present and visible.
fill_in: Entering text into a text field.
select: Choosing a value from a select list. If this value isn't found, choose an option with the text matching the value of value in the YAML file. (This is so that we can choose options by text, since some forms do not include a value attribute on their options.)
check: Ticking a checkbox.
uncheck: The opposite of check.
choose: Ticking a specific item in a set of radio buttons.
click_on: Clicking a link or button, most likely to submit a form.
wait: Experimental. Indicates that the specified time interval should pass before proceeding.
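To show how these steps chain together, here is a minimal sketch of a steps list. The URL, field name, and selectors are hypothetical, and it assumes find takes a list of hashes like the input-related steps do.

```yaml
contact_form:
  method: POST
  action: ""            # submit to the page the form is located at
  steps:
    - visit: "https://www.example.com/contact"   # hypothetical URL
    - find:
        - selector: "#contact-form"              # wait until the form is visible
    - fill_in:
        - name: last_name                        # hypothetical field
          selector: "#last-name"
          value: $NAME_LAST
    - click_on:
        - selector: "#submit"                    # submits the form
```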
success
A basic description of what a successful HTTP response looks like. This is a hash describing the expected response headers and body.
Response headers
Any standard HTTP header can be expected here, but most implementations won't need this information. status is provided as an example.
status: The numeric HTTP status code the response should match.
Response body
contains: A plain string that should be present in the body. This is preferred over matches unless a more complex rule is necessary.
matches: A regular expression bounded by plain string delimiters ("") for portability. It's preferable to provide a pattern (if one is needed) that can be matched case-insensitively and on one line.
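A success block might look like the sketch below. The body key follows the full example at the end of this document; the response strings are invented for illustration.

```yaml
contact_form:
  success:
    body:
      # Preferred: a plain substring expected in the response body.
      contains: "Thank you for your message"
      # Alternative when a substring is not enough: a regular expression
      # written as a plain string, matchable case-insensitively on one line.
      # matches: "thank you for (your message|contacting us)"
```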
retry
Failure, in general, can be assumed in the absence of success. However, there are some conditions where feedback can be provided to the 'user' to correct an error. Address validation and CAPTCHA failure are the two prime examples of scenarios in which a retry could be prudent.
retry should be specified as an array of retry conditions, each including a reason, a selector or contains value, and a resubmit array:
reason: An identifier of the failure type. Currently, captcha is the only specified value.
selector: A CSS selector whose presence indicates this specific type of failure.
contains: Text whose presence indicates this specific type of failure.
resubmit: A list of field values (such as $CAPTCHA_SOLUTION) that need to be resubmitted. The expectation is that clients will redraw whatever interface is necessary using the definition of that value's field in the form YAML.
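The retry conditions described above could be written as follows. This mirrors the full example at the end of this document; the commented-out selector line is an assumption about how a CSS-based condition would be keyed, and its selector is hypothetical.

```yaml
contact_form:
  retry:
    - reason: captcha                  # currently the only specified failure type
      contains: "captcha was invalid"  # text that signals this failure
      # selector: ".captcha-error"     # assumed key for a CSS-based condition
      resubmit:
        - $CAPTCHA_SOLUTION            # field value the client must collect again
```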
Types of steps
visit
The value of a visit step is just a URL string.
find
The selector of a find step is just a string CSS selector which should be found on the page (and should be visible) before proceeding to execute more steps.
The value of a find step is optional, and it may specify the markup contained in the element that is required for this element to be found.
The options attribute for the find step may be specified as wait: x, where x is an integer number of seconds. If the element is not found within this number of seconds, the form fill will be abandoned and an error should be returned to the caller.
The within_frame attribute is optional and consists of a string denoting the selector of an iframe on the page. If present, the find step will be executed in the context of the matching iframe.
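A find step using every attribute described above might be sketched like this. The selectors and expected text are hypothetical, and the list-of-hashes layout is assumed to match the other steps.

```yaml
steps:
  - find:
      - selector: "#confirmation"          # hypothetical element to wait for
        value: "Your message was sent"     # optional: content the element must contain
        options:
          wait: 10                         # give up and report an error after 10 seconds
        within_frame: "iframe#contact"     # optional: look inside this iframe
```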
fill_in
The value of a fill_in step can describe a single field or a batch of fields to fill in at once, but it should be defined as a list of hashes either way. Each hash describes a form field by a few attributes, many of which are common to most steps:
name: The name HTML attribute of the field to be filled out.
selector: It's expected that a specific CSS selector will be provided in addition to the name field, because more than one field with the same name may be present on the page.
value: Either a string value to enter into the form, or a 'variable' placeholder, such as $NAME_LAST.
required (ironically, optional): This field will be present if a field must be filled out with a value in order for the form to be valid.
options (fittingly, optional): This attribute's meaning changes with value. For certain values, options can be specified accordingly, for example allows_plus: false, depending on whether the form allows a plus sign in the entered value.
max_length: This field will be present if a field has a maximum character length. This value should be a number. It's very useful where max length is only enforced server-side.
within_frame (optional): Consists of a string denoting the selector of an iframe on the page. If present, the fill_in step will be executed in the context of the matching iframe.
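The sketch below fills in two fields in one fill_in step. The field names and selectors are hypothetical, and $EMAIL is an assumed placeholder name used only to show where allows_plus would sit.

```yaml
steps:
  - fill_in:
      - name: last_name
        selector: "#last-name"
        value: $NAME_LAST
        required: Yes
        max_length: 50             # useful when the limit is only enforced server-side
      - name: email
        selector: "form#contact input[name='email']"
        value: $EMAIL              # assumed placeholder name
        required: Yes
        options:
          allows_plus: false       # this form rejects a plus sign in the value
```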
A note on CAPTCHAs
Contact forms may present a CAPTCHA challenge, which of course is difficult to deal with in an automated fashion. CAPTCHAs should be handled as fields with the variable $CAPTCHA_SOLUTION as the value. These fields should also describe a captcha_selector key for retrieving the CAPTCHA image and returning it to a solver of the implementer's choosing.
Google's new ReCAPTCHAs have special syntax. In this case, the captcha_selector should be the iframe containing the ReCAPTCHA. The google_recaptcha option should be set to true as well. See here for an example.
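A ReCAPTCHA field could be described roughly as below. The field name and selectors are hypothetical, and placing google_recaptcha under options is an assumption based on its being called an option.

```yaml
steps:
  - fill_in:
      - name: g-recaptcha-response             # hypothetical field name
        selector: "#g-recaptcha-response"
        value: $CAPTCHA_SOLUTION
        required: Yes
        # For ReCAPTCHA, point at the iframe rather than at an image.
        captcha_selector: "iframe[src*='recaptcha']"
        options:
          google_recaptcha: true               # assumed location of the option
```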
check, uncheck, choose
These steps can also list either one or many hashes. It should be expected that a single form can be filled out with many steps until a click_on is encountered, at which time the form should be submitted.
The attributes of these steps are the same as those of fill_in, and should be treated as such with the exception of value. In a checkbox or radio button, value describes the actual value attribute of the checkbox or radio button that should be checked/unchecked/chosen, in case several have the same name.
The within_frame attribute is optional and consists of a string denoting the selector of an iframe on the page. If present, the check/uncheck/choose step will be executed in the context of the matching iframe.
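For illustration, the three steps might be written as below. All names, selectors, and values are hypothetical; note how value picks out one input among several sharing a name.

```yaml
steps:
  - check:
      - name: newsletter
        selector: "#newsletter-opt-in"
        value: "yes"                     # value attribute of the box to tick
  - uncheck:
      - name: share_details
        selector: "#share-details"
        value: "1"
  - choose:
      - name: response_requested         # several radio buttons share this name
        selector: "input[name='response_requested'][value='Y']"
        value: "Y"                       # the specific radio button to choose
```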
select
Like the other input-related steps, selects can list either one or many hashes.
Attributes are the same as fill_in with the addition of options, a list of the possible options which can be selected. If the value attributes of the select's options are obscure abbreviations or otherwise non-human-readable, the value of options can be a hash where the key is the text that appears in the select box when the option is selected, and the value is the option's value attribute. In cases where the options are common across several members' forms, a constant may be used as a placeholder. Available constants are listed in constants.yaml in this repository.
Currently the only available constants are a list of the postal codes of the 50 US states plus DC, and the full list of states and territories. The constants encountered in options lists comprise the keys in constants.yaml, so the resulting constants hash can be indexed directly with them.
The within_frame attribute is optional and consists of a string denoting the selector of an iframe on the page. If present, the select step will be executed in the context of the matching iframe.
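The two forms of options can be sketched as follows. The field names, selectors, option codes, and the $TOPIC placeholder are all hypothetical. A constant from constants.yaml would take the place of the options value; no constant name is shown here because the names are defined in that file.

```yaml
steps:
  - select:
      # Plain list: the option values are already readable.
      - name: prefix
        selector: "#prefix"
        value: "Ms."
        options:
          - "Mr."
          - "Ms."
          - "Dr."
      # Hash: the key is the visible text, the value is the option's value attribute.
      - name: topic
        selector: "#topic"
        value: $TOPIC                # hypothetical placeholder
        required: Yes
        options:
          Agriculture: "AG_01"
          Education: "ED_02"
```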
click_on
A click_on step terminates the preceding list of input-related steps by submitting the web form. It is a list containing a hash with only two possible attributes: selector is the CSS selector for finding the button or link to click, and value is the HTML value attribute if present, both to disambiguate and for the benefit of clients which may be POSTing directly instead of using a headless browser, though this is not recommended. selector is the only attribute you must provide/should expect to be guaranteed.
The within_frame attribute is optional and consists of a string denoting the selector of an iframe on the page. If present, the click_on step will be executed in the context of the matching iframe.
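A click_on step is short enough that a sketch is nearly the whole story. The selector and value here are hypothetical.

```yaml
steps:
  - click_on:
      - selector: "#submit-button"   # the only attribute guaranteed to be present
        value: "Send"                # optional: the button's HTML value attribute
```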
wait
This step should be considered experimental and subject to change.
This is not part of the Capybara command set, but is in place at least until it can be replaced by find at a later date. It indicates the integer number of seconds that should be waited before performing the next action.
This step is not to be confused with the wait option under find, which denotes the maximum time that should pass while waiting for an element to appear.
This instruction should only be used sparingly. It is better to mimic user behavior as closely as possible, but if there is no way to proceed with normal UX steps, this instruction may be used.
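To contrast the two kinds of waiting, the sketch below uses a wait step followed by a find step with a wait option. The selectors are hypothetical, and the bare integer value is inferred from the description above.

```yaml
steps:
  - click_on:
      - selector: "#next-page"
  - wait: 3                          # always pause 3 seconds before continuing
  - find:
      - selector: "#second-page-form"
        options:
          wait: 10                   # upper bound: proceed as soon as the element appears
```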
To put it all together, let's do a Google search for our last name as an example. For some reason, this search requires that you fill out a CAPTCHA:
    bioguide: #(well, it's google, so there isn't one)
    contact_form:
      method: GET
      action: http://google.com/search
      steps:
        - visit: http://google.com
        - fill_in:
            - name: q
              selector: "#gbqfq"
              value: $NAME_LAST
              required: Yes
            - name: recaptcha_response_field
              selector: "#recaptcha_response_field"
              captcha_selector: img
              captcha_id_selector: "#recaptcha_challenge_field"
              value: $CAPTCHA_SOLUTION
              required: Yes
        - click_on:
            - selector: "#gbqfba"
      success:
        body:
          contains: "results ("
      retry:
        - reason: captcha
          contains: "captcha was invalid"
          resubmit:
            - $CAPTCHA_SOLUTION
Here is a list of examples that may help you: