irwink edited this page Jun 16, 2015 · 1 revision

Metadata Checking

The metadata check of the WPSS Validation Tool analyses metadata in HTML/XHTML and PDF documents. For HTML/XHTML documents, it checks:

  • that the required metadata tags are present.
  • that non-white space content is provided where required.
  • the content value, where the set of valid values is known (for example, language values).
  • the content format, where the content must follow a specific pattern (for example, date values).

The metadata check also verifies that PDF documents contain the required file properties.

Metadata Checking Profiles

The Validation Tool uses a profile to check for the required metadata tags or PDF file properties. The profile specifies each metadata tag name or PDF file property name, whether it requires content, and its content type. The metadata profiles used are:

  • PWGSC SWU – Standard on Web Usability metadata tags
  • TBS SWU – Treasury Board Standard on Web Usability metadata tags
  • Canada.ca – Canada.ca metadata tags
  • PWGSC – CLF 2.0 tags plus additional PWGSC tags
  • TBS CLF 2.0 – Treasury Board CLF 2.0 list of metadata tags
  • None – No metadata tags are required.

The PDF file property profiles used are:

  • PWGSC – PWGSC properties (Title).
  • None – No properties required.

Required Metadata Tags

The PWGSC SWU metadata profile requires:

  • dcterms.description
  • dcterms.creator
  • dcterms.issued
  • dcterms.language
  • dcterms.modified
  • dcterms.subject
  • dcterms.title
  • description
  • keywords
  • title

The Treasury Board SWU metadata profile requires:

  • dcterms.creator
  • dcterms.issued
  • dcterms.language
  • dcterms.modified
  • dcterms.subject
  • dcterms.title
  • description
  • title

The Canada.ca metadata profile requires:

  • description
  • title

The PWGSC CLF 2.0 metadata profile requires:

  • dc.creator
  • dc.language
  • dc.publisher
  • dc.subject
  • dc.title
  • dcterms.issued
  • dcterms.modified
  • description
  • keywords
  • title

The Treasury Board CLF 2.0 (TBS CLF 2.0) metadata profile requires:

  • dc.creator
  • dc.language
  • dc.subject
  • dc.title
  • dcterms.issued
  • dcterms.modified
  • description
  • keywords
  • title

The PWGSC PDF file property set includes the “Title” property.
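The presence check described above can be sketched in Python as follows. This is a minimal illustration, not the Validation Tool's own implementation: the names REQUIRED_TAGS, MetaCollector and missing_tags are hypothetical, only two of the profiles are included, and treating "title" as the text of the <title> element is an assumption.

```python
from html.parser import HTMLParser

# Required tag names for two of the profiles listed on this page.
REQUIRED_TAGS = {
    "Canada.ca": ["description", "title"],
    "TBS SWU": [
        "dcterms.creator", "dcterms.issued", "dcterms.language",
        "dcterms.modified", "dcterms.subject", "dcterms.title",
        "description", "title",
    ],
}


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.found[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "title":
            # Assumption: the "title" item is the document's <title> element.
            self._in_title = True
            self.found.setdefault("title", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.found["title"] += data


def missing_tags(html, profile):
    """Return the profile's required tag names that are absent from the page."""
    collector = MetaCollector()
    collector.feed(html)
    return [name for name in REQUIRED_TAGS[profile] if name not in collector.found]
```

A page that carries only a title and a description would pass the Canada.ca profile in this sketch, while the TBS SWU profile would report the six dcterms items as missing.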

Required Metadata Content

Every item in a metadata profile must be present, but not every item must have content. Items that do require content, whether metadata tags or PDF file properties, must contain meaningful text. The Validation Tool reports an error if the required text is absent or consists of white space only (spaces, tabs, etc.). The set of metadata items that require content is specified in a profile supplied to the Validation Tool.

The PWGSC SWU metadata items that require content are:

  • dcterms.description
  • dcterms.creator
  • dcterms.issued
  • dcterms.language
  • dcterms.modified
  • dcterms.subject
  • dcterms.title
  • description
  • keywords
  • title

The Treasury Board SWU metadata items that require content are:

  • dcterms.creator
  • dcterms.issued
  • dcterms.language
  • dcterms.modified
  • dcterms.subject
  • dcterms.title
  • description
  • title

The Canada.ca metadata items that require content are:

  • description
  • title

The PWGSC CLF 2.0 metadata items that require content are:

  • dc.creator
  • dc.language
  • dc.publisher
  • dc.subject
  • dc.title
  • dcterms.issued
  • dcterms.modified
  • description
  • keywords
  • title

The Treasury Board CLF 2.0 (TBS CLF 2.0) metadata items that require content are:

  • dc.creator
  • dc.language
  • dc.subject
  • dc.title
  • dcterms.issued
  • dcterms.modified
  • description
  • keywords
  • title
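The content check above amounts to rejecting values that are absent, empty, or white space only. A minimal Python sketch, assuming the page's metadata has already been collected into a dictionary of tag name to content (the function name and the dictionary shape are illustrative, not part of the Validation Tool):

```python
def tags_missing_content(found, content_required):
    """Return the names whose content is absent, empty, or white space only.

    found            -- dict mapping tag name to its content string
    content_required -- list of tag names that must have content
    """
    return [
        name for name in content_required
        # str.strip() removes spaces, tabs and newlines, so a value made
        # only of white space becomes the empty string and is reported.
        if not (found.get(name) or "").strip()
    ]
```

For example, a description whose content is a single tab character would be reported, as would a required item that is not in the dictionary at all.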

Metadata Content and Format

Some metadata tags must contain specific values or values in a specific format. The Validation Tool checks for specific content formats, such as date values, and checks that content comes from a controlled vocabulary where one applies. It cannot check that the actual value is appropriate. Specific content formats and values are further specified below.

Date Values

The content of the dcterms.issued and dcterms.modified metadata items is a date value that must be expressed in YYYY-MM-DD format. The Validation Tool checks that the content conforms to this format, that the month is in the range 01 to 12, and that the day is in the range 01 to 31. For example:

<meta name="dcterms.issued" scheme="W3CDTF" content="2005-11-21" />

The Validation Tool also checks that the dates are consistent: dcterms.modified must not be earlier than dcterms.issued.
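The date rules can be illustrated with a short Python sketch. It mirrors only the checks stated here (pattern, month range, day range, and ordering), so a date such as 2005-02-31 would still pass; the function names are hypothetical and this is not the tool's own code.

```python
import re

# Four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_valid_date_format(value):
    """Check YYYY-MM-DD with month in 01..12 and day in 01..31."""
    match = DATE_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    month, day = int(match.group(2)), int(match.group(3))
    return 1 <= month <= 12 and 1 <= day <= 31


def dates_are_consistent(issued, modified):
    """dcterms.modified must not be earlier than dcterms.issued.

    Both arguments are assumed to have passed is_valid_date_format;
    zero-padded YYYY-MM-DD strings sort in chronological order.
    """
    return modified.strip() >= issued.strip()
```

Comparing the strings directly works because the format is fixed width with the most significant field first.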

Language Code

The content of the dc.language metadata item is a three-character language code. For example:

<meta name="dc.language" scheme="ISO639" content="eng" />
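Because the set of language values is known, this check reduces to a lookup. The Python sketch below assumes that only the codes "eng" and "fra" are accepted and that letter case is ignored; both are assumptions for illustration, since this page does not list the accepted codes.

```python
# Assumed set of accepted three-character codes (not taken from this page).
KNOWN_LANGUAGE_CODES = {"eng", "fra"}


def is_known_language(value):
    """Return True if the content is one of the accepted language codes."""
    return value.strip().lower() in KNOWN_LANGUAGE_CODES
```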

Subject Value

The content of the dc.subject metadata item on PWGSC web pages is a list of terms that describe the topic of the web page. These terms must be selected from a controlled vocabulary, the Government of Canada Core Subject Thesaurus. The terms must come from the list of preferred terms, not the non-preferred terms, and must be separated with semicolons. The Validation Tool verifies that the dc.subject terms are taken from the Thesaurus matching the language of the page. For example:

<meta name="dc.subject" scheme="gccore" content="Newsletters; Information; Information sources" />
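A controlled-vocabulary check of this kind can be sketched in Python as below. The PREFERRED_TERMS table is a tiny stand-in that holds only the three terms from the example; the real Thesaurus, the table's shape, and the exact-match comparison are assumptions made for illustration.

```python
# Stand-in for the Core Subject Thesaurus preferred terms, keyed by the
# language of the page. Only the terms from the example are included.
PREFERRED_TERMS = {
    "eng": {"Newsletters", "Information", "Information sources"},
}


def unknown_subject_terms(content, language):
    """Return the semicolon-separated terms that are not preferred terms."""
    vocabulary = PREFERRED_TERMS.get(language, set())
    terms = [term.strip() for term in content.split(";")]
    # Empty pieces (for example after a trailing semicolon) are ignored.
    return [term for term in terms if term and term not in vocabulary]
```

Note that in this sketch a comma-separated list is read as one long term, so it would be reported as not found in the vocabulary.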

Email Address

The content of the pwgsc.contact.email item is a single email address. The Validation Tool verifies that the content is a well-formed email address. It does not verify that the address actually exists. For example:

<meta name="pwgsc.contact.email" content="questions@pwgsc-tpsgc.gc.ca"/>
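A well-formedness check looks only at the shape of the address. The Python sketch below uses a deliberately simple pattern (a local part, an "@", and a dotted domain); the pattern and the function name are assumptions, and the Validation Tool may apply different rules.

```python
import re

# Simplified shape: local-part@label.label[.label...]
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_well_formed_email(value):
    """Return True if the whole content looks like a single email address."""
    return EMAIL_PATTERN.fullmatch(value.strip()) is not None
```

Using fullmatch means a list of two addresses is rejected, in line with the requirement for a single address.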

Scheme

The scheme attribute of some metadata tags must contain specific values.

The TBS CLF 2.0 profile requires the following scheme value for the specified metadata tags:

  Tag Name            Scheme Attribute Value
  dcterms.issued      W3CDTF
  dcterms.modified    W3CDTF
  dc.language         ISO639-2/T
  dc.subject          gccore

The PWGSC SWU profile requires the following scheme value for the specified metadata tags:

  Tag Name            Scheme Attribute Value
  dcterms.issued      W3CDTF
  dcterms.modified    W3CDTF
  dcterms.language    ISO639-2
  dcterms.subject     gccore

The Validation Tool reports invalid content errors for the metadata tags that do not have these scheme values.
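The scheme requirements above can be expressed as a lookup table. In this Python sketch the scheme values are the ones listed for the two profiles; the names REQUIRED_SCHEMES and scheme_errors, and the input shape (tag name mapped to its scheme attribute, or None when the attribute is missing), are assumptions for illustration.

```python
# Required scheme attribute values per profile, as listed on this page.
REQUIRED_SCHEMES = {
    "TBS CLF 2.0": {
        "dcterms.issued": "W3CDTF",
        "dcterms.modified": "W3CDTF",
        "dc.language": "ISO639-2/T",
        "dc.subject": "gccore",
    },
    "PWGSC SWU": {
        "dcterms.issued": "W3CDTF",
        "dcterms.modified": "W3CDTF",
        "dcterms.language": "ISO639-2",
        "dcterms.subject": "gccore",
    },
}


def scheme_errors(schemes, profile):
    """Return the tags on the page whose scheme is not the required value.

    schemes -- dict mapping each tag found on the page to its scheme
               attribute value, or None if the tag has no scheme attribute.
    Tags that are absent from the page are left to the presence check.
    """
    return [
        name for name, expected in REQUIRED_SCHEMES[profile].items()
        if name in schemes and schemes[name] != expected
    ]
```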

PDF File Properties

The Validation Tool checks for the set of required PDF file properties specified by a profile. The PWGSC PDF File Properties profile contains the following item:

  • Title