
Syntax for assigning word-level descriptive attributes #24

klassenjm opened this Issue Jul 7, 2016 · 0 comments



klassenjm commented Jul 7, 2016


  • Specify a syntax for adding named attributes to character-level elements.
    • Attributes provide descriptive metadata about the marked content.

This syntax provides a general-purpose method for extending the meta information contained within a USFM text.

  • USFM 3.0 and subsequent USFM versions will define the official list of character markers which provide descriptive attributes, and the list of attribute names and value options encoded for each.
  • User defined attributes are allowed using a specific naming syntax.

A companion USX 3.0 proposal exists at: ubsicap/usx#18

General Syntax

Within a character marker span, an attribute list is separated from the text content by a vertical bar |. Attributes are listed as name and value pairs using the syntax: attribute = "value". The attribute name is a single ASCII string. The value is wrapped in double quotes.

\w gracious|lemma="grace"\w*
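As an illustration of the general syntax, here is a minimal Python sketch that splits the content of a \w span at the first | and reads the name="value" pairs. The function names and the regular expression are hypothetical (not part of USFM or any existing tool), and the sketch assumes attribute names are ASCII letters, digits, underscores or hyphens, and that values contain no double quotes.

```python
import re

# Hypothetical pattern for one name="value" pair. Assumes names are ASCII
# word characters or hyphens and that values contain no double quotes.
ATTR_PAIR = re.compile(r'([A-Za-z][\w-]*)\s*=\s*"([^"]*)"')

def parse_span_content(content: str) -> tuple[str, dict[str, str]]:
    """Split the inside of a character span at the first '|' into text and attributes."""
    text, bar, attr_list = content.partition("|")
    attrs = dict(ATTR_PAIR.findall(attr_list)) if bar else {}
    return text, attrs

def parse_w_span(span: str) -> tuple[str, dict[str, str]]:
    """Parse a complete \\w ...\\w* span (hypothetical helper)."""
    m = re.fullmatch(r'\\w (.*?)\\w\*', span)
    if m is None:
        raise ValueError(f"not a \\w span: {span!r}")
    return parse_span_content(m.group(1))

print(parse_w_span(r'\w gracious|lemma="grace"\w*'))
# ('gracious', {'lemma': 'grace'})
```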

Default Attribute
When content is supplied in the position of a marker attribute, but without an explicit attribute name, the content is assigned to a single default attribute, which the USFM specification defines for that marker.

\w gracious|grace\w*

… where the default attribute for \w …\w* is defined as "lemma". This allows the most common use of an attribute to be expressed with as little additional markup as possible.
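A minimal Python sketch of how a reader might resolve an unnamed value to the marker's default attribute. The DEFAULT_ATTRIBUTE table and parse_attributes function are hypothetical; only the \w → "lemma" entry comes from this proposal. The check for a bare value (no "=" in the attribute text) is also an assumption of this sketch, not a rule stated in the specification.

```python
import re

# Hypothetical table of marker -> default attribute name.
# Only \w -> "lemma" is taken from this proposal; other entries would come
# from the USFM specification itself.
DEFAULT_ATTRIBUTE = {"w": "lemma"}

# Hypothetical pattern for one name="value" pair (values contain no quotes).
ATTR_PAIR = re.compile(r'([A-Za-z][\w-]*)\s*=\s*"([^"]*)"')

def parse_attributes(marker: str, attr_list: str) -> dict[str, str]:
    """Parse the text after '|', applying the marker's default attribute when no name is given."""
    attr_list = attr_list.strip()
    # Assumption: a bare default value never contains '='.
    if attr_list and "=" not in attr_list:
        default = DEFAULT_ATTRIBUTE.get(marker)
        if default is None:
            raise ValueError(f"marker '{marker}' has no default attribute")
        return {default: attr_list}
    return dict(ATTR_PAIR.findall(attr_list))

print(parse_attributes("w", "grace"))          # {'lemma': 'grace'}
print(parse_attributes("w", 'lemma="grace"'))  # {'lemma': 'grace'}
```

Both forms of the example produce the same attribute dictionary, which is the point of defining a default.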

Multiple Values
In cases where more than one value should be provided for an attribute key, the author should provide a comma-separated list within the value string. See the strong attribute example in #26 (Descriptive attributes for \w ...\w*).

\w gracious|strong="H01234,G05485"\w*
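Since multiple values live inside one quoted string, a consumer has to split that string itself. A minimal Python sketch follows; split_values is a hypothetical helper, and trimming whitespace and dropping empty items are assumptions, since the proposal does not say whether spaces are allowed around the commas.

```python
def split_values(value: str) -> list[str]:
    """Split a comma-separated attribute value into its individual values.

    Assumption: surrounding whitespace is trimmed and empty items are dropped.
    """
    return [part.strip() for part in value.split(",") if part.strip()]

attrs = {"strong": "H01234,G05485"}
print(split_values(attrs["strong"]))  # ['H01234', 'G05485']
```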

Backward compatibility

Any pre-existing markers which would specify attributes in USFM 3.0 may continue to be used “un-decorated” (without attributes). \w gracious\w* remains valid USFM content.
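For a reader, backward compatibility means that a span without a | simply has no attribute list. A minimal Python sketch (split_text_and_attributes is a hypothetical helper) shows that the decorated and un-decorated forms yield the same text content:

```python
from typing import Optional

def split_text_and_attributes(content: str) -> tuple[str, Optional[str]]:
    """Return the text and the raw attribute list (None when the span is un-decorated)."""
    text, bar, attr_list = content.partition("|")
    return text, (attr_list if bar else None)

print(split_text_and_attributes("gracious"))        # ('gracious', None)
print(split_text_and_attributes("gracious|grace"))  # ('gracious', 'grace')
```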

User Defined Attributes

Using the general syntax, attributes may be added to character markers within a text beyond the defined set for the latest USFM version (3.0 or later). These will not be considered strictly USFM compliant, and there is no assurance that they will be supported by compliant software tools or processes. Future versions of USFM may formally define additional attributes.

Any user defined attributes must begin with the prefix x-.

\w gracious|x-myattr="metadata"\w*
\w gracious|lemma="grace" x-myattr="metadata"\w*

User defined attributes can be added to any USFM character marker, even if it is not within the list of character markers officially providing descriptive attributes for the current version of the USFM specification.
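The naming rule above can be checked mechanically. In the following minimal Python sketch, OFFICIAL_ATTRIBUTES is a hypothetical, deliberately partial table listing only the \w attributes mentioned in this issue (lemma, strong); the authoritative list is the one defined for USFM 3.0 in #26. Any name starting with x- is accepted on any marker, as described above.

```python
# Hypothetical, partial table: only the \w attributes named in this issue.
# The authoritative list is defined by the USFM 3.0 specification (#26).
OFFICIAL_ATTRIBUTES = {"w": {"lemma", "strong"}}

def classify_attribute(marker: str, name: str) -> str:
    """Return 'user-defined', 'official', or 'invalid' for an attribute name on a marker."""
    if name.startswith("x-"):
        # x- attributes are allowed on any character marker,
        # but are not strictly USFM compliant.
        return "user-defined"
    if name in OFFICIAL_ATTRIBUTES.get(marker, set()):
        return "official"
    return "invalid"

for name in ["lemma", "x-myattr", "myattr"]:
    print(name, classify_attribute("w", name))
# lemma official
# x-myattr user-defined
# myattr invalid
```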

3.0 Markers Officially Providing Descriptive Attributes

  • \w ...\w* - USFM 3.0 #26
  • \fig ...\fig* (harmonizing existing markup syntax with this attribute specification) - USFM 3.0 #27

klassenjm added this to the 3.0.rc1 milestone Jul 7, 2016

klassenjm added the editor label Jul 8, 2016

klassenjm added new attribute and removed marker labels Jul 12, 2016

klassenjm closed this Sep 9, 2016

klassenjm modified the milestones: 3.0.rc1, 3.0.0 Oct 27, 2017
