Syntax for applying ruby annotations (CJK texts) (add \rb ...\rb*) #31

klassenjm opened this Issue Jul 14, 2016 · 0 comments



klassenjm commented Jul 14, 2016

Updated: January 2018

In the course of implementing support for ruby editing in one application (Paratext), the specification and markup needs were clarified and refined. Note the following updated proposal. The original proposal has been retained at the end of this description.

Update notes

  • We do not need two markers after all (\rb and \rt), just one (\rb), plus a pipe (|) and sometimes some colons (:).
  • We need to remove \rt from the spec and make \rb non-optional.

Updated Proposal

  • Add a character marker pair \rb ...\rb* to mark the base text being glossed.
  • Within the \rb ...\rb* span, separate the base text from the ruby glosses using a vertical bar | (following the syntax of word-level descriptive attributes, #24).
  • Use a colon : to separate multiple pieces within a phrase gloss.

For example: If the base text being glossed is a phrase of two Han characters (B), then the ruby gloss text (gg) may contain two elements, one for glossing each of the base text characters making up the phrase.

\rb BB|gg:gg\rb*
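To make the shape of the syntax concrete, here is a minimal Python sketch that pulls the base text and gloss pieces out of \rb ...\rb* spans. The names `RubySpan`, `RB_PATTERN`, and `parse_rb_spans` are hypothetical and exist only for illustration; the regular expression assumes a span contains no nested markers and is not how Paratext or any other application necessarily parses USFM.

```python
import re
from typing import List, NamedTuple


class RubySpan(NamedTuple):
    """One \\rb ...\\rb* span: the base text and its gloss pieces."""
    base: str
    glosses: List[str]


# Assumption: base text and gloss contain no backslash (no nested markers),
# and the base text contains no vertical bar.
RB_PATTERN = re.compile(r"\\rb (?P<base>[^|\\]+)\|(?P<gloss>[^\\]*)\\rb\*")


def parse_rb_spans(usfm: str) -> List[RubySpan]:
    """Return every ruby span found in a string of USFM text."""
    return [
        # The colon separates gloss pieces; a gloss without colons is one piece.
        RubySpan(m.group("base"), m.group("gloss").split(":"))
        for m in RB_PATTERN.finditer(usfm)
    ]


if __name__ == "__main__":
    print(parse_rb_spans(r"\rb BB|gg:gg\rb*"))
```

A gloss with no colon comes back as a single piece, so the phrase is kept whole either way and the split into pieces is simply extra information.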

This syntax allows the decision to present glosses by character or by group (phrase) to be made at the publication stage, rather than pre-determined during translation.

Supporting a null gloss: Allow parts of the gloss to be empty

In order to preserve the whole phrase unit (rather than breaking off just the characters that have glosses), USFM needs some way to specify a null gloss piece. Since the separator character (colon :) is visible, a visible character for null gloss is not strictly needed.

  • Allow any slot in the gloss string to be empty.
    • If the publication decision is to gloss by character, then skip the corresponding base character when aligning glosses above base characters (or gloss it using whitespace).

Examples of omission:

Second and fourth base characters are unglossed:

\rb BBBB|g1::g3:\rb*

Second base character is unglossed:

\rb 神の子|かみ::こ\rb*
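The alignment rule for empty slots can be sketched as follows. This is only an illustration: `align_by_character` is a hypothetical helper, and it assumes each base character is a single Unicode code point and that the gloss has exactly one slot per base character.

```python
from typing import List, Optional, Tuple


def align_by_character(base: str, gloss: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each base character with its gloss piece, or None for a null gloss."""
    pieces = gloss.split(":")
    # Assumption: one code point per base character (no combining sequences).
    chars = list(base)
    if len(pieces) != len(chars):
        raise ValueError(
            f"{len(chars)} base characters but {len(pieces)} gloss slots"
        )
    # An empty slot means "leave this base character unglossed".
    return [(ch, piece or None) for ch, piece in zip(chars, pieces)]


if __name__ == "__main__":
    for ch, piece in align_by_character("BBBB", "g1::g3:"):
        print(ch, piece)
```

Raising an error on a slot-count mismatch is a design choice for the sketch; a real typesetting tool might instead fall back to glossing the whole group.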

A companion USX 3.0 proposal exists at: ubsicap/usx#24

A result of this proposal is the corresponding proposal to deprecate the existing pronunciation marker \pro ...\pro*. See #32.



Han characters: Chinese, Japanese, and Korean texts share a set of characters in common. In Japanese these are called Kanji (literally “Han characters”). There are several thousand of these characters to learn. New readers, or readers new to the Biblical texts, may find it very difficult to recognize which Chinese or Japanese word corresponds to the Han character(s) they are seeing.

Ruby glosses: In order to help these readers, some Bibles are printed with glosses using small phonetic characters (e.g. Japanese uses the hiragana syllabary) placed above the more symbolic Han characters to tell the reader how to pronounce the character. These phonetic characters are generically called "ruby glosses" or "rubies". In Japanese this technique is called Furigana.

Note: These are character glosses regarding pronunciation, not linguistic glosses per se, though they do effectively indicate the word’s meaning.

Ruby for characters and phrases

A single, undivided piece of gloss data handles individual characters well, but not phrases. It forces the typesetting decision to gloss phrases "by character" or "by group" to be made too early, since we must choose between (a) identifying the phrase or (b) identifying the individual glosses (the pieces of the phrase gloss).

  • For maximum freedom in publishing we should allow the markup to simultaneously (a) identify the phrase and also (b) store the gloss in pieces.
  • Using a simple separator character is much more concise than nested markup.
  • Do not require empty markup to represent a non-glossed base character (a null gloss).
  • Using a separator instead of markers eliminates the need to add a marker later in order to support double-sided glossing.


  1. One Han character with a single ruby gloss.
  2. Two Han characters with a single ruby phrase gloss.
\rb 話賄|はなはなし\rb*
  3. Phrase gloss broken down into individual pieces by adding colons between ruby characters.
\rb 話賄|はな:はなし\rb*
  4. A character sequence which includes non-Han characters (hiragana), which are NOT glossed.
\rb 定ま|さだ:\rb*
  5. An un-glossed character occurring between glossed characters in the "phrase".
\rb 神の子|かみ::こ\rb*
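As a rough illustration of deferring the "by character" versus "by group" choice to publication time, the sketch below renders one span to HTML `<ruby>` markup in either mode. HTML is used here only as an example output format, and `render_ruby_html` is a hypothetical function, not part of USFM or of any existing tool. It assumes one code point per base character and falls back to group rendering when the number of gloss pieces does not match.

```python
from html import escape


def render_ruby_html(base: str, gloss: str, by_character: bool) -> str:
    """Render one \\rb span as HTML ruby, by character or by group."""
    pieces = gloss.split(":")
    if not by_character or len(pieces) != len(base):
        # Group mode (or fallback): one gloss over the whole phrase.
        return f"<ruby>{escape(base)}<rt>{escape(''.join(pieces))}</rt></ruby>"
    out = []
    for ch, piece in zip(base, pieces):
        if piece:
            out.append(f"<ruby>{escape(ch)}<rt>{escape(piece)}</rt></ruby>")
        else:
            # Null gloss: emit the base character with no annotation.
            out.append(escape(ch))
    return "".join(out)


if __name__ == "__main__":
    print(render_ruby_html("話賄", "はな:はなし", by_character=True))
    print(render_ruby_html("話賄", "はな:はなし", by_character=False))
```

The same stored markup yields both layouts, which is the point of keeping the phrase and its pieces together in one span.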


Original Proposal

  • Add character marker pairs \rb ...\rb* and \rt ...\rt* to support markup of ruby annotations added to CJK texts.
    • \rb ...\rb* is used to mark the base text - the text being annotated with ruby character(s).
    • \rt ...\rt* is used to mark the ruby text

Note: In cases where the annotation text is associated with only a single preceding ideogram, only the \rt ...\rt* marker is required (i.e. the base text markup is optional in these cases).


\rb 北\rb*\rt ㄅㄟˇ\rt* \rb 京\rb*\rt ㄐ丨ㄥ\rt*

\rb 東京\rb*\rt とうきょう\rt*
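For text already marked up in this original two-marker form, a conversion to the single-marker syntax might look like the sketch below. It is an assumption-laden illustration: `convert_rt_to_pipe` is a hypothetical helper, it only handles an \rb ...\rb* span immediately followed by \rt ...\rt*, and it does not cover a standalone \rt ...\rt* with no base markup.

```python
import re

# Assumption: \rb ...\rb* is immediately followed by \rt ...\rt*,
# with no nested markers inside either span.
OLD_PAIR = re.compile(r"\\rb ([^\\|]+)\\rb\*\\rt ([^\\]+)\\rt\*")


def convert_rt_to_pipe(usfm: str) -> str:
    """Rewrite \\rb base\\rb*\\rt gloss\\rt* as \\rb base|gloss\\rb*."""
    return OLD_PAIR.sub(
        # A function replacement avoids backslash-escape handling in the template.
        lambda m: "\\rb " + m.group(1) + "|" + m.group(2) + "\\rb*",
        usfm,
    )


if __name__ == "__main__":
    print(convert_rt_to_pipe(r"\rb 東京\rb*\rt とうきょう\rt*"))
```

The conversion keeps the gloss as a single piece; splitting it into per-character pieces with colons would still need a human or a dictionary.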

@klassenjm klassenjm added this to the 3.0.rc1 milestone Jul 14, 2016

klassenjm added a commit that referenced this issue Sep 7, 2016

@klassenjm klassenjm modified the milestones: 3.0.rc1, 3.0.rc2 Sep 9, 2016

@klassenjm klassenjm modified the milestones: 3.0.rc2, 3.0.0 Oct 27, 2017

@klassenjm klassenjm changed the title from Ruby annotations (CJK texts) (add \rb ...\rb*; add \rt ...\rt*) to Syntax for applying ruby annotations (CJK texts) (add \rb ...\rb*) Feb 14, 2018

@klassenjm klassenjm closed this Feb 27, 2018
