Alignment Data Structure: USFM 3

Jesse Griffin edited this page Jan 29, 2018 · 5 revisions

Overview

Choosing a data structure for alignment data involves two questions: the obvious one of which data must be stored and used, and the less obvious one of how the new data will be stored and managed over the long term. We also need to plan how this data will fit into our existing uW ecosystem.

Case for CSV

While CSV is easier to edit by hand, it requires extra measures to keep it in sync with the target language USFM project file. Once the implementation is complete, editing by hand will no longer be necessary, which negates this advantage.

Pros

  • CSV is easier to edit by hand
  • USFM target files stay lightweight and easier to edit

Cons

  • Managing synchronization may cause more issues than it solves
  • Added complexity of planning how to store and publish the alignment data
  • Needs additional planning that could add months to implementation
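
The synchronization concern can be illustrated with a small check. This is only a sketch: the function name `is_in_sync` and the sample verse text are hypothetical, and it assumes occurrences are written as index/total, as in the CSV examples further down this page.

```python
import re


def is_in_sync(target_phrase, occurrence, verse_text):
    """Return True if a CSV occurrence value still matches the target verse text."""
    index, total = (int(part) for part in occurrence.split("/"))
    # Count whole-phrase matches, ignoring case.
    pattern = r"\b" + re.escape(target_phrase) + r"\b"
    found = len(re.findall(pattern, verse_text, flags=re.IGNORECASE))
    return found == total and 1 <= index <= total


# Hypothetical target verse text, assembled from the example phrases on this page.
verse = "The book of the genealogy of Jesus Christ, son of David, son of Abraham."
print(is_in_sync("son of", "2/2", verse))   # True

# After a hand edit of the USFM file, the same CSV row no longer matches.
edited = "The book of the genealogy of Jesus Christ, son of David, descendant of Abraham."
print(is_in_sync("son of", "2/2", edited))  # False
```

A check like this would have to run whenever either file changes, which is part of the extra planning listed above.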

Case for USFM 3

If USFM 3 is used to store alignment data in the target text USFM markup, then no separate data file is required. USFM is more difficult to edit by hand, but once the implementation is complete, editing by hand will no longer be necessary, which negates this disadvantage. USFM character marker documentation: http://ubsicap.github.io/usfm/characters/index.html?highlight=srcloc

Pros

  • Development time that CSV would require can go toward improving existing USFM support
  • Synchronization is much easier
  • No extra files to manage and publish
  • Use existing libraries for validation and similar tasks

Cons

  • USFM markup is more complex
  • USFM is now harder to edit by hand

Data

Essential

This information is required for alignment data and will cause issues if it is missing.

  • Source
    • Reference
    • Word/Phrase
    • Occurrence/Occurrences
  • Target
    • Reference
    • Word/Phrase
    • Occurrence/Occurrences

Supplemental

This information is nice to have because it avoids lookups against the source text.

  • Source
    • Strongs Number
    • Morphology
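
The fields listed above could be modeled roughly as follows. This is a sketch, not a settled design: the class and field names are hypothetical, a single shared book/chapter/verse reference is assumed for source and target (as in the CSV examples below), and the index/total occurrence format is taken from those same examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """Which occurrence of a word/phrase in the verse, e.g. 1 of 2."""
    index: int
    total: int

    @classmethod
    def parse(cls, text: str) -> "Occurrence":
        # "1/2" -> Occurrence(index=1, total=2)
        index, total = text.split("/")
        return cls(int(index), int(total))


@dataclass(frozen=True)
class Alignment:
    # Essential data
    book: str
    chapter: int
    verse: int
    source_phrase: str
    source_occurrence: Occurrence
    target_phrase: str
    target_occurrence: Occurrence
    # Supplemental data (optional, avoids lookups to the source text)
    strong: Optional[str] = None
    morph: Optional[str] = None


alignment = Alignment(
    "mat", 1, 1,
    "υἱοῦ", Occurrence.parse("1/2"),
    "son of", Occurrence.parse("1/2"),
)
```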

CSV Examples

Essential Data

book, c:v, source_phrase, source_occurrence, target_phrase, target_occurrence
mat, 1:1, "βίβλος", 1/1, "the book of", 1/1
mat, 1:1, "γενέσεως", 1/1, "the genealogy of", 1/1
mat, 1:1, "ἰησοῦ", 1/1, "jesus", 1/1
mat, 1:1, "χριστοῦ", 1/1, "christ", 1/1
mat, 1:1, "υἱοῦ", 1/2, "son of", 1/2
mat, 1:1, "δαυεὶδ", 1/1, "david", 1/1
mat, 1:1, "υἱοῦ", 2/2, "son of", 2/2
mat, 1:1, "ἀβραάμ", 1/1, "abraham", 1/1
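
As a rough illustration of how rows in this layout could be read, the sketch below uses Python's standard `csv` module. The function name `read_alignments` is hypothetical. `skipinitialspace=True` is needed because each comma in the sample is followed by a space.

```python
import csv
import io

CSV_TEXT = '''book, c:v, source_phrase, source_occurrence, target_phrase, target_occurrence
mat, 1:1, "υἱοῦ", 1/2, "son of", 1/2
mat, 1:1, "δαυεὶδ", 1/1, "david", 1/1
mat, 1:1, "υἱοῦ", 2/2, "son of", 2/2
'''


def read_alignments(text):
    """Parse alignment rows into a list of dicts keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


rows = read_alignments(CSV_TEXT)
for row in rows:
    print(row["c:v"], row["target_phrase"], row["target_occurrence"])
```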

Supplemental Data

book, c:v, source_phrase, strong, source_morph, source_occurrence, target_phrase, target_occurrence
mat, 1:1, "βίβλος", G9760, "N./....NFS", 1/1, "the book of", 1/1
mat, 1:1, "γενέσεως", G10780, "N./....GFS", 1/1, "the genealogy of", 1/1
mat, 1:1, "ἰησοῦ", G24240, "N./....GMS", 1/1, "jesus", 1/1
mat, 1:1, "χριστοῦ", G55470, "N./....GMS", 1/1, "christ", 1/1
mat, 1:1, "υἱοῦ", G52070, "N./....GMS", 1/2, "son of", 1/2
mat, 1:1, "δαυεὶδ", G11380, "N./....GMS", 1/1, "david", 1/1
mat, 1:1, "υἱοῦ", G52070, "N./....GMS", 2/2, "son of", 2/2
mat, 1:1, "ἀβραάμ", G110, "N./....GMS", 1/1, "abraham", 1/1
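
To show how a CSV row relates to the USFM 3 form used later on this page, here is a hedged sketch that turns one row into a `\w` line. The function name `row_to_usfm` is hypothetical. The attribute names, including the mix of underscore and hyphen, and the combined `strong="number:morphology"` value are copied from the USFM examples below rather than from any specification.

```python
def row_to_usfm(row):
    """Build one \\w ... \\w* line from a CSV row given as a dict."""
    attrs = {
        "x-ugnt-phrase": row["source_phrase"],
        "x-source_occurrence": row["source_occurrence"],
    }
    # Supplemental data is optional; Strong's number and morphology share one attribute.
    if row.get("strong"):
        attrs["strong"] = row["strong"] + ":" + row["source_morph"]
    attrs["x-target-occurrence"] = row["target_occurrence"]
    attr_text = " ".join('{}="{}"'.format(key, value) for key, value in attrs.items())
    return "\\w {}|{} \\w*".format(row["target_phrase"], attr_text)


row = {
    "source_phrase": "ἀβραάμ",
    "strong": "G110",
    "source_morph": "N./....GMS",
    "source_occurrence": "1/1",
    "target_phrase": "abraham",
    "target_occurrence": "1/1",
}
print(row_to_usfm(row))
```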

USFM 3 Example

Notes

  • It is not yet certain whether a phrase can be used inside the \w marker
  • If a phrase cannot be used, the example can be produced with one \w marker per target word
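
If phrases turn out not to be allowed, the per-word fallback could look something like the sketch below. The function name `phrase_to_word_markers` is hypothetical, and the attributes are simply copied to every word; per-word target occurrence values would still need to be recomputed, which this sketch does not attempt.

```python
def phrase_to_word_markers(target_phrase, attributes):
    """Emit one \\w ... \\w* marker per word of a target phrase."""
    attr_text = " ".join('{}="{}"'.format(key, value) for key, value in attributes.items())
    return ["\\w {}|{} \\w*".format(word, attr_text) for word in target_phrase.split()]


markers = phrase_to_word_markers(
    "son of",
    {"x-ugnt-phrase": "υἱοῦ", "x-source_occurrence": "1/2"},
)
for marker in markers:
    print(marker)
```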

Essential Data

\v 1
\w the book of|x-ugnt-phrase="βίβλος" x-source_occurrence="1/1" x-target-occurrence="1/1" \w*
\w the genealogy of|x-ugnt-phrase="γενέσεως" x-source_occurrence="1/1" x-target-occurrence="1/1" \w*
\w jesus|x-ugnt-phrase="ἰησοῦ" x-source_occurrence="1/1" x-target-occurrence="1/1" \w*
\w christ|x-ugnt-phrase="χριστοῦ" x-source_occurrence="1/1" x-target-occurrence="1/1" \w*
\w son of|x-ugnt-phrase="υἱοῦ" x-source_occurrence="1/2" x-target-occurrence="1/2" \w*
\w david|x-ugnt-phrase="δαυεὶδ" x-source_occurrence="1/1" x-target-occurrence="1/1" \w*
\w son of|x-ugnt-phrase="υἱοῦ" x-source_occurrence="2/2" x-target-occurrence="2/2" \w*
\w abraham|x-ugnt-phrase="ἀβραάμ" x-source_occurrence="1/1" x-target-occurrence="1/1" \w*
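
Reading this markup back could be sketched with two regular expressions. This is not a full USFM parser: it assumes the exact `\w text|attr="value" ... \w*` shape shown above, with no backslashes inside the text or attribute values. The names `W_MARKER`, `ATTRIBUTE`, and `parse_w_markers` are hypothetical.

```python
import re

# Matches: \w <text>|<attributes> \w*
W_MARKER = re.compile(r'\\w\s+(?P<text>[^|\\]+)\|(?P<attrs>[^\\]*)\\w\*')
# Matches: name="value" (names may contain letters, digits, "_" and "-")
ATTRIBUTE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_w_markers(usfm):
    """Return a list of (target_text, attributes_dict) pairs."""
    results = []
    for match in W_MARKER.finditer(usfm):
        attrs = dict(ATTRIBUTE.findall(match.group("attrs")))
        results.append((match.group("text").strip(), attrs))
    return results


USFM_TEXT = r'''\v 1
\w son of|x-ugnt-phrase="υἱοῦ" x-source_occurrence="1/2" x-target-occurrence="1/2" \w*
\w david|x-ugnt-phrase="δαυεὶδ" x-source_occurrence="1/1" x-target-occurrence="1/1" \w*
'''

parsed = parse_w_markers(USFM_TEXT)
for text, attrs in parsed:
    print(text, attrs["x-source_occurrence"], attrs["x-target-occurrence"])
```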

Supplemental Data

\v 1
\w the book of|x-ugnt-phrase="βίβλος" x-source_occurrence="1/1" strong="G9760:N./....NFS" x-target-occurrence="1/1" \w*
\w the genealogy of|x-ugnt-phrase="γενέσεως" x-source_occurrence="1/1" strong="G10780:N./....GFS" x-target-occurrence="1/1" \w*
\w jesus|x-ugnt-phrase="ἰησοῦ" x-source_occurrence="1/1" strong="G24240:N./....GMS" x-target-occurrence="1/1" \w*
\w christ|x-ugnt-phrase="χριστοῦ" x-source_occurrence="1/1" strong="G55470:N./....GMS" x-target-occurrence="1/1" \w*
\w son of|x-ugnt-phrase="υἱοῦ" x-source_occurrence="1/2" strong="G52070:N./....GMS" x-target-occurrence="1/2" \w*
\w david|x-ugnt-phrase="δαυεὶδ" x-source_occurrence="1/1" strong="G11380:N./....GMS" x-target-occurrence="1/1" \w*
\w son of|x-ugnt-phrase="υἱοῦ" x-source_occurrence="2/2" strong="G52070:N./....GMS" x-target-occurrence="2/2" \w*
\w abraham|x-ugnt-phrase="ἀβραάμ" x-source_occurrence="1/1" strong="G110:N./....GMS" x-target-occurrence="1/1" \w*
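
In the supplemental example the Strong's number and the morphology share a single `strong` attribute, separated by a colon. A minimal sketch of splitting them apart again follows. The function name `split_strong` is hypothetical, and it assumes the Strong's number itself never contains a colon.

```python
def split_strong(value):
    """Split 'G9760:N./....NFS' into (strong_number, morphology)."""
    strong, _, morph = value.partition(":")
    # If no morphology was packed in, report it as None.
    return strong, (morph or None)


print(split_strong("G9760:N./....NFS"))  # ('G9760', 'N./....NFS')
print(split_strong("G110"))              # ('G110', None)
```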