Christopher Klapp edited this page Nov 16, 2016 · 16 revisions

unfoldingWord-dev/tact (Status: Alpha)

Because the codebase changes regularly, documentation is limited to high-level concepts and workflow.

Approach (process of elimination)

Generate permutations of all possible alignments. Score each one by a weighted average of common patterns. Pick the single best option. Remove the options that this pick obviously rules out. Penalize the remaining options that conflict with it. Pick the next best option, and repeat until all words are aligned. Finally, reorder the alignment to the original source order.
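
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not tact's implementation: it pairs single words only (tact works on word/phrase permutations), and the names `align`, `score_fn`, and `crossing_penalty` are hypothetical. It also assumes that a "conflicting" option is one that would cross the alignment just picked, which the page does not specify.

```python
from itertools import product

def align(source, target, score_fn, crossing_penalty=0.9):
    """Greedy process-of-elimination alignment over single words (sketch)."""
    # Generate every (source index, target index) pairing and score it.
    candidates = {
        (s, t): score_fn(source[s], target[t])
        for s, t in product(range(len(source)), range(len(target)))
    }
    chosen = []
    while candidates:
        # Pick the single best remaining option.
        (s, t), score = max(candidates.items(), key=lambda kv: kv[1])
        chosen.append((s, t, score))
        # Remove options that can no longer hold: same source or target word.
        candidates = {
            (cs, ct): sc
            for (cs, ct), sc in candidates.items()
            if cs != s and ct != t
        }
        # Penalize remaining options that conflict with the pick.
        # Assumption: "conflict" means crossing the alignment just chosen.
        for cs, ct in candidates:
            if (cs - s) * (ct - t) < 0:
                candidates[(cs, ct)] *= crossing_penalty
    # Reorder the picks to the original source order.
    chosen.sort(key=lambda c: c[0])
    return [(source[s], target[t]) for s, t, _ in chosen]

# Toy scores standing in for the weighted average of common patterns.
table = {("la", "the"): 0.9, ("casa", "house"): 0.8, ("roja", "red"): 0.7}
lookup = lambda s, t: table.get((s, t), 0.1)
print(align(["la", "casa", "roja"], ["the", "red", "house"], lookup))
# [('la', 'the'), ('casa', 'house'), ('roja', 'red')]
```

In this sketch, source words left over once every target word has been used are simply dropped; a fuller version would need a policy for unaligned words.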

Learning (Correction and Appending)

To prioritize the ability to learn, the aligner relies on correctional tables and on appending to the phrase table.

Correctional tables store only human-verified word/phrase alignments. There are no permutations. These entries are scored and sorted with the rest of the possible alignment options, but with a boost to ensure they are used before the statistical permutations.
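
As a rough sketch of that boost, the snippet below merges the two sources into one ranked list. The function name `rank_options`, the tuple layout, and the assumption that statistical scores fall in the range 0 to 1 (so that a boost of 1.0 is enough to put corrections first) are all illustrative and not taken from tact.

```python
def rank_options(statistical, corrections, boost=1.0):
    """Rank alignment options, with human-verified corrections boosted.

    statistical, corrections: dicts mapping (source, target) -> score.
    Assumption: scores lie in [0, 1], so boost=1.0 keeps every
    correction at or above every statistical option.
    """
    # Corrections are listed first so that the stable sort keeps them
    # ahead of statistical options on an exact tie.
    options = [(pair, score + boost, "correction")
               for pair, score in corrections.items()]
    options += [(pair, score, "statistical")
                for pair, score in statistical.items()]
    return sorted(options, key=lambda option: option[1], reverse=True)

statistical = {("casa", "house"): 0.75, ("casa", "home"): 1.0}
corrections = {("casa", "household"): 0.5}
for pair, score, origin in rank_options(statistical, corrections):
    print(pair, score, origin)
```

Keeping the origin tag on each option is a design choice of this sketch: it lets a later step tell a verified alignment from a statistical guess without comparing scores.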

Phrase table generation stops at statistical tallies of permutations. The trade-off is a heavier burden on the alignment step. In return, the table can be appended to at any point, and the result is no different from complete retraining.
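
The appending property follows from storing raw counts rather than derived scores, as the sketch below illustrates. It tallies single-word co-occurrences only (tact tallies word/phrase permutations), and `tally` and `score` are hypothetical names with a deliberately simple ratio score.

```python
from collections import Counter
from itertools import product

def tally(sentence_pairs):
    """Count source/target word co-occurrences across sentence pairs."""
    counts = Counter()
    for source, target in sentence_pairs:
        counts.update(product(source.split(), target.split()))
    return counts

def score(counts, s, t):
    """Derived at alignment time: share of s's tallies that fall on t."""
    total = sum(c for (cs, _), c in counts.items() if cs == s)
    return counts[(s, t)] / total if total else 0.0

first_batch = [("la casa", "the house")]
second_batch = [("la roja", "the red")]

# Appending: add the new tallies to the existing table.
table = tally(first_batch)
table.update(tally(second_batch))

# Same table as retraining on everything from scratch.
assert table == tally(first_batch + second_batch)
print(score(table, "la", "the"))  # 0.5
```

Because ratios are computed only when `score` is called, nothing stored in the table goes stale when new counts arrive; the cost is that every lookup does the arithmetic that a precomputed table would have done once.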

More info: