# Differences in MT and SP in parasha #2: Noach (Genesis 6:9-11:32)

## Table of Contents (ToC) <a class="anchor" id="TOC"></a>

* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Compare surface texts of SP and MT</a>
* <a href="#bullet4">4 - Compare texts using minimum Levenshtein distance</a>
* <a href="#bullet5">5 - Comparison of spelling of proper nouns between SP and MT</a>
* <a href="#bullet6">6 - References and acknowledgement</a>
* <a href="#bullet7">7 - Required libraries</a>
* <a href="#bullet8">8 - Notebook version details</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

The Samaritan Pentateuch (SP) is a version of the Torah preserved by the Samaritan community, differing from the Masoretic Text (MT) in several aspects, including language, orthography, and occasionally theological emphasis. This notebook compares the text of the Masoretic Text, based on the BHSA dataset in Text-Fabric, with the Samaritan Pentateuch, also available as a Text-Fabric dataset.<a href="#ref1"><sup>1</sup></a>

In this analysis, we focus on comparing the text of the verses in a specific parasha, highlighting differences in wording and orthography. Additionally, special attention is given to spelling variations of proper nouns between the two traditions. This notebook draws inspiration from the notebook provided by Martijn Naaijer<a href="#ref2"><sup>2</sup></a> and aims to explore the textual nuances between these two important versions of the Torah.

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

The following code loads the Text-Fabric versions of the [Samaritan Pentateuch](https://github.com/DT-UCPH/sp) and the [Biblia Hebraica Stuttgartensia (Amstelodamensis)](https://etcbc.github.io/bhsa/), together with the additional parasha-related features from [tonyjurg/BHSaddons](https://github.com/tonyjurg/BHSaddons).

In [1]:
from tf.app import use

# Load the SP data and bind its node-feature (F), locality (L) and text (T)
# APIs to SP-specific names, so that they cannot be overwritten when the MT
# is loaded.
SP = use('DT-UCPH/sp', version='3.4')
Fsp, Lsp, Tsp = SP.api.F, SP.api.L, SP.api.T

# Do the same for the MT dataset (BHSA) together with BHSaddons
MT = use('etcbc/bhsa', version='2021', mod='tonyjurg/BHSaddons/tf/')
Fmt, Lmt, Tmt = MT.api.F, MT.api.L, MT.api.T

**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 5 | 79878.4 | 100 |
| chapter | 187 | 2135.79 | 100 |
| verse | 5841 | 68.38 | 100 |
| word | 114890 | 3.48 | 100 |
| sign | 399392 | 1.0 | 100 |


**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 39 | 10938.21 | 100 |
| chapter | 929 | 459.19 | 100 |
| lex | 9230 | 46.22 | 100 |
| verse | 23213 | 18.38 | 100 |
| half_verse | 45179 | 9.44 | 100 |
| sentence | 63717 | 6.7 | 100 |
| sentence_atom | 64514 | 6.61 | 100 |
| clause | 88131 | 4.84 | 100 |
| clause_atom | 90704 | 4.7 | 100 |
| phrase | 253203 | 1.68 | 100 |


# 3 - Compare surface texts of SP and MT <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

In this section, we compare the surface texts of the Samaritan Pentateuch (SP) and the Masoretic Text (MT) at the verse level. By analyzing the wording and structure of these texts, we aim to identify variations.
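
The highlighting used in this section is built on `difflib.SequenceMatcher`. As a minimal sketch of the idea, the function below marks the parts of one string that differ from another. The function name `mark_differences` and the toy strings are illustrative assumptions, not part of the notebook, and unlike the notebook's own helper this sketch skips empty `<mark>` elements.

```python
from difflib import SequenceMatcher

def mark_differences(base_text, comparison_text):
    """Wrap the parts of comparison_text that differ from base_text in <mark> tags."""
    matcher = SequenceMatcher(None, base_text, comparison_text)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        segment = comparison_text[j1:j2]
        if tag == "equal":
            # Identical stretch: copy unchanged
            pieces.append(segment)
        elif segment:
            # Replaced or inserted stretch; pure deletions give an empty segment and are skipped
            pieces.append(f"<mark>{segment}</mark>")
    return "".join(pieces)

# Toy strings (hypothetical, not taken from either dataset)
marked = mark_differences("abcdef", "abXdef")
print(marked)
```

Calling the function twice with the arguments swapped gives one marked-up string per text, which is how a two-line SP/MT display can be produced.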

In [2]:
# Find all verse nodes for this parasha (either the transliterated name or the sequence number can be used)
parashaQuery = '''
verse parashanum=2
'''
parashaResults = MT.search(parashaQuery)

  0.01s 153 results


In [3]:
# Extract book, chapter, and verse information
bookChapterVerseList = [
    Tmt.sectionFromNode(verse[0]) for verse in parashaResults
]

# Store parasha name, start and end verse for future use
startNode = parashaResults[0][0]
endNode = parashaResults[-1][0]
parashaNameHebrew = Fmt.parashahebr.v(startNode)
parashaNameEnglish = Fmt.parashatrans.v(startNode)
bookStart, chapterStart, startVerse = Tmt.sectionFromNode(startNode)
parashaStart = f'{bookStart} {chapterStart}:{startVerse}'
bookEnd, chapterEnd, endVerse = Tmt.sectionFromNode(endNode)
parashaEnd = f'{chapterEnd}:{endVerse}'
htmlStart = '<html><body>'
htmlFooter = '<p>Data generated by <code>delta_mt_and_sp.ipynb</code> at <a href="https://github.com/tonyjurg/Parashot" target="_blank">github.com/tonyjurg/Parashot</a></p></body></html>'

In [4]:
# Function to reconstruct verses
def reconstructVerses(F, L, T, textFeature, inputList):
    """Reconstruct text for each verse."""
    verseTexts = {}
    for verseName in inputList:
        verseText = ''
        verseNode = T.nodeFromSection(verseName)
        wordNodes = L.d(verseNode, 'word')
        for wordNode in wordNodes:
            wordText = getattr(F, textFeature).v(wordNode)  # look up the feature by name without eval
            trailer = F.trailer.v(wordNode)
            if wordText:
                verseText += wordText + (trailer if trailer else ' ')
        verseTexts[verseName] = verseText.strip()
    return verseTexts
    
SPverses = reconstructVerses(Fsp, Lsp, Tsp, 'g_cons', bookChapterVerseList)
MTverses = reconstructVerses(Fmt, Lmt, Tmt, 'g_cons', bookChapterVerseList)

In [5]:
from difflib import SequenceMatcher
from IPython.display import HTML, display

def highlightMatches(baseText, comparisonText):
    matcher = SequenceMatcher(None, baseText, comparisonText)
    highlightedComparisonText = "" 
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":  # Identical parts
            highlightedComparisonText += comparisonText[j1:j2]
        else:  # Non-matching parts
            highlightedComparisonText += f'<mark>{comparisonText[j1:j2]}</mark>'  
    return highlightedComparisonText

def cleanText(text):
    replacements = [
         # for the transcoded strings
         ('00_P', ''),  # Remove '00_P'
         ('00_S', ''),  # Remove '00_S'
         ('00', ''),    # Remove '00'
         ('&', ' '),    # Replace '&' with a space
         # for the Hebrew strings
         ('ס ', ''),    # trailing samekh (setumah marker)
         ('פ ', ''),    # trailing pe (petuchah marker)
         ('׃', ''),     # sof pasuq (end of verse)
         ('־', ' ')     # maqaf
    ]
    # Apply each replacement
    for old, new in replacements:
        text = text.replace(old, new)
    return text

# Function to format and highlight verse differences between MT and SP
def formatAndHighlight(label, MTverseText, SPverseText):
    book, chapter, verse = label
    MTverseNode = Tmt.nodeFromSection(label)
    MTtext = cleanText(Tmt.text(MTverseNode, "text-orig-plain"))
    SPverseNode = Tsp.nodeFromSection(label)
    SPtext = Tsp.text(SPverseNode)
    SPmarkedText = highlightMatches(MTtext, SPtext)
    MTmarkedText = highlightMatches(SPtext, MTtext)
    formattedDiff = (
        f'<h4><a href=\"https://www.stepbible.org/?q=version=NASB2020&reference='
        f'{book}.{chapter}:{verse}&options=HNVUG\" target=\"_blank\">{book} {chapter}:{verse}</a></h4>'
        f'<p><b>SP:</b> {SPmarkedText}<br><b>MT:</b> {MTmarkedText}</p>'
    )
    return formattedDiff

# Gather differences into an HTML string
htmlContent = f'<h2>Differences between MT and SP for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})</h2>'
for label, MTverseText in MTverses.items():
    SPverseText = SPverses.get(label, '')
    MTverseText = cleanText(MTverseText)
    if MTverseText != SPverseText:  # Check for differences
        difference = formatAndHighlight(label, MTverseText, SPverseText)
        htmlContent += difference

# Save the content to an HTML file
fileName = f"differences_MT_SP({parashaNameEnglish.replace(' ','%20')}).html"
with open(fileName, "w", encoding="utf-8") as file:
    file.write(htmlContent)

# Display the content in the notebook
display(HTML(htmlContent))

# wrap html header and footer and display a download button
htmlContentFull = f'{htmlStart}{htmlContent}{htmlFooter}'
downloadButton = f"""
<a download="{fileName}" href="data:text/html;charset=utf-8,{htmlContentFull.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;')}" target="_blank">
    <button>Download Differences as HTML</button>
</a>
"""
display(HTML(downloadButton))
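
The download link above embeds the whole HTML document in a `data:` URI. As a hedged alternative sketch, the helper below percent-encodes the document with `urllib.parse.quote`, so that characters such as `#`, `%` or quotes cannot break the URI. The name `make_download_link` and its parameters are assumptions for illustration; the notebook itself does not define such a helper.

```python
from html import escape
from urllib.parse import quote

def make_download_link(file_name, html_document, label="Download as HTML"):
    """Return an <a> element whose href is a percent-encoded data URI."""
    # quote() leaves only URL-safe characters, so the result is also safe inside an attribute
    data_uri = "data:text/html;charset=utf-8," + quote(html_document)
    return (
        f'<a download="{escape(file_name)}" href="{data_uri}" target="_blank">'
        f"<button>{escape(label)}</button></a>"
    )

# Toy document (hypothetical content)
link = make_download_link("example.html", "<html><body><p>a & b</p></body></html>")
print(link)
```

A helper of this kind could replace the three near-identical link-building snippets in this notebook, for example with `display(HTML(make_download_link(fileName, htmlContentFull)))`.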

# 4 - Compare texts using minimum Levenshtein distance <a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one text into another, which gives a quantitative measure of textual difference. Applied to the Masoretic Text and the Samaritan Pentateuch, it captures variations in spelling, added or omitted words, and other textual changes.
A higher distance indicates greater dissimilarity between two verses. The script below reports only the verses whose distance exceeds the value of `threshold`.
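
To make the definition concrete, here is a minimal pure-Python sketch of the classic dynamic-programming computation. It is an illustration only and is not the implementation used by the `Levenshtein` library imported below; the function name `levenshtein` is an assumption.

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into the empty string takes i deletions
        for j, target_char in enumerate(target, start=1):
            cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,         # delete source_char
                current[j - 1] + 1,      # insert target_char
                previous[j - 1] + cost,  # substitute, or keep when the characters match
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```

Because every inserted or omitted character counts as one edit, a verse with an added clause quickly reaches a high distance, while a single spelling variant adds only one or two.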

In [6]:
from Levenshtein import distance
from IPython.display import HTML, display

threshold = 20

# Create an HTML string to store the output
htmlContent = f'<h2>Levenshtein distance >{threshold} between MT and SP for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})</h2>'

# Create header
MT.dm(f'### Levenshtein distance >{threshold} between MT and SP for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})')

# Generate the HTML content
for label, MTverseText in MTverses.items():
    SPverseText = SPverses.get(label, '')
    levDistance = distance(MTverseText, SPverseText)  # Calculate the distance
    if levDistance > threshold:
        formattedDiff = formatAndHighlight(label, MTverseText, SPverseText)
        formattedDiff += f'<p>Levenshtein Distance: {levDistance}</p>'  # Add the distance
        MT.dm(formattedDiff)
        htmlContent += formattedDiff  # Append to the HTML content

# Save the content to an HTML file
fileName = f"levenshtein_differences_MT_SP({parashaNameEnglish.replace(' ','%20')}).html"
with open(fileName, "w", encoding="utf-8") as file:
    file.write(htmlContent)

# wrap html header and footer and display a download button
htmlContentFull = f'{htmlStart}{htmlContent}{htmlFooter}'
downloadButton = f"""
<a download="{fileName}" href="data:text/html;charset=utf-8,{htmlContentFull.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;')}" target="_blank">
    <button>Download Differences as HTML</button>
</a>
"""
display(HTML(downloadButton))

### Levenshtein distance >20 between MT and SP for parasha Noach (Genesis 6:9-11:32)

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.6:20&options=HNVUG" target="_blank">Genesis 6:20</a></h4><p><b>SP:</b> <mark>והיה </mark>מ<mark>ן </mark>העוף למינהו ומן הבהמה למינה <mark>ו</mark>מכל <mark>אשר </mark>רמש<mark> על</mark> האדמה למינ<mark>י</mark>ה<mark>ם</mark> ש<mark></mark>נים מכל יבאו אליך להחיות <br><b>MT:</b> <mark></mark>מ<mark></mark>העוף למינהו ומן הבהמה למינה <mark></mark>מכל <mark></mark>רמש<mark>ׂ</mark> האדמה למינ<mark></mark>ה<mark>ו</mark> ש<mark>ׁ</mark>נים מכל יבאו אליך להחיות </p><p>Levenshtein Distance: 21</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.7:2&options=HNVUG" target="_blank">Genesis 7:2</a></h4><p><b>SP:</b> מכל<mark></mark> הבהמה הטה<mark></mark>רה תקח לך ש<mark></mark>בעה ש<mark></mark>בעה <mark>זכר</mark> ו<mark>נקבה</mark> ומן הבהמה אש<mark></mark>ר לא טהרה ה<mark>י</mark>א ש<mark></mark>נים <mark>שנ</mark>י<mark>ם זכר</mark> ו<mark>נקבה</mark> <br><b>MT:</b> מכל<mark>׀</mark> הבהמה הטה<mark>ו</mark>רה תקח לך ש<mark>ׁ</mark>בעה ש<mark>ׁ</mark>בעה <mark>אישׁ</mark> ו<mark>אשׁתו</mark> ומן הבהמה אש<mark>ׁ</mark>ר לא טהרה ה<mark>ו</mark>א ש<mark>ׁ</mark>נים <mark>אי</mark>ש<mark>ׁ</mark> ו<mark>אשׁתו</mark> </p><p>Levenshtein Distance: 25</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.8:19&options=HNVUG" target="_blank">Genesis 8:19</a></h4><p><b>SP:</b> <mark>ו</mark>כל החיה<mark></mark> וכל העוף <mark>ו</mark>כל <mark>ה</mark>ר<mark></mark>מש<mark> הרמש</mark> על הארץ למש<mark></mark>פח<mark>ו</mark>תיהם יצאו מן התבה <br><b>MT:</b> <mark></mark>כל החיה<mark> כל הרמשׂ</mark> וכל העוף <mark></mark>כל <mark></mark>ר<mark>ו</mark>מש<mark>ׂ</mark> על הארץ למש<mark>ׁ</mark>פח<mark></mark>תיהם יצאו מן התבה </p><p>Levenshtein Distance: 21</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.9:15&options=HNVUG" target="_blank">Genesis 9:15</a></h4><p><b>SP:</b> וזכרתי את בריתי אש<mark></mark>ר ביני ובינ<mark></mark>כם ובין כל נפש<mark></mark> <mark>ה</mark>חיה<mark> אשר אתכם</mark> בכל בש<mark></mark>ר ולא יהיה עוד המים למבול ל<mark>ה</mark>ש<mark></mark>ח<mark>י</mark>ת כל בש<mark></mark>ר <br><b>MT:</b> וזכרתי את בריתי אש<mark>ׁ</mark>ר ביני ובינ<mark>י</mark>כם ובין כל נפש<mark>ׁ</mark> <mark></mark>חיה<mark></mark> בכל בש<mark>ׂ</mark>ר ולא יהיה עוד המים למבול ל<mark></mark>ש<mark>ׁ</mark>ח<mark></mark>ת כל בש<mark>ׂ</mark>ר </p><p>Levenshtein Distance: 21</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.10:19&options=HNVUG" target="_blank">Genesis 10:19</a></h4><p><b>SP:</b> ויהי גבול הכנעני מ<mark>נהר מ</mark>צ<mark></mark>ר<mark></mark>ים עד <mark>הנהר הגדו</mark>ל<mark> נהר פרת ו</mark>ע<mark>ד</mark> <mark>הים האחרון </mark><br><b>MT:</b> ויהי גבול הכנעני מ<mark>צידן באכ</mark>ה<mark> ג</mark>ר<mark>רה</mark> <mark>עד עזה באכה סד</mark>מ<mark>ה ועמרה ואדמה ו</mark>צ<mark>ב</mark>ים עד <mark>לשׁע</mark> <mark></mark></p><p>Levenshtein Distance: 42</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:11&options=HNVUG" target="_blank">Genesis 11:11</a></h4><p><b>SP:</b> ויחי ש<mark></mark>ם אחרי הולידו את ארפכש<mark></mark>ד חמש<mark></mark> מאות ש<mark></mark>נה ויול<mark>י</mark>ד בנים ובנות <mark>ויהיו כל ימי שם שש מאות שנה וימת </mark><br><b>MT:</b> ויחי ש<mark>ׁ</mark>ם אחרי הולידו את ארפכש<mark>ׁ</mark>ד חמש<mark>ׁ</mark> מאות ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 38</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:13&options=HNVUG" target="_blank">Genesis 11:13</a></h4><p><b>SP:</b> ויחי ארפכש<mark></mark>ד אחרי הולידו את ש<mark></mark>לח ש<mark></mark>לש<mark></mark> ש<mark></mark>נים<mark> ושלש מאות שנה ויוליד בנים ובנות ויהיו כל ימי ארפכשד שמנה ושלשים שנה</mark> וארבע מאות ש<mark></mark>נה וי<mark>מ</mark>ת <br><b>MT:</b> ויחי ארפכש<mark>ׁ</mark>ד אחרי הולידו את ש<mark>ׁ</mark>לח ש<mark>ׁ</mark>לש<mark>ׁ</mark> ש<mark>ׁ</mark>נים ו<mark>ארבע</mark> מאות ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 66</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:15&options=HNVUG" target="_blank">Genesis 11:15</a></h4><p><b>SP:</b> ויחי ש<mark></mark>לח אחרי הולידו את עבר ש<mark></mark>לש<mark></mark> ש<mark></mark>נים<mark> ושלש מאות שנה ויוליד בנים ובנות ויהיו כל ימי שלח שלש ושלשים שנה</mark> וארבע מאות ש<mark></mark>נה וי<mark>מ</mark>ת <br><b>MT:</b> ויחי ש<mark>ׁ</mark>לח אחרי הולידו את עבר ש<mark>ׁ</mark>לש<mark>ׁ</mark> ש<mark>ׁ</mark>נים ו<mark>ארבע</mark> מאות ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 63</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:17&options=HNVUG" target="_blank">Genesis 11:17</a></h4><p><b>SP:</b> ויחי עבר אחרי הולידו את פלג ש<mark>בע</mark>ים ש<mark></mark>נה<mark> ומאתים שנה ויוליד בנים ובנות ויהיו כל ימי עבר ארבע שנים</mark> וארבע מאות ש<mark></mark>נה וי<mark>מ</mark>ת <br><b>MT:</b> ויחי עבר אחרי הולידו את פלג ש<mark>ׁלשׁ</mark>ים ש<mark>ׁ</mark>נה ו<mark>ארבע </mark>מא<mark>ו</mark>ת<mark></mark> ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 61</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:19&options=HNVUG" target="_blank">Genesis 11:19</a></h4><p><b>SP:</b> ויחי פלג אחרי הולידו את רעו תש<mark></mark>ע ש<mark></mark>נים ומאת<mark></mark> ש<mark></mark>נה ויול<mark>י</mark>ד בנים ובנות <mark>ויהיו כל ימי פלג תשע ושלשים ומאתים שנה וימת </mark><br><b>MT:</b> ויחי פלג אחרי הולידו את רעו תש<mark>ׁ</mark>ע ש<mark>ׁ</mark>נים ומאת<mark>ים</mark> ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 53</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:21&options=HNVUG" target="_blank">Genesis 11:21</a></h4><p><b>SP:</b> ויחי רעו אחרי הולידו את ש<mark></mark>רוג ש<mark></mark>בע ש<mark></mark>נים ומאת<mark></mark> ש<mark></mark>נה ויול<mark>י</mark>ד בנים ובנות <mark>ויהיו כל ימי רעו תשע ושלשים ומאתים שנה וימת </mark><br><b>MT:</b> ויחי רעו אחרי הולידו את ש<mark>ׂ</mark>רוג ש<mark>ׁ</mark>בע ש<mark>ׁ</mark>נים ומאת<mark>ים</mark> ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 52</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:23&options=HNVUG" target="_blank">Genesis 11:23</a></h4><p><b>SP:</b> ויחי ש<mark></mark>רוג אחרי הולידו את נחור מאת<mark></mark> ש<mark></mark>נה ויול<mark>י</mark>ד בנים ובנות <mark>ויהיו כל ימי שרוג שלשים שנה ומאתים שנה וימת </mark><br><b>MT:</b> ויחי ש<mark>ׂ</mark>רוג אחרי הולידו את נחור מאת<mark>ים</mark> ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 51</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:25&options=HNVUG" target="_blank">Genesis 11:25</a></h4><p><b>SP:</b> ויחי נחור אחרי הולידו את תרח תש<mark></mark>ע <mark></mark>ש<mark>נים וששים</mark> ש<mark></mark>נה ויול<mark>י</mark>ד בנים ובנות <mark>ויהיו כל ימי נחור שמנה וארבעים שנה ומאת שנה וימת </mark><br><b>MT:</b> ויחי נחור אחרי הולידו את תרח תש<mark>ׁ</mark>ע <mark>ע</mark>ש<mark>ׂרה שׁ</mark>נ<mark>ה</mark> ו<mark>מאת</mark> ש<mark>ׁ</mark>נה ויול<mark></mark>ד בנים ובנות <mark></mark></p><p>Levenshtein Distance: 65</p>

<h4><a href="https://www.stepbible.org/?q=version=NASB2020&reference=Genesis.11:31&options=HNVUG" target="_blank">Genesis 11:31</a></h4><p><b>SP:</b> ויקח תרח את אברם בנו ואת לוט בן הרן בן בנו ואת ש<mark></mark>רי <mark>ואת מלכה </mark>כל<mark>ו</mark>תו אש<mark></mark>ת אברם <mark>ונחור </mark>בנ<mark>י</mark>ו וי<mark>ו</mark>צא<mark></mark> אתם מאור כש<mark></mark>דים ללכת ארצה כנען ויבאו עד חרן ויש<mark></mark>בו ש<mark></mark>ם <br><b>MT:</b> ויקח תרח את אברם בנו ואת לוט בן הרן בן בנו ואת ש<mark>ׂ</mark>רי <mark></mark>כל<mark></mark>תו אש<mark>ׁ</mark>ת אברם <mark></mark>בנ<mark></mark>ו וי<mark></mark>צא<mark>ו</mark> אתם מאור כש<mark>ׂ</mark>דים ללכת ארצה כנען ויבאו עד חרן ויש<mark>ׁ</mark>בו ש<mark>ׁ</mark>ם </p><p>Levenshtein Distance: 28</p>

# 5 - Comparison of spelling of proper nouns between SP and MT <a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

This section compares the spelling of proper nouns in the Samaritan Pentateuch (SP) and the Masoretic Text (MT). Proper nouns, including names of people and places, often exhibit variations in spelling between the two traditions.
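
The comparison comes down to checking, per lexeme, whether both traditions attest the same set of spellings. The sketch below shows that step in isolation. The function name `differing_spellings` and the placeholder lexemes and spellings are hypothetical and are not taken from either dataset.

```python
def differing_spellings(mt_spellings, sp_spellings):
    """Return {lexeme: (mt_set, sp_set)} for lexemes whose spelling sets differ."""
    differences = {}
    for lexeme, mt_set in mt_spellings.items():
        # A lexeme missing from the SP mapping is treated as having no spellings
        sp_set = sp_spellings.get(lexeme, set())
        if mt_set != sp_set:
            differences[lexeme] = (mt_set, sp_set)
    return differences

# Hypothetical toy data: lexeme -> set of consonantal spellings
toy_mt = {"LEX_A": {"ABC"}, "LEX_B": {"DEF"}, "LEX_C": {"GHI"}}
toy_sp = {"LEX_A": {"ABC"}, "LEX_B": {"DEWF"}}

result = differing_spellings(toy_mt, toy_sp)
for lexeme, (mt_set, sp_set) in sorted(result.items()):
    print(lexeme, sorted(mt_set), sorted(sp_set))
```

Like the notebook's own loop, this sketch iterates over the MT lexemes only, so a lexeme that occurs solely in the SP would not be reported.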

In [7]:
import collections

def collectProperNounSpellings(F, L, T, inputList):
    """
    Collect proper noun spellings and their associated word node numbers.
    Ensures only one tuple is stored for each lexeme-to-spelling mapping.
    """
    properNounsSpellings = {}
    for bookChapterVerse in inputList:
        verseNode = T.nodeFromSection(bookChapterVerse)
        wordNodes = L.d(verseNode, 'word')
        for wordNode in wordNodes:
            if F.sp.v(wordNode) == 'nmpr':  # Check if the word is a proper noun
                lex = F.lex.v(wordNode)    # Lexical form
                spelling = F.g_cons.v(wordNode)  # Spelling
                # Store only the first occurrence for each lex-to-cons mapping
                if lex not in properNounsSpellings or spelling not in {item[0] for item in properNounsSpellings[lex]}:
                    properNounsSpellings.setdefault(lex, []).append((spelling, wordNode))
    return properNounsSpellings
        
SPspellingDict = collectProperNounSpellings(Fsp, Lsp, Tsp, bookChapterVerseList) 
MTspellingDict = collectProperNounSpellings(Fmt, Lmt, Tmt, bookChapterVerseList)

In [8]:
from IPython.display import HTML, display

# Initialize HTML content
htmlContent = f'<h2>Spelling differences in proper nouns between SP and MT for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})</h2>'

# Generate the HTML output
for lex, MTspellings in MTspellingDict.items():
    # Retrieve SP spellings, defaulting to an empty set if lex is not found
    SPspellings = SPspellingDict.get(lex, set())

    # Extract only the spellings (ignoring node numbers) for comparison
    MTspellingSet = {spelling for spelling, _ in MTspellings}
    SPspellingSet = {spelling for spelling, _ in SPspellings}

    # Compare the sets of spellings
    if MTspellingSet != SPspellingSet:
        # Print MT spelling with reference
        MTnode = list(MTspellings)[0][1]  # Get first tuple's node number
        book, chapter, verse = Tmt.sectionFromNode(MTnode)
        MTgloss = Fmt.gloss.v(MTnode)
        MTspelling = Fmt.g_cons_utf8.v(MTnode)

        # Build HTML output
        output = (
            f'<h4>Word: <b>{MTgloss}</b> '
            f'<a href="https://www.stepbible.org/?q=version=NASB2020&reference={book}.{chapter}:{verse}&options=HNVUG" target="_blank">'
            f'{book} {chapter}:{verse}</a></h4>'
            f'<ul><li><b>MT Spelling:</b> {MTspelling}</li>'
        )

        # Print SP spellings with reference
        if SPspellings:
            SPnode = list(SPspellings)[0][1]  # Get first tuple's node number
            SPspelling = Fsp.g_cons_utf8.v(SPnode)
            output += f'<li><b>SP Spelling:</b> {SPspelling}</li></ul>'
        else:
            output += '<li><b>SP Spelling:</b> None</li></ul>'

        # Append the output to the HTML content
        htmlContent += output

# Save the HTML content to a file
fileName = f"spelling_differences_SP_MT({parashaNameEnglish.replace(' ','%20')}).html"
with open(fileName, "w", encoding="utf-8") as file:
    file.write(htmlContent)

# Display the HTML content in the notebook
display(HTML(htmlContent))

# wrap html header and footer and display a download button
htmlContentFull = f'{htmlStart}{htmlContent}{htmlFooter}'
downloadButton = f"""
<a download="{fileName}" href="data:text/html;charset=utf-8,{htmlContentFull.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;')}" target="_blank">
    <button>Download Differences as HTML</button>
</a>
"""
display(HTML(downloadButton))

# 6 - References and acknowledgement <a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

<a class="anchor" id="ref1"><sup>1</sup></a> Christian Canu Højgaard, Martijn Naaijer, & Stefan Schorch. (2023). Text-Fabric Dataset of the Samaritan Pentateuch. Zenodo. https://doi.org/10.5281/zenodo.7734632

<a class="anchor" id="ref2"><sup>2</sup></a> [Notebook created by Martijn Naaijer](https://github.com/DT-UCPH/sp/blob/main/notebooks/combine_sp_with_mt_data.ipynb)

# 7 - Required libraries <a class="anchor" id="bullet7"></a>
##### [Back to ToC](#TOC)

The scripts in this notebook require (besides `text-fabric`) the following Python libraries:

    collections
    difflib
    Levenshtein

`collections` and `difflib` are part of the Python standard library. You can install a missing library such as `Levenshtein` from within Jupyter Notebook using either `pip` or `pip3`.
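
As a small convenience, the sketch below checks which of these modules can be imported in the current environment without actually importing them. The helper name `missing_modules` is an assumption; `tf` is the import name used for Text-Fabric at the top of this notebook.

```python
from importlib.util import find_spec

def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]

required = ["collections", "difflib", "Levenshtein", "tf"]
missing = missing_modules(required)

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```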

# 8 - Notebook version details <a class="anchor" id="bullet8"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.1</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>18 November 2024</td>
    </tr>
  </table>
</div>