# Double lexeme in Parasha #48: Shoftim (Deut. 16:18-21:9)

## Table of Contents (ToC) <a class="anchor" id="TOC"></a>

* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Perform the queries</a>
   * <a href="#bullet3x1">3.1 - Create the verse list</a>
   * <a href="#bullet3x2">3.2 - Create the output report</a>
   * <a href="#bullet3x3">3.3 - Create the download link</a>
* <a href="#bullet4">4 - Required libraries</a>
* <a href="#bullet5">5 - Notebook version details</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This notebook examines occurrences of the same lexeme repeated in close succession within a verse of the parasha and haftarah texts. It groups the occurrences by distance, measured as the difference between the node IDs of the two words.
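
To make the notion of distance concrete, here is a minimal sketch that does not depend on Text-Fabric. The function name `group_repeats`, the node numbers and the lexeme labels are all invented for illustration; the actual analysis in section 3.2 runs on BHSA word nodes and also filters by part of speech.

```python
from collections import defaultdict

def group_repeats(words, max_distance=10):
    """Group repeated lexemes by the difference between their node numbers.

    `words` is a list of (node, lexeme) pairs in text order (hypothetical data).
    """
    groups = defaultdict(list)
    last_seen = {}  # lexeme -> node number of its most recent occurrence
    for node, lex in words:
        prev = last_seen.get(lex)
        if prev is not None and node - prev <= max_distance:
            groups[node - prev].append((lex, prev, node))
        last_seen[lex] = node
    return dict(groups)

# Hypothetical verse: "lexA" repeats at adjacent nodes, "lexB" four nodes apart
sample = [(101, "lexA"), (102, "lexA"), (103, "lexB"), (107, "lexB")]
result = group_repeats(sample)
```

In this sketch a distance of 1 means the two occurrences are adjacent words, which is the classic "double word" case.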

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

The following code will load the Text-Fabric version of the [Biblia Hebraica Stuttgartensia (Amstelodamensis)](https://etcbc.github.io/bhsa/) together with the additional parasha-related features from [tonyjurg/BHSaddons](https://github.com/tonyjurg/BHSaddons).

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment.
from tf.fabric import Fabric
from tf.app import use

In [3]:
# load the app and data
BHSA = use ("etcbc/BHSA", version="2021", mod="tonyjurg/BHSaddons/tf/", hoist=globals())

**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 39 | 10938.21 | 100 |
| chapter | 929 | 459.19 | 100 |
| lex | 9230 | 46.22 | 100 |
| verse | 23213 | 18.38 | 100 |
| half_verse | 45179 | 9.44 | 100 |
| sentence | 63717 | 6.7 | 100 |
| sentence_atom | 64514 | 6.61 | 100 |
| clause | 88131 | 4.84 | 100 |
| clause_atom | 90704 | 4.7 | 100 |
| phrase | 253203 | 1.68 | 100 |


# 3 - Perform the queries <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

## 3.1 - Create the verse list <a class="anchor" id="bullet3x1"></a>

In [4]:
# find all verse nodes for this parasha (select either by transliterated name or by sequence number)
parashaQuery = '''
verse parashanum=48
'''
parashaResults = BHSA.search(parashaQuery)

  0.01s 97 results


In [5]:
# Store parasha name, start and end verse for future use
startNode=parashaResults[0][0]
endNode=parashaResults[-1][0]
parashaNameHebrew=F.parashahebr.v(startNode)
parashaNameEnglish=F.parashatrans.v(startNode)
bookStart,chapterStart,verseStart=T.sectionFromNode(startNode)
parashaStart=f'{bookStart} {chapterStart}:{verseStart}'
bookEnd,chapterEnd,verseEnd=T.sectionFromNode(endNode)
parashaEnd=f'{chapterEnd}:{verseEnd}'

In [6]:
parashaList = [x for (x,) in parashaResults]

Now obtain all verse nodes for the haftarah.

In [7]:
# find first verse node of the haftara for this parasha 
startQuery = '''
verse book=Jesaia chapter=51 verse=12
'''
startResults = BHSA.search(startQuery)

# get the value of the first node in this list of tuples
startVerse=startResults[0][0]

# find last verse node of the haftarah for this parasha
endQuery = '''
verse book=Jesaia chapter=52 verse=12
'''
endResults = BHSA.search(endQuery)

# the query returns a single result tuple; take its verse node
endVerse=endResults[0][0]

  0.02s 1 result
  0.01s 1 result


In [8]:
haftaraList = list(range(startVerse, endVerse + 1))

In [9]:
combinedList= parashaList+haftaraList
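
The haftarah list above is built with `range`, which relies on the verse nodes between the first and last verse being numbered consecutively. The sketch below shows the same inclusive-range construction on made-up node numbers; the helper name `inclusive_node_range` and all numbers are hypothetical and only illustrate the idea.

```python
def inclusive_node_range(first_node, last_node):
    """Return every node number from first_node through last_node, inclusive."""
    if last_node < first_node:
        raise ValueError("last_node must not precede first_node")
    # range() excludes its stop value, hence the + 1
    return list(range(first_node, last_node + 1))

# Hypothetical node numbers, for illustration only
parasha_nodes = [500, 501, 502]
haftarah_nodes = inclusive_node_range(900, 903)
combined = parasha_nodes + haftarah_nodes
```

The explicit check guards against swapping the start and end verse, which with a bare `range` would silently yield an empty list.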

## 3.2 - Create the output report <a class="anchor" id="bullet3x2"></a>

In [16]:
# --- Config -------------------------------------------------------------------
# Maximum allowed distance (in node IDs) between repeated lexemes
MAX_DISTANCE = 10

# Restrict matching only to these parts of speech (BHSA feature sp)
POS_KEEP = {'verb', 'subs', 'adjv', 'prps', 'advb', 'intj'}

# Collapsing behavior for the HTML output
COLLAPSE_DEFAULT = True   # make sections collapsible/closed by default
OPEN_FIRST = True         # first section opened initially

# Make the whole verse clickable (True) or only the reference label (False)
LINK_WHOLE_VERSE = False

# --- Implementation -----------------------------------------------------------
from collections import defaultdict
from html import escape as _esc
from IPython.display import HTML
from urllib.parse import quote  # for safe URL construction

# Characters that indicate a join (no space after the token)
JOIN_CHARS = ("־", "-", "\u2010", "\u2011", "\u2012", "\u2013")  # U+05BE, ASCII -, ‐, -, ‒, –

def _space_after(node, F):
    """Should a space follow this node according to 'wordboundary'?"""
    try:
        val = F.wordboundary.v(node)
    except Exception:
        return True
    return val in (1, "1", True, "true", "True")

def _has_joiner_after(node, T):
    """No space if token ends with maqqef/hyphen-like joiner."""
    tok = T.text(node) or ""
    tok = tok.rstrip()
    return tok.endswith(JOIN_CHARS)

def highlight_verse_pair_html(verseNode, prevNode, currNode, L, T, F):
    """
    Render one verse (RTL), highlight prev/curr nodes, and space per 'wordboundary'
    unless a maqqef-like joiner suppresses spacing.
    """
    out = []
    words = sorted(L.d(verseNode, 'word'))
    last_index = len(words) - 1

    for i, n in enumerate(words):
        token = _esc(T.text(n) or "")
        token_html = f"<mark class='hl'>{token}</mark>" if (n == prevNode or n == currNode) else token
        out.append(token_html)
        if i != last_index and _space_after(n, F) and not _has_joiner_after(n, T):
            out.append(" ")
    return "".join(out)

def find_repeats_by_distance(combinedList, F, L, T, maxDistance=MAX_DISTANCE, posKeep=POS_KEEP):
    """
    Find repeated lexemes per verse, grouped by node-distance.
    """
    groups = defaultdict(list)
    for verseNode in combinedList:
        lastPosByLex = {}
        for wordNode in sorted(L.d(verseNode, 'word')):
            sp = F.sp.v(wordNode)
            if sp not in posKeep:
                continue
            lex = F.lex.v(wordNode)
            prevNode = lastPosByLex.get(lex)
            if prevNode is not None:
                dist = wordNode - prevNode
                if dist <= maxDistance:
                    book, chapter, verse = T.sectionFromNode(verseNode)
                    verseHtml = highlight_verse_pair_html(verseNode, prevNode, wordNode, L, T, F)
                    groups[dist].append({
                        "book": book,
                        "chapter": chapter,
                        "verse": verse,
                        "ref": f"{book} {chapter}:{verse}",
                        "wordNode": wordNode,
                        "prevNode": prevNode,
                        "lex": lex,
                        "sp": sp,
                        "gloss": F.gloss.v(wordNode) or "",
                        "dist": dist,
                        "verseHtml": verseHtml,
                        "form": T.text(wordNode),
                        "prevForm": T.text(prevNode),
                    })
            lastPosByLex[lex] = wordNode
    return groups

def _step_link(book: str, chapter: int, verse: int) -> str:
    """
    Build a STEP Bible URL for the given reference.
    Example: https://www.stepbible.org/?q=version=NASB2020&reference=Deuteronomy.16:20&options=HNVUG
    """
    # "Book.Chapter:Verse" needs URL-encoding for spaces, etc.
    ref_param = quote(f"{book}.{chapter}:{verse}", safe=":.")
    return f"https://www.stepbible.org/?q=version=NASB2020&reference={ref_param}&options=HNVUG"

def make_html_report(groups):
    """
    Build one big HTML report string with collapsible sections,
    STEP Bible links on the reference (and optionally on the verse text).
    """
    total = sum(len(v) for v in groups.values())
    parts = [f"""
<style>
.section     {{ margin-top: 1.25rem; border-top: 1px solid #ddd; padding-top: 0.75rem; }}
.meta        {{ direction: ltr; unicode-bidi: isolate; color:#444; font-size: 0.95rem; margin: 0.25rem 0; }}
.verse-rtl   {{ direction: rtl; unicode-bidi: isolate-override;
               font-family: 'SBL Hebrew','Ezra SIL','Times New Roman',serif;
               font-size: 1.25rem; line-height: 1.8; }}
.hl          {{ background: yellow; padding: 0 0.1em; }}
.item        {{ margin: 0.6rem 0; }}
summary      {{ cursor: pointer; font-weight: 600; padding: 0.25rem 0; }}
summary span.badge {{ display:inline-block; padding:0 .45em; margin-left:.5em; border-radius:1em; background:#eee; font-weight:500; }}
.topsum      {{ margin-bottom:.5rem; color:#333; }}
a.ref-link   {{ text-decoration: none; border-bottom: 1px dotted #888; color: inherit; }}
</style>
<h1>Double root for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})</h1>
<div class="topsum"><strong>Total hits:</strong> {total}</div>
<div>
""".strip()]

    dists = sorted(groups.keys())
    for i, dist in enumerate(dists):
        hits = groups[dist]
        count = len(hits)
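        # Open every section when collapsing is disabled; otherwise open only the first (if requested)
        open_attr = " open" if (not COLLAPSE_DEFAULT or (OPEN_FIRST and i == 0)) else ""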
        parts.append(f"<div class='section'><details{open_attr}>")
        parts.append(
            f"<summary>Distance = {_esc(str(dist))} "
            f"<span class='badge'>{count} hit(s)</span></summary>"
        )
        for r in hits:
            href = _step_link(r["book"], r["chapter"], r["verse"])
            # Clickable reference (STEP Bible)
            ref_html = (
                f'<a class="ref-link" href="{_esc(href)}" target="_blank" rel="noopener noreferrer">'
                f'{_esc(r["book"])} {r["chapter"]}:{r["verse"]}'
                f'</a>'
            )
            meta = (
                f"{ref_html} "
                f"<span class='meta'>"
                f"lex={_esc(r['lex'])} · sp={_esc(r['sp'])} · gloss={_esc(r['gloss'])} · "
                f"nodes {r['prevNode']}→{r['wordNode']}</span>"
            )

            verse_block = f"<div class='verse-rtl'>{r['verseHtml']}</div>"
            if LINK_WHOLE_VERSE:
                # Make the entire verse text clickable too
                verse_block = (
                    f"<a href=\"{_esc(href)}\" target=\"_blank\" rel=\"noopener noreferrer\" "
                    f"style=\"text-decoration:none; color:inherit;\">{verse_block}</a>"
                )

            parts.append(f"<div class='item'>{meta}{verse_block}</div>")
        parts.append("</details></div>")

    parts.append("</div>")
    return "\n".join(parts)

# --- Run ----------------------------------------------------------------------
_groups = find_repeats_by_distance(
    combinedList=combinedList, F=F, L=L, T=T,
    maxDistance=MAX_DISTANCE, posKeep=POS_KEEP,
)

html_report = make_html_report(_groups)

HTML(html_report)
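
The spacing rule used when rendering a verse (a space between tokens unless the token ends in a maqqef-like joiner) can be tried out in isolation. The following is a simplified sketch, not the notebook's implementation: `join_tokens` and the sample tokens are invented for illustration, and it ignores the `wordboundary` feature that the real code also consults.

```python
MAQQEF = "\u05be"          # Hebrew punctuation maqaf
JOINERS = (MAQQEF, "-")    # characters after which no space is inserted

def join_tokens(tokens):
    """Join tokens with single spaces, except after a token ending in a joiner."""
    out = []
    last_index = len(tokens) - 1
    for i, tok in enumerate(tokens):
        out.append(tok)
        # str.endswith accepts a tuple of suffixes
        if i != last_index and not tok.rstrip().endswith(JOINERS):
            out.append(" ")
    return "".join(out)

# Hypothetical tokens: the first one ends in a maqqef
joined = join_tokens(["כל\u05be", "הארץ", "טובה"])
```

Here the first two tokens are glued together by the maqqef, while a regular space separates the second and third.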


## 3.3 - Create the download link <a class="anchor" id="bullet3x3"></a>

In [18]:
import base64
from IPython.display import HTML

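def wrapHTML(body, title):
    # Wrap the report body in a minimal standalone HTML document;
    # the charset declaration keeps the Hebrew text readable when the file is opened locally
    output = (
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>'
        f'<body>{body}<p>Data generated by <code>double_root.ipynb</code> at '
        '<a href="https://github.com/tonyjurg/Parashot" target="_blank">'
        'github.com/tonyjurg/Parashot</a></p></body></html>'
    )
    return output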

# Build the report title
reportTitle=f'Double root for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})'

# Define the HTML filename and assemble the full HTML document
fileName = f"double_root({parashaNameEnglish.replace(' ','_')}).html"
htmlContentFull = wrapHTML(html_report,reportTitle)

# Encode the HTML string to base64
b64 = base64.b64encode(htmlContentFull.encode("utf-8")).decode("utf-8")

# Create a download button with a data URL
button_html = f"""
<a download="{fileName}"
href="data:text/html;base64,{b64}"
   target="_blank"
   style="display:inline-block;
          padding:0.5em 1em;
          background:#1976d2;
          color:white;
          border-radius:6px;
          text-decoration:none;
          font-weight:600;">
   Download HTML report
</a>
"""

HTML(button_html)
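
The download link works because the whole report is embedded in the link itself as a base64 data URL. As a minimal sketch of that encoding step, using a hypothetical helper `to_data_url` and a tiny sample string rather than the real report:

```python
import base64

def to_data_url(html_text):
    """Encode an HTML string as a base64 data URL."""
    b64 = base64.b64encode(html_text.encode("utf-8")).decode("ascii")
    return f"data:text/html;base64,{b64}"

# Small sample with Hebrew text to show that UTF-8 survives the round trip
sample_html = "<p>שלום</p>"
url = to_data_url(sample_html)

# Decode the payload again to confirm nothing was lost
payload = url.split(",", 1)[1]
decoded = base64.b64decode(payload).decode("utf-8")
```

Because the report travels inside the link, very large reports make the notebook output correspondingly larger.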

# 4 - Required libraries <a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

The scripts in this notebook require (besides `text-fabric` and `IPython`) the following Python libraries:

    base64
    collections
    html
    urllib

All four are part of the Python standard library, so they need no separate installation. If `text-fabric` is missing, you can install it from within Jupyter Notebook using either `pip` or `pip3`.

# 5 - Notebook version details <a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.3</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>August 24, 2025</td>
    </tr>
  </table>
</div>