# Follow up on error cases

## Table of contents (ToC)<a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Harvest error cases from Morpheus output</a>
* <a href="#bullet3">3 - Load N1904-TF with N1904addons featureset</a>
* <a href="#bullet4">4 - Postprocess the failing results</a>
* <a href="#bullet5">5 - Create a JSON dictionary</a>
* <a href="#bullet6">6 - Footnotes and attribution</a>
* <a href="#bullet7">7 - Required libraries</a>
* <a href="#bullet8">8 - Notebook version</a>


# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This Jupyter notebook follows up on the words for which the first Morpheus run returned no analysis.

# 2 - Harvest error cases from Morpheus output <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

The betacode values harvested here should largely match those in the feature [betacode](https://github.com/tonyjurg/N1904addons/blob/main/docs/features/betacode.md). The exception is Greek words written entirely in capitals; these need separate handling later.

In [17]:
input_path = "gnt_morphology_results.txt"   # the output from Morpheus first run
failingWords = []                           # collected results

with open(input_path, encoding="utf-8") as f:
    for line in f:
        if "No response for" in line:
            parts = line.strip().split()        # split on any whitespace
            if len(parts) >= 5:                 # make sure a fifth item exists
                raw_word = parts[4]             # the fifth item (Python indices start at 0)
                cleaned = raw_word.lstrip("'")  # Morpheus prefixed each failing word with a '
                failingWords.append(cleaned)    # store the cleaned word
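
To make the parsing rule above easier to check in isolation, the sketch below wraps it in a small helper and runs it on a single line. The helper name `extract_failing_word` and the sample line are assumptions for illustration only; the actual layout of `gnt_morphology_results.txt` is not shown here, and the sketch only relies on the word being the fifth whitespace-separated item.

```python
def extract_failing_word(line):
    """Return the failing word from a 'No response for' line, or None.

    Hypothetical helper: it mirrors the loop above for one line.
    """
    if "No response for" not in line:
        return None
    parts = line.strip().split()      # split on any whitespace
    if len(parts) < 5:                # the word is expected as the fifth item
        return None
    return parts[4].lstrip("'")       # drop the leading apostrophe(s)


# Assumed sample line; the real Morpheus log format may differ.
sample = "ERROR: No response for '*agnwstw"
print(extract_failing_word(sample))
```

Lines that lack the marker text, or that have fewer than five items, yield `None`, so they can be filtered out before the words are collected.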


# 3 - Load N1904-TF with N1904addons featureset <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

In [5]:
%load_ext autoreload
%autoreload 2

In [6]:
# Loading the Text-Fabric code
from tf.fabric import Fabric
from tf.app import use

In [7]:
# Load the N1904-TF app and data with the additional features
A = use ("CenterBLC/N1904", version="1.0.0", mod="tonyjurg/N1904addons/tf/", hoist=globals())

**Locating corpus resources ...**

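| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |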


Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes

# 4 - Postprocess the failing results <a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

In [30]:
for betacodeWord in failingWords:
    # get a word node that has this betacode
    betacodeNodes=F.betacode.s(betacodeWord)
    if len(betacodeNodes)>0:
        firstBetacodeNode=betacodeNodes[0]
        print (F.sp.v(firstBetacodeNode),F.text.v(firstBetacodeNode),F.typems.v(firstBetacodeNode))
    else:
        print (f'not found {betacodeWord}')

not found *agnwstw
not found *anaqema
subs Αἰνὼν proper
not found *babulwn
not found *basileus
not found *basilewn
not found *bdelugmatwn
subs Βάαλ proper
subs Βαλαάμ proper
subs Βαλαὰμ proper
subs Βαλὰκ proper
subs Βαράκ proper
subs Βαραββᾶν proper
subs Βαραββᾶς proper
subs Βαριησοῦς proper
subs Βαριωνᾶ proper
subs Βαρσαββᾶν proper
subs Βαρτιμαῖος proper
subs Βεελζεβοὺλ proper
subs Βελιάρ proper
subs Βενιαμείν proper
subs Βενιαμεὶν proper
subs Βερνίκη proper
subs Βερνίκης proper
subs Βεώρ proper
subs Βηθανίαν proper
subs Βηθζαθά proper
subs Βηθλέεμ proper
subs Βηθσαϊδά proper
subs Βηθσαϊδάν proper
subs Βηθσαϊδὰ proper
subs Βηθφαγὴ proper
subs Βοανηργές proper
not found *ghs
verb Γέγοναν None
subs Γαββαθα proper
subs Γαβριὴλ proper
subs Γαμαλιήλ proper
subs Γεθσημανεί proper
subs Γεννησαρέτ proper
subs Γεννησαρὲτ proper
subs Γολγοθᾶ proper
subs Γολγοθᾶν proper
subs Γομόρρας proper
subs Γομόρρων proper
subs Γόμορρα proper
subs Γὰδ proper
subs Γὼγ proper
subs Δάμαρις proper
subs Δαλμανου
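
The printed list mixes hits and misses, which makes the misses hard to count. The sketch below shows one possible way to separate the two groups. The function name `split_by_lookup` is hypothetical, and the lookup is passed in as a plain callable so that the example runs without Text-Fabric. In this notebook the callable would be `F.betacode.s`; the stub dictionary and its node numbers are made up and merely stand in for the corpus.

```python
def split_by_lookup(words, lookup):
    """Split words into (found, missing) based on a node lookup.

    `lookup` is any callable returning a sequence of nodes for a word.
    Assumption: it returns an empty sequence when nothing matches,
    as the loop above expects from F.betacode.s.
    """
    found, missing = {}, []
    for word in words:
        nodes = lookup(word)
        if len(nodes) > 0:
            found[word] = nodes[0]    # keep the first matching node
        else:
            missing.append(word)
    return found, missing


# Stub data standing in for the corpus; the node numbers are invented.
stub_index = {"*ba/al": (2002, 2003)}
found, missing = split_by_lookup(
    ["*agnwstw", "*ba/al", "*ghs"],
    lambda w: stub_index.get(w, ()),
)
print(f"{len(found)} found, {len(missing)} not found: {missing}")
```

Keeping the misses in their own list makes it straightforward to revisit them later, for example when handling the all-capital words.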

# 6 - Footnotes and attribution<a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

The engine of the conversion is provided by the `beta-code-py` library, available under the MIT license in the GitHub repository [perseids-tools/beta-code-py](https://github.com/perseids-tools/beta-code-py).

The source data for the conversion are the XML node files representing the macula-greek version of Eberhard Nestle's 1904 Greek New Testament (British and Foreign Bible Society, 1904). The starting dataset is formatted according to the syntax diagram markup initially prepared by the Asia Bible Society and currently made available by <a href="https://www.biblica.com/" target="_blank">Biblica, Inc.</a> The most recent source data can be found on [GitHub](https://github.com/Clear-Bible/macula-greek/tree/main/Nestle1904/nodes).

# 7 - Required libraries<a class="anchor" id="bullet7"></a>
##### [Back to ToC](#TOC)

The scripts in this notebook require the following Python libraries to be installed in the environment:

    beta_code 
    json
    os  
    pathlib
    re
    requests
    unicodedata
    xml

You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
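
Before installing anything, it can help to see which of the listed libraries are actually missing. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `missing_modules` is an assumption for illustration. Modules such as `json`, `os`, `pathlib`, `re`, `unicodedata`, and `xml` ship with Python, so they should not be reported as missing.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the list above.
required = ["beta_code", "json", "os", "pathlib", "re",
            "requests", "unicodedata", "xml"]
print("Missing:", missing_modules(required) or "none")
```

Note that the name used with `pip` can differ from the import name, so check the library's own documentation for the exact package name.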

# 8 - Notebook version<a class="anchor" id="bullet8"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.3</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>29 April 2025</td>
    </tr>
  </table>
</div>