# Use Case 1: Layered Structure of Morphological Analysis (N1904addons)

## Table of contents (ToC)<a class="anchor" id="TOC"></a>

* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Step 1: Setting up the environment</a>
* <a href="#bullet3">3 - Step 2: Start at the meta level (mm*)</a>    
* <a href="#bullet4">4 - Step 3: Explore at the summary level (ms*)</a>
* <a href="#bullet5">5 - Step 4: Deep dive at the detail level (md*)</a>
* <a href="#bullet6">6 - Making it efficient</a> 
* <a href="#bullet7">7 - Attribution and footnotes</a>
* <a href="#bullet8">8 - Notebook version</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This notebook provides a compact overview of how the concept of [Morpheus TF feature classes](https://tonyjurg.github.io/N1904addons/using_the_morpheus_features.html) can help. The example below shows how to begin at the meta level, drill down to the summary level, and, when needed, inspect the full, detailed analytic blocks. Everything happens within a single environment that brings together the core N1904-TF dataset, the N1904addons module, and, optionally, the set of detailed features.

# 2 - Step 1: Setting up the environment <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

First, load the base N1904-TF package together with the standard N1904addons features. The `mod` argument below pulls the feature files straight from the GitHub repository behind N1904addons, so no manual download is needed.

In [1]:
%load_ext autoreload
%autoreload 2

In [6]:
from tf.app import use
A = use( "CenterBLC/N1904", version="1.0.0", mod="tonyjurg/N1904addons/tf/", hoist=globals())

**Locating corpus resources ...**

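| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |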


Display is setup for viewtype [syntax-view](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/viewtypes.md#start) for more information on viewtypes

In [7]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
A.dh(A.getCss())

# 3 - Step 2: Start at the meta level (`mm*`) <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

Suppose we want to understand from how many different lemmas a wordform could be derived, according to Morpheus. A first step is to create a frequency table:

In [8]:
F.mm_num_lemmas.freqList()

((1, 78324),
 (2, 29104),
 (3, 20501),
 (4, 5148),
 (5, 2124),
 (6, 154),
 (8, 142),
 (7, 99),
 (9, 42),
 (13, 10),
 (12, 6),
 (11, 1))
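To put this frequency table in perspective, the pairs can be summarised with a few lines of plain Python. A minimal sketch, with the tuple copied by hand from the output above (inside the notebook you would simply iterate over `F.mm_num_lemmas.freqList()` directly):

```python
# (number of lemma groups, number of word nodes) pairs, copied from the
# F.mm_num_lemmas.freqList() output above.
freq = ((1, 78324), (2, 29104), (3, 20501), (4, 5148), (5, 2124), (6, 154),
        (8, 142), (7, 99), (9, 42), (13, 10), (12, 6), (11, 1))

# word nodes that carry a value for mm_num_lemmas
total = sum(count for _, count in freq)

# word nodes for which Morpheus proposes more than one lemma group
ambiguous = sum(count for n_lemmas, count in freq if n_lemmas > 1)
share = ambiguous / total

print(f"{total:,} word nodes with a value, {ambiguous:,} with more than one lemma group ({share:.1%})")
```

Note that `total` comes out slightly lower than the 137,779 word nodes listed when loading the corpus, so not every word node carries a value for this feature.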

Now let's assume we would like to examine the wordforms that might be derived from, say, exactly 9 lemmas.
With a very simple TF Query template we can obtain a list of all word nodes that give us access to these wordforms:

In [10]:
multiLemmaList=A.search("word mm_num_lemmas=9")

  0.08s 42 results


The TF Query returns a list of 1-tuples, which we can now examine further. The next step is to build a frequency table of the surface forms:

In [11]:
from collections import Counter

cnt = Counter(
    F.mm_raw_uc.v(wordNode)              # surface form Morpheus analysed
    for (wordNode,) in multiLemmaList    # multiLemmaList is a list of 1-tuples [(wordNode,),]
)

print("\nsurface forms that triggered multiple lemma groups:\n")
for surface, freq in cnt.most_common():
    print(f"{surface:<20} {freq:>5}")


surface forms that triggered multiple lemma groups:

Διὰ                     36
περάτων                  2
ἀπόλωνται                1
Δία                      1
διαπονηθεὶς              1
διακονιῶν                1


Now assume we would like to dig deeper into the last wordform, διακονιῶν. 

This means that we are looking at a unique occurrence of a wordform. There are various methods to identify its wordNode, which is the key to unlocking all the data we would like to see. This one is quick and easy:

In [12]:
wordNodeList=A.search("word mm_raw_uc=διακονιῶν")

  0.09s 1 result


As expected, this returns a list with just one tuple. Now we can print some more details:

In [13]:
for (wordNode,) in wordNodeList:        # wordNodeList is a list of 1-tuples: [(wordNode,),]
    print(f'surface form Morpheus used: {F.mm_raw_uc.v(wordNode)}') 
    print(f'lemma groups found: {F.mm_num_lemmas.v(wordNode)}')
    print(f'Total number of analytic blocks: {F.mm_num_blocks.v(wordNode)}')

surface form Morpheus used: διακονιῶν
lemma groups found: 9
Total number of analytic blocks: 15
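Since this search is expected to yield exactly one node, an alternative to the `for` loop is to unpack the single result explicitly and fail loudly if the expectation does not hold. A minimal sketch; `single_node` is a hypothetical helper (not part of Text-Fabric), and its input is assumed to be shaped like the output of `A.search("word ...")`, i.e. a list of 1-tuples:

```python
from typing import Sequence, Tuple


def single_node(results: Sequence[Tuple[int, ...]]) -> int:
    """Return the only node in a list of 1-tuples, or raise ValueError otherwise.

    Hypothetical helper: `results` is assumed to look like [(wordNode,)].
    """
    if len(results) != 1:
        raise ValueError(f"expected exactly one result, got {len(results)}")
    (node,) = results[0]  # unpack the 1-tuple (also raises if it holds more than one node)
    return node


# Usage with a stand-in result list; the node number is made up for illustration.
print(single_node([(12345,)]))
```

In the notebook this would read `wordNode = single_node(wordNodeList)`, after which the feature lookups can be done without a loop.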


# 4 - Step 3: Explore at the summary level (`ms*`)<a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

To explore these 9 lemmas and their grammatical details (such as the morph code), we turn to the summary features. This can easily be done with simple Python code. The real trick is to construct the feature name dynamically and pass it to Text-Fabric's `Fs('fff')` method.

First, we exploit the power of the summary features by printing, for each lemma, the associated morphological tags, which provide a very concise summary of all the grammatical details produced by Morpheus.

In [14]:
for (wordNode,) in wordNodeList:        # wordNodeList is a list of 1-tuples: [(wordNode,),]
    # How many lemma-groups does this word actually have
    n_lemmas = int(F.mm_num_lemmas.v(wordNode) or 0)

    # loop over the existing groups 1 … n_lemmas
    for index in range(1, n_lemmas + 1):
        lemma   = Fs(f"ms{index}_lem_base_uc").v(wordNode)  # e.g. "ms1_lem_base_uc"
        morph   = Fs(f"ms{index}_morph").v(wordNode)      
        sim     = Fs(f"ms{index}_morph_sim").v(wordNode)   
        print(f"#{index}: lemma={lemma:12}  morph={morph} sim={sim}")

#1: lemma=διακονία      morph=N-GPF sim=100
#2: lemma=διακόνιον     morph=N-GPN sim=92
#3: lemma=διά-ἀκονάω    morph=V-PAP-NSM sim=49
#4: lemma=διά-κονέω     morph=V-PAP-NSM sim=49
#5: lemma=διά-κονίω     morph=V-PAP-NSM sim=49
#6: lemma=διά-κονίζω    morph=V-FAP-NSM-ATT sim=49
#7: lemma=διά-κονιάω    morph=V-PAP-VSM/V-PAP-NSN/V-PAP-VSN/V-PAP-ASN/V-PAP-NSM-ATT/V-IAI-3P/V-IAI-1S sim=49/49/49/49/49/38/25
#8: lemma=διά-κονιάζω   morph=V-FAP-VSM/V-FAP-NSN/V-FAP-VSN/V-FAP-ASN/V-FAP-NSM-ATT sim=49/49/49/49/49
#9: lemma=διακονέω      morph=V-PAP-NSM sim=49
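Some summary values hold several slash-separated items (see lemma group #7 above). A minimal sketch that pairs each morph tag with its similarity score and keeps the best-scoring tags, assuming (as the output above suggests, but not confirmed by the feature documentation here) that the items in `ms{num}_morph` and `ms{num}_morph_sim` correspond position by position. The two strings are copied from the output; in the notebook they would come from `Fs(f"ms{index}_morph").v(wordNode)` and `Fs(f"ms{index}_morph_sim").v(wordNode)`:

```python
from typing import List, Tuple

# Values for lemma group #7, copied from the output above.
morph = "V-PAP-VSM/V-PAP-NSN/V-PAP-VSN/V-PAP-ASN/V-PAP-NSM-ATT/V-IAI-3P/V-IAI-1S"
sim = "49/49/49/49/49/38/25"


def pair_morph_with_sim(morph: str, sim: str) -> List[Tuple[str, int]]:
    """Pair each '/'-separated morph tag with the score at the same position.

    Assumes both strings contain the same number of items in the same order.
    """
    tags = morph.split("/")
    scores = [int(s) for s in sim.split("/")]
    if len(tags) != len(scores):
        raise ValueError("morph and sim have a different number of items")
    return list(zip(tags, scores))


pairs = pair_morph_with_sim(morph, sim)
best = max(score for _, score in pairs)
best_tags = [tag for tag, score in pairs if score == best]
print(best, best_tags)
```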


# 5 - Step 4: Deep dive at the detail level (`md*`) <a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

To dig one level deeper, we first dump the data in feature [`ms{num}_block_nums`](https://tonyjurg.github.io/N1904addons/features/ms%7Bnum%7D_block_nums.html), as it maps each summary feature (numbered per lemma group) to its related detailed features (numbered per Morpheus analytic block, which is a different key).

In [15]:
for (wordNode,) in wordNodeList:        # wordNodeList is a list of 1-tuples: [(wordNode,),]
    # How many lemma groups does this word actually have?
    n_lemmas = int(F.mm_num_lemmas.v(wordNode) or 0)

    # loop over the existing groups 1 … n_lemmas and show which analytic blocks belong to each
    for index in range(1, n_lemmas + 1):
        lemma      = Fs(f"ms{index}_lem_base_uc").v(wordNode)   # e.g. "ms1_lem_base_uc"
        block_nums = Fs(f"ms{index}_block_nums").v(wordNode)    # e.g. "7/8/9/10/11"
        print(f"#{index}: lemma={lemma:12}  blocks={block_nums}")

#1: lemma=διακονία      blocks=1
#2: lemma=διακόνιον     blocks=2
#3: lemma=διά-ἀκονάω    blocks=3
#4: lemma=διά-κονέω     blocks=4
#5: lemma=διά-κονίω     blocks=5
#6: lemma=διά-κονίζω    blocks=6
#7: lemma=διά-κονιάω    blocks=7/8/9/10/11
#8: lemma=διά-κονιάζω   blocks=12/13/14
#9: lemma=διακονέω      blocks=15
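The `ms{num}_block_nums` values are slash-separated lists of analytic block numbers. A minimal sketch that turns them into a mapping in both directions (lemma group to blocks, and block to lemma group), using the values for διακονιῶν that the detail-level cell further down in this notebook prints (`index=7  blocks=7/8/9/10/11`, and so on). The helper name `parse_blocks` is hypothetical:

```python
from typing import Dict, List

# ms{num}_block_nums values for διακονιῶν, copied from the "index=N  blocks=..."
# lines printed by the detail-level cell further down in this notebook.
block_nums_by_group: Dict[int, str] = {
    1: "1", 2: "2", 3: "3", 4: "4", 5: "5", 6: "6",
    7: "7/8/9/10/11", 8: "12/13/14", 9: "15",
}


def parse_blocks(value: str) -> List[int]:
    """Split a slash-separated block list such as '7/8/9/10/11' into integers."""
    return [int(part) for part in value.split("/") if part]


# summary index (lemma group) -> detail block numbers
group_to_blocks = {g: parse_blocks(v) for g, v in block_nums_by_group.items()}

# detail block number -> summary index (lemma group)
block_to_group = {b: g for g, blocks in group_to_blocks.items() for b in blocks}

print(group_to_blocks[7])    # [7, 8, 9, 10, 11]
print(block_to_group[13])    # 8
```

All 15 blocks are covered exactly once, which agrees with `mm_num_blocks` = 15 reported for this word earlier.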


Now we can pull any detailed grammatical property we would like to evaluate from the Morpheus analytic blocks.
However, before we can use these detailed features, they need to be loaded.

In [11]:
A.isLoaded("md1_case")

md1_case             NOT LOADED


The previous cell shows that the detailed dataset is indeed not loaded. To load it, we call `use()` again, this time including the detailed dataset in the `mod` argument:

In [12]:
A = use( "CenterBLC/N1904", version="1.0.0", mod=["tonyjurg/N1904addons/tf/", "tonyjurg/N1904addons/detailed_set"], hoist=globals(), silence="terse",)


**Locating corpus resources ...**

   |     0.00s T md10_aug1_bc         from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.01s T md10_aug1_uc         from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.01s T md10_case            from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.01s T md10_degree          from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.00s T md10_dialects        from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.02s T md10_end_bc          from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.02s T md10_end_codes       from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.01s T md10_end_flags       from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.02s T md10_end_uc          from ~/text-fabric-data/github/tonyjurg/N1904addons/detailed_set/1.0.0
   |     0.01s T md

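| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |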


Display is setup for viewtype [syntax-view](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/viewtypes.md#start) for more information on viewtypes

Now we can build tables from specific parts of the Morpheus output, for example the lemma, stem, and ending for each analytic block:

In [18]:
wordNodeList=A.search("word mm_raw_uc=διακονιῶν")   # we have to re-do this since we reloaded!

for (wordNode,) in wordNodeList:        # wordNodeList is a list of 1-tuples: [(wordNode,),]
    # How many lemma-groups does this word actually have
    n_lemmas = int(F.mm_num_lemmas.v(wordNode) or 0)

    # loop over the existing groups 1 … n_lemmas
    for index in range(1, n_lemmas + 1):
        feat = f"ms{index}_block_nums"          # e.g. "ms1_block_nums"
        block_nums = Fs(feat).v(wordNode)        # Text-Fabric feature lookup
        block_list = block_nums.split('/')
        print(f'index={index}  blocks={block_nums}')
        for block in block_list:
            lem_base_uc = Fs(f"md{block}_lem_base_uc").v(wordNode)
            stem_uc = Fs(f"md{block}_stem_uc").v(wordNode)
            end_uc = Fs(f"md{block}_end_uc").v(wordNode)
            print(f'{lem_base_uc=}  {stem_uc=}  {end_uc=}')

  0.08s 1 result
index=1  blocks=1
lem_base_uc='διακονία'  stem_uc='δια—κονι'  end_uc='ῶν'
index=2  blocks=2
lem_base_uc='διακόνιον'  stem_uc='διακονι'  end_uc='ων'
index=3  blocks=3
lem_base_uc='διά-ἀκονάω'  stem_uc='ἀκον'  end_uc='ίων'
index=4  blocks=4
lem_base_uc='διά-κονέω'  stem_uc='κον'  end_uc='ίων'
index=5  blocks=5
lem_base_uc='διά-κονίω'  stem_uc='κονι—'  end_uc='ων'
index=6  blocks=6
lem_base_uc='διά-κονίζω'  stem_uc='κονι'  end_uc='ῶν'
index=7  blocks=7/8/9/10/11
lem_base_uc='διά-κονιάω'  stem_uc='κονι'  end_uc='ῶν'
lem_base_uc='διά-κονιάω'  stem_uc='κονι'  end_uc='ῶν'
lem_base_uc='διά-κονιάω'  stem_uc='κονι'  end_uc='ῶν'
lem_base_uc='διά-κονιάω'  stem_uc='κονι'  end_uc='ων'
lem_base_uc='διά-κονιάω'  stem_uc='κονι'  end_uc='ων'
index=8  blocks=12/13/14
lem_base_uc='διά-κονιάζω'  stem_uc='κονι'  end_uc='ῶν'
lem_base_uc='διά-κονιάζω'  stem_uc='κονι'  end_uc='ῶν'
lem_base_uc='διά-κονιάζω'  stem_uc='κονι'  end_uc='ῶν'
index=9  blocks=15
lem_base_uc='διακονέω'  stem_uc='δια—κον
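Because several analytic blocks can belong to one lemma group (group 7 spans five blocks), a table like the one above may contain repeated rows. A minimal sketch that collapses them to the distinct (stem, ending) combinations per lemma group. The rows for groups 6 to 8 are copied by hand from the output above; inside the notebook the `rows` list could instead be filled in the loop with something like `rows.append((index, stem_uc, end_uc))` (a hypothetical addition, not part of the original cell):

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# (lemma group, stem_uc, end_uc) rows copied from the output above (groups 6 to 8).
rows = [
    (6, "κονι", "ῶν"),
    (7, "κονι", "ῶν"), (7, "κονι", "ῶν"), (7, "κονι", "ῶν"),
    (7, "κονι", "ων"), (7, "κονι", "ων"),
    (8, "κονι", "ῶν"), (8, "κονι", "ῶν"), (8, "κονι", "ῶν"),
]

# lemma group -> set of distinct (stem, ending) combinations
distinct: Dict[int, Set[Tuple[str, str]]] = defaultdict(set)
for group, stem, end in rows:
    distinct[group].add((stem, end))

for group in sorted(distinct):
    print(group, sorted(distinct[group]))
```

For this word, the five blocks of group 7 reduce to two combinations that differ only in the accent on the ending.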

This three-tier approach keeps the usual workflow fast and lightweight (meta + summary only) while still giving you the option to dive deep whenever your research question demands it.

# 6 - Making it efficient <a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

In the previous example, only a few lookups were performed. Access to the detailed features can, however, be made dramatically more efficient. With the pattern used above, Text-Fabric has to resolve the string name we pass (for example "md7_workw_bc") into the feature object that fetches the value for a given node. Repeating that name lookup for 24 blocks across all ≈138,000 word nodes would amount to about 3.3 million calls.

Especially if we need to perform such actions multiple times, it is better to cache the result once per block, which reduces this to 24 lookups per feature, and then access the values with the simple `.v(node)` method. This caching is done in the first lines of the cell below.

What we create there are two plain dictionaries, `lem_feat` and `workw_feat`, keyed by the block number `b` (ranging from 1 to 24, inclusive). Each value is a Text-Fabric feature object, so a lookup becomes as simple as `lem_feat[b].v(wordNode)`.

Putting it all together, here is a working example that processes the combination of lemma and working word for all Morpheus analytic blocks of every word node:

In [19]:
import time

# Once, before the big loop, cache the md* features we would like to access
BLOCK_RANGE  = range(1, 25)
lem_feat     = {b: Fs(f"md{b}_lem_full_bc")  for b in BLOCK_RANGE}
workw_feat   = {b: Fs(f"md{b}_workw_bc")     for b in BLOCK_RANGE}

t0 = time.perf_counter()   # start timer
counter = 0

# Now fire up our word nodes loop
for wordNode in F.otype.s("word"):
    # quickly collect (lemma, workw) pairs for this word
    for block in BLOCK_RANGE:
        lem   = lem_feat[block].v(wordNode)
        if not lem:                      # skip empty Morpheus slots fast
            continue
        workw = workw_feat[block].v(wordNode)
        counter +=1
        # do something interesting with (lem, workw) 

elapsed = time.perf_counter() - t0   #stop timer

print(f"Processed {counter:,} lemma–workw pairs in {elapsed:.2f} s ({counter/elapsed:,.0f} pairs/s)")

Processed 361,580 lemma–workw pairs in 0.67 s (535,984 pairs/s)
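The speed-up comes from resolving each feature name once per block instead of once per (node, block) pair. The same hoisting pattern can be sketched in plain Python, independent of Text-Fabric. In this sketch the `registry` dictionary and the `lookup()` function are hypothetical stand-ins for `Fs()`; they only mimic the shape of the problem, not Text-Fabric's internals, so the timings will not match the ones above:

```python
import time
from typing import Dict

# Hypothetical stand-in data: 24 "features", each a dict of node -> value.
N_BLOCKS = 24
N_NODES = 10_000
BLOCKS = range(1, N_BLOCKS + 1)
registry: Dict[str, Dict[int, str]] = {
    f"md{b}_lem": {node: f"lemma{b}" for node in range(0, N_NODES, b)} for b in BLOCKS
}


def lookup(name: str) -> Dict[int, str]:
    """Hypothetical name-based lookup, standing in for a call like Fs(name)."""
    return registry[name]


def count_uncached() -> int:
    hits = 0
    for node in range(N_NODES):
        for b in BLOCKS:
            if lookup(f"md{b}_lem").get(node):   # name built and resolved for every (node, block)
                hits += 1
    return hits


def count_cached() -> int:
    feats = {b: lookup(f"md{b}_lem") for b in BLOCKS}   # resolved once per block
    hits = 0
    for node in range(N_NODES):
        for b in BLOCKS:
            if feats[b].get(node):                   # plain dict access inside the loop
                hits += 1
    return hits


for fn in (count_uncached, count_cached):
    t0 = time.perf_counter()
    hits = fn()
    print(f"{fn.__name__}: {hits:,} hits in {time.perf_counter() - t0:.3f} s")
```

Both functions return the same count; only the number of name resolutions differs (24 × 10,000 versus 24).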


Because `lem_feat[b]` holds the Text-Fabric feature object itself rather than a single value, all of its methods are directly at hand, which opens the door to interesting new analytic uses.

For example, to get all (wordNode, lem_full_bc) pairs for which the 24th Morpheus analytic block contains a lemma:

In [20]:
lem_feat[24].items()

dict_items([(11994, 'e)laia/w'), (14204, 'e)laia/w'), (16185, 'e)laia/w'), (25434, 'e)laia/w'), (26821, 'e)laia/w'), (27825, 'e)laia/w'), (42922, 'dia/,peri/-i(/hmi'), (44695, 'e)laia/w'), (44807, 'e)laia/w'), (46274, 'e)laia/w'), (46880, 'e)laia/w'), (68218, 'a)/llacis'), (78139, 'dia/,a)po/-e)ra/w2'), (82086, 'paraine/w'), (104357, 'a)po/,kata/-la/zomai'), (109819, 'e)n-a)ntia/zw'), (121654, 'katalala/zw')])

Or get its frequency list:

In [21]:
lem_feat[24].freqList()

(('e)laia/w', 10),
 ('a)/llacis', 1),
 ('a)po/,kata/-la/zomai', 1),
 ('dia/,a)po/-e)ra/w2', 1),
 ('dia/,peri/-i(/hmi', 1),
 ('e)n-a)ntia/zw', 1),
 ('katalala/zw', 1),
 ('paraine/w', 1))

Or get all word nodes whose 24th Morpheus analytic block has `lem_full_bc` = 'e)laia/w':

In [22]:
lem_feat[24].s('e)laia/w')

(11994, 14204, 16185, 25434, 26821, 27825, 44695, 44807, 46274, 46880)
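To make explicit what `freqList()` and `s()` return here, a plain-Python sketch that derives the same results from the (node, value) pairs shown by `lem_feat[24].items()`. It only reproduces the outputs above; it is not meant to describe how Text-Fabric implements these methods:

```python
from collections import Counter, defaultdict

# (wordNode, lem_full_bc) pairs copied from the lem_feat[24].items() output above.
pairs = [(11994, 'e)laia/w'), (14204, 'e)laia/w'), (16185, 'e)laia/w'), (25434, 'e)laia/w'),
         (26821, 'e)laia/w'), (27825, 'e)laia/w'), (42922, 'dia/,peri/-i(/hmi'),
         (44695, 'e)laia/w'), (44807, 'e)laia/w'), (46274, 'e)laia/w'), (46880, 'e)laia/w'),
         (68218, 'a)/llacis'), (78139, 'dia/,a)po/-e)ra/w2'), (82086, 'paraine/w'),
         (104357, 'a)po/,kata/-la/zomai'), (109819, 'e)n-a)ntia/zw'), (121654, 'katalala/zw')]

# value -> frequency (comparable to freqList())
freq = Counter(value for _, value in pairs)

# value -> sorted tuple of nodes (comparable to .s(value))
collected = defaultdict(list)
for node, value in pairs:
    collected[value].append(node)
nodes_for = {value: tuple(sorted(nodes)) for value, nodes in collected.items()}

print(freq.most_common(1))       # [('e)laia/w', 10)]
print(nodes_for['e)laia/w'])
```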

Checking which feature is actually stored in `lem_feat[24]` is also easy:

In [23]:
lem_feat[24].meta['description']

'Morpheus analytic block 24 :lem line - raw lemma (incl. homonym or pl-suffix) in betacode'

# 7 - Attribution and footnotes <a class="anchor" id="bullet7"></a>
##### [Back to ToC](#TOC)

The [N1904-TF dataset](https://centerblc.github.io/N1904/) is available under the [MIT licence](https://github.com/CenterBLC/N1904/blob/main/LICENSE.md), Copyright (c) 2025 Center of Biblical Languages and Computing (CBLC). Formal reference: Tony Jurg, Saulo de Oliveira Cantanhêde, & Oliver Glanz. (2024). *CenterBLC/N1904: Nestle 1904 Text-Fabric data*. Zenodo. DOI: [10.5281/zenodo.13117911](https://doi.org/10.5281/zenodo.13117910).

The N1904addons Text-Fabric dataset is published at [tonyjurg.github.io/N1904addons](https://tonyjurg.github.io/N1904addons/) and made available under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/N1904addons/blob/main/LICENSE.md) license.

This Jupyter notebook is released under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/N1904addons/blob/main/LICENSE.md).

# 8 - Notebook version<a class="anchor" id="bullet8"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.0</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>5 June 2025</td>
    </tr>
  </table>
</div>