# Use Case 3: Searching for Hidden Lemmas (Zeus)

## Table of contents (ToC)<a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load TF with the N1904addons</a>
* <a href="#bullet3">3 - Analysis</a>
    * <a href="#bullet3x1">3.1 - Identify wordforms potentially derived from multiple lemmas</a>
    * <a href="#bullet3x2">3.2 - Examine words with multiple potential lemmas assigned</a>
    * <a href="#bullet3x3">3.3 - Now look for 'hidden' instances of Zeus</a>
    * <a href="#bullet3x4">3.4 - Examine the acknowledged occurrences of Zeus in the GNT</a>
    * <a href="#bullet3x5">3.5 - Take a closer look at the other Διὰ instances</a>
* <a href="#bullet4">4 - Required libraries</a>
* <a href="#bullet5">5 - Attribution and footnotes</a>
* <a href="#bullet6">6 - Notebook version</a>


#  1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This notebook first demonstrates how to quickly identify wordforms in the N1904-TF dataset that allow for multiple lemmatizations.  It then examines a case that is a misidentification (διά as Zeus).

The trigger for this example is a blog post from the research group <a href="https://sdam-au.github.io/sdam-au/digital-pondering/2020/01/31/greek-texts.html" target="_blank">Social Dynamics in the Ancient Mediterranean (SDAM)</a> at Aarhus University. They reported that applying CLTK's <a href="https://docs.cltk.org/en/latest/cltk.lemmatize.html" target="_blank">`lemmatizer.lemmatize()`</a> to Aristotle's works yielded over 5,000 instances of Zeus, a sharp contrast to the <a href="https://stephanus.tlg.uci.edu/" target="_blank">Thesaurus Linguae Graecae® (TLG)</a>, which lists only 48. They further concluded that this discrepancy is largely due to the lemmatizer mistakenly identifying the common preposition διά as a form of Zeus. <a href="#footnote1"><sup>1</sup></a>

# 2 - Load TF with the N1904addons <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

The following code blocks load the N1904-TF dataset together with the standard N1904addons dataset.

In [1]:
# Load the autoreload extension to automatically reload modules before executing code
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
from tf.fabric import Fabric
from tf.app import use

In [3]:
# Load the N1904-TF app and data with the additional features
A = use("CenterBLC/N1904", mod=["tonyjurg/N1904addons/tf/", "tonyjurg/N1904addons/detailed_set"], silence="terse", hoist=globals())

**Locating corpus resources ...**

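| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |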


Display is setup for viewtype [syntax-view](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/viewtypes.md#start) for more information on viewtypes

In [4]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
A.dh(A.getCss())

# 3 - Analysis <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

## 3.1 - Identify wordforms potentially derived from multiple lemmas <a class="anchor" id="bullet3x1"></a> 

The first step is to identify word nodes whose wordform can be derived from multiple lemmas. Obviously, in reality a *specific* word instance is derived from only one lemma. 

To identify the cases where a wordform can be derived from multiple lemmas, one can use the feature <a href="https://tonyjurg.github.io/N1904addons/features/mm_num_lemmas.html" target="_blank">`mm_num_lemmas`</a>. Since this feature has datatype `int`, we can use numeric operators on its value. 

In [5]:
# Note: despite its name, verseNodes holds the matching word nodes (one result tuple per word)
verseNodes=A.search("word mm_num_lemmas>1")

  0.12s 57331 results
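
To get a feel for how such results are distributed, one could tally the word nodes by their number of candidate lemmas. The sketch below is an illustration and not part of the original analysis: `tally_lemma_counts` is a hypothetical helper, and the small `stand_in` dictionary (invented values) replaces the real feature so that the block runs without the dataset. In this notebook one would presumably pass `F.otype.s('word')` and `F.mm_num_lemmas.v` instead.

```python
from collections import Counter

def tally_lemma_counts(nodes, lookup):
    """Count how many nodes carry each value of a numeric feature.

    `lookup` is any callable mapping a node to an int or None;
    None is counted as 0 (no value available for that node).
    """
    return Counter(lookup(node) or 0 for node in nodes)

# Stand-in data (invented for illustration): node -> number of candidate lemmas
stand_in = {1: 1, 2: 3, 3: 2, 4: 1, 5: 2, 6: None}

tally = tally_lemma_counts(stand_in, stand_in.get)
ambiguous = sum(count for value, count in tally.items() if value > 1)

for value in sorted(tally):
    print(f"{value} lemma(s): {tally[value]} node(s)")
print(f"{ambiguous} node(s) with more than one candidate lemma")
```

Passing the lookup as a callable keeps the helper independent of Text-Fabric, so the same function works on the stand-in dictionary and on a real feature object.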


## 3.2 - Examine words with multiple potential lemmas assigned<a class="anchor" id="bullet3x2"></a> 

First, build a list that combines “static” (fixed) and “dynamic” (generated) feature names, using list comprehensions and string formatting (f-strings). This list contains all the features that we want to show in the syntax tree.



In [6]:
number_of_ms_sets=11 # Python range(start, stop) is start-inclusive, stop-exclusive.
morphFeaturesList = (
    ['lemma','morph','betacode']
    + [f'ms{i}_lem_full_uc' for i in range(1, number_of_ms_sets)]
    + [f'ms{i}_morph'       for i in range(1, number_of_ms_sets)]
)

The effect of this list comprehension can be easily checked: 

In [7]:
morphFeaturesList

['lemma',
 'morph',
 'betacode',
 'ms1_lem_full_uc',
 'ms2_lem_full_uc',
 'ms3_lem_full_uc',
 'ms4_lem_full_uc',
 'ms5_lem_full_uc',
 'ms6_lem_full_uc',
 'ms7_lem_full_uc',
 'ms8_lem_full_uc',
 'ms9_lem_full_uc',
 'ms10_lem_full_uc',
 'ms1_morph',
 'ms2_morph',
 'ms3_morph',
 'ms4_morph',
 'ms5_morph',
 'ms6_morph',
 'ms7_morph',
 'ms8_morph',
 'ms9_morph',
 'ms10_morph']

Now we can use this list as argument to `extraFeatures`.

In [8]:
A.show(verseNodes,start=1,end=1, extraFeatures=morphFeaturesList,queryFeatures=False)

In this display, all the yellow blocks are word nodes that have more than one lemma associated with them.

## 3.3 - Now look for 'hidden' instances of Zeus <a class="anchor" id="bullet3x3"></a> 

Since the SDAM blog post reported that 'dia' resulted in many hits for Zeus, the same can be tried on the New Testament to see if there are cases of 'hidden' Zeus references.

A very simple way to retrieve all lemmas for διά in the current N1904-TF dataset is to use the <a href="https://centerblc.github.io/N1904/features/lemmatranslit.html" target="_blank">`lemmatranslit`</a> feature. However, this approach is simplistic, as it overlooks the relevance of accentual distinctions. In this case, it is used deliberately to illustrate how a flawed argument can arise.

In [9]:
zeusNodes=A.search("word lemmatranslit=dia")

  0.07s 667 results


One might be tempted to stop here and speculate about the number’s proximity to 666. However, a closer look at the first reported instances offers no support for any ‘hidden’ connection to Zeus.

In [10]:
A.show(zeusNodes,start=1,end=1, extraFeatures=morphFeaturesList,queryFeatures=False)
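
The point about accentual distinctions can be made concrete with plain Python. The following is a minimal sketch, not taken from the dataset or its tooling: `accent_free_key` is a hypothetical helper that lowercases a Greek word and drops all combining diacritics, which is roughly what an accent-blind comparison does. Under that assumption the preposition and the accusative of Ζεύς become indistinguishable.

```python
import unicodedata

def accent_free_key(word):
    """Lowercase a Greek word and drop all combining diacritics."""
    decomposed = unicodedata.normalize("NFD", word)
    base_letters = (ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(base_letters).lower()

# Preposition, sentence-initial preposition, accusative of Ζεύς (Acts 14:12)
forms = ["διά", "Διὰ", "Δία"]
keys = {form: accent_free_key(form) for form in forms}

for form, key in keys.items():
    print(f"{form} -> {key}")
print("distinct keys:", len(set(keys.values())))
```

All three forms collapse to the same key, which shows how a lookup that ignores case and accents can pair the preposition with the proper noun.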

## 3.4 - Examine the acknowledged occurrences of Zeus in the GNT <a class="anchor" id="bullet3x4"></a> 

For this we first examine exactly what Morpheus reports for Acts 14:12 (Δία) and Acts 14:13 (Διὸς), as these are the two acknowledged instances of the lemma ζεύς in the N1904-TF dataset. Since we know the verses, the query is very simple.

In [11]:
actsNodes=A.search('verse book=Acts chapter=14 verse=12|13')

  0.01s 2 results


Now display the results using the previously built list `morphFeaturesList`.

In [12]:
A.show(actsNodes,start=1,end=1, extraFeatures=morphFeaturesList,queryFeatures=False)

The next step is a block-wise search for all instances where Morpheus identifies Ζεύς as a potential lemma for a given wordform. The code block below compiles a list of such cases, each accompanied by a rather mechanical translation of the corresponding verse. Within each translation, the portion corresponding to the alternative Ζεύς is marked by curly braces.

In [13]:
BLOCK_RANGE  = range(1, 25)
# One feature object per Morpheus analysis block (md1 ... md24)
lem_feat     = {b: Fs(f"md{b}_lem_base_uc")  for b in BLOCK_RANGE}

numOccurences=0
for wordNode in F.otype.s('word'):
    for block in BLOCK_RANGE:
        if lem_feat[block].v(wordNode) == "Ζεύς":
            numOccurences+=1
            verseNode = L.u(wordNode, 'verse')[0]
            transWords = []
            for w in L.d(verseNode, 'word'):
                if w == wordNode:
                    # Mark the word for which Ζεύς is offered as an alternative lemma
                    t = '{' + (F.trans.v(w) or '') + ' / Zeus?}'
                else:
                    t = F.trans.v(w) or ''
                transWords.append(t)
            trans = ' '.join(transWords)
            print(T.sectionFromNode(wordNode), f'{F.text.v(wordNode)} in {block=} of {F.mm_num_blocks.v(wordNode)} : {trans}')
print(f'Found {numOccurences} instances of Zeus')

('Matthew', 6, 25) Διὰ in block=8 of 13 : {Because of / Zeus?} this I say to you not be anxious about the life of you what you should eat or what you should drink nor the body of you what you should put on Not the life more is than the food and the body than clothing
('Matthew', 9, 11) Διὰ in block=8 of 13 : And having seen [it] the Pharisees said to disciples of Him {Because of / Zeus?} why with the tax collectors and sinners eats the Teacher of you
('Matthew', 9, 14) Διὰ in block=8 of 13 : Then come to Him the disciples of John saying {Because of / Zeus?} why we and the Pharisees do fast the however disciples of You not fast
('Matthew', 12, 31) Διὰ in block=8 of 13 : {Because of / Zeus?} this I say to you every sin and blasphemy will be forgiven - men - however against [the] Spirit blasphemy not will be forgiven
('Matthew', 13, 10) Διὰ in block=8 of 13 : And having come to [Him] the disciples said to Him {Because of / Zeus?} why in parables speak You to them
('

This list of verses makes it clear that the standard interpretation is coherent, whereas substituting the marked parts with a reference to Zeus would render the sentences unintelligible. Obviously, the two verses in Acts 14 are the exception, but they were already rendered as Zeus.
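
If the hits are needed for further processing rather than only for printing, the block-wise scan can be written as a function that returns them. The sketch below is one possible arrangement under stated assumptions: `find_lemma_hits` is a hypothetical helper, and the `stand_in_blocks` and `surface` tables contain invented values so that the block runs without the dataset. In this notebook the lookups would presumably be built as `{b: Fs(f"md{b}_lem_base_uc").v for b in range(1, 25)}`.

```python
from collections import Counter

def find_lemma_hits(nodes, block_lookups, lemma):
    """Return (node, block) pairs where the given block proposes `lemma`.

    `block_lookups` maps a block number to a callable node -> lemma or None.
    """
    return [
        (node, block)
        for node in nodes
        for block, lookup in block_lookups.items()
        if lookup(node) == lemma
    ]

# Stand-in analyses (invented): block number -> {node: proposed base lemma}
stand_in_blocks = {
    1: {101: "διά", 102: "λέγω", 103: "διά"},
    2: {101: "Ζεύς", 103: "Ζεύς"},
}
lookups = {block: table.get for block, table in stand_in_blocks.items()}

# Stand-in surface forms (invented): node -> text of the word
surface = {101: "Διὰ", 102: "λέγω", 103: "Διὰ"}

hits = find_lemma_hits([101, 102, 103], lookups, "Ζεύς")
per_form = Counter(surface[node] for node, _ in hits)

print(hits)
print(per_form)
```

Grouping the hits by surface form makes it easy to see at a glance which wordforms trigger the alternative lemma.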

## 3.5 - Take a closer look at the other Διὰ instances <a class="anchor" id="bullet3x5"></a> 

Now let us examine the first case:

> ('Matthew', 6, 25) Διὰ in block=8 of 13 :  {Because of / Zeus?} this I say to you not be anxious about the life of you what you should eat or what you should drink nor the body of you what you should put on Not the life more is than the food and the body than clothing

So let us retrieve that verse:

In [14]:
zeusNodes=A.search('verse book=Matthew chapter=6 verse=25')

  0.00s 1 result


And show its syntax tree:

In [16]:
A.show(zeusNodes,start=1,end=1, extraFeatures=morphFeaturesList,queryFeatures=False)

Notably, all the “hidden Zeus” instances occur at the beginning of a sentence, where Διὰ is capitalised. This likely prompted the Morpheus analyser to identify it as a potential reference to the deity. Additionally, the accent patterns differ between the preposition and the proper noun, marking a meaningful distinction that further disambiguates their interpretation. Hence the conclusion is that there are no 'hidden' references to Zeus in the Greek New Testament.
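
Both observations, capitalisation and accent placement, can be checked mechanically on the wordforms themselves. The following is a hedged sketch in plain Python: `accented_letter_index` is a hypothetical helper, and it assumes that only the combining grave, acute and circumflex (perispomeni) marks count as accents.

```python
import unicodedata

# Combining grave, acute and Greek circumflex (perispomeni)
ACCENT_MARKS = {"\u0300", "\u0301", "\u0342"}

def accented_letter_index(word):
    """Return the 0-based index of the letter carrying the accent, or None."""
    index = -1
    for ch in unicodedata.normalize("NFD", word):
        if ch in ACCENT_MARKS:
            return index          # the mark follows the letter it belongs to
        if not unicodedata.combining(ch):
            index += 1            # count base letters only
    return None

# Preposition, sentence-initial preposition, accusative of Ζεύς (Acts 14:12)
for form in ["διά", "Διὰ", "Δία"]:
    print(form, "capitalised:", form[0].isupper(),
          "accent on letter:", accented_letter_index(form))
```

The preposition carries its accent on the final alpha in both spellings, while the accusative of Ζεύς carries it on the iota, so an accent-aware comparison separates the two even when both are capitalised.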

# 4 - Required libraries<a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

The scripts in this notebook use Text-Fabric, which [currently (Apr 2025) requires Python >=3.9.0](https://pypi.org/project/text-fabric). Beyond Text-Fabric itself, the following libraries must be installed in the environment:

    none
    
You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
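
Before installing anything, it can help to check what the current environment already provides. The sketch below uses only the standard library; `check_environment` is a hypothetical helper, and it assumes the package is registered under its PyPI name `text-fabric`.

```python
import sys
from importlib import metadata

def check_environment(min_python=(3, 9)):
    """Report whether Python is recent enough and which Text-Fabric version is installed."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        tf_version = metadata.version("text-fabric")
    except metadata.PackageNotFoundError:
        tf_version = None  # package not installed in this environment
    return {"python_ok": python_ok, "text_fabric": tf_version}

print(check_environment())
```

A value of `None` for `text_fabric` indicates that the package still needs to be installed.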

# 5 - Attribution and footnotes <a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

#### Attribution

The [N1904-TF dataset](https://centerblc.github.io/N1904/) is available under the [MIT licence](https://github.com/CenterBLC/N1904/blob/main/LICENSE.md). Formal reference: 
> Tony Jurg, Saulo de Oliveira Cantanhêde, & Oliver Glanz. (2024). *CenterBLC/N1904: Nestle 1904 Text-Fabric data*. Zenodo. DOI: [10.5281/zenodo.13117911](https://doi.org/10.5281/zenodo.13117910).

The N1904addons Text-Fabric dataset is published at [tonyjurg.github.io/N1904addons](https://tonyjurg.github.io/N1904addons/) and made available under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/N1904addons/blob/main/LICENSE.md) license.

This Jupyter notebook is released under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/N1904addons/blob/main/LICENSE.md).

#### Footnotes

<a class="anchor" id="footnote1"></a><sup>1</sup> The SDAM blog is found on [https://sdam-au.github.io/sdam-au/digital-pondering/2020/01/31/greek-texts.html](https://sdam-au.github.io/sdam-au/digital-pondering/2020/01/31/greek-texts.html).

# 6 - Notebook version<a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.1</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>July 24, 2025</td>
    </tr>
  </table>
</div>