# Hapax legomena (Nestle1904GBI)

## Table of contents <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
    * <a href="#bullet1x1">1.1 - Why is this relevant?</a>
    * <a href="#bullet1x2">1.2 - Translating into Text-Fabric queries</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Performing the queries</a>
    * <a href="#bullet3x1">3.1 - Find the hapax legomena (words)</a>
* <a href="#bullet4">4 - Attribution and footnotes</a>
* <a href="#bullet5">5 - Required libraries</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

A *hapax legomenon* (plural: *hapax legomena*) is a term used in linguistics and literary analysis to refer to a word or idiomatic expression that appears only once in a specific corpus. In the context of the Bible, *hapax legomena* are words that occur only once in the entire biblical text.

## 1.1 - Why is this relevant? <a class="anchor" id="bullet1x1"></a>

*Hapax legomena*, being unique words in the context of a corpus, can pose challenges for translators and scholars because their meanings may not be evident from their context, as there are no other occurrences to provide insights.

Although a list of *hapax legomena* can be computed rather easily, the usefulness of such a list is limited, because there are various types of border cases. Hence, the list provided below has limited value and should be critically examined.

* Technically not a *hapax legomenon*, but in practice regarded as one. For example, the Greek word ἐπιούσιος is found only twice (so it is actually a *dis legomenon*), both times in the Lord's Prayer (Matthew 6:11 and Luke 11:3), where its precise meaning is uncertain, leading to various interpretations such as "daily," "necessary," or "supernatural." <a href="#note1"><sup>1</sup></a>
* Technically a *hapax legomenon* for the New Testament, but also found in the Septuagint (LXX). For example, the meaning of a name (e.g., Δανιήλ) is usually not in doubt. The meaning of other words may be clear from context or can be established from parallel corpora like the LXX.
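
The distinction between a *hapax legomenon* and a *dis legomenon* comes down to a simple frequency threshold. The following is a minimal sketch in plain Python, using a made-up token list rather than the actual corpus; the helper name `legomena` and the toy data are assumptions for illustration only.

```python
from collections import Counter

def legomena(tokens, n):
    """Return the sorted list of items that occur exactly n times in tokens."""
    counts = Counter(tokens)
    return sorted(item for item, freq in counts.items() if freq == n)

# Toy token list (illustrative only, not taken from the corpus)
toy_tokens = ["ἄρτος", "ἐπιούσιος", "ἄρτος", "ἐπιούσιος", "Δανιήλ", "ἄρτος"]

hapax = legomena(toy_tokens, 1)  # items that occur exactly once
dis = legomena(toy_tokens, 2)    # items that occur exactly twice
print("hapax legomena:", hapax)
print("dis legomena:  ", dis)
```

Whether an item found by such a count is also hard to interpret is a separate question that the count itself cannot answer.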

## 1.2 - Translating into Text-Fabric queries <a class="anchor" id="bullet1x2"></a>

For this investigation no standard query will be used. Instead, the built-in Text-Fabric function for 'feature frequency' will be used.
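
To make the idea of a 'feature frequency' list concrete, here is a plain-Python stand-in that produces (value, frequency) pairs with the most frequent values first. It is only a sketch of the kind of result such a function returns, not Text-Fabric's own implementation; the name `freq_list` and the toy lemma list are assumptions.

```python
from collections import Counter

def freq_list(values):
    """Return (value, frequency) pairs, most frequent first, ties ordered by value."""
    counts = Counter(values)
    return tuple(sorted(counts.items(), key=lambda pair: (-pair[1], pair[0])))

# Toy feature values (illustrative only, not read from the corpus)
toy_lemmas = ["λόγος", "θεός", "λόγος", "Δανιήλ", "θεός", "λόγος"]

for value, freq in freq_list(toy_lemmas):
    print(value, freq)
```

With such a list in hand, the *hapax legomena* are simply the entries whose frequency equals 1.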

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [3]:
# load the N1904 app and data
N1904 = use("tonyjurg/Nestle1904GBI", version="0.4", hoist=globals())

**Locating corpus resources ...**

The requested app is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904GBI/app not found
rate limit is 5000 requests per hour, with 5000 left for this hour
	connecting to online GitHub repo tonyjurg/Nestle1904GBI ... connected
	app/README.md...downloaded
	app/config.yaml...downloaded
	app/static...directory
		app/static/display.css...downloaded
	OK


The requested data is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4 not found
rate limit is 5000 requests per hour, with 4984 left for this hour
	connecting to online GitHub repo tonyjurg/Nestle1904GBI ... connected
	tf/0.4/after.tf...downloaded
	tf/0.4/book.tf...downloaded
	tf/0.4/booknum.tf...downloaded
	tf/0.4/bookshort.tf...downloaded
	tf/0.4/case.tf...downloaded
	tf/0.4/chapter.tf...downloaded
	tf/0.4/clause.tf...downloaded
	tf/0.4/clauserule.tf...downloaded
	tf/0.4/clausetype.tf...downloaded
	tf/0.4/degree.tf...downloaded
	tf/0.4/formaltag.tf...downloaded
	tf/0.4/functionaltag.tf...downloaded
	tf/0.4/gloss.tf...downloaded
	tf/0.4/gn.tf...downloaded
	tf/0.4/lemma.tf...downloaded
	tf/0.4/lex_dom.tf...downloaded
	tf/0.4/ln.tf...downloaded
	tf/0.4/monad.tf...downloaded
	tf/0.4/mood.tf...downloaded
	tf/0.4/nodeID.tf...downloaded
	tf/0.4/normalized.tf...downloaded
	tf/0.4/nu.tf...downloaded
	tf/0.4/number.tf...downloaded
	tf/0.4/oslots.tf...downloaded
	

   |     0.17s T otype                from ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4
   |     1.79s T oslots               from ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4
   |     0.49s T after                from ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4
   |     0.50s T chapter              from ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4
   |     0.58s T book                 from ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4
   |     0.51s T verse                from ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4
   |     0.60s T word                 from ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4
   |      |     0.05s C __levels__           from otype, oslots, otext
   |      |     1.68s C __order__            from otype, oslots, __levels__
   |      |     0.07s C __rank__             from otype, __order__
   |      |     2.16s C __levUp__            from otype, oslots, __rank__
   |      |     1.4

| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| sentence | 5720 | 24.09 | 100 |
| verse | 7943 | 17.35 | 100 |
| clause | 16124 | 8.54 | 100 |
| phrase | 72674 | 1.9 | 100 |
| word | 137779 | 1.0 | 100 |


In [4]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())

# 3 - Performing the queries <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

## 3.1 - Find the hapax legomena (words)<a class="anchor" id="bullet3x1"></a>
##### [Back to TOC](#TOC)

The underlying principle of the script below is rather straightforward. The primary challenge, however, lies in deciding which feature to use for identifying hapax legomena. The two most obvious options are:

 * normalized: This is basically a 'cleaned up' version of the surface text. With this feature, inflected forms of verbs and declined forms of nouns are still counted as separate words. The normalization is required to account for variations in accentuation.

 * lemma: Here the base (dictionary) form of a word, known as its lemma, serves as the basis for the frequency calculation. When based on the feature "lemma", a few of the reported instances refer to a specific sense associated with that lemma. For example, lemma="βάτος (II)" is only found once (in Luke 16:6), while lemma="βάτος (I)" is found five times in the NT.
 
Note that the latter approach represents the customary interpretation for ascertaining hapax legomena.
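
The effect of the feature choice can be sketched on a handful of made-up (normalized, lemma) pairs. The data, the helper names `hapaxes` and `strip_sense`, and the assumption that sense markers always take the form of a trailing Roman numeral in parentheses are illustrative only. Merging the senses of a lemma is shown merely as one possible choice, not as the approach taken in this notebook.

```python
import re
from collections import Counter

# Toy (normalized, lemma) pairs; illustrative values, not read from the corpus
toy_words = [
    ("λόγος", "λόγος"),
    ("λόγου", "λόγος"),
    ("λόγον", "λόγος"),
    ("Δανιήλ", "Δανιήλ"),
    ("βάτου", "βάτος (I)"),
    ("βάτου", "βάτος (I)"),
    ("βάτους", "βάτος (II)"),
]

def hapaxes(values):
    """Return the sorted values that occur exactly once."""
    counts = Counter(values)
    return sorted(value for value, freq in counts.items() if freq == 1)

def strip_sense(lemma):
    """Drop a trailing sense marker such as ' (I)' or ' (II)' (assumed format)."""
    return re.sub(r"\s*\([IVX]+\)$", "", lemma)

by_form = hapaxes(form for form, _ in toy_words)
by_lemma = hapaxes(lemma for _, lemma in toy_words)
by_merged_lemma = hapaxes(strip_sense(lemma) for _, lemma in toy_words)

print("by normalized form:", by_form)
print("by lemma:          ", by_lemma)
print("by merged lemma:   ", by_merged_lemma)
```

In this toy data, counting by surface form reports every inflected form that happens to occur once, counting by lemma reports far fewer items, and merging the numbered senses removes the sense-specific entry as well.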

In [12]:
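# Frequency list of the feature "lemma": (value, frequency) pairs
featureFrequencyList = Fs("lemma").freqList()
for item, freq in featureFrequencyList:
    # a lemma that occurs exactly once is a hapax legomenon
    if freq == 1:
        print(item)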

Αἰνών
Αὐγοῦστος
Βάαλ
Βαλάκ
Βαράκ
Βαραχίας
Βαριησοῦς
Βαριωνᾶ
Βαρτιμαῖος
Βελιάρ
Βεροιαῖος
Βηθζαθά
Βλάστος
Βοανηργές
Βοσόρ
Γάδ
Γάζα
Γαββαθά
Γαδαρηνός
Γαλάτης
Γεδεών
Γώγ
Δάμαρις
Δαλμανουθά
Δαλματία
Δαμασκηνός
Δανιήλ
Δερβαῖος
Διονύσιος
Διοτρέφης
Διόσκουροι
Δρούσιλλα
Εὐνίκη
Εὐοδία
Εὔβουλος
Εὔτυχος
Ζάρα
Ζηνᾶς
Θάρα
Θαμάρ
Θευδᾶς
Κάρπος
Κίς
Κανδάκη
Καῦδα
Κεδρών
Κλήμης
Κλαυδία
Κλεοπᾶς
Κλωπᾶς
Κνίδος
Κολοσσαί
Κούαρτος
Κρήσκης
Κυρήνη
Κυρήνιος
Κωσάμ
Κόρε
Κώς
Λάμεχ
Λίνος
Λαοδικεύς
Λασαία
Λευιτικός
Λιβερτῖνος
Λιβύη
Λυκία
Λυκαονία
Λυκαονιστί
Λυσανίας
Λωΐς
Μάαθ
Μάλχος
Μαγαδάν
Μαγώγ
Μαδιάμ
Μαθουσαλά
Μαλελεήλ
Μαναήν
Ματταθά
Μελίτη
Μελεά
Μεννά
Μιτυλήνη
Μνάσων
Μόλοχ
Μύρα
Μῆδος
Νάρκισσος
Νίγερ
Ναΐν
Ναγγαί
Ναθάμ
Ναιμάν
Ναούμ
Ναχώρ
Νηρί
Νηρεύς
Νικάνωρ
Νικόλαος
Νικόπολις
Νύμφα
Οὐρίας
Οὐρβανός
Πάρθος
Πάταρα
Πάτμος
Παρμενᾶς
Πατροβᾶς
Περσίς
Πισίδιος
Πισιδία
Ποντικός
Ποτίολοι
Πούδης
Πρόχορος
Πτολεμαΐς
Πόρκιος
Πύρρος
Σάμος
Σάπφιρα
Σάρεπτα
Σέργιος
Σήθ
Σήμ
Σαλαμίς
Σαλείμ
Σαλμώνη
Σαμοθρᾴκη
Σαμψών
Σαρών
Σεκοῦνδος
Σελεύκε

# 4 - Attribution and footnotes<a class="anchor" id="bullet4"></a>
##### [Back to TOC](#TOC)

#### Footnotes:
<a class="anchor" id="note1"></a><sup>1</sup> See the extensive discussion on ἐπιούσιος in: Brant Pitre, *Jesus and the Jewish Roots of the Eucharist: Unlocking the Secrets of the Last Supper* (New York: Doubleday, 2011), 93-96; *passim*.

# 5 - Required libraries <a class="anchor" id="bullet5"></a>
##### [Back to TOC](#TOC)

The scripts in this notebook require (besides `text-fabric`) the following Python libraries to be installed in the environment:

    {none}

You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.