# Vocalisation of the Tetragrammaton

## Table of contents <a class="anchor" id="TOC"></a>

* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Performing the queries</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

In the Old Testament, the Tetragrammaton יהוה is written with different vowel pointings, for example with the vowels of אֲדֹנַי (Adonai, ETCBC transliteration: >:ADON@J).

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [None]:
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment.
from tf.fabric import Fabric
from tf.app import use

In [4]:
# load the BHS app and data
BHS = use("etcbc/BHSA", hoist=globals())

**Locating corpus resources ...**

| Name | # of nodes | # slots/node | % coverage |
|---|---|---|---|
| book | 39 | 10938.21 | 100 |
| chapter | 929 | 459.19 | 100 |
| lex | 9230 | 46.22 | 100 |
| verse | 23213 | 18.38 | 100 |
| half_verse | 45179 | 9.44 | 100 |
| sentence | 63717 | 6.7 | 100 |
| sentence_atom | 64514 | 6.61 | 100 |
| clause | 88131 | 4.84 | 100 |
| clause_atom | 90704 | 4.7 | 100 |
| phrase | 253203 | 1.68 | 100 |


Note: The feature documentation can be found at [ETCBC GitHub](https://github.com/ETCBC/bhsa/blob/master/docs/features/0_home.md).

In [65]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
BHS.dh(BHS.getCss())

# 3 - Performing the queries <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

## 3.1 - Get overview of all pointed versions <a class="anchor" id="bullet3x1"></a>
##### [Back to TOC](#TOC)

First, get all occurrences of the Tetragrammaton יהוה, selected on its consonants only (that is, ignoring vowel pointing and other diacritical marks). See also the notes on [feature g_word](https://github.com/ETCBC/bhsa/blob/master/docs/features/g_word.md).

In [118]:
JHWHQuery = '''
book
  chapter
     verse
       word g_cons=JHWH 
'''

JHWHResults = BHS.search(JHWHQuery)

  0.46s 6828 results


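Now post-process the results: strip the cantillation marks from each pointed form and count how often each remaining vocalisation occurs.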

In [117]:
# Library to format table
from tabulate import tabulate
# library for regular expressions
import re

ResultDict = {}
for Item in JHWHResults:
    Node = Item[3]
    # Get the pointed representation of a word occurrence in BHSA transliteration.
    PointedWord = F.g_word.v(Node)
    HebrewWord = F.g_word_utf8.v(Node)
        
    # Remove cantillation marks (represented by digits in the BHSA transliteration)
    VocalizedWord = re.sub(r'\d', '', PointedWord)
    
    if VocalizedWord in ResultDict:
        # If it exists, add the count to the existing value
        ResultDict[VocalizedWord][0] += 1 # Increase frequency count
    else:
        # If it doesn't exist, initialize the count and store the first occurrence
        FirstOccurrence = T.sectionFromNode(Node)
        ResultDict[VocalizedWord] = [1, FirstOccurrence, HebrewWord]
        
# Convert the dictionary into a list of key-value pairs and sort it according to frequency
UnsortedTableData = [[key, value[0], value[1],value[2]] for key, value in ResultDict.items()]
TableData = sorted(UnsortedTableData, key=lambda row: row[1], reverse=True)

# Produce the table
headers = ["pointing", "frequency", "first occurrence", "first Hebrew word"]
print(tabulate(TableData, headers=headers, tablefmt='fancy_grid'))


╒════════════╤═════════════╤════════════════════════╤═════════════════════╕
│ pointing   │   frequency │ first occurrence       │ first Hebrew word   │
╞════════════╪═════════════╪════════════════════════╪═════════════════════╡
│ J:HW@H     │        5682 │ ('Genesis', 2, 4)      │ יְהוָ֥ה                │
├────────────┼─────────────┼────────────────────────┼─────────────────────┤
│ JHW@H      │         788 │ ('Genesis', 4, 3)      │ יהוָֽה                │
├────────────┼─────────────┼────────────────────────┼─────────────────────┤
│ J:HWIH     │         270 │ ('Deuteronomy', 3, 24) │ יְהוִ֗ה                │
├────────────┼─────────────┼────────────────────────┼─────────────────────┤
│ J:HOW@H    │          45 │ ('Genesis', 3, 14)     │ יְהֹוָ֨ה                │
├────────────┼─────────────┼────────────────────────┼─────────────────────┤
│ J:HOWIH    │          32 │ ('1_Kings', 2, 26)     │ יְהֹוִה֙                │
├────────────┼─────────────┼────────────────────────┼───────────────────
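
The post-processing above comes down to two steps: delete the digits that encode cantillation marks, then count the remaining vocalised forms. The following minimal sketch shows those steps without Text-Fabric. The sample strings are illustrative stand-ins for `g_word` values (only `J:HOW@63H` appears in this notebook; the digit code in `J:HW@75H` is made up), and `strip_cantillation` is a hypothetical helper name.

```python
import re
from collections import Counter


def strip_cantillation(g_word):
    """Delete the digits that encode cantillation marks in the transliteration."""
    return re.sub(r"\d", "", g_word)


# Illustrative stand-ins for F.g_word.v(node) values (assumed, not read from the corpus)
samples = ["J:HOW@63H", "J:HW@75H", "J:HW@H", "JHW@H"]

# Count each vocalised form once the accents are gone
counts = Counter(strip_cantillation(word) for word in samples)

# most_common() orders the forms by descending frequency, like the table above
for pointing, frequency in counts.most_common():
    print(pointing, frequency)
```

A `Counter` replaces the manual "is the key already present" check, at the cost of not storing the first occurrence alongside the count.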

## Some other playing around

Add another condition to the query to select the vowels of adOnAi, transliterated as O and @, which should stand on either side of the waw. The regular expression includes '.*' to allow for cantillation marks in between.

In [166]:
AdonaiQuery = '''
word g_cons=JHWH g_word~O.*W.*@
'''

AdonaiResults = BHS.search(AdonaiQuery)

  0.29s 51 results
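
The behaviour of the pattern can be checked directly with Python's `re` module. This is only a sketch: it assumes that the `~` operator in a search template matches the pattern anywhere in the feature value (as `re.search` does), and `has_adonai_vowels` is a hypothetical helper name. The first sample is a `g_word` value shown elsewhere in this notebook; the other two are forms from the frequency table with the accents already removed.

```python
import re

# O (holem) before the waw and @ (qamets) after it; '.*' tolerates anything in between
adonai_pattern = re.compile(r"O.*W.*@")


def has_adonai_vowels(g_word):
    """Return True when the sequence O ... W ... @ occurs anywhere in the value."""
    return adonai_pattern.search(g_word) is not None


for form in ["J:HOW@63H", "J:HW@H", "J:HOWIH"]:
    print(form, has_adonai_vowels(form))
```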


In [167]:
BHS.table(AdonaiResults, condensed=False, extraFeatures={'voc_lex'})

| n | p | word |
|---|---|---|
| 1 | Genesis 3:14 | יְהֹוָ֨ה |
| 2 | Genesis 9:26 | יְהֹוָ֖ה |
| 3 | Genesis 18:17 | יהֹוָ֖ה |
| 4 | Exodus 3:2 | יְהֹוָ֥ה |
| 5 | Exodus 13:3 | יְהֹוָ֛ה |
| 6 | Exodus 13:9 | יְהֹוָ֖ה |
| 7 | Exodus 13:12 | יהֹוָ֑ה |
| 8 | Exodus 13:15 | יְהֹוָ֤ה |
| 9 | Exodus 14:1 | יְהֹוָ֖ה |
| 10 | Exodus 14:8 | יְהֹוָ֗ה |


In [165]:
AdonaiQuery2 = '''
word lex=JHWH/ g_word~O.*W.*@
'''

AdonaiResults2 = BHS.search(AdonaiQuery2)

  0.33s 51 results


Print every feature that holds a value for the first matching word node:

In [164]:
FeatureList = Fall()
for Item in AdonaiResults2:
    Node = Item[0]
    for Feature in FeatureList:
        FeatureValue = Fs(Feature).v(Node)
        # Only print features that have a value for this node
        if FeatureValue is not None:
            print(Feature, '=', FeatureValue)
    # Inspect only the first result
    break

freq_lex = 6828
g_cons = JHWH
g_cons_utf8 = יהוה
g_lex = J:HOW@H
g_lex_utf8 = יְהֹוָה
g_word = J:HOW@63H
g_word_utf8 = יְהֹוָ֨ה
gloss = YHWH
gn = m
language = Hebrew
lex = JHWH/
lex_utf8 = יהוה
ls = none
nametype = pers
nme = 
nu = sg
number = 1427
otype = word
pdp = nmpr
pfm = n/a
phono = [yᵊhôˌāh]
phono_trailer =  
prs = n/a
prs_gn = NA
prs_nu = NA
prs_ps = NA
ps = NA
rank_lex = 6
sp = nmpr
st = a
trailer =  
trailer_utf8 =  
uvf = absent
vbe = n/a
vbs = n/a
voc_lex = J:HW@H
voc_lex_utf8 = יְהוָה
vs = NA
vt = NA
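
The filter used above (skip every feature whose value is `None`) can be illustrated on a plain dictionary. This is a minimal sketch: `node_features` is a hand-made stand-in for the per-node feature lookup, its non-`None` values are copied from the output above, and the two `None` entries are assumptions added for illustration only.

```python
# Hand-made stand-in for the feature values of one word node.
# The None entries are assumed for illustration; the other values come from the output above.
node_features = {
    "g_cons": "JHWH",
    "g_word": "J:HOW@63H",
    "nme": "",
    "qere": None,
    "voc_lex": "J:HW@H",
    "some_absent_feature": None,
}

# Keep only the features that actually carry a value for this node
present = {name: value for name, value in node_features.items() if value is not None}

for name, value in present.items():
    print(name, "=", value)
```

An empty string such as the `nme` value still counts as data, which is why the check compares against `None` instead of testing truthiness.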


In [151]:
QereQuery = '''
word qere_utf8 g_cons=JHWH
'''

QereResults = BHS.search(QereQuery)

  0.28s 0 results


In [152]:
for Item in QereResults:
    Node = Item[0]
    PointedWord = F.g_word.v(Node)
    QereWord = F.qere.v(Node)
    # Remove cantillation marks (digits) from the qere as well
    UncantQereWord = re.sub(r'\d', '', QereWord)
    print(PointedWord, QereWord, UncantQereWord)
    break