# Some system statistics (Nestle1904LFT)

**Work in progress!**

## Table of contents <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Performing the queries</a>
    * <a href="#bullet3x1">3.1 - Print the Text-Fabric version</a>
    * <a href="#bullet3x2">3.2 - Dump selection of header</a>
    * <a href="#bullet3x3">3.3 - Memory footprint</a>    
    * <a href="#bullet3x4">3.4 - List loaded features</a> 
    * <a href="#bullet3x5">3.5 - Statistics on node types</a>
    * <a href="#bullet3x6">3.6 - Node number ranges</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

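This Jupyter Notebook shows several ways to obtain system statistics for the Nestle1904LFT Text-Fabric corpus: the Text-Fabric version, the dataset header, the memory footprint, the loaded features, and statistics and number ranges for the node types.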

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [1]:
%load_ext autoreload
%autoreload 2

In [1]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [2]:
# load the N1904 app and data
N1904 = use ("tonyjurg/Nestle1904LFT", version="0.6", hoist=globals())

**Locating corpus resources ...**

The requested app is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app not found


The requested data is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 not found


   |     0.21s T otype                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     2.28s T oslots               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.49s T after                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.61s T unicode              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.62s T normalized           from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.59s T wordunacc            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.60s T word                 from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.59s T wordtranslit         from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.46s T verse                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.54s T chapter              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6

Name | # of nodes | # slots / node | % coverage
--- | --- | --- | ---
book | 27 | 5102.93 | 100
chapter | 260 | 529.92 | 100
verse | 7943 | 17.35 | 100
sentence | 8011 | 17.2 | 100
wg | 105430 | 6.85 | 524
word | 137779 | 1.0 | 100


In [3]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())

In [4]:
# Set default view in a way to limit noise as much as possible.
N1904.displaySetup(condensed=True, multiFeatures=False, queryFeatures=False)

# 3 - Performing the queries <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

## 3.1 - Print the Text-Fabric version <a class="anchor" id="bullet3x1"></a>
##### [Back to TOC](#TOC)

Although this is somewhat trivial, this example does serve a purpose: the exact Text-Fabric version is worth recording, since behaviour and output formats may differ between versions. We print the version by reading the Text-Fabric parameter [VERSION](https://annotation.github.io/text-fabric/tf/parameters.html#tf.parameters.VERSION), which is fixed for the whole program. To access any of these parameters in our notebook, it first needs to be imported from `tf.parameters`.

In [5]:
from tf.parameters import VERSION
print ('TextFabric version: {}'.format(VERSION))

TextFabric version: 12.1.5


Any other parameter can be printed in a similar manner.
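
A minimal sketch for printing all parameters at once. It assumes, following Python convention and the example of `VERSION`, that the parameters in `tf.parameters` are public module-level names written in upper case; this is not confirmed by the notebook. The helper `list_parameters` is hypothetical.

```python
import types


def list_parameters(module: types.ModuleType) -> dict:
    """Return the public upper-case module-level names and their values.

    Assumption: parameters are upper-case constants, like VERSION.
    """
    return {
        name: getattr(module, name)
        for name in sorted(dir(module))
        if name.isupper() and not name.startswith("_")
    }


try:
    import tf.parameters as tf_parameters  # requires Text-Fabric to be installed
except ImportError:
    print("Text-Fabric is not installed in this environment")
else:
    for name, value in list_parameters(tf_parameters).items():
        print(f"{name} = {value!r}")
```

If the assumption does not hold for some parameter, `dir(tf_parameters)` lists every name the module defines.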

In [6]:
N1904.showContext(...)

## 3.2 - Dump selection of header <a class="anchor" id="bullet3x2"></a>
##### [Back to TOC](#TOC)

In this example the header of the loaded Text-Fabric dataset is dumped. This is done by means of an API call to [`A.header()`](https://annotation.github.io/text-fabric/tf/advanced/links.html#tf.advanced.links.header). 

Please note that in the example below `A` is replaced by `N1904`. This follows from the way the app was loaded (the "incantation"):
> N1904 = use (... *etc* ... )

The [`use`](https://annotation.github.io/text-fabric/tf/app.html#tf.app.use) function returns an object whose attributes and methods constitute the advanced API. The Text-Fabric documentation refers to this object as `A`. In this notebook it is bound to the name `N1904`.

In [9]:
N1904.header(allMeta=False)

Name | # of nodes | # slots / node | % coverage
--- | --- | --- | ---
book | 27 | 5102.93 | 100
chapter | 260 | 529.92 | 100
verse | 7943 | 17.35 | 100
sentence | 8011 | 17.2 | 100
wg | 105430 | 6.85 | 524
word | 137779 | 1.0 | 100
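
The `% coverage` column can be reproduced from the other two columns. The sketch below assumes that coverage is the number of slots covered by all nodes of a type (number of nodes × average slots per node), divided by the total number of slots (137,779, one per `word` node), times 100. Under that assumption, a value above 100%, as for `wg`, means that the same slots are counted more than once, so word groups must overlap or nest. The numbers are copied from the header output. The helper `coverage` is hypothetical.

```python
# (node type, number of nodes, average number of slots per node), copied from the header table
HEADER_ROWS = [
    ("book", 27, 5102.93),
    ("chapter", 260, 529.92),
    ("verse", 7943, 17.35),
    ("sentence", 8011, 17.2),
    ("wg", 105430, 6.85),
    ("word", 137779, 1.0),
]
TOTAL_SLOTS = 137779  # one slot per word node


def coverage(n_nodes: int, avg_slots: float, total_slots: int = TOTAL_SLOTS) -> float:
    """Percentage of slots covered by all nodes of one type (overlaps counted repeatedly)."""
    return 100 * n_nodes * avg_slots / total_slots


for name, n_nodes, avg_slots in HEADER_ROWS:
    print(f"{name:<10}{coverage(n_nodes, avg_slots):8.1f}")
```

Small deviations from 100 for `verse` and `sentence` come from the rounded averages in the table.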


## 3.3 - Memory footprint <a class="anchor" id="bullet3x3"></a>
##### [Back to TOC](#TOC)

The API call [`footprint`](https://annotation.github.io/text-fabric/tf/core/api.html#tf.core.api.Api.footprint) provides a formatted overview of the memory used by each loaded feature of the Text-Fabric corpus, including precomputed data such as `__levUp__`, followed by the total.

In [8]:
TF.footprint()

                                                


**65 features**

feature | members | size in bytes
--- | --- | ---
__levUp__ | 259,450 | 41,713,080
ref | 137,779 | 17,322,488
reference | 137,779 | 17,322,488
oslots | 3 | 15,801,460
__levDown__ | 121,671 | 14,355,792
monad | 137,779 | 12,951,416
nodeID | 137,779 | 12,695,390
__boundary__ | 2 | 12,320,324
unicode | 137,779 | 11,836,548
word | 137,779 | 11,208,988
normalized | 137,779 | 11,109,174
wordunacc | 137,779 | 11,064,793
gloss | 137,779 | 10,313,083
wordtranslit | 137,779 | 10,122,641
containedclause | 137,779 | 9,884,154
lemma | 137,779 | 9,669,633
book | 154,020 | 9,557,063
chapter | 153,993 | 9,554,764
ln | 137,779 | 9,532,539
subj_ref | 137,779 | 9,454,578
strongs | 137,779 | 9,382,659
sentence | 145,790 | 9,350,252
__order__ | 259,450 | 9,340,240
verse | 145,722 | 9,323,176
lex_dom | 137,779 | 9,198,777
morph | 137,779 | 9,160,456
wordlevel | 137,779 | 9,108,780
roleclausedistance | 137,779 | 9,108,670
bookshort | 137,806 | 9,102,960
booknumber | 137,806 | 9,101,528
sp_full | 137,779 | 9,101,394
sp | 137,779 | 9,101,351
wordrolelong | 137,779 | 9,101,348
type | 137,779 | 9,101,345
wordrole | 137,779 | 9,101,278
mood | 137,779 | 9,101,174
tense | 137,779 | 9,101,160
case | 137,779 | 9,101,108
markafter | 137,779 | 9,101,102
after | 137,779 | 9,101,054
markbefore | 137,779 | 9,101,052
voice | 137,779 | 9,101,049
punctuation | 137,779 | 9,101,048
markorder | 137,779 | 9,101,021
gn | 137,779 | 9,100,991
person | 137,779 | 9,100,984
degree | 137,779 | 9,100,941
nu | 137,779 | 9,100,933
number | 137,779 | 9,100,933
wgnum | 105,430 | 8,645,128
wgrule | 105,430 | 8,243,720
wglevel | 105,430 | 8,199,368
wgrole | 105,430 | 8,195,670
wgrolelong | 105,430 | 8,195,576
wgclass | 105,430 | 8,195,569
clausetype | 105,430 | 8,195,219
wgtype | 105,430 | 8,195,162
junction | 105,430 | 8,195,108
__structure__ | 6 | 3,095,470
__rank__ | 259,450 | 1,077,884
otype | 4 | 973,860
__sections__ | 2 | 570,276
headverse | 8,011 | 523,269
__characters__ | 5 | 99,055
__levels__ | 6 | 1,300
TOTAL | 7,967,669 | 613,490,794
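
To see which features dominate memory use, the printed table can be parsed and each feature expressed as a share of the `TOTAL` row. This is a minimal sketch that works on a copy of the printed text, with only a few rows reproduced here. It assumes the `name | members | size` layout shown above. The helper `parse_footprint` is hypothetical, and the sketch does not rely on `footprint()` returning any data.

```python
# A few rows copied from the footprint table above, including the TOTAL row
FOOTPRINT_TABLE = """\
feature | members | size in bytes
--- | --- | ---
__levUp__ | 259,450 | 41,713,080
ref | 137,779 | 17,322,488
reference | 137,779 | 17,322,488
oslots | 3 | 15,801,460
TOTAL | 7,967,669 | 613,490,794
"""


def parse_footprint(table: str) -> dict:
    """Map feature name -> (members, size in bytes), skipping the header and separator rows."""
    rows = {}
    for line in table.strip().splitlines()[2:]:
        name, members, size = (cell.strip() for cell in line.split("|"))
        rows[name] = (int(members.replace(",", "")), int(size.replace(",", "")))
    return rows


footprint = parse_footprint(FOOTPRINT_TABLE)
total_bytes = footprint.pop("TOTAL")[1]
# Largest features first, with their size in MiB and share of the total
for name, (members, size) in sorted(footprint.items(), key=lambda item: -item[1][1]):
    print(f"{name:<12}{size / 2**20:8.1f} MiB {100 * size / total_bytes:6.2f}% of total")
```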

## 3.4 - List loaded features <a class="anchor" id="bullet3x4"></a>
##### [Back to TOC](#TOC)

The API call [`A.isLoaded()`](https://annotation.github.io/text-fabric/tf/core/api.html#tf.core.api.Api.isLoaded) will show information about loaded features.

In [10]:
N1904.isLoaded()

__boundary__         computed  
__characters__       computed  
__levDown__          computed  
__levUp__            computed  
__levels__           computed  
__order__            computed  
__rank__             computed  
__sections__         computed  
__structure__        computed  
after                node (str) ✅ Characters (eg. punctuations) following the word
book                 node (str) ✅ Book name (in English language)
booknumber           node (int) ✅ NT book number (Matthew=1, Mark=2, ..., Revelation=27)
bookshort            node (str) ✅ Book name (abbreviated)
case                 node (str) ✅ Gramatical case (Nominative, Genitive, Dative, Accusative, Vocative)
chapter              node (int) ✅ Chapter number inside book
clausetype           node (str) ✅ Clause type details (e.g. Verbless, Minor)
containedclause      node (str) 🆗 Contained clause (WG number)
degree               node (str) ✅ Degree (e.g. Comparitative, Superlative)
gloss                node (str) ✅ Eng
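
For a quick tally of how many computed and how many node features are loaded, the printed listing can be counted. This sketch operates on a few lines copied from the output above. It assumes the layout seen there: a name, then either `computed` or a kind such as `node (str)`. The helper `summarize_loaded` is hypothetical.

```python
from collections import Counter

# A few lines copied from the isLoaded() output above
IS_LOADED_OUTPUT = """\
__boundary__         computed
__levUp__            computed
after                node (str) ✅ Characters (eg. punctuations) following the word
booknumber           node (int) ✅ NT book number (Matthew=1, Mark=2, ..., Revelation=27)
containedclause      node (str) 🆗 Contained clause (WG number)
"""


def summarize_loaded(listing: str) -> Counter:
    """Count entries per kind: 'computed', or a kind plus value type such as 'node (str)'."""
    kinds = Counter()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        kind = parts[1] if parts[1] == "computed" else " ".join(parts[1:3])
        kinds[kind] += 1
    return kinds


print(summarize_loaded(IS_LOADED_OUTPUT))
```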

## 3.5 - Statistics on node types <a class="anchor" id="bullet3x5"></a>
##### [Back to TOC](#TOC)

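This example shows various statistics on node types. The call to [`C.levels.data`](https://annotation.github.io/text-fabric/tf/core/prepare.html#tf.core.prepare.levels) returns a sequence of tuples, one per node type, ordered from the largest to the smallest average number of slots. Each tuple holds the node type, the average number of slots per node, and the first and last node number of that type. The result is displayed with the `tabulate` function from the third-party `tabulate` package.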

In [11]:
# Library to format the table as text (third-party package: pip install tabulate)
from tabulate import tabulate

headers = ["Node", "Average # of slots", "first", "last"]
ResultList = C.levels.data  # tuples of (node type, average # of slots, first node, last node)
print(tabulate(ResultList, headers=headers, tablefmt='fancy_grid'))

╒══════════╤══════════════════════╤═════════╤════════╕
│ Node     │   Average # of slots │   first │   last │
╞══════════╪══════════════════════╪═════════╪════════╡
│ book     │           5102.93    │  137780 │ 137806 │
├──────────┼──────────────────────┼─────────┼────────┤
│ chapter  │            529.919   │  137807 │ 138066 │
├──────────┼──────────────────────┼─────────┼────────┤
│ verse    │             17.346   │  146078 │ 154020 │
├──────────┼──────────────────────┼─────────┼────────┤
│ sentence │             17.1987  │  138067 │ 146077 │
├──────────┼──────────────────────┼─────────┼────────┤
│ wg       │              6.85239 │  154021 │ 259450 │
├──────────┼──────────────────────┼─────────┼────────┤
│ word     │              1       │       1 │ 137779 │
╘══════════╧══════════════════════╧═════════╧════════╛
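
From the first and last node of each row the number of nodes per type follows. Multiplying that by the average gives the approximate total number of slots covered. The sketch below hard-codes the values printed above so that it runs without Text-Fabric. In the notebook, `level_stats(C.levels.data)` should work too, assuming the tuples keep this four-field layout. The helper `level_stats` is hypothetical.

```python
# (node type, average # of slots, first node, last node), copied from the table above
LEVELS = [
    ("book", 5102.93, 137780, 137806),
    ("chapter", 529.919, 137807, 138066),
    ("verse", 17.346, 146078, 154020),
    ("sentence", 17.1987, 138067, 146077),
    ("wg", 6.85239, 154021, 259450),
    ("word", 1, 1, 137779),
]


def level_stats(levels):
    """Yield (node type, number of nodes, approximate total slots covered)."""
    for node_type, avg_slots, first, last in levels:
        n_nodes = last - first + 1  # nodes of one type form a contiguous range
        yield node_type, n_nodes, round(n_nodes * avg_slots)


for node_type, n_nodes, slots in level_stats(LEVELS):
    print(f"{node_type:<10}{n_nodes:>8}{slots:>10}")
```

The node counts obtained this way agree with the `# of nodes` column of the header table.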


## 3.6 - Node number ranges <a class="anchor" id="bullet3x6"></a>
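##### [Back to TOC](#TOC)

Each node type occupies a contiguous range of node numbers. The loop below iterates over all node types in `F.otype.all` and prints, for each, the first and last node number returned by `F.otype.sInterval`.

In [12]: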
for NodeType in F.otype.all:
    print (NodeType, F.otype.sInterval(NodeType))

book (137780, 137806)
chapter (137807, 138066)
verse (146078, 154020)
sentence (138067, 146077)
wg (154021, 259450)
word (1, 137779)
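
The printed intervals can be checked to confirm that together they cover node numbers 1 to 259,450 without gaps or overlaps. This sketch uses the values printed above. In the notebook the same dictionary could be built with `{t: F.otype.sInterval(t) for t in F.otype.all}`. The helper `check_partition` is hypothetical.

```python
# Intervals copied from the output above: node type -> (first node, last node)
INTERVALS = {
    "book": (137780, 137806),
    "chapter": (137807, 138066),
    "verse": (146078, 154020),
    "sentence": (138067, 146077),
    "wg": (154021, 259450),
    "word": (1, 137779),
}


def check_partition(intervals):
    """Return the highest node number if the intervals tile 1..max with no gaps or overlaps."""
    expected_start = 1
    # Walk the intervals in order of their first node
    for node_type, (first, last) in sorted(intervals.items(), key=lambda item: item[1][0]):
        if first != expected_start:
            raise ValueError(f"{node_type} starts at {first}, expected {expected_start}")
        expected_start = last + 1
    return expected_start - 1


print("highest node number:", check_partition(INTERVALS))
```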


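Note that, apart possibly from their order, these ranges should match those recorded in the file `otype.tf` of the loaded dataset. The excerpt below, however, comes from a different version of the dataset (written by Text-Fabric 11.4.10). That version still had `clause` and `phrase` nodes instead of `wg` nodes, so only the `word`, `book` and `chapter` ranges agree with the output above:

```
@node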
@TextFabric version=11.4.10
...
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2023-06-19T16:21:20Z

1-137779	word
137780-137806	book
137807-138066	chapter
138067-154190	clause
154191-226864	phrase
226865-232584	sentence
232585-240527	verse
```
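
To compare such a file with the intervals reported by `F.otype.sInterval`, the range lines of `otype.tf` can be parsed into a dictionary. This is a minimal sketch that assumes the layout shown in the excerpt: metadata lines starting with `@`, a blank line, then `first-last<TAB>type` lines. It also assumes a single node may be written without a range. The helper `parse_otype_ranges` is hypothetical and does not cover other features of the `.tf` format.

```python
def parse_otype_ranges(text: str) -> dict:
    """Map node type -> (first node, last node) from 'first-last<TAB>type' lines."""
    ranges = {}
    for line in text.splitlines():
        if not line or line.startswith("@") or line == "...":
            continue  # skip blank lines, metadata and elisions
        span, node_type = line.split("\t")
        if "-" in span:
            first, last = (int(n) for n in span.split("-"))
        else:
            first = last = int(span)  # assumed: a single node written without a range
        ranges[node_type] = (first, last)
    return ranges


# Range lines copied from the otype.tf excerpt above
OTYPE_EXCERPT = "1-137779\tword\n137780-137806\tbook\n137807-138066\tchapter\n"
print(parse_otype_ranges(OTYPE_EXCERPT))
```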