# Some system statistics (Nestle1904LFT)

**Work in progress!**

## Table of contents <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Performing the queries</a>
    * <a href="#bullet3x1">3.1 - Print the Text-Fabric version</a>
    * <a href="#bullet3x2">3.2 - Dump selection of header</a>
    * <a href="#bullet3x3">3.3 - Memory footprint</a>    
    * <a href="#bullet3x4">3.4 - List loaded features</a> 
    * <a href="#bullet3x5">3.5 - Statistics on node types</a>
    * <a href="#bullet3x6">3.6 - Node number ranges</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

This Jupyter Notebook showcases several examples of statistical analysis performed on a Text-Fabric corpus.

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [1]:
%load_ext autoreload
%autoreload 2

In [1]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [2]:
# load the N1904 app and data
N1904 = use ("tonyjurg/Nestle1904LFT", version="0.3", hoist=globals())

**Locating corpus resources ...**

The requested app is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app not found


findAppClass: invalid syntax (~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app/app.py, line 5)


findAppClass: Api for "tonyjurg/Nestle1904LFT" not loaded
The requested data is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3 not found


Name,# of nodes,# slots/node,% coverage
book,27,5102.93,100
chapter,260,529.92,100
verse,7943,17.35,100
sentence,8011,17.2,100
wg,114879,7.6,633
word,137779,1.0,100


In [None]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())

# 3 - Performing the queries <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

## 3.1 - Print the Text-Fabric version <a class="anchor" id="bullet3x1"></a>
##### [Back to TOC](#TOC)

Although this is somewhat trivial, this example does serve a purpose. We will print the version by reading the Text-Fabric parameter [VERSION](https://annotation.github.io/text-fabric/tf/parameters.html#tf.parameters.VERSION), which is fixed for the whole program. To access any of these parameters in our notebook, it first needs to be imported from `tf.parameters`.

In [3]:
from tf.parameters import VERSION
print ('TextFabric version: {}'.format(VERSION))

TextFabric version: 11.4.10


Note that any other parameter can be dumped in a similar manner.
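As a minimal sketch of that idea, the helper below lists every public upper-case name defined on a module, so it does not need to assume which parameters exist besides `VERSION`. The helper name `dump_constants` is hypothetical and not part of Text-Fabric, and the import is guarded in case Text-Fabric is not installed.

```python
def dump_constants(module):
    """Return a dict of the public upper-case names defined on a module-like object."""
    return {
        name: getattr(module, name)
        for name in dir(module)
        if name.isupper() and not name.startswith("_")
    }


try:
    # Assumes Text-Fabric is installed, as elsewhere in this notebook
    import tf.parameters as tf_parameters
except ImportError:
    tf_parameters = None
    print("Text-Fabric is not installed; nothing to dump")

if tf_parameters is not None:
    for name, value in dump_constants(tf_parameters).items():
        print(f"{name} = {value!r}")
```

Filtering on upper-case names is only a convention-based heuristic: it picks up whatever constants the module happens to define, without relying on their names.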

In [4]:
N1904.showContext(...)

<details open><summary><b>tonyjurg/Nestle1904LFT</b> <i>app context</i></summary>

</details>


## 3.2 - Dump selection of header <a class="anchor" id="bullet3x2"></a>
##### [Back to TOC](#TOC)

In this example the header of the loaded Text-Fabric dataset is dumped. This is done by means of an API call to [`A.header()`](https://annotation.github.io/text-fabric/tf/advanced/links.html#tf.advanced.links.header). 

Please note that in the example below `A` is replaced by `N1904`. This is a result of the way the app was loaded:
> N1904 = use (... *etc* ... )

The [`use`](https://annotation.github.io/text-fabric/tf/app.html#tf.app.use) function returns an object whose attributes and methods constitute the advanced API.




In [5]:
N1904.header(allMeta=False)

Name,# of nodes,# slots/node,% coverage
book,27,5102.93,100
chapter,260,529.92,100
verse,7943,17.35,100
sentence,8011,17.2,100
wg,114879,7.6,633
word,137779,1.0,100
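The `% coverage` column can look surprising: `wg` covers 633% of the slots. The figures are consistent with coverage being computed as (number of nodes × average slots per node) / (total number of slots), which would mean that word groups nest or overlap, so that a single word is counted once for every word group containing it. The sketch below checks this reading against the table. The formula is an assumption inferred from the numbers, not taken from the Text-Fabric source, and the averages are the less rounded values reported by `C.levels.data` in section 3.5.

```python
# (node type, number of nodes, average slots per node)
# Counts come from the header table; averages from the C.levels.data output in section 3.5.
LEVELS = [
    ("book", 27, 5102.93),
    ("chapter", 260, 529.919),
    ("verse", 7943, 17.346),
    ("sentence", 8011, 17.1987),
    ("wg", 114879, 7.59739),
    ("word", 137779, 1.0),
]
TOTAL_SLOTS = 137779  # number of word nodes (one slot per word in this corpus)


def coverage_percent(node_count, avg_slots, total_slots=TOTAL_SLOTS):
    """Percentage of slots covered, counting a slot once per node that contains it."""
    return round(100 * node_count * avg_slots / total_slots)


for node_type, node_count, avg_slots in LEVELS:
    print(f"{node_type:<10}{coverage_percent(node_count, avg_slots):>5}")
```

With the rounded average of 7.6 from the header table the result for `wg` would come out one point higher, which is why the sketch uses the more precise values.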


## 3.3 - Memory footprint <a class="anchor" id="bullet3x3"></a>
##### [Back to TOC](#TOC)

The API call [`footprint`](https://annotation.github.io/text-fabric/tf/core/api.html#tf.core.api.Api.footprint) provides a formatted overview of the memory footprint of each feature in the Text-Fabric corpus.

In [6]:
TF.footprint()

                                                


# 60 features

feature | members | size in bytes
--- | --- | ---
__levUp__ | 268,899 | 53,887,912
ref | 137,779 | 17,322,496
oslots | 3 | 16,892,788
__boundary__ | 2 | 16,430,640
__levDown__ | 131,120 | 15,130,128
monad | 137,779 | 12,951,424
orig_order | 137,779 | 12,951,424
nodeID | 137,779 | 12,695,400
wgnum | 114,879 | 11,493,464
unicode | 137,779 | 11,383,536
sentence | 137,806 | 11,087,044
word | 137,779 | 10,862,812
normalized | 137,779 | 10,773,392
gloss | 137,779 | 10,312,734
containedclause | 137,779 | 9,888,346
__order__ | 268,899 | 9,680,404
lemma | 137,779 | 9,581,098
chapter | 153,939 | 9,553,260
verse | 153,706 | 9,546,736
ln | 137,779 | 9,532,549
subj_ref | 137,779 | 9,454,588
strongs | 137,779 | 9,382,667
lex_dom | 137,779 | 9,198,787
morph | 137,779 | 9,160,464
wordlevel | 137,779 | 9,108,840
roleclausedistance | 137,779 | 9,108,678
bookshort | 137,806 | 9,102,968
book_long | 137,779 | 9,102,323
booknumber | 137,806 | 9,101,536
sp_full | 137,779 | 9,101,404
sp | 137,779 | 9,101,359
wordrolelong | 137,779 | 9,101,358
type | 137,779 | 9,101,355
wordrole | 137,779 | 9,101,288
mood | 137,779 | 9,101,184
tense | 137,779 | 9,101,170
after | 137,779 | 9,101,136
case | 137,779 | 9,101,118
voice | 137,779 | 9,101,059
gn | 137,779 | 9,101,001
person | 137,779 | 9,100,994
degree | 137,779 | 9,100,951
nu | 137,779 | 9,100,943
number | 137,779 | 9,100,943
rule | 114,879 | 8,509,105
wglevel | 114,879 | 8,464,004
wgrole | 114,879 | 8,460,401
wgclass | 114,879 | 8,460,214
wgrolelong | 114,879 | 8,460,158
appos | 114,879 | 8,460,076
wgtype | 114,879 | 8,460,076
clausetype | 114,879 | 8,459,801
junction | 114,879 | 8,459,750
__structure__ | 6 | 4,023,786
__rank__ | 268,899 | 1,142,896
otype | 4 | 1,049,452
__sections__ | 2 | 573,560
__characters__ | 1 | 30,405
book | 27 | 3,475
__levels__ | 6 | 1,300
TOTAL | 7,354,428 | 584,214,160
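
To put these byte counts in perspective, they can be converted to mebibytes and expressed as a share of the total. The sketch below does this for a few rows copied by hand from the table above; the helper names are illustrative and not part of Text-Fabric.

```python
# A few rows copied from the footprint table above: feature -> size in bytes
SIZES = {
    "__levUp__": 53_887_912,
    "ref": 17_322_496,
    "oslots": 16_892_788,
    "otype": 1_049_452,
}
TOTAL_BYTES = 584_214_160  # the TOTAL row of the table


def to_mib(size_in_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_in_bytes / (1024 * 1024)


def share_of_total(size_in_bytes, total=TOTAL_BYTES):
    """Return the size as a percentage of the total footprint."""
    return 100 * size_in_bytes / total


print(f"total: {to_mib(TOTAL_BYTES):.1f} MiB")
for feature, size in sorted(SIZES.items(), key=lambda item: item[1], reverse=True):
    print(f"{feature:<12}{to_mib(size):>8.1f} MiB{share_of_total(size):>7.1f} %")
```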

## 3.4 - List loaded features <a class="anchor" id="bullet3x4"></a>
##### [Back to TOC](#TOC)

The API call [`A.isLoaded()`](https://annotation.github.io/text-fabric/tf/core/api.html#tf.core.api.Api.isLoaded) will show information about loaded features.

In [7]:
N1904.isLoaded()

__boundary__         computed  
__characters__       computed  
__levDown__          computed  
__levUp__            computed  
__levels__           computed  
__order__            computed  
__rank__             computed  
__sections__         computed  
__structure__        computed  
after                node (str) Characters (eg. punctuations) following the word
appos                node (str) Apposition details
book                 node (str) Book
book_long            node (str) Book name (fully spelled out)
booknumber           node (int) NT book number (Matthew=1, Mark=2, ..., Revelation=27)
bookshort            node (str) Book name (abbreviated)
case                 node (str) Gramatical case (Nominative, Genitive, Dative, Accusative, Vocative)
chapter              node (int) Chapter number inside book
clausetype           node (str) Clause type details
containedclause      node (str) Contained clause (WG number)
degree               node (str) Degree (e.g. Comparitative, Super

## 3.5 - Statistics on node types <a class="anchor" id="bullet3x5"></a>
##### [Back to TOC](#TOC)

This example shows various statistics on node types. The attribute [`C.levels.data`](https://annotation.github.io/text-fabric/tf/core/prepare.html#tf.core.prepare.levels) holds an ordered sequence of tuples, one per node type, which is displayed as a table using the `tabulate` function.

In [8]:
# Library to format table
from tabulate import tabulate
headers = ["Node", "Average # of slots", "first", "last"]
ResultList = C.levels.data
print(tabulate(ResultList, headers=headers, tablefmt='fancy_grid'))

╒══════════╤══════════════════════╤═════════╤════════╕
│ Node     │   Average # of slots │   first │   last │
╞══════════╪══════════════════════╪═════════╪════════╡
│ book     │           5102.93    │  137780 │ 137806 │
├──────────┼──────────────────────┼─────────┼────────┤
│ chapter  │            529.919   │  137807 │ 138066 │
├──────────┼──────────────────────┼─────────┼────────┤
│ verse    │             17.346   │  146078 │ 154020 │
├──────────┼──────────────────────┼─────────┼────────┤
│ sentence │             17.1987  │  138067 │ 146077 │
├──────────┼──────────────────────┼─────────┼────────┤
│ wg       │              7.59739 │  154021 │ 268899 │
├──────────┼──────────────────────┼─────────┼────────┤
│ word     │              1       │       1 │ 137779 │
╘══════════╧══════════════════════╧═════════╧════════╛
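
Assuming the nodes of each type occupy one contiguous range, which is what the `first` and `last` columns suggest, the number of nodes per type is `last - first + 1`. The sketch below applies this to the rows of the table above, typed in by hand so that it runs without a loaded corpus, and compares the result with the `# of nodes` column shown when the corpus was loaded.

```python
# Rows as shown in the table above: (node type, average slots, first node, last node)
LEVELS = (
    ("book", 5102.93, 137780, 137806),
    ("chapter", 529.919, 137807, 138066),
    ("verse", 17.346, 146078, 154020),
    ("sentence", 17.1987, 138067, 146077),
    ("wg", 7.59739, 154021, 268899),
    ("word", 1, 1, 137779),
)

# "# of nodes" column from the table printed when the corpus was loaded
EXPECTED_COUNTS = {
    "book": 27, "chapter": 260, "verse": 7943,
    "sentence": 8011, "wg": 114879, "word": 137779,
}


def node_counts(levels):
    """Map each node type to the number of nodes in its (first, last) range."""
    return {node_type: last - first + 1 for node_type, _avg, first, last in levels}


counts = node_counts(LEVELS)
for node_type, count in counts.items():
    status = "ok" if count == EXPECTED_COUNTS[node_type] else "MISMATCH"
    print(f"{node_type:<10}{count:>8}  {status}")
```

With a loaded corpus, passing `C.levels.data` to `node_counts` should give the same result, provided its tuples have the four fields shown in the table.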


## 3.6 - Node number ranges <a class="anchor" id="bullet3x6"></a>
##### [Back to TOC](#TOC)

In [9]:
for NodeType in F.otype.all:
    print (NodeType, F.otype.sInterval(NodeType))

book (137780, 137806)
chapter (137807, 138066)
verse (146078, 154020)
sentence (138067, 146077)
wg (154021, 268899)
word (1, 137779)


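Note that the ranges shown as output of this command should be the same (except, possibly, with respect to order) as those found in the file `otype.tf`. The excerpt below appears to come from a different version of the dataset: it lists `clause` and `phrase` node types instead of `wg`, so only the `word`, `book`, and `chapter` ranges agree with the output above: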
```
@node
@TextFabric version=11.4.10
...
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2023-06-19T16:21:20Z

1-137779	word
137780-137806	book
137807-138066	chapter
138067-154190	clause
154191-226864	phrase
226865-232584	sentence
232585-240527	verse
```
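
Such a comparison can also be scripted. The sketch below parses range lines in the `first-last<TAB>type` form seen in the excerpt and compares them with the intervals printed by the loop above. Both inputs are typed in by hand, and the parser is a simplification that only handles lines of this form; it is not Text-Fabric's own reader.

```python
# Intervals as printed by F.otype.sInterval above
INTERVALS = {
    "book": (137780, 137806),
    "chapter": (137807, 138066),
    "verse": (146078, 154020),
    "sentence": (138067, 146077),
    "wg": (154021, 268899),
    "word": (1, 137779),
}

# Range lines copied from the otype.tf excerpt above (tab-separated)
OTYPE_LINES = """\
1-137779\tword
137780-137806\tbook
137807-138066\tchapter
138067-154190\tclause
154191-226864\tphrase
226865-232584\tsentence
232585-240527\tverse
"""


def parse_ranges(text):
    """Parse 'first-last<TAB>type' lines into a dict of type -> (first, last)."""
    ranges = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        span, node_type = line.split("\t")
        first, last = span.split("-")
        ranges[node_type] = (int(first), int(last))
    return ranges


file_ranges = parse_ranges(OTYPE_LINES)
for node_type in sorted(set(INTERVALS) | set(file_ranges)):
    in_corpus = INTERVALS.get(node_type)
    in_file = file_ranges.get(node_type)
    verdict = "same" if in_corpus == in_file else "differs"
    print(f"{node_type:<10}{verdict:<9}corpus={in_corpus} file={in_file}")
```

A type that is present on only one side shows up with `None` on the other, so missing node types are reported as differences too.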