# [Doc4TF/tools/versionMapping](https://github.com/tonyjurg/Doc4TF/tools/versionMapping.ipynb)
#### *Mapping node changes between two Text-Fabric datasets*

Version: 0.1 (May. 13, 2024).

## Table of contents <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Setting up the environment</a>
* <a href="#bullet3">3 - Load Text-Fabric data</a>
* <a href="#bullet4">4 - Creation of the dataset</a>
    * <a href="#bullet4x1">4.1 - Setting up some global variables</a>
    * <a href="#bullet4x2">4.2 - Store all relevant data into a dictionary</a>
* <a href="#bullet5">5 - Create the documentation pages</a>
    * <a href="#bullet5x1">5.1 - Create the set of feature pages</a>
    * <a href="#bullet5x2">5.2 - Create the index pages</a>
* <a href="#bullet6">6 - Licence</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

This notebook uses the module [tf.dataset.nodemaps](https://annotation.github.io/text-fabric/tf/dataset/nodemaps.html) to map nodes between two versions of a Text-Fabric dataset. See the description provided with the module for details.

# 2 - Setting up the environment <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

Your environment should include the Python package `Text-Fabric`. If it is not installed yet, install it with `pip`. You also need access to the Text-Fabric datasets, either from an online resource or from a locally stored copy.
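
A quick way to confirm this prerequisite before running the remaining cells is to ask Python whether the package can be imported. The sketch below is only an illustration: the helper name `is_installed` is made up here, and it relies on the package being imported under the name `tf` (as in the import statements used later in this notebook) while being installed from PyPI as `text-fabric`.

```python
import importlib.util


def is_installed(module_name):
    """Return True when the top-level module `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None


# Text-Fabric is imported as `tf`; the PyPI distribution is called `text-fabric`.
if is_installed("tf"):
    print("Text-Fabric is available")
else:
    print("Text-Fabric is missing; install it with: pip install text-fabric")
```

The check only looks the module up and does not import it, so it is cheap to run at the top of a notebook.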

# 3 - Load Text-Fabric data <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

See also notebook [map.ipynb](https://nbviewer.org/github/clariah/wp6-missieven/blob/master/programs/map.ipynb).

See [dataset.Versions](https://annotation.github.io/text-fabric/tf/dataset/nodemaps.html#tf.dataset.nodemaps.Versions) in the Text-Fabric documentation.
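
The idea behind a node map can be illustrated without Text-Fabric. In the toy sketch below every node is reduced to the set of slots it covers, and a node in the older version is linked to the node in the newer version whose slot set is most similar. All names and data are invented for illustration, and the slots of both versions are assumed to be numbered identically. The real `Versions` class is more elaborate, so this should not be read as its actual algorithm.

```python
def dissimilarity(slots_a, slots_b):
    """Number of slots that belong to exactly one of the two nodes."""
    return len(slots_a ^ slots_b)


def map_nodes(nodes_a, nodes_b):
    """Link each node of version a to its closest node in version b.

    Both arguments map a node number to the set of slots it covers.
    The result maps a node of version a to (node of version b, dissimilarity);
    a dissimilarity of 0 means the slot sets are identical.
    Nodes without any overlapping candidate are left out.
    """
    mapping = {}
    for node_a, slots_a in nodes_a.items():
        candidates = [
            (dissimilarity(slots_a, slots_b), node_b)
            for node_b, slots_b in nodes_b.items()
            if slots_a & slots_b
        ]
        if candidates:
            dis, node_b = min(candidates)
            mapping[node_a] = (node_b, dis)
    return mapping


# Hypothetical phrase nodes in two versions of a six-word corpus
phrases_old = {101: {1, 2}, 102: {3, 4, 5}, 103: {6}}
phrases_new = {201: {1, 2}, 202: {3, 4}, 203: {5, 6}}

node_map = map_nodes(phrases_old, phrases_new)
for node_a, (node_b, dis) in sorted(node_map.items()):
    print(f"{node_a} -> {node_b} (dissimilarity {dis})")
```

Nodes that map with dissimilarity 0 are unchanged between the versions; higher values flag nodes whose boundaries moved and that may need manual inspection.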

In [7]:
%load_ext autoreload
%autoreload 2

In [25]:
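# Load the Text-Fabric code and the version-mapping class
import os
from tf.fabric import Fabric
from tf.dataset import Versions
from tf.app import use

# Assumption: the TF data lives in the default download location, one subfolder per version
TF_DIR = os.path.expanduser("~/text-fabric-data/github/saulocantanhede/tfgreek2/tf")

va = "0.5.6"  # older version
vb = "0.5.7"  # newer version

TF = {}
api = {}
for v in (va, vb):
    TF[v] = Fabric(locations=TF_DIR, modules=v)
    # Assumption: the basic features suffice; list extra feature names here if needed
    api[v] = TF[v].load("")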

In [11]:
# also required: module marimo
!pip install marimo

Collecting marimo
  Downloading marimo-0.5.2-py3-none-any.whl.metadata (26 kB)
Collecting pymdown-extensions<11,>=9.0 (from marimo)
  Downloading pymdown_extensions-10.8.1-py3-none-any.whl.metadata (3.0 kB)
Collecting tomlkit>=0.12.0 (from marimo)
  Downloading tomlkit-0.12.5-py3-none-any.whl.metadata (2.7 kB)
Collecting uvicorn>=0.22.0 (from marimo)
  Downloading uvicorn-0.29.0-py3-none-any.whl.metadata (6.3 kB)
Collecting starlette!=0.36.0,>=0.26.1 (from marimo)
  Downloading starlette-0.37.2-py3-none-any.whl.metadata (5.9 kB)
Collecting websockets<13.0.0,>=10.0.0 (from marimo)
  Downloading websockets-12.0-cp311-cp311-win_amd64.whl.metadata (6.8 kB)
Collecting docutils>=0.17.0 (from marimo)
  Downloading docutils-0.21.2-py3-none-any.whl.metadata (2.8 kB)
Collecting black (from marimo)
  Downloading black-24.4.2-cp311-cp311-win_amd64.whl.metadata (77 kB)

In [68]:
# Load the app and data of version 0.5.7 (the newer of the two versions being compared)
A1 = use("saulocantanhede/tfgreek2", version="0.5.7")

**Locating corpus resources ...**

The requested app is not available offline
	~/text-fabric-data/github/saulocantanhede/tfgreek2/app not found


   |     0.56s T otype                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     6.64s T oslots               from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     0.00s T before               from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     1.16s T lemma                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     1.00s T punctuation          from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     0.46s T verse                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     1.11s T lemmatranslit        from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     1.03s T after                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     1.15s T translit             from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7
   |     1.22s T text                 from ~/text-fabric-data/github/saul

KeyboardInterrupt: 

In [70]:
# Load the app and data of version 0.5.6 (the older of the two versions being compared)
A2 = use("saulocantanhede/tfgreek2", version="0.5.6")

**Locating corpus resources ...**

The requested app is not available offline
	~/text-fabric-data/github/saulocantanhede/tfgreek2/app not found


The requested data is not available offline
	~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6 not found
rate limit is 5000 requests per hour, with 4999 left for this hour
	connecting to online GitHub repo saulocantanhede/tfgreek2 ... connected
	tf/0.5.6/after.tf...downloaded
	tf/0.5.6/appositioncontainer.tf...downloaded
	tf/0.5.6/articular.tf...downloaded
	tf/0.5.6/before.tf...downloaded
	tf/0.5.6/book.tf...downloaded
	tf/0.5.6/bookshort.tf...downloaded
	tf/0.5.6/case.tf...downloaded
	tf/0.5.6/chapter.tf...downloaded
	tf/0.5.6/clausetype.tf...downloaded
	tf/0.5.6/cls.tf...downloaded
	tf/0.5.6/cltype.tf...downloaded
	tf/0.5.6/criticalsign.tf...downloaded
	tf/0.5.6/crule.tf...downloaded
	tf/0.5.6/degree.tf...downloaded
	tf/0.5.6/discontinuous.tf...downloaded
	tf/0.5.6/domain.tf...downloaded
	tf/0.5.6/frame.tf...downloaded
	tf/0.5.6/framespec.tf...downloaded
	tf/0.5.6/function.tf...downloaded
	tf/0.5.6/gender.tf...downloaded
	tf/0.5.6/gloss.tf...downloaded
	tf/0.5.6/id.tf...dow

   |     0.58s T otype                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     6.26s T oslots               from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     0.00s T before               from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     1.18s T lemma                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     1.03s T punctuation          from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     0.47s T verse                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     1.04s T after                from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     1.18s T translit             from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     1.26s T text                 from ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.6
   |     0.54s T chapter              from ~/text-fabric-data/github/saul

Name | # of nodes | # slots / node | % coverage
--- | --- | --- | ---
book | 27 | 5102.93 | 100
chapter | 260 | 529.92 | 100
verse | 7944 | 17.34 | 100
sentence | 19767 | 13.79 | 198
group | 8964 | 7.02 | 46
clause | 30479 | 7.19 | 159
wg | 106868 | 6.88 | 533
phrase | 69403 | 1.91 | 96
subphrase | 116034 | 1.6 | 135
word | 137779 | 1.0 | 100
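
The coverage column can exceed 100% because nodes of one type may overlap or be nested, so the same slot is counted more than once. The figures in the table above are consistent with coverage being the total number of slots covered by all nodes of a type, divided by the number of slots in the corpus (here the 137779 words). The sketch below recomputes the column from the rounded table values under that assumption; small deviations are expected because the slots-per-node averages are rounded to two decimals.

```python
# Values copied from the table above:
# (node type, number of nodes, slots per node, reported % coverage)
rows = [
    ("book", 27, 5102.93, 100),
    ("chapter", 260, 529.92, 100),
    ("verse", 7944, 17.34, 100),
    ("sentence", 19767, 13.79, 198),
    ("group", 8964, 7.02, 46),
    ("clause", 30479, 7.19, 159),
    ("wg", 106868, 6.88, 533),
    ("phrase", 69403, 1.91, 96),
    ("subphrase", 116034, 1.6, 135),
    ("word", 137779, 1.0, 100),
]

total_slots = 137779  # number of word nodes, the slot type of this corpus


def coverage(node_count, slots_per_node):
    """Percentage of the corpus covered, counting a slot once per node containing it."""
    return 100 * node_count * slots_per_node / total_slots


for name, count, avg_slots, reported in rows:
    print(f"{name:10} computed {coverage(count, avg_slots):6.1f}  reported {reported}")
```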


App config error(s) in word:
	features: feature sp not loaded


Display is setup for viewtype [syntax-view](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/saulocantanhede/tfgreek2/blob/main/docs/viewtypes.md#start) for more information on viewtypes

In [48]:
A2.zipAll()


Data to be zipped:


fatal: not a git repository (or any of the parent directories): .git


	OK       app                      (v?? ??)            : ~/github/saulocantanhede/tfgreek2/app


fatal: not a git repository (or any of the parent directories): .git


	OK       main data                (v?? ??)            : ~/github/saulocantanhede/tfgreek2/tf/0.5.6
Writing zip file ...


'~/Downloads/github/saulocantanhede/tfgreek2/complete.zip'

In [9]:
from tf.advanced.helpers import dm
from tf.advanced.repo import checkoutRepo

In [56]:
def do(task):
    """Display the first five items of a checkoutRepo result as a one-row Markdown table."""
    md = f"""
commit | release | local | base | subdir
--- | --- | --- | --- | ---
`{task[0]}` | `{task[1]}` | `{task[2]}` | `{task[3]}` | `{task[4]}`
"""
    dm(md)

In [58]:
do(checkoutRepo(backend='github', org="saulocantanhede", repo="tfgreek2", folder="tf", version="0.5.6", checkout=""))


commit | release | local | base | subdir
--- | --- | --- | --- | ---
`77a9118c7cd97145e8b6d32d23cbf227b3d70727` | `0.5.6` | `local` | `C:/Users/tonyj/text-fabric-data/github` | `saulocantanhede/tfgreek2/tf`


In [62]:
do(checkoutRepo(backend='github', org="saulocantanhede", repo="tfgreek2", folder="tf", version="0.5.7", checkout=""))


commit | release | local | base | subdir
--- | --- | --- | --- | ---
`352af50c8ce86edd8a0e2d58519453a8f53ee084` | `None` | `local` | `C:/Users/tonyj/text-fabric-data/github` | `saulocantanhede/tfgreek2/tf`


# 4 - Creation of the dataset<a class="anchor" id="bullet4"></a>

## 4.1 - Setting up some global variables<a class="anchor" id="bullet4x1"></a>
##### [Back to TOC](#TOC)

In [4]:
# The version number of the script
scriptVersion="0.1"
scriptDate="May. 12, 2024"

## 4.2 - Store all relevant data into a dictionary<a class="anchor" id="bullet4x2"></a>
##### [Back to TOC](#TOC)

The following will create a dictionary containing all relevant information for the loaded node and edge features.
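
Each entry in the dictionary is keyed by feature name and holds the description, the feature kind, the data type and, per node type, a frequency list cut off at `tableLimit` items. The sketch below builds one such entry from invented data, using `collections.Counter` as a stand-in for Text-Fabric's `freqList`. The sample values, the feature name `demo` and the helper `make_entry` are hypothetical.

```python
from collections import Counter

tableLimit = 2  # keep only the two most frequent values (illustrative setting)

# Invented sample: values of a feature on five word nodes
values_per_node_type = {"word": ["noun", "verb", "noun", "prep", "noun"]}


def make_entry(name, description, feature_type, data_type, values_by_type):
    """Build one dictionary entry in the same shape as the cell below produces."""
    freqlist = {}
    for node_type, values in values_by_type.items():
        # most_common() returns (value, count) pairs sorted by descending count
        frequencies = Counter(values).most_common()
        if frequencies:
            freqlist[node_type] = {
                "nodetype": node_type,
                "freq": frequencies[:tableLimit],
            }
    return {
        "name": name,
        "descr": description,
        "type": feature_type,
        "datatype": data_type,
        "freqlist": freqlist,
    }


entry = make_entry("demo", "invented feature", "Node", "String", values_per_node_type)
print(entry)
```

Node types for which a feature has no values get no key in `freqlist`, which keeps the generated pages free of empty tables.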

In [5]:
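# Initialize an empty dictionary to store feature data
import time

featureDict = {}
overallTime = time.time()

# Assumption: these names were not defined by the cells shown above, so bind them here.
# A2 (version 0.5.6) is the dataset that finished loading; point A at another app object if needed.
A = A2
F, Fs, Fall, Es, Eall = A.api.F, A.api.Fs, A.api.Fall, A.api.Es, A.api.Eall
verbose = globals().get('verbose', False)      # assumed default: show progress dots only
tableLimit = globals().get('tableLimit', 10)   # assumed default: at most 10 values per node type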

def getFeatureDescription(metaData):
    """
    This function looks for the 'description' key in the metadata dictionary. If the key is found,
    it returns the corresponding description. If the key is not present, it returns a default 
    message indicating that no description is available.

    Parameters:
       metaData (dict): A dictionary containing metadata about a feature.

    Returns:
       str: The description of the feature if available, otherwise a default message.
    """
    return metaData.get('description', "No feature description")

def setDataType(metaData):
    """
    This function checks for the 'valueType' key in the metadata. If the key is present, it
    returns 'String' if the value is 'str', and 'Integer' for other types. If the 'valueType' key
    is not present, it returns 'Unknown'.

    Parameters:
       metaData (dict): A dictionary containing metadata, including the 'valueType' of a feature.

    Returns:
       str: A string indicating the determined data type ('String', 'Integer', or 'Unknown').
    """
    if 'valueType' in metaData:
        return "String" if metaData["valueType"] == 'str' else "Integer"
    return "Unknown"


def processFeature(feature, featureType, featureMethod):
    """
    Processes a given feature by extracting metadata, description, and data type, and then
    compiles frequency data for different node types in a feature dictionary. Certain features
    are skipped based on their type. The processed data is added to a global feature dictionary.

    Parameters:
       feature (str): The name of the feature to be processed.
       featureType (str): The type of the feature ('Node' or 'Edge').
       featureMethod (function): A function to obtain feature data.

    Returns:
       None: The function updates a global dictionary with processed feature data and does not return anything.
    """
    
    # Obtain the meta data
    featureMetaData = featureMethod(feature).meta
    featureDescription = getFeatureDescription(featureMetaData)
    dataType = setDataType(featureMetaData)

    # Initialize dictionary to store feature frequency data
    featureFrequencyDict = {}

    # Skip the structural features otype (node) and oslots (edge)
    isStructural = (featureType == 'Node' and feature == 'otype') or (featureType == 'Edge' and feature == 'oslots')
    if not isStructural:
        for nodeType in F.otype.all:
            frequencyLists = featureMethod(feature).freqList(nodeType)
            if isinstance(frequencyLists, int):
                # An int result is a plain count rather than a list of (value, frequency) pairs
                if frequencyLists != 0:
                    featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': [("Link", frequencyLists)]}
            elif len(frequencyLists) != 0:
                featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': frequencyLists[:tableLimit]}

    # Add processed feature data to the main dictionary
    featureDict[feature] = {'name': feature, 'descr': featureDescription, 'type': featureType, 'datatype': dataType, 'freqlist': featureFrequencyDict}
    
########################################################
#                     MAIN FUNCTION                    #
########################################################

########################################################
#             Gather general information               #
########################################################

print('Gathering generic details')

# Initialize default values
corpusName = A.appName
liveName = ''
versionName = A.version

# Trying to locate corpus information
if A.provenance:
    for parts in A.provenance[0]: 
        if isinstance(parts, tuple):
            key, value = parts[0], parts[1]
            if verbose: print (f'General info: {key}={value}')
            if key == 'corpus': corpusName = value
            if key == 'version': versionName = value
            # value for live is a tuple
            if key == 'live': liveName=value[1]
if liveName is not None and len(liveName)>1:
    # a URL was found
    pageTitleMD = f'Doc4TF pages for [{corpusName}]({liveName}) (version {versionName})'
    pageTitleHTML = f'<h1>Doc4TF pages for <a href="{liveName}">{corpusName}</a> (version {versionName})</h1>'
else:
    # No URL found
    pageTitleMD = f'Doc4TF pages for {corpusName} (version {versionName})'
    pageTitleHTML = f'<h1>Doc4TF pages for {corpusName} (version {versionName})</h1>'

# Overwrite in case the user provided a custom title
if 'customPageTitleMD' in globals():
    pageTitleMD = customPageTitleMD
if 'customPageTitleHTML' in globals():
    pageTitleHTML = customPageTitleHTML

    
########################################################
#             Processing node features                 #
########################################################

print('Analyzing Node Features: ', end='')
for nodeFeature in Fall():
    if not verbose: print('.', end='')  # Progress indicator
    processFeature(nodeFeature, 'Node', Fs)
    if verbose: print(f'\nFeature {nodeFeature} = {featureDict[nodeFeature]}\n')  # Print feature data if verbose

########################################################
#             Processing edge features                 #
########################################################

print('\nAnalyzing Edge Features: ', end='')
for edgeFeature in Eall():
    if not verbose: print('.', end='')  # Progress indicator
    processFeature(edgeFeature, 'Edge', Es)
    if verbose: print(f'\nFeature {edgeFeature} = {featureDict[edgeFeature]}\n')  # Print feature data if verbose

print(f'\nFinished in {time.time() - overallTime:.2f} seconds.')

Gathering generic details
Analyzing Node Features: ..................................................
Analyzing Edge Features: ....
Finished in 12.62 seconds.
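
The title logic in the cell above walks through the first provenance record, which it treats as a sequence of `(key, value)` pairs where the value for `live` is itself a tuple holding the URL in second position. The sketch below replays that walk on invented data so the resulting title can be inspected without loading a corpus. The sample record, the URL and the function name `build_title` are hypothetical, and the real content of `A.provenance` may contain more keys.

```python
def build_title(provenance, default_corpus, default_version):
    """Return a Markdown page title derived from the first provenance record."""
    corpus, version, live = default_corpus, default_version, ""
    if provenance:
        for part in provenance[0]:
            if isinstance(part, tuple):
                key, value = part[0], part[1]
                if key == "corpus":
                    corpus = value
                elif key == "version":
                    version = value
                elif key == "live":
                    live = value[1]  # the value is a tuple; the URL is its second item
    if live:
        return f"Doc4TF pages for [{corpus}]({live}) (version {version})"
    return f"Doc4TF pages for {corpus} (version {version})"


# Invented provenance record, for illustration only
sample = (
    (
        ("corpus", "Demo corpus"),
        ("version", "0.1"),
        ("live", ("label", "https://example.org/demo")),
    ),
)

print(build_title(sample, "demo", "unknown"))
print(build_title((), "demo", "unknown"))
```

When no provenance is available the defaults are used and the title carries no link, which mirrors the `else` branch of the cell above.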


# 6 - Licence <a class="anchor" id="bullet6"></a>
##### [Back to TOC](#TOC)

Licensed under [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/Doc4TF/blob/main/LICENCE.md)