# Create additional features (LXX)

## Table of contents <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet4">4 - Adding Features to currently loaded TF</a>
    * <a href="#bullet4x1">4.1 - Prepare metadata</a>
    * <a href="#bullet4x2">4.2 - Prepare featuredata</a>
    * <a href="#bullet4x3">4.3 - Link metadata to the featuredata</a>
    * <a href="#bullet4x4">4.4 - Save the features to files</a>
    * <a href="#bullet4x5">4.5 - Reload the features and check</a>
    * <a href="#bullet4x6">4.6 - Move files to proper location</a>
* <a href="#bullet5">5 - Required libraries</a>
* <a href="#bullet6">6 - Notebook details</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

This Jupyter Notebook creates additional features for the CenterBLC/LXX Text-Fabric dataset.

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [4]:
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [3]:
# load the LXX app and data
LXX = use('CenterBLC/LXX', hoist=globals())

**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 57 | 10941.98 | 100 |
| chapter | 1192 | 523.23 | 100 |
| verse | 30371 | 20.54 | 100 |
| subverse | 30419 | 20.5 | 100 |
| word | 623693 | 1.0 | 100 |


# 4 - Adding Features to currently loaded TF<a class="anchor" id="bullet4"></a>
##### [Back to TOC](#TOC)

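This section derives a new node feature, `lemma`, from the existing feature `bol_lexeme_dict` and adds it to the currently loaded dataset.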

## 4.1 - Prepare metadata<a class="anchor" id="bullet4x1"></a>

Make a dictionary with the metadata that will be visible at the top of the new feature file.

In [16]:
# Common metadata template function
# (the second parameter is named valueType to avoid shadowing the built-in `type`)
def createMetadata(description, valueType):
    return {
        'author': 'Rahlfs',
        'Source': 'https://github.com/eliranwong/LXX-Rahlfs-1935',
        'convertedBy': 'Tony Jurg',
        'website': 'https://github.com/tonyjurg/BHSaddons',
        'description': description,
        'coreData': 'LXX',
        'coreDataUrl': 'https://github.com/CenterBLC/LXX',
        'provenance': 'jupyter Notebook (TBA)',
        'version': '0.1',
        'license': 'Creative Commons Attribution-NonCommercial 4.0 International License',
        'licenseUrl': 'http://creativecommons.org/licenses/by-nc/4.0/',
        'valueType': valueType
    }

# Create metadata dictionaries using the function
lemmaMetadata = createMetadata('lemma of the word', 'str')
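
To make "visible at the top of the new feature file" concrete, here is a minimal sketch of how such a dictionary can be rendered as header lines: a feature kind line followed by one `@key=value` line per entry. The helper `render_tf_header` and the shortened `example_metadata` are hypothetical and only illustrate the idea. Text-Fabric writes the real header itself when saving and may add or order entries differently.

```python
def render_tf_header(metadata, kind="node"):
    """Render a header: a kind line, then one @key=value line per metadata entry."""
    lines = [f"@{kind}"]
    for key in sorted(metadata):
        lines.append(f"@{key}={metadata[key]}")
    return "\n".join(lines)


# Hypothetical, shortened metadata for illustration only
example_metadata = {
    "description": "lemma of the word",
    "valueType": "str",
    "version": "0.1",
}

print(render_tf_header(example_metadata))
```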

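## 4.2 - Prepare featuredata<a class="anchor" id="bullet4x2"></a>

Build a dictionary that maps each word node to its lemma, taken as the first comma-separated part of the existing feature `bol_lexeme_dict`.

In [12]: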
# Initialize the dictionary
lemmaData = {}

# Loop over all word nodes and populate the dictionary
for word in F.otype.s('word'):
    # keep only the first comma-separated part of the lexeme entry
    lemmaPart = F.bol_lexeme_dict.v(word).split(",")[0]
    lemmaData[word] = lemmaPart
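
The loop above calls `.split` directly on the feature value, which would fail for any node where the lookup returns `None`. As a hedged sketch, assuming some nodes might lack a value (the notebook itself does not report this), the extraction can be guarded as follows. The helper `first_lemma` and the `sample_values` dictionary are hypothetical stand-ins for `F.bol_lexeme_dict.v(word)`.

```python
def first_lemma(raw, separator=","):
    """Return the text before the first separator, or None if raw is empty or missing."""
    if not raw:
        return None
    return raw.split(separator)[0]


# Hypothetical sample values standing in for F.bol_lexeme_dict.v(word)
sample_values = {1: "ἐν,ἐν", 2: "ἀρχή", 3: None}

lemma_data = {}
for node, raw in sample_values.items():
    lemma = first_lemma(raw)
    if lemma is not None:  # skip nodes without a value
        lemma_data[node] = lemma

print(lemma_data)
```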

## 4.3 - Link metadata to the featuredata<a class="anchor" id="bullet4x3"></a>

Give the new feature a name, and connect it with the data dictionary and the metadata dictionary.

In [17]:
nodedata = {'lemma': lemmaData}
metadata = {'lemma': lemmaMetadata}
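
Because the two dictionaries are linked only by the shared feature name, a quick consistency check before saving can catch mismatches. The sketch below is a hypothetical helper, not part of Text-Fabric: it verifies that both dictionaries carry the same feature names and that the values match the declared `valueType` (`str` or `int`).

```python
def check_features(nodedata, metadata):
    """Return a list of problems found; an empty list means the dictionaries agree."""
    problems = []
    # feature names that occur in only one of the two dictionaries
    for name in sorted(set(nodedata) ^ set(metadata)):
        problems.append(f"{name}: present in only one of the two dictionaries")
    # for shared names, compare the values against the declared valueType
    for name in sorted(set(nodedata) & set(metadata)):
        value_type = metadata[name].get("valueType")
        expected = {"str": str, "int": int}.get(value_type)
        if expected is None:
            problems.append(f"{name}: unknown valueType {value_type!r}")
            continue
        bad = [n for n, v in nodedata[name].items() if not isinstance(v, expected)]
        if bad:
            problems.append(f"{name}: {len(bad)} value(s) are not of type {value_type}")
    return problems


# Toy data for illustration only
toy_nodedata = {"lemma": {1: "ἐν", 2: "ἀρχή"}}
toy_metadata = {"lemma": {"valueType": "str"}}
print(check_features(toy_nodedata, toy_metadata))
```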

## 4.4 - Save the features to files<a class="anchor" id="bullet4x4"></a>

Now we save the new features to file. If you do not specify a location, the files are saved in the folder from which the currently loaded dataset was read.

In [18]:
TF.save(nodeFeatures=nodedata, metaData=metadata)

  0.00s Exporting 1 node and 0 edge and 0 configuration features to ~/text-fabric-data/github/CenterBLC/LXX/tf/1935:
   |     0.71s T lemma                to ~/text-fabric-data/github/CenterBLC/LXX/tf/1935
  0.71s Exported 1 node features and 0 edge features and 0 config features to ~/text-fabric-data/github/CenterBLC/LXX/tf/1935


True

## 4.5 - Reload the features and check<a class="anchor" id="bullet4x5"></a>

In [19]:
# load the LXX app and data
LXX = use('CenterBLC/LXX', hoist=globals())

**Locating corpus resources ...**

   |     2.53s T lemma                from ~/text-fabric-data/github/CenterBLC/LXX/tf/1935


| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 57 | 10941.98 | 100 |
| chapter | 1192 | 523.23 | 100 |
| verse | 30371 | 20.54 | 100 |
| subverse | 30419 | 20.5 | 100 |
| word | 623693 | 1.0 | 100 |
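
The load log only shows that the `lemma` file was read. To check the content as well, one option is to count how many word nodes actually carry a value. The helper `coverage` below is a hypothetical sketch that works with any lookup function; it is demonstrated on toy data so that it runs without a loaded dataset.

```python
def coverage(lookup, nodes):
    """Return (number of nodes with a non-empty value, total number of nodes)."""
    total = 0
    filled = 0
    for node in nodes:
        total += 1
        if lookup(node):
            filled += 1
    return filled, total


# Toy lookup standing in for a loaded feature
toy_feature = {1: "ἐν", 2: "ἀρχή", 3: None}
filled, total = coverage(toy_feature.get, [1, 2, 3, 4])
print(f"{filled} of {total} nodes have a value")
```

With the dataset reloaded as above, the same helper could presumably be called as `coverage(F.lemma.v, F.otype.s('word'))`, assuming the new feature is exposed as `F.lemma`.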


## 4.6 - Move files to proper location<a class="anchor" id="bullet4x6"></a>

Now move the newly created files to their proper location, taking into account the dataset version (`1935` for the files saved above).
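
The move can also be scripted. The following is a minimal sketch, assuming each feature is stored as `<name>.tf` in the source folder; the function name `move_feature_files` and the folder names in the demonstration are hypothetical, and the demonstration runs inside a temporary directory so that no real data is touched.

```python
import shutil
import tempfile
from pathlib import Path


def move_feature_files(source_dir, target_dir, feature_names):
    """Move <name>.tf files from source_dir to target_dir; return the new paths."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in feature_names:
        source = source_dir / f"{name}.tf"
        if not source.is_file():
            raise FileNotFoundError(f"feature file not found: {source}")
        destination = target_dir / source.name
        shutil.move(str(source), str(destination))
        moved.append(destination)
    return moved


# Demonstration in a throwaway directory (all paths are hypothetical)
with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "saved"
    saved.mkdir()
    (saved / "lemma.tf").write_text("@node\n", encoding="utf-8")
    result = move_feature_files(saved, Path(tmp) / "addons" / "1935", ["lemma"])
    print([path.name for path in result])
```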

# 5 - Required libraries<a class="anchor" id="bullet5"></a>
##### [Back to TOC](#TOC)

The scripts in this notebook require (besides `text-fabric`) the following Python libraries to be installed in the environment:

    {none}

You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
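
To find out whether anything is missing before running the notebook, a small check like the one below can help. It is a sketch using only the standard library; the helper name `missing_modules` is hypothetical, and `tf` is the import name of Text-Fabric as used at the top of this notebook.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# An empty list means Text-Fabric is importable
print(missing_modules(["tf"]))
```

If the list is not empty, running `%pip install text-fabric` in a notebook cell should install the package into the environment of the running kernel.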

# 6 - Notebook details<a class="anchor" id="bullet6"></a>
##### [Back to TOC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.0</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>8 November 2024</td>
    </tr>
  </table>
</div>