# Create markdown term list and vocabularies

The configuration is kept separate from the scripts and is written in YAML, so it is easy for people to edit. Because a single YAML file can hold multiple documents, the configurations for all vocabularies can live in the same file. I have chosen to keep the configuration as small as possible and to derive as much as possible from the term lists themselves, so the script checks the `type` of each term and whether certain columns are present. The term lists in this case are also in YAML, but they could just as easily have been CSV or JSON, because the term lists are turned into dictionaries and then into data frames.
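
To illustrate the last point, here is a minimal sketch of how a term list in CSV or JSON could be turned into the same kind of list of dictionaries that the YAML loader produces. The function names, the two sample terms, and the inline strings are assumptions made for this example; they are not part of the build scripts.

```python
import csv
import io
import json

# Hypothetical sample data: the same two terms, once as CSV and once as JSON.
CSV_TERMS = """localName,label,type
TaxonConcept,Taxon Concept,http://www.w3.org/2000/01/rdf-schema#Class
taxonName,Taxon Name,http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
"""

JSON_TERMS = """[
  {"localName": "TaxonConcept", "label": "Taxon Concept",
   "type": "http://www.w3.org/2000/01/rdf-schema#Class"},
  {"localName": "taxonName", "label": "Taxon Name",
   "type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"}
]"""


def terms_from_csv(text):
    """Return one dictionary per CSV row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def terms_from_json(text):
    """Return the list of term dictionaries stored in a JSON array."""
    return json.loads(text)


def add_namespace(terms, uri, prefix):
    """Tag every term with the namespace URI and prefix of its term list."""
    return [dict(term, namespace=uri, namespaceAlias=prefix) for term in terms]


csv_terms = add_namespace(
    terms_from_csv(CSV_TERMS), 'http://rs.tdwg.org/tcs/terms/', 'tcs')
json_terms = add_namespace(
    terms_from_json(JSON_TERMS), 'http://rs.tdwg.org/tcs/terms/', 'tcs')

# Both formats end up as the same list of dictionaries.
print(csv_terms == json_terms)
```

Either list could then be handed to the data frame constructor in the same way as the terms read from YAML.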

In [1]:
import yaml

configYaml = """
# config.yaml
---
headerFileName: termlist-header.md
footerFileName: termlist-footer.md
outFileName: ../master/README.md
termLists:
  - filename: tcs
    vann_preferredNamespacePrefix: tcs
    vann_preferredNamespaceUri: http://rs.tdwg.org/tcs/terms/
  - filename: dwc-for-tcs
    vann_preferredNamespacePrefix: dwc
    vann_preferredNamespaceUri: http://rs.tdwg.org/dwc/terms/
categories:
  - namespace: 'http://rs.tdwg.org/tcs/terms/TaxonConcept'
    label: Taxon Concept
    comments: ''
    display_id: taxonConcept
  - namespace: 'http://rs.tdwg.org/tcs/terms/TaxonRelationship'
    label: Taxon Relationship
    comments: ''
    display_id: taxonRelationship
  - namespace: 'http://rs.tdwg.org/tcs/terms/TaxonName'
    label: Taxon Name
    comments: ''
    display_id: taxonName
  - namespace: 'http://rs.tdwg.org/tcs/terms/NomenclaturalType'
    label: Nomenclatural Type
    comments: ''
    display_id: nomenclaturalType
---
headerFileName: taxon-concept-category-vocabulary-header.md
footerFileName: termlist-footer.md
outFileName: ../master/taxon-concept-category-vocabulary.md
termLists:
  - filename: tcsTaxonConceptCategory
    vann_preferredNamespacePrefix: tcscategory
    vann_preferredNamespaceUri: http://rs.tdwg.org/tcs-taxon-concept-category/values/
categories:
  - namespace: http://rs.tdwg.org/tcs-taxon-concept-category/values/
    label: Taxon Concept Category
---
headerFileName: taxon-relationship-type-vocabulary-header.md
footerFileName: termlist-footer.md
outFileName: ../master/taxon-relationship-vocabulary.md
termLists:
  - filename: tcsTaxonRelationshipType
    vann_preferredNamespacePrefix: tcsreltype
    vann_preferredNamespaceUri: http://rs.tdwg.org/tcs-taxon-relationship-type/values/
categories:
  - namespace: http://rs.tdwg.org/tcs-taxon-relationship-type/values/
    label: Taxon Relationship Type
---
headerFileName: nomenclatural-code-vocabulary-header.md
footerFileName: termlist-footer.md
outFileName: ../master/nomenclatural-code-vocabulary.md
termLists:
  - filename: tcsTaxonRelationshipType
    vann_preferredNamespacePrefix: tcsnomcode
    vann_preferredNamespaceUri: http://rs.tdwg.org/tcs-nomenclatural-code/values/
categories:
  - namespace: http://rs.tdwg.org/tcs-nomenclatural-code/values/
    label: Nomenclatural Code
---
headerFileName: nomenclatural-status-vocabulary-header.md
footerFileName: termlist-footer.md
outFileName: ../master/nomenclatural-status-vocabulary.md
termLists:
  - filename: tcsNomenclaturalStatus
    vann_preferredNamespacePrefix: tcsnomstat
    vann_preferredNamespaceUri: http://rs.tdwg.org/tcs-nomenclatural-status/values/
categories:
  - namespace: http://rs.tdwg.org/tcs-nomenclatural-status/values/
    label: Nomenclatural Status
---
headerFileName: rank-vocabulary-header.md
footerFileName: termlist-footer.md
outFileName: ../master/rank-vocabulary.md
termLists:
  - filename: tcsRank
    vann_preferredNamespacePrefix: tcsrank
    vann_preferredNamespaceUri: http://rs.tdwg.org/tcs-rank/values/
categories:
  - namespace: http://rs.tdwg.org/tcs-rank/values/
    label: Rank
---
headerFileName: type-of-type-vocabulary-header.md
footerFileName: termlist-footer.md
outFileName: ../master/type-of-type-vocabulary.md
termLists:
  - filename: tcsTypeOfType
    vann_preferredNamespacePrefix: tcstypeoftype
    vann_preferredNamespaceUri: http://rs.tdwg.org/tcs-type-of-type/values/
categories:
  - namespace: http://rs.tdwg.org/tcs-type-of-type/values/
    label: Type of Type
"""
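
Each YAML document above becomes one dictionary when the configuration is parsed. As a minimal sketch, a check like the following could be run on each of those dictionaries before any markdown is generated. `check_config` and its lists of required keys are assumptions for illustration, taken from the keys that appear in the documents above; the code in this notebook does not include such a check.

```python
# Keys that every document in the config above contains.
REQUIRED_KEYS = ('headerFileName', 'footerFileName', 'outFileName',
                 'termLists', 'categories')
TERM_LIST_KEYS = ('filename', 'vann_preferredNamespacePrefix',
                  'vann_preferredNamespaceUri')
CATEGORY_KEYS = ('namespace', 'label')


def check_config(config):
    """Return a list of problems found in one parsed config document."""
    problems = ['missing key: ' + key
                for key in REQUIRED_KEYS if key not in config]
    for position, term_list in enumerate(config.get('termLists', [])):
        for key in TERM_LIST_KEYS:
            if key not in term_list:
                problems.append(
                    'termLists[{}] is missing {}'.format(position, key))
    for position, category in enumerate(config.get('categories', [])):
        for key in CATEGORY_KEYS:
            if key not in category:
                problems.append(
                    'categories[{}] is missing {}'.format(position, key))
    return problems


# The rank vocabulary document, written out as the dictionary a YAML
# parser would return for it.
rank_config = {
    'headerFileName': 'rank-vocabulary-header.md',
    'footerFileName': 'termlist-footer.md',
    'outFileName': '../master/rank-vocabulary.md',
    'termLists': [{
        'filename': 'tcsRank',
        'vann_preferredNamespacePrefix': 'tcsrank',
        'vann_preferredNamespaceUri': 'http://rs.tdwg.org/tcs-rank/values/',
    }],
    'categories': [{
        'namespace': 'http://rs.tdwg.org/tcs-rank/values/',
        'label': 'Rank',
    }],
}

# A copy with one key removed, to show what a reported problem looks like.
broken_config = {key: value for key, value in rank_config.items()
                 if key != 'outFileName'}

print(check_config(rank_config))
print(check_config(broken_config))
```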

All functions are in a separate `vocab_build_tools.py` file, so they can be shared.

In [4]:
# vocab_build_tools.py

# #!/usr/bin/env python

import yaml  # used by dict_from_yaml; required when this file is imported as a module
import pandas as pd
import markdown

# create dictionary with terms from YAML file


def dict_from_yaml(termLists):
    dicts = []
    # 'term_list' avoids shadowing the built-in 'list'
    for term_list in termLists:
        path = '../master/{filename}.yaml'.format(filename=term_list['filename'])
        # the context manager closes the file once it has been parsed
        with open(path, newline='') as f:
            data = yaml.load(f, Loader=yaml.FullLoader)
        # tag every term with the namespace URI and prefix of its term list
        data = [dict(item,
                     namespace=term_list['vann_preferredNamespaceUri'],
                     namespaceAlias=term_list['vann_preferredNamespacePrefix'])
                for item in data]
        dicts += data
    return dicts

# create data frame from the list of term dictionaries


def create_df(termLists):
    # 'terms' avoids shadowing the built-in 'dict'
    terms = dict_from_yaml(termLists)
    return pd.DataFrame(terms)

# create index of terms


def create_index(config, merged_df):
    text = '### Index of terms\n\n'

    if len(config['categories']) > 1:
        text += '**classes**\n\n'

        items = []
        for index, row in merged_df[merged_df['type'].str.contains('Class')].iterrows():
            label = '{namespaceAlias}:{localName}'.format(
                namespaceAlias=row['namespaceAlias'], localName=row['localName'])
            anchor = '#{namespaceAlias}_{localName}'.format(
                namespaceAlias=row['namespaceAlias'], localName=row['localName'])
            item = '[{label}]({anchor})'.format(label=label, anchor=anchor)
            items.append(item)
        text += ' | '.join(items) + '\n\n'

    for category in config['categories']:
        if len(config['categories']) > 1:
            text += '**{label}**\n\n'.format(label=category['label'])
            filtered_df = merged_df[merged_df['organizedInClass']
                                    == category['namespace']]
        else:
            filtered_df = merged_df

        items = []
        for index, row in filtered_df.iterrows():
            if 'Class' in row['type'] or 'Property' in row['type']:
                label = '{namespaceAlias}:{localName}'.format(
                    namespaceAlias=row['namespaceAlias'],
                    localName=row['localName']
                )
            else:
                label = '{label}'.format(label=row['label'])

            anchor = '#{namespaceAlias}_{localName}'.format(
                namespaceAlias=row['namespaceAlias'], localName=row['localName'])

            if 'Class' not in row['type']:
                item = '[{label}]({anchor})'.format(label=label, anchor=anchor)
                items.append(item)
        text += ' | '.join(items) + '\n\n'

    return text

# create table cell


def table_cell(content, celltype='td', colspan=1):
    if colspan == 1:
        return '\t\t\t<{celltype}>{content}</{celltype}>'.format(content=content, celltype=celltype)
    else:
        return '\t\t\t<{celltype} colspan="{colspan}">{content}</{celltype}>'.format(content=content, celltype=celltype, colspan=colspan)

# create table row


def table_row(cells):
    return '\t\t<tr>\n{cells}\n\t\t</tr>\n'.format(cells='\n'.join(cells))

# create term table


def term_table(term):
    text = '<table>\n'

    # table header
    curie = '{namespaceAlias}:{localName}'.format(
        namespaceAlias=term['namespaceAlias'], localName=term['localName'])
    curieAnchor = curie.replace(':', '_')
    term_type = term['type'][term['type'].find('#')+1:]
    if term_type == 'Concept':
        tableHeader = '<a id="{anchor}"></a>{term_type} {curie} ({label})'.format(
            curie=curie, anchor=curieAnchor, term_type=term_type, label=term['label'])
    else:
        tableHeader = '<a id="{anchor}"></a>{term_type} {curie}'.format(
            curie=curie, anchor=curieAnchor, term_type=term_type)
    text += '\t<thead>\n'
    text += table_row([table_cell(tableHeader, celltype='th', colspan=2)])
    text += '\t</thead>\n'

    text += '\t<tbody>\n'

    # Term IRI
    uri = '{namespace}{localName}'.format(
        namespace=term['namespace'], localName=term['localName'])
    text += table_row([
        table_cell('Term IRI'),
        table_cell(uri)
    ])

    # Type
    text += table_row([
        table_cell('Type'),
        table_cell(term['type'])
    ])

    # Label
    text += table_row([
        table_cell('Label'),
        table_cell(term['label'])
    ])

    # Attributes
    if 'Property' in term['type']:
        required = "Yes" if term['required'] else "No"
        repeatable = "Yes" if term['repeatable'] else "No"
        attrs = '<b>required:</b> {required} — <b>repeatable:</b> {repeatable}'.format(
            required=required, repeatable=repeatable)
        text += table_row([
            table_cell(''),
            table_cell(attrs)
        ])

    # Definition
    definition = term['definition'] if term['definition'] else ""
    text += table_row([
        table_cell('Definition'),
        table_cell(markdown.markdown(definition))])

    # Usage
    usage = term['usage'] if term['usage'] else ""
    text += table_row([
        table_cell('Usage'),
        table_cell(markdown.markdown(usage))
    ])

    # Comments/Notes
    comments = term['notes'] if term['notes'] else ""
    text += table_row([
        table_cell('Comments'),
        table_cell(markdown.markdown(comments))
    ])
    
    # Examples
    if 'examples' in term:
        examples = term['examples'] if term['examples'] else ""
        text += table_row([
            table_cell('Examples'),
            table_cell(markdown.markdown(examples))
        ])
        

    # Controlled term
    if 'Concept' in term['type']:
        text += table_row([
            table_cell('Controlled value'),
            table_cell(term['controlled_value_string'])
        ])
    
    # Github issue
    if 'github' in term and term['github']:
        text += table_row([
            table_cell('GitHub issue'),
            table_cell('https://github.com/tdwg/tcs2/issues/{github}'.format(github=term['github']))
        ])

    text += '\t</tbody>\n'
    text += '</table>\n\n'
    return text

# create vocabulary


def create_vocab(config, merged_df):
    vocab = '### Vocabulary\n\n'
    for category in config['categories']:
        if len(config['categories']) > 1:
            vocab += '#### {label}\n\n'.format(label=category['label'])
            filtered_df = merged_df[merged_df['organizedInClass']
                                    == category['namespace']]
        else:
            filtered_df = merged_df
        for index, row in filtered_df.iterrows():
            vocab += term_table(row)
    return vocab
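
One caveat about the `term[...] if term[...] else ""` checks in `term_table`: when a key is present in some terms but absent from others, pandas fills the gap with `NaN`, and `NaN` is truthy, so it passes the check and reaches `markdown.markdown` as a float rather than a string. A minimal sketch of a stricter check follows. The helper name `text_or_empty` is hypothetical and not part of `vocab_build_tools.py`.

```python
import math


def text_or_empty(value):
    """Return value as a string, or '' when it is None or NaN."""
    if value is None:
        return ''
    if isinstance(value, float) and math.isnan(value):
        return ''
    return str(value)


missing = float('nan')  # what pandas stores for a key a term does not have

# The plain truthiness test lets NaN through...
naive = missing if missing else ""
print(isinstance(naive, float))

# ...while the explicit check turns it into an empty string.
print(repr(text_or_empty(missing)))
print(repr(text_or_empty(None)))
print(repr(text_or_empty('A **taxon concept**.')))
```

The same idea would apply to the `required` and `repeatable` flags, where a `NaN` value would otherwise be rendered as "Yes".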


This is all the code you need to create the markdown once everything else is in place:

In [5]:
# create markdown for term list and vocabularies
for config in yaml.load_all(configYaml, Loader=yaml.FullLoader):
    merged_df = create_df(config['termLists'])
    term_index = create_index(config, merged_df)
    vocab = create_vocab(config, merged_df)
    text = term_index + vocab

    # read the header and footer; the context managers close the files
    with open(config['headerFileName'], 'rt', encoding='utf-8') as headerObject:
        header = headerObject.read()

    with open(config['footerFileName'], 'rt', encoding='utf-8') as footerObject:
        footer = footerObject.read()

    # write header, generated markdown and footer to the output file
    output = header + text + footer
    with open(config['outFileName'], 'wt', encoding='utf-8') as outputObject:
        outputObject.write(output)
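
The read, concatenate, and write step at the end of the loop can also be expressed with `pathlib`. The following is a minimal sketch under stated assumptions: `assemble_page` is a hypothetical helper, and a temporary directory with made-up file contents stands in for the real header, footer, and output files.

```python
import tempfile
from pathlib import Path


def assemble_page(header_path, body, footer_path, out_path):
    """Write header + body + footer to out_path and return the text."""
    header = Path(header_path).read_text(encoding='utf-8')
    footer = Path(footer_path).read_text(encoding='utf-8')
    output = header + body + footer
    Path(out_path).write_text(output, encoding='utf-8')
    return output


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Stand-ins for the header and footer files named in the config.
    (tmp_dir / 'header.md').write_text('# Title\n\n', encoding='utf-8')
    (tmp_dir / 'footer.md').write_text('footer\n', encoding='utf-8')

    result = assemble_page(
        tmp_dir / 'header.md',
        '### Index of terms\n\n',  # stand-in for term_index + vocab
        tmp_dir / 'footer.md',
        tmp_dir / 'out.md',
    )
    written = (tmp_dir / 'out.md').read_text(encoding='utf-8')

print(written == result)
```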