# Morphkit usage examples (Greek and Latin)

## Table of contents (ToC)<a class="anchor" id="TOC"></a>

* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Setting up the environment</a>
* <a href="#bullet3">3 - Usage examples (Greek)</a>    
   * <a href="#bullet3x1">3.1 - Obtain Morpheus Analytic blocks for a Greek word</a> 
   * <a href="#bullet3x2">3.2 - Get the compact analysis results</a>
   * <a href="#bullet3x3">3.3 - Split the result in blocks</a>
   * <a href="#bullet3x4">3.4 - Add Part of Speech</a> 
   * <a href="#bullet3x5">3.5 - Add SP morph tag</a>
   * <a href="#bullet3x6">3.6 - Perform a full analysis</a>
* <a href="#bullet4">4 - Usage examples (Latin)</a>
   * <a href="#bullet4x1">4.1 - Obtain Morpheus Analytic blocks for a Latin word</a>
   * <a href="#bullet4x2">4.2 - Split the result and store in a dictionary</a>
   * <a href="#bullet4x3">4.3 - Or all in one go</a>
* <a href="#bullet5">5 - Additional features</a>
   * <a href="#bullet5x1">5.1 - Decode morphological tag</a>
   * <a href="#bullet5x2">5.2 - Compare morphological tags</a>
   * <a href="#bullet5x3">5.3 - Sort Morpheus blocks based on reference tag/lemma</a>
* <a href="#bullet6">6 - Attribution and footnotes</a>
* <a href="#bullet7">7 - Notebook version</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This notebook demonstrates how to use the Morphkit Python package to interface with the Morpheus API. It shows how to obtain the raw analytic blocks for a given word and how the results can be parsed and stored in a structured dictionary format. It also introduces the orchestrating function [`analyse_word_with_morpheus`](https://tonyjurg.github.io/morphkit/api/autogen/morphkit.analyse_word_with_morpheus.html), which performs the entire process in a single step. While Latin support is limited to basic output, Greek analyses include detailed morphological and part-of-speech tagging. The final section demonstrates a few auxiliary functions: [`decode_tag`](https://tonyjurg.github.io/morphkit/api/autogen/morphkit.decode_tag.html), [`compare_tags`](https://tonyjurg.github.io/morphkit/api/autogen/morphkit.compare_tags.html), and [`annotate_and_sort_analyses`](https://tonyjurg.github.io/morphkit/api/autogen/morphkit.annotate_and_sort_analyses.html).

# 2 - Setting up the environment <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

In [1]:
import sys
sys.path.insert(0, "../")    # relative to notebook dir
import morphkit

morphkit loaded


We also need to load the `beta_code` library:

In [2]:
import beta_code

# 3 - Usage examples (Greek) <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

Assume we would like to analyse the Greek word του. Since Morpheus expects its input in Beta Code, we first need to convert the word using the `beta_code` library.
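To give a rough idea of what this transliteration involves, the toy sketch below maps a handful of unaccented lowercase Greek letters to their Beta Code equivalents. The partial table and the helper `to_beta_code_sketch` are assumptions made for this illustration only; the `beta_code` library used in this notebook also handles accents, breathings and capitals.

```python
# Toy illustration only: a partial table of unaccented lowercase letters.
# The beta_code library itself covers far more than this.
GREEK_TO_BETA = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e",
    "ι": "i", "λ": "l", "ο": "o", "σ": "s", "τ": "t", "υ": "u",
}


def to_beta_code_sketch(word: str) -> str:
    """Transliterate letter by letter; characters outside the table raise KeyError."""
    return "".join(GREEK_TO_BETA[ch] for ch in word)


print(to_beta_code_sketch("του"))  # prints: tou
```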

## 3.1 - Obtain Morpheus Analytic blocks for a Greek word<a class="anchor" id="bullet3x1"></a>

In [3]:
# convert Unicode Greek to Beta Code
bc_word = beta_code.greek_to_beta_code('του')
print(bc_word)

tou


Now we need to define the API endpoint that provides the Morpheus service.

In [4]:
# host:port of the Morpheus API server (a private-network address here; adjust to your own setup)
api_endpoint="10.0.1.156:1315"

Once it is defined, we can pass it together with the Beta Code word to the `get_word_blocks` function.

In [5]:
print(morphkit.get_word_blocks(bc_word,api_endpoint))


:raw tou

:workw tou=
:lem o(
:prvb 				
:aug1 				
:stem tou=			indeclform	
:suff 				
:end 	 masc/neut gen sg		indeclform	article

:raw tou

:workw tou=
:lem ti/s
:prvb 				
:aug1 				
:stem tou=			indeclform	
:suff 				
:end 	 gen sg	attic	indeclform	indecl

:raw tou

:workw tou
:lem tis
:prvb 				
:aug1 				
:stem tou			enclitic indeclform	
:suff 				
:end 	 gen sg	attic	enclitic indeclform	indef
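The raw output above is line-oriented: every analysis opens with a `:raw` line, followed by `:key value` lines whose values contain tab-separated columns. As a minimal sketch of how such text could be grouped into blocks by hand, consider the following. The helper `split_blocks_sketch` and the shortened sample are assumptions for illustration and are not Morphkit's own implementation.

```python
# Sample shaped like the output above (shortened to two analyses and fewer fields);
# "\t" marks the tab-separated columns that appear in the Morpheus output.
SAMPLE = (
    "\n:raw tou\n\n:workw tou=\n:lem o(\n:stem tou=\t\t\tindeclform\t\n"
    ":end \t masc/neut gen sg\t\tindeclform\tarticle\n"
    "\n:raw tou\n\n:workw tou=\n:lem ti/s\n:stem tou=\t\t\tindeclform\t\n"
    ":end \t gen sg\tattic\tindeclform\tindecl\n"
)


def split_blocks_sketch(text: str) -> list[dict[str, str]]:
    """Group ':key value' lines into one dictionary per ':raw' block."""
    blocks: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.startswith(":"):
            continue  # skip the blank separator lines
        key, _, value = line[1:].partition(" ")
        if key == "raw":
            blocks.append({})  # every ':raw' line opens a new analysis
        if blocks:
            blocks[-1][key] = value
    return blocks


blocks = split_blocks_sketch(SAMPLE)
print(len(blocks), [b["lem"] for b in blocks])  # prints: 2 ['o(', 'ti/s']
```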



## 3.2 - Get the compact analysis results <a class="anchor" id="bullet3x2"></a>

If we would like to have the results in a more compact format, we can pass the option `output="compact"`:

In [6]:
print(morphkit.get_word_blocks(bc_word,api_endpoint,output="compact"))

tou
<NL>N tou=,o(  masc/neut gen sg		indeclform	article</NL><NL>N tou=,ti/s  gen sg	attic	indeclform	indecl</NL><NL>N tis  gen sg	attic	enclitic indeclform	indef</NL>
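The compact format wraps each analysis in `<NL>` and `</NL>` markers on a single line. As a hedged sketch (this is not a Morphkit function), a regular expression can pull the individual entries apart again. The string below reproduces the second output line, with tabs written as `\t`.

```python
import re

# Second line of the compact output shown above, with tabs written as "\t".
compact = (
    "<NL>N tou=,o(  masc/neut gen sg\t\tindeclform\tarticle</NL>"
    "<NL>N tou=,ti/s  gen sg\tattic\tindeclform\tindecl</NL>"
    "<NL>N tis  gen sg\tattic\tenclitic indeclform\tindef</NL>"
)

# Each analysis sits between <NL> and </NL>; a non-greedy match extracts them.
entries = re.findall(r"<NL>(.*?)</NL>", compact)
for entry in entries:
    print(entry.split("\t")[-1])  # last tab-separated column of each entry
```

This prints `article`, `indecl` and `indef`, the final column of each of the three analyses.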



## 3.3 - Split the result in blocks <a class="anchor" id="bullet3x3"></a>

If we want to turn the received blocks into Python dictionaries, we first need to split the raw text into blocks and then parse each block using the function `parse_word_block`.

In [7]:
import pprint as pp
raw_text = morphkit.get_word_blocks("tou", api_endpoint, language="greek")
blocks = morphkit.split_into_raw_blocks(raw_text)
all_parses = []
for block in blocks:
    # each block yields the raw word and a list of parse dictionaries
    raw_beta, parses = morphkit.parse_word_block(block)
    all_parses.append(parses)
    pp.pprint(parses)

[{'case': 'gen',
  'end_codes': ['article'],
  'end_flags': ['indeclform'],
  'gender': ['masc', 'neut'],
  'lem_base_bc': 'o(',
  'lem_base_uc': 'ὁ',
  'lem_full_bc': 'o(',
  'lem_full_uc': 'ὁ',
  'number': 'sg',
  'raw_bc': 'tou',
  'raw_uc': 'του',
  'stem_bc': 'tou=',
  'stem_flags': ['indeclform'],
  'stem_uc': 'τοῦ',
  'workw_bc': 'tou=',
  'workw_uc': 'τοῦ'}]
[{'case': 'gen',
  'dialects': ['attic'],
  'end_codes': ['indecl'],
  'end_flags': ['indeclform'],
  'lem_base_bc': 'ti/s',
  'lem_base_uc': 'τίς',
  'lem_full_bc': 'ti/s',
  'lem_full_uc': 'τίς',
  'number': 'sg',
  'raw_bc': 'tou',
  'raw_uc': 'του',
  'stem_bc': 'tou=',
  'stem_flags': ['indeclform'],
  'stem_uc': 'τοῦ',
  'workw_bc': 'tou=',
  'workw_uc': 'τοῦ'}]
[{'case': 'gen',
  'dialects': ['attic'],
  'end_codes': ['indef'],
  'end_flags': ['enclitic', 'indeclform'],
  'lem_base_bc': 'tis',
  'lem_base_uc': 'τις',
  'lem_full_bc': 'tis',
  'lem_full_uc': 'τις',
  'number': 'sg',
  'pron_type': 'indefinite',
  'raw

## 3.4 - Add part of speech <a class="anchor" id="bullet3x4"></a>

Now we can add part-of-speech details to the parse dictionaries we created. Note the addition of a 'pos' entry:

In [8]:
import pprint as pp
for [parse] in all_parses:  # each entry is a one-element list; unpack its single dictionary
    parse['pos'] = morphkit.analyse_pos(parse)
    pp.pprint(parse)

{'case': 'gen',
 'end_codes': ['article'],
 'end_flags': ['indeclform'],
 'gender': ['masc', 'neut'],
 'lem_base_bc': 'o(',
 'lem_base_uc': 'ὁ',
 'lem_full_bc': 'o(',
 'lem_full_uc': 'ὁ',
 'number': 'sg',
 'pos': 'article',
 'raw_bc': 'tou',
 'raw_uc': 'του',
 'stem_bc': 'tou=',
 'stem_flags': ['indeclform'],
 'stem_uc': 'τοῦ',
 'workw_bc': 'tou=',
 'workw_uc': 'τοῦ'}
{'case': 'gen',
 'dialects': ['attic'],
 'end_codes': ['indecl'],
 'end_flags': ['indeclform'],
 'lem_base_bc': 'ti/s',
 'lem_base_uc': 'τίς',
 'lem_full_bc': 'ti/s',
 'lem_full_uc': 'τίς',
 'number': 'sg',
 'pos': 'proper noun indeclinable',
 'raw_bc': 'tou',
 'raw_uc': 'του',
 'stem_bc': 'tou=',
 'stem_flags': ['indeclform'],
 'stem_uc': 'τοῦ',
 'workw_bc': 'tou=',
 'workw_uc': 'τοῦ'}
{'case': 'gen',
 'dialects': ['attic'],
 'end_codes': ['indef'],
 'end_flags': ['enclitic', 'indeclform'],
 'lem_base_bc': 'tis',
 'lem_base_uc': 'τις',
 'lem_full_bc': 'tis',
 'lem_full_uc': 'τις',
 'number': 'sg',
 'pos': 'indefinite pro

## 3.5 - Add SP morph tag <a class="anchor" id="bullet3x5"></a>

Now we can add the morphological tag to the parse dictionaries that we just augmented with part-of-speech data. Note the addition of a 'morph' entry:

In [9]:
import pprint as pp
for [parse] in all_parses:  # each entry is a one-element list; unpack its single dictionary
    parse['morph'] = morphkit.analyse_morph_tag(parse)
    pp.pprint(parse)

{'case': 'gen',
 'end_codes': ['article'],
 'end_flags': ['indeclform'],
 'gender': ['masc', 'neut'],
 'lem_base_bc': 'o(',
 'lem_base_uc': 'ὁ',
 'lem_full_bc': 'o(',
 'lem_full_uc': 'ὁ',
 'morph': 'T-GSM/T-GSN',
 'number': 'sg',
 'pos': 'article',
 'raw_bc': 'tou',
 'raw_uc': 'του',
 'stem_bc': 'tou=',
 'stem_flags': ['indeclform'],
 'stem_uc': 'τοῦ',
 'workw_bc': 'tou=',
 'workw_uc': 'τοῦ'}
{'case': 'gen',
 'dialects': ['attic'],
 'end_codes': ['indecl'],
 'end_flags': ['indeclform'],
 'lem_base_bc': 'ti/s',
 'lem_base_uc': 'τίς',
 'lem_full_bc': 'ti/s',
 'lem_full_uc': 'τίς',
 'morph': 'N-PRI-ATT',
 'number': 'sg',
 'pos': 'proper noun indeclinable',
 'raw_bc': 'tou',
 'raw_uc': 'του',
 'stem_bc': 'tou=',
 'stem_flags': ['indeclform'],
 'stem_uc': 'τοῦ',
 'workw_bc': 'tou=',
 'workw_uc': 'τοῦ'}
{'case': 'gen',
 'dialects': ['attic'],
 'end_codes': ['indef'],
 'end_flags': ['enclitic', 'indeclform'],
 'lem_base_bc': 'tis',
 'lem_base_uc': 'τις',
 'lem_full_bc': 'tis',
 'lem_full_uc':
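The tag `T-GSM/T-GSN` shown for the article suggests that one tag is produced per gender and that the alternatives are joined with a slash. The sketch below shows one plausible way to arrive at that string. The letter tables and the helper `article_tag_sketch` are assumptions for this example and do not describe how `analyse_morph_tag` works internally.

```python
# Letter codes assumed for this sketch; only the values needed here are listed.
CASE = {"gen": "G"}
NUMBER = {"sg": "S"}
GENDER = {"masc": "M", "neut": "N"}


def article_tag_sketch(parse: dict) -> str:
    """Build one 'T-<case><number><gender>' tag per gender and join them with '/'."""
    stem = "T-" + CASE[parse["case"]] + NUMBER[parse["number"]]
    return "/".join(stem + GENDER[g] for g in parse["gender"])


# Relevant fields of the first parse shown above.
article_parse = {"case": "gen", "number": "sg", "gender": ["masc", "neut"]}
print(article_tag_sketch(article_parse))  # prints: T-GSM/T-GSN
```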

## 3.6 - Perform a full analysis <a class="anchor" id="bullet3x6"></a>

The whole workflow for Greek words can be done in a single call to `morphkit.analyse_word_with_morpheus`:

In [10]:
import pprint as pp
api_endpoint="10.0.1.156:1315"
bc_word=beta_code.greek_to_beta_code(u'του') 
pp.pprint(morphkit.analyse_word_with_morpheus(bc_word,api_endpoint))

{'analyses': [{'case': 'gen',
               'end_codes': ['article'],
               'end_flags': ['indeclform'],
               'gender': ['masc', 'neut'],
               'lem_base_bc': 'o(',
               'lem_base_uc': 'ὁ',
               'lem_full_bc': 'o(',
               'lem_full_uc': 'ὁ',
               'morph': 'T-GSM/T-GSN',
               'number': 'sg',
               'pos': 'article',
               'raw_bc': 'tou',
               'raw_uc': 'του',
               'stem_bc': 'tou=',
               'stem_flags': ['indeclform'],
               'stem_uc': 'τοῦ',
               'workw_bc': 'tou=',
               'workw_uc': 'τοῦ'},
              {'case': 'gen',
               'dialects': ['attic'],
               'end_codes': ['indecl'],
               'end_flags': ['indeclform'],
               'lem_base_bc': 'ti/s',
               'lem_base_uc': 'τίς',
               'lem_full_bc': 'ti/s',
               'lem_full_uc': 'τίς',
               'morph': 'N-PRI-ATT',
            

# 4 - Usage examples (Latin) <a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

Morphkit also has **basic support** for Latin. This is limited to the functions that read the Morpheus analytic blocks and store their content in a Python dictionary.

## 4.1 - Obtain Morpheus Analytic blocks for a Latin word <a class="anchor" id="bullet4x1"></a>

In [11]:
print(morphkit.get_word_blocks("vini",api_endpoint,language="latin"))


:raw vini

:workw vi_ni_
:lem vinum
:prvb 				
:aug1 				
:stem vi_n	 neut			us_i
:suff 				
:end i_	 neut gen sg			us_i

:raw vini

:workw vi_ni_
:lem vinus
:prvb 				
:aug1 				
:stem vi_n	 masc			us_i
:suff 				
:end i_	 masc nom/voc pl			us_i

:raw vini

:workw vi_ni_
:lem vinus
:prvb 				
:aug1 				
:stem vi_n	 masc			us_i
:suff 				
:end i_	 masc gen sg			us_i



## 4.2 - Split the result into blocks and store in a dictionary <a class="anchor" id="bullet4x2"></a>

In [12]:
import pprint as pp
raw_text=morphkit.get_word_blocks("dico",api_endpoint,language="latin")
blocks=morphkit.split_into_raw_blocks(raw_text)
all_parses = []
for block in blocks:
    raw_beta, parses = morphkit.parse_word_block(block,"latin")
    all_parses.append(parses)
    pp.pprint(parses)

[{'end': 'o_',
  'end_codes': ['conj1'],
  'lem_base': 'dico#',
  'lem_full': 'dico#1',
  'lem_homonym': 1,
  'mood': 'ind',
  'number': 'sg',
  'person': '1st',
  'raw': 'dico',
  'stem': 'dic',
  'stem_codes': ['conj1', 'are_vb'],
  'tense': 'pres',
  'voice': 'act',
  'workw': 'dico_'}]
[{'end': 'o_',
  'end_codes': ['conj3'],
  'lem_base': 'dico#',
  'lem_full': 'dico#2',
  'lem_homonym': 2,
  'mood': 'ind',
  'number': 'sg',
  'person': '1st',
  'raw': 'dico',
  'stem': 'di_c',
  'stem_codes': ['conj3'],
  'tense': 'pres',
  'voice': 'act',
  'workw': 'di_co_'}]
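The two analyses of `dico` share most of their fields. A short comparison, sketched here with the two dictionaries copied from the output above, shows which keys actually distinguish them. The variable names are chosen for this example only.

```python
# The two analyses of "dico" as printed above.
conj1 = {'end': 'o_', 'end_codes': ['conj1'], 'lem_base': 'dico#',
         'lem_full': 'dico#1', 'lem_homonym': 1, 'mood': 'ind',
         'number': 'sg', 'person': '1st', 'raw': 'dico', 'stem': 'dic',
         'stem_codes': ['conj1', 'are_vb'], 'tense': 'pres',
         'voice': 'act', 'workw': 'dico_'}
conj3 = {'end': 'o_', 'end_codes': ['conj3'], 'lem_base': 'dico#',
         'lem_full': 'dico#2', 'lem_homonym': 2, 'mood': 'ind',
         'number': 'sg', 'person': '1st', 'raw': 'dico', 'stem': 'di_c',
         'stem_codes': ['conj3'], 'tense': 'pres',
         'voice': 'act', 'workw': 'di_co_'}

# Collect every key whose value differs between the two analyses.
differing = sorted(
    key for key in conj1.keys() | conj3.keys()
    if conj1.get(key) != conj3.get(key)
)
print(differing)
# prints: ['end_codes', 'lem_full', 'lem_homonym', 'stem', 'stem_codes', 'workw']
```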


## 4.3 - Or all in one go <a class="anchor" id="bullet4x3"></a>

In [13]:
import pprint as pp
api_endpoint="10.0.1.156:1315"
pp.pprint(morphkit.analyse_word_with_morpheus("puella",api_endpoint,language="latin"))

{'analyses': [{'case': 'abl',
               'end': 'a_',
               'end_codes': ['a_ae'],
               'gender': 'fem',
               'lem_base': 'puella',
               'lem_full': 'puella',
               'number': 'sg',
               'raw': 'puella',
               'stem': 'puell',
               'stem_codes': ['a_ae'],
               'stem_gender': 'fem',
               'workw': 'puella_'},
              {'case': ['nom', 'voc'],
               'end': 'a',
               'end_codes': ['a_ae'],
               'gender': 'fem',
               'lem_base': 'puella',
               'lem_full': 'puella',
               'number': 'sg',
               'raw': 'puella',
               'stem': 'puell',
               'stem_codes': ['a_ae'],
               'stem_gender': 'fem',
               'workw': 'puella'}],
 'blocks': 2,
 'raw_bc': 'puella'}


This allows for further processing using Python.
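As one small example of such processing, the sketch below flattens the result into (lemma, case, number) rows. It uses a shortened copy of the dictionary printed above; the helper `as_list` is an assumption introduced here to cope with fields such as 'case' that hold either a single string or a list.

```python
# Shortened copy of the result printed above (only the fields used below).
result = {
    "analyses": [
        {"case": "abl", "lem_full": "puella", "number": "sg"},
        {"case": ["nom", "voc"], "lem_full": "puella", "number": "sg"},
    ],
    "blocks": 2,
    "raw_bc": "puella",
}


def as_list(value):
    """Normalise a field that may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


# One row per possible case reading.
rows = [
    (analysis["lem_full"], case, analysis["number"])
    for analysis in result["analyses"]
    for case in as_list(analysis["case"])
]
for row in rows:
    print(row)
```

This prints three rows, one each for the ablative, nominative and vocative readings.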

# 5 - Additional features<a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

The Morphkit package also contains a few auxiliary functions: 
 - [`decode_tag`](https://tonyjurg.github.io/morphkit/api/autogen/morphkit.decode_tag.html)
 - [`compare_tags`](https://tonyjurg.github.io/morphkit/api/autogen/morphkit.compare_tags.html)
 - [`annotate_and_sort_analyses`](https://tonyjurg.github.io/morphkit/api/autogen/morphkit.annotate_and_sort_analyses.html)

These functions are demonstrated below.

## 5.1 - Decode morphological tag <a class="anchor" id="bullet5x1"></a>

This function creates a dictionary of morphological properties from a single morphological tag.

In [26]:
pp.pprint(morphkit.decode_tag('V-PAI-3S'))

{'Mood': 'Indicative',
 'Number': 'Singular',
 'Part of Speech': 'Verb',
 'Person': 'Third Person',
 'Tense': 'Present',
 'Voice': 'Active'}
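To make the structure of such a tag explicit, here is a toy decoder that handles only finite-verb tags of the shape seen above. The lookup tables contain just the codes needed for `V-PAI-3S`, and the helper `decode_verb_tag_sketch` is an assumption for illustration, not the implementation of `decode_tag`.

```python
# Only the codes needed for 'V-PAI-3S' are listed; a real decoder needs many more.
POS = {"V": "Verb"}
TENSE = {"P": "Present"}
VOICE = {"A": "Active"}
MOOD = {"I": "Indicative"}
PERSON = {"3": "Third Person"}
NUMBER = {"S": "Singular"}


def decode_verb_tag_sketch(tag: str) -> dict[str, str]:
    """Decode a tag shaped like 'V-<tense><voice><mood>-<person><number>'."""
    pos, tvm, pn = tag.split("-")
    return {
        "Part of Speech": POS[pos],
        "Tense": TENSE[tvm[0]],
        "Voice": VOICE[tvm[1]],
        "Mood": MOOD[tvm[2]],
        "Person": PERSON[pn[0]],
        "Number": NUMBER[pn[1]],
    }


decoded = decode_verb_tag_sketch("V-PAI-3S")
print(decoded["Tense"], decoded["Person"])  # prints: Present Third Person
```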


## 5.2 - Compare morphological tags <a class="anchor" id="bullet5x2"></a>

The following function computes the weighted similarity between two morphological tags. It generates a composite score that reflects how closely the items match across their individual morphological features. Note that its primary purpose is to support sorting, not to provide an exact measure of similarity for the underlying word forms in their syntactic contexts. Any such estimate necessarily remains an approximation.

In [27]:
pp.pprint(morphkit.compare_tags('N-VSM-A', 'N-GSM'))

{'details': {'Case': {'similarity': 0.2,
                      'tag1': 'Vocative',
                      'tag2': 'Genitive',
                      'weight': 4},
             'Gender': {'similarity': 1.0,
                        'tag1': 'Masculine',
                        'tag2': 'Masculine',
                        'weight': 2},
             'Mood': {'similarity': 0, 'tag1': '', 'tag2': '', 'weight': 0},
             'Number': {'similarity': 1.0,
                        'tag1': 'Singular',
                        'tag2': 'Singular',
                        'weight': 3},
             'Part of Speech': {'similarity': 1.0,
                                'tag1': 'Noun',
                                'tag2': 'Noun',
                                'weight': 10},
             'Person': {'similarity': 0, 'tag1': '', 'tag2': '', 'weight': 0},
             'Suffix': {'similarity': 0, 'tag1': '', 'tag2': '', 'weight': 0},
             'Tense': {'similarity': 0, 'tag1': '', 'tag2': '', 'weigh
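How the per-feature values are combined into the composite score is not visible in the output above. One common choice, assumed here purely for illustration, is a weighted mean of the per-feature similarities. The sketch uses only the four features whose non-zero weights are visible above; features with weight 0 would not change the result.

```python
# Similarity and weight per feature, copied from the visible part of the output.
details = {
    "Case": {"similarity": 0.2, "weight": 4},
    "Gender": {"similarity": 1.0, "weight": 2},
    "Number": {"similarity": 1.0, "weight": 3},
    "Part of Speech": {"similarity": 1.0, "weight": 10},
}


def weighted_mean_sketch(features: dict) -> float:
    """Weighted mean of similarities; an assumed formula, not Morphkit's own."""
    total_weight = sum(f["weight"] for f in features.values())
    if total_weight == 0:
        return 0.0  # no comparable features at all
    weighted_sum = sum(f["similarity"] * f["weight"] for f in features.values())
    return weighted_sum / total_weight


score = weighted_mean_sketch(details)
print(round(score, 3))
```

Under this assumption the heavily weighted part of speech dominates, so two nouns that differ only in case still score high.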

## 5.3 - Sort Morpheus blocks based on reference tag/lemma <a class="anchor" id="bullet5x3"></a>

Now for the most intricate function: sorting the Morpheus results. We need to provide both a morphological tag and a lemma as the sorting criteria.

In [25]:
full_analysis=morphkit.analyse_word_with_morpheus(beta_code.greek_to_beta_code(u'δία'),api_endpoint)
print ('Before sorting:')
pp.pprint (full_analysis)
N1904_morph="PREP"
N1904_lemma="διά"

full_analysis = morphkit.annotate_and_sort_analyses(
    full_analysis,
    reference_morph = N1904_morph,
    reference_lemma = N1904_lemma
)
print ('\nAfter sorting:')
pp.pprint (full_analysis)

Before sorting:
{'analyses': [{'case': ['nom', 'voc'],
               'dialects': ['epic'],
               'end_codes': ['os_h_on'],
               'end_flags': ['indeclform'],
               'gender': 'fem',
               'lem_base_bc': 'di=os',
               'lem_base_uc': 'δῖος',
               'lem_full_bc': 'di=os',
               'lem_full_uc': 'δῖος',
               'morph': 'N-PRI',
               'number': 'sg',
               'pos': 'proper noun indeclinable',
               'raw_bc': 'di/a',
               'raw_uc': 'δία',
               'stem_bc': 'di=a',
               'stem_flags': ['indeclform'],
               'stem_uc': 'δῖα',
               'workw_bc': 'di=a',
               'workw_uc': 'δῖα'},
              {'case': ['nom', 'voc', 'acc'],
               'end_bc': 'a',
               'end_codes': ['os_h_on'],
               'end_uc': 'α',
               'gender': 'neut',
               'lem_base_bc': 'di=os',
               'lem_base_uc': 'δῖος',
               'lem
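The general idea of ranking analyses against a reference can be sketched with plain Python sorting. The example below is a deliberately simplified assumption: it uses exact matches on lemma and tag only, whereas `annotate_and_sort_analyses` may well use finer-grained scoring. The shortened analyses are hypothetical, and the second entry is invented for this illustration.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalise to NFC so that visually identical accented letters compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical, shortened analyses; the second entry is invented for illustration.
analyses = [
    {"lem_full_uc": "δῖος", "morph": "N-PRI"},
    {"lem_full_uc": "διά", "morph": "PREP"},
]


def rank_key(analysis: dict, ref_lemma: str, ref_morph: str) -> tuple:
    """Rank by lemma match first, then by exact tag match."""
    lemma_match = nfc(analysis["lem_full_uc"]) == nfc(ref_lemma)
    morph_match = analysis["morph"] == ref_morph
    return (lemma_match, morph_match)


ranked = sorted(analyses, key=lambda a: rank_key(a, "διά", "PREP"), reverse=True)
print([a["morph"] for a in ranked])  # prints: ['PREP', 'N-PRI']
```

Normalising both lemmas before comparison guards against mismatches between the tonos and oxia encodings of the same accented vowel.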

# 6 - Attribution and footnotes <a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)



This Jupyter notebook is released under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/N1904addons/blob/main/LICENSE.md).

# 7 - Notebook version<a class="anchor" id="bullet7"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.1</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>5 July 2025</td>
    </tr>
  </table>
</div>