# The Problem of Loanwords: detection and remedies

# Detect Transliterated Greek
### In this notebook you will:
1. ETL (Extract-Transform-Load) data
    1. Extract data using a Corpus Reader object
    1. Transform data using a reusable, composable Scikit-Learn Pipeline object
    1. Load the data into a data matrix and a classifier
1. Train several classifiers and select the best algorithm for the data
1. Use GridSearch to tune hyperparameters to achieve the best algorithm performance
1. Use the classifier to predict whether or not a word is transliterated Greek
1. Examine unseen data to discover how well the classifier generalizes
1. Save the classifier for reuse
1. Record the trained classifier model's provenance
## Why? 
### To build a high-quality word embedding, you will probably want to filter out Greek words that appear transliterated into the Latin alphabet. Latin authors often quote transliterated Greek words; some of them are also valid Latin words, but most of them pollute a computational view of the language.
#### The training data sets are the works of Vergil and Eutropius, plus Plato's Apologia transliterated into Latin characters. We will use our classifier to detect transliterated Greek words in the corpus of Cicero, and we will assess the classifier's effectiveness on the entire Latin Library corpus.

In [2]:
%load_ext autoreload
%autoreload 2
%doctest_mode on
%matplotlib inline
import warnings
warnings.simplefilter('ignore') # quiet warnings for presentation purposes only

Exception reporting mode: Plain
Doctest mode is: ON


### Imports

In [12]:
import datetime
import glob
import json
import logging
import multiprocessing
import os
from copy import deepcopy

from tqdm import tqdm
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.ensemble import BaggingClassifier, ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.extmath import density
from sklearn.model_selection import cross_val_score
from sklearn.dummy import DummyClassifier
from joblib import dump, load  # sklearn.externals.joblib is deprecated and removed in newer scikit-learn releases
import sklearn
from sklearn.preprocessing import FunctionTransformer, LabelEncoder
from cltk.prosody.latin.scansion_constants import ScansionConstants
from cltk.prosody.latin.string_utils import remove_punctuation_dict
from cltk.corpus.readers import get_corpus_reader
from cltk.utils.featurization import word_to_features
from cltk.utils.file_operations import md5
from cltk.utils.matrix_corpus_fun import (
    distinct_words,
    separate_camel_cases,
    drop_empty_lists,
    drop_non_lower,
    drop_arabic_numeric,
    drop_all_caps,
    drop_empty_strings,
    jv_transform,
    splice_hyphens,
    accept_editorial,
    profile_chars,
    demacronize,
    drop_enclitics,
    drop_fringe_punctuation,
    divide_separate_words,
    drop_all_punctuation)
from building_language_model.aeoe_replacer import aeoe_transform

### Turn on logging, primarily so that library methods may report warnings

In [13]:
LOG = logging.getLogger('make_model')
LOG.addHandler(logging.NullHandler())
logging.basicConfig(level=logging.INFO)

### Define a corpus reader and restrict its `_fileids` to the texts of Vergil and Eutropius

In [18]:
reader = get_corpus_reader('latin_text_latin_library', language='latin')
ALL_FILE_IDS = list(reader.fileids())
good_files = [ file for file in ALL_FILE_IDS
              if 'vergil' in file or
              'eutropius' in file]

LOG.info('available good files %s', len(good_files))
reader._fileids = good_files
good_files
                   

INFO:make_model:available good files 36


['eutropius/eutropius1.txt', 'eutropius/eutropius10.txt', 'eutropius/eutropius2.txt', 'eutropius/eutropius3.txt', 'eutropius/eutropius4.txt', 'eutropius/eutropius5.txt', 'eutropius/eutropius6.txt', 'eutropius/eutropius7.txt', 'eutropius/eutropius8.txt', 'eutropius/eutropius9.txt', 'vergil/aen1.txt', 'vergil/aen10.txt', 'vergil/aen11.txt', 'vergil/aen12.txt', 'vergil/aen2.txt', 'vergil/aen3.txt', 'vergil/aen4.txt', 'vergil/aen5.txt', 'vergil/aen6.txt', 'vergil/aen7.txt', 'vergil/aen8.txt', 'vergil/aen9.txt', 'vergil/ec1.txt', 'vergil/ec10.txt', 'vergil/ec2.txt', 'vergil/ec3.txt', 'vergil/ec4.txt', 'vergil/ec5.txt', 'vergil/ec6.txt', 'vergil/ec7.txt', 'vergil/ec8.txt', 'vergil/ec9.txt', 'vergil/geo1.txt', 'vergil/geo2.txt', 'vergil/geo3.txt', 'vergil/geo4.txt']

### Define a custom Scikit-learn Pipeline, and call the CorpusReader `sents()` method to process the texts
#### The functions used in the pipelines are doctest documented in the `corpus_cleaning` module
#### The functions and their order were developed iteratively by running the pipelines on actual data and carefully inspecting the results.

In [20]:
process_text_model = Pipeline([
    ('separate_camel_cases', FunctionTransformer(separate_camel_cases, validate=False)),
    ('splice_hyphens', FunctionTransformer(splice_hyphens, validate=False)),
    ('jv_transform', FunctionTransformer(jv_transform, validate=False)),
    ('aeoe_transform', FunctionTransformer(aeoe_transform, validate=False)),
    ('accept_editorial', FunctionTransformer(accept_editorial, validate=False)),
    ('drop_enclitics', FunctionTransformer(drop_enclitics, validate=False)),
    ('drop_fringe_punctuation', FunctionTransformer(drop_fringe_punctuation, validate=False)),
    ('drop_all_punctuation', FunctionTransformer(drop_all_punctuation, validate=False)),
    ('drop_non_lower', FunctionTransformer(drop_non_lower, validate=False)),
    ('drop_arabic_numeric', FunctionTransformer(drop_arabic_numeric, validate=False)),
    ('drop_all_caps', FunctionTransformer(drop_all_caps, validate=False)),
    ('divide_separate_words', FunctionTransformer(divide_separate_words, validate=False)),
    ('drop_empty_lists', FunctionTransformer(drop_empty_lists, validate=False)),
    ('drop_empty_strings', FunctionTransformer(drop_empty_strings, validate=False))])

X = process_text_model.fit_transform(tqdm(list(reader.sents())))

100%|██████████| 6580/6580 [00:00<00:00, 38023.29it/s]
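
Because every step is a stateless `FunctionTransformer`, the pipeline in effect applies each function, in order, to the output of the previous one. The following is a minimal standard-library sketch of that idea. The three cleaning functions are hypothetical stand-ins written for illustration, not the CLTK implementations, and the data is assumed to be a list of sentences, each a list of words.

```python
from functools import reduce

# Hypothetical stand-ins for the cleaning steps; these are NOT the CLTK functions.
def j_to_i(sentences):
    """Replace j with i in every word."""
    return [[word.replace('j', 'i') for word in sentence] for sentence in sentences]

def strip_fringe_punctuation(sentences):
    """Remove punctuation from the start and end of every word."""
    return [[word.strip('.,;:!?') for word in sentence] for sentence in sentences]

def drop_empty(sentences):
    """Drop empty strings, then drop sentences left with no words."""
    cleaned = [[word for word in sentence if word] for sentence in sentences]
    return [sentence for sentence in cleaned if sentence]

def run_steps(steps, sentences):
    """Feed the output of each step into the next, as a pipeline of stateless steps does."""
    return reduce(lambda data, step: step(data), steps, sentences)

steps = [j_to_i, strip_fringe_punctuation, drop_empty]
sample = [['arma', 'virumque', 'cano,'], ['...'], ['jam', 'satis.']]
cleaned = run_steps(steps, sample)
print(cleaned)
```

Order matters in this sketch: punctuation must be stripped before the empty strings and empty sentences it leaves behind can be dropped, which is the same reason the `drop_empty_*` steps come last in the pipeline above.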


### Analyze the resulting matrix by profiling the character occurrences, go back and adjust the pipeline as necessary, and turn the output into a set of distinct words

In [22]:
char_count = profile_chars(X)
print('Character distribution profile, total chars:', sum(char_count.values()))
print(char_count)
distinct_good_latin = distinct_words(X)
print('Number of distinct words in Eutropius/Vergil sample', len(distinct_good_latin))

Character distribution profile, total chars: 590072
Counter({'e': 69370, 'i': 59205, 'u': 56178, 'a': 55052, 't': 47491, 's': 44518, 'r': 39859, 'n': 35666, 'm': 30375, 'o': 29541, 'c': 22855, 'l': 19290, 'p': 15088, 'd': 14245, 'q': 9674, 'b': 8559, 'g': 7453, 'f': 5935, 'h': 4656, 'x': 2791, 'A': 1653, 'y': 1152, 'T': 1088, 'P': 987, 'C': 915, 'I': 898, 'M': 690, 'S': 685, 'R': 543, 'D': 530, 'L': 511, 'H': 508, 'N': 383, 'E': 348, 'U': 290, 'G': 267, 'O': 222, 'B': 198, 'F': 159, 'Q': 148, 'z': 39, 'Z': 22, 'K': 13, 'X': 12, 'k': 10})
Number of distinct words in Eutropius/Vergil sample 22899
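
The profile is useful because any character outside the expected alphabet points to a cleaning step that is missing or misordered. Here is a minimal sketch of what such profiling could look like. `count_chars` and `unique_words` are hypothetical helpers written for illustration; they are assumptions about the behavior of `profile_chars` and `distinct_words`, not their actual source.

```python
from collections import Counter
import string

def count_chars(sentences):
    """Count every character across all words of all sentences."""
    return Counter(char for sentence in sentences for word in sentence for char in word)

def unique_words(sentences):
    """Collect the distinct words of a list of sentences into a set."""
    return {word for sentence in sentences for word in sentence}

# One word deliberately keeps its trailing comma to simulate an incomplete cleanup.
sample = [['arma', 'uirumque', 'cano,'], ['Troiae', 'qui', 'primus', 'ab', 'oris']]
counts = count_chars(sample)
unexpected = {char for char in counts if char not in string.ascii_letters}

print(counts.most_common(3))
print('Unexpected characters:', unexpected)
print('Distinct words:', len(unique_words(sample)))
```

A non-empty `unexpected` set is the signal to go back and adjust the pipeline before building features.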


### Load the transliterated Greek examples, profile the raw character counts, and go back and tune the pipeline if necessary

In [23]:
transliterated_greek_file = 'greek.transliterated.plato.apologia.txt'   
greek_transliterated_X = []

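# Use a distinct name so the corpus `reader` is not shadowed; the file contains macrons, so read it as UTF-8
with open(transliterated_greek_file, 'rt', encoding='utf-8') as greek_file:
    greek_transliterated_X.append(greek_file.read().split())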

print('Unprocessed character distribution profile of transliterated Greek',
      profile_chars(greek_transliterated_X))

Unprocessed character distribution profile of transliterated Greek Counter({'i': 4480, 'a': 4459, 'e': 4386, 'o': 4298, 't': 3851, 'n': 3647, 's': 2523, 'h': 2279, 'u': 2200, 'ō': 1544, 'p': 1518, 'k': 1503, 'ē': 1447, 'm': 1435, 'l': 1292, 'r': 1181, 'd': 1092, 'g': 836, ',': 805, '.': 263, "'": 205, 'b': 170, ':': 138, 'x': 126, 'z': 110, ';': 82, 'A': 74, '—': 53, '’': 43, 'M': 36, '‘': 32, 'S': 24, 'D': 11, 'K': 11, 'H': 11, 'P': 10, 'T': 9, 'L': 7, 'E': 3, 'O': 3, 'G': 1, 'Ē': 1, 'N': 1, 'R': 1})


### Process the transliterated Greek examples with another custom Scikit-learn Pipeline, analyze character profiles for tuning, and create a set of distinct words, with and without macrons

In [24]:
process_greek_transliterated_model = Pipeline([
    ('splice_hyphens', FunctionTransformer(splice_hyphens, validate=False)),
    ('divide_separate_words', FunctionTransformer(divide_separate_words, validate=False)),
    ('drop_fringe_punctuation', FunctionTransformer(drop_fringe_punctuation, validate=False)),
    ('drop_all_punctuation', FunctionTransformer(drop_all_punctuation, validate=False)),
    ('drop_empty_strings', FunctionTransformer(drop_empty_strings, validate=False)),
    ('drop_empty_lists', FunctionTransformer(drop_empty_lists, validate=False))])

greek_X = process_greek_transliterated_model.fit_transform(greek_transliterated_X)
print('Character distribution profile of transliterated Greek', profile_chars(greek_X))
distinct_transliterated_greek_examples = distinct_words(greek_X)
print('distinct_transliterated_greek_examples', len(distinct_transliterated_greek_examples))
demacronized_greek_X = demacronize(greek_X)
distinct_transliterated_greek_examples = distinct_transliterated_greek_examples | distinct_words(
    demacronized_greek_X)
print('distinct_transliterated_greek_examples with & w/out macrons',
      len(distinct_transliterated_greek_examples))

Character distribution profile of transliterated Greek Counter({'i': 4480, 'a': 4459, 'e': 4386, 'o': 4298, 't': 3851, 'n': 3647, 's': 2523, 'h': 2279, 'u': 2200, 'ō': 1544, 'p': 1518, 'k': 1503, 'ē': 1447, 'm': 1435, 'l': 1292, 'r': 1181, 'd': 1092, 'g': 836, 'b': 170, 'x': 126, 'z': 110, 'A': 74, 'M': 36, 'S': 24, 'D': 11, 'K': 11, 'H': 11, 'P': 10, 'T': 9, 'L': 7, 'E': 3, 'O': 3, 'G': 1, 'Ē': 1, 'N': 1, 'R': 1})
distinct_transliterated_greek_examples 2406
distinct_transliterated_greek_examples with & w/out macrons 3356
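
Adding the macron-free variants grows the example set because Latin texts may carry the same Greek word with or without length marks. The sketch below shows one way macrons can be removed with the standard library. `strip_macrons` is a hypothetical helper and an assumption about the general technique; it is not the CLTK `demacronize` implementation, and the sample words are made up for illustration.

```python
import unicodedata

COMBINING_MACRON = '\u0304'

def strip_macrons(word):
    """Decompose the word, drop combining macrons, and recompose it."""
    decomposed = unicodedata.normalize('NFD', word)
    without_macrons = ''.join(char for char in decomposed if char != COMBINING_MACRON)
    return unicodedata.normalize('NFC', without_macrons)

words = {'sōkratēs', 'andres', 'egō'}
# Union of the original forms and their macron-free variants.
expanded = words | {strip_macrons(word) for word in words}
print(sorted(expanded))
```

Words without macrons map to themselves, so the union only grows by the number of words that actually contained a macron.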


### Remove from the transliterated Greek words any that also appear in the Latin corpus

In [27]:
shared_words = distinct_transliterated_greek_examples & distinct_good_latin
print('Shared_words:', shared_words)
only_greek_transliterated = distinct_transliterated_greek_examples - shared_words
print('Number of distinct transliterated Greek without matching words in the Latin corpus',
      len(only_greek_transliterated))
# compare with the Latin corpus; should probably add some, e.g.: eis

Shared_words: {'lego', 'Orphei', 'dia', 'ito', 'arista', 'heroas', 'hoste', 'ei', 'par', 'esto', 'tonde', 'deo', 'para', 'pater', 'Rhadamanthus', 'polo', 'ara', 'de', 'te', 'Troia', 'meta', 'o', 'iste', 'se', 'tot', 'ergo', 'hostis', 'en', 'duo', 'ne', 'hora', 'ex', 'no', 'porro', 'aedes', 'asto', 'pateras', 'pro', 'omen', 'Di', 'horas', 'an', 'dei', 'este', 'lege', 'di', 'ego', 'hos', 'Salamina', 'Minos', 'ea', 'me', 'nux'}
Number of distinct transliterated Greek without matching words in the Latin corpus 3303


### Create a simple data matrix of the single words, transliterated Greek examples followed by the Latin words

In [28]:
X = [list(only_greek_transliterated) + list(distinct_good_latin)]
len(X[0])

26202

### Before we transform our data matrix into a data feature matrix, let's check on the max word lengths

In [30]:
def get_max_word_len(col):
    max_word = 0
    for word in col:
        max_word = max(len(word), max_word)
    return max_word

print('Max word length in Eutropius/Vergil sample', get_max_word_len(distinct_good_latin))
print('Max word length in transliterated Greek sample', get_max_word_len(only_greek_transliterated))
max_len = max(20, max(get_max_word_len(only_greek_transliterated), get_max_word_len(distinct_good_latin)))
# for testing the feature making function below set 
# max_len=17

Max word length in Eutropius/Vergil sample 17
Max word length in transliterated Greek sample 18


In [31]:
# This function is included in corpus_cleaning.py but we include it here for easy reference
def word_to_features(word, max_word_length=20):
    """
    Convert a single word into an array of numbers based on character ordinals, with padding
    :param word: a single word
    :param max_word_length: the maximum word length for the feature array
    :return: a list of integers padded to the max word length

    >>> word_to_features('far', 20)
    [114, 97, 102, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32]
    """
    if len(word) > max_word_length:
        LOG.warning('Excessive word length {} for {}, truncating to {}'.format(len(word), word,
                                                                               max_word_length))
        word = word[:max_word_length]
    word = list(word)
    word.reverse() #: encourage aligning on word endings if possible
    return [ord(c) for c in "".join(word).ljust(max_word_length, ' ')]
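
The reversal matters because Latin and Greek inflections sit at the end of a word. The following sketch compares a forward and a reversed encoding to show how reversal puts shared endings into the same feature columns. `encode_forward` and `encode_reversed` are hypothetical helpers for illustration (no truncation or logging), and the two example verbs are chosen only because they share the ending "bant".

```python
def encode_forward(word, width=8):
    """Character ordinals in reading order, right-padded with spaces."""
    return [ord(char) for char in word.ljust(width)]

def encode_reversed(word, width=8):
    """Character ordinals of the reversed word, right-padded with spaces."""
    return [ord(char) for char in word[::-1].ljust(width)]

first, second = 'amabant', 'regebant'

# Reversed: the shared ending occupies columns 0-3 in both rows.
print(encode_reversed(first)[:4], encode_reversed(second)[:4])
# Forward: the same columns hold unrelated stem characters.
print(encode_forward(first)[:4], encode_forward(second)[:4])
```

Tree-based classifiers split on individual columns, so aligned endings give them a consistent place to look for suffix patterns regardless of word length.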


In [32]:
all_y = np.array([1] * len(only_greek_transliterated) + [0] * len(distinct_good_latin), dtype=float)
print('y shape', all_y.shape)
# We use a label encoder to automatically capture the range of values for provenance
label_encoder = LabelEncoder()
label_encoder.fit(all_y)
all_words = list(only_greek_transliterated) + list(distinct_good_latin)
all_X = np.array([word_to_features(word, max_len) for word in all_words])
print('X shape', all_X.shape)
num_samples = all_y.shape[0] # to be used later by model provenance
num_features = all_X.shape[1] # to be used later by model provenance

y shape (26202,)
X shape (26202, 20)


### Train a DummyClassifier to establish the baseline that we must beat

In [33]:
dummy = DummyClassifier(strategy='stratified', random_state=0)
features_train, features_test, target_train, target_test = train_test_split(all_X, all_y,
                                                                            random_state=0)
dummy.fit(features_train, target_train)
dummy_score = dummy.score(features_test, target_test)
print('Dummy classifier: {}'.format(dummy_score))

Dummy classifier: 0.7742329415356434
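
The baseline can be sanity-checked by hand. A stratified dummy guesses each class with its training frequency, so its expected accuracy is the sum of the squared class proportions. The sketch below works this out from the class counts printed earlier (3,303 Greek and 22,899 Latin words). It uses the full data set rather than the train/test split, so it is an approximation of the score above and not a reproduction of it.

```python
n_greek = 3303   # distinct transliterated Greek words (label 1)
n_latin = 22899  # distinct Latin words (label 0)
total = n_greek + n_latin

p_greek = n_greek / total
p_latin = n_latin / total

# A stratified guesser is right when it draws the true class: p^2 + (1 - p)^2.
expected_stratified = p_greek ** 2 + p_latin ** 2
# Always predicting the majority class is right p_latin of the time.
majority_baseline = max(p_greek, p_latin)

print('Expected stratified accuracy: {:.4f}'.format(expected_stratified))
print('Majority-class accuracy:      {:.4f}'.format(majority_baseline))
```

Under these assumptions the majority-class baseline is the stricter of the two, so it is worth keeping in mind when reading accuracy scores on a class distribution this skewed.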


### Train and classify the data using several classifiers, printing the cross-validation scores

In [35]:
classifiers = [
    MultinomialNB,
    GaussianNB,
    KNeighborsClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier
]
for cls in tqdm(classifiers):
    scores = cross_val_score(cls(), all_X, all_y,
                             scoring='accuracy',
                             n_jobs=multiprocessing.cpu_count(),
                             cv=5)
    print('{} {} {}'.format(str(cls), scores.mean(), scores))

 33%|███▎      | 2/6 [00:00<00:00, 14.13it/s]

<class 'sklearn.naive_bayes.MultinomialNB'> 0.7741388912184812 [0.778859   0.76874642 0.78172105 0.7730916  0.76827639]
<class 'sklearn.naive_bayes.GaussianNB'> 0.8919550476176541 [0.89105133 0.89047892 0.89601221 0.88244275 0.89979004]


 50%|█████     | 3/6 [00:01<00:01,  2.47it/s]

<class 'sklearn.neighbors.classification.KNeighborsClassifier'> 0.9197773125067078 [0.92081664 0.92043503 0.91566495 0.91946565 0.92250429]


 67%|██████▋   | 4/6 [00:02<00:01,  1.97it/s]

<class 'sklearn.ensemble.bagging.BaggingClassifier'> 0.956949732157588 [0.95401641 0.95745087 0.96126693 0.95629771 0.95571674]


 83%|████████▎ | 5/6 [00:02<00:00,  2.34it/s]

<class 'sklearn.ensemble.forest.ExtraTreesClassifier'> 0.9439356081672641 [0.94123259 0.94428544 0.94504865 0.94561069 0.94350067]


100%|██████████| 6/6 [00:02<00:00,  2.59it/s]

<class 'sklearn.ensemble.forest.RandomForestClassifier'> 0.9520266214577967 [0.95000954 0.95153597 0.95306239 0.95343511 0.95209009]





#### Note: Training on 10% of Livy yielded a dummy classifier score of 50% and a BaggingClassifier score of 90%. With more data, the dummy baseline rises, but so does the ceiling of the classifier's ultimate score.
### Run Grid Search to optimize one of the best classifiers

In [36]:
grids = GridSearchCV(cv=5, error_score='raise',
                     estimator=BaggingClassifier(
                         base_estimator=DecisionTreeClassifier(class_weight=None,
                                                               criterion='gini',
                                                               max_depth=None,
                                                               max_features=None,
                                                               max_leaf_nodes=None,
                                                               min_samples_leaf=1,
                                                               min_samples_split=2,
                                                               min_weight_fraction_leaf=0.0,
                                                               presort=False,
                                                               random_state=1)), n_jobs=-1,
                     param_grid={
                         # Here's how to test variations on the super class estimator
                         #     'base_estimator__criterion': ['gini'], # 'entropy'],
                         'n_estimators': [10, 100, 200],
                         # Here are some other ranges tested
                     #  150, 250, 300], # 50, 70, 100, 150, 200, 250, 300],
                         'max_features': [0.2, 0.4, 0.7, 1.0],
                         'max_samples': [0.5, 1.0]  # 0.6, 0.8,
                     })
grids.fit(all_X, all_y)
print('Best score: {}'.format(grids.best_score_))
print('Best params: {}'.format(grids.best_params_))

# Best score: 0.9589360709286048
# Best params: {'base_estimator__criterion': 'gini', 'max_features': 0.4, 'max_samples': 1.0, 'n_estimators': 200}


Best score: 0.9667964277536066
Best params: {'max_features': 0.7, 'max_samples': 1.0, 'n_estimators': 100}
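
Grid search is exhaustive, so its cost is easy to estimate before running it. The sketch below enumerates the same parameter grid with the standard library to count the candidate settings and the model fits that 5-fold cross-validation implies. It only illustrates the bookkeeping and does not train anything; the variable names are chosen for illustration.

```python
from itertools import product

param_grid = {
    'n_estimators': [10, 100, 200],
    'max_features': [0.2, 0.4, 0.7, 1.0],
    'max_samples': [0.5, 1.0],
}
cv_folds = 5

names = sorted(param_grid)
# One dictionary per combination of parameter values.
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[name] for name in names))]

cross_validation_fits = len(candidates) * cv_folds
print('Candidate settings:', len(candidates))
print('Cross-validation fits:', cross_validation_fits)
```

With the default `refit=True`, `GridSearchCV` adds one more fit of the best setting on the full data, and every extra value added to a parameter list multiplies the total.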


### Using the best parameters from GridSearch, build the optimal classifier

In [37]:
# First let's copy the parameters for the provenance file
mdl_params = deepcopy(grids.best_params_)
# Let's also remove the base_estimator parameters, since they aren't honored by the constructor, unlike GridSearch
if 'base_estimator__criterion' in mdl_params:
    del mdl_params['base_estimator__criterion']
classifier = BaggingClassifier(**mdl_params)
classifier.fit(all_X, all_y)

BaggingClassifier(base_estimator=None, bootstrap=True,
         bootstrap_features=False, max_features=0.7, max_samples=1.0,
         n_estimators=100, n_jobs=None, oob_score=False, random_state=None,
         verbose=0, warm_start=False)

### Let's look at Cicero for transliterated Greek words 

In [42]:
greek_in_cicero = set()
reader = get_corpus_reader(corpus_name='latin_text_latin_library', language='latin')

cicero = [ file for file in reader.fileids()
         if 'cicero' in file ]
for file in tqdm(cicero):
    reader = get_corpus_reader(corpus_name='latin_text_latin_library', language='latin')
    reader._fileids = [file]
    unseen_X = process_text_model.fit_transform(list(reader.sents()))
    distinct_unseen = distinct_words(unseen_X)
    print('Checking: {} : {:,} words'.format(file[file.rfind('/') + 1:], len(distinct_unseen)))
    unseen_words = list(distinct_unseen)
    if unseen_words:
        arr = classifier.predict(
            np.array([word_to_features(word, max_len) for word in unseen_words]))
        marks = arr.tolist()
        found_greek = [unseen_words[idx]
                       for idx, point in enumerate(marks)
                       if point == 1]
        greek_in_cicero |= set(found_greek)
        print('found: {:,} transliterated Greek words: {}'.format(np.count_nonzero(arr), found_greek))
print('Number of Greek words not in training data: {:,}'.format(len(greek_in_cicero - only_greek_transliterated)))

  1%|          | 1/138 [00:00<02:10,  1.05it/s]

Checking: acad.txt : 2,057 words
found: 3 transliterated Greek words: ['eis', 'eita', 'philosophandi']


  1%|▏         | 2/138 [00:02<02:15,  1.00it/s]

Checking: adbrutum1.txt : 2,885 words
found: 4 transliterated Greek words: ['ep', 'coniunctiori', 'emphatikôtero', 'med']


  2%|▏         | 3/138 [00:02<02:02,  1.10it/s]

Checking: adbrutum2.txt : 944 words
found: 1 transliterated Greek words: ['discrimen']


  3%|▎         | 4/138 [00:03<02:14,  1.01s/it]

Checking: amic.txt : 3,358 words
found: 2 transliterated Greek words: ['eis', 'dissimilitudo']


  4%|▎         | 5/138 [00:04<02:04,  1.07it/s]

Checking: arch.txt : 1,509 words
found: 1 transliterated Greek words: ['eis']


  4%|▍         | 6/138 [00:05<02:13,  1.01s/it]

Checking: att1.txt : 3,642 words
found: 41 transliterated Greek words: ['tauta', 'periodoi', 'anabolai', 'hystero', 'eis', 'aretes', 'kataskeuai', 'oude', 'aner', 'Moysai', 'kai', 'to', 'epi', 'eisi', 'ta', 'dissimilitudo', 'kampai', 'chrestomathe', 'hepei', 'onar', 'ouch', 'pakei', 'Amaltheioi', 'melei', 'aphelestatos', 'admurmurante', 'lekythous', 'hoppos', 'moi', 'topothesiai', 'soloika', 'istorika', 'politikois', 'oud', 'anathema', 'tois', 'constituisti', 'agona', 'politikos', 'pros', 'oyde']


  5%|▌         | 7/138 [00:07<02:17,  1.05s/it]

Checking: att10.txt : 2,415 words
found: 38 transliterated Greek words: ['aphraktoi', 'aretÍ', 'ho', 'perscripsisti', 'ma', 'akimoan', 'ero', 'ethous', 'times', 'kai', 'tou', 'eti', 'to', 'achnumenoi', 'akleios', 'genesomenou', 'thumikotero', 'ta', 'Korukaiioi', 'protetuchthai', 'puthesthai', 'Lakonike', 'ha', 'anethopoieto', 'politikotato', 'mega', 'sumpatheia', 'malista', 'accommodaturi', 'politikou', 'toi', 'zelotupiai', 'polles', 'dein', 'ginesthai', 'Íthos', 'eikastes', 'ge']


  6%|▌         | 8/138 [00:08<02:21,  1.09s/it]

Checking: att11.txt : 2,098 words
found: 6 transliterated Greek words: ['ero', 'cistophoro', 'Epheso', 'Ephesi', 'intellexisti', 'med']


  7%|▋         | 9/138 [00:09<02:34,  1.20s/it]

Checking: att12.txt : 2,807 words
found: 29 transliterated Greek words: ['lesxh', 'krhnh', 'panta', 'politikoi', 'ero', 'mhlwsh', 'eis', 'loghqh', 'meqarmosomai', 'ktopismoi', 'Qeopompou', 'nereuqestero', 'perantologiai', 'gerontikwtero', 'bebiwtai', 'au', 'hdou=i', 'proi', 'makarwnnhsoi', 'Alfeiou', 'melei', 'Peirhnh', 'timh', 'gar', 'poi', 'med', 'Skeyai', 'tetufw=sqai', 'oi']


  7%|▋         | 10/138 [00:11<02:44,  1.29s/it]

Checking: att13.txt : 2,952 words
found: 62 transliterated Greek words: ['genesteroi', 'meroi', 'dunhai', 'spoudazei', 'zhlotupei=sqai', 'Arxidhmou', 'Emetikh', 'so', 'ke', 'eis', 'parakinduneuei', 'yuxh=i', 'poxh', 'kekepfwmai', 'kai', 'skoliai=i', 'skordou', 'tei=xoi', 'perioxai', 'to', 'Pronoiai', 'Panaitiou', 'intellexisti', 'kekrika', 'mou', 'ta', 'meiligma', 'ktenestero', 'ou', 'doi', 'au', 'logoisi', 'pote', 'logo', 'proi', 'filologwtera', 'deinoi', 'spoudh', 'filostorgotero', 'zhlotupei=i', 'th', 'moi', 'op', 'deomai', 'pompeu=sai', 'ambulatiuncula', 'autoi=i', 'kata', 'dika', 'gar', 'polugrafwtatoi', 'mh', 'med', 'nooi', 'difqerai', 'toi=i', 'Faidrou', 'probolh', 'Akadhmikh', 'safesteroi', 'oi', 'filaitioi']


  8%|▊         | 11/138 [00:12<02:39,  1.25s/it]

Checking: att14.txt : 2,596 words
found: 33 transliterated Greek words: ['ph=ma', 'naqewrhsii', 'pepinwmenai', 'gh', 'polesqai', 'dedotai', 'newterismou', 'su', 'kalh=i', 'politeuesqai', 'sorowntes', 'sithsai', 'animaduertendi', 'doih=', 'ou', 'lih', 'bebiwtai', 'Lh=roi', 'logo', 'Akolasia', 'mega', 'phratou', 'posoloika', 'phnemioi', 'Futeolaizo', 'toi', 'kata', 'pepoliteumeqa', 'pinoi', 'mh', 'sqloi=i', 'furmoi', 'telou=i']


  9%|▊         | 12/138 [00:13<02:36,  1.24s/it]

Checking: att15.txt : 2,422 words
found: 33 transliterated Greek words: ['toutou', 'nantifwnhsia', 'speisasqai', 'quous', 'Persikh', 'metewroi', 'prokoph', 'perattikoi', 'kai', 'tou', 'paregxeirhsii', 'semnw=i', 'to', 'tis', 'ta', 'congrediendi', 'dunatai', 'kaqhkontoi', 'filoi', 'ou', 'pepinwmenwi', 'doi', 'au', 'Skopoi', 'Epoxh', 'htoreuousi', 'th', 'contumelioso', 'plou=i', 'dein', 'soi', 'koi', 'oi']


  9%|▉         | 13/138 [00:14<02:32,  1.22s/it]

Checking: att16.txt : 2,558 words
found: 31 transliterated Greek words: ['mbrotoi', 'ep', 'rmainontei', 'peirazesqai', 'po', 'ero', 'eis', 'oph', 'nqh', 'tou', 'rmainonta', 'suntacomai', 'dolesxoi', 'ta', 'dunatai', 'kaqhkontoi', 'swqeih', 'sunagwgh', 'podexqai', 'doi', 'Lh=roi', 'nhsou', 'klogai', 'moi', 'skhptomai', 'kata', 'soi', 'toioutou', 'ge', 'discrimen', 'met']


 10%|█         | 14/138 [00:16<02:32,  1.23s/it]

Checking: att2.txt : 3,314 words
found: 72 transliterated Greek words: ['phusai', 'suskeuazetai', 'dibaphoi', 'helikta', 'egoge', 'auliskois', 'hupo', 'ho', 'elencheie', 'politeiai', 'heis', 'contempsisti', 'anaphainesthai', 'ouden', 'eis', 'sophisteuei', 'apospasmatia', 'cistophoro', 'Sokratikos', 'utpote', 'kratounton', 'all', 'kai', 'kaloi', 'prospepontha', 'eti', 'opithe', 'aristokratikotatos', 'aselgous', 'to', 'ainigmois', 'smikroisi', 'epi', 'tis', 'epangellomai', 'he', 'aspazetai', 'Amaltheiai', 'idesthai', 'homologoumenos', 'ou', 'enturanneisthai', 'philosophei', 'prosthe', 'antherographeisthai', 'tas', 'apamunesthai', 'agathe', 'aphilodoxo', 'moi', 'toi', 'out', 'esophizeto', 'Theophrastou', 'dunamai', 'hote', 'anathesei', 'aideomai', 'kata', 'theioi', 'gar', 'politikoteros', 'amunesthai', 'Kurou', 'kat', 'akkizometha', 'ka', 'adikaiarchoi', 'semnoteros', 'politeuomai', 'hes', 'glukerotero']


 11%|█         | 15/138 [00:17<02:24,  1.17s/it]

Checking: att3.txt : 2,050 words
found: 4 transliterated Greek words: ['ep', 'ero', 'eis', 'Epheso']


 12%|█▏        | 16/138 [00:18<02:21,  1.16s/it]

Checking: att4.txt : 2,778 words
found: 32 transliterated Greek words: ['phthimenoisi', 'exoterikous', 'tauta', 'ho', 'politeiai', 'hupothesei', 'trisareiopagitas', 'houtos', 'philos', 'kai', 'Epheso', 'tode', 'empazeto', 'pote', 'ouk', 'Antiphonti', 'tagoi', 'eie', 'sumpatheia', 'ouch', 'nomenclatori', 'Phokulidou', 'emphilosophesai', 'porpapumnai', 'moi', 'toi', 'eukairotero', 'med', 'eidenai', 'opadoi', 'mepo', 'politikos']


 12%|█▏        | 17/138 [00:19<02:20,  1.16s/it]

Checking: att5.txt : 2,753 words
found: 30 transliterated Greek words: ['hoiaper', 'glukupikro', 'sumphilodoxousi', 'spoudaiotero', 'panika', 'ero', 'eis', 'dialogous', 'endomuchoi', 'hodou', 'utpote', 'tou', 'Epheso', 'akroteleutio', 'Ariobarzanes', 'to', 'Ephesi', 'tis', 'pephusiomai', 'ta', 'he', 'politikotero', 'kena', 'sumpatheia', 'adorodoketo', 'dein', 'polemou', 'med', 'discrimen', 'erdoi']


 13%|█▎        | 18/138 [00:20<02:16,  1.13s/it]

Checking: att6.txt : 2,709 words
found: 112 transliterated Greek words: ['Sipountioi', 'eiko', 'purous', 'anenasthai', 'tauta', 'dieulutesthai', 'hupo', 'anantiphoneto', 'ho', 'metaichmioi', 'halos', 'panta', 'parephthengeto', 'elathe', 'ero', 'so', 'huparchonto', 'prosanatrephomene', 'eis', 'touto', 'exasphalisai', 'akoinonoetos', 'schediazonta', 'katalogoi', 'pephurakenai', 'kai', 'tou', 'Epheso', 'noumeniai', 'paredoke', 'houxeleutheros', 'proekkeimenes', 'oistha', 'hemeras', 'Ariobarzanes', 'Phlious', 'palingenesia', 'edoxe', 'to', 'ainigmois', 'hemas', 'Ephesi', 'historikotatos', 'mou', 'mustikotero', 'periskepsamenos', 'kleronomesai', 'ta', 'apeleuthero', 'polukleous', 'onta', 'apognous', 'deuterou', 'turannoktonou', 'dissimilitudo', 'sphodra', 'ou', 'battarizo', 'lelethotos', 'Peloponneso', 'autou', 'Opountioi', 'hina', 'autika', 'ouk', 'opheilonta', 'emoi', 'hupodechthai', 'tas', 'pollou', 'hupomempsimoirous', 'akoinonoeto', 'opheilethento', 'depou', 'soizetai', 'autois', 'moi'

 14%|█▍        | 19/138 [00:21<02:21,  1.19s/it]

Checking: att7.txt : 3,058 words
found: 59 transliterated Greek words: ['kalou', 'philosophotero', 'ho', 'elencheie', 'enscholazo', 'pou', 'eis', 'houtos', 'logou', 'times', 'meizo', 'hodou', 'hupekthemenos', 'all', 'kai', 'megiste', 'tou', 'apotripsai', 'phainoprosopei', 'phusike', 'dike', 'to', 'adulescentuli', 'achnumenoi', 'neotero', 'epi', 'tode', 'ta', 'antipoliteuomenou', 'aenigma', 'protetuchthai', 'ou', 'prosthe', 'oupote', 'episkopo', 'zetema', 'epeuthometha', 'emo', 'kako', 'Aideomai', 'theo', 'spondeiazonta', 'moi', 'kathodous', 'mede', 'praetermisisse', 'skaphos', 'aideomai', 'soi', 'gar', 'tois', 'empoliteuomai', 'sunapothanein', 'politikos', 'ametameletos', 'pros', 'discrimen', 'katathesei', 'tes']


 14%|█▍        | 20/138 [00:22<02:13,  1.13s/it]

Checking: att8.txt : 2,151 words
found: 22 transliterated Greek words: ['sundiemereuome', 'ep', 'skopos', 'ho', 'ero', 'apolitikotato', 'kai', 'kaloi', 'to', 'kalo', 'Prothespizo', 'tauth', 'emoi', 'aprosphonetous', 'tektainestho', 'toi', 'aideomai', 'gar', 'emou', 'pros', 'polla', 'met']


 15%|█▌        | 21/138 [00:24<02:17,  1.18s/it]

Checking: att9.txt : 2,875 words
found: 92 transliterated Greek words: ['anankais', 'kinduneusei', 'aparrÍsiasto', 'Korinthoi', 'pelorou', 'sunkinduneuteo', 'hupo', 'pantos', 'ho', 'meth', 'potmos', 'ero', 'aphemenoi', 'commeministi', 'su', 'oisth', 'philois', 'polemo', 'touto', 'oude', 'times', 'politikai', 'pos', 'all', 'kai', 'kinduneuseie', 'tou', 'huper', 'kairoi', 'dokimazonta', 'memigmenai', 'tropoi', 'to', 'tis', 'polemoi', 'mellei', 'hoti', 'airetai', 'he', 'logoi', 'esti', 'elapize', 'heautou', 'epeita', 'prourgou', 'autika', 'kindunou', 'pote', 'patho', 'logos', 'ouk', 'autoi', 'chanoi', 'aristois', 'Sophisteuo', 'dokosi', 'politike', 'tas', 'hetoimos', 'Byzantio', 'homos', 'turannoumene', 'anachoresanta', 'alaluktemai', 'hesuchazei', 'epei', 'hetairoi', 'moi', 'toi', 'prosptuxomai', 'eidos', 'politikois', 'soi', 'tous', 'polemou', 'bebouleusthai', 'gar', 'tois', 'kteinomenoi', 'poi', 'epamunai', 'megala', 'hupothesetai', 'ka', 'poieisthai', 'turannoumenes', 'pros', 'Hektora

 16%|█▌        | 22/138 [00:24<02:07,  1.10s/it]

Checking: balbo.txt : 2,546 words
found: 4 transliterated Greek words: ['O', 'eis', 'dissimilitudo', 'discrimen']


 17%|█▋        | 23/138 [00:27<03:00,  1.57s/it]

Checking: brut.txt : 6,779 words
found: 19 transliterated Greek words: ['suauiloquenti', 'po', 'exclamationib', 'Hippias', 'pote', 'tropous', 'intellegenti', 'Gorgias', 'ma', 'ede', 'e', 'consideranti', 'Peitho', 'pop', 'dein', 'eis', 'arcessiuisti', 'con', 'instrumento']


 17%|█▋        | 24/138 [00:28<02:35,  1.37s/it]

Checking: caecilium.txt : 2,097 words
found: 1 transliterated Greek words: ['discrimen']


 18%|█▊        | 25/138 [00:29<02:26,  1.30s/it]

Checking: caecina.txt : 3,114 words
found: 4 transliterated Greek words: ['eis', 'praetereuntes', 'dissimilitudo', 'a']


 19%|█▉        | 26/138 [00:30<02:21,  1.26s/it]

Checking: cael.txt : 3,395 words
found: 3 transliterated Greek words: ['constituitote', 'excluditote', 'percelebrata']


 20%|█▉        | 27/138 [00:31<02:05,  1.13s/it]

Checking: cat1.txt : 1,622 words
found: 3 transliterated Greek words: ['times', 'transtulisti', 'discripsisti']


 20%|██        | 28/138 [00:32<01:52,  1.03s/it]

Checking: cat2.txt : 1,551 words
found: 1 transliterated Greek words: ['deprehendero']


 21%|██        | 29/138 [00:33<01:43,  1.05it/s]

Checking: cat3.txt : 1,493 words
found: 1 transliterated Greek words: ['inflammatos']


 22%|██▏       | 30/138 [00:33<01:36,  1.12it/s]

Checking: cat4.txt : 1,431 words
found: 1 transliterated Greek words: ['discrimen']


 22%|██▏       | 31/138 [00:35<02:07,  1.19s/it]

Checking: cluentio.txt : 5,784 words
found: 2 transliterated Greek words: ['praetermitti', 'eis']


 23%|██▎       | 32/138 [00:36<01:53,  1.07s/it]

Checking: compet.txt : 1,764 words
found: 0 transliterated Greek words: []


 24%|██▍       | 33/138 [00:37<01:37,  1.07it/s]

Checking: consulatu.txt : 399 words
found: 0 transliterated Greek words: []


 25%|██▍       | 34/138 [00:38<01:34,  1.10it/s]

Checking: deio.txt : 1,849 words
found: 4 transliterated Greek words: ['animaduertisti', 'eis', 'praestitisti', 'discrimen']


 25%|██▌       | 35/138 [00:39<01:50,  1.07s/it]

Checking: divinatione1.txt : 5,124 words
found: 19 transliterated Greek words: ['sapientissumi', 'poÎtae', 'Ueios', 'superstitiosi', 'direxÏt', 'Proi', 'mantikh', 'Lacedaemonia', 'Lacedaemonii', 'poÎtam', 'melancholici', 'poÎtarum', 'Peloponneso', 'App', 'dein', 'discrimen', 'BoÎthum', 'integrÏ', 'uigÏlantes']


 26%|██▌       | 36/138 [00:41<02:04,  1.22s/it]

Checking: divinatione2.txt : 5,013 words
found: 22 transliterated Greek words: ['coŒrcenda', 'Ueios', 'philosophandi', 'sapientipotentes', 'superstitiosi', 'animaduertisti', 'poŒta', 'BoŒthus', 'zwdiakoi', 'filippizei', 'hippocentauri', 'Caune˝s', 'Ptol⁄maeus', 'dissimilitudo', 'Dissimilitudo', 'App', 'pot', 'yeudomeno', 'eis', 'poŒma', 'rizontei', 'Pythia']


 27%|██▋       | 37/138 [00:42<02:07,  1.26s/it]

Checking: domo.txt : 4,890 words
found: 8 transliterated Greek words: ['contempsisti', 'ero', 'transtulisti', 'interroganti', 'dein', 'praegustatori', 'constituisti', 'discrimen']


 28%|██▊       | 38/138 [00:43<01:56,  1.17s/it]

Checking: fam1.txt : 2,655 words
found: 3 transliterated Greek words: ['eis', 'praetermitti', 'to']


 28%|██▊       | 39/138 [00:44<01:57,  1.19s/it]

Checking: fam10.txt : 3,643 words
found: 5 transliterated Greek words: ['ep', 'ero', 'eis', 'upote', 'discrimen']


 29%|██▉       | 40/138 [00:45<01:53,  1.16s/it]

Checking: fam11.txt : 2,597 words
found: 6 transliterated Greek words: ['ep', 'ero', 'sxiamaxai', 'dissimilitudo', 'philosophoumena', 'ñrgano']


 30%|██▉       | 41/138 [00:46<01:52,  1.16s/it]

Checking: fam12.txt : 3,040 words
found: 4 transliterated Greek words: ['contempsisti', 'eyurrhmon∞sterow', 'ßm', 'discrimen']


 30%|███       | 42/138 [00:48<02:03,  1.29s/it]

Checking: fam13.txt : 3,208 words
found: 38 transliterated Greek words: ['Õxeow', 'ep', 'Mulaseïw', 'Éiaw', 'çcig≥nu', 'ﬂxìluce', 'treïw', 'ﬂn', 'sof≥w', 'Õllu', 'nefÉlh', 'Ephesi', 'Alabandeïw', 'pomnhmatismé', 'spoudeß', 'pote', 'sofistÊ', 'Öna', 'stêyessi', 'èss', 'xlei¥w', 'Ûstiw', 'peßroxo', 'πw', 'e', 'puyÉsyai', 'yumé', 'dioßxhsi', 'Õlximow', 'ge', 'ﬂssomÉnoisi', 'dioixêseiw', 'a', 'poloßmh', 'ﬂmé', 'èmmenai', 'èpeiye', 'hone']


 31%|███       | 43/138 [00:49<01:50,  1.16s/it]

Checking: fam14.txt : 1,351 words
found: 2 transliterated Greek words: ['xolÿ', 'Íxrato']


 32%|███▏      | 44/138 [00:50<01:44,  1.12s/it]

Checking: fam15.txt : 2,733 words
found: 15 transliterated Greek words: ['æd∞uw', 'communicauisti', 'Ètaraia', 'Èd∞spotoi', 'Epheso', 'Ariobarzanis', 'Èntimuxthrsai', 'Ariobarzanem', 'filxaloi', 'filodxaioi', 'dianohtixÂw', 'fil∆donoi', 'Íneu', 'ædonÿ', 'xal´w']


 33%|███▎      | 45/138 [00:51<01:38,  1.06s/it]

Checking: fam16.txt : 1,946 words
found: 16 transliterated Greek words: ['spoudÿ', 'diarr∆dh', 'xroniâtera', 'Alyzia', 'Èxopa', 'suiht∆sei', 'yorubopoiei', 'Èfwmlhsa', 'Íxuro', 'Èxnduna', 'tr\uf8ffci', 'polemiâtato', 'uerumtamen', 'xanê', 'filolagai', 'æmÁw']


 33%|███▎      | 46/138 [00:52<01:36,  1.05s/it]

Checking: fam2.txt : 2,286 words
found: 9 transliterated Greek words: ['politiktero', 'ero', 'times', 'kumikéw', 'Ariobarzanis', 'Ariobarzanem', 'ambulatiuncula', 'med', 'discrimen']


 34%|███▍      | 47/138 [00:53<01:32,  1.02s/it]

Checking: fam3.txt : 2,541 words
found: 12 transliterated Greek words: ['ero', 'ke', 'kai', 'Epheso', 'timÍsousi', 'to', 'Ephesi', 'hoi', 'beneuolentiori', 'emoige', 'malista', 'alloi']


 35%|███▍      | 48/138 [00:54<01:28,  1.01it/s]

Checking: fam4.txt : 2,340 words
found: 3 transliterated Greek words: ['proposuero', 'eis', 'esthai']


 36%|███▌      | 49/138 [00:55<01:29,  1.01s/it]

Checking: fam5.txt : 3,042 words
found: 10 transliterated Greek words: ['ep', 'ero', 'Ephesi', 'assentatiuncula', 'Homero', 'pote', 'praestitisti', 'praetermisisse', 'dein', 'discrimen']


 36%|███▌      | 50/138 [00:56<01:28,  1.01s/it]

Checking: fam6.txt : 2,688 words
found: 6 transliterated Greek words: ['retÉw', 'comprehendisti', 'ero', 'dr´ta', 'peregrinator', 'discrimen']


 37%|███▋      | 51/138 [00:57<01:30,  1.04s/it]

Checking: fam7.txt : 2,910 words
found: 17 transliterated Greek words: ['di≠rroia', 'xt∆sei', 'coŒmisse', 'parÎ', 'xr∆sei', 'iureconsulti', 'straggourixÎ', 'dusenterixÎ', 'the', 'e', 'praetermisisse', 'esyai', 'par≠gramma', 'sard≠nio', 'potious', 'discrimen', 'filoy∞uro']


 38%|███▊      | 52/138 [00:58<01:27,  1.02s/it]

Checking: fam8.txt : 2,623 words
found: 1 transliterated Greek words: ['perscripsisti']


 38%|███▊      | 53/138 [00:59<01:28,  1.05s/it]

Checking: fam9.txt : 3,039 words
found: 20 transliterated Greek words: ['eyurrhmon∆sei', 'ﬂulow', 'sumbiutÿ', 'prolegom∞naw', 'poprohgm∞no', 'animaduertendi', 'sophia', 'ßrxom∞nu', 'sof⁄w', 'parembeblhm∞noi', 'Ÿcimaye\uf8ffw', 'didasx≠lu', 'xatÎ', 'introduxisti', 'rou', 'dein', 'Ÿbeliei', 'discrimen', 'pofyegm≠tu', 'pollo']


 39%|███▉      | 54/138 [01:00<01:25,  1.01s/it]

Checking: fato.txt : 1,713 words
found: 6 transliterated Greek words: ['eis', 'Logike', 'dissimilitudo', 'sq', 'anteposuisti', 'dissentienti']


 40%|███▉      | 55/138 [01:01<01:23,  1.01s/it]

Checking: fin1.txt : 2,732 words
found: 5 transliterated Greek words: ['ero', 'eis', 'logikh', 'Homero', 'dein']


 41%|████      | 56/138 [01:02<01:33,  1.14s/it]

Checking: fin2.txt : 3,983 words
found: 15 transliterated Greek words: ['Gorgias', 'sofÒw', 'interrogandi', 'so', 'eis', 'times', 'sofÚw', 'Homero', 'chrysizo', 'instrumento', 'e', 'lapathe', 'dein', 'skoteinÒw', 'discrimen']


 41%|████▏     | 57/138 [01:03<01:29,  1.11s/it]

Checking: fin3.txt : 2,549 words
found: 23 transliterated Greek words: ['žmolog¤a', 'édiãforo', 'époprohgmšno', 'prohgmšna', 'kak¤a', 'apoproegmenis', 'katÒryusi', 'žrmÆ', 'dusxrhstÆmata', 'felÆmata', 'prohgmšno', 'dissimilitudo', 'poihtikã', 'blãmmata', 'éi¤a', 'e', 'fšlhma', 'kayƒkon', 'katÒryuma', 'pigennhmatikÒ', 'katalÆceiw', 'discrimen', 'telikã']


 42%|████▏     | 58/138 [01:04<01:29,  1.12s/it]

Checking: fin4.txt : 2,825 words
found: 7 transliterated Greek words: ['rmh', 'aequalÌ', 'ratÌo', 'fallaciloquae', 'poÎtarum', 'praeposÏtum', 'dissimilitudo']


 43%|████▎     | 59/138 [01:06<01:33,  1.18s/it]

Checking: fin5.txt : 3,659 words
found: 18 transliterated Greek words: ['poÎtice', 'rmh', 'enÌm', 'poÎtae', 'incommodatura', 'Calliphonti', 'offÌcium', 'times', 'amicÌtiae', 'aliquÌd', 'perdidiceriti', 'Ìnquam', 'gelastoi', 'inuestigazione', 'dissimilitudo', 'instrumento', 'e', 'ast˙tia']


 43%|████▎     | 60/138 [01:07<01:33,  1.20s/it]

Checking: flacco.txt : 4,245 words
found: 6 transliterated Greek words: ['Lacedaemonii', 'eis', 'applicauisti', 'rhetor', 'deprehendisti', 'discrimen']


 44%|████▍     | 61/138 [01:08<01:24,  1.09s/it]

Checking: fonteio.txt : 2,073 words
found: 4 transliterated Greek words: ['eis', 'apograph', 'intertrimenti', 'disceptatori']


 45%|████▍     | 62/138 [01:09<01:22,  1.08s/it]

Checking: fratrem1.txt : 2,941 words
found: 8 transliterated Greek words: ['Žsfal´w', 'Žll', 'Phaethonti', 'Ephesi', 'administranti', 'pai', 'praecipiendi', 'ŸryÎ']


 46%|████▌     | 63/138 [01:10<01:18,  1.05s/it]

Checking: fratrem2.txt : 2,371 words
found: 16 transliterated Greek words: ['ero', 'Sofoxl∞ouw', 'pragmatix´w', 'quicun', 'filalhy´w', 'Õperbolix´w', 'proüxonomhs≠mh', 'lixrin¢w', 'octophoro', 'e', 'mousop≠taxtow', 'ierg≠zetai', 'xaraxtÿr', 'Õpyesi', 'a', 'tl∆mu']


 46%|████▋     | 64/138 [01:11<01:15,  1.02s/it]

Checking: fratrem3.txt : 2,313 words
found: 36 transliterated Greek words: ['êmat', 'deut∞raw', 'oimôzetô', 'gnôthi', 'emmenai', 'ho', 'krinôsi', 'chalepênêi', 'elasôsi', 'mainetai', 'kai', 'llü', 'yetixâtero', 'gn´yi', 'to', 'hudôr', 'hoi', 'kotessamenos', 'par∞rgü', 'ouk', 'aristeuei', 'anektôs', 'chanoi', 'agorêi', 'moi', 'biêi', 'theô', 'allô', 'hote', 'opôrinôi', 'ek', 'alegontes', 'pn∞uw', 'ampôeis', 'kr∞ow', 'dikê']


 47%|████▋     | 65/138 [01:12<01:14,  1.02s/it]

Checking: haruspicum.txt : 3,189 words
found: 6 transliterated Greek words: ['deminutioq', 'circumsaepti', 'ßywa', 'pollutosq', 'a', 'discrimen']


 48%|████▊     | 66/138 [01:13<01:14,  1.04s/it]

Checking: imp.txt : 2,499 words
found: 7 transliterated Greek words: ['eis', 'peradulescenti', 'Ariobarzanes', 'Ariobarzanis', 'uestrßa', 'querimonias', 'discrimen']


 49%|████▊     | 67/138 [01:15<01:31,  1.29s/it]

Checking: inventione1.txt : 4,123 words
found: 6 transliterated Greek words: ['Gorgias', 'eis', 'rhetor', 'quinquepertita', 'dein', 'praecipiendi']


 49%|████▉     | 68/138 [01:17<01:40,  1.43s/it]

Checking: inventione2.txt : 4,258 words
found: 13 transliterated Greek words: ['sermocinandi', 'oportebitDefensor', 'uindicataRemotio', 'accusationemDefensor', 'rhetor', 'accommodabiturSemper', 'feceritSi', 'permansioTemperantia', 'absolutaContra', 'oportebitConcessio', 'inferendaConsuetudine', 'praecipiendi', 'facultatesBeneficia']


 50%|█████     | 69/138 [01:18<01:33,  1.35s/it]

Checking: leg1.txt : 2,547 words
found: 4 transliterated Greek words: ['eis', 'intellegendi', 'tis', 'pote']


 51%|█████     | 70/138 [01:19<01:28,  1.30s/it]

Checking: leg2.txt : 3,182 words
found: 7 transliterated Greek words: ['intercalandi', 'obtemperanto', 'mbon', 'pos', 'coluntoÎ', 'Amphiarai', 'discrimen']


 51%|█████▏    | 71/138 [01:20<01:21,  1.22s/it]

Checking: leg3.txt : 2,088 words
found: 6 transliterated Greek words: ['po', 'intercessori', 'nomofulakoi', 'obtemperandi', 'suntoÎ', 'discrimen']


 52%|█████▏    | 72/138 [01:21<01:10,  1.07s/it]

Checking: legagr1.txt : 1,275 words
found: 2 transliterated Greek words: ['eis', 'emes']


 53%|█████▎    | 73/138 [01:22<01:10,  1.08s/it]

Checking: legagr2.txt : 3,765 words
found: 11 transliterated Greek words: ['eis', 'praetereuntes', 'fraudulenti', 'dissimilitudo', 'Ueios', 'Pseudophilippo', 'conquisituri', 'emes', 'ßxuir', 'percensuisti', 'discrimen']


 54%|█████▎    | 74/138 [01:22<01:00,  1.06it/s]

Checking: legagr3.txt : 724 words
found: 1 transliterated Greek words: ['eis']


 54%|█████▍    | 75/138 [01:23<00:56,  1.11it/s]

Checking: lig.txt : 1,502 words
found: 2 transliterated Greek words: ['exterminandi', 'eis']


 55%|█████▌    | 76/138 [01:24<00:53,  1.16it/s]

Checking: marc.txt : 1,331 words
found: 2 transliterated Greek words: ['eis', 'M']


 56%|█████▌    | 77/138 [01:25<00:59,  1.02it/s]

Checking: milo.txt : 3,826 words
found: 8 transliterated Greek words: ['eis', 'times', 'P', 'exprompsisti', 'uidemusfingite', 'intermortuae', 'dein', 'prop']


 57%|█████▋    | 78/138 [01:27<01:03,  1.07s/it]

Checking: murena.txt : 3,983 words
found: 5 transliterated Greek words: ['po', 'Lacedaemonii', 'eis', 'Pseudophilippo', 'discrimen']


 57%|█████▋    | 79/138 [01:28<01:06,  1.13s/it]

Checking: nd1.txt : 3,864 words
found: 8 transliterated Greek words: ['inflammatos', 'Atheos', 'mediterranei', 'eita', 'philosophandi', 'mantike', 'doxas', 'tantun']


 58%|█████▊    | 80/138 [01:29<01:12,  1.26s/it]

Checking: nd2.txt : 5,441 words
found: 6 transliterated Greek words: ['Fainwn', 'superstitiosi', 'gh', 'Engonasi', 'hyei', 'dein']


 59%|█████▊    | 81/138 [01:31<01:10,  1.24s/it]

Checking: nd3.txt : 3,511 words
found: 7 transliterated Greek words: ['animaduertisti', 'eis', 'intellegendi', 'Sabazia', 'synpatheia', 'Melete', 'discrimen']


 59%|█████▉    | 82/138 [01:32<01:16,  1.37s/it]

Checking: off1.txt : 4,765 words
found: 7 transliterated Greek words: ['belligerantes', 'sophia', 'katheko', 'obtemperantes', 'philosophandi', 'praecipienti', 'discrimen']


 60%|██████    | 83/138 [01:33<01:12,  1.31s/it]

Checking: off2.txt : 3,351 words
found: 8 transliterated Greek words: ['Lacedaemonii', 'emas', 'circumueniri', 'App', 'dein', 'fraudulentos', 'pathe', 'discrimen']


 61%|██████    | 84/138 [01:35<01:11,  1.33s/it]

Checking: off3.txt : 3,913 words
found: 2 transliterated Greek words: ['eis', 'Troezene']


 62%|██████▏   | 85/138 [01:36<01:00,  1.14s/it]

Checking: optgen.txt : 831 words
found: 2 transliterated Greek words: ['eis', 'dithyrambici']


 62%|██████▏   | 86/138 [01:37<01:09,  1.33s/it]

Checking: orator.txt : 5,626 words
found: 21 transliterated Greek words: ['ethiko', 'Homero', 'uersutiloquas', 'Gorgias', 'e', 'eita', 'hypallage', 'logodaidalous', 'quadringenti', 'circumscripti', 'paean', 'praetereuntes', 'intellegendi', 'adulescentuli', 'rhetor', 'dissimilitudo', 'thesi', 'dein', 'dialogoi', 'eis', 'praecipiendi']


 63%|██████▎   | 87/138 [01:39<01:11,  1.40s/it]

Checking: oratore1.txt : 5,312 words
found: 12 transliterated Greek words: ['times', 'confitenture', 'stipulatiuncula', 'eita', 'praetereuntes', 'intellegendi', 'pragmatikoi', 'physikous', 'circumdedisti', 'discrimen', 'eis', 'instrumento']


 64%|██████▍   | 88/138 [01:41<01:22,  1.66s/it]

Checking: oratore2.txt : 7,130 words
found: 10 transliterated Greek words: ['interroganti', 'circumueniri', 'eita', 'dialektike', 'ero', 'intellegendi', 'Byzantii', 'discrimen', 'eis', 'emas']


 64%|██████▍   | 89/138 [01:43<01:19,  1.62s/it]

Checking: oratore3.txt : 5,468 words
found: 15 transliterated Greek words: ['efflorescenti', 'Hippias', 'eiti', 'uersutiloquas', 'Gorgias', 'transferendi', 'paean', 'intellegendi', 'dissimilitudo', 'commeministi', 'eis', 'lexeis', 'interpunctas', 'heroi', 'praecipiendi']


 65%|██████▌   | 90/138 [01:43<01:06,  1.38s/it]

Checking: paradoxa.txt : 2,011 words
found: 17 transliterated Greek words: ['êfru', 'times', 'égayÒ', 'leÊyerow', 'éretØ', 'parãdoia', 'ýsa', 'prÚw', 'kalÚ', 'sofÚw', 'concupiscenti', 'èmartÆmata', 'kaÐ', 'douloi', 'ploÊsiow', 'ma¤netai', 'discrimen']


 66%|██████▌   | 91/138 [01:45<01:02,  1.32s/it]

Checking: partitione.txt : 3,207 words
found: 4 transliterated Greek words: ['eis', 'animaduertendi', 'obtemperandi', 'praecipiendi']


 67%|██████▋   | 92/138 [01:45<00:53,  1.17s/it]

Checking: phil1.txt : 1,757 words
found: 1 transliterated Greek words: ['tis']


 67%|██████▋   | 93/138 [01:46<00:47,  1.05s/it]

Checking: phil10.txt : 1,365 words
found: 2 transliterated Greek words: ['interposuisti', 'discrimen']


 68%|██████▊   | 94/138 [01:47<00:43,  1.01it/s]

Checking: phil11.txt : 1,983 words
found: 2 transliterated Greek words: ['ero', 'eis']


 69%|██████▉   | 95/138 [01:48<00:40,  1.07it/s]

Checking: phil12.txt : 1,623 words
found: 3 transliterated Greek words: ['ero', 'rp', 'Seios']


 70%|██████▉   | 96/138 [01:49<00:39,  1.07it/s]

Checking: phil13.txt : 2,512 words
found: 1 transliterated Greek words: ['discrimen']


 70%|███████   | 97/138 [01:50<00:36,  1.12it/s]

Checking: phil14.txt : 1,665 words
found: 2 transliterated Greek words: ['eis', 'discrimen']


 71%|███████   | 98/138 [01:51<00:42,  1.05s/it]

Checking: phil2.txt : 4,277 words
found: 6 transliterated Greek words: ['conlocauisti', 'septemuiratu', 'reprehendisti', 'perstrinxisti', 'adscripsisti', 'obstrinxisti']


 72%|███████▏  | 99/138 [01:52<00:38,  1.02it/s]

Checking: phil3.txt : 1,814 words
found: 2 transliterated Greek words: ['eis', 'discrimen']


 72%|███████▏  | 100/138 [01:53<00:33,  1.13it/s]

Checking: phil4.txt : 783 words
found: 0 transliterated Greek words: []


 73%|███████▎  | 101/138 [01:53<00:33,  1.11it/s]

Checking: phil5.txt : 2,342 words
found: 4 transliterated Greek words: ['eis', 'utpote', 'Antesignanos', 'discrimen']


 74%|███████▍  | 102/138 [01:54<00:30,  1.19it/s]

Checking: phil6.txt : 986 words
found: 1 transliterated Greek words: ['discrimen']


 75%|███████▍  | 103/138 [01:55<00:28,  1.23it/s]

Checking: phil7.txt : 1,187 words
found: 1 transliterated Greek words: ['discrimen']


 75%|███████▌  | 104/138 [01:56<00:27,  1.23it/s]

Checking: phil8.txt : 1,509 words
found: 2 transliterated Greek words: ['adulescentuli', 'discrimen']


 76%|███████▌  | 105/138 [01:56<00:25,  1.28it/s]

Checking: phil9.txt : 898 words
found: 0 transliterated Greek words: []


 77%|███████▋  | 106/138 [01:58<00:29,  1.08it/s]

Checking: piso.txt : 4,384 words
found: 9 transliterated Greek words: ['contempsisti', 'delectamenta', 'eis', 'times', 'intellegenti', 'Byzantii', 'oichetai', 'dein', 'pronuntiauisti']


 78%|███████▊  | 107/138 [01:59<00:33,  1.07s/it]

Checking: plancio.txt : 3,823 words
found: 3 transliterated Greek words: ['eis', 'reprehendisti', 'discrimen']


 78%|███████▊  | 108/138 [02:00<00:30,  1.02s/it]

Checking: postreditum.txt : 1,943 words
found: 2 transliterated Greek words: ['quadringenti', 'discrimen']


 79%|███████▉  | 109/138 [02:01<00:27,  1.06it/s]

Checking: postreditum2.txt : 1,296 words
found: 0 transliterated Greek words: []


 80%|███████▉  | 110/138 [02:02<00:26,  1.07it/s]

Checking: prov.txt : 2,210 words
found: 3 transliterated Greek words: ['ero', 'Ariobarzanes', 'Byzantii']


 80%|████████  | 111/138 [02:03<00:26,  1.00it/s]

Checking: quinc.txt : 2,913 words
found: 2 transliterated Greek words: ['eis', 'circumueniri']


 81%|████████  | 112/138 [02:04<00:24,  1.07it/s]

Checking: rabirio.txt : 1,650 words
found: 2 transliterated Greek words: ['eis', 'discrimen']


 82%|████████▏ | 113/138 [02:04<00:22,  1.11it/s]

Checking: rabiriopost.txt : 1,862 words
found: 1 transliterated Greek words: ['eis']


 83%|████████▎ | 114/138 [02:06<00:23,  1.02it/s]

Checking: repub1.txt : 3,071 words
found: 6 transliterated Greek words: ['Lacedaemonii', 'so', 'Philolai', 'eita', 'dein', 'zerstört']


 83%|████████▎ | 115/138 [02:07<00:22,  1.01it/s]

Checking: repub2.txt : 2,600 words
found: 8 transliterated Greek words: ['po', 'ma', 'so', 'labefactandi', 'eis', 'duodeuiginti', 'Fremdzitat', 'latrocinandi']


 84%|████████▍ | 116/138 [02:07<00:20,  1.08it/s]

Checking: repub3.txt : 1,340 words
found: 1 transliterated Greek words: ['Lacedaemonii']


 85%|████████▍ | 117/138 [02:08<00:17,  1.22it/s]

Checking: repub4.txt : 171 words
found: 1 transliterated Greek words: ['Lacedaemonii']


 86%|████████▌ | 118/138 [02:09<00:15,  1.33it/s]

Checking: repub5.txt : 278 words
found: 0 transliterated Greek words: []


 86%|████████▌ | 119/138 [02:09<00:14,  1.35it/s]

Checking: repub6.txt : 1,182 words
found: 2 transliterated Greek words: ['quous', 'Homero']


 87%|████████▋ | 120/138 [02:10<00:13,  1.29it/s]

Checking: rosccom.txt : 1,860 words
found: 1 transliterated Greek words: ['eis']


 88%|████████▊ | 121/138 [02:11<00:13,  1.26it/s]

Checking: scauro.txt : 1,493 words
found: 3 transliterated Greek words: ['eis', 'tis', 'comperendinasti']


 88%|████████▊ | 122/138 [02:12<00:14,  1.08it/s]

Checking: senectute.txt : 3,401 words
found: 3 transliterated Greek words: ['Gorgias', 'eis', 'adulescentuli']


 89%|████████▉ | 123/138 [02:14<00:16,  1.12s/it]

Checking: sestio.txt : 5,454 words
found: 8 transliterated Greek words: ['transferendi', 'ero', 'gazis', 'discrimen', 'circumscribi', 'praecipitanti', 'deliberatori', 'nou']


 90%|████████▉ | 124/138 [02:15<00:17,  1.22s/it]

Checking: sex.rosc.txt : 4,214 words
found: 5 transliterated Greek words: ['interrogandi', 'proscriptos', 'eis', 'praetereuntes', 'discrimen']


 91%|█████████ | 125/138 [02:16<00:15,  1.22s/it]

Checking: sulla.txt : 3,348 words
found: 4 transliterated Greek words: ['eis', 'praecipitanti', 'inuestigasti', 'App']


 91%|█████████▏| 126/138 [02:18<00:14,  1.18s/it]

Checking: topica.txt : 2,498 words
found: 22 transliterated Greek words: ['étšxnouw', 'sterhtikã', 'eiti', 'dialektikó', 'suzug¤a', 'pagugó', 'eis', 'sxæmata', 'tumolog¤a', 'rhetor', 'dissimilitudo', 'Íperboló', 'dialektikæ', 'Ípòyesi', 'topikó', 'pròlhci', 'épofatikå', 'xaraktƒra', 'nyumæmata', 'nyÊmhma', 'krinòmeno', 'stãsiw']


 92%|█████████▏| 127/138 [02:19<00:14,  1.29s/it]

Checking: tusc1.txt : 4,467 words
found: 18 transliterated Greek words: ['imminetpropter', 'moriunturfortuna', 'apokartero', 'Clazomenas', 'Euthynous', 'Lacedaemonii', 'Hesiodo', 'suntalteros', 'gloriaetiamsi', 'Homero', 'rhetor', 'faciuntsomni', 'interroganti', 'dissentienti', 'confeceritreliquos', 'dein', 'ratiocinandi', 'endelecheia']


 93%|█████████▎| 128/138 [02:20<00:12,  1.23s/it]

Checking: tusc2.txt : 2,960 words
found: 2 transliterated Greek words: ['laudabiliora', 'eis']


 93%|█████████▎| 129/138 [02:21<00:11,  1.26s/it]

Checking: tusc3.txt : 3,315 words
found: 12 transliterated Greek words: ['luph', 'poŒta', 'paqoi', 'eis', 'timwroumenoi', 'paqh', 'Peloponneso', 'Bellerophonte', 'confœci', 'dein', 'offœcium', 'perfricuisti']


 94%|█████████▍| 130/138 [02:23<00:10,  1.27s/it]

Checking: tusc4.txt : 3,220 words
found: 21 transliterated Greek words: ['eflœcitur', 'paqoi', 'poŒtae', 'poŒtas', 'eis', 'uehementœus', 'paqh', 'signiiœcat', 'kathgorhmata', 'poŒticam', 'logika', 'obtemperantes', 'poŒtam', 'misanqrwpoi', 'grauidinosos', 'consideranti', 'poŒsis', 'recentœs', 'dein', 'insanœa', 'effœcienda']


 95%|█████████▍| 131/138 [02:24<00:09,  1.32s/it]

Checking: tusc5.txt : 4,054 words
found: 12 transliterated Greek words: ['poŒtae', 'Lacedaemonii', 'eis', 'poŒsi', 'Homero', 'sofoi', 'pote', 'interclusisti', 'poŒtam', 'tas', 'e', 'philosophandi']


 96%|█████████▌| 132/138 [02:25<00:07,  1.18s/it]

Checking: vatin.txt : 1,959 words
found: 3 transliterated Greek words: ['to', 'e', 'recognoscendi']


 96%|█████████▋| 133/138 [02:26<00:05,  1.10s/it]

Checking: ver1.txt : 2,015 words
found: 2 transliterated Greek words: ['interrogandi', 'eis']


 97%|█████████▋| 134/138 [02:28<00:05,  1.27s/it]

Checking: verres.2.1.txt : 5,069 words
found: 9 transliterated Greek words: ['Ephesi', 'circumueniri', 'pp', 'ero', 'nauigaueruntO', 'intertrimento', 'querimonias', 'discrimen', 'eis']


 98%|█████████▊| 135/138 [02:30<00:04,  1.48s/it]

Checking: verres.2.2.txt : 5,470 words
found: 6 transliterated Greek words: ['priuatirn', 'e', 'ciuitatestota', 'dein', 'querimonias', 'discrimen']


 99%|█████████▊| 136/138 [02:32<00:03,  1.67s/it]

Checking: verres.2.3.txt : 6,563 words
found: 16 transliterated Greek words: ['Leontinos', 'praetermitti', 'existimaturi', 'substituisti', 'Ephesi', 'adulescentuli', 'ßiil', 'med', 'constituisti', 'querimonias', 'discrimen', 'administrasti', 'eis', 'reprehendisti', 'instrumento', 'eme']


 99%|█████████▉| 137/138 [02:33<00:01,  1.69s/it]

Checking: verres.2.4.txt : 5,515 words
found: 9 transliterated Greek words: ['comperendinato', 'Archagatho', 'peruenissentne', 'circumueniri', 'peripetasmatis', 'querimonias', 'peripetasmata', 'intellexisti', 'instrumento']


100%|██████████| 138/138 [02:35<00:00,  1.77s/it]

Checking: verres.2.5.txt : 5,915 words
found: 6 transliterated Greek words: ['conlocauisti', 'octaphoro', 'pronuntiasti', 'discrimen', 'eis', 'interrogasti']
Number of Greek words not in training data: 844





### Note: ML is not failsafe in the way set intersection is. A classifier predicts from features that aggregate over the whole sample space, so it can mislabel a word it has already seen; e.g. the classifier marks `non` as transliterated Greek even though `non` is labeled correctly in the training samples.

In [43]:
print('non' in distinct_good_latin)
print('non' in distinct_transliterated_greek_examples)
# TODO find a better example

True
False


### ...With a larger training set, this minor inconsistency would probably disappear.
#### However, `-on` is a common accusative ending for Greek nouns and adjectives, so the Latin word `non` is an anomaly that the classifier has some reason to flag.
#### Also, 98.5% accuracy still means a 1.5% error rate, so relying on the classifier's assessments carries a cost.
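
Because set membership is exact where the classifier is not, one possible remedy is to veto any prediction that contradicts a vocabulary of known Latin words. The sketch below is a minimal illustration, not part of the notebook's pipeline: `veto_known_latin` is a hypothetical helper, and the small word sets are toy stand-ins for `distinct_good_latin` and for the classifier's flagged output.

```python
def veto_known_latin(flagged_words, known_latin):
    """Drop flagged words that appear in a trusted Latin vocabulary.

    flagged_words: words the classifier marked as transliterated Greek.
    known_latin: set of words known to be good Latin (assumed trustworthy).
    Returns (kept, vetoed), both preserving the input order.
    """
    kept, vetoed = [], []
    for word in flagged_words:
        if word in known_latin:
            vetoed.append(word)
        else:
            kept.append(word)
    return kept, vetoed


# Toy stand-ins (hypothetical): in the notebook these would be
# distinct_good_latin and a list of words the classifier flagged.
known_latin = {'non', 'discrimen', 'dein'}
flagged = ['non', 'logikh', 'discrimen', 'mantike']

kept, vetoed = veto_known_latin(flagged, known_latin)
print(kept)    # ['logikh', 'mantike']
print(vetoed)  # ['non', 'discrimen']
```

The trade-off is that a Greek loanword that is also attested Latin survives the filter, so the vocabulary has to be one you trust.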
### Save the classifier for future use with the joblib library
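
A minimal sketch of the save-and-reload round trip, assuming `joblib` is installed alongside scikit-learn. The toy `KNeighborsClassifier`, its training data, and the file name `demo_classifier.joblib` are placeholders for illustration; in the notebook the object to save would be the trained `classifier`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for the trained classifier (hypothetical data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])
toy_classifier = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Write the fitted model to disk, then load it back.
model_path = os.path.join(tempfile.mkdtemp(), 'demo_classifier.joblib')
joblib.dump(toy_classifier, model_path)
restored = joblib.load(model_path)

# The restored model should give the same predictions as the original.
sample = np.array([[0.2, 0.4], [5.5, 5.1]])
same_predictions = np.array_equal(toy_classifier.predict(sample),
                                  restored.predict(sample))
print(same_predictions)
```

A saved model is best reloaded with the same library versions that produced it, and, like any pickle-based file, should only be loaded from a trusted source.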

In [44]:

def rle(inarray):
    """Run-length encode a sequence (modeled on R's rle function).

    Accepts any array-like, including non-NumPy sequences and mixed datatypes.
    Returns a tuple (run_lengths, start_positions, values),
    or (None, None, None) for empty input.
    """
    ia = np.asarray(inarray)                # force numpy
    n = len(ia)
    if n == 0:
        return (None, None, None)
    y = np.array(ia[1:] != ia[:-1])         # pairwise unequal (string safe)
    i = np.append(np.where(y), n - 1)       # index of the last element of each run
    z = np.diff(np.append(-1, i))           # run lengths
    p = np.cumsum(np.append(0, z))[:-1]     # start position of each run
    return (z, p, ia[i])

# Adapted from a Stack Overflow answer; more general than this notebook needs, but it has many possible uses.
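
To make the return value of `rle` concrete, here is a pure-Python equivalent built on `itertools.groupby`, which can serve as a cross-check. The name `rle_groupby` and the sample prediction sequence are illustrative assumptions. One plausible use, not shown in the notebook, is finding runs of consecutive words flagged as Greek, which would suggest a quoted passage rather than an isolated loanword.

```python
from itertools import groupby


def rle_groupby(sequence):
    """Run-length encode a sequence.

    Returns three lists: run lengths, start positions, and run values.
    """
    lengths, starts, values = [], [], []
    position = 0
    for value, run in groupby(sequence):
        run_length = sum(1 for _ in run)
        lengths.append(run_length)
        starts.append(position)
        values.append(value)
        position += run_length
    return lengths, starts, values


# Hypothetical per-word predictions: 1 = flagged as transliterated Greek.
marks = [0, 0, 1, 1, 1, 0]
lengths, starts, values = rle_groupby(marks)
print(lengths)  # [2, 3, 1]
print(starts)   # [0, 2, 5]
print(values)   # [0, 1, 0]
```

Unlike `rle`, this sketch returns three empty lists for empty input instead of `(None, None, None)`.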

In [47]:
word = 'agathon'
classifier.predict(np.array([word_to_features(word, max_len)]))

array([1.])

### Find which texts of Livy have the least Greek

In [56]:
 
latin_files = glob.glob(
    os.path.expanduser('~/cltk_data/latin/text/latin_text_latin_library/livy/*'))


In [65]:
corpus_files = {}
for full_path in tqdm(latin_files):
    filename = os.path.basename(full_path)
    reader = get_corpus_reader(corpus_name='latin_text_latin_library', language='latin')
    reader._fileids = [full_path]  # restrict the reader to this one file
    unseen_X = process_text_model.fit_transform(list(reader.sents()))
    # Flatten the sentences into one list of word tokens (repeats included)
    total_words = [word
                   for sentence in unseen_X
                   for word in sentence]
    arr = classifier.predict(
        np.array([word_to_features(word, max_len) for word in total_words]))
    total_greek_words = np.count_nonzero(arr)
    marks = arr.tolist()
    if marks:
        # Keep the words the classifier labeled 1 (transliterated Greek)
        found_greek = [total_words[idx]
                       for idx, point in enumerate(marks)
                       if point == 1]
        # Record (total word count, flagged word count) per file
        corpus_files[filename] = (len(total_words), total_greek_words)
        print('file: {} total words: {:,} with {:,} transliterated Greek words: {}'.format(
            filename, len(total_words), total_greek_words, found_greek))
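
Once the loop has filled `corpus_files`, ranking the texts is a matter of sorting by the share of flagged words. A minimal sketch, assuming the `(total_words, greek_words)` tuple layout stored by the loop; `rank_by_greek_ratio` is a hypothetical helper, and the sample counts are taken from this cell's printed output for three files.

```python
def rank_by_greek_ratio(corpus_files):
    """Sort file names by the fraction of words flagged as Greek, ascending.

    corpus_files maps filename -> (total_words, greek_words).
    Files with zero words are skipped to avoid dividing by zero.
    """
    ratios = {name: greek / total
              for name, (total, greek) in corpus_files.items()
              if total > 0}
    return sorted(ratios.items(), key=lambda item: item[1])


# Counts copied from the printed output for three Livy files.
sample_counts = {
    'liv.per70.txt': (151, 1),
    'liv.38.txt': (16992, 24),
    'liv.10.txt': (15286, 29),
}

ranking = rank_by_greek_ratio(sample_counts)
for name, ratio in ranking:
    print('{}: {:.3%}'.format(name, ratio))
```

Sorting by ratio rather than raw count keeps long books from ranking as the most Greek simply because they are long. Very short files such as `liv.per136-7.txt` (4 words) make the ratio noisy, so a minimum word count may be worth adding.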





  0%|          | 0/178 [00:00<?, ?it/s]



  1%|          | 1/178 [00:00<01:45,  1.67it/s]

file: liv.per67.txt total words: 171 with 0 transliterated Greek words: []






  1%|          | 2/178 [00:01<01:42,  1.71it/s]

file: liv.per128.txt total words: 50 with 0 transliterated Greek words: []






  2%|▏         | 3/178 [00:01<01:42,  1.71it/s]

file: liv.per73.txt total words: 127 with 0 transliterated Greek words: []






  2%|▏         | 4/178 [00:02<01:41,  1.72it/s]

file: liv.per114.txt total words: 92 with 0 transliterated Greek words: []






  3%|▎         | 5/178 [00:02<01:40,  1.73it/s]

file: liv.per100.txt total words: 68 with 0 transliterated Greek words: []






  3%|▎         | 6/178 [00:03<01:39,  1.73it/s]

file: liv.per98.txt total words: 116 with 0 transliterated Greek words: []






  4%|▍         | 7/178 [00:04<01:38,  1.74it/s]

file: liv.per99.txt total words: 103 with 0 transliterated Greek words: []






  4%|▍         | 8/178 [00:04<01:37,  1.75it/s]

file: liv.per101.txt total words: 77 with 0 transliterated Greek words: []






  5%|▌         | 9/178 [00:05<01:36,  1.76it/s]

file: liv.per115.txt total words: 77 with 0 transliterated Greek words: []






  6%|▌         | 10/178 [00:05<01:35,  1.77it/s]

file: liv.per72.txt total words: 70 with 0 transliterated Greek words: []






  6%|▌         | 11/178 [00:06<01:34,  1.77it/s]

file: liv.per129.txt total words: 80 with 0 transliterated Greek words: []






  7%|▋         | 12/178 [00:06<01:32,  1.79it/s]

file: liv.per66.txt total words: 41 with 0 transliterated Greek words: []






  7%|▋         | 13/178 [00:07<01:31,  1.80it/s]

file: liv.per136-7.txt total words: 4 with 0 transliterated Greek words: []






  8%|▊         | 14/178 [00:07<01:31,  1.79it/s]

file: liv.per70.txt total words: 151 with 1 transliterated Greek words: ['Ariobarzanes']






  8%|▊         | 15/178 [00:08<01:31,  1.78it/s]

file: liv.per64.txt total words: 111 with 0 transliterated Greek words: []






  9%|▉         | 16/178 [00:09<01:32,  1.75it/s]

file: liv.per103.txt total words: 140 with 0 transliterated Greek words: []






 10%|▉         | 17/178 [00:09<01:32,  1.75it/s]

file: liv.per58.txt total words: 198 with 0 transliterated Greek words: []






 10%|█         | 18/178 [00:10<01:31,  1.76it/s]

file: liv.per117.txt total words: 125 with 0 transliterated Greek words: []






 11%|█         | 19/178 [00:10<01:31,  1.75it/s]

file: liv.per116.txt total words: 182 with 0 transliterated Greek words: []






 11%|█         | 20/178 [00:11<01:31,  1.74it/s]

file: liv.per59.txt total words: 339 with 0 transliterated Greek words: []






 12%|█▏        | 21/178 [00:11<01:29,  1.75it/s]

file: liv.per102.txt total words: 97 with 0 transliterated Greek words: []






 12%|█▏        | 22/178 [00:12<01:28,  1.76it/s]

file: liv.per65.txt total words: 72 with 0 transliterated Greek words: []






 13%|█▎        | 23/178 [00:13<01:27,  1.78it/s]

file: liv.per71.txt total words: 83 with 0 transliterated Greek words: []






 13%|█▎        | 24/178 [00:13<01:26,  1.77it/s]

file: liv.per106.txt total words: 111 with 0 transliterated Greek words: []






 14%|█▍        | 25/178 [00:14<01:28,  1.73it/s]

file: liv.per49.txt total words: 706 with 0 transliterated Greek words: []






 15%|█▍        | 26/178 [00:14<01:27,  1.73it/s]

file: liv.per112.txt total words: 129 with 0 transliterated Greek words: []






 15%|█▌        | 27/178 [00:15<01:27,  1.73it/s]

file: liv.per75.txt total words: 96 with 0 transliterated Greek words: []






 16%|█▌        | 28/178 [00:15<01:26,  1.73it/s]

file: liv.per61.txt total words: 150 with 0 transliterated Greek words: []






 16%|█▋        | 29/178 [00:16<01:26,  1.73it/s]

file: liv.per60.txt total words: 206 with 0 transliterated Greek words: []






 17%|█▋        | 30/178 [00:17<01:24,  1.75it/s]

file: liv.per74.txt total words: 88 with 1 transliterated Greek words: ['Ariobarzanes']






 17%|█▋        | 31/178 [00:17<01:23,  1.75it/s]

file: liv.per113.txt total words: 129 with 0 transliterated Greek words: []






 18%|█▊        | 32/178 [00:18<01:25,  1.71it/s]

file: liv.per48.txt total words: 511 with 0 transliterated Greek words: []






 19%|█▊        | 33/178 [00:18<01:25,  1.70it/s]

file: liv.per107.txt total words: 133 with 0 transliterated Greek words: []






 19%|█▉        | 34/178 [00:19<01:24,  1.71it/s]

file: liv.per111.txt total words: 106 with 0 transliterated Greek words: []






 20%|█▉        | 35/178 [00:20<01:23,  1.71it/s]

file: liv.per105.txt total words: 94 with 0 transliterated Greek words: []






 20%|██        | 36/178 [00:20<01:22,  1.72it/s]

file: liv.per62.txt total words: 57 with 0 transliterated Greek words: []






 21%|██        | 37/178 [00:21<01:21,  1.74it/s]

file: liv.per139.txt total words: 35 with 0 transliterated Greek words: []






 21%|██▏       | 38/178 [00:21<01:20,  1.74it/s]

file: liv.per76.txt total words: 87 with 1 transliterated Greek words: ['Ariobarzanes']






 22%|██▏       | 39/178 [00:22<01:20,  1.73it/s]

file: liv.per89.txt total words: 247 with 0 transliterated Greek words: []






 22%|██▏       | 40/178 [00:22<01:20,  1.72it/s]

file: liv.per88.txt total words: 125 with 0 transliterated Greek words: []






 23%|██▎       | 41/178 [00:23<01:19,  1.73it/s]

file: liv.per77.txt total words: 182 with 0 transliterated Greek words: []






 24%|██▎       | 42/178 [00:24<01:17,  1.75it/s]

file: liv.per138.txt total words: 18 with 0 transliterated Greek words: []






 24%|██▍       | 43/178 [00:24<01:17,  1.74it/s]

file: liv.per63.txt total words: 69 with 0 transliterated Greek words: []






 25%|██▍       | 44/178 [00:25<01:18,  1.71it/s]

file: liv.per104.txt total words: 164 with 0 transliterated Greek words: []






file: liv.per110.txt total words: 121 with 0 transliterated Greek words: []






file: liv.per7.txt total words: 319 with 0 transliterated Greek words: []






file: liv.per10.txt total words: 117 with 0 transliterated Greek words: []






file: liv.per38.txt total words: 247 with 1 transliterated Greek words: ['Tolostobogios']






file: liv.per39.txt total words: 205 with 0 transliterated Greek words: []






file: liv.per11.txt total words: 198 with 0 transliterated Greek words: []






file: liv.per6.txt total words: 107 with 0 transliterated Greek words: []






file: liv.per4.txt total words: 187 with 0 transliterated Greek words: []






file: liv.per13.txt total words: 166 with 0 transliterated Greek words: []






file: liv.38.txt total words: 16,992 with 24 transliterated Greek words: ['dein', 'Ephesi', 'dein', 'dein', 'dein', 'Epheso', 'Byzantio', 'Hellesponti', 'dein', 'dein', 'Philopoemen', 'Lacedaemonii', 'Lacedaemonii', 'Lacedaemonii', 'Lacedaemonii', 'Philopoemenis', 'Lacedaemonii', 'Lacedaemonii', 'dein', 'Appropinquanti', 'Epheso', 'Epheso', 'Epheso', 'e']






file: liv.10.txt total words: 15,286 with 29 transliterated Greek words: ['eis', 'eis', 'dein', 'discrimen', 'discrimen', 'quadringenti', 'eis', 'quadringenti', 'duodeuiginti', 'eis', 'discrimen', 'dein', 'eis', 'eis', 'discrimen', 'discrimen', 'dein', 'dein', 'circumuallaturi', 'circumagendi', 'dein', 'eis', 'integumento', 'eis', 'consentienti', 'quadringenti', 'duodeuiginti', 'quadringenti', 'quadringenti']






file: liv.39.txt total words: 14,682 with 22 transliterated Greek words: ['discrimen', 'praecipitantes', 'transmontanos', 'discrimen', 'inhonoratos', 'Peloponneso', 'Lacedaemonii', 'Philopoemenis', 'Lacedaemonii', 'Lacedaemonii', 'Peloponneso', 'Lacedaemonii', 'Lacedaemonii', 'Lacedaemonii', 'querimonias', 'Lacedaemonii', 'Lacedaemonii', 'Peloponneso', 'Peloponneso', 'Philopoemenis', 'Peloponneso', 'Dentheletos']






file: liv.per12.txt total words: 93 with 0 transliterated Greek words: []






file: liv.per5.txt total words: 293 with 2 transliterated Greek words: ['Ueios', 'Ueios']






file: liv.per1.txt total words: 444 with 0 transliterated Greek words: []






file: liv.9.txt total words: 16,165 with 10 transliterated Greek words: ['eis', 'eis', 'discrimen', 'dein', 'circumueniri', 'dein', 'inclementiori', 'M', 'dein', 'dein']






file: liv.per16.txt total words: 82 with 0 transliterated Greek words: []






file: liv.29.txt total words: 12,385 with 19 transliterated Greek words: ['dissimulasti', 'dein', 'dein', 'praesidiarii', 'dein', 'dein', 'detulissentne', 'dein', 'Carthaginiensi', 'dein', 'Carthaginiensi', 'Oezalcem', 'Lacumaze', 'dein', 'Lacumazes', 'Oezalces', 'Lacumaze', 'Oezalcem', 'Carthaginiensi']






file: liv.28.txt total words: 16,850 with 23 transliterated Greek words: ['eis', 'Carthaginiensi', 'eis', 'dein', 'quinquagenos', 'dein', 'eis', 'eis', 'dein', 'dein', 'eis', 'eis', 'circumspectantes', 'consociaturi', 'eis', 'discrimen', 'Carthaginiensi', 'renuntiaturi', 'eis', 'dissentienti', 'Carthaginiensi', 'discrimen', 'Carthaginiensi']






file: liv.per17.txt total words: 115 with 0 transliterated Greek words: []






file: liv.8.txt total words: 13,066 with 23 transliterated Greek words: ['eis', 'dein', 'eis', 'eis', 'eis', 'dein', 'discrimen', 'eis', 'eis', 'eis', 'quinquagenos', 'eis', 'eis', 'eis', 'Ueios', 'eis', 'eis', 'discrimen', 'dein', 'discrimen', 'eis', 'eis', 'antesignanos']






file: liv.per2.txt total words: 453 with 0 transliterated Greek words: []






file: liv.per29.txt total words: 378 with 0 transliterated Greek words: []






file: liv.per15.txt total words: 65 with 0 transliterated Greek words: []






file: liv.per14.txt total words: 117 with 0 transliterated Greek words: []






file: liv.per28.txt total words: 272 with 0 transliterated Greek words: []






file: liv.per3.txt total words: 257 with 0 transliterated Greek words: []






file: liv.per25.txt total words: 215 with 0 transliterated Greek words: []






file: liv.6.txt total words: 13,487 with 16 transliterated Greek words: ['eis', 'Ueios', 'Ueios', 'antesignanos', 'eis', 'Ueios', 'eis', 'dein', 'quous', 'eis', 'eis', 'eis', 'dein', 'praecipitauere', 'eis', 'eis']






file: liv.per31.txt total words: 180 with 0 transliterated Greek words: []






file: liv.per19.txt total words: 193 with 0 transliterated Greek words: []






file: liv.per142.txt total words: 50 with 0 transliterated Greek words: []






file: liv.26.txt total words: 17,088 with 25 transliterated Greek words: ['eis', 'qualemcun', 'transgredientes', 'duodeuiginti', 'dein', 'eis', 'quacun', 'eis', 'eis', 'eis', 'Lacedaemonii', 'eis', 'eis', 'praetereunti', 'quodcun', 'eis', 'dein', 'eis', 'dein', 'dein', 'eis', 'discrimen', 'eis', 'scorpionumsex', 'eis']






file: liv.32.txt total words: 10,766 with 10 transliterated Greek words: ['dein', 'dein', 'dein', 'Peloponneso', 'Lacedaemonii', 'Megalopolitanos', 'times', 'eis', 'dein', 'descissehilocles']






file: liv.33.txt total words: 11,631 with 11 transliterated Greek words: ['dein', 'dein', 'discrimen', 'transgredientes', 'antesignanos', 'Ephesi', 'Epheso', 'Epheso', 'supplementi', 'Peloponneso', 'Ephesi']






file: liv.27.txt total words: 17,376 with 15 transliterated Greek words: ['Carthaginiensi', 'dein', 'Q', 'duodeuiginti', 'mediterranei', 'extraordinarii', 'dein', 'dein', 'interroganti', 'duodeuiginti', 'duodeuiginti', 'dein', 'uituperarentne', 'antesignanos', 'quadringenti']






file: liv.per18.txt total words: 160 with 0 transliterated Greek words: []






file: liv.per30.txt total words: 184 with 0 transliterated Greek words: []






file: liv.7.txt total words: 13,288 with 12 transliterated Greek words: ['M', 'eis', 'discrimen', 'eis', 'alioquin', 'dein', 'reciperaturi', 'dein', 'L', 'existimaturi', 'antesignanos', 'instrumento']






file: liv.per24.txt total words: 141 with 0 transliterated Greek words: []






file: liv.5.txt total words: 16,315 with 63 transliterated Greek words: ['eis', 'ede', 'Ueios', 'Ueios', 'eis', 'Ueios', 'discrimen', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'discrimen', 'Ueios', 'Ueios', 'praedamnatos', 'transferendi', 'Ueios', 'praetereundi', 'Ueios', 'Ueios', 'praecipitauere', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'eis', 'Ueios', 'eis', 'extraordinarii', 'circumueniri', 'Ueios', 'Ueios', 'eis', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'Ueios']






file: liv.per32.txt total words: 108 with 0 transliterated Greek words: []






file: liv.per26.txt total words: 172 with 0 transliterated Greek words: []






file: liv.per141.txt total words: 45 with 1 transliterated Greek words: ['transrhenanas']






file: liv.31.txt total words: 12,815 with 7 transliterated Greek words: ['quadringenti', 'L', 'eis', 'eis', 'Peloponneso', 'utpote', 'dein']






file: liv.25.txt total words: 14,686 with 15 transliterated Greek words: ['eis', 'eis', 'Ueios', 'discrimen', 'praecipitauere', 'circumsessuri', 'dein', 'dein', 'duodeuiginti', 'Lacedaemonii', 'dein', 'dein', 'dein', 'Carthaginiensi', 'dein']






file: liv.24.txt total words: 14,348 with 60 transliterated Greek words: ['eis', 'eis', 'eis', 'Carthaginiensi', 'Leontinos', 'eis', 'Carthaginiensi', 'eis', 'discrimen', 'duodeuiginti', 'eis', 'eis', 'eis', 'eis', 'dein', 'discrimen', 'quadringenti', 'eis', 'quacun', 'dein', 'dein', 'quaecun', 'eis', 'eis', 'eis', 'eis', 'eis', 'Leontinos', 'eis', 'Leontinos', 'Leontinos', 'Leontinos', 'eis', 'Leontinos', 'eis', 'eis', 'eis', 'dein', 'eis', 'eis', 'eis', 'eis', 'eis', 'Leontinos', 'Leontinos', 'machinamenta', 'dein', 'eis', 'dein', 'ero', 'dein', 'Leontinos', 'Leonta', 'eis', 'a', 'eis', 'eis', 'dein', 'dein', 'eis']






file: liv.30.txt total words: 13,632 with 21 transliterated Greek words: ['P', 'dein', 'dein', 'dein', 'instrumento', 'utpote', 'transiliendi', 'dein', 'dein', 'Carthaginiensi', 'dein', 'Carthaginiensi', 'interrogandi', 'dein', 'adpropinquanti', 'discrimen', 'discrimen', 'M', 'P', 'transferenti', 'Carthaginiensi']






file: liv.per140.txt total words: 34 with 0 transliterated Greek words: []






file: liv.per27.txt total words: 176 with 0 transliterated Greek words: []






file: liv.per33.txt total words: 107 with 0 transliterated Greek words: []






file: liv.4.txt total words: 16,830 with 15 transliterated Greek words: ['interroganti', 'discrimen', 'T', 'circumagenti', 'Ueios', 'Ueios', 'Ueios', 'Ueios', 'a', 'Ueios', 'antesignanos', 'Ueios', 'Ueios', 'Ueios', 'Ueios']






file: liv.per8.txt total words: 232 with 0 transliterated Greek words: []






file: liv.per37.txt total words: 142 with 0 transliterated Greek words: []






file: liv.per23.txt total words: 196 with 0 transliterated Greek words: []






file: liv.34.txt total words: 15,035 with 16 transliterated Greek words: ['intertrimenti', 'Peloponneso', 'Lacedaemonii', 'quadringenti', 'duodeuiginti', 'a', 'Menelai', 'pronuntiasti', 'Lacedaemonii', 'Lacedaemonii', 'Lacedaemonii', 'Lacedaemonii', 'Lacedaemonii', 'dein', 'dein', 'Ephesi']






file: liv.21.txt total words: 15,827 with 15 transliterated Greek words: ['querimonias', 'e', 'discrimen', 'quadringenti', 'Carthaginiensi', 'Carthaginiensi', 'quadringenti', 'Sedunoueragri', 'circumspectantes', 'reciperaturi', 'dein', 'dein', 'dein', 'dein', 'Q']






file: liv.35.txt total words: 12,675 with 16 transliterated Greek words: ['praetergressi', 'extraordinarii', 'extraordinarii', 'interequitantes', 'dein', 'Peloponneso', 'Ephesi', 'Epheso', 'antesignanos', 'Peloponneso', 'Philopoemenis', 'dein', 'Lacedaemonii', 'Chalcioecon', 'L', 'praegrauatura']






file: liv.per22.txt total words: 365 with 1 transliterated Greek words: ['circumposita']






file: liv.per36.txt total words: 55 with 0 transliterated Greek words: []






file: liv.1.txt total words: 17,350 with 8 transliterated Greek words: ['dein', 'discrimen', 'Peloponneso', 'dein', 'Ueios', 'circumueniri', 'a', 'existimaturi']






file: liv.pr.txt total words: 483 with 0 transliterated Greek words: []






file: liv.per9.txt total words: 262 with 0 transliterated Greek words: []






file: liv.per20.txt total words: 164 with 0 transliterated Greek words: []






file: liv.3.txt total words: 20,397 with 10 transliterated Greek words: ['dein', 'tumultuatos', 'discrimen', 'discrimen', 'C', 'dein', 'discrimen', 'interpellato', 'praeterequitantes', 'praecipitauere']






file: liv.per34.txt total words: 155 with 0 transliterated Greek words: []






file: liv.23.txt total words: 14,919 with 12 transliterated Greek words: ['eis', 'interroganti', 'M', 'quadringenti', 'Petelinos', 'antesignanos', 'ubicun', 'dein', 'dein', 'dein', 'dein', 'dein']






file: liv.37.txt total words: 16,390 with 32 transliterated Greek words: ['eis', 'Hellesponti', 'Ephesi', 'percunctanti', 'Ephesi', 'Hellesponti', 'Epheso', 'Ephesi', 'Epheso', 'Ephesi', 'Epheso', 'Epheso', 'discrimen', 'Hellesponti', 'Philopoemenis', 'Hellesponti', 'Epheso', 'Epheso', 'Teios', 'eis', 'dein', 'P', 'dein', 'dein', 'dein', 'quadringenti', 'Epheso', 'quadringenti', 'dein', 'Peloponneso', 'existimaturi', 'Epheso']






file: liv.36.txt total words: 11,411 with 17 transliterated Greek words: ['supplementi', 'eis', 'eis', 'Peloponneso', 'dein', 'eis', 'appropinquanti', 'Epheso', 'dein', 'utpote', 'Peloponneso', 'discrimen', 'eis', 'Ephesi', 'Ephesi', 'utpote', 'Ephesi']






file: liv.22.txt total words: 17,401 with 14 transliterated Greek words: ['uou', 'Carthaginiensi', 'dein', 'Ueios', 'instrumento', 'eis', 'dein', 'dein', 'appellauero', 'dein', 'eis', 'superincubanti', 'eis', 'recognoscendi']






file: liv.per35.txt total words: 188 with 1 transliterated Greek words: ['Ephesi']






file: liv.2.txt total words: 18,009 with 17 transliterated Greek words: ['dein', 'discrimen', 'dein', 'discrimen', 'praelatos', 'dein', 'ero', 'utpote', 'repraesentatas', 'dein', 'deliberabundi', 'dein', 'praecipitauere', 'Ueios', 'Ueios', 'dein', 'discrimen']






file: liv.per21.txt total words: 127 with 0 transliterated Greek words: []






file: liv.per46.txt total words: 206 with 0 transliterated Greek words: []






file: liv.per109.txt total words: 94 with 1 transliterated Greek words: ['dein']






file: liv.per52.txt total words: 272 with 0 transliterated Greek words: []






file: liv.per135.txt total words: 16 with 0 transliterated Greek words: []






file: liv.per121.txt total words: 42 with 0 transliterated Greek words: []






file: liv.per85.txt total words: 98 with 0 transliterated Greek words: []






file: liv.per91.txt total words: 40 with 0 transliterated Greek words: []






file: liv.45.txt total words: 13,274 with 13 transliterated Greek words: ['dein', 'discrimen', 'dein', 'Rhizoni', 'Antinous', 'Antinous', 'Rhizonitas', 'Rhizonitas', 'percunctanti', 'Peloponneso', 'eis', 'dein', 'eis']






file: liv.44.txt total words: 12,801 with 22 transliterated Greek words: ['Azorum', 'Hippias', 'Hippias', 'dein', 'dein', 'eis', 'eis', 'dein', 'existimastis', 'discrimen', 'Bylazora', 'dein', 'to', 'Pythoi', 'dein', 'dein', 'circumspectu', 'praetermisisse', 'pote', 'dein', 'Hippias', 'Amphipolitanos']






file: liv.per90.txt total words: 69 with 0 transliterated Greek words: []






file: liv.per84.txt total words: 115 with 0 transliterated Greek words: []






file: liv.per120.txt total words: 135 with 0 transliterated Greek words: []






file: liv.per134.txt total words: 46 with 0 transliterated Greek words: []






file: liv.per53.txt total words: 41 with 0 transliterated Greek words: []






file: liv.per108.txt total words: 81 with 0 transliterated Greek words: []






file: liv.per47.txt total words: 171 with 0 transliterated Greek words: []






file: liv.per.txt total words: 2 with 0 transliterated Greek words: []






file: liv.per51.txt total words: 158 with 0 transliterated Greek words: []






file: liv.per45.txt total words: 187 with 0 transliterated Greek words: []






file: liv.per122.txt total words: 35 with 0 transliterated Greek words: []






file: liv.per79.txt total words: 131 with 0 transliterated Greek words: []






file: liv.per92.txt total words: 75 with 0 transliterated Greek words: []






file: liv.per86.txt total words: 113 with 0 transliterated Greek words: []






file: liv.per87.txt total words: 34 with 0 transliterated Greek words: []






file: liv.per93.txt total words: 81 with 1 transliterated Greek words: ['dein']






file: liv.per78.txt total words: 65 with 0 transliterated Greek words: []






file: liv.per123.txt total words: 66 with 1 transliterated Greek words: ['dein']






file: liv.per44.txt total words: 141 with 0 transliterated Greek words: []






file: liv.per50.txt total words: 306 with 0 transliterated Greek words: []






file: liv.per127.txt total words: 105 with 0 transliterated Greek words: []






file: liv.per68.txt total words: 164 with 0 transliterated Greek words: []






file: liv.per133.txt total words: 78 with 0 transliterated Greek words: []






file: liv.per54.txt total words: 128 with 0 transliterated Greek words: []






file: liv.per40.txt total words: 200 with 0 transliterated Greek words: []






file: liv.43.txt total words: 5,693 with 5 transliterated Greek words: ['dein', 'tis', 'supplementi', 'dein', 'dein']






file: liv.per97.txt total words: 98 with 1 transliterated Greek words: ['dein']






file: liv.per83.txt total words: 139 with 0 transliterated Greek words: []






file: liv.per82.txt total words: 76 with 0 transliterated Greek words: []






file: liv.per96.txt total words: 100 with 0 transliterated Greek words: []






file: liv.42.txt total words: 16,858 with 13 transliterated Greek words: ['instrumento', 'eis', 'intemperanti', 'eis', 'Misagenen', 'ero', 'eis', 'Hippias', 'dein', 'Azorum', 'quadringenti', 'Hippias', 'circumsideri']






file: liv.per41.txt total words: 142 with 0 transliterated Greek words: []






file: liv.per55.txt total words: 217 with 0 transliterated Greek words: []






file: liv.per132.txt total words: 67 with 0 transliterated Greek words: []






file: liv.per69.txt total words: 137 with 0 transliterated Greek words: []






file: liv.per126.txt total words: 44 with 0 transliterated Greek words: []






file: liv.per130.txt total words: 74 with 0 transliterated Greek words: []






file: liv.per124.txt total words: 73 with 1 transliterated Greek words: ['dein']






file: liv.per43.txt total words: 79 with 0 transliterated Greek words: []






file: liv.per118.txt total words: 72 with 0 transliterated Greek words: []






file: liv.per57.txt total words: 173 with 0 transliterated Greek words: []






file: liv.40.txt total words: 14,714 with 14 transliterated Greek words: ['praetermitti', 'ero', 'eis', 'Dentheletos', 'dein', 'eis', 'quadringenti', 'quadringenti', 'quadringenti', 'quadringenti', 'quinquagenos', 'eis', 'illacrimasti', 'dein']






file: liv.per80.txt total words: 194 with 0 transliterated Greek words: []






file: liv.per94.txt total words: 43 with 0 transliterated Greek words: []






file: liv.per95.txt total words: 59 with 0 transliterated Greek words: []






file: liv.per81.txt total words: 41 with 0 transliterated Greek words: []






file: liv.41.txt total words: 7,612 with 5 transliterated Greek words: ['dein', 'eis', 'dein', 'tumultuosos', 'Peloponneso']






file: liv.per56.txt total words: 159 with 0 transliterated Greek words: []






file: liv.per119.txt total words: 151 with 0 transliterated Greek words: []






file: liv.per42.txt total words: 107 with 0 transliterated Greek words: []






file: liv.per125.txt total words: 69 with 0 transliterated Greek words: []






file: liv.per131.txt total words: 64 with 0 transliterated Greek words: []
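
Many of the flagged tokens in the listing above repeat across files (`dein` and `eis`, for example), and some of them, such as `dein` and `discrimen`, are ordinary Latin words. A tally of flagged tokens can make such likely false positives easier to spot. The following is a minimal sketch, not code from this notebook: `flagged_by_file` and `tally_flagged` are hypothetical names, and the two sample entries are copied from the printed output above.

```python
from collections import Counter

# Hypothetical container: filename -> list of tokens the classifier flagged.
# The two entries are copied from the printed output above.
flagged_by_file = {
    "liv.43.txt": ["dein", "tis", "supplementi", "dein", "dein"],
    "liv.41.txt": ["dein", "eis", "dein", "tumultuosos", "Peloponneso"],
}


def tally_flagged(flagged):
    """Count how often each token was flagged across all files."""
    counts = Counter()
    for tokens in flagged.values():
        counts.update(tokens)
    return counts


flag_counts = tally_flagged(flagged_by_file)
# Show the most frequently flagged tokens first
for token, count in flag_counts.most_common(3):
    print(token, count)
```

Tokens that dominate such a tally are good candidates for manual review before the classifier is used to filter a corpus.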


In [66]:
print(len(corpus_files))
# Each value is (total words, flagged words); rank files by flagged-word rate, lowest first
rankings = [(key, val[0], val[1], val[1] / val[0]) for key, val in corpus_files.items()]
rankings.sort(key=lambda x: x[3])
for rank in rankings:
    print(rank)

178
('liv.per67.txt', 171, 0, 0.0)
('liv.per128.txt', 50, 0, 0.0)
('liv.per73.txt', 127, 0, 0.0)
('liv.per114.txt', 92, 0, 0.0)
('liv.per100.txt', 68, 0, 0.0)
('liv.per98.txt', 116, 0, 0.0)
('liv.per99.txt', 103, 0, 0.0)
('liv.per101.txt', 77, 0, 0.0)
('liv.per115.txt', 77, 0, 0.0)
('liv.per72.txt', 70, 0, 0.0)
('liv.per129.txt', 80, 0, 0.0)
('liv.per66.txt', 41, 0, 0.0)
('liv.per136-7.txt', 4, 0, 0.0)
('liv.per64.txt', 111, 0, 0.0)
('liv.per103.txt', 140, 0, 0.0)
('liv.per58.txt', 198, 0, 0.0)
('liv.per117.txt', 125, 0, 0.0)
('liv.per116.txt', 182, 0, 0.0)
('liv.per59.txt', 339, 0, 0.0)
('liv.per102.txt', 97, 0, 0.0)
('liv.per65.txt', 72, 0, 0.0)
('liv.per71.txt', 83, 0, 0.0)
('liv.per106.txt', 111, 0, 0.0)
('liv.per49.txt', 706, 0, 0.0)
('liv.per112.txt', 129, 0, 0.0)
('liv.per75.txt', 96, 0, 0.0)
('liv.per61.txt', 150, 0, 0.0)
('liv.per60.txt', 206, 0, 0.0)
('liv.per113.txt', 129, 0, 0.0)
('liv.per48.txt', 511, 0, 0.0)
('liv.per107.txt', 133, 0, 0.0)
('liv.per111.txt', 106, 0, 0.0)
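
Because the cell above sorts in ascending order, files with a rate of 0.0 are listed first, and a file with zero words would raise a `ZeroDivisionError`. The sketch below shows one possible variant that sorts the highest rates first and guards against empty files. It assumes, as the indexing in the cell suggests, that `corpus_files` maps each filename to a `(total words, flagged words)` pair; `rank_by_flag_rate` is a hypothetical helper, and `empty.txt` is a made-up entry added only to exercise the guard.

```python
def rank_by_flag_rate(corpus_files):
    """Return (filename, total, flagged, rate) tuples, highest rate first."""
    rankings = []
    for name, (total, flagged) in corpus_files.items():
        rate = flagged / total if total else 0.0  # guard against empty files
        rankings.append((name, total, flagged, rate))
    rankings.sort(key=lambda row: row[3], reverse=True)
    return rankings


# Counts copied from the output above; "empty.txt" is a made-up entry
sample = {
    "liv.per67.txt": (171, 0),
    "liv.5.txt": (16315, 63),
    "liv.24.txt": (14348, 60),
    "empty.txt": (0, 0),
}
ranked = rank_by_flag_rate(sample)
for row in ranked:
    print(row)
```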


In [None]:
model_output_file = 'is_transliterated_greek.mdl.{}.joblib'.format(sklearn.__version__)
dump(classifier, model_output_file)

### Reconstitute the classifier for use at runtime

In [None]:
classifier = load(model_output_file)

### Some demo examples

In [None]:
classifier.predict(
    np.array([word_to_features(word, max_len) for word in ['quid', 'est', 'veritas']]))

In [None]:
classifier.predict(
    np.array([word_to_features(word, max_len) for word in ['ou', 'eis', 'panta', 'ton']]))

### Save model provenance
#### Provenance: a record of how, when, and with what the classifier was made.
#### It lets others reuse the classifier without rebuilding it from scratch, or work out how to recreate it for better or different performance.

In [None]:
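# Record each training file's name and MD5 checksum, keyed by position
data_files = {}
for idx, file in enumerate(files):
    data_files[idx] = {"filename": os.path.basename(file),
                       "md5": md5(file)
                       }
# The transliterated Greek file takes the next free index
data_files[len(data_files)] = {
    "filename": transliterated_greek_file,
    "md5": md5(transliterated_greek_file)
}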

provenance_file = '{}.prov.json'.format(model_output_file)

params = {
    "provenance_data": provenance_file,
    "date_created": str(datetime.datetime.now()),
    "model_parameters": mdl_params,
    "max_word_length": max_len,
    "num_samples": num_samples,
    "num_features": num_features,
    "library_version": sklearn.__version__,
    "classifier_class": str(classifier.__class__),
    "classifier_best_score": grids.best_score_,
    "data_files": data_files,
    "model_output_file": model_output_file,
    "model_output_md5": md5(model_output_file),
    "labels": label_encoder.classes_.tolist(),
    # manually added information
    "comment": "Transliterated Greek Classifier",
    "code_generated_by": "detect_transliterated_greek.ipynb",
    "feature_encoding_fun": "word_to_features",
    "author": "Todd Cook"
}

with open(provenance_file, 'wt') as writer:
    json.dump(params, writer, indent=2)
    print('Wrote provenance file: {}'.format(provenance_file))
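
A provenance record is most useful if it can be checked later. The sketch below shows one way a consumer might confirm that a saved model file still matches its record, using the `model_output_file` and `model_output_md5` keys written above. It is an illustration under assumptions, not the notebook's own code: `file_md5` is a stand-in for the notebook's `md5` helper (defined earlier and assumed to return a hex digest), `model_matches_provenance` is a hypothetical name, and the recorded model path is assumed to be resolvable from the current working directory.

```python
import hashlib
import json


def file_md5(path, chunk_size=65536):
    """Hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as reader:
        for chunk in iter(lambda: reader.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def model_matches_provenance(provenance_path):
    """True if the model file's current MD5 equals the recorded one."""
    with open(provenance_path, "rt") as reader:
        record = json.load(reader)
    # Keys as written by the provenance cell above
    return file_md5(record["model_output_file"]) == record["model_output_md5"]
```

A check like `model_matches_provenance(provenance_file)` could run before loading the model, so that a stale or modified file is caught early.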


# Appendix

### What about the model provenance file? It should be human-readable; here are its contents:

In [None]:
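# Pretty-print the parameters exactly as they were serialized to the provenance file
print(json.dumps(params, indent=2))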

# That's all folks!