# Education Review Sentiment

This is a first pass at the sentiment model for both English and Albanian. The configuration files containing the hyperparameters are in the `./models` directory, but these settings can be *overridden* directly in this Jupyter notebook, as the cells below do.
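
The override pattern used throughout this notebook amounts to layering keyword arguments over defaults. The following is a minimal sketch of that idea only: `resolve_settings` and `DEFAULTS` are hypothetical names for illustration and are not part of the deepnlp API, and the default values are simply the ones this notebook passes as facade defaults.

```python
from typing import Any, Dict

# Illustrative defaults (assumption: stands in for values that would
# otherwise come from the configuration files).
DEFAULTS: Dict[str, Any] = {
    'lang': 'en',
    'embedding_name': 'glove_50',
    'model': 'wordvec',
}


def resolve_settings(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a new dict in which ``overrides`` take precedence over ``defaults``.

    The ``defaults`` mapping is left unmodified.
    """
    return {**defaults, **overrides}


settings = resolve_settings(DEFAULTS, embedding_name='fasttext_news_300')
print(settings)
```

Returning a new dictionary rather than mutating the defaults keeps each experiment's settings independent of the previous one.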

In [None]:
# environment configuration and setup: add this (deepnlp) library to the Python path and framework entry point
from mngfac import JupyterManagerFactory
fac = JupyterManagerFactory()
mng = fac()
# set facade defaults
fd = {'lang': 'en', 'embedding_name': 'glove_50', 'model': 'wordvec'}

In [None]:
# useful reporting functions
def verify_configuration():
    """Verify the configuration for the transformer model."""
    print('config:')
    print(' ', facade.config['batch_stash']['decoded_attributes'])
    facade.config['transformer_trainable_embedding'].write(1)
    facade.config['transformer_trainable_resource'].write(1)

def verify_dataset():
    """Verify the sentiment dataset splits."""
    print('dataset:')
    facade.dataset_stash.write(1)
    stash = facade.batch_stash
    key_cont = stash.split_stash_container.split_container
    key_cont.stratified_write = True
    key_cont.write(1)
    batch: Batch = next(iter(stash.values()))
    point: LabeledFeatureDocumentDataPoint = batch.data_points[0]
    print(f'  sample: {point.doc}')

batch_clear_enabled: bool = True
    
def clear_batch():
    """Remove previous batches in case a new model is used for
    which HuggingFace Tokenizer IDs differ.
    
    """
    if batch_clear_enabled:
        from zensols.util.log import loglevel
        with loglevel('zensols.persist.composite'):
            facade.batch_stash.clear()

## Confirm stratified splits
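
A stratified split keeps each label's proportion roughly equal across the train, validation and test sets. The check below is a minimal, framework-independent sketch of that property. It assumes a hypothetical mapping of split names to label lists, does not use the `facade` API, and the labels and counts are placeholders rather than this dataset's statistics.

```python
from collections import Counter
from typing import Dict, List


def label_proportions(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of each label in ``labels``."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def is_stratified(splits: Dict[str, List[str]], tolerance: float = 0.05) -> bool:
    """Return True if every label's proportion differs by at most
    ``tolerance`` between any two splits."""
    props = {name: label_proportions(labels) for name, labels in splits.items()}
    all_labels = {label for p in props.values() for label in p}
    for label in all_labels:
        values = [p.get(label, 0.0) for p in props.values()]
        if max(values) - min(values) > tolerance:
            return False
    return True


# Placeholder splits (assumption): 60% '+' and 40% '-' in every split.
splits = {
    'train': ['+'] * 60 + ['-'] * 40,
    'validation': ['+'] * 6 + ['-'] * 4,
    'test': ['+'] * 6 + ['-'] * 4,
}
print(is_stratified(splits))
```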

In [None]:
facade = mng.create_facade(**fd)
verify_configuration()
verify_dataset()

# English with Word Vectors

We start with 50-dimensional GloVe embeddings.

In [None]:
# train and test
mng.run()

## Add review features

This adds the *emotion* and *topic* (course subject) as features to the model.
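
Adding features amounts to extending the collection of attribute names decoded for each batch. The sketch below illustrates the `update` semantics with a plain Python `set`, on the assumption that `decoded_attributes` behaves like one. The initial attribute names are placeholders, not the project's actual configuration.

```python
# Placeholder starting attributes (assumption, for illustration only).
decoded_attributes = {'label', 'embedding'}

# update() accepts any iterable, so a list of names works...
decoded_attributes.update('emotion topic'.split())
# ...and so does a set; repeating the call adds nothing new.
decoded_attributes.update({'emotion', 'topic'})

print(sorted(decoded_attributes))
```

Because `update` is idempotent, passing either a list or a set of names has the same effect.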

In [None]:
# create the facade first so the epoch increase applies to the facade used for this run
facade = mng.create_facade(**fd)
facade.epochs = int(facade.epochs * 1.5)
facade.batch_stash.decoded_attributes.update('emotion topic'.split())
mng.run()
facade.persist_result()

## fastText news embeddings

In [None]:
fd['embedding_name'] = 'fasttext_news_300'
facade = mng.create_facade(**fd)
facade.batch_stash.decoded_attributes.update(set('emotion topic'.split()))
mng.run()
facade.persist_result()

## Multilingual BERT English

Note that the hyperparameters were tuned for Albanian. However, we still get decent results even without the *topic* or *emotion* features.

In [None]:
fd = {'lang': 'en', 'model': 'transformer'}
facade = mng.create_facade(**fd)
mng.run()
facade.persist_result()

## Add back review features to the BERT model

This adds the *emotion* and *topic* (course subject) as features to the model.

In [None]:
facade = mng.create_facade(**fd)
facade.batch_stash.decoded_attributes.update('emotion topic'.split())
mng.run()
facade.persist_result()

# Albanian with Multilingual BERT

Use the pretrained Multilingual BERT model.  First, verify the configuration and report the stratified dataset statistics.

In [None]:
fd = {'lang': 'sq',
      'model': 'transformer',
      'transformer_trainable_resource_model_id': 'bert-base-multilingual-cased'}
facade = mng.create_facade(**fd)
# remove previous batches (see function doc)
clear_batch()
# make sure we're using Multilingual BERT
verify_configuration()
# make sure we have the Albanian dataset
verify_dataset()
mng.run()
facade.persist_result()

## Albanian with Multilingual RoBERTa

XLM-RoBERTa is a multilingual model whose training data includes Albanian.

In [None]:
fd = {'lang': 'sq',
      'model': 'transformer',
      'transformer_trainable_resource_model_id': 'xlm-roberta-base'}
facade = mng.create_facade(**fd)
# remove previous batches (see function doc)
clear_batch()
# make sure we're using XLM-RoBERTa
verify_configuration()
# make sure we have the Albanian dataset
verify_dataset()
mng.run()
facade.persist_result()

## Pretrained Albanian model last epoch

In [None]:
# now try the pretrained Albanian model (last epoch)
fd = {'lang': 'sq',
      'model': 'sq-transformer',
      'mm_name': 'model',
      'checkpoint': 'checkpoint-1855728'}
facade = mng.create_facade(**fd)
# remove previous batches (see function doc)
clear_batch()
# make sure we're using the pretrained Albanian model
verify_configuration()
# make sure we have the Albanian dataset
verify_dataset()
mng.run()
facade.persist_result()

## Try adding emotion and topic features

This helped English more than it did Albanian.

In [None]:
fd['name'] = 'albanian add emotion topic'
facade = mng.create_facade(**fd)
facade.batch_stash.decoded_attributes.update(set('emotion topic'.split()))
mng.run()
facade.persist_result()

# Compare results

Generate a dataframe with the performance metrics of the previous runs.
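
Once results are exported to CSV, runs can also be ranked without the framework. The following is a minimal sketch using only the standard library. The column names (`name`, `f1`) and the two sample rows are hypothetical placeholders; the columns actually produced by `ModelResultReporter` may differ.

```python
import csv
import io
from typing import Dict, List


def rank_runs(csv_text: str, metric: str) -> List[Dict[str, str]]:
    """Parse CSV text and return its rows sorted by ``metric``, best first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[metric]), reverse=True)


# Placeholder data (assumption): not results from this notebook.
sample = "name,f1\nrun-a,0.5\nrun-b,0.75\n"
for row in rank_runs(sample, 'f1'):
    print(row['name'], row['f1'])
```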

In [None]:
from zensols.deeplearn.result import ModelResultManager, ModelResultReporter
rm: ModelResultManager = facade.result_manager
reporter = ModelResultReporter(rm, include_validation=False)
df = reporter.dataframe.drop(columns=['file'])
df.to_csv('../sentiment-results.csv')
df