### In-Context Cross-lingual Transfer.
Inference example notebook.

In [2]:
## import libraries
from transformers import AutoTokenizer, MT5ForConditionalGeneration, TrainingArguments
from peft import PeftModel
from datasets import Dataset
import numpy as np

from src.data_handling import get_class_set, get_kshot_dataset, get_class_objects
from src.ic_xlt_utils import create_icl_dataset, run_inference, compute_metrics

In [3]:
## instance model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('google/mt5-large')

base_model = MT5ForConditionalGeneration.from_pretrained('google/mt5-large')



### Load data
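For each target language, load the test split for evaluation and reduce the training split to a one-shot adaptation set (one example per label), which supplies the in-context demonstrations.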

In [4]:
data_dir = 'data/massive'

## load datasets of target languages and select one-shot from training data

target_languages = ['english','azeri','turkish','swahili']

tl_datasets = {}

for tl in target_languages:
    
    tl_datasets[tl] = {
        'test': Dataset.load_from_disk('/'.join([data_dir,'test',tl])) ## to evaluate
    }
    
    ## load training to retrieve the one-shot demonstration and reduce to one-shot
    
    tl_datasets[tl]['train'] = get_kshot_dataset(
        dataset = Dataset.load_from_disk('/'.join([data_dir,'train',tl])),
        k = 1, # One-Shot per label
        seed = 42, # Seed to select shots
    )
    
    tl_datasets[tl]['class_set'],tl_datasets[tl]['lbl2id_class'], _ = get_class_objects(tl_datasets[tl]['train'],tl_datasets[tl]['test'])


Selecting 1-shots per label for train...
Length of reduced training dataset: 18
Selecting 1-shots per label for train...
Length of reduced training dataset: 18
Selecting 1-shots per label for train...
Length of reduced training dataset: 18
Selecting 1-shots per label for train...
Length of reduced training dataset: 18
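`get_kshot_dataset` and `get_class_objects` come from the repository's `src.data_handling` module, which is not shown here. The log above reports 18 rows per language, which at one example per label implies 18 labels. As a rough illustration of that reduction, here is a minimal pure-Python sketch over a list of dicts. The names `select_kshot` and `build_class_objects`, the dict-based records, and string labels are all assumptions, and the third value returned by `get_class_objects` is not modeled.

```python
import random
from collections import defaultdict


def select_kshot(examples, k=1, seed=42):
    """Keep at most k examples per label (hypothetical stand-in for get_kshot_dataset)."""
    rng = random.Random(seed)  # seeded so the same shots are selected on every run
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    reduced = []
    for label in sorted(by_label):  # fixed label order keeps the result deterministic
        pool = by_label[label]
        reduced.extend(rng.sample(pool, min(k, len(pool))))
    return reduced


def build_class_objects(train, test):
    """Sorted label inventory and label -> id map (hypothetical stand-in for get_class_objects)."""
    class_set = sorted({ex["label"] for ex in train} | {ex["label"] for ex in test})
    lbl2id = {lbl: i for i, lbl in enumerate(class_set)}
    return class_set, lbl2id


# Toy data, invented for illustration only.
train = [
    {"text": "wake me up at seven", "label": "alarm"},
    {"text": "set an alarm for noon", "label": "alarm"},
    {"text": "play some jazz", "label": "music"},
    {"text": "what's the weather", "label": "weather"},
]
one_shot = select_kshot(train, k=1, seed=42)
class_set, lbl2id = build_class_objects(one_shot, train)
print(len(one_shot), class_set, lbl2id)
```

With `k=1`, the reduced set has exactly one row per distinct label, which matches the "Length of reduced training dataset" lines in the log.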


### Inference on target languages / IC-XLT

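In-Context Cross-lingual Transfer (IC-XLT), evaluated on each target language: the one-shot demonstrations from that language's reduced training set are prepended to every test input, and the LoRA adapter trained with ICT predicts the label.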
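`create_icl_dataset` (from `src.ic_xlt_utils`, not shown) performs the prepending step. The exact prompt template it uses is not visible in this notebook, so the sketch below assumes a simple `text => label` format joined by newlines. The function name `prepend_demonstrations` and the template are hypothetical; the point is only that the same labeled context precedes every test input.

```python
def prepend_demonstrations(test_texts, demos, template="{text} => {label}", sep="\n"):
    """Prefix each test input with the same labeled demonstrations.

    Hypothetical format: the real create_icl_dataset template may differ.
    """
    context = sep.join(template.format(**d) for d in demos)
    # The test input is left "open" after the separator so the model generates the label.
    return [f"{context}{sep}{text} =>" for text in test_texts]


demos = [
    {"text": "play some jazz", "label": "music"},
    {"text": "what's the weather", "label": "weather"},
]
prompts = prepend_demonstrations(["turn up the volume"], demos)
print(prompts[0])
```

In ZS-XLT the model instead receives only the bare test text, which is the single difference between the two inference loops in this notebook.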

In [5]:
## load lora trained with ICT

path_lora_ict = 'trained_loras/massive/ict_m10/' # or 'trained_loras/acd/ict_m10/' to load the ACD trained model

model = PeftModel.from_pretrained(base_model,path_lora_ict) 
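`PeftModel.from_pretrained` wraps the frozen mT5 weights with the low-rank adapter stored in `path_lora_ict`. As a reminder of what such an adapter computes, here is a minimal numpy sketch of a single LoRA-adapted linear layer in the standard formulation, y = Wx + (alpha / r) BAx. The dimensions, rank and `alpha` are arbitrary illustration values, not the configuration of the trained adapter.

```python
import numpy as np


def lora_linear(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))


rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 8, 2, 16  # illustrative sizes only
W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-initialized
x = rng.normal(size=d_in)

# With B = 0 (its usual initialization) the adapter does not change the layer's output.
assert np.allclose(lora_linear(x, W, A, B, alpha, r), W @ x)

# After training, the low-rank update can be folded into W for inference.
B = rng.normal(size=(d_out, r))
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(lora_linear(x, W, A, B, alpha, r), W_merged @ x)
```

Because the update has rank at most `r`, the adapter stores only `r * (d_in + d_out)` extra parameters per layer, which is why swapping between the ICT and prompt-based fine-tuning checkpoints is cheap.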

In [7]:
## iterate and predict/evaluate over target languages
metrics_per_lang = {}

for tl in target_languages:
    
    print(f'Running inference on {tl}')

    ## prepend one-shot demonstration in context
    icl_dataset_test = create_icl_dataset(dataset_test = tl_datasets[tl]['test'], dataset_train = tl_datasets[tl]['train'])
    
    ## predict samples
    predicted_icxlt = run_inference(
                model = model,
                tokenizer = tokenizer,
                test_texts = icl_dataset_test['text'],
                class_set = tl_datasets[tl]['class_set'],
            )
    
    ## evaluate samples
    metrics_per_lang[tl] = compute_metrics(
        predicted_labels = predicted_icxlt,
        truth_labels = tl_datasets[tl]['test']['label'],
        class_set = tl_datasets[tl]['class_set'],
        lbl2id_class = tl_datasets[tl]['lbl2id_class']
    )

Running inference on english
Running inference on azeri
Running inference on turkish
Running inference on swahili

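`run_inference` receives `class_set`, which suggests that generated strings are checked against the label inventory, but its implementation is not shown. One plausible post-processing step, purely an assumption here, is to snap each generation onto the closest valid label. The helper `map_to_class` below is hypothetical and uses `difflib.get_close_matches` from the standard library.

```python
import difflib


def map_to_class(generated, class_set):
    """Snap a generated string onto the closest valid label (hypothetical post-processing)."""
    text = generated.strip().lower()
    lookup = {c.lower(): c for c in class_set}
    if text in lookup:  # exact match after normalization
        return lookup[text]
    # cutoff=0.0 always returns the best candidate when class_set is non-empty
    close = difflib.get_close_matches(text, list(lookup), n=1, cutoff=0.0)
    return lookup[close[0]] if close else None


class_set = ["alarm_set", "play_music", "weather_query"]  # invented labels
print(map_to_class(" Play_Music ", class_set))
print(map_to_class("weather_querry", class_set))
```

A constrained alternative is to score every label in `class_set` under the model and pick the most likely one; which approach the repository uses cannot be determined from this notebook.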
In [8]:
print('IC-XLT\nF1 micro in target languages (with English as source)')
for tl in target_languages:
    print('\t{} : {}'.format(tl,metrics_per_lang[tl]['f1_score_micro']))

IC-XLT
F1 micro in target languages (with English as source)
	english : 0.8920645595158037
	azeri : 0.8140551445864156
	turkish : 0.8437867832520598
	swahili : 0.7915265635507733
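The results cell reports micro-averaged F1. For single-label classification where every prediction is one of the valid classes, each error counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so precision, recall and micro-F1 all reduce to accuracy. The sketch below checks this on toy labels. If `compute_metrics` excludes out-of-inventory predictions from the label set, the two can differ, so treat this as an assumption about that function.

```python
def micro_f1(predicted, truth):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute P, R, F1 once."""
    tp = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p == t:
            tp += 1  # correct label: one true positive for class t
        else:
            fp += 1  # wrong label: a false positive for class p ...
            fn += 1  # ... and a false negative for class t
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truth = ["alarm", "music", "weather", "music", "alarm"]
pred = ["alarm", "music", "music", "music", "weather"]
accuracy = sum(p == t for p, t in zip(pred, truth)) / len(truth)
print(micro_f1(pred, truth), accuracy)  # both 3/5, up to floating-point rounding
```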


### Inference on target languages / ZS-XLT

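Zero-shot cross-lingual transfer (ZS-XLT) with the prompt-based fine-tuned model, evaluated on each target language. No demonstration is prepended; the model sees only the test input.
The few-shot variants (1S/8S-XLT) are obtained by continuing to fine-tune this checkpoint on the reduced (k-shot) target-language training set before running the same inference.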

In [10]:
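## load the model trained with prompt-based fine-tuning (PFT)
## reload a clean base model first: PeftModel.from_pretrained injects LoRA layers into base_model in place,
## so reusing the ICT-wrapped base_model could leave both adapters' layers in the same network
base_model = MT5ForConditionalGeneration.from_pretrained('google/mt5-large')
path_lora_pft = 'trained_loras/massive/pft/' # or 'trained_loras/acd/pbt/' to load the ACD-trained model
model = PeftModel.from_pretrained(base_model, path_lora_pft)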

In [11]:
## iterate and predict/evaluate over target languages

metrics_per_lang = {}

for tl in target_languages:
    
    print(f'Running inference on {tl}')

    ## predict samples
    predicted_zsxlt = run_inference(
                model = model,
                tokenizer = tokenizer,
                test_texts = tl_datasets[tl]['test']['text'],
                class_set = tl_datasets[tl]['class_set'],
            )
    
    ## evaluate samples
    metrics_per_lang[tl] = compute_metrics(
        predicted_labels = predicted_zsxlt,
        truth_labels = tl_datasets[tl]['test']['label'],
        class_set = tl_datasets[tl]['class_set'],
        lbl2id_class = tl_datasets[tl]['lbl2id_class']
    )

Running inference on english
Running inference on azeri
Running inference on turkish
Running inference on swahili

In [12]:
print('ZS-XLT\nF1 micro in target languages (with English as source)')
for tl in target_languages:
    print('\t{} : {}'.format(tl,metrics_per_lang[tl]['f1_score_micro']))

ZS-XLT
F1 micro in target languages (with English as source)
	english : 0.8954270342972428
	azeri : 0.7007397444519166
	turkish : 0.7683254875588433
	swahili : 0.6395427034297243
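To compare the two settings directly, the snippet below takes the F1-micro values printed by the two result cells and computes, per language, the IC-XLT minus ZS-XLT difference and each method's gap to English (the source language). Only the numbers come from this notebook; the dictionary layout is just for illustration.

```python
# F1-micro values copied from the IC-XLT and ZS-XLT result cells above.
ic_xlt = {
    "english": 0.8920645595158037,
    "azeri": 0.8140551445864156,
    "turkish": 0.8437867832520598,
    "swahili": 0.7915265635507733,
}
zs_xlt = {
    "english": 0.8954270342972428,
    "azeri": 0.7007397444519166,
    "turkish": 0.7683254875588433,
    "swahili": 0.6395427034297243,
}

deltas = {lang: ic_xlt[lang] - zs_xlt[lang] for lang in ic_xlt}
gap_ic = {lang: ic_xlt["english"] - ic_xlt[lang] for lang in ic_xlt}
gap_zs = {lang: zs_xlt["english"] - zs_xlt[lang] for lang in zs_xlt}

for lang in ic_xlt:
    print(f"{lang:8s} IC-XLT - ZS-XLT: {deltas[lang]:+.4f}  "
          f"gap to English: IC {gap_ic[lang]:.4f} / ZS {gap_zs[lang]:.4f}")
```

For this run, IC-XLT is about 0.003 F1 lower on English but 0.08 to 0.15 higher on Azeri, Turkish and Swahili, and it cuts the gap to English by more than half in each target language (for Swahili, from about 0.256 to about 0.101). These are single-run figures with one demonstration seed.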
