# Evaluating Our Fine-Tuned T5 Model

Let's use the same `transformers` pipeline to evaluate the `t5` model we fine-tuned on the FrameNet data. After only a few epochs of training, its output differs drastically from the base model's: instead of condensing the input, it produces text that reads like a FrameNet frame definition. This suggests that these language models are very sensitive to their fine-tuning data.

In [1]:
import json

from transformers import pipeline
from nltk.corpus.reader import framenet

In [2]:
def load_fundraising_example():
    with open("../data/a_guide_to_seed_fundraising.json", "r") as f:
        sample = json.load(f)
    
    return sample

### FrameNet Data

Load the FrameNet sentences and evaluate the fine-tuned model after a few epochs of training.

In [3]:
datapath = "/home/ygx/dat/fndata-1.7"

In [4]:
fn = framenet.FramenetCorpusReader(datapath, fileids=None)
sentences = fn.sents()

In [5]:
# checkpoints after only a few epochs 
!ls ../results/summarization/

checkpoint-1000  checkpoint-500  checkpoint-7000


In [6]:
checkpoint = "../results/summarization/checkpoint-7000"

In [7]:
summarizer = pipeline(
    "summarization", model=checkpoint, tokenizer=checkpoint
)

In [8]:
sample_sentences = [sentences[idx].text for idx in range(10, 20)]
sample_frames = [sentences[idx].frame.name for idx in range(10, 20)]

In [9]:
sample_summaries = []
for sample in sample_sentences:
    summary = summarizer(sample, min_length=5, max_length=20)
    sample_summaries.append(summary)

Your max_length is set to 20, but you input_length is only 13. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)
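
The warning above appears because `max_length` is fixed at 20 while some inputs are shorter than that. One way to avoid it is to cap the length bounds per input. The helper below is a minimal sketch, not part of the original notebook: `length_bounds` is a hypothetical name, and it uses a whitespace word count as a rough stand-in for the tokenizer's token count, which is an assumption.

```python
def length_bounds(text, min_length=5, max_length=20):
    """Cap (min_length, max_length) by a rough length estimate of `text`.

    Assumption: the whitespace word count approximates the input length.
    The model's tokenizer will usually produce a different count.
    """
    n_words = len(text.split())
    # Never allow a maximum below 1, even for empty input.
    capped_max = max(1, min(max_length, n_words))
    # Keep the minimum at or below the (possibly reduced) maximum.
    capped_min = min(min_length, capped_max)
    return capped_min, capped_max
```

The returned pair could then be passed to the call as `summarizer(sample, min_length=lo, max_length=hi)`. Whether capping is desirable here is a judgment call, since this fine-tuned model tends to emit definitions longer than the input sentence.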


In [10]:
sample_summaries

[[{'summary_text': 'This frame covers words for Body_part(s) (BP) belonging to a Pos'}],
 [{'summary_text': 'The adjectives and nouns in this frame describe an Experiencer who is feeling or experiencing'}],
 [{'summary_text': 'The adjectives and nouns in this frame describe an Experiencer who is feeling or experiencing'}],
 [{'summary_text': 'This frame covers words for Body_part(s) (BP) belonging to a Pos'}],
 [{'summary_text': 'This frame covers words for Body_part(s) (BP) belonging to a Pos'}],
 [{'summary_text': 'This frame covers words for Body_part(s) (BP) belonging to a Pos'}],
 [{'summary_text': 'The adjectives and nouns in this frame describe an Experiencer who is feeling or experiencing'}],
 [{'summary_text': 'The adjectives and nouns in this frame describe an Experiencer who is feeling or experiencing'}],
 [{'summary_text': 'This frame covers entities which are prototypically conceived of and created to fulfill the function of'}],
 [{'summary_text': 'The adjectives and nouns

In [17]:
sample_sentences[:2]

['That project will then be abandoned in favour of some other quickly completable component elsewhere , so that even relatively minor works can take years to finish .',
 "What this argument suggests in Gandhi 's case is that he does not abandon his commitment to the principle of non-violence or qualify it in any way when he approves the destruction of life ."]

In [11]:
sample_frames

['Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment']
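
All ten sample sentences are labelled `Abandonment`, yet the generated texts above fall into a small set of repeated definitions for other frames. A quick tally makes that repetition easier to see. The following is a minimal sketch, assuming the nested list-of-dicts structure shown in the `sample_summaries` output; `tally_summaries` is a hypothetical helper name, not something defined in the notebook.

```python
from collections import Counter


def tally_summaries(summaries):
    """Count how often each distinct summary text occurs.

    Expects the structure returned by the loop above: a list whose items
    are one-element lists of {'summary_text': ...} dicts.
    """
    texts = [result[0]["summary_text"] for result in summaries]
    return Counter(texts)
```

Calling `tally_summaries(sample_summaries).most_common()` would list each distinct output with its count, which can be compared against `sample_frames`.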

### Test Fundraising Summary

In [12]:
sample = load_fundraising_example()

In [13]:
summary = summarizer(sample["text"], min_length=5, max_length=20)

Token indices sequence length is longer than the specified maximum sequence length for this model (6719 > 512). Running this sequence through the model will result in indexing errors


In [14]:
summary

[{'summary_text': 'This frame covers entities which are prototypically conceived of and created to fulfill the function of'}]
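
The warning in cell 13 shows that the article is 6719 tokens long, far beyond the model's 512-token limit, so a single call cannot see the whole text. One common workaround is to split the input into smaller pieces and summarize each piece separately. The sketch below is an illustration under stated assumptions rather than the approach used in this notebook: `chunk_words` and `summarize_long` are hypothetical helpers, the split is on whitespace words rather than tokenizer tokens, and the default of 300 words per chunk is an arbitrary guess that may still exceed 512 tokens for some text.

```python
def chunk_words(text, max_words=300):
    """Split `text` into consecutive chunks of at most `max_words` words.

    Assumption: words are separated by whitespace. Token counts from the
    model's tokenizer will generally be higher than the word count.
    """
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def summarize_long(text, summarize, max_words=300):
    """Apply `summarize` (any callable taking a string) to each chunk."""
    return [summarize(chunk) for chunk in chunk_words(text, max_words)]
```

For example, `summarize_long(sample["text"], lambda t: summarizer(t, min_length=5, max_length=20))` would return one result per chunk instead of a single result for an over-long input.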