# Baseline HuggingFace Pipeline

As an initial baseline, we should see how well the pre-trained Seq2Seq summarization models in HuggingFace perform. Let's load sentences from FrameNet and use a pre-trained `t5` model for summarization. We will then compare the model's summary of each sentence with the definition of the frame associated with that sentence.

In [1]:
import json
import nltk

from transformers import pipeline
from nltk.corpus.reader import framenet

In [2]:
!ls ../

data  frame_seq2seq.py	notebooks  README.md  results


In [3]:
def load_fundraising_example():
    with open("../data/a_guide_to_seed_fundraising.json", "r") as f:
        sample = json.load(f)
    
    return sample

In [4]:
datapath = "/home/ygx/dat/fndata-1.7/"

In [5]:
fn = framenet.FramenetCorpusReader(datapath, fileids=None)

### FrameNet Sentences

FrameNet contains sentences along with their associated frames. These sentences are roughly the length of the search queries a user would make on a natural language search engine, so we can use them to compare our summarization models.

In [6]:
sentences = fn.sents()

In [7]:
# Sample sentence and associated frame
for idx, sent in enumerate(sentences):
    print(f"\nSentence {idx}:\n\n\t{sent.text}")
    print(f"\nFrame:\n\n\t{sent.frame.name}")
    print(f"\nFrame definition:\n\n\t{sent.frame.definition}")
    if idx == 0:
        break


Sentence 0:

	The bank has abandoned all plans to finance roads or logging in Cameroon 's forests , in keeping with its ` stringent policy to protect the rights of indigenous people . "

Frame:

	Abandonment

Frame definition:

	An Agent leaves behind a Theme effectively rendering it no longer within their control or of the normal security as one's property.   'Carolyn abandoned her car and jumped on a red double decker bus.'  'Perhaps he left the key in the ignition'  'Abandonment of a child is considered to be a serious crime in many jurisdictions.'  There are also metaphorically used examples:  'She left her old ways behind .'


In [8]:
# Simplest and most opaque interface for summarization
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base")

In [9]:
sample_sentences = [sentences[idx].text for idx in range(10)]
sample_frames = [sentences[idx].frame.name for idx in range(10)]

In [10]:
sample_summaries = []
for sample in sample_sentences:
    summary = summarizer(sample, min_length=5, max_length=20)
    sample_summaries.append(summary)

In [11]:
sample_summaries

[[{'summary_text': "bank has abandoned all plans to finance roads or logging in Cameroon's forests"}],
 [{'summary_text': 'Stevenson and a friend took a Nova car from North Skelton and abandoned it'}],
 [{'summary_text': 'she had seen no reason to abandon it when she came to Medewich two years ago'}],
 [{'summary_text': 'as a result of other priorities following the fall of France in June 1940 , the project'}],
 [{'summary_text': 'Leeds Education Authority has abandoned plans for  drastic " cuts in travel subsidies for some Catholic students'}],
 [{'summary_text': 'he abandoned plans of working in missionary field and offered his services to the Netherlands Indies'}],
 [{'summary_text': 'the council later abandoned its plans to widen the highway . the reversion passed to'}],
 [{'summary_text': 'Um Al-Farajh was finally abandoned by the Palestinian Arabs in 1948 .'}],
 [{'summary_text': 'Lola is waiting loyally , if not faithfully, for the lover ,'}],
 [{'summary_text': 'pistol found in

In [12]:
sample_frames

['Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment',
 'Abandonment']
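
To put a number on how close a summary is to a frame definition, one simple option is word-set overlap. The sketch below is not part of the original pipeline: the helper names `tokenize` and `jaccard_overlap` are hypothetical, and Jaccard similarity is an assumed stand-in for whatever comparison metric is ultimately chosen. The example strings are the first summary above and the opening clause of the `Abandonment` definition.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_overlap(summary, definition):
    """Jaccard similarity between the word sets of two strings (0.0 to 1.0)."""
    summary_tokens = tokenize(summary)
    definition_tokens = tokenize(definition)
    union = summary_tokens | definition_tokens
    if not union:
        # Both strings are empty of words; treat as no overlap.
        return 0.0
    return len(summary_tokens & definition_tokens) / len(union)


# Illustrative strings copied from the outputs above.
summary_text = "bank has abandoned all plans to finance roads or logging in Cameroon's forests"
definition_text = "An Agent leaves behind a Theme effectively rendering it no longer within their control"

score = jaccard_overlap(summary_text, definition_text)
print(f"Jaccard overlap: {score:.3f}")
```

Since the summaries are largely extractive and the definitions are abstract, a lexical metric like this one will likely score low even for reasonable summaries, which is itself a useful baseline observation.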

### Fundraising Example

Using the same `t5` model, we can test summarization on the fundraising sample data.

In [13]:
sample = load_fundraising_example()

In [14]:
summary = summarizer(sample["text"], min_length=5, max_length=20)

Token indices sequence length is longer than the specified maximum sequence length for this model (6719 > 512). Running this sequence through the model will result in indexing errors


In [15]:
summary

[{'summary_text': 'if you can raise as much money as you need, you should get the investor’s'}]
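
The warning in cell 14 reports that the input (6719 tokens) is far longer than the model's 512-token maximum, so the model cannot process the whole document in a single pass. One possible workaround, not used in the original notebook, is to split the text into smaller chunks, summarize each chunk, and join the partial summaries. The sketch below makes some assumptions: `chunk_words` and `summarize_long_text` are hypothetical helpers, whitespace word count is only a rough proxy for the T5 token count, and `max_words=300` is an arbitrary value chosen to stay comfortably under the limit. A stand-in summarizer is used so the example runs without loading a model.

```python
def chunk_words(text, max_words=300):
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text, summarize_fn, max_words=300):
    """Apply summarize_fn to each chunk and join the partial summaries with spaces."""
    partial_summaries = [summarize_fn(chunk) for chunk in chunk_words(text, max_words)]
    return " ".join(partial_summaries)


def first_words(chunk):
    """Stand-in summarizer for demonstration: keep the first five words of the chunk."""
    return " ".join(chunk.split()[:5])


# Synthetic 700-word document, split into chunks of 300, 300, and 100 words.
demo_text = " ".join(f"w{i}" for i in range(700))
print(len(chunk_words(demo_text)))
print(summarize_long_text(demo_text, first_words))
```

With the pipeline from cell 8, `summarize_fn` could be `lambda chunk: summarizer(chunk, min_length=5, max_length=20)[0]["summary_text"]`, which matches the output structure shown above.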

In [16]:
sample["text"]

"Why Raise Money - When to Raise Money - How Much to Raise? - Financing Options - Convertible Debt - Safe - Equity - Valuation - Investors\nIntroduction\nStartup companies need to purchase equipment, rent offices, and hire staff. More importantly, they need to grow. In almost every case they will require outside capital to do these things.\nThe initial capital raised by a company is typically called “seed” capital. This brief guide is a summary of what startup founders need to know about raising the seed funds critical to getting their company off the ground.\nThis is not intended to be a complete guide to fundraising. It includes only the basic knowledge most founders will need. The information comes from my experiences working at startups, investing in startups, and advising startups at Y Combinator and Imagine K12. YC partners naturally gain a lot of fundraising experience and YC founder Paul Graham (PG) has written extensively on the topic 1, 2, 3, 4. His essays cover in more detai