# REBEL

This notebook uses [REBEL](https://aclanthology.org/2021.findings-emnlp.204/) to extract relation triplets from text, following this [tutorial](https://medium.com/nlplanet/building-a-knowledge-base-from-texts-a-full-practical-example-8dbbffb912fa).

In [1]:
#%pip install -q transformers pyvis

Note: you may need to restart the kernel to use updated packages.


In [1]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import math
import torch
import IPython
from pyvis.network import Network

In [2]:
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

# REBEL model

REBEL is an autoregressive seq2seq model built by fine-tuning BART on a custom dataset of Wikipedia abstracts, filtered using RoBERTa. The authors frame relation extraction and classification as a generation task: more specifically, a translation task in which the model learns to translate a sentence containing entities and relations into *(e, r, e)* triplets. The linearized triplets are the token sequence that the model has to decode.

## Model output

The REBEL model outputs a string with tokens to signify entities and relations extracted as triplets:
- **Input:** “This Must Be the Place” is a song by new wave band Talking Heads, released in November 1983 as the second single from its fifth album “Speaking in Tongues”.
- **Output:** `<triplet> This Must Be the Place <subj> Talking Heads <obj> performer <subj> Speaking in Tongues <obj> part of <triplet> Talking Heads <subj> new wave <obj> genre <triplet> Speaking in Tongues <subj> Talking Heads <obj> performer`

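The `<triplet>` tag marks the start of a new triplet and is followed by the first entity (the head). The `<subj>` tag marks the start of the second entity (the tail). After the tail, the `<obj>` tag marks the start of the relation. A head can be followed by several `<subj> ... <obj> ...` groups, each of which yields another triplet with the same head, as with "This Must Be the Place" in the example.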
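
The linearized format described above can be decoded with a few string splits. The following is a minimal sketch for illustration only, not the parser used later in this notebook; `parse_linearized` is a hypothetical helper name, and the sketch assumes the special tokens appear exactly as `<triplet>`, `<subj>`, and `<obj>`.

```python
def parse_linearized(text):
    """Parse a REBEL-style linearized string into (head, relation, tail) tuples."""
    triplets = []
    # Every "<triplet>" chunk starts with the head entity.
    for chunk in text.split("<triplet>"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # The head is followed by one or more "<subj> tail <obj> relation" groups.
        head, *groups = chunk.split("<subj>")
        for group in groups:
            tail, _, relation = group.partition("<obj>")
            triplets.append((head.strip(), relation.strip(), tail.strip()))
    return triplets


example = (
    "<triplet> This Must Be the Place <subj> Talking Heads <obj> performer "
    "<subj> Speaking in Tongues <obj> part of "
    "<triplet> Talking Heads <subj> new wave <obj> genre "
    "<triplet> Speaking in Tongues <subj> Talking Heads <obj> performer"
)

for triplet in parse_linearized(example):
    print(triplet)
```

The first head, "This Must Be the Place", yields two triplets because it is followed by two `<subj>` groups.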

## From short text to Knowledge Base

The next step is to write a function that parses the strings generated by REBEL and transforms them into relation triplets (e.g. the `<Fabio, lives in, Italy>` triplet). This function must take into account the additional tokens (i.e. the `<triplet>`, `<subj>`, and `<obj>` tokens) introduced while training the model. Fortunately, the [REBEL Hugging Face model card](https://huggingface.co/Babelscape/rebel-large) provides a complete code example for this function, which we use here essentially unchanged.

In [3]:
def extract_relations_from_model_output(text):
    relations = []
    subject, relation, object_ = '', '', ''
    text = text.strip()
    current = 'x'
    text_replaced = text.replace("<s>", "").replace("<pad>", "").replace("</s>", "")
    for token in text_replaced.split():
        if token == "<triplet>":
            current = 't'
            if relation != '':
                relations.append({
                    'head': subject.strip(),
                    'type': relation.strip(),
                    'tail': object_.strip()
                })
                relation = ''
            subject = ''
        elif token == "<subj>":
            current = 's'
            if relation != '':
                relations.append({
                    'head': subject.strip(),
                    'type': relation.strip(),
                    'tail': object_.strip()
                })
            object_ = ''
        elif token == "<obj>":
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    if subject != '' and relation != '' and object_ != '':
        relations.append({
            'head': subject.strip(),
            'type': relation.strip(),
            'tail': object_.strip()
        })
    return relations

The output of this function is a list of triplets, which we can then use to populate our knowledge base. Each triplet is a dictionary with the following keys:
- `head`: the first entity of the triplet
- `type`: the relation of the triplet
- `tail`: the second entity of the triplet

# Knowledge base class

To store our knowledge base, we create a `KB` class. It has methods to add a new relation, check whether a relation already exists, print all relations, and save them to a file.

In [4]:
class KB():
    def __init__(self):
        self.relations = []

    def are_relations_equal(self, r1, r2):
        return all(r1[attr] == r2[attr] for attr in ["head", "type", "tail"])

    def exists_relation(self, r1):
        return any(self.are_relations_equal(r1, r2) for r2 in self.relations)

    def add_relation(self, r):
        if not self.exists_relation(r):
            self.relations.append(r)

    def print(self):
        print("Relations:")
        for r in self.relations:
            print(f"  {r}")
    
    def save(self, file_name):
        with open(file_name, "w") as f:
            for r in self.relations:
                f.write(f"{str(r)}\n")

    # def merge_relations(self, r1):
    #     r2 = [r for r in self.relations
    #           if self.are_relations_equal(r1, r)][0]
    #     spans_to_add = [span for span in r1["meta"]["spans"]
    #                     if span not in r2["meta"]["spans"]]
    #     r2["meta"]["spans"] += spans_to_add

    # def add_relation(self, r):
    #     if not self.exists_relation(r):
    #         self.relations.append(r)
    #     else:
    #         self.merge_relations(r)
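
Note that `save` writes each relation with `str(r)`, which produces Python dict reprs (single quotes) rather than JSON, so the file cannot be read back with a JSON parser. One possible alternative is to store one JSON object per line. The sketch below is an assumption about how that could look; `save_relations_jsonl` and `load_relations_jsonl` are hypothetical helpers and not part of the `KB` class above.

```python
import json
import os
import tempfile


def save_relations_jsonl(relations, file_name):
    """Write one JSON object per line so the file can be parsed again later."""
    with open(file_name, "w", encoding="utf-8") as f:
        for r in relations:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")


def load_relations_jsonl(file_name):
    """Read relations back from a JSON Lines file, skipping blank lines."""
    with open(file_name, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round trip with two relations in the same dict format used by the KB class.
sample_relations = [
    {"head": "Slyrak", "type": "instance of", "tail": "elder dragon"},
    {"head": "Slyrak", "type": "residence", "tail": "Eldwyrm Lair"},
]
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
save_relations_jsonl(sample_relations, path)
restored = load_relations_jsonl(path)
print(restored == sample_relations)
```

With the class above, the same helpers could be called as `save_relations_jsonl(kb.relations, "some_file.jsonl")`.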

# Creating KG from short text

The function below needs to do the following:
1. Initialize an empty KB object
2. Tokenize the input text
3. Generate the triplets using REBEL
4. Add the triplets to the KB
5. Return the KB

In [5]:
def kb_from_small_text(text, verbose=False):
	kb = KB()

	# Tokenize the input
	model_inputs = tokenizer(text, max_length=256, 
			  padding=True, 
			  truncation=True, 
			  return_tensors="pt")
	if verbose:
		print(f"# tokens = {len(model_inputs['input_ids'][0])}")

	# Generate the output
	# arguments
	gen_kwargs = {
		"max_length": 256,
		"length_penalty": 0,
		"num_beams": 3,
		"num_return_sequences": 3
	}

	generated_tokens = model.generate(
		model_inputs["input_ids"].to(model.device),
		attention_mask=model_inputs["attention_mask"].to(model.device), 
		**gen_kwargs
	)

	decoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=False)
	# create KB
	for sentence_pred in decoded_preds:
		if verbose:
			print(sentence_pred)
		relations = extract_relations_from_model_output(sentence_pred)
		for relation in relations:
			if verbose:
				print(relation)
			kb.add_relation(relation)
	return kb

In [8]:
# data_file = "../../data/01_the_fellowship_of_the_ring.txt"
# with open(data_file, "r") as f:
# 	text = f.read()

text = 'After an unknown period of time after the events of Breath of the Wild, Link and Zelda are exploring a secret passage hidden beneath Hyrule Castle. Although King Rhoam had warned Zelda in the past that not even the royal family was to go there, she believes that what lies beneath is connected to the strange phenomenon spreading throughout Hyrule known only as "Gloom". As they travel, they find ancient ruins which Zelda identifies as being of Zonai origin. They eventually find a mural depicting a great war against the being called "the Demon King", the stories of which had been passed down by the royal family. Another mural shows the Zonai descending from the sky, which causes Zelda to realize that the "gods" that descended from the heavens to found the royal family must have been the Zonai.'

text = 'Link awakens from a deep slumber and a mysterious voice guides him to discover what has become of the ruined country of Hyrule Kingdom. Link leaves the Shrine of Resurrection, runs up to the ledge and looks out at the ruins of the kingdom Hyrule. Link then meets an Old Man, who will give him the Paraglider, which is the only way to get to Hyrule. The Old Man wants the Spirit Orbs, in the Shrines, respectively the Oman Au Shrine, Ja Baij Shrine, Owa Daim Shrine, and the Keh Namut Shrine. After Link gets the spirit orbs, the Old Man appears, then mysteriously disappears, telling Link to meet him in the Temple of Time. The Old Man reveals himself as the spirit of the deceased King of Hyrule, King Rhoam. Link learns from King Rhoam that 100 years prior, a great evil known as the Calamity Ganon rose up and laid waste to the kingdom and its people. Unable to be defeated, it was sealed within Hyrule Castle, while the ruins of the land were ravaged by nature over time. Although trapped, the Calamity Ganon has grown in power, and Link must defeat it before it breaks free once more and destroys the world. The mysterious voice turns out to be Zelda, whom who is the daughter of King Rhoam. After escaping the confines of the Great Plateau, Link is directed to meet the wise Sheikah elder Impa, and learn about the Guardians and Divine Beasts: 10,000 years prior these machines were created and successfully used by another Hero and another Princess to defeat the Calamity Ganon. But throughout the ages, knowledge about the ancient technology was lost until excavations in Hyrule Kingdom brought them to light once more, coinciding with the expected return of Calamity Ganon a hundred years ago. The Guardians were reactivated and four Champions were chosen to control the Divine Beasts: The Zora princess Mipha, the Goron warrior Daruk, the Gerudo chief Urbosa, and the Rito archer Revali. 
All the while, Zelda was unsuccessfully trying to gain access to her own prophesied powers, accompanied on her quests by her knight, the Hylian Champion Link. When the Calamity Ganon ultimately attacked, it devastated the Kingdom of Hyrule Kingdom by taking control of the ancient machines and turning them against the Hyruleans. As a last resort, Zelda was able to place the gravely wounded Link in the Shrine of Resurrection and use her awoken sealing powers to trap herself with Calamity Ganon in Hyrule Castle. '

_text = "Napoleon Bonaparte (born Napoleone di Buonaparte; 15 August 1769 - 5 May 1821), and later known by his regnal name Napoleon I, was a French military and political leader who rose to prominence during the French Revolution and led several successful campaigns during the Revolutionary Wars. He was the de facto leader of the French Republic as First Consul from 1799 to 1804. As Napoleon I, he was Emperor of the French from 1804 until 1814 and again in 1815. Napoleon's political and cultural legacy has endured, and he has been one of the most celebrated and controversial leaders in world history."

text = "The fight with the elder dragon: You have finally found the location of your lifelong nemesis Slyrak. After slaying countless innocent people, Slyrak lays in the Eldwyrm Lair. Travel to the Eldwyrm Lair and slay Slyrak, the elder dragon"

master_KB = KB()
for i, sent in enumerate(text.split(".")):
	if sent.strip() == '':
		continue
	print(f"Processing sentence {i}:\n {sent}")
	kb = kb_from_small_text(sent)
	kb.print()
	for r in kb.relations:
		master_KB.add_relation(r)

Processing sentence 0:
 The fight with the elder dragon: You have finally found the location of your lifelong nemesis Slyrak
Relations:
  {'head': 'Slyrak', 'type': 'instance of', 'tail': 'elder dragon'}
  {'head': 'Slyrak', 'type': 'present in work', 'tail': 'fight with the elder dragon'}
  {'head': 'The fight with the elder dragon', 'type': 'characters', 'tail': 'Slyrak'}
Processing sentence 1:
  After slaying countless innocent people, Slyrak lays in the Eldwyrm Lair
Relations:
  {'head': 'Slyrak', 'type': 'place of death', 'tail': 'Eldwyrm Lair'}
  {'head': 'Slyrak', 'type': 'place of burial', 'tail': 'Eldwyrm Lair'}
  {'head': 'Slyrak', 'type': 'residence', 'tail': 'Eldwyrm Lair'}
Processing sentence 2:
  Travel to the Eldwyrm Lair and slay Slyrak, the elder dragon
Relations:
  {'head': 'Slyrak', 'type': 'residence', 'tail': 'Eldwyrm Lair'}
  {'head': 'Slyrak', 'type': 'present in work', 'tail': 'Eldwyrm Lair'}
  {'head': 'Slyrak', 'type': 'place of birth', 'tail': 'Eldwyrm Lair'}
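
The loop above splits the text on every `.` character, which ignores `?` and `!` and also cuts inside decimals such as `3.14`. A slightly more careful heuristic is to split only where sentence-ending punctuation is followed by whitespace. This is a minimal sketch, not a full sentence segmenter: abbreviations such as "Dr. Smith" would still be split, and `split_sentences` is a hypothetical helper name.

```python
import re


def split_sentences(text):
    """Split on '.', '!' or '?' when followed by whitespace; drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in pieces if p]


sample = (
    "The fight with the elder dragon: You have finally found the location of "
    "your lifelong nemesis Slyrak. After slaying countless innocent people, "
    "Slyrak lays in the Eldwyrm Lair. Travel to the Eldwyrm Lair and slay "
    "Slyrak, the elder dragon"
)

sentences = split_sentences(sample)
for i, sent in enumerate(sentences):
    print(i, sent)
```

Unlike `text.split(".")`, this keeps the trailing punctuation on each sentence and does not produce empty strings that need to be skipped.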

In [27]:
master_KB.print()

Relations:
  {'head': 'Link', 'type': 'residence', 'tail': 'Hyrule Kingdom'}
  {'head': 'Link', 'type': 'country of citizenship', 'tail': 'Hyrule Kingdom'}
  {'head': 'Link', 'type': 'present in work', 'tail': 'Hyrule Kingdom'}
  {'head': 'Shrine of Resurrection', 'type': 'located in the administrative territorial entity', 'tail': 'Hyrule'}
  {'head': 'Link', 'type': 'residence', 'tail': 'Hyrule'}
  {'head': 'Shrine of Resurrection', 'type': 'location', 'tail': 'Hyrule'}
  {'head': 'Hyrule', 'type': 'characters', 'tail': 'Link'}
  {'head': 'Paraglider', 'type': 'owned by', 'tail': 'Link'}
  {'head': 'Oman Au Shrine', 'type': 'instance of', 'tail': 'Shrines'}
  {'head': 'Ja Baij Shrine', 'type': 'instance of', 'tail': 'Shrines'}
  {'head': 'Owa Daim Shrine', 'type': 'instance of', 'tail': 'Shrines'}
  {'head': 'Keh Namut Shrine', 'type': 'instance of', 'tail': 'Shrines'}
  {'head': 'Temple of Time', 'type': 'characters', 'tail': 'Link'}
  {'head': 'Old Man', 'type': 'residence', 'tail':

In [7]:
master_KB.save("zelda-botw-REBEL.kb")

In [6]:
def from_text_to_kb(text, span_length=128, verbose=False):
    # tokenize whole text
    inputs = tokenizer([text], return_tensors="pt")

    # compute span boundaries
    num_tokens = len(inputs["input_ids"][0])
    if verbose:
        print(f"Input has {num_tokens} tokens")
    num_spans = math.ceil(num_tokens / span_length)
    if verbose:
        print(f"Input has {num_spans} spans")
    overlap = math.ceil((num_spans * span_length - num_tokens) / 
                        max(num_spans - 1, 1))
    spans_boundaries = []
    start = 0
    for i in range(num_spans):
        spans_boundaries.append([start + span_length * i,
                                 start + span_length * (i + 1)])
        start -= overlap
    if verbose:
        print(f"Span boundaries are {spans_boundaries}")

    # transform input with spans
    tensor_ids = [inputs["input_ids"][0][boundary[0]:boundary[1]]
                  for boundary in spans_boundaries]
    tensor_masks = [inputs["attention_mask"][0][boundary[0]:boundary[1]]
                    for boundary in spans_boundaries]
    inputs = {
        "input_ids": torch.stack(tensor_ids),
        "attention_mask": torch.stack(tensor_masks)
    }

    # generate relations
    num_return_sequences = 3
    gen_kwargs = {
        "max_length": 256,
        "length_penalty": 0,
        "num_beams": 3,
        "num_return_sequences": num_return_sequences
    }
    generated_tokens = model.generate(
        **inputs,
        **gen_kwargs,
    )

    # decode relations
    decoded_preds = tokenizer.batch_decode(generated_tokens,
                                           skip_special_tokens=False)

    # create kb
    kb = KB()
    i = 0
    for sentence_pred in decoded_preds:
        current_span_index = i // num_return_sequences
        relations = extract_relations_from_model_output(sentence_pred)
        for relation in relations:
            relation["meta"] = {
                "spans": [spans_boundaries[current_span_index]]
            }
            kb.add_relation(relation)
        i += 1

    return kb
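
The span arithmetic in `from_text_to_kb` is easier to follow in isolation. The sketch below copies only that computation into a standalone function (the name `compute_span_boundaries` is hypothetical) so it can be run without the model. For a 167-token input with `span_length=128`, two spans are needed; they cover 256 positions in total, so the 89 surplus positions become the overlap and the second span starts at 128 - 89 = 39.

```python
import math


def compute_span_boundaries(num_tokens, span_length=128):
    """Reproduce the span boundary computation used in from_text_to_kb."""
    num_spans = math.ceil(num_tokens / span_length)
    # Spread the surplus positions evenly over the gaps between spans.
    overlap = math.ceil((num_spans * span_length - num_tokens) /
                        max(num_spans - 1, 1))
    boundaries = []
    start = 0
    for i in range(num_spans):
        boundaries.append([start + span_length * i,
                           start + span_length * (i + 1)])
        start -= overlap  # shift every later span back by one more overlap
    return boundaries


print(compute_span_boundaries(167))  # [[0, 128], [39, 167]]
print(compute_span_boundaries(300))  # [[0, 128], [86, 214], [172, 300]]
```

Every span has the same length, which is what allows the slices to be combined with `torch.stack` in the function above.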

In [7]:
text = """
Napoleon Bonaparte (born Napoleone di Buonaparte; 15 August 1769 – 5 May 1821), and later known by his regnal name Napoleon I, was a French military and political leader who rose to prominence during the French Revolution and led several successful campaigns during the Revolutionary Wars. He was the de facto leader of the French Republic as First Consul from 1799 to 1804. As Napoleon I, he was Emperor of the French from 1804 until 1814 and again in 1815. Napoleon's political and cultural legacy has endured, and he has been one of the most celebrated and controversial leaders in world history. Napoleon was born on the island of Corsica not long after its annexation by the Kingdom of France.[5] He supported the French Revolution in 1789 while serving in the French army, and tried to spread its ideals to his native Corsica. He rose rapidly in the Army after he saved the governing French Directory by firing on royalist insurgents. In 1796, he began a military campaign against the Austrians and their Italian allies, scoring decisive victories and becoming a national hero. Two years later, he led a military expedition to Egypt that served as a springboard to political power. He engineered a coup in November 1799 and became First Consul of the Republic. Differences with the British meant that the French faced the War of the Third Coalition by 1805. Napoleon shattered this coalition with victories in the Ulm Campaign, and at the Battle of Austerlitz, which led to the dissolving of the Holy Roman Empire. In 1806, the Fourth Coalition took up arms against him because Prussia became worried about growing French influence on the continent. Napoleon knocked out Prussia at the battles of Jena and Auerstedt, marched the Grande Armée into Eastern Europe, annihilating the Russians in June 1807 at Friedland, and forcing the defeated nations of the Fourth Coalition to accept the Treaties of Tilsit. 
Two years later, the Austrians challenged the French again during the War of the Fifth Coalition, but Napoleon solidified his grip over Europe after triumphing at the Battle of Wagram. Hoping to extend the Continental System, his embargo against Britain, Napoleon invaded the Iberian Peninsula and declared his brother Joseph King of Spain in 1808. The Spanish and the Portuguese revolted in the Peninsular War, culminating in defeat for Napoleon's marshals. Napoleon launched an invasion of Russia in the summer of 1812. The resulting campaign witnessed the catastrophic retreat of Napoleon's Grande Armée. In 1813, Prussia and Austria joined Russian forces in a Sixth Coalition against France. A chaotic military campaign resulted in a large coalition army defeating Napoleon at the Battle of Leipzig in October 1813. The coalition invaded France and captured Paris, forcing Napoleon to abdicate in April 1814. He was exiled to the island of Elba, between Corsica and Italy. In France, the Bourbons were restored to power. However, Napoleon escaped Elba in February 1815 and took control of France.[6][7] The Allies responded by forming a Seventh Coalition, which defeated Napoleon at the Battle of Waterloo in June 1815. The British exiled him to the remote island of Saint Helena in the Atlantic, where he died in 1821 at the age of 51. Napoleon had an extensive impact on the modern world, bringing liberal reforms to the many countries he conquered, especially the Low Countries, Switzerland, and parts of modern Italy and Germany. He implemented liberal policies in France and Western Europe.
"""

text = 'After an unknown period of time after the events of Breath of the Wild, Link and Zelda are exploring a secret passage hidden beneath Hyrule Castle. Although King Rhoam had warned Zelda in the past that not even the royal family was to go there, she believes that what lies beneath is connected to the strange phenomenon spreading throughout Hyrule known only as "Gloom". As they travel, they find ancient ruins that Zelda identifies as being of Zonai origin. They eventually find a mural depicting a great war against the being called "the Demon King", the stories of which had been passed down by the royal family. Another mural shows the Zonai descending from the sky, which causes Zelda to realize that the "gods" that descended from the heavens to found the royal family must have been the Zonai.'

kb = from_text_to_kb(text, verbose=True)
kb.print()

Input has 167 tokens
Input has 2 spans
Span boundaries are [[0, 128], [39, 167]]


Relations:
  {'head': 'Link', 'type': 'present in work', 'tail': 'Breath of the Wild', 'meta': {'spans': [[0, 128]]}}
  {'head': 'Breath of the Wild', 'type': 'characters', 'tail': 'Link', 'meta': {'spans': [[0, 128]]}}
  {'head': 'Breath of the Wild', 'type': 'characters', 'tail': 'Zelda', 'meta': {'spans': [[0, 128]]}}
  {'head': 'Zelda', 'type': 'present in work', 'tail': 'Breath of the Wild', 'meta': {'spans': [[0, 128]]}}
  {'head': 'Gloom', 'type': 'located in the administrative territorial entity', 'tail': 'Hyrule', 'meta': {'spans': [[39, 167]]}}
  {'head': 'Gloom', 'type': 'location', 'tail': 'Hyrule', 'meta': {'spans': [[39, 167]]}}
  {'head': 'Zonai', 'type': 'located in the administrative territorial entity', 'tail': 'Hyrule', 'meta': {'spans': [[39, 167]]}}
