# Relation Extraction

## Make the `get_relation` function more robust

For this exercise, extend the `get_relation` function so that it supports more sentence structures. Specifically, extract the underlined relation from each of the sentences below. If no relation is found, the function should return `None`.

* John <u>completed</u> the difficult course prerequisites.
* People used to believe that the sun <u>rotates around</u> the earth.
* It is well-known that James Watt <u>improved</u> the steam engine.
* Computer science <u>is</u> a combination of engineering and mathematics.
* In order to decode the Enigma machine, Alan Turing succeeded in <u>inventing</u> a decoding machine.
* In order to decode the Enigma machine, Alan Turing tried to <u>invent</u> a decoding machine.

```python
sent1 = "John completed the difficult course prerequisites."
sent2 = "People used to believe that the sun rotates around the earth."
sent3 = "It is well-known that James Watt improved the steam engine."
sent4 = "Computer science is a combination of engineering and mathematics."
sent5 = "In order to decode the Enigma machine, Alan Turing succeeded in inventing a decoding machine."
sent6 = "In order to decode the Enigma machine, Alan Turing tried to invent a decoding machine."
```

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
```
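The `Matcher` matches sequences of token attributes, and an optional entry (`"OP": "?"`) can produce several overlapping matches that share a start position. The toy matcher below is a hypothetical, stdlib-only stand-in (not spaCy's actual algorithm) that looks at dependency labels only; it illustrates why a pattern such as `ccomp` + optional `prep` can yield both `rotates` and `rotates around` for the same sentence. The label sequence in the usage lines is a hand-written, assumed parse of `sent2`, not verified `en_core_web_sm` output.

```python
from typing import Dict, List, Tuple

# Hypothetical simplification: a token is represented only by its dependency label.
Pattern = List[Dict[str, str]]


def toy_match(labels: List[str], pattern: Pattern) -> List[Tuple[int, int]]:
    """Return every (start, end) span of `labels` that matches `pattern`.

    Each pattern entry is {"DEP": label}, optionally with "OP": "?",
    meaning the entry may match zero or one token.
    """
    results = []

    def extend(pos: int, idx: int, start: int) -> None:
        # All pattern entries consumed: record the span matched so far.
        if idx == len(pattern):
            if pos > start:  # ignore empty matches
                results.append((start, pos))
            return
        entry = pattern[idx]
        # Option 1: this entry consumes the token at `pos`.
        if pos < len(labels) and labels[pos] == entry["DEP"]:
            extend(pos + 1, idx + 1, start)
        # Option 2: an optional entry is skipped without consuming a token.
        if entry.get("OP") == "?":
            extend(pos, idx + 1, start)

    for start in range(len(labels)):
        extend(start, 0, start)
    # Deduplicate and order by (start, end), so longer spans follow shorter ones.
    return sorted(set(results))


# Assumed labels for "People used to believe that the sun rotates around the earth."
labels = ["nsubj", "ROOT", "aux", "xcomp", "mark", "det",
          "nsubj", "ccomp", "prep", "det", "pobj", "punct"]
pattern = [{"DEP": "ccomp"}, {"DEP": "prep", "OP": "?"}]
print(toy_match(labels, pattern))  # [(7, 8), (7, 9)]
```

In this toy version the longest span sorts last, which is what `get_relation` relies on when it reads `matches[-1]`. spaCy v3's `Matcher.add` also accepts a `greedy` argument (`"FIRST"` or `"LONGEST"`) for filtering overlapping matches, which may be a sturdier way to keep the longest span than depending on match order.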

```python
def get_relation(sent):
    doc = nlp(sent)
    # Space-joined dependency labels, e.g. "nsubj ROOT det dobj punct".
    deps = " ".join(token.dep_ for token in doc)

    matcher = Matcher(nlp.vocab)
    span = None
    if "ROOT prep pcomp" in deps:
        # e.g. "succeeded in inventing": keep only the gerund ("inventing").
        pattern = [
            {"DEP": "ROOT"},
            {"DEP": "prep"},
            {"DEP": "pcomp"},
        ]
        matcher.add("matching", [pattern])
        matches = matcher(doc)
        if len(matches) == 0:
            return None
        # Each match is (match_id, start, end); take the last token of the last match.
        span = doc[matches[-1][2] - 1:matches[-1][2]]
    elif "ROOT aux xcomp" in deps:
        if "mark" in deps and "ccomp" in deps:
            # e.g. "used to believe that the sun rotates around ...": the relation is
            # the verb of the clausal complement plus an optional preposition.
            pattern = [
                {"DEP": "ccomp"},
                {"DEP": "prep", "OP": "?"},
            ]
            matcher.add("matching", [pattern])
            matches = matcher(doc)
            if len(matches) == 0:
                return None
            span = doc[matches[-1][1]:matches[-1][2]]
        else:
            # e.g. "tried to invent": keep only the open complement ("invent").
            pattern = [
                {"DEP": "ROOT"},
                {"DEP": "aux"},
                {"DEP": "xcomp"},
            ]
            matcher.add("matching", [pattern])
            matches = matcher(doc)
            if len(matches) == 0:
                return None
            span = doc[matches[-1][2] - 1:matches[-1][2]]
    elif "mark" not in deps or "ccomp" not in deps:
        # Simple clause: the root verb, optionally followed by a preposition,
        # a passive agent ("by"), or an adjective.
        pattern = [
            {"DEP": "ROOT"},
            {"DEP": "prep", "OP": "?"},
            {"DEP": "agent", "OP": "?"},
            # "ADJ" is a part-of-speech tag, not a dependency label, so match on POS.
            {"POS": "ADJ", "OP": "?"},
        ]
        matcher.add("matching", [pattern])
        matches = matcher(doc)
        if len(matches) == 0:
            return None
        span = doc[matches[-1][1]:matches[-1][2]]
    else:
        # "... that <clause>": the relation is the verb of the clausal complement.
        pattern = [
            {"DEP": "ccomp"},
            {"DEP": "prep", "OP": "?"},
        ]
        matcher.add("matching", [pattern])
        matches = matcher(doc)
        if len(matches) == 0:
            return None
        span = doc[matches[-1][1]:matches[-1][2]]

    if not span:
        return None

    return span.text
```
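`get_relation` chooses a pattern by testing substrings of the space-joined label string. The sketch below restates that if/elif chain as a hypothetical `choose_branch` helper that works on the list of labels directly. It uses a contiguous-subsequence check instead of substring search, so the test no longer depends on one label name never appearing inside another. The label lists in the usage lines are assumed parses, not verified `en_core_web_sm` output.

```python
from typing import List, Sequence


def has_sequence(labels: Sequence[str], seq: Sequence[str]) -> bool:
    """True if `seq` occurs as a contiguous run of labels inside `labels`."""
    n = len(seq)
    return any(list(labels[i:i + n]) == list(seq)
               for i in range(len(labels) - n + 1))


def choose_branch(labels: List[str]) -> str:
    """Name the get_relation branch a label sequence would take (hypothetical helper)."""
    # Exact list membership, unlike the substring test on the joined string.
    has_clause = "mark" in labels and "ccomp" in labels
    if has_sequence(labels, ["ROOT", "prep", "pcomp"]):
        return "ROOT-prep-pcomp"
    if has_sequence(labels, ["ROOT", "aux", "xcomp"]):
        return "ccomp" if has_clause else "ROOT-aux-xcomp"
    if not has_clause:
        return "ROOT"
    return "ccomp"


# Assumed label sequences for illustration only.
print(choose_branch(["nsubj", "ROOT", "aux", "xcomp", "det", "dobj"]))  # ROOT-aux-xcomp
print(choose_branch(["nsubj", "ROOT", "mark", "nsubj", "ccomp", "dobj"]))  # ccomp
```

With spaCy loaded, the real labels could be passed in as `choose_branch([t.dep_ for t in nlp(sent2)])` to see which branch a sentence would take before debugging the pattern itself.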

```python
print(get_relation(sent1))
print("=" * 20)
print(get_relation(sent2))
print("=" * 20)
print(get_relation(sent3))
print("=" * 20)
print(get_relation(sent4))
print("=" * 20)
print(get_relation(sent5))
print("=" * 20)
print(get_relation(sent6))
```

```
completed
====================
rotates around
====================
improved
====================
is
====================
inventing
====================
invent
```
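Rather than comparing printed output by eye, the expected relations (the underlined words in the exercise) can be stored alongside their sentences and checked automatically. The `EXPECTED` table and `find_failures` helper below are a hypothetical sketch; any function that maps a sentence to a relation string or `None` can be passed in.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Expected relations, taken from the underlined words in the exercise.
EXPECTED: Dict[str, Optional[str]] = {
    "John completed the difficult course prerequisites.": "completed",
    "People used to believe that the sun rotates around the earth.": "rotates around",
    "It is well-known that James Watt improved the steam engine.": "improved",
    "Computer science is a combination of engineering and mathematics.": "is",
    "In order to decode the Enigma machine, Alan Turing succeeded in inventing a decoding machine.": "inventing",
    "In order to decode the Enigma machine, Alan Turing tried to invent a decoding machine.": "invent",
}


def find_failures(
    extract: Callable[[str], Optional[str]],
    expected: Dict[str, Optional[str]],
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (sentence, expected, got) for every sentence the extractor gets wrong."""
    failures = []
    for sentence, want in expected.items():
        got = extract(sentence)
        if got != want:
            failures.append((sentence, want, got))
    return failures
```

Calling `find_failures(get_relation, EXPECTED)` should return an empty list for the run shown above; adding new sentences to the table turns each breaking example into a regression test.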


## Try breaking your `get_relation` implementation
Now that your `get_relation` function can handle more complex sentences, are there other sentence structures that it may fail to handle? Try listing some example sentences that break your implementation.
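One systematic way to find breaking inputs is to take a sentence the function already handles and rewrite it into constructions it was not designed for. The hypothetical `make_probes` helper below does this with string templates; it assumes a regular verb whose past tense and past participle coincide (e.g. `improved`). Whether each probe actually breaks `get_relation` depends on the parse `en_core_web_sm` produces, so treat these as candidates to run, not confirmed failures.

```python
from typing import List


def make_probes(subject: str, past: str, base: str, obj: str) -> List[str]:
    """Rewrite one simple subject-verb-object sentence into harder variants.

    Assumes a regular verb: `past` doubles as the past participle.
    """
    return [
        f"{subject} did not {base} {obj}.",                      # negation with do-support
        f"Did {subject} {base} {obj}?",                          # yes/no question
        f"{obj[0].upper()}{obj[1:]} was {past} by {subject}.",   # passive voice
        f"{subject} {past} {obj} and then left.",                # coordinated verbs
        f"{subject} might have {past} {obj}.",                   # stacked auxiliaries
    ]


for probe in make_probes("James Watt", "improved", "improve", "the steam engine"):
    print(probe)
```

Each probe can then be run through `get_relation` and compared with the relation a reader would expect (for instance, whether the negation in "did not improve" survives extraction).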

```python
# TODO: Come up with some sentences that will
#       break your get_relation function.
```