## Information Extraction Pipeline

![image](https://miro.medium.com/max/700/1*pHl8XXk0GMo_40rRLRVDqA.png)

#### Coreference resolution

In [12]:
text = 'KEATING REVISES DOWN AUSTRALIAN GROWTH FORECAST Treasurer Paul Keating forecast economic growth at slightly under two % in the financial year ending June this year , down from the 2.25 % forecast contained in the 1986 / 87 budget delivered last August.Australia \' s terms of trade also fell , by 18 % , over the past two years , he told Parliament.Terms of trade are the difference between import and export price indexes.Despite the figures , the budget forecast of about 1.75 % annual growth in employment would be met , Keating said.Unemployment is currently at 8.2 % of the workforce." This government is dragging Australia through a trading holocaust the kind of which we have not seen since the Second World War ," Keating said." We are not pushing this place into a recession.We are not only holding our gains on unemployment , we are bringing unemployment down ," he said , adding that the government had help the country avoid recession .'

In [34]:
import spacy
import neuralcoref

# Load the small English model by its full name (the 'en' shortcut link was removed in spaCy 3)
nlp = spacy.load('en_core_web_sm')
# Add NeuralCoref to spaCy's pipeline
neuralcoref.add_to_pipe(nlp)

def coref_resolution(text):
    """Replace each coreferring mention with the main mention of its cluster."""
    doc = nlp(text)
    # Token texts with trailing whitespace, so the text can be rebuilt by joining them
    tok_list = [token.text_with_ws for token in doc]
    for cluster in doc._.coref_clusters:
        # Words of the representative (main) mention of this cluster
        cluster_main_words = set(cluster.main.text.split(' '))
        for coref in cluster:
            if coref == cluster.main:
                continue  # the representative mention stays as it is
            shares_words = bool(set(coref.text.split(' ')) & cluster_main_words)
            # Skip mentions that equal the main mention or share a word with it
            # (e.g. "Keating" vs. "Paul Keating"); this handles nested coreferences
            if coref.text == cluster.main.text or shares_words:
                continue
            # Write the main mention at the first token, keeping the whitespace after the last one
            tok_list[coref.start] = cluster.main.text + doc[coref.end - 1].whitespace_
            # Blank out the remaining tokens of the replaced mention
            for i in range(coref.start + 1, coref.end):
                tok_list[i] = ""
    return "".join(tok_list)

In [35]:
result = coref_resolution(text)
result

'KEATING REVISES DOWN AUSTRALIAN GROWTH FORECAST Treasurer Paul Keating forecast economic growth at slightly under two % in the financial year ending June this year , down from the 2.25 % forecast contained in the 1986 / 87 budget delivered last August.Australia \' s terms of trade also fell , by 18 % , over the past two years , Paul Keating told Parliament.Terms of trade are the difference between import and export price indexes.Despite the figures , the budget forecast of about 1.75 % annual growth in employment would be met , Keating said.Unemployment is currently at 8.2 % of the workforce." This government is dragging Australia through a trading holocaust the kind of which we have not seen since the Second World War ," Keating said." we are not pushing this place into a recession.we are not only holding we gains on unemployment , we are bringing unemployment down ," Paul Keating said , adding that the government had help Australia avoid recession .'
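
The output above keeps "Keating said" but rewrites "he said" to "Paul Keating said". The following is a minimal sketch of that replacement rule on plain Python lists, so it can be run without spaCy or NeuralCoref. The function `resolve_mentions`, the sample tokens, and the hand-written cluster are hypothetical stand-ins for `doc` and `doc._.coref_clusters`, not part of either library.

```python
def resolve_mentions(tokens, clusters):
    """Apply the replacement rule of coref_resolution to plain lists.

    tokens   -- token strings, each including its trailing whitespace
    clusters -- list of (main_span, mention_spans); a span is (start, end)
                in token indices, end exclusive (hypothetical structure)
    """
    out = list(tokens)
    for main, mentions in clusters:
        main_text = "".join(tokens[main[0]:main[1]]).strip()
        main_words = set(main_text.split(" "))
        for start, end in mentions:
            if (start, end) == main:
                continue  # the main mention is never replaced
            mention_text = "".join(tokens[start:end]).strip()
            # Skip mentions equal to the main mention or sharing a word with it
            if mention_text == main_text or set(mention_text.split(" ")) & main_words:
                continue
            # Keep the whitespace that followed the last token of the mention
            last = tokens[end - 1]
            trailing_ws = last[len(last.rstrip()):]
            out[start] = main_text + trailing_ws
            for i in range(start + 1, end):
                out[i] = ""
    return "".join(out)


tokens = ["Paul ", "Keating ", "spoke", ". ", "Keating ", "said ", "he ", "agreed", "."]
# One cluster: main mention "Paul Keating", mentions "Paul Keating", "Keating", "he"
clusters = [((0, 2), [(0, 2), (4, 5), (6, 7)])]

resolved = resolve_mentions(tokens, clusters)
print(resolved)  # Paul Keating spoke. Keating said Paul Keating agreed.
```

"Keating" shares a word with the main mention and is left alone, while "he" shares none and is replaced.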

In [36]:
import spacy
import neuralcoref

nlp = spacy.load('en_core_web_sm')  # load the model
neuralcoref.add_to_pipe(nlp)

doc = nlp(text)  # get the spaCy Doc (composed of Tokens)

# Coreference clusters, printed as main mention: [mentions]
print(doc._.coref_clusters)

# Text with every mention replaced by the main mention of its cluster
x = doc._.coref_resolved
print(x)

[Paul Keating: [Paul Keating, he, Keating, " Keating, Keating, he], Australia: [Australia, Australia, the country], This government: [This government, the government], we: [we, We, We, our, we]]
KEATING REVISES DOWN AUSTRALIAN GROWTH FORECAST Treasurer Paul Keating forecast economic growth at slightly under two % in the financial year ending June this year , down from the 2.25 % forecast contained in the 1986 / 87 budget delivered last August.Australia ' s terms of trade also fell , by 18 % , over the past two years , Paul Keating told Parliament.Terms of trade are the difference between import and export price indexes.Despite the figures , the budget forecast of about 1.75 % annual growth in employment would be met , Paul Keating said.Unemployment is currently at 8.2 % of the workforce." This government is dragging Australia through a trading holocaust the kind of which we have not seen since the Second World War ,Paul Keating Paul Keating said." we are not pushing this place into a r

In [37]:
import spacy
from spacy import displacy
import en_core_web_sm

# Load the small English pipeline and render the entities found in the coreference-resolved text
nlp = en_core_web_sm.load()
doc = nlp(result)
displacy.render(doc, jupyter=True, style='ent')

In [20]:
# Render the entities of the original text for comparison
doc = nlp(text)
displacy.render(doc, jupyter=True, style='ent')

In [22]:
import pandas as pd

# Run NER on the coreference-resolved text from doc._.coref_resolved
doc = nlp(x)

# Store the entity text rather than the Span object, so the table holds plain strings
entities = [ent.text for ent in doc.ents]
labels = [ent.label_ for ent in doc.ents]

df = pd.DataFrame({'Entities': entities, 'Labels': labels})
df

| | Entities | Labels |
|---|---|---|
| 0 | Paul Keating | PERSON |
| 1 | two % | PERCENT |
| 2 | the financial year ending June this year | DATE |
| 3 | 2.25 % | PERCENT |
| 4 | 1986 / 87 | DATE |
| 5 | last August | DATE |
| 6 | Australia | GPE |
| 7 | 18 % | PERCENT |
| 8 | the past two years | DATE |
| 9 | Paul Keating | PERSON |
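
The rows shown above can be grouped by label to see which distinct entities each type contains. This is a small sketch using only the standard library. The `rows` list is copied by hand from the ten rows displayed above, and `group_by_label` is a hypothetical helper, not part of spaCy or pandas.

```python
from collections import defaultdict

# (entity text, label) pairs copied from the ten rows displayed above
rows = [
    ("Paul Keating", "PERSON"),
    ("two %", "PERCENT"),
    ("the financial year ending June this year", "DATE"),
    ("2.25 %", "PERCENT"),
    ("1986 / 87", "DATE"),
    ("last August", "DATE"),
    ("Australia", "GPE"),
    ("18 %", "PERCENT"),
    ("the past two years", "DATE"),
    ("Paul Keating", "PERSON"),
]


def group_by_label(rows):
    """Map each label to its distinct entity texts, in order of first appearance."""
    grouped = defaultdict(list)
    for text, label in rows:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(rows)
for label, texts in grouped.items():
    print(label, texts)
```

After coreference resolution, repeated mentions of the same person collapse into a single entry under `PERSON`, which makes the grouped view shorter than the raw table.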
