<a href="https://colab.research.google.com/github/victor-roris/mediumseries/blob/master/NLP/NeuralCoref_Coreference_Resolution_use.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks.

The process of linking together mentions that refer to the same real-world entity is called **coreference resolution**.

NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolves coreference clusters using a neural network. NeuralCoref is production-ready, integrated in spaCy's NLP pipeline and extensible to new training datasets. 

NeuralCoref is written in Python/Cython and comes with a pre-trained statistical model for **English only**.

For a brief introduction to coreference resolution and NeuralCoref: https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30

**NOTE**: At the moment, NeuralCoref only works with spaCy 2.1.0.

## Installation

In [1]:
! pip install neuralcoref



If you get an error mentioning `spacy.strings.StringStore size changed, may indicate binary incompatibility` when loading NeuralCoref with `import neuralcoref`, you will have to install NeuralCoref from the distribution's sources:

In [0]:
# !pip uninstall neuralcoref
# !pip install neuralcoref --no-binary neuralcoref

The `neuralcoref` package has a known incompatibility with newer versions of spaCy. A solution is to install a specific version of spaCy (2.1.0): https://github.com/huggingface/neuralcoref/issues/158
 

In [3]:
! pip install 'spacy==2.1.0'
! pip install 'cython>=0.25'
! pip install 'pytest'
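
Since the version pin matters here, a small guard can catch a mismatched environment early. The following is a minimal sketch, not part of NeuralCoref: the helper names `parse_release` and `is_supported_spacy` are hypothetical, and it assumes that 2.1.0 is the only supported release, as stated above.

```python
import re

# Version pinned in this notebook (assumption: only this release is supported)
REQUIRED_SPACY = (2, 1, 0)

def parse_release(version):
    """Extract the numeric (major, minor, patch) parts of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())

def is_supported_spacy(version):
    """Return True when the version string matches the pinned spaCy release."""
    return parse_release(version) == REQUIRED_SPACY

# In the notebook this would be called as is_supported_spacy(spacy.__version__)
print(is_supported_spacy("2.1.0"))
print(is_supported_spacy("2.3.5"))
```

Running the check before `import neuralcoref` gives a clearer message than the binary incompatibility error described earlier.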



In [4]:
! python -m spacy download en

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/lib/python3.6/dist-packages/en_core_web_sm -->
/usr/local/lib/python3.6/dist-packages/spacy/data/en
You can now load the model via spacy.load('en')


## Using NeuralCoref

Load the spaCy model

In [0]:
# Load your usual spaCy model (one of spaCy's English models)
import spacy
nlp = spacy.load('en')

Add the NeuralCoref component to the pipeline

In [5]:
# Add neural coref to SpaCy's pipe
import neuralcoref

print(f'Original components in pipeline : {nlp.pipe_names}')

# Option 1: use the helper provided by the package
if 'neuralcoref' not in nlp.pipe_names:
  neuralcoref.add_to_pipe(nlp)

# Option 2: create the component and add it manually
# (skipped when option 1 has already added it)
if 'neuralcoref' not in nlp.pipe_names:
  coref = neuralcoref.NeuralCoref(nlp.vocab)
  nlp.add_pipe(coref, name='neuralcoref')

print(f'New pipeline : {nlp.pipe_names}')

Original components in pipeline : ['tagger', 'parser', 'ner', 'neuralcoref']
New pipeline : ['tagger', 'parser', 'ner', 'neuralcoref']


In [0]:
# You're done. You can now use NeuralCoref the same way you usually work with a spaCy document's annotations.
doc = nlp(u'My sister has a dog. She loves him.')

* **DOC attributes**

Whether any coreference has been resolved in the Doc

In [8]:
doc._.has_coref

True

All the clusters of corefering mentions in the doc

In [9]:
doc._.coref_clusters

[My sister: [My sister, She], a dog: [a dog, him]]

Unicode representation of the doc where each corefering mention is replaced by the main mention in the associated cluster.

In [10]:
doc._.coref_resolved

'My sister has a dog. My sister loves a dog.'
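
The replacement behind `coref_resolved` can be pictured with plain strings. This is a simplified sketch and not NeuralCoref's implementation: the library works on token spans, while the hypothetical `resolve_text` helper below works on character offsets and only handles the first occurrence of each mention.

```python
def resolve_text(text, clusters):
    """Replace every non-main mention with the main mention of its cluster."""
    replacements = []
    for main, mentions in clusters.items():
        for mention in mentions:
            if mention == main:
                continue  # the main mention stays as it is
            start = text.index(mention)  # first occurrence only: enough for this toy input
            replacements.append((start, start + len(mention), main))
    # Apply from right to left so earlier character offsets stay valid
    for start, end, main in sorted(replacements, reverse=True):
        text = text[:start] + main + text[end:]
    return text

text = "My sister has a dog. She loves him."
# Clusters copied from the doc._.coref_clusters output, as plain strings
clusters = {"My sister": ["My sister", "She"], "a dog": ["a dog", "him"]}
print(resolve_text(text, clusters))
```

For this input the sketch yields the same sentence as the `coref_resolved` output shown above.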

Scores of the coreference resolution between mentions.

In [11]:
doc._.coref_scores

{My sister: {My sister: 1.3110305070877075},
 a dog: {My sister: -1.6715972423553467, a dog: 1.804752230644226},
 She: {My sister: 8.058426856994629,
  a dog: -1.0625176429748535,
  She: -0.10834205150604248},
 him: {My sister: 3.1147186756134033,
  a dog: 4.356405258178711,
  She: -3.1379528045654297,
  him: -1.870743989944458}}
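
One way to read this dictionary is to pick, for each mention, the candidate with the highest score. The sketch below copies the scores from the output above into plain strings. Treating the score of a mention with itself as the "no antecedent" option is an assumption about how to interpret the numbers, chosen because it is consistent with the clusters shown earlier. The helper name `best_candidates` is hypothetical.

```python
# Scores copied from the doc._.coref_scores output, with mentions as plain strings
coref_scores = {
    "My sister": {"My sister": 1.3110305070877075},
    "a dog": {"My sister": -1.6715972423553467, "a dog": 1.804752230644226},
    "She": {
        "My sister": 8.058426856994629,
        "a dog": -1.0625176429748535,
        "She": -0.10834205150604248,
    },
    "him": {
        "My sister": 3.1147186756134033,
        "a dog": 4.356405258178711,
        "She": -3.1379528045654297,
        "him": -1.870743989944458,
    },
}

def best_candidates(scores):
    """Map each mention to its highest-scoring candidate."""
    return {
        mention: max(candidates, key=candidates.get)
        for mention, candidates in scores.items()
    }

best = best_candidates(coref_scores)
for mention, candidate in best.items():
    # Assumption: a mention whose best candidate is itself has no antecedent
    if candidate == mention:
        print(f"{mention!r} starts a new cluster")
    else:
        print(f"{mention!r} links to {candidate!r}")
```

Under that reading, `She` links to `My sister` and `him` links to `a dog`, which matches the two clusters reported by `doc._.coref_clusters`.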

* **SPAN attributes**

In [26]:
span = doc[0:2]
print(span)

My sister



Whether the span has at least one corefering mention



In [27]:
span._.is_coref

True

Cluster of mentions that corefer with the span

In [28]:
span._.coref_cluster

My sister: [My sister, She]

Scores of the coreference resolution of a span with other mentions (if applicable).

In [29]:
span._.coref_scores

{My sister: 1.3110305070877075}

* **TOKEN attributes**

In [31]:
token = doc[6]
print(token)

She


Whether the token is inside at least one corefering mention

In [32]:
token._.in_coref

True

All the clusters of corefering mentions that contain the token

In [33]:
token._.coref_clusters

[My sister: [My sister, She]]

## Navigating the coreference cluster chains

In [35]:
import spacy
import neuralcoref
nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)

doc = nlp(u'My sister has a dog. She loves him')



[My sister: [My sister, She], a dog: [a dog, him]]
[a dog, him]
him
a dog


a dog: [a dog, him]

In [43]:
print(f'Coref clusters = {doc._.coref_clusters}')
print(f'Coref cluster in the 2nd position = {doc._.coref_clusters[1]}')
print(f'Mentions of the 2nd cluster = {doc._.coref_clusters[1].mentions}')
print(f'Last mention of the 2nd cluster = {doc._.coref_clusters[1].mentions[-1]}')
print(f'Span of the most representative mention related with the last mention of the 2nd cluster = {doc._.coref_clusters[1].mentions[-1]._.coref_cluster.main}')

Coref clusters = [My sister: [My sister, She], a dog: [a dog, him]]
Coref cluster in the 2nd position = a dog: [a dog, him]
Mentions of the 2nd cluster = [a dog, him]
Last mention of the 2nd cluster = him
Span of the most representative mention related with the last mention of the 2nd cluster = a dog



In [46]:
token = doc[-1]
print(token)
print(token._.in_coref)
print(token._.coref_clusters)

him
True
[a dog: [a dog, him]]


In [47]:
span = doc[-1:]
print(span)
print(span._.is_coref)
print(span._.coref_cluster.main)
print(span._.coref_cluster.main._.coref_cluster)

him
True
a dog
a dog: [a dog, him]
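
The navigation above goes from a cluster to its mentions and from a mention back to its cluster. A toy data structure makes that two-way link explicit. This is a sketch with plain strings: `ToyCluster` and `cluster_of` are hypothetical names, not NeuralCoref classes, and only the `main` and `mentions` field names mirror the attributes used above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyCluster:
    """Stand-in for a coreference cluster: one main mention plus all mentions."""
    main: str
    mentions: List[str] = field(default_factory=list)

# Clusters copied from the output above
clusters = [
    ToyCluster("My sister", ["My sister", "She"]),
    ToyCluster("a dog", ["a dog", "him"]),
]

# Reverse lookup: from a mention back to the cluster that contains it
cluster_of = {
    mention: cluster for cluster in clusters for mention in cluster.mentions
}

last_mention = clusters[1].mentions[-1]
print(last_mention)                   # him
print(cluster_of[last_mention].main)  # a dog
```

The reverse lookup plays the role that `span._.coref_cluster` plays in the cells above.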


## Configuration

### How to change a parameter

Example: `greedyness` is a number between 0 and 1 that determines how greedy the model is when making coreference decisions (a greedier model creates more coreference links). The default value is 0.5.

In [0]:
import spacy
import neuralcoref

# Let's load a SpaCy model
nlp = spacy.load('en')

# First way we can control a parameter
neuralcoref.add_to_pipe(nlp, greedyness=0.75)

# Another way we can control a parameter
nlp.remove_pipe("neuralcoref")  # This removes the current neuralcoref instance from spaCy's pipe
coref = neuralcoref.NeuralCoref(nlp.vocab, greedyness=0.75)
nlp.add_pipe(coref, name='neuralcoref')

### Using the conversion dictionary parameter to help resolve rare words

`conv_dict` = A conversion dictionary that you can use to replace the embeddings of rare words (keys) with an average of the embeddings of a list of common words (values). Ex: `conv_dict={"Angela": ["woman", "girl"]}` will help resolve coreferences for `Angela` by using the embeddings of the more common `woman` and `girl` instead of the embedding of `Angela`. This currently only works for single words (not for groups of words).
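
The averaging described here can be pictured with toy vectors. The sketch below is not NeuralCoref's code: the three-dimensional embeddings are made up for illustration, and `average_vectors` and `lookup` are hypothetical helpers that only show the arithmetic.

```python
# Toy 3-dimensional embeddings, invented for illustration
embeddings = {
    "woman": [1.0, 0.0, 2.0],
    "actress": [3.0, 2.0, 0.0],
    "Deepika": [0.1, 0.1, 0.1],  # stands in for an uninformative rare-word vector
}

def average_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]

def lookup(word, embeddings, conv_dict):
    """Return the averaged embedding for words in conv_dict, else the word's own."""
    if word in conv_dict:
        return average_vectors([embeddings[common] for common in conv_dict[word]])
    return embeddings[word]

conv_dict = {"Deepika": ["woman", "actress"]}
print(lookup("Deepika", embeddings, conv_dict))  # [2.0, 1.0, 1.0]
print(lookup("woman", embeddings, conv_dict))    # [1.0, 0.0, 2.0]
```

Words that are not keys of the dictionary keep their own embedding, so only the listed rare words are affected.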

In [0]:
import spacy
import neuralcoref

nlp = spacy.load('en')

# Let's try before using the conversion dictionary:
neuralcoref.add_to_pipe(nlp)
doc = nlp(u'Deepika has a dog. She loves him. The movie star has always been fond of animals')



In [2]:
print(doc._.coref_clusters)
print(doc._.coref_resolved)
# >>> [Deepika: [Deepika, She, him, The movie star]]
# >>> 'Deepika has a dog. Deepika loves Deepika. Deepika has always been fond of animals'
# >>> Not very good...

[Deepika: [Deepika, She, him, The movie star]]
Deepika has a dog. Deepika loves Deepika. Deepika has always been fond of animals


In [0]:
# Here are three ways we can add the conversion dictionary
nlp.remove_pipe("neuralcoref")
neuralcoref.add_to_pipe(nlp, conv_dict={'Deepika': ['woman', 'actress']})

# or
nlp.remove_pipe("neuralcoref")
coref = neuralcoref.NeuralCoref(nlp.vocab, conv_dict={'Deepika': ['woman', 'actress']})
nlp.add_pipe(coref, name='neuralcoref')

# or after NeuralCoref is already in SpaCy's pipe, by modifying NeuralCoref in the pipeline
nlp.get_pipe('neuralcoref').set_conv_dict({'Deepika': ['woman', 'actress']})

In [4]:
# Let's try again with the conversion dictionary:
doc = nlp(u'Deepika has a dog. She loves him. The movie star has always been fond of animals')
print(doc._.coref_clusters)
print(doc._.coref_resolved)
# >>> [Deepika: [Deepika, She, The movie star], a dog: [a dog, him]]
# >>> 'Deepika has a dog. Deepika loves a dog. Deepika has always been fond of animals'
# >>> A lot better!

[Deepika: [Deepika, She, The movie star], a dog: [a dog, him]]
Deepika has a dog. Deepika loves a dog. Deepika has always been fond of animals


## Personal example

Thanks Naty for the example, you are a love (and a liar)!

In [0]:
import spacy
import neuralcoref

nlp = spacy.load('en')

# Add NeuralCoref with its default settings
neuralcoref.add_to_pipe(nlp)
doc = nlp(u'Victor is a handsome boy. He always wears casual clothes and they look great on him.' )


In [8]:
print(doc._.coref_clusters)
print(doc._.coref_resolved)

[Victor: [Victor, He, him], casual clothes: [casual clothes, they]]
Victor is handsome boy. Victor always wears casual clothes and casual clothes look great on Victor.
