Coreference crucial enhancement - account for predicate's arguments #35

Closed
kleinay opened this issue Sep 17, 2017 · 4 comments
@kleinay
Collaborator

kleinay commented Sep 17, 2017

Following the major change in #31 (splitting entities and marking implicit propositions), it is crucial to enhance the proposition coreference algorithm so that it accounts for the predicate's arguments, not only the head lemma.

@kleinay
Collaborator Author

kleinay commented Sep 17, 2017

Changed the input of the cluster_mentions function for propositions.
Until now, each mention in mention_list was a (mention-id, head_lemma, ) tuple, and the score function used only the second element of the tuple (the head lemma).
After my change, each mention is a (mention-unique-id, mention-head-lemma, mention-full-info) tuple.
mention-head-lemma is a string, kept for backward compatibility.
mention-full-info is a dict containing all the information about the proposition mention as given by props_wrapper, with one modification: the "Arguments" field is a dict mapping template symbols (e.g. "A1" or "P2") to their mention records.

An example mention:
    ('7_P1', 'suspect',
     {'Arguments':
        {'A1': {'indices': (2,), 'sentence_id': '7', 'terms': u'down'},
         'A2': {'indices': (0,), 'sentence_id': '7', 'terms': u'Turkey'},
         'A3': {'indices': (5,), 'sentence_id': '7', 'terms': u'plane'}},
      'Bare predicate': ('suspect', (3,)),
      'Head': {'Lemma': 'suspect', 'POS': 'VBP', 'Surface': ('suspect', [3])},
      'Template': '{A2} {A1} suspect {A3}',
      'sentence_id': '7'})
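
To illustrate how a score function could use the third tuple element, here is a minimal sketch. It is not the repository's scoring code: `argument_terms`, `mention_score`, and the `lemma_weight` parameter are hypothetical names, and blending a head-lemma match with Jaccard overlap of argument terms is only one possible choice.

```python
def argument_terms(mention_info):
    """Collect the lower-cased 'terms' strings of all arguments of a mention."""
    arguments = mention_info.get("Arguments", {})
    return {record["terms"].lower() for record in arguments.values()}


def mention_score(mention_a, mention_b, lemma_weight=0.5):
    """Hypothetical similarity in [0, 1] between two proposition mentions.

    Each mention is a (mention-unique-id, mention-head-lemma, mention-full-info)
    tuple, as described above. The score blends an exact head-lemma match with
    the Jaccard overlap of the argument terms.
    """
    _, lemma_a, info_a = mention_a
    _, lemma_b, info_b = mention_b

    # Two empty lemmas are deliberately not treated as a match.
    lemma_match = 1.0 if lemma_a and lemma_a == lemma_b else 0.0

    args_a, args_b = argument_terms(info_a), argument_terms(info_b)
    union = args_a | args_b
    overlap = len(args_a & args_b) / len(union) if union else 0.0

    return lemma_weight * lemma_match + (1.0 - lemma_weight) * overlap
```

With a score of this shape, two mentions sharing a head lemma but no arguments get only partial credit, which is the behaviour this issue asks for.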

@OriShapira
Collaborator

Just as another example, in the attached file, notice proposition P.16 has many different unrelated predicates coreferred ("have been targeting", "killing", "raping and killing", "am trying", "western", "able", "are attacking over", "floating in", "one of", ...).

Burma.in.json.txt

kleinay added a commit to kleinay/OKR that referenced this issue Sep 26, 2017
…s by replacing the head lemma with head surface when head lemma is empty
@kleinay
Collaborator Author

kleinay commented Sep 28, 2017

This was addressed by @shanybar in PR #40.

@kleinay kleinay closed this as completed Sep 28, 2017
@kleinay kleinay reopened this Sep 28, 2017
@kleinay
Collaborator Author

kleinay commented Sep 28, 2017

@OriShapira, the strange coreference cluster was caused by clustering together all propositions that have an empty head lemma (when the lemmatizer fails, it returns an empty string). Fixed in #41.
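
The fallback named in the referenced commit message (use the head surface when the head lemma is empty) might look roughly like the sketch below. `head_key` is a hypothetical helper written for illustration, not the code from #41; it assumes the 'Head' record has the shape shown in the example mention earlier in this thread.

```python
def head_key(mention_info):
    """Return the head lemma, or the head surface form if the lemma is empty.

    Assumes mention_info['Head'] looks like
    {'Lemma': 'suspect', 'POS': 'VBP', 'Surface': ('suspect', [3])}.
    """
    head = mention_info.get("Head", {})
    lemma = head.get("Lemma", "")
    if lemma:
        return lemma

    # Lemmatizer failure: fall back to the surface string so that mentions
    # with empty lemmas are not all compared as the same empty string.
    surface = head.get("Surface")
    return surface[0] if surface else ""
```

Comparing mentions on this key instead of the raw lemma keeps unrelated predicates such as "killing" and "floating in" from matching merely because both lemmas came back empty.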

@kleinay kleinay closed this as completed Sep 28, 2017
kleinay added a commit that referenced this issue Sep 28, 2017
Addressing Ori's comment at #35 - handling empty head lemma of propositions