Paraphrasing vs Coreferring: Two Sides of the Same Coin

Introduction

This repository contains the code used in the paper "Paraphrasing vs Coreferring: Two Sides of the Same Coin" by Yehudit Meged, Avi Caciularu, Vered Shwartz, and Ido Dagan, Findings of EMNLP 2020 (https://arxiv.org/abs/2004.14979).

It provides a random forest model for classifying and ranking in the paraphrase identification task.

Instructions

The research consists of 4 stages:

Tweets pair collection

The code for this stage extracts the tweets by their IDs.

Features creation

The features consist of 5 feature groups: Named Entity Coverage is in the NER directory, cross-document coreference resolution is in the coreference directory, connected component and clique are in the graph directory, and the Chirps features are derived from the Chirps resource.
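To make two of these feature groups concrete, here is a minimal sketch. It is not the code from the NER or graph directories. It assumes that named entities have already been extracted as sets of strings, that coverage is measured as the share of the smaller entity set found in the other set, and that the graph links items that occur together in a candidate pair. The function names `entity_coverage`, `connected_components`, and `same_component` are hypothetical.

```python
from collections import defaultdict, deque


def entity_coverage(entities_a, entities_b):
    """Share of the smaller entity set that also appears in the other set.

    Assumption: this is one plausible definition of coverage, not
    necessarily the one used in the paper.
    """
    a = {e.lower() for e in entities_a}
    b = {e.lower() for e in entities_b}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def connected_components(edges):
    """Map every node of an undirected graph to a component id (BFS)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    component_of = {}
    next_id = 0
    for start in list(neighbours):
        if start in component_of:
            continue
        component_of[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in component_of:
                    component_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return component_of


def same_component(component_of, u, v):
    """Binary feature: do both nodes fall in the same component?"""
    return u in component_of and component_of.get(u) == component_of.get(v)


# Hypothetical toy graph: t1-t2-t3 form one component, t4-t5 another.
toy_edges = [("t1", "t2"), ("t2", "t3"), ("t4", "t5")]
toy_components = connected_components(toy_edges)
```

With the toy graph, `same_component(toy_components, "t1", "t3")` holds while `same_component(toy_components, "t1", "t4")` does not, and `entity_coverage({"Obama", "Berlin"}, {"obama"})` is 1.0 because matching ignores case.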

Paraphrases annotation

The paraphrase annotation code is in the MTAnnotation directory.

Model training

The model training code is in the classifiers directory.
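As a rough illustration of how a random forest can both classify and rank candidate pairs, the sketch below trains scikit-learn's `RandomForestClassifier` on synthetic feature vectors and ranks unseen pairs by the predicted probability of the paraphrase class. It is not the repository's training code: the use of scikit-learn, the three toy features, the hyperparameters, and the helper `toy_pair` are all assumptions made for the example.

```python
import random

from sklearn.ensemble import RandomForestClassifier

random.seed(0)


def toy_pair():
    """Return (features, label) for one synthetic candidate pair.

    Assumption: paraphrase pairs tend to have higher feature values.
    """
    is_paraphrase = random.random() < 0.5
    centre = 0.7 if is_paraphrase else 0.3
    features = [min(1.0, max(0.0, random.gauss(centre, 0.15))) for _ in range(3)]
    return features, int(is_paraphrase)


# Synthetic training set standing in for the real feature vectors.
train = [toy_pair() for _ in range(200)]
X_train = [features for features, _ in train]
y_train = [label for _, label in train]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Classification: a hard 0/1 decision for each unseen pair.
candidates = [toy_pair()[0] for _ in range(10)]
labels = model.predict(candidates)

# Ranking: order the pairs by the probability of the paraphrase class.
positive_column = list(model.classes_).index(1)
scores = model.predict_proba(candidates)[:, positive_column]
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

Here `ranking[0]` is the index of the candidate the model considers most likely to be a paraphrase. Using the class probability as the ranking score means a single trained model serves both the classification and the ranking setting.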

