This code was used in the paper "Paraphrasing vs Coreferring: Two Sides of the Same Coin" by Yehudit Meged, Avi Caciularu, Vered Shwartz, and Ido Dagan, Findings of EMNLP 2020 (https://arxiv.org/abs/2004.14979).
It provides a random forest model for classification and ranking in the paraphrase identification task.
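To illustrate how one random forest can serve both classification and ranking, here is a minimal sketch. It assumes scikit-learn and NumPy, which this description does not name, and it uses randomly generated toy features rather than the paper's data. The five columns merely stand in for feature groups; the hyperparameters are placeholders, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data (NOT from the paper): one row per candidate pair, five columns
# standing in for five feature groups.
rng = np.random.default_rng(0)
X = rng.random((40, 5))
y = np.arange(40) % 2   # synthetic alternating labels, 1 = paraphrase
X[:, 0] += y            # make column 0 informative about the label

# Hypothetical settings, chosen only for this illustration.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:30], y[:30])

# Classification: a hard 0/1 decision for each held-out candidate.
predicted = model.predict(X[30:])

# Ranking: order the same candidates by predicted probability of class 1.
positive_column = list(model.classes_).index(1)
scores = model.predict_proba(X[30:])[:, positive_column]
ranking = np.argsort(-scores)  # indices of candidates, best first
```

The same fitted model yields both outputs: `predict` gives the class decision, while sorting by `predict_proba` gives an ordering of candidates.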
This research consists of 4 stages:
The code for the first stage extracts the tweets by their IDs.
The features consist of 5 feature groups: Named Entity Coverage is in the NER directory, cross-document coreference resolution is in the coreference directory, connected component and clique are in the graph directory, and the Chirps features are derived from the Chirps resource.
The paraphrase annotation code is in the MTAnnotation directory.
The model training code is in the classification directory.
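For readers unfamiliar with the connected component and clique feature groups mentioned among the features above, the following sketch shows the two graph notions using only the Python standard library. How the repository actually builds its graph is not described here, so the node strings, the edge list, and all function names below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def is_clique(graph, nodes):
    """True if every pair of distinct nodes is directly connected."""
    return all(b in graph.get(a, ()) for a, b in combinations(nodes, 2))


# Toy edges (illustrative only): each edge links two predicates.
toy_edges = [
    ("acquire", "buy"),
    ("buy", "purchase"),
    ("acquire", "purchase"),
    ("purchase", "take over"),
    ("hire", "employ"),
]
toy_graph = build_graph(toy_edges)
components = connected_components(toy_graph)
```

In this toy graph, "acquire", "buy", and "purchase" form a clique, while "take over" is in the same connected component but is linked only to "purchase". A clique is therefore a stricter signal than shared component membership.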