A unified framework for recent natural language generation (NLG) evaluation metrics.
Work in progress.
```bash
git clone git@github.com:yuhui-zh15/nlg_metrics.git
cd nlg_metrics
pip install -e .
python -m pytest tests/
```
```python
>>> from nlg_metrics import RougeScorer
>>> scorer = RougeScorer()
>>> scores = scorer.score(['This is a test sentence.'], ['This is another test sentence.'])
>>> print(scores)
```
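The other completed metrics are intended to expose a similar interface. Below is a minimal sketch assuming a `BertScorer` class that mirrors `RougeScorer`'s `score(predictions, references)` signature; the class name is an assumption, not confirmed against the codebase:

```python
# Hypothetical sketch: assumes a BertScorer class exists with the same
# score(predictions, references) interface as RougeScorer above.
from nlg_metrics import BertScorer  # class name assumed

scorer = BertScorer()
scores = scorer.score(
    ['This is a test sentence.'],        # candidate outputs
    ['This is another test sentence.'],  # references
)
print(scores)
```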
| Metric | Status | Paper |
|---|---|---|
| ROUGE | COMPLETE | ROUGE: A Package for Automatic Evaluation of Summaries |
| BERTScore | COMPLETE | BERTScore: Evaluating Text Generation with BERT |
| FactScore | COMPLETE | Evaluating the Factual Correctness for Abstractive Summarization |
| MoverScore | TODO | MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance |
| BLEU | TODO | BLEU: a Method for Automatic Evaluation of Machine Translation |
| METEOR | TODO | METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments |
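For intuition about what the simplest of these metrics computes, here is a standalone sketch of ROUGE-1 F1 (unigram overlap with clipped counts). It is an illustration only, not the package's implementation, and it uses naive whitespace tokenization:

```python
# Standalone illustration of ROUGE-1 F1; not the nlg_metrics implementation.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a token is matched at most as many times as it
    # appears in the reference (Counter & Counter takes the min count).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1('This is a test sentence.', 'This is another test sentence.'))
# ≈ 0.8 (4 of 5 tokens overlap in each direction)
```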