tjsiledar/SummEval-OP

One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation

Paper: One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation (arXiv:2402.11683)

Abstract

Evaluation of opinion summaries using conventional reference-based metrics rarely provides a holistic evaluation and has been shown to have a relatively low correlation with human judgments. Recent studies suggest using Large Language Models (LLMs) as reference-free metrics for NLG evaluation; however, they remain unexplored for opinion summary evaluation. Moreover, limited opinion summary evaluation datasets inhibit progress. To address this, we release the SUMMEVAL-OP dataset, covering 7 dimensions related to the evaluation of opinion summaries: fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, and specificity. We investigate Op-I-Prompt, a dimension-independent prompt, and Op-Prompts, a dimension-dependent set of prompts, for opinion summary evaluation. Experiments indicate that Op-I-Prompt emerges as a good alternative for evaluating opinion summaries, achieving an average Spearman correlation of 0.70 with humans and outperforming all previous approaches. To the best of our knowledge, we are the first to investigate LLMs as evaluators on both closed-source and open-source models in the opinion summarization domain.

Dataset

The dataset can be found at annotated_dataset/summeval-op.jsonl. Each line of the JSONL file is a single JSON object with the following structure:

{
      "reviews": {
            "rev1" : "this is the first review",
            "..."
      },
      "summaries" : {
            "model1" : {
                  "summary" : "summary generated by model 1",
                  "dimensions" : {
                        "dimension1" : 3.5,
                        "..."
                  }
            },
            "..."
      }
}
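
The structure above can be read with the Python standard library alone. The following is a minimal sketch, not part of the repository: the function names `load_records` and `mean_scores` are hypothetical, and the code assumes every value under "dimensions" is numeric, as in the example record.

```python
import json
from collections import defaultdict
from statistics import mean


def load_records(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


def mean_scores(records):
    """Average each dimension's score per model across all records.

    Assumes the layout shown above:
    record["summaries"][model]["dimensions"][dimension] -> number.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for record in records:
        for model, entry in record["summaries"].items():
            for dimension, score in entry["dimensions"].items():
                scores[model][dimension].append(score)
    return {
        model: {dim: mean(vals) for dim, vals in dims.items()}
        for model, dims in scores.items()
    }
```

Calling `mean_scores(load_records("annotated_dataset/summeval-op.jsonl"))` from the repository root should then return a dictionary that maps each model name to its per-dimension average score.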

Citation

@misc{siledar2024prompt,
      title={One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation}, 
      author={Tejpalsingh Siledar and Swaroop Nath and Sankara Sri Raghava Ravindra Muddu and Rupasai Rangaraju and Swaprava Nath and Pushpak Bhattacharyya and Suman Banerjee and Amey Patil and Sudhanshu Shekhar Singh and Muthusamy Chelliah and Nikesh Garera},
      year={2024},
      eprint={2402.11683},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
