One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation

Paper Link: One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation (arXiv:2402.11683)

Abstract

Evaluation of opinion summaries using conventional reference-based metrics rarely provides a holistic evaluation and has been shown to have a relatively low correlation with human judgments. Recent studies suggest using Large Language Models (LLMs) as reference-free metrics for NLG evaluation; however, they remain unexplored for opinion summary evaluation. Moreover, limited opinion summary evaluation datasets inhibit progress. To address this, we release the SUMMEVAL-OP dataset covering 7 dimensions related to the evaluation of opinion summaries: fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, and specificity. We investigate Op-I-Prompt, a dimension-independent prompt, and Op-Prompts, a dimension-dependent set of prompts, for opinion summary evaluation. Experiments indicate that Op-I-Prompt emerges as a good alternative for evaluating opinion summaries, achieving an average Spearman correlation of 0.70 with humans and outperforming all previous approaches. To the best of our knowledge, we are the first to investigate LLMs as evaluators on both closed-source and open-source models in the opinion summarization domain.

Dataset

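The dataset can be found at annotated_dataset/summeval-op.jsonl. Each line of the JSONL file is a single JSON object with the following structure (pretty-printed here for readability; "..." marks further entries of the same shape):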

{
      "reviews": {
            "rev1" : "this is the first review",
            "..."
      },
      "summaries" : {
            "model1" : {
                  "summary" : "summary generated by model 1",
                  "dimensions" : {
                        "dimension1" : 3.5,
                        "..."
                  }
            },
            "..."
      }
}
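
As a minimal sketch of how the file might be read, the Python below parses one JSON object per line and averages each dimension's score per model. It assumes only the keys shown above ("summaries", "dimensions") and that every score is numeric; the function names load_records and mean_scores_by_model are hypothetical helpers and are not part of this repository.

```python
import json
from collections import defaultdict


def load_records(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


def mean_scores_by_model(records):
    """Average each dimension's score per model across all records.

    Assumes the layout record["summaries"][model]["dimensions"][dimension],
    with numeric scores, as in the example above.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for record in records:
        for model, entry in record["summaries"].items():
            for dimension, score in entry["dimensions"].items():
                totals[model][dimension] += score
                counts[model][dimension] += 1
    return {
        model: {dim: totals[model][dim] / counts[model][dim] for dim in dims}
        for model, dims in totals.items()
    }
```

From the repository root, calling mean_scores_by_model(load_records("annotated_dataset/summeval-op.jsonl")) would return a nested dictionary keyed first by model name and then by dimension name.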

Citation

@misc{siledar2024prompt,
      title={One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation}, 
      author={Tejpalsingh Siledar and Swaroop Nath and Sankara Sri Raghava Ravindra Muddu and Rupasai Rangaraju and Swaprava Nath and Pushpak Bhattacharyya and Suman Banerjee and Amey Patil and Sudhanshu Shekhar Singh and Muthusamy Chelliah and Nikesh Garera},
      year={2024},
      eprint={2402.11683},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
