tjunlp-lab/CMoralEval

CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

Introduction:

How does a large language model (LLM) respond in ethically relevant contexts? In this paper, we curate CMoralEval, a large benchmark for evaluating the morality of Chinese LLMs. CMoralEval draws on two data sources: 1) a Chinese TV program that discusses Chinese moral norms through stories from society, and 2) a collection of Chinese moral anomies drawn from various newspapers and academic papers on morality. With these sources, we aim to create a moral evaluation dataset characterized by diversity and authenticity. We develop a morality taxonomy and a set of fundamental moral principles that are not only rooted in traditional Chinese culture but also consistent with contemporary societal norms. To facilitate efficient construction and annotation of instances in CMoralEval, we establish a platform with AI-assisted instance generation to streamline the annotation process. Together, these allow us to curate CMoralEval, which encompasses both explicit moral scenarios (14,964 instances) and moral dilemma scenarios (15,424 instances), each with instances from different data sources. We conduct extensive experiments with CMoralEval to examine a variety of Chinese LLMs. Experimental results demonstrate that CMoralEval is a challenging benchmark for Chinese LLMs.

Citation

@misc{yu2024cmoralevalmoralevaluationbenchmark,
      title={CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models}, 
      author={Linhao Yu and Yongqi Leng and Yufei Huang and Shang Wu and Haixin Liu and Xinmeng Ji and Jiahui Zhao and Jinwang Song and Tingting Cui and Xiaoqing Cheng and Tao Liu and Deyi Xiong},
      year={2024},
      eprint={2408.09819},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.09819}, 
}
