Question about the dataset format for event extraction #26

Closed
StCross opened this issue May 11, 2022 · 2 comments

Labels
question Further information is requested

Comments


StCross commented May 11, 2022

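As the title says: an earlier issue mentioned that OpenUE can be used for event extraction and pointed to the DuEE1.0 dataset as a reference format. Can DuEE1.0 be fed to the model directly, or does it need to be modified? Its format differs from the triple-extraction dataset provided in the demo.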

Contributor

zxlzr commented May 11, 2022

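Event extraction support in OpenUE is fairly simple. In our experiments we first ran classification to determine the event type, then used sequence labeling to extract the arguments.
For example, given an input sentence S, triple extraction outputs multiple (head entity, relation, tail entity) triples, whereas event extraction outputs multiple n-tuples whose arity depends on the schema definition, for example a 4-tuple (event type, argument 1 (time), argument 2 (location), argument 3 (target)). There is an online demo built with this technique; the event extraction part of http://openue.zjukg.org/ may help you understand it.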
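To make the two-stage setup described above more concrete, here is a minimal sketch of how training targets for both stages could be derived from one annotated sentence. It is not OpenUE's code: the function `bio_tags`, the sample record, the role names, and the choice of character-level BIO tags are all assumptions made for illustration.

```python
from typing import Dict, List


def bio_tags(text: str, arguments: List[Dict[str, str]]) -> List[str]:
    """Build character-level BIO tags for the arguments of one event."""
    tags = ["O"] * len(text)
    for arg in arguments:
        span = arg["argument"]
        # Simplification: locate the first occurrence of the argument string.
        start = text.find(span) if span else -1
        if start < 0:
            continue  # argument text not found in the sentence; skip it
        tags[start] = "B-" + arg["role"]
        for i in range(start + 1, start + len(span)):
            tags[i] = "I-" + arg["role"]
    return tags


# Hypothetical record using the predicate/arguments layout from this thread.
record = {
    "text": "Acme cut jobs in Paris on Monday",
    "event_list": [
        {
            "predicate": "layoff",
            "arguments": [
                {"role": "time", "argument": "Monday"},
                {"role": "location", "argument": "Paris"},
            ],
        }
    ],
}

# Stage 1 target: the event types present in the sentence (multi-label).
event_types = [event["predicate"] for event in record["event_list"]]

# Stage 2 target: one BIO tag sequence per event, keyed by its event type.
tag_sequences = {
    event["predicate"]: bio_tags(record["text"], event["arguments"])
    for event in record["event_list"]
}
```

Matching arguments by first string occurrence is a shortcut; if the source data provides character offsets for arguments, using those would avoid mislabeling repeated substrings.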

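The preprocessing code we used for event extraction was lost for historical reasons. You can preprocess your data into the following format and treat event extraction as an n-tuple extraction task: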

{
  "text": "event sentence",
  "event_list": [
    {"predicate": "event type 1", "arguments": [{"role": "time", "argument": "argument 1"}, {"role": "location", "argument": "argument 2"}]},
    {"predicate": "event type 2", "arguments": [{"role": "time", "argument": "argument 1"}, {"role": "location", "argument": "argument 2"}]}
  ]
}
You can refer to Baidu's dataset; when we developed the tool, we based the format on the examples from the Baidu dataset: https://aistudio.baidu.com/aistudio/competition/detail/32/0/task-definition
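Since the original preprocessing code is not available, the following is only a rough sketch of how DuEE-style records might be reshaped into the format above. It assumes the source file is JSON Lines and that each event carries `event_type` and `arguments` entries with `role` and `argument` fields; those field names, the helper names `duee_to_tuple_format` and `convert_file`, and the sample record are assumptions to verify against your copy of the dataset. Whether OpenUE accepts the result without further changes is not confirmed in this thread.

```python
import json
from typing import Any, Dict


def duee_to_tuple_format(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one DuEE-style record to the text/event_list/predicate layout."""
    events = []
    for event in record.get("event_list", []):
        events.append(
            {
                # Assumed DuEE field: "event_type" becomes "predicate".
                "predicate": event["event_type"],
                # Keep only role and argument text; drop offsets and aliases.
                "arguments": [
                    {"role": arg["role"], "argument": arg["argument"]}
                    for arg in event.get("arguments", [])
                ],
            }
        )
    return {"text": record["text"], "event_list": events}


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert a JSON Lines file record by record (one JSON object per line)."""
    with open(src_path, encoding="utf-8") as fin, \
            open(dst_path, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            converted = duee_to_tuple_format(json.loads(line))
            # ensure_ascii=False keeps Chinese text readable in the output.
            fout.write(json.dumps(converted, ensure_ascii=False) + "\n")


# Hypothetical sample record, not taken from the real dataset.
sample = {
    "id": "example-0",
    "text": "Acme cut jobs in Paris on Monday",
    "event_list": [
        {
            "event_type": "layoff",
            "trigger": "cut",
            "arguments": [
                {"role": "employer", "argument": "Acme", "argument_start_index": 0}
            ],
        }
    ],
}
converted_sample = duee_to_tuple_format(sample)
```

The trigger word and character offsets are dropped here because the target format shown in this thread does not include them; keep them if your downstream labeling step needs exact spans.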

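OpenUE is a relatively lightweight tool. In the second half of the year we also plan to open-source a heavier, complete event extraction framework in DeepKE (https://github.com/zjunlp/deepke), another tool developed in our lab. Feel free to follow it for updates.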

Author

StCross commented May 11, 2022

Understood, thank you very much for the detailed answer. I will keep following your lab's work.

zxlzr added the "question" label (Further information is requested) on May 11, 2022
zxlzr closed this as completed on May 12, 2022