Duplicate data #5

Open
ggboondl opened this issue Nov 14, 2023 · 8 comments
Comments

@ggboondl

Why is so much of the data duplicated?

@taishan1994
Owner

Which dataset?

@xuanxuanxuanxuan

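> Which dataset?

[Image: 1715414758089]
For example, the file at "BERT-Relation-Extraction-main\data\duie\re_data\train.txt" contains quite a lot of duplicates. Does the data need to be deduplicated?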

@taishan1994
Owner

The relations in them are different.

@xuanxuanxuanxuan

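> The relations in them are different.

Hm? In what sense are they different? Aren't they all 季冠霖, 周芷若, 配音 (voice acting)? And the seven sentences above are exactly identical, aren't they? Am I overlooking something?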

@taishan1994
Owner

Are the contents of labels the same as well?

@xuanxuanxuanxuan

> Are the contents of labels the same as well?

Yes. I have uploaded the screenshot; I'm not sure whether you can see it.

@taishan1994
Owner

I can see it. They are indeed duplicated; you could consider deleting them.
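
As a stopgap, exact duplicates could be dropped with something like the sketch below. It assumes one sample per line in the generated file, which this thread does not confirm, and `dedupe_lines` is a hypothetical helper, not part of the repository.

```python
def dedupe_lines(lines):
    """Return the non-empty lines with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        # Skip blank lines and anything already emitted.
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(key)
    return unique


# Small in-memory demonstration; the sample strings are made up.
sample = ["sample A\n", "sample B\n", "sample A\n", "\n", "sample B\n"]
deduped = dedupe_lines(sample)
print(deduped)
```

To apply it to a file, the lines could be read with `open(path, encoding="utf-8")` and the result written back out. This only removes byte-identical lines; it does not address whatever produced the duplicates in the first place.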

@zengtao1978

There is a bug in the get_re_data method of process.py.
ent_rel_dict[spo["predicate"]].append((sbj, obj))
res.append(tmp)  # wrong
->
res.append(tmp.copy())  # fixed
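
The body of get_re_data is not shown in this thread, so the following is only a minimal sketch of the kind of aliasing bug the fix above suggests: if one dict is reused across loop iterations and appended without copying, every list entry refers to the same object and ends up showing the last values written. The function names, the `text`/`labels` keys, and the sample data are all hypothetical.

```python
def build_records_buggy(pairs):
    """Reuses one dict and appends it directly, so all entries alias each other."""
    res = []
    tmp = {}
    for text, label in pairs:
        tmp["text"] = text
        tmp["labels"] = label
        res.append(tmp)  # appends a reference to the same dict every time
    return res


def build_records_fixed(pairs):
    """Same loop, but appends a shallow copy so each entry is an independent snapshot."""
    res = []
    tmp = {}
    for text, label in pairs:
        tmp["text"] = text
        tmp["labels"] = label
        res.append(tmp.copy())
    return res


pairs = [("sentence A", "rel1"), ("sentence B", "rel2")]
buggy = build_records_buggy(pairs)
fixed = build_records_fixed(pairs)
print(buggy)  # both entries are the same object
print(fixed)  # entries differ
```

Note that `dict.copy()` is a shallow copy. If the real `tmp` holds nested lists or dicts that are also mutated in place, `copy.deepcopy` (or creating a fresh dict inside the loop) may be needed instead.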
