Error when running run_bert.py in /data/qinglong/knowledgeGraph/DeepKE/example/ner/standard for entity recognition on the English dataset conll2003 #187

Closed
qinglongheu opened this issue Dec 8, 2022 · 16 comments
Labels
bug Something isn't working

@qinglongheu

The error message is as follows:
Traceback (most recent call last):
File "/data/qinglong/knowledgeGraph/DeepKE/example/ner/standard/run_bert.py", line 135, in main
train_features = convert_examples_to_features(train_examples, label_list, cfg.max_seq_length, tokenizer)
File "/home/qinglong/.conda/envs/deepke/lib/python3.8/site-packages/deepke/name_entity_re/standard/tools/preprocess.py", line 92, in convert_examples_to_features
label_ids.append(label_map[labels[i]])
KeyError: 'EU\tB-ORG'
Debugging shows that every sample in examples looks like this:
text_a = 'EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O
'
labellist = ['EU\tB-ORG', 'rejects\tO', 'German\tB-MISC', 'call\tO', 'to\tO', 'boycott\tO', 'British\tB-MISC', 'lamb\tO', '.\tO']
textlist = ['EU\tB-ORG\n', 'rejects\tO\n', 'German\tB-MISC\n', 'call\tO\n', 'to\tO\n', 'boycott\tO\n', 'British\tB-MISC\n', 'lamb\tO\n', '.\tO\n']
whereas for the Chinese dataset, the samples in examples look like this:
text_a ='海 钓 比 赛 地 点 在 厦 门 与 金 门 之 间 的 海 域 。'

@qinglongheu qinglongheu added the bug Something isn't working label Dec 8, 2022
@xxupiano
Contributor

xxupiano commented Dec 8, 2022

DeepKE standard NER currently supports Chinese datasets.

@zxlzr
Contributor

zxlzr commented Dec 8, 2022

To use an English dataset directly, you need to modify the prediction code. We will add English support soon.

@zxlzr
Contributor

zxlzr commented Dec 8, 2022

You can git pull; the code has been updated.

@qinglongheu
Author

> You can git pull; the code has been updated.

Do you mean git clone again and then run python setup.py install?

@zxlzr
Contributor

zxlzr commented Dec 8, 2022

Just git pull and then run python setup.py install.

@qinglongheu
Author

> Just git pull and then run python setup.py install.

Running run_bert.py ends with "Process finished with exit code 139".
PyCharm does not report any error.
There is also a bug in /DeepKE/src/deepke/name_entity_re/standard/models/InferBert.py:
import json
import os

import torch
import torch.nn.functional as F
from pytorch_transformers import (BertConfig, BertForTokenClassification,
BertTokenizer)
from collections import OrderedDict
from .BiLSTM_CRF import *

import hydra
from hydra import utils
import nltk
from nltk    # <- this extra line should not be here
from nltk import word_tokenize

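@xxupiano
Contributor

xxupiano commented Dec 8, 2022

Yes. In the tokenize function of InferBERT, nltk is used for English word tokenization. For Chinese, each character corresponds to one label, so calling list on the text is enough, but in English several words may correspond to one label. PS: nltk.download('punkt') may take a long time.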
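
The two tokenization strategies described above can be sketched as follows. This is only an illustration, not DeepKE's code: `char_tokenize` and `simple_word_tokenize` are hypothetical helpers, and the latter is a regex-based stand-in for `nltk.word_tokenize` so that the sketch runs without downloading `punkt`.

```python
import re


def char_tokenize(text):
    """Chinese-style tokenization: every non-space character is one token."""
    return [ch for ch in text if not ch.isspace()]


def simple_word_tokenize(text):
    """Hypothetical stand-in for nltk.word_tokenize.

    Splits the text into runs of word characters and single punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", text)


zh_tokens = char_tokenize("海 钓 比 赛")
en_tokens = simple_word_tokenize("EU rejects German call.")

print(zh_tokens)
print(en_tokens)
```

With real data, replacing `simple_word_tokenize` by `nltk.word_tokenize` should give comparable tokens for simple sentences, at the cost of the one-time `punkt` download mentioned above.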

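@qinglongheu
Author

> Yes. In the tokenize function of InferBERT, nltk is used for English word tokenization. For Chinese, each character corresponds to one label, so calling list on the text is enough, but in English several words may correspond to one label. PS: nltk.download('punkt') may take a long time.

But when I run run_bert.py in PyCharm, it immediately reports "Process finished with exit code 139" and exits.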

@xxupiano
Contributor

xxupiano commented Dec 8, 2022

> Just git pull and then run python setup.py install.
>
> Running run_bert.py ends with "Process finished with exit code 139". PyCharm does not report any error. There is also a bug in /DeepKE/src/deepke/name_entity_re/standard/models/InferBert.py: import json import os
>
> import torch import torch.nn.functional as F from pytorch_transformers import (BertConfig, BertForTokenClassification, BertTokenizer) from collections import OrderedDict from .BiLSTM_CRF import *
>
> import hydra from hydra import utils import nltk from nltk (this extra line should not be here) from nltk import word_tokenize

Sorry, that was a slip on my part. There is probably an extra "from nltk" line; simply deleting it should work.

@qinglongheu
Author

Running run_bert.py:
/home/qinglong/.conda/envs/deepke/bin/python3.8 /data/qinglong/knowledgegraph/DeepKE/example/ner/standard/run_bert.py
12/08/2022 21:19:09 - INFO - deepke.relation_extraction.multimodal.models.clip.file_utils - PyTorch version 1.11.0+cu113 available.
12/08/2022 21:19:09 - INFO - deepke.name_entity_re.multimodal.models.clip.file_utils - PyTorch version 1.11.0+cu113 available.
wandb: Currently logged in as: ql (use wandb login --relogin to force relogin)

Process finished with exit code 139
This is the result of running it after deleting that line.
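
On Linux, exit code 139 usually means the process was killed by signal 11 (SIGSEGV, since 139 = 128 + 11), which is why no Python traceback appears. One way to find out where the crash happens, offered here as a suggestion rather than something the thread tried, is to enable the standard-library `faulthandler` module at the very top of the entry script:

```python
import faulthandler

# Ask the interpreter to dump the Python traceback of all threads to stderr
# when it receives a fatal signal such as SIGSEGV.
faulthandler.enable()

print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect is available without editing the script by launching it with `python -X faulthandler run_bert.py`.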

@xxupiano
Contributor

xxupiano commented Dec 8, 2022

Try it without wandb: comment out the three blocks of statements that use wandb in run_bert.py.
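
As an alternative to commenting the statements out, wandb can be switched off through its `WANDB_MODE` environment variable. The sketch below assumes that the installed wandb version honours the `disabled` mode and that the variable is set before run_bert.py initialises wandb. It has not been verified against this particular crash.

```python
import os

# Assumption: this runs before wandb is imported or initialised by the script.
# In "disabled" mode wandb calls become no-ops and nothing is sent to a server.
os.environ["WANDB_MODE"] = "disabled"

print("WANDB_MODE =", os.environ["WANDB_MODE"])
```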

@qinglongheu
Author

Your change does not seem to have worked; the same error is still raised. In the debugger, textlist = ['EU\tB-ORG\n', 'rejects\tO\n', 'German\tB-MISC\n', 'call\tO\n', 'to\tO\n', 'boycott\tO\n', 'British\tB-MISC\n', 'lamb\tO\n', '.\tO\n']. I suggest modifying the preprocess.py module.
Traceback (most recent call last):
File "/data/qinglong/knowledgegraph/DeepKE/example/ner/standard/run_bert.py", line 135, in main
train_features = convert_examples_to_features(train_examples, label_list, cfg.max_seq_length, tokenizer)
File "/home/qinglong/.conda/envs/deepke/lib/python3.8/site-packages/deepke-2.1.1-py3.8.egg/deepke/name_entity_re/standard/tools/preprocess.py", line 92, in convert_examples_to_features
label_ids.append(label_map[labels[i]])
KeyError: 'EU\tB-ORG'

@qinglongheu
Author

The problem seems to be in the readfile function in deepke\name_entity_re\standard\tools\dataset.py:
def readfile(filename):
    '''
    read file
    '''
    f = open(filename, encoding='utf-8')
    data = []
    sentence = []
    label= []
    for line in f:
        if len(line)==0 or line.startswith('-DOCSTART') or line[0]=="\n":
            if len(sentence) > 0:
                data.append((sentence,label))
                sentence = []
                label = []
            continue
        splits = line.split(' ')  # here splits = ['EU\tB-ORG\n']; the split has no effect because the line is tab-separated
        sentence.append(splits[0])
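
The behaviour reported above follows from `line.split(' ')` splitting only on the space character, so a tab-separated line comes back as a single element. A minimal sketch of a reader that accepts both separators is shown below. `read_conll_file` is a hypothetical name, and taking the last column as the label is an assumption, since the rest of the original function is not shown in this thread.

```python
import os
import tempfile


def read_conll_file(filename):
    """Read (tokens, labels) pairs; sentences are separated by blank lines.

    Hypothetical variant of readfile that splits on any whitespace,
    so both 'token label' and 'token<TAB>label' lines are accepted.
    """
    data = []
    sentence, label = [], []
    with open(filename, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if not stripped or stripped.startswith("-DOCSTART"):
                if sentence:
                    data.append((sentence, label))
                    sentence, label = [], []
                continue
            splits = stripped.split()  # no argument: splits on spaces and tabs
            sentence.append(splits[0])
            label.append(splits[-1])  # assumption: the label is the last column
    if sentence:  # flush the last sentence if there is no trailing blank line
        data.append((sentence, label))
    return data


# Small demonstration on a temporary file that mixes tabs and spaces.
sample = "EU\tB-ORG\nrejects\tO\n\nGerman B-MISC\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
data = read_conll_file(tmp.name)
os.remove(tmp.name)
print(data)
```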

@xxupiano
Contributor

xxupiano commented Dec 9, 2022

If you installed with python setup.py develop, you can modify the code yourself and then reinstall the same way.

@qinglongheu
Author

Thank you for the prompt reply. It is probably a problem with the dataset.
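
If the dataset file is indeed tab-separated while the reader expects spaces, one option is to normalise the file before training. The sketch below is an assumption about how that could be done, with `normalize_delimiters` as a hypothetical helper. It works on a list of lines and leaves the blank sentence separators in place.

```python
def normalize_delimiters(lines):
    """Rewrite 'token<TAB>label' lines as 'token label'; keep blank lines."""
    normalized = []
    for line in lines:
        parts = line.split()  # splits on tabs and spaces, drops the newline
        normalized.append(" ".join(parts) if parts else "")
    return normalized


raw = ["EU\tB-ORG\n", "rejects\tO\n", "\n", "German\tB-MISC\n"]
normalized = normalize_delimiters(raw)
print(normalized)
```

Writing the result back with `"\n".join(normalized) + "\n"` would produce a space-separated file in the same one-token-per-line layout.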

@xxupiano
Contributor

xxupiano commented Dec 9, 2022

The preprocess.py code has been modified; you can update and use it.

@xxupiano xxupiano closed this as completed Dec 9, 2022