Running run_bert.py in /data/qinglong/knowledgeGraph/DeepKE/example/ner/standard for entity recognition on the English dataset conll2003 raises an error #187
Comments
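DeepKE standard NER currently supports Chinese datasets.
To use an English dataset directly, you need to modify the prediction code. We will add English support soon.
You can git pull; the code has been updated.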
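Do you mean running git clone again and then python setup.py install?
Just run git pull and then python setup.py install.
Running run_bert.py gives Process finished with exit code 139 import torch import hydra
Yes. In the tokenize function of InferBERT, nltk is used for English tokenization. For Chinese, each character corresponds to one label, so calling list on the text is enough, but in English several words may correspond to one label. P.S.: nltk.download('punkt') may take a long time.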
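The difference described in this comment can be sketched roughly as follows. This is not InferBERT's code: `split_for_labeling` is a hypothetical helper, and a simple regular expression stands in for nltk's tokenizer so the example runs without downloading punkt.

```python
import re
from typing import List


def split_for_labeling(text: str, lang: str) -> List[str]:
    """Hypothetical helper: split text into the units that each receive one label."""
    if lang == "zh":
        # Chinese: one label per character; whitespace is dropped here.
        return [ch for ch in text if not ch.isspace()]
    # English: stand-in for nltk word tokenization (an assumption, not nltk itself).
    # Runs of word characters stay together; punctuation becomes its own unit.
    return re.findall(r"\w+|[^\w\s]", text)


zh_units = split_for_labeling("厦门与金门", "zh")
en_units = split_for_labeling("EU rejects German call.", "en")
```

With real nltk tokenization the English units may differ in edge cases (contractions, hyphenated words), so the regular expression is only an approximation.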
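But when I run run_bert.py in PyCharm, it immediately reports Process finished with exit code 139 and exits.
Sorry, my mistake: there may be an extra from nltk line. Deleting it should fix the problem.
Running run_bert.py: Process finished with exit code 139
Try running without wandb: comment out the three blocks of statements in run_bert.py that contain wandb.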
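Your change does not seem to work; the error still occurs. In the debugger, textlist = ['EU\tB-ORG\n', 'rejects\tO\n', 'German\tB-MISC\n', 'call\tO\n', 'to\tO\n', 'boycott\tO\n', 'British\tB-MISC\n', 'lamb\tO\n', '.\tO\n']. I suggest modifying the preprocess.py module.
The problem seems to be in the readfile function in deepke\name_entity_re\standard\tools\dataset.py.
If you installed with python setup.py develop, you can modify the code yourself and then reinstall the same way.
Thank you for the prompt reply. It is probably a problem with the dataset.
The preprocess.py code has been modified; you can update and use it.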
The error message is as follows:
Traceback (most recent call last):
File "/data/qinglong/knowledgeGraph/DeepKE/example/ner/standard/run_bert.py", line 135, in main
train_features = convert_examples_to_features(train_examples, label_list, cfg.max_seq_length, tokenizer)
File "/home/qinglong/.conda/envs/deepke/lib/python3.8/site-packages/deepke/name_entity_re/standard/tools/preprocess.py", line 92, in convert_examples_to_features
label_ids.append(label_map[labels[i]])
KeyError: 'EU\tB-ORG'
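The KeyError can be reproduced in isolation. The sketch below assumes a label map built from the CoNLL-2003 tag set; the actual label_list used by run_bert.py may differ.

```python
# Assumed tag set (CoNLL-2003 style); the real label_list may differ.
label_list = ["O", "B-MISC", "I-MISC", "B-PER", "I-PER",
              "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
label_map = {label: i for i, label in enumerate(label_list, 1)}

# The value that reaches label_map[labels[i]] in the traceback.
raw = "EU\tB-ORG"

# The whole "token<TAB>label" string is not a tag, so the lookup raises KeyError.
lookup_fails = raw not in label_map

# Once the line is split into token and label, the lookup succeeds.
token, label = raw.split("\t")
label_id = label_map[label]
```

The lookup itself is fine; the label strings passed to it still carry the token and the tab separator.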
Debugging shows that each sample in examples contains:
text_a = 'EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O
'
labellist = ['EU\tB-ORG', 'rejects\tO', 'German\tB-MISC', 'call\tO', 'to\tO', 'boycott\tO', 'British\tB-MISC', 'lamb\tO', '.\tO']
textlist = ['EU\tB-ORG\n', 'rejects\tO\n', 'German\tB-MISC\n', 'call\tO\n', 'to\tO\n', 'boycott\tO\n', 'British\tB-MISC\n', 'lamb\tO\n', '.\tO\n']
whereas in the Chinese dataset, each sample in examples contains:
text_a ='海 钓 比 赛 地 点 在 厦 门 与 金 门 之 间 的 海 域 。'
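One way to get the English samples into the same shape as the Chinese ones is to split each line into token and label while reading the file. The following is a minimal sketch, not DeepKE's readfile: `parse_token_label_lines` is a hypothetical name, and it assumes one token and one label per line separated by whitespace (tab or space), with blank lines between sentences.

```python
from typing import Iterable, List, Tuple


def parse_token_label_lines(
    lines: Iterable[str],
) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical reader: group 'token<sep>label' lines into sentences."""
    sentences = []
    tokens, labels = [], []
    for raw in lines:
        line = raw.strip()
        # A blank line or a -DOCSTART- marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        parts = line.split()      # splits on tabs and spaces alike
        tokens.append(parts[0])   # first column: the token
        labels.append(parts[-1])  # last column: the NER tag
    if tokens:
        sentences.append((tokens, labels))
    return sentences


lines = ['EU\tB-ORG\n', 'rejects\tO\n', 'German\tB-MISC\n', 'call\tO\n',
         'to\tO\n', 'boycott\tO\n', 'British\tB-MISC\n', 'lamb\tO\n',
         '.\tO\n', '\n']
tokens, labels = parse_token_label_lines(lines)[0]
text_a = " ".join(tokens)  # same space-joined shape as the Chinese text_a
```

Splitting on any whitespace rather than only on a single space is the design choice that matters here, since the conll2003 file in this issue is tab-separated.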