Error when running run_bert.py in /data/qinglong/knowledgeGraph/DeepKE/example/ner/standard for named entity recognition on the English dataset conll2003 #187

qinglongheu · 2022-12-08T03:22:29Z

The error message is as follows:
Traceback (most recent call last):
  File "/data/qinglong/knowledgeGraph/DeepKE/example/ner/standard/run_bert.py", line 135, in main
    train_features = convert_examples_to_features(train_examples, label_list, cfg.max_seq_length, tokenizer)
  File "/home/qinglong/.conda/envs/deepke/lib/python3.8/site-packages/deepke/name_entity_re/standard/tools/preprocess.py", line 92, in convert_examples_to_features
    label_ids.append(label_map[labels[i]])
KeyError: 'EU\tB-ORG'
Debugging shows that, for every sample in examples:
text_a = 'EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O
'
labellist = ['EU\tB-ORG', 'rejects\tO', 'German\tB-MISC', 'call\tO', 'to\tO', 'boycott\tO', 'British\tB-MISC', 'lamb\tO', '.\tO']
textlist = ['EU\tB-ORG\n', 'rejects\tO\n', 'German\tB-MISC\n', 'call\tO\n', 'to\tO\n', 'boycott\tO\n', 'British\tB-MISC\n', 'lamb\tO\n', '.\tO\n']
By contrast, for the Chinese dataset, text_a in examples is:
text_a ='海钓比赛地点在厦门与金门之间的海域。'
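The KeyError can be reproduced in isolation. A minimal sketch, assuming the English file is tab-separated and that the tag is the last column of each line; `tag_from_line` and the label list are hypothetical and not DeepKE code. Splitting on a single space finds no separator in a tab-separated line, so the whole line ends up being used as the label.

```python
# Hypothetical label set for illustration; DeepKE derives its own label list.
LABEL_LIST = ["O", "B-ORG", "I-ORG", "B-MISC", "I-MISC"]
label_map = {label: i for i, label in enumerate(LABEL_LIST)}


def tag_from_line(line: str, sep=" ") -> str:
    """Take the last column of a token/tag line after splitting on `sep`."""
    return line.rstrip("\n").split(sep)[-1]


line = "EU\tB-ORG\n"  # one line of the tab-separated English file

# Splitting on a single space finds nothing to split, so the "tag" is the whole line.
bad_tag = tag_from_line(line, " ")
print(repr(bad_tag))          # 'EU\tB-ORG'
try:
    label_map[bad_tag]
except KeyError as err:
    print("KeyError:", err)   # KeyError: 'EU\tB-ORG'

# Splitting on any whitespace (sep=None) recovers the real tag.
print(label_map[tag_from_line(line, None)])  # 1
```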

xxupiano · 2022-12-08T03:26:23Z

DeepKE standard NER currently supports Chinese datasets only.

zxlzr · 2022-12-08T03:31:01Z

To use an English dataset directly, you would need to modify the prediction code. We will add English support soon.

zxlzr · 2022-12-08T04:07:26Z

The code has been updated; you can git pull it.

qinglongheu · 2022-12-08T08:00:11Z

Do you mean running git clone again and then python setup.py install?

zxlzr · 2022-12-08T08:06:05Z

Just run git pull and then python setup.py install.

qinglongheu · 2022-12-08T13:08:41Z

Running run_bert.py ends with "Process finished with exit code 139", and PyCharm shows no error.
There is also a bug in /DeepKE/src/deepke/name_entity_re/standard/models/InferBert.py:
import json
import os

import torch
import torch.nn.functional as F
from pytorch_transformers import (BertConfig, BertForTokenClassification,
                                  BertTokenizer)
from collections import OrderedDict
from .BiLSTM_CRF import *

import hydra
from hydra import utils
import nltk
from nltk  # <- this line is extra (an incomplete import statement)
from nltk import word_tokenize

xxupiano · 2022-12-08T13:13:26Z

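Yes. In the tokenize function of InferBert, nltk is used for English word segmentation. For Chinese, each character corresponds to one label, so a plain list() of the string is enough; for English, one label covers a whole word (several characters), so the text has to be split into words first. P.S. nltk.download('punkt') may take a long time.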
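A dependency-free sketch of the character-versus-word distinction described above. It uses a simple regular expression as a stand-in for nltk.word_tokenize, so no punkt download is needed; the regex does not reproduce nltk's rules (for example, contractions are split differently), and this `tokenize` function is hypothetical, not InferBert's actual method.

```python
import re
from typing import List

# Stand-in for nltk.word_tokenize: runs of word characters, or single punctuation marks.
_WORD_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str, lang: str = "en") -> List[str]:
    """Split text into the units that receive one NER label each."""
    if lang == "zh":
        # Chinese: one label per character, so every character is a token.
        return list(text)
    # English: one label per word, so split into words and punctuation first.
    return _WORD_RE.findall(text)


print(tokenize("海钓比赛", lang="zh"))
print(tokenize("EU rejects German call to boycott British lamb."))
```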

qinglongheu · 2022-12-08T13:15:17Z

But when I run run_bert.py in PyCharm, it just stops with "Process finished with exit code 139".

xxupiano · 2022-12-08T13:18:03Z

Sorry, that was a typo on my part: there is probably an extra "from nltk" line. Just delete it and it should work.

qinglongheu · 2022-12-08T13:20:57Z

Running run_bert.py:
/home/qinglong/.conda/envs/deepke/bin/python3.8 /data/qinglong/knowledgegraph/DeepKE/example/ner/standard/run_bert.py
12/08/2022 21:19:09 - INFO - deepke.relation_extraction.multimodal.models.clip.file_utils - PyTorch version 1.11.0+cu113 available.
12/08/2022 21:19:09 - INFO - deepke.name_entity_re.multimodal.models.clip.file_utils - PyTorch version 1.11.0+cu113 available.
wandb: Currently logged in as: ql (use wandb login --relogin to force relogin)

Process finished with exit code 139
This is the result after deleting that line.
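Exit code 139 follows the POSIX shell convention of reporting 128 + N for a process killed by signal N; 139 - 128 = 11, which is SIGSEGV (a segmentation fault), so the crash happens in native code and Python never gets to print a traceback. A minimal sketch, assuming a Linux machine; `describe_exit_code` is a hypothetical helper. The standard-library faulthandler module, enabled at the top of a script (or via `python -X faulthandler run_bert.py`), prints the Python stack when such a signal arrives, which can show where the segfault is triggered.

```python
import faulthandler
import signal

# Dump the Python stack of every thread to stderr if the process receives a fatal
# signal such as SIGSEGV, so a bare "exit code 139" comes with a traceback.
faulthandler.enable()


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status: values above 128 mean 'killed by signal (code - 128)'."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:  # not a signal number known on this platform
            return f"killed by signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))  # killed by SIGSEGV
```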

xxupiano · 2022-12-08T13:49:38Z

Try running without wandb: comment out the three blocks in run_bert.py that contain wandb calls.
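As an alternative to commenting the lines out, the wandb calls could sit behind a switch so they can be turned off and on again. A sketch only: `init_tracking`, `log_metrics`, `use_wandb` and the project name are hypothetical and not part of run_bert.py. wandb also documents a `WANDB_MODE=disabled` environment variable that turns its calls into no-ops without code changes.

```python
from typing import Any, Dict, Optional


def init_tracking(config: Dict[str, Any], use_wandb: bool = False) -> Optional[Any]:
    """Start a wandb run only when requested; otherwise return None and never import wandb."""
    if not use_wandb:
        return None
    import wandb  # imported lazily, so wandb is not loaded at all when disabled
    return wandb.init(project="deepke-ner", config=config)  # placeholder project name


def log_metrics(run: Optional[Any], metrics: Dict[str, float]) -> None:
    """Forward metrics to wandb if a run is active; skip silently otherwise."""
    if run is not None:
        run.log(metrics)


run = init_tracking({"max_seq_length": 128})
log_metrics(run, {"loss": 0.42})  # no-op because wandb is disabled
```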

qinglongheu · 2022-12-09T01:11:36Z

It seems your change did not work; it still fails. While debugging, textlist = ['EU\tB-ORG\n', 'rejects\tO\n', 'German\tB-MISC\n', 'call\tO\n', 'to\tO\n', 'boycott\tO\n', 'British\tB-MISC\n', 'lamb\tO\n', '.\tO\n']. I suggest changing the preprocess.py module.
Traceback (most recent call last):
  File "/data/qinglong/knowledgegraph/DeepKE/example/ner/standard/run_bert.py", line 135, in main
    train_features = convert_examples_to_features(train_examples, label_list, cfg.max_seq_length, tokenizer)
  File "/home/qinglong/.conda/envs/deepke/lib/python3.8/site-packages/deepke-2.1.1-py3.8.egg/deepke/name_entity_re/standard/tools/preprocess.py", line 92, in convert_examples_to_features
    label_ids.append(label_map[labels[i]])
KeyError: 'EU\tB-ORG'

qinglongheu · 2022-12-09T01:41:26Z

The problem seems to be in the readfile function in deepke\name_entity_re\standard\tools\dataset.py:
def readfile(filename):
    '''
    read file
    '''
    f = open(filename, encoding='utf-8')
    data = []
    sentence = []
    label = []
    for line in f:
        if len(line) == 0 or line.startswith('-DOCSTART') or line[0] == "\n":
            if len(sentence) > 0:
                data.append((sentence, label))
                sentence = []
                label = []
            continue
        splits = line.split(' ')  # here splits = ['EU\tB-ORG\n']; split(' ') does not split the line
        sentence.append(splits[0])
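A whitespace-tolerant variant of the loop above, as a sketch: it splits on any run of spaces or tabs and takes the tag from the last column, which would also cover the four-column layout of the original CoNLL-2003 files. `read_conll_lines` and `read_conll` are hypothetical names, not DeepKE's readfile, and the fix actually merged upstream may differ. Unlike the quoted loop, it also keeps a final sentence that is not followed by a blank line.

```python
from typing import Iterable, List, Tuple

Sentence = Tuple[List[str], List[str]]


def read_conll_lines(lines: Iterable[str]) -> List[Sentence]:
    """Group token/tag lines into sentences; blank lines and -DOCSTART lines end a sentence."""
    data: List[Sentence] = []
    sentence: List[str] = []
    labels: List[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('-DOCSTART'):
            if sentence:
                data.append((sentence, labels))
                sentence, labels = [], []
            continue
        # split() with no argument splits on any run of spaces or tabs,
        # so both "EU B-ORG" and "EU\tB-ORG" give ['EU', 'B-ORG'].
        splits = stripped.split()
        sentence.append(splits[0])
        labels.append(splits[-1])  # assumes the tag is in the last column
    if sentence:  # flush a final sentence that has no trailing blank line
        data.append((sentence, labels))
    return data


def read_conll(filename: str) -> List[Sentence]:
    """Read a CoNLL-style file from disk; the file is closed automatically."""
    with open(filename, encoding='utf-8') as f:
        return read_conll_lines(f)
```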

xxupiano · 2022-12-09T01:53:03Z

If you installed with python setup.py develop, you can modify the code yourself and then install it again the same way.
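The second traceback loads preprocess.py from .../site-packages/deepke-2.1.1-py3.8.egg/..., that is, from an installed copy rather than the cloned source tree, so edits to the source only take effect after reinstalling. A small sketch for checking which file a package is actually imported from; the helper name `module_origin` is hypothetical, while `importlib.util.find_spec` is standard library.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # A path inside site-packages/...egg means an installed copy is used;
    # a path inside the cloned DeepKE/src tree means a develop-mode install.
    print(module_origin("deepke"))
```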

qinglongheu · 2022-12-09T02:40:38Z

Thanks for the quick reply. It looks like the problem is with the dataset.

xxupiano · 2022-12-09T02:54:59Z

The preprocess.py code has been fixed; please update and use it.

qinglongheu added the bug (Something isn't working) label Dec 8, 2022

xxupiano closed this as completed Dec 9, 2022
