
Which value is used for NER evaluation? #7

Closed
ak47-1234 opened this issue Jun 23, 2019 · 11 comments

Comments

@ak47-1234

seqeval reports both a micro avg and a macro avg. Which one is reported in the paper?
             precision    recall  f1-score   support

      MISC       0.00      0.00      0.00         1
       PER       1.00      1.00      1.00         1

 micro avg       0.50      0.50      0.50         2
 macro avg       0.50      0.50      0.50         2

@ymcui
Owner

ymcui commented Jun 23, 2019

Micro average, which is also the default configuration in seqeval:
https://github.com/chakki-works/seqeval/blob/master/seqeval/metrics/sequence_labeling.py#L116
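For reference, micro averaging pools the entity counts of all classes before computing precision/recall/F1, while macro averaging computes the per-class scores and takes their unweighted mean. A stdlib-only sketch (an illustration, not the seqeval source) using the MISC/PER counts from the report above:

```python
# Stdlib-only sketch of micro vs. macro averaging over entity classes.
# Each class maps to (true_positives, num_predicted, num_gold),
# mirroring the MISC/PER report above.
counts = {
    "MISC": (0, 1, 1),  # 0 of 1 predicted entities correct, 1 gold entity
    "PER":  (1, 1, 1),  # 1 of 1 predicted entities correct, 1 gold entity
}

def prf(tp, pred, gold):
    """Precision, recall, F1 from raw counts."""
    p = tp / pred if pred else 0.0
    r = tp / gold if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Micro: pool the counts across classes, then compute once.
micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))

# Macro: compute per class, then take the unweighted mean.
per_class = [prf(*c) for c in counts.values()]
macro = tuple(sum(v) / len(per_class) for v in zip(*per_class))

print("micro:", micro)  # (0.5, 0.5, 0.5)
print("macro:", macro)  # (0.5, 0.5, 0.5)
```

Both averages coincide at 0.50 here only because the two classes have equal support; they diverge in general.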

@ymcui
Owner

ymcui commented Jun 23, 2019

If you have any other questions, feel free to reopen.

@ymcui ymcui closed this as completed Jun 23, 2019
@ak47-1234
Author

1. Hi, my reproduced NER results are consistently about one percentage point lower than the reported ones. Does the NER fine-tuning script do anything extra?
2. Also, is the reported score the final result of each run, or the maximum over the epochs within each run?
3. In addition, the People's Daily data contains some mislabeled tags, e.g. an I-PER where a B-PER should be. How is such data handled?

@ymcui ymcui reopened this Jun 25, 2019
@ymcui
Owner

ymcui commented Jun 25, 2019

Hi,

  1. For the code, you can refer to: https://github.com/ProHiryu/bert-chinese-ner/blob/master/BERT_NER.py
     If your parameter settings match the ones we reported, you should at least reach the stated average.
  2. As mentioned in the paper and README, we report both the maximum and the average (the average is shown in parentheses).
  3. Since no errors occurred during our runs, we did not apply any special handling to that part.
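On the question of malformed tag sequences, a hypothetical stdlib-only helper (not part of this repo) that flags invalid BIO transitions, e.g. an I-PER whose preceding tag is neither B-PER nor I-PER:

```python
# Hypothetical helper to locate invalid BIO transitions in a tag
# sequence; not part of the repo, just a sketch for auditing the data.
def invalid_bio_positions(tags):
    """Return indices of I- tags that lack a valid preceding tag."""
    bad = []
    prev = "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            entity = tag[2:]
            if prev not in ("B-" + entity, "I-" + entity):
                bad.append(i)
        prev = tag
    return bad

# Index 1 is invalid: I-PER follows O instead of B-PER/I-PER.
print(invalid_bio_positions(["O", "I-PER", "B-PER", "I-PER"]))  # [1]
```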

@ak47-1234
Author

What value did you use for warmup_proportion?

@ymcui
Owner

ymcui commented Jun 25, 2019

Anything not mentioned in the paper uses the default value; the default warmup proportion is 0.1.
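In the original BERT run scripts, warmup_proportion is converted into a number of warmup steps; a sketch with a hypothetical dataset size (variable names follow the BERT codebase, the example size is an assumption):

```python
# Sketch of how BERT's run scripts derive the warmup step count from
# warmup_proportion. The dataset size here is hypothetical.
num_train_examples = 20864   # hypothetical
train_batch_size = 64
num_train_epochs = 3.0
warmup_proportion = 0.1      # the default mentioned above

num_train_steps = int(num_train_examples / train_batch_size * num_train_epochs)
num_warmup_steps = int(num_train_steps * warmup_proportion)
print(num_train_steps, num_warmup_steps)  # 978 97
```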

@ymcui
Owner

ymcui commented Jun 25, 2019

reopen if necessary

@ymcui ymcui closed this as completed Jun 25, 2019
@ak47-1234
Author

Hi, below are my reproduced People's Daily results (with identical parameters):

                         P            R            F1
BERT-wwm                 95.4 (95.1)  95.3 (95.0)  95.3 (95.1)
BERT-wwm (reproduced)    94.7 (94.4)  94.6 (95.2)  95.2 (94.8)

The F1 scores are basically consistent, but precision and recall differ more and are generally lower. Did you convert letter case, or is there anything else I should watch out for?

@ymcui
Owner

ymcui commented Jun 26, 2019

  1. Did you use a batch size of 64? Also, we used TensorFlow 1.14.
  2. Case conversion follows the BERT default; the original BERT-base Chinese is an uncased model, so do_lower_case=True.
  3. In your reproduced results, the recall column shows a maximum of 94.6 but an average of 95.2; is that a typo?
  4. According to our experiment logs, the lowest P/R/F values were 94.84, 94.68, and 94.76, for reference.

@ymcui ymcui reopened this Jun 26, 2019
@ak47-1234
Author

  1. The maximum recall is 95.6.
     The precision I get is generally lower.

@ymcui
Owner

ymcui commented Jun 27, 2019

Attaching a set of results for reference.

               precision    recall  f1-score   support
        LOC      97.16     95.17     96.15      2875
        PER      96.76     96.47     96.62      1984
        ORG      89.78     93.69     91.69      1331

avg / total      95.45     95.27     95.34      6190
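The `avg / total` row can be recovered (up to rounding of the displayed per-class values) as the support-weighted mean of the per-class scores; a quick stdlib check:

```python
# Support-weighted average of the per-class scores (values copied from
# the table above); matches the "avg / total" row up to rounding.
rows = {
    "LOC": (97.16, 95.17, 96.15, 2875),
    "PER": (96.76, 96.47, 96.62, 1984),
    "ORG": (89.78, 93.69, 91.69, 1331),
}
total_support = sum(r[3] for r in rows.values())          # 6190
weighted = [
    sum(r[i] * r[3] for r in rows.values()) / total_support
    for i in range(3)
]
print([round(v, 2) for v in weighted])
```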
