Conll perl script refusing to score because of "too many repeated mentions (>10) in the response" #37

ritwikmishra · 2022-11-15T13:40:16Z

I ran the preparation scripts successfully.

I downloaded the RoBERTa checkpoint from the Dropbox link and placed it in the data folder.

Then I ran the command: python calculate_conll.py roberta test 20

I got some errors from subprocess because I was using Python 3.6 instead of Python 3.7.

The error was: unexpected keyword argument 'capture_output'

Fixed the issue with this
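
For context: the capture_output argument of subprocess.run was added in Python 3.7. A minimal sketch of a 3.6-compatible call follows; the helper name run_captured is hypothetical and not part of the repository, and the child command is a stand-in for the real scorer invocation.

```python
import subprocess
import sys

def run_captured(cmd):
    """Run cmd and capture its output without capture_output (Python 3.7+ only)."""
    # Passing stdout=PIPE and stderr=PIPE is what capture_output=True sets up.
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Stand-in child process that prints one scorer-like line.
proc = run_captured([sys.executable, "-c", "print('F1: 86.31%')"])

# Without text/universal_newlines, stdout is bytes and must be decoded before parsing.
print(proc.stdout.decode("utf-8").strip())
```

Note that stdout is bytes in this form, so any later string processing has to decode it first.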

But then I got another error: 'NoneType' object has no attribute 'group', originating at line 15 of calculate_conll.py.

I ran the perl script directly in bash: perl reference-coreference-scorers/scorer.pl all data/conll_logs/roberta_test_e20.gold.conll data/conll_logs/roberta_test_e20.pred.conll none

MUC came out to be 86 (F1), but while calculating B3 I got this error: Found too many repeated mentions (> 10) in the response, so refusing to score. Please fix the output

I think this scorer error is the reason line 15 raised the exception above: the scorer output was empty, so there was nothing to match.

How should I proceed now? How can I evaluate the results?

ritwikmishra · 2022-11-15T13:51:00Z

UPDATE:

I ran the bash command separately for each metric:

perl reference-coreference-scorers/scorer.pl <muc/bcub/ceafe> <keys_file> <response_file> none

I got F1 scores of 86, 79, and 76 for muc, bcub, and ceafe respectively. The average is 80.3, which is close to the ~81 claimed in the paper.
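
The average quoted above is the plain arithmetic mean of the three F1 scores. A small sketch of the arithmetic, using the rounded values from this comment (the variable names are illustrative only):

```python
# Rounded per-metric F1 scores reported in this comment.
scores = {"muc": 86, "bcub": 79, "ceafe": 76}

# Unweighted mean over the three metrics.
avg = sum(scores.values()) / len(scores)
print(f"avg {avg:.1f}")
```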

But I still don't understand why calculate_conll.py gives an error.

vdobrovolskii · 2022-11-15T15:08:58Z

Please see the solution in #4.

ritwikmishra · 2022-11-16T06:13:50Z

@vdobrovolskii I replaced the loop mechanism as suggested here.

And I commented out the error throwing condition in the perl script as suggested here

The perl script runs fine through bash:

perl reference-coreference-scorers/scorer.pl all data/conll_logs/roberta_test_e20.gold.conll data/conll_logs/roberta_test_e20.pred.conll none

But the Python script still shows an error:

$ python calculate_conll.py roberta test 20
Traceback (most recent call last):
  File "calculate_conll.py", line 40, in <module>
    extract_f1(subprocess.run(part_a + [metric] + part_b, **kwargs)))
  File "calculate_conll.py", line 15, in extract_f1
    return float(re.search(r"F1:\s*([0-9.]+)%", prev_line).group(1))
AttributeError: 'NoneType' object has no attribute 'group'

Is there any other way to fix calculate_conll.py?

vdobrovolskii · 2022-11-16T12:21:17Z

I believe the output is a bit different than expected for at least one of the perl scripts. Can you send me the outputs (just the last two lines) for the perl script with "muc", "ceafe" and "bcub" as metrics?

vdobrovolskii · 2022-11-16T14:36:09Z

I mean, can you send me the outputs of the perl script?

The calculate_conll.py reads the stdout of the perl script and searches for metrics there.

ritwikmishra · 2022-11-16T15:17:40Z

Here is the output of perl reference-coreference-scorers/scorer.pl all data/conll_logs/roberta_test_e20.gold.conll data/conll_logs/roberta_test_e20.pred.conll none

Output

version: 8.01 /media/data_dump/Ritwik/git/wl-coref/reference-coreference-scorers/lib/CorScorer.pm

METRIC muc:
Repeated mention in the response: 231, 237 2121
Repeated mention in the response: 119, 122 3030
Repeated mention in the response: 158, 160 44
Repeated mention in the response: 57, 62 1515
Repeated mention in the response: 154, 158 55
Repeated mention in the response: 76, 78 1313

====== TOTALS =======
Identification of Mentions: Recall: (17786 / 19764) 89.99%      Precision: (17786 / 20350) 87.4%        F1: 88.67%
--------------------------------------------------------------------------
Coreference: Recall: (13376 / 15232) 87.81%     Precision: (13376 / 15760) 84.87%       F1: 86.31%
--------------------------------------------------------------------------

METRIC bcub:
Repeated mention in the response: 154, 158 55
Repeated mention in the response: 57, 62 1515
Repeated mention in the response: 119, 122 3030
Repeated mention in the response: 158, 160 44
Repeated mention in the response: 76, 78 1313
Repeated mention in the response: 231, 237 2121

====== TOTALS =======
Identification of Mentions: Recall: (17786 / 19764) 89.99%      Precision: (17786 / 20350) 87.4%        F1: 88.67%
--------------------------------------------------------------------------
Coreference: Recall: (16320.8804761468 / 19764) 82.57%  Precision: (15765.8216227679 / 20351) 77.46%    F1: 79.94%
--------------------------------------------------------------------------

METRIC ceafm:
Repeated mention in the response: 76, 78 1313
Repeated mention in the response: 57, 62 1515
Repeated mention in the response: 154, 158 55
Repeated mention in the response: 158, 160 44
Repeated mention in the response: 119, 122 3030
Repeated mention in the response: 231, 237 2121

====== TOTALS =======
Identification of Mentions: Recall: (17786 / 19764) 89.99%      Precision: (17786 / 20350) 87.4%        F1: 88.67%
--------------------------------------------------------------------------
Coreference: Recall: (16541 / 19764) 83.69%     Precision: (16541 / 20351) 81.27%       F1: 82.46%
--------------------------------------------------------------------------

METRIC ceafe:
Repeated mention in the response: 158, 160 44
Repeated mention in the response: 119, 122 3030
Repeated mention in the response: 57, 62 1515
Repeated mention in the response: 154, 158 55
Repeated mention in the response: 76, 78 1313
Repeated mention in the response: 231, 237 2121

====== TOTALS =======
Identification of Mentions: Recall: (17786 / 19764) 89.99%      Precision: (17786 / 20350) 87.4%        F1: 88.67%
--------------------------------------------------------------------------
Coreference: Recall: (3495.57012386207 / 4532) 77.13%   Precision: (3495.57012386207 / 4591) 76.13%     F1: 76.63%
--------------------------------------------------------------------------

METRIC blanc:
Repeated mention in the response: 76, 78 1313
Repeated mention in the response: 154, 158 55
Repeated mention in the response: 57, 62 1515
Repeated mention in the response: 158, 160 44
Repeated mention in the response: 119, 122 3030
Repeated mention in the response: 231, 237 2121

====== TOTALS =======
Identification of Mentions: Recall: (17786 / 19764) 89.99%      Precision: (17786 / 20350) 87.4%        F1: 88.67%
--------------------------------------------------------------------------

Coreference:
Coreference links: Recall: (98009 / 111931) 87.56%      Precision: (98009 / 121567) 80.62%      F1: 83.94%
--------------------------------------------------------------------------
Non-coreference links: Recall: (703839 / 883032) 79.7%  Precision: (703839 / 925055) 76.08%     F1: 77.85%
--------------------------------------------------------------------------
BLANC: Recall: (0.836345287908459 / 1) 83.63%   Precision: (0.783537821987766 / 1) 78.35%       F1: 80.9%
--------------------------------------------------------------------------
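
For readers following along, the parsing step that calculate_conll.py applies to output like the above can be sketched roughly as follows. This is not the repository's implementation: the function last_f1 is hypothetical, it takes the last F1 value in the text rather than reading the second-to-last line, and it raises a descriptive error instead of failing on None. The sample text is copied from the muc totals above, with the column spacing replaced by tabs.

```python
import re

# Same pattern that appears in the traceback for extract_f1.
F1_PATTERN = re.compile(r"F1:\s*([0-9.]+)%")

def last_f1(scorer_stdout: str) -> float:
    """Return the last F1 percentage found in the scorer output."""
    matches = F1_PATTERN.findall(scorer_stdout)
    if not matches:
        # A clear message is easier to debug than an AttributeError on None.
        raise ValueError("no F1 value found in scorer output")
    return float(matches[-1])

sample = (
    "====== TOTALS =======\n"
    "Identification of Mentions: Recall: (17786 / 19764) 89.99%\t"
    "Precision: (17786 / 20350) 87.4%\tF1: 88.67%\n"
    "--------------------------------------------------------------------------\n"
    "Coreference: Recall: (13376 / 15232) 87.81%\t"
    "Precision: (13376 / 15760) 84.87%\tF1: 86.31%\n"
    "--------------------------------------------------------------------------\n"
)

print(last_f1(sample))
```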

vdobrovolskii · 2022-11-16T20:00:54Z

Hmm. Each line that is supposed to be fed to the script is matched correctly:

Then it might be the case that when calling each metric separately the output is different...

I could investigate it further. Could you kindly modify the extract_f1 function as follows and run the script again? Then send me the output.

def extract_f1(proc: subprocess.CompletedProcess) -> float:
    prev_line = ""
    curr_line = ""
    for line in str(proc.stdout).splitlines():
        prev_line = curr_line
        curr_line = line
    print(repr(prev_line))
    return float(re.search(r"F1:\s*([0-9.]+)%", prev_line).group(1))

ritwikmishra · 2022-11-17T08:41:59Z

The issue was in the way bytes were being converted to a string. As stated here, simply casting bytes to a string with str() gives unintended results. Bytes should be decoded to get the proper string.

Changing the for loop from for line in str(proc.stdout).splitlines(): to for line in proc.stdout.decode('utf-8').splitlines(): worked!

Output:

muc 86.31
ceafe 76.63
bcub 79.94
avg 80.96
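
The difference between str() and decode() described above can be reproduced in a small sketch. The helper prev_line_of mirrors the loop in extract_f1; the helper name and the shortened sample bytes are made up for illustration.

```python
import re

def prev_line_of(text):
    """Return the second-to-last line, mirroring the loop in extract_f1."""
    prev_line, curr_line = "", ""
    for line in text.splitlines():
        prev_line, curr_line = curr_line, line
    return prev_line

# Shortened, made-up stand-in for the tail of the scorer's stdout.
raw = b"Coreference: Recall: 87.81%  Precision: 84.87%  F1: 86.31%\n----------\n"
pattern = r"F1:\s*([0-9.]+)%"

# str(bytes) yields the repr ("b'...\\n...'"), which contains no real newlines,
# so splitlines() sees a single line and the "previous line" stays empty.
from_str = prev_line_of(str(raw))

# decode() yields the actual text, so the F1 line is the second-to-last line.
from_decode = prev_line_of(raw.decode("utf-8"))

print(re.search(pattern, from_str))               # no match on the empty string
print(re.search(pattern, from_decode).group(1))   # the captured F1 value
```

With str(), re.search receives an empty string and returns None, which is what produces the 'NoneType' object has no attribute 'group' error.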

ritwikmishra closed this as completed Nov 17, 2022

ritwikmishra added a commit to ritwikmishra/wl-coref that referenced this issue Nov 27, 2022

Updated for loop condition based on vdobrovolskii#37

ab34ab2

ritwikmishra mentioned this issue Nov 27, 2022

Updated for loop condition in calculate_conll.py file based on #37 #38
