Support for multiple references per candidate #30

feralvam · 2020-01-20T16:06:02Z

Hi,
Thank you for the good work. I was checking the code and, as far as I could understand, there is an underlying assumption that there is only a single reference per candidate. Am I correct, or am I missing something?

I'm working on a task (sentence simplification) for which we have validation and test sets with multiple references. I was thinking of trying out BERTScore there and making the appropriate modifications for the multi-reference scenario, but I wanted to check first in case something along those lines was already implemented and I hadn't noticed it.

Thanks.

Tiiiger · 2020-01-20T16:59:47Z

Hi @feralvam, internally BERTScore caches the computed representations of the sentences it encounters. This means that if you have duplicated sentences, our package will reuse the cached vectors to compute BERTScore. For the multi-reference setting, you can simply repeat each candidate sentence once per reference. This should be efficient.
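A minimal sketch of this workaround, in plain Python so it runs without the package. The helper names `expand_pairs` and `best_per_candidate` are hypothetical and not part of bert_score, and keeping the best score per candidate is an assumption about how the repeated pairs should be combined; the comment above does not prescribe an aggregation.

```python
from typing import List, Optional, Sequence, Tuple


def expand_pairs(
    cands: Sequence[str], multi_refs: Sequence[Sequence[str]]
) -> Tuple[List[str], List[str], List[int]]:
    """Repeat each candidate once per reference so both lists line up.

    Returns the flattened candidates, the flattened references, and, for
    each flattened pair, the index of the candidate it came from.
    """
    if len(cands) != len(multi_refs):
        raise ValueError("need exactly one reference group per candidate")
    flat_cands: List[str] = []
    flat_refs: List[str] = []
    owners: List[int] = []
    for idx, (cand, refs) in enumerate(zip(cands, multi_refs)):
        for ref in refs:
            flat_cands.append(cand)
            flat_refs.append(ref)
            owners.append(idx)
    return flat_cands, flat_refs, owners


def best_per_candidate(
    scores: Sequence[float], owners: Sequence[int], num_cands: int
) -> List[Optional[float]]:
    """Keep the highest score among the pairs that belong to each candidate.

    Candidates with no references keep the value None.
    """
    best: List[Optional[float]] = [None] * num_cands
    for value, owner in zip(scores, owners):
        if best[owner] is None or value > best[owner]:
            best[owner] = value
    return best
```

With the package installed, the flattened lists could be passed to `score(flat_cands, flat_refs, lang="en")`, and the returned F tensor converted with `.tolist()` before calling `best_per_candidate`.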

That being said, I look forward to having a convenient interface to handle multiple references. Feel free to open a PR if you want to contribute.

Tiiiger · 2020-01-30T17:51:11Z

This issue will be addressed in the next version of this repo, which will likely be released by the end of February.

TheShadow29 · 2020-04-16T02:54:03Z

@Tiiiger thanks for the awesome repository.

Could you clarify the usage of multiple references? From the demo notebook (cell 15 https://github.com/Tiiiger/bert_score/blob/master/example/Demo.ipynb), I believe the usage should be:

single_candidates = [cand_for_sent1, cand_for_sent2, ...]
multi_references = [[ref1_for_sent1, ref2_for_sent1, ...], [ref1_for_sent2, ref2_for_sent2, ...], ...]

I tried a naive example with a single reference per candidate, passed in the multi-reference format.

from bert_score import score

single_cands = ['a woman is seen sitting in a chair holding a accordion and speaking to the camera',
 'the woman then begins playing the accordion while looking back to the camera']
multi_refs = [['a woman is seen speaking to the camera while holding an accordion and and moving her hands around'],
 ['she demonstrates how to play the instrument while still speaking to the camera and moving all around']]
P_mul, R_mul, F_mul = score(single_cands, multi_refs, lang="en", rescale_with_baseline=True)

However, this raises a RuntimeError:

lib/python3.7/site-packages/bert_score/scorer.py in score(self, cands, refs, verbose, batch_size, return_hash)
    191             max_preds = []
    192             for start, end in ref_group_boundaries:
--> 193                 max_preds.append(all_preds[start:end].max(dim=0)[0])
    194             all_preds = torch.stack(max_preds, dim=0)
    195 

RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity

It seems that start and end have the same value. Any guesses as to why this is happening?
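To illustrate the symptom only: when start equals end, the slice `all_preds[start:end]` is empty, and taking a max over an empty sequence fails. The sketch below reproduces that situation in plain Python. The helpers `group_boundaries` and `max_per_group` are hypothetical and are not the package's implementation; the thread does not state the actual root cause of the bug.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def group_boundaries(multi_refs: Sequence[Sequence[str]]) -> List[Tuple[int, int]]:
    """Return the (start, end) offsets of each reference group after flattening."""
    ends = list(accumulate(len(refs) for refs in multi_refs))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def max_per_group(
    flat_scores: Sequence[float], boundaries: Sequence[Tuple[int, int]]
) -> List[float]:
    """Take the best score inside each group, rejecting empty groups explicitly."""
    result: List[float] = []
    for i, (start, end) in enumerate(boundaries):
        if start == end:
            # An empty slice has no maximum; fail with a clearer message.
            raise ValueError(f"reference group {i} is empty (start == end == {start})")
        result.append(max(flat_scores[start:end]))
    return result
```

A group with no references yields a boundary pair such as `(1, 1)`, which is one way the condition described above could arise.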

Thank you for your patience.

felixgwu · 2020-04-16T03:20:32Z

Hi @TheShadow29,
Thank you for pointing this out. Your usage is correct. The bug has been fixed in the newest commit.

TheShadow29 · 2020-04-16T04:50:41Z

Thanks a lot @felixgwu for the prompt response.
It works correctly now.

Tiiiger closed this as completed Jan 30, 2020
