Support for multiple references per candidate #30

Closed
feralvam opened this issue Jan 20, 2020 · 5 comments

@feralvam

Hi,
Thank you for the good work. I was checking the code and, as far as I could understand, there is an underlying assumption that there is only a single reference per candidate. Am I correct, or am I missing something?

I'm working on a task (sentence simplification) for which we have validation and test sets with multiple references. I was thinking of trying out BERTScore there and making the appropriate modifications for the multi-reference scenario, but I wanted to check first in case something along those lines was already implemented and I hadn't noticed it.

Thanks.

@Tiiiger
Owner

Tiiiger commented Jan 20, 2020

Hi @feralvam, internally BERTScore caches the computed representations of the sentences it encounters. This means that if you have duplicated sentences, our package will reuse the cached vectors to compute BERTScore. For the multi-reference setting, you can simply repeat each candidate sentence once per reference. This should be efficient.

That being said, I look forward to having a convenient interface to handle multiple references. Feel free to open a PR if you want to contribute.
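
A minimal sketch of the repetition workaround described above, assuming the goal is to keep the best score among a candidate's references. The helper names `expand_pairs` and `best_per_candidate` are hypothetical and are not part of the bert_score package; the max aggregation is an assumption chosen for illustration.

```python
from typing import List, Sequence, Tuple


def expand_pairs(
    cands: Sequence[str], multi_refs: Sequence[Sequence[str]]
) -> Tuple[List[str], List[str], List[int]]:
    """Repeat each candidate once per reference so the two lists align pairwise.

    Hypothetical helper, not part of bert_score.
    """
    if len(cands) != len(multi_refs):
        raise ValueError("need exactly one reference group per candidate")
    flat_cands: List[str] = []
    flat_refs: List[str] = []
    group_sizes: List[int] = []
    for cand, refs in zip(cands, multi_refs):
        if not refs:
            raise ValueError(f"candidate {cand!r} has no references")
        flat_cands.extend([cand] * len(refs))  # duplicate the candidate
        flat_refs.extend(refs)
        group_sizes.append(len(refs))
    return flat_cands, flat_refs, group_sizes


def best_per_candidate(
    scores: Sequence[float], group_sizes: Sequence[int]
) -> List[float]:
    """Collapse pairwise scores back to one value per candidate (max per group)."""
    best: List[float] = []
    start = 0
    for size in group_sizes:
        best.append(max(scores[start:start + size]))
        start += size
    return best
```

The flattened lists could then be passed to the single-reference interface, for example `P, R, F = score(flat_cands, flat_refs, lang="en")` after `from bert_score import score`, followed by `best_per_candidate(F.tolist(), group_sizes)`. Because the repeated candidates are identical strings, the caching mentioned above should avoid recomputing their representations.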

@Tiiiger
Owner

Tiiiger commented Jan 30, 2020

This issue will be addressed in the next version of this repo, which will likely be released by the end of February.

@Tiiiger closed this as completed Jan 30, 2020
@TheShadow29

@Tiiiger thanks for the awesome repository.

Could you clarify the usage of multiple references? From the demo notebook (cell 15 https://github.com/Tiiiger/bert_score/blob/master/example/Demo.ipynb), I believe the usage should be:

single_candidates = [cand_for_sent1, cand_for_sent2 ....]
multi_references = [[ref1_for_sent1, ref2_for_sent1 ...,], [ref1_for_sent2, ref2_for_sent2, ...], ...]

I tried a naive example in which each candidate has only one reference, passed in the multi-reference format.

from bert_score import score

# One candidate per sentence.
single_cands = ['a woman is seen sitting in a chair holding a accordion and speaking to the camera',
                'the woman then begins playing the accordion while looking back to the camera']
# One list of references per candidate (here each list holds a single reference).
multi_refs = [['a woman is seen speaking to the camera while holding an accordion and and moving her hands around'],
              ['she demonstrates how to play the instrument while still speaking to the camera and moving all around']]
P_mul, R_mul, F_mul = score(single_cands, multi_refs, lang="en", rescale_with_baseline=True)

However, this raises a RuntimeError:

lib/python3.7/site-packages/bert_score/scorer.py in score(self, cands, refs, verbose, batch_size, return_hash)
    191             max_preds = []
    192             for start, end in ref_group_boundaries:
--> 193                 max_preds.append(all_preds[start:end].max(dim=0)[0])
    194             all_preds = torch.stack(max_preds, dim=0)
    195 

RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity

It seems that start and end have the same value, so the slice is empty. Any guesses why this is happening?

Thank you for your patience.
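
The thread does not state the root cause of the bug, so the following is only an illustrative sketch of how a `start == end` boundary leads to a max over an empty slice. It uses plain Python lists rather than tensors, and `group_boundaries` and `max_per_group` are hypothetical names, not the library's code.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def group_boundaries(group_sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn per-candidate reference counts into (start, end) slice bounds."""
    ends = list(accumulate(group_sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def max_per_group(
    values: Sequence[float], boundaries: Sequence[Tuple[int, int]]
) -> List[float]:
    """Take the max inside each group, failing early on an empty group."""
    out: List[float] = []
    for start, end in boundaries:
        if start == end:
            # An empty slice has no maximum; report which group is at fault.
            raise ValueError(f"empty reference group at [{start}:{end}]")
        out.append(max(values[start:end]))
    return out
```

With one reference per candidate the boundaries should be consecutive unit-length slices such as `(0, 1)` and `(1, 2)`. Any group whose size is counted as zero collapses to `start == end`, which is the condition observed in the traceback.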

@felixgwu
Collaborator

Hi @TheShadow29,
Thank you for pointing this out. Your usage is correct. The bug has been fixed in the newest commit.

@TheShadow29

Thanks a lot @felixgwu for the prompt response.
It works correctly now.
