
How to handle Index Error Issue #17

Closed
ShauryaUppal-1Mg opened this issue Oct 10, 2019 · 5 comments


@ShauryaUppal-1Mg

Should I trim the string to 512 tokens, or can I increase the maximum sequence length somehow?
[Three screenshots of the error traceback attached]

@ShauryaUppal-1Mg
Author

@Tiiiger

@felixgwu
Collaborator

Hi @ShauryaUppal-1Mg,

Thank you for using our BERTScore.
The reason for this error is that both BERT and RoBERTa are trained on sequences of at most 512 tokens. Unlike the original Transformer, BERT and RoBERTa use learned positional embeddings whose size is fixed during training.
BERTScore is commonly computed between a pair of sentences.
We would suggest that you split the documents into sentences before feeding them to BERT.
If there happen to be some sentences with more than 512 tokens, you can:

  1. train a BERT model on longer sequences
  2. cut sentences into multiple chunks and design a better way to aggregate them. This is one of the future directions for extending BERTScore, but we haven't studied it yet.
  3. use XLNet, which supports longer inputs. However, it performs worse in our experiments.
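To illustrate the suggested workflow (splitting documents into sentences before scoring), here is a minimal sketch. The regex-based splitter and whitespace tokenization are stand-ins chosen for illustration; a real pipeline would use a proper sentence splitter (e.g. nltk or spaCy) and BERT's subword tokenizer, which typically produces more tokens than whitespace splitting does.

```python
import re

MAX_TOKENS = 512  # BERT/RoBERTa positional-embedding limit

def split_into_sentences(document):
    """Naive sentence splitter on ., !, ? boundaries (illustrative only)."""
    parts = re.split(r'(?<=[.!?])\s+', document.strip())
    return [p for p in parts if p]

def find_too_long(sentences, tokenize=str.split):
    """Return (sentence, approx_token_count) pairs that exceed the limit.

    Whitespace splitting approximates the real subword tokenizer,
    which usually yields more tokens, so this is a lower bound.
    """
    return [(s, len(tokenize(s)))
            for s in sentences
            if len(tokenize(s)) > MAX_TOKENS]

doc = "First sentence here. Second one! A third?"
sentences = split_into_sentences(doc)
print(sentences)
print(find_too_long(sentences))  # empty: all sentences are short
```

Sentences flagged by `find_too_long` would then need one of the three workarounds above before being passed to BERTScore.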

Best,
Felix

@ShauryaUppal-1Mg
Author

Will it work for sentences with length > 512 if I remove stopwords and common words?

@ShauryaUppal-1Mg
Author

But bert-as-service allows the user to set the max sequence length to be ignored.
Can't we do something with that?

https://github.com/hanxiao/bert-as-service

@felixgwu
Collaborator

felixgwu commented Oct 11, 2019

As far as I know, they just trim down the sequence. Please see:
https://github.com/hanxiao/bert-as-service/blob/85690491d66fd1ca0d03924f8c9ead3d1cad90b1/server/bert_serving/server/__init__.py#L414-L422

We would like to follow huggingface's transformers and just raise an error instead.
We encourage users to deal with their own special cases.
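The trimming behavior described above can be sketched roughly as follows. This is not the actual bert-as-service code (see the linked source for that); it is an assumed simplification showing what silently truncating to a fixed budget looks like, and why information past the limit is simply dropped.

```python
def trim_tokens(tokens, max_seq_len=512):
    """Truncate a token list to fit the model's sequence limit.

    Reserves two slots for the special [CLS] and [SEP] tokens and
    silently drops everything past the budget -- the behavior this
    sketch assumes bert-as-service applies, rather than erroring.
    """
    budget = max_seq_len - 2  # reserve [CLS] and [SEP]
    return ["[CLS]"] + tokens[:budget] + ["[SEP]"]

tokens = [f"tok{i}" for i in range(600)]
trimmed = trim_tokens(tokens)
print(len(trimmed))  # 512: tokens 510..599 were dropped
```

Raising an error instead, as huggingface's transformers does, makes the truncation visible so the user can decide how to handle it.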

Thank you for raising this issue. We will update the README to remind other users.
