Hi!
I noticed that the `attention_mask` is not produced when preprocessing the training data, as shown in this code from `src/tevatron/data.py`:

```python
class TrainDataset(Dataset):
    def create_one_example(self, text_encoding: List[int], is_query=False):
        item = self.tok.prepare_for_model(
            text_encoding,
            truncation='only_first',
            max_length=self.data_args.q_max_len if is_query else self.data_args.p_max_len,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
        )
        return item
```

Other dense retrieval implementations I have seen do not skip the attention mask here, so I would like to ask: what is the reason for designing the code this way?
Thanks for your answer:)
I just found that `QPCollator` adds the `attention_mask` back when it collates the batch.
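For anyone else who lands here, this is a minimal sketch of what happens across the two stages, using a HuggingFace tokenizer (`bert-base-uncased` is just an illustrative choice, and it assumes `QPCollator` delegates padding to `tokenizer.pad`, as `DataCollatorWithPadding`-style collators typically do):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# Preprocessing stage, mirroring create_one_example: tokenize and truncate,
# but request no padding, no attention mask, and no token type ids.
ids = tok.convert_tokens_to_ids(tok.tokenize("what is dense retrieval"))
item = tok.prepare_for_model(
    ids,
    truncation='only_first',
    max_length=32,
    padding=False,
    return_attention_mask=False,
    return_token_type_ids=False,
)
print(item.keys())   # dict_keys(['input_ids']) -- no attention_mask yet

# Collation stage: padding a batch of such examples to a uniform length
# (re)creates the attention mask from the padded input_ids.
batch = tok.pad([item], padding='max_length', max_length=32, return_tensors='pt')
print(batch.keys())  # dict_keys(['input_ids', 'attention_mask'])
```

Presumably the upside is that the preprocessed features stay small (only `input_ids`), while padding, and hence the mask, is computed per batch at collation time.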