🚀 Feature request
Hi, tokenizers get truncation as an argument. When set to True, the tokenizer will truncate the suffix of a sequence so that it does not surpass the specified max_length. I'd like to have functionality that truncates the prefix of the sequence instead, so the model sees the suffix of the sequence.
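To make the requested behavior concrete, here is a minimal sketch that works on a plain list of token ids rather than on any real tokenizer. The helper names truncate_suffix and truncate_prefix are hypothetical and only illustrate the difference between the two directions; the sketch also ignores special tokens, which a real implementation would need to preserve.

```python
from typing import List


def truncate_suffix(token_ids: List[int], max_length: int) -> List[int]:
    """Hypothetical helper: keep the first `max_length` ids (drops the suffix)."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    return token_ids[:max_length]


def truncate_prefix(token_ids: List[int], max_length: int) -> List[int]:
    """Hypothetical helper: keep the last `max_length` ids (drops the prefix)."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if max_length == 0:
        # token_ids[-0:] would return the whole list, so handle zero explicitly.
        return []
    return token_ids[-max_length:]


if __name__ == "__main__":
    ids = [101, 7, 8, 9, 10, 11]  # made-up ids, for illustration only
    print(truncate_suffix(ids, 3))  # current behavior: start of the sequence survives
    print(truncate_prefix(ids, 3))  # requested behavior: end of the sequence survives
```

Until something like this is supported directly, a similar slice could be applied by hand to ids produced with truncation turned off, at the cost of handling special tokens manually.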
Motivation
In many applications (e.g. dialog and QA), the most important part of the sequence is the suffix (e.g. the question after the context, or the last response of the dialog).
Your contribution
Perhaps I'll submit a PR, but it might take me some time as I'm close to some deadlines of mine :(