Truncating the prefix of a sequence rather than the suffix #12909

Open
@yuvalkirstain

Description

🚀 Feature request

Hi, tokenizers accept a `truncation` argument. When it is set to `True`, the tokenizer truncates the suffix of a sequence so that it does not exceed the specified `max_length`. I'd like an option that truncates the prefix of the sequence instead, so that the model sees the suffix.
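To make the request concrete, here is a minimal sketch of the two behaviours on a plain list of token ids. The helper `truncate_ids` and its `side` parameter are hypothetical names used only for illustration; they are not part of the tokenizer API, and the sketch ignores special tokens and sequence pairs.

```python
from typing import List


def truncate_ids(ids: List[int], max_length: int, side: str = "right") -> List[int]:
    """Hypothetical helper: trim a list of token ids to at most max_length items."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if len(ids) <= max_length:
        # Nothing to cut; return a copy so the caller's list is not aliased.
        return list(ids)
    if side == "right":
        # Behaviour described above: drop the suffix, keep the prefix.
        return ids[:max_length]
    # Requested behaviour: drop the prefix, keep the suffix.
    return ids[len(ids) - max_length:]


ids = list(range(10))
print(truncate_ids(ids, 4, side="right"))  # [0, 1, 2, 3]
print(truncate_ids(ids, 4, side="left"))   # [6, 7, 8, 9]
```

With `side="left"` the last `max_length` tokens survive, which is the part of the input the model would see under this proposal.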

Motivation

In many applications (e.g. dialog and QA), the most important part of the sequence is the suffix (e.g. the question after the context, or the last response in the dialog).

Your contribution

Perhaps I'll submit a PR, but it might take me some time as I'm close to some deadlines of mine :(


Labels: WIP
