
[gpt2pre 2] GPT2 Tokenizer Class #7806

Merged
2 commits merged on Jul 11, 2023


@pforderique (Contributor)

Implements the GPT2Tokenizer that extends BytePairEncoding.

This is a standalone PR with no dependencies or implementation followups.

GPT2Tokenizer will be used by GPT2Preprocessor in a future PR.

NOTE:
As discussed earlier this week, this class does not implement presets().
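The description above says only that GPT2Tokenizer extends BytePairEncoding; the diff itself is not reproduced on this page. To illustrate that subclassing relationship, here is a minimal, self-contained TypeScript sketch. Everything in it is an assumption and not the API added in this PR: the constructor signatures, the `bpe`/`tokenize` method names, the whitespace pre-tokenization (real GPT-2 uses byte-level, regex-based pre-tokenization), and the required `<|endoftext|>` token, which is modeled loosely on KerasNLP's GPT2Tokenizer.

```typescript
// Hypothetical sketch only: not the tfjs API added in this PR.

// Toy byte-pair encoder: merges adjacent symbol pairs by rank, then maps pieces to ids.
class BytePairEncoding {
  protected readonly vocabulary: Map<string, number>;
  // Maps a merge rule "left right" to its priority (lower = applied first).
  protected readonly ranks: Map<string, number>;

  constructor(vocabulary: Map<string, number>, merges: string[]) {
    this.vocabulary = vocabulary;
    this.ranks = new Map(merges.map((rule, i): [string, number] => [rule, i]));
  }

  // Repeatedly merges every occurrence of the lowest-ranked adjacent pair.
  protected bpe(word: string): string[] {
    let symbols = Array.from(word);
    while (symbols.length > 1) {
      let bestRank = Infinity;
      let bestIdx = -1;
      for (let i = 0; i < symbols.length - 1; i++) {
        const rank = this.ranks.get(`${symbols[i]} ${symbols[i + 1]}`);
        if (rank !== undefined && rank < bestRank) {
          bestRank = rank;
          bestIdx = i;
        }
      }
      if (bestIdx < 0) break; // No applicable merge rule remains.
      const left = symbols[bestIdx];
      const right = symbols[bestIdx + 1];
      const merged: string[] = [];
      let k = 0;
      while (k < symbols.length) {
        if (k < symbols.length - 1 && symbols[k] === left && symbols[k + 1] === right) {
          merged.push(left + right);
          k += 2;
        } else {
          merged.push(symbols[k]);
          k += 1;
        }
      }
      symbols = merged;
    }
    return symbols;
  }

  // Simplification: splits on whitespace instead of GPT-2's byte-level regex.
  tokenize(text: string): number[] {
    const ids: number[] = [];
    for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
      for (const piece of this.bpe(word)) {
        const id = this.vocabulary.get(piece);
        if (id === undefined) throw new Error(`Unknown token: ${piece}`);
        ids.push(id);
      }
    }
    return ids;
  }
}

// GPT-2 flavor: same merge algorithm, plus a required end-of-text token
// (assumption modeled on KerasNLP's GPT2Tokenizer).
class GPT2Tokenizer extends BytePairEncoding {
  static readonly END_TOKEN = '<|endoftext|>';
  readonly endTokenId: number;

  constructor(vocabulary: Map<string, number>, merges: string[]) {
    super(vocabulary, merges);
    const id = vocabulary.get(GPT2Tokenizer.END_TOKEN);
    if (id === undefined) {
      throw new Error(`Vocabulary must contain ${GPT2Tokenizer.END_TOKEN}`);
    }
    this.endTokenId = id;
  }
}

// Usage with a made-up toy vocabulary and merge list.
const vocab = new Map<string, number>([
  ['<|endoftext|>', 0], ['h', 1], ['e', 2], ['l', 3], ['o', 4],
  ['he', 5], ['ll', 6], ['hell', 7], ['hello', 8],
]);
const merges = ['h e', 'l l', 'he ll', 'hell o'];
const tokenizer = new GPT2Tokenizer(vocab, merges);
console.log(tokenizer.tokenize('hello helo')); // [ 8, 5, 3, 4 ]
console.log(tokenizer.endTokenId); // 0
```

Keeping the merge loop in the base class and only the GPT-2-specific special-token handling in the subclass is what makes the "extends" relationship useful: another BPE-based model could reuse `BytePairEncoding` with its own vocabulary and special tokens.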

@mattsoulanille (Member) left a comment:

LGTM

@Linchenn (Collaborator) left a comment:

LGTM

@pforderique merged commit 5c3f76e into tensorflow:master on Jul 11, 2023
2 checks passed
@pforderique deleted the gpt2-tokenizer branch on July 11, 2023 at 18:34
3 participants