
Commit ba8c4d0

thomwolf and sgugger authored
[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]
* [WIP] splitting sentencepiece and tokenizers dependencies
* update dummy objects
* add name_or_path to models and tokenizers
* prefix added to file names
* prefix
* styling + quality
* spliting all the tokenizer files - sorting sentencepiece based ones
* update tokenizer version up to 0.9.0
* remove hard dependency on sentencepiece 🎉
* and removed hard dependency on tokenizers 🎉
* update conversion script
* update missing models
* fixing tests
* move test_tokenization_fast to main tokenization tests - fix bugs
* bump up tokenizers
* fix bert_generation
* update ad fix several tokenizers
* keep sentencepiece in deps for now
* fix funnel and deberta tests
* fix fsmt
* fix marian tests
* fix layoutlm
* fix squeezebert and gpt2
* fix T5 tokenization
* fix xlnet tests
* style
* fix mbart
* bump up tokenizers to 0.9.2
* fix model tests
* fix tf models
* fix seq2seq examples
* fix tests without sentencepiece
* fix slow => fast conversion without sentencepiece
* update auto and bert generation tests
* fix mbart tests
* fix auto and common test without tokenizers
* fix tests without tokenizers
* clean up tests lighten up when tokenizers + sentencepiece are both off
* style quality and tests fixing
* add sentencepiece to doc/examples reqs
* leave sentencepiece on for now
* style quality split hebert and fix pegasus
* WIP Herbert fast
* add sample_text_no_unicode and fix hebert tokenization
* skip FSMT example test for now
* fix style
* fix fsmt in example tests
* update following Lysandre and Sylvain's comments
* Update src/transformers/testing_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/testing_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
1 parent c65863c commit ba8c4d0

140 files changed: +6,550 / -3,960 lines


.circleci/config.yml
Lines changed: 2 additions & 2 deletions

@@ -198,7 +198,7 @@ jobs:
 - v0.3-build_doc-{{ checksum "setup.py" }}
 - v0.3-{{ checksum "setup.py" }}
 - run: pip install --upgrade pip
-- run: pip install .[tf,torch,docs]
+- run: pip install .[tf,torch,sentencepiece,docs]
 - save_cache:
   key: v0.3-build_doc-{{ checksum "setup.py" }}
   paths:
@@ -219,7 +219,7 @@ jobs:
 keys:
 - v0.3-deploy_doc-{{ checksum "setup.py" }}
 - v0.3-{{ checksum "setup.py" }}
-- run: pip install .[tf,torch,docs]
+- run: pip install .[tf,torch,sentencepiece,docs]
 - save_cache:
   key: v0.3-deploy_doc-{{ checksum "setup.py" }}
   paths:

.github/workflows/github-torch-hub.yml
Lines changed: 1 addition & 2 deletions

@@ -30,8 +30,7 @@ jobs:
 run: |
 pip install --upgrade pip
 pip install torch
-pip install numpy filelock protobuf requests tqdm regex sentencepiece sacremoses packaging
-pip install tokenizers==0.9.0.rc2
+pip install numpy filelock protobuf requests tqdm regex sentencepiece sacremoses tokenizers packaging
 
 - name: Torch hub list
 run: |

.gitignore
Lines changed: 2 additions & 1 deletion

@@ -9,7 +9,8 @@ __pycache__/
 *.so
 
 # tests and logs
-tests/fixtures
+tests/fixtures/*
+!tests/fixtures/sample_text_no_unicode.txt
 logs/
 lightning_logs/
 lang_code_data/

docs/source/task_summary.rst
Lines changed: 6 additions & 6 deletions

@@ -758,8 +758,8 @@ Here is an example of using the pipelines to do summarization. It leverages a Ba
 ... If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
 ... """
 
-Because the summarization pipeline depends on the ``PretrainedModel.generate()`` method, we can override the default arguments
-of ``PretrainedModel.generate()`` directly in the pipeline for ``max_length`` and ``min_length`` as shown below.
+Because the summarization pipeline depends on the ``PreTrainedModel.generate()`` method, we can override the default arguments
+of ``PreTrainedModel.generate()`` directly in the pipeline for ``max_length`` and ``min_length`` as shown below.
 This outputs the following summary:
 
 .. code-block::
@@ -772,7 +772,7 @@ Here is an example of doing summarization using a model and a tokenizer. The pro
 1. Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
 2. Define the article that should be summarized.
 3. Add the T5 specific prefix "summarize: ".
-4. Use the ``PretrainedModel.generate()`` method to generate the summary.
+4. Use the ``PreTrainedModel.generate()`` method to generate the summary.
 
 In this example we use Google`s T5 model. Even though it was pre-trained only on a multi-task mixed dataset (including CNN / Daily Mail), it yields very good results.
 
@@ -819,15 +819,15 @@ translation results.
 >>> print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))
 [{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
 
-Because the translation pipeline depends on the ``PretrainedModel.generate()`` method, we can override the default arguments
-of ``PretrainedModel.generate()`` directly in the pipeline as is shown for ``max_length`` above.
+Because the translation pipeline depends on the ``PreTrainedModel.generate()`` method, we can override the default arguments
+of ``PreTrainedModel.generate()`` directly in the pipeline as is shown for ``max_length`` above.
 
 Here is an example of doing translation using a model and a tokenizer. The process is the following:
 
 1. Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
 2. Define the article that should be summarizaed.
 3. Add the T5 specific prefix "translate English to German: "
-4. Use the ``PretrainedModel.generate()`` method to perform the translation.
+4. Use the ``PreTrainedModel.generate()`` method to perform the translation.
 
 .. code-block::
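These docs describe overriding ``PreTrainedModel.generate()`` arguments through the pipeline call itself. A minimal sketch of that pattern, assuming the pipeline's default summarization checkpoint is acceptable (the article text here is a shortened stand-in for the one used in task_summary.rst):

    from transformers import pipeline

    # Shortened stand-in for the news article quoted in the docs.
    ARTICLE = "New York (CNN) When Liana Barrientos was 23 years old, she got married in Westchester County, New York."

    summarizer = pipeline("summarization")
    # max_length / min_length are forwarded to PreTrainedModel.generate() by the pipeline.
    print(summarizer(ARTICLE, max_length=60, min_length=10, do_sample=False))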

examples/requirements.txt
Lines changed: 1 addition & 0 deletions

@@ -17,3 +17,4 @@ datasets
 fire
 pytest
 conllu
+sentencepiece != 0.1.92

setup.py
Lines changed: 3 additions & 2 deletions

@@ -92,12 +92,13 @@
 extras["serving"] = ["pydantic", "uvicorn", "fastapi", "starlette"]
 extras["all"] = extras["serving"] + ["tensorflow", "torch"]
 
+extras["sentencepiece"] = ["sentencepiece!=0.1.92"]
 extras["retrieval"] = ["faiss-cpu", "datasets"]
 extras["testing"] = ["pytest", "pytest-xdist", "timeout-decorator", "parameterized", "psutil"] + extras["retrieval"]
 # sphinx-rtd-theme==0.5.0 introduced big changes in the style.
 extras["docs"] = ["recommonmark", "sphinx", "sphinx-markdown-tables", "sphinx-rtd-theme==0.4.3", "sphinx-copybutton"]
 extras["quality"] = ["black >= 20.8b1", "isort >= 5.5.4", "flake8 >= 3.8.3"]
-extras["dev"] = extras["testing"] + extras["quality"] + extras["ja"] + ["scikit-learn", "tensorflow", "torch"]
+extras["dev"] = extras["testing"] + extras["quality"] + extras["ja"] + ["scikit-learn", "tensorflow", "torch", "sentencepiece!=0.1.92"]
 
 setup(
     name="transformers",
@@ -114,7 +115,7 @@
 packages=find_packages("src"),
 install_requires=[
     "numpy",
-    "tokenizers == 0.9.0.rc2",
+    "tokenizers == 0.9.2",
     # dataclasses for Python versions that don't have it
     "dataclasses;python_version<'3.7'",
     # utilities from PyPA to e.g. compare versions

src/transformers/__init__.py
Lines changed: 72 additions & 29 deletions

@@ -92,6 +92,7 @@
     MODEL_CARD_NAME,
     PYTORCH_PRETRAINED_BERT_CACHE,
     PYTORCH_TRANSFORMERS_CACHE,
+    SPIECE_UNDERLINE,
     TF2_WEIGHTS_NAME,
     TF_WEIGHTS_NAME,
     TRANSFORMERS_CACHE,
@@ -104,8 +105,10 @@
     is_faiss_available,
     is_psutil_available,
     is_py3nvml_available,
+    is_sentencepiece_available,
     is_sklearn_available,
     is_tf_available,
+    is_tokenizers_available,
     is_torch_available,
     is_torch_tpu_available,
 )
@@ -152,60 +155,101 @@
 from .retrieval_rag import RagRetriever
 
 # Tokenizers
-from .tokenization_albert import AlbertTokenizer, AlbertTokenizerFast
 from .tokenization_auto import TOKENIZER_MAPPING, AutoTokenizer
-from .tokenization_bart import BartTokenizer, BartTokenizerFast
-from .tokenization_bert import BasicTokenizer, BertTokenizer, BertTokenizerFast, WordpieceTokenizer
-from .tokenization_bert_generation import BertGenerationTokenizer
+from .tokenization_bart import BartTokenizer
+from .tokenization_bert import BasicTokenizer, BertTokenizer, WordpieceTokenizer
 from .tokenization_bert_japanese import BertJapaneseTokenizer, CharacterTokenizer, MecabTokenizer
 from .tokenization_bertweet import BertweetTokenizer
 from .tokenization_blenderbot import BlenderbotSmallTokenizer, BlenderbotTokenizer
-from .tokenization_camembert import CamembertTokenizer, CamembertTokenizerFast
 from .tokenization_ctrl import CTRLTokenizer
 from .tokenization_deberta import DebertaTokenizer
-from .tokenization_distilbert import DistilBertTokenizer, DistilBertTokenizerFast
+from .tokenization_distilbert import DistilBertTokenizer
 from .tokenization_dpr import (
     DPRContextEncoderTokenizer,
-    DPRContextEncoderTokenizerFast,
     DPRQuestionEncoderTokenizer,
-    DPRQuestionEncoderTokenizerFast,
+    DPRReaderOutput,
     DPRReaderTokenizer,
-    DPRReaderTokenizerFast,
 )
-from .tokenization_electra import ElectraTokenizer, ElectraTokenizerFast
+from .tokenization_electra import ElectraTokenizer
 from .tokenization_flaubert import FlaubertTokenizer
 from .tokenization_fsmt import FSMTTokenizer
-from .tokenization_funnel import FunnelTokenizer, FunnelTokenizerFast
-from .tokenization_gpt2 import GPT2Tokenizer, GPT2TokenizerFast
-from .tokenization_herbert import HerbertTokenizer, HerbertTokenizerFast
-from .tokenization_layoutlm import LayoutLMTokenizer, LayoutLMTokenizerFast
-from .tokenization_longformer import LongformerTokenizer, LongformerTokenizerFast
-from .tokenization_lxmert import LxmertTokenizer, LxmertTokenizerFast
-from .tokenization_mbart import MBartTokenizer, MBartTokenizerFast
-from .tokenization_mobilebert import MobileBertTokenizer, MobileBertTokenizerFast
-from .tokenization_openai import OpenAIGPTTokenizer, OpenAIGPTTokenizerFast
-from .tokenization_pegasus import PegasusTokenizer, PegasusTokenizerFast
+from .tokenization_funnel import FunnelTokenizer
+from .tokenization_gpt2 import GPT2Tokenizer
+from .tokenization_herbert import HerbertTokenizer
+from .tokenization_layoutlm import LayoutLMTokenizer
+from .tokenization_longformer import LongformerTokenizer
+from .tokenization_lxmert import LxmertTokenizer
+from .tokenization_mobilebert import MobileBertTokenizer
+from .tokenization_openai import OpenAIGPTTokenizer
 from .tokenization_phobert import PhobertTokenizer
 from .tokenization_rag import RagTokenizer
-from .tokenization_reformer import ReformerTokenizer, ReformerTokenizerFast
-from .tokenization_retribert import RetriBertTokenizer, RetriBertTokenizerFast
-from .tokenization_roberta import RobertaTokenizer, RobertaTokenizerFast
-from .tokenization_squeezebert import SqueezeBertTokenizer, SqueezeBertTokenizerFast
-from .tokenization_t5 import T5Tokenizer, T5TokenizerFast
+from .tokenization_retribert import RetriBertTokenizer
+from .tokenization_roberta import RobertaTokenizer
+from .tokenization_squeezebert import SqueezeBertTokenizer
 from .tokenization_transfo_xl import TransfoXLCorpus, TransfoXLTokenizer
 from .tokenization_utils import PreTrainedTokenizer
 from .tokenization_utils_base import (
+    AddedToken,
     BatchEncoding,
     CharSpan,
     PreTrainedTokenizerBase,
     SpecialTokensMixin,
     TensorType,
     TokenSpan,
 )
-from .tokenization_utils_fast import PreTrainedTokenizerFast
 from .tokenization_xlm import XLMTokenizer
-from .tokenization_xlm_roberta import XLMRobertaTokenizer, XLMRobertaTokenizerFast
-from .tokenization_xlnet import SPIECE_UNDERLINE, XLNetTokenizer, XLNetTokenizerFast
+
+
+if is_sentencepiece_available():
+    from .tokenization_albert import AlbertTokenizer
+    from .tokenization_bert_generation import BertGenerationTokenizer
+    from .tokenization_camembert import CamembertTokenizer
+    from .tokenization_marian import MarianTokenizer
+    from .tokenization_mbart import MBartTokenizer
+    from .tokenization_pegasus import PegasusTokenizer
+    from .tokenization_reformer import ReformerTokenizer
+    from .tokenization_t5 import T5Tokenizer
+    from .tokenization_xlm_roberta import XLMRobertaTokenizer
+    from .tokenization_xlnet import XLNetTokenizer
+else:
+    from .utils.dummy_sentencepiece_objects import *
+
+if is_tokenizers_available():
+    from .tokenization_albert_fast import AlbertTokenizerFast
+    from .tokenization_bart_fast import BartTokenizerFast
+    from .tokenization_bert_fast import BertTokenizerFast
+    from .tokenization_camembert_fast import CamembertTokenizerFast
+    from .tokenization_distilbert_fast import DistilBertTokenizerFast
+    from .tokenization_dpr_fast import (
+        DPRContextEncoderTokenizerFast,
+        DPRQuestionEncoderTokenizerFast,
+        DPRReaderTokenizerFast,
+    )
+    from .tokenization_electra_fast import ElectraTokenizerFast
+    from .tokenization_funnel_fast import FunnelTokenizerFast
+    from .tokenization_gpt2_fast import GPT2TokenizerFast
+    from .tokenization_herbert_fast import HerbertTokenizerFast
+    from .tokenization_layoutlm_fast import LayoutLMTokenizerFast
+    from .tokenization_longformer_fast import LongformerTokenizerFast
+    from .tokenization_lxmert_fast import LxmertTokenizerFast
+    from .tokenization_mbart_fast import MBartTokenizerFast
+    from .tokenization_mobilebert_fast import MobileBertTokenizerFast
+    from .tokenization_openai_fast import OpenAIGPTTokenizerFast
+    from .tokenization_pegasus_fast import PegasusTokenizerFast
+    from .tokenization_reformer_fast import ReformerTokenizerFast
+    from .tokenization_retribert_fast import RetriBertTokenizerFast
+    from .tokenization_roberta_fast import RobertaTokenizerFast
+    from .tokenization_squeezebert_fast import SqueezeBertTokenizerFast
+    from .tokenization_t5_fast import T5TokenizerFast
+    from .tokenization_utils_fast import PreTrainedTokenizerFast
+    from .tokenization_xlm_roberta_fast import XLMRobertaTokenizerFast
+    from .tokenization_xlnet_fast import XLNetTokenizerFast
+
+    if is_sentencepiece_available():
+        from .convert_slow_tokenizer import SLOW_TO_FAST_CONVERTERS, convert_slow_tokenizer
+else:
+    from .utils.dummy_tokenizers_objects import *
+
 
 # Trainer
 from .trainer_callback import (
@@ -539,7 +583,6 @@
 get_linear_schedule_with_warmup,
 get_polynomial_decay_schedule_with_warmup,
 )
-from .tokenization_marian import MarianTokenizer
 
 # Trainer
 from .trainer import Trainer
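With both backends now optional, code that needs a specific tokenizer flavour can guard its imports on the availability helpers exported above. A minimal sketch using names from this diff (T5 is just one example of a SentencePiece-based model):

    from transformers import is_sentencepiece_available, is_tokenizers_available

    if is_sentencepiece_available():
        from transformers import T5Tokenizer       # slow tokenizer, backed by the sentencepiece package
    else:
        print("sentencepiece is not installed; SentencePiece-based slow tokenizers are unavailable")

    if is_tokenizers_available():
        from transformers import T5TokenizerFast   # fast tokenizer, backed by the Rust tokenizers package
    else:
        print("tokenizers is not installed; fast tokenizers are unavailable")

In the package itself, the else branches fall back to the dummy object modules shown in the diff, so the names still resolve when a backend is missing.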

src/transformers/configuration_auto.py
Lines changed: 1 addition & 1 deletion

@@ -266,7 +266,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
 our S3, e.g., ``dbmdz/bert-base-german-cased``.
 - A path to a `directory` containing a configuration file saved using the
   :meth:`~transformers.PretrainedConfig.save_pretrained` method, or the
-  :meth:`~transformers.PretrainedModel.save_pretrained` method, e.g., ``./my_model_directory/``.
+  :meth:`~transformers.PreTrainedModel.save_pretrained` method, e.g., ``./my_model_directory/``.
 - A path or url to a saved configuration JSON `file`, e.g.,
   ``./my_model_directory/configuration.json``.
 cache_dir (:obj:`str`, `optional`):

src/transformers/configuration_utils.py
Lines changed: 14 additions & 0 deletions

@@ -43,6 +43,9 @@ class PretrainedConfig(object):
 recreate the correct object in :class:`~transformers.AutoConfig`.
 
 Args:
+    name_or_path (:obj:`str`, `optional`, defaults to :obj:`""`):
+        Store the string that was passed to :func:`~transformers.PreTrainedModel.from_pretrained` or :func:`~transformers.TFPreTrainedModel.from_pretrained`
+        as ``pretrained_model_name_or_path`` if the configuration was created with such a method.
 output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`False`):
 Whether or not the model should return all hidden-states.
 output_attentions (:obj:`bool`, `optional`, defaults to :obj:`False`):
@@ -206,6 +209,9 @@ def __init__(self, **kwargs):
         # TPU arguments
         self.xla_device = kwargs.pop("xla_device", None)
 
+        # Name or path to the pretrained checkpoint
+        self._name_or_path = str(kwargs.pop("name_or_path", ""))
+
         # Additional attributes without default values
         for key, value in kwargs.items():
             try:
@@ -214,6 +220,14 @@ def __init__(self, **kwargs):
                 logger.error("Can't set {} with value {} for {}".format(key, value, self))
                 raise err
 
+    @property
+    def name_or_path(self) -> str:
+        return self._name_or_path
+
+    @name_or_path.setter
+    def name_or_path(self, value):
+        self._name_or_path = str(value)  # Make sure that name_or_path is a string (for JSON encoding)
+
     @property
     def use_return_dict(self) -> bool:
         """

src/transformers/convert_slow_tokenizer.py
Lines changed: 34 additions & 6 deletions

@@ -20,13 +20,14 @@
 
 from typing import Dict, List, Tuple
 
-from sentencepiece import SentencePieceProcessor
 from tokenizers import Tokenizer, decoders, normalizers, pre_tokenizers, processors
 from tokenizers.models import BPE, Unigram, WordPiece
 
 # from transformers.tokenization_openai import OpenAIGPTTokenizer
 from transformers.utils import sentencepiece_model_pb2 as model
 
+from .file_utils import requires_sentencepiece
+
 
 class SentencePieceExtractor:
     """
@@ -35,7 +36,9 @@ class SentencePieceExtractor:
     """
 
    def __init__(self, model: str):
-        # Get SentencePiece
+        requires_sentencepiece(self)
+        from sentencepiece import SentencePieceProcessor
+
        self.sp = SentencePieceProcessor()
        self.sp.Load(model)
 
@@ -568,11 +571,10 @@ def post_processor(self):
     )
 
 
-CONVERTERS = {
+SLOW_TO_FAST_CONVERTERS = {
     "AlbertTokenizer": AlbertConverter,
-    "BertTokenizer": BertConverter,
-    "BertGenerationTokenizer": BertGenerationConverter,
     "BartTokenizer": RobertaConverter,
+    "BertTokenizer": BertConverter,
     "CamembertTokenizer": CamembertConverter,
     "DistilBertTokenizer": BertConverter,
     "DPRReaderTokenizer": BertConverter,
@@ -582,18 +584,44 @@ def post_processor(self):
     "FunnelTokenizer": FunnelConverter,
     "GPT2Tokenizer": GPT2Converter,
     "HerbertTokenizer": HerbertConverter,
+    "LayoutLMTokenizer": BertConverter,
+    "LongformerTokenizer": RobertaConverter,
     "LxmertTokenizer": BertConverter,
     "MBartTokenizer": MBartConverter,
+    "MobileBertTokenizer": BertConverter,
     "OpenAIGPTTokenizer": OpenAIGPTConverter,
     "PegasusTokenizer": PegasusConverter,
     "ReformerTokenizer": ReformerConverter,
+    "RetriBertTokenizer": BertConverter,
     "RobertaTokenizer": RobertaConverter,
+    "SqueezeBertTokenizer": BertConverter,
     "T5Tokenizer": T5Converter,
     "XLMRobertaTokenizer": XLMRobertaConverter,
     "XLNetTokenizer": XLNetConverter,
 }
 
 
 def convert_slow_tokenizer(transformer_tokenizer) -> Tokenizer:
-    converter_class = CONVERTERS[transformer_tokenizer.__class__.__name__]
+    """Utilities to convert a slow tokenizer instance in a fast tokenizer instance.
+
+    Args:
+        transformer_tokenizer (:class:`~transformers.tokenization_utils_base.PreTrainedTokenizer`):
+            Instance of a slow tokenizer to convert in the backend tokenizer for
+            :class:`~transformers.tokenization_utils_base.PreTrainedTokenizerFast`.
+
+    Return:
+        A instance of :class:`~tokenizers.Tokenizer` to be used as the backend tokenizer of a
+        :class:`~transformers.tokenization_utils_base.PreTrainedTokenizerFast`
+    """
+
+    tokenizer_class_name = transformer_tokenizer.__class__.__name__
+
+    if tokenizer_class_name not in SLOW_TO_FAST_CONVERTERS:
+        raise ValueError(
+            f"An instance of tokenizer class {tokenizer_class_name} cannot be converted in a Fast tokenizer instance. "
+            f"No converter was found. Currently available slow->fast convertors: {list(SLOW_TO_FAST_CONVERTERS.keys())}"
+        )
+
+    converter_class = SLOW_TO_FAST_CONVERTERS[tokenizer_class_name]
+
     return converter_class(transformer_tokenizer).converted()
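A short sketch of the renamed converter API. It needs the Rust ``tokenizers`` package installed (SentencePiece-based tokenizers additionally need ``sentencepiece``), and ``bert-base-uncased`` is used purely as an example checkpoint:

    from transformers import BertTokenizer
    from transformers.convert_slow_tokenizer import SLOW_TO_FAST_CONVERTERS, convert_slow_tokenizer

    slow = BertTokenizer.from_pretrained("bert-base-uncased")
    fast_backend = convert_slow_tokenizer(slow)   # a tokenizers.Tokenizer instance
    print(type(fast_backend))

    # Classes without an entry in SLOW_TO_FAST_CONVERTERS now raise a ValueError
    # listing the available converters, instead of the bare KeyError the old
    # CONVERTERS lookup produced.
    print(sorted(SLOW_TO_FAST_CONVERTERS))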
