Add support for Ernie and ErnieM models #663

Draft · wants to merge 8 commits into main
README.md: 2 additions, 0 deletions
@@ -296,6 +296,8 @@ You can refine your search by selecting the task you're interested in (e.g., [te
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
1. **FastViT** (from Apple) released with the paper [FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://arxiv.org/abs/2303.14189) by Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel and Anurag Ranjan.
…
2 changes: 2 additions & 0 deletions docs/snippets/6_supported-models.snippet
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
1. **FastViT** (from Apple) released with the paper [FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://arxiv.org/abs/2303.14189) by Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel and Anurag Ranjan.
…
scripts/supported_models.py: 68 additions, 0 deletions
@@ -425,6 +425,74 @@
'google/electra-base-discriminator',
],
},
'ernie': { # bert-like
# Feature extraction
'feature-extraction': [
'hf-internal-testing/tiny-random-ErnieModel',

'nghuyong/ernie-2.0-large-en',
'nghuyong/ernie-2.0-base-en',
'nghuyong/ernie-health-zh',
'nghuyong/ernie-3.0-mini-zh',
'nghuyong/ernie-3.0-nano-zh',
'nghuyong/ernie-3.0-micro-zh',
'nghuyong/ernie-gram-zh',

'shibing624/text2vec-base-chinese-paraphrase',
'shibing624/text2vec-base-chinese-sentence',
],

# Text classification
'text-classification': [
'hf-internal-testing/tiny-random-ErnieForSequenceClassification',
],

# Token classification
'token-classification': [
'hf-internal-testing/tiny-random-ErnieForTokenClassification',
],

# Masked language modelling
'fill-mask': [
'nghuyong/ernie-3.0-xbase-zh',
'nghuyong/ernie-1.0-base-zh',
'nghuyong/ernie-3.0-medium-zh',
'nghuyong/ernie-3.0-base-zh',
'hf-internal-testing/tiny-random-ErnieForMaskedLM',
],

# Question answering
'question-answering': [
'hf-internal-testing/tiny-random-ErnieForQuestionAnswering',
],
},
'ernie_m': { # distilbert-like
# Feature extraction
'feature-extraction': [
'hf-internal-testing/tiny-random-ErnieMModel',
],

# Text classification
'text-classification': [
'hf-internal-testing/tiny-random-ErnieMForSequenceClassification',
],

# Zero-shot classification
'zero-shot-classification': [
'MoritzLaurer/ernie-m-base-mnli-xnli',
'MoritzLaurer/ernie-m-large-mnli-xnli',
],

# Token classification
'token-classification': [
'hf-internal-testing/tiny-random-ErnieMForTokenClassification',
],

# Question answering
'question-answering': [
'hf-tiny-model-private/tiny-random-ErnieMForQuestionAnswering',
],
},
'esm': {
# Masked language modelling
'fill-mask': [
…
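The entries above register checkpoints for the conversion and testing scripts. Once a checkpoint has been converted to ONNX, it can be used from transformers.js through the pipeline API. A minimal sketch for the feature-extraction task, assuming an ONNX-converted copy of `nghuyong/ernie-3.0-nano-zh` is available on the Hub (that conversion is an assumption, not a published artifact):

```js
import { pipeline } from '@xenova/transformers';

// Feature extraction with an ERNIE checkpoint. The model id is taken from the
// supported-models list above; using it here assumes an ONNX-converted repo.
const extractor = await pipeline('feature-extraction', 'nghuyong/ernie-3.0-nano-zh');

// Mean-pooled, L2-normalized sentence embedding.
const output = await extractor('你好世界', { pooling: 'mean', normalize: true });
console.log(output.dims); // [1, hidden_size]
```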
src/models.js: 153 additions, 0 deletions
@@ -1471,6 +1471,149 @@ export class BertForQuestionAnswering extends BertPreTrainedModel {
}
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// Ernie models
export class ErniePreTrainedModel extends PreTrainedModel { }

/**
* The bare Ernie Model transformer outputting raw hidden-states without any specific head on top.
*/
export class ErnieModel extends ErniePreTrainedModel { }

/**
* Ernie Model with a `language modeling` head on top.
*/
export class ErnieForMaskedLM extends ErniePreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<MaskedLMOutput>} An object containing the model's output logits for masked language modeling.
*/
async _call(model_inputs) {
return new MaskedLMOutput(await super._call(model_inputs));
}
}

/**
* Ernie Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output)
*/
export class ErnieForSequenceClassification extends ErniePreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<SequenceClassifierOutput>} An object containing the model's output logits for sequence classification.
*/
async _call(model_inputs) {
return new SequenceClassifierOutput(await super._call(model_inputs));
}
}

/**
* Ernie Model with a token classification head on top (a linear layer on top of the hidden-states output)
*/
export class ErnieForTokenClassification extends ErniePreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<TokenClassifierOutput>} An object containing the model's output logits for token classification.
*/
async _call(model_inputs) {
return new TokenClassifierOutput(await super._call(model_inputs));
}
}

/**
* Ernie Model with a span classification head on top for extractive question-answering tasks like SQuAD
* (linear layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
*/
export class ErnieForQuestionAnswering extends ErniePreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<QuestionAnsweringModelOutput>} An object containing the model's output logits for question answering.
*/
async _call(model_inputs) {
return new QuestionAnsweringModelOutput(await super._call(model_inputs));
}
}
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// ErnieM models
export class ErnieMPreTrainedModel extends PreTrainedModel { }

/**
* The bare ErnieM Model transformer outputting raw hidden-states without any specific head on top.
*/
export class ErnieMModel extends ErnieMPreTrainedModel { }

/**
* ErnieM Model with a `language modeling` head on top.
*/
export class ErnieMForMaskedLM extends ErnieMPreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<MaskedLMOutput>} An object containing the model's output logits for masked language modeling.
*/
async _call(model_inputs) {
return new MaskedLMOutput(await super._call(model_inputs));
}
}

/**
* ErnieM Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output)
*/
export class ErnieMForSequenceClassification extends ErnieMPreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<SequenceClassifierOutput>} An object containing the model's output logits for sequence classification.
*/
async _call(model_inputs) {
return new SequenceClassifierOutput(await super._call(model_inputs));
}
}

/**
* ErnieM Model with a token classification head on top (a linear layer on top of the hidden-states output)
*/
export class ErnieMForTokenClassification extends ErnieMPreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<TokenClassifierOutput>} An object containing the model's output logits for token classification.
*/
async _call(model_inputs) {
return new TokenClassifierOutput(await super._call(model_inputs));
}
}

/**
* ErnieM Model with a span classification head on top for extractive question-answering tasks like SQuAD
* (linear layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
*/
export class ErnieMForQuestionAnswering extends ErnieMPreTrainedModel {
/**
* Calls the model on new inputs.
*
* @param {Object} model_inputs The inputs to the model.
* @returns {Promise<QuestionAnsweringModelOutput>} An object containing the model's output logits for question answering.
*/
async _call(model_inputs) {
return new QuestionAnsweringModelOutput(await super._call(model_inputs));
}
}
//////////////////////////////////////////////////


//////////////////////////////////////////////////
// NomicBert models
export class NomicBertPreTrainedModel extends PreTrainedModel { }
@@ -5530,6 +5673,8 @@ export class PretrainedMixin {

const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
['bert', ['BertModel', BertModel]],
['ernie', ['ErnieModel', ErnieModel]],
['ernie_m', ['ErnieMModel', ErnieMModel]],
['nomic_bert', ['NomicBertModel', NomicBertModel]],
['roformer', ['RoFormerModel', RoFormerModel]],
['electra', ['ElectraModel', ElectraModel]],
@@ -5633,6 +5778,8 @@ const MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES = new Map([

const MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = new Map([
['bert', ['BertForSequenceClassification', BertForSequenceClassification]],
['ernie', ['ErnieForSequenceClassification', ErnieForSequenceClassification]],
['ernie_m', ['ErnieMForSequenceClassification', ErnieMForSequenceClassification]],
['roformer', ['RoFormerForSequenceClassification', RoFormerForSequenceClassification]],
['electra', ['ElectraForSequenceClassification', ElectraForSequenceClassification]],
['esm', ['EsmForSequenceClassification', EsmForSequenceClassification]],
@@ -5654,6 +5801,8 @@ const MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = new Map([

const MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = new Map([
['bert', ['BertForTokenClassification', BertForTokenClassification]],
['ernie', ['ErnieForTokenClassification', ErnieForTokenClassification]],
['ernie_m', ['ErnieMForTokenClassification', ErnieMForTokenClassification]],
['roformer', ['RoFormerForTokenClassification', RoFormerForTokenClassification]],
['electra', ['ElectraForTokenClassification', ElectraForTokenClassification]],
['esm', ['EsmForTokenClassification', EsmForTokenClassification]],
@@ -5703,6 +5852,8 @@ const MODEL_WITH_LM_HEAD_MAPPING_NAMES = new Map([

const MODEL_FOR_MASKED_LM_MAPPING_NAMES = new Map([
['bert', ['BertForMaskedLM', BertForMaskedLM]],
['ernie', ['ErnieForMaskedLM', ErnieForMaskedLM]],
['ernie_m', ['ErnieMForMaskedLM', ErnieMForMaskedLM]],
['roformer', ['RoFormerForMaskedLM', RoFormerForMaskedLM]],
['electra', ['ElectraForMaskedLM', ElectraForMaskedLM]],
['esm', ['EsmForMaskedLM', EsmForMaskedLM]],
@@ -5722,6 +5873,8 @@ const MODEL_FOR_MASKED_LM_MAPPING_NAMES = new Map([

const MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = new Map([
['bert', ['BertForQuestionAnswering', BertForQuestionAnswering]],
['ernie', ['ErnieForQuestionAnswering', ErnieForQuestionAnswering]],
['ernie_m', ['ErnieMForQuestionAnswering', ErnieMForQuestionAnswering]],
['roformer', ['RoFormerForQuestionAnswering', RoFormerForQuestionAnswering]],
['electra', ['ElectraForQuestionAnswering', ElectraForQuestionAnswering]],
['convbert', ['ConvBertForQuestionAnswering', ConvBertForQuestionAnswering]],
…
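Registering `ernie` and `ernie_m` in these mapping tables is what lets the `Auto*` loaders and the pipelines resolve the new classes from a checkpoint's `model_type`. A minimal sketch using the zero-shot checkpoint listed in `scripts/supported_models.py`, assuming an ONNX-converted copy of it exists on the Hub:

```js
import { pipeline } from '@xenova/transformers';

// MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES now routes `ernie_m` configs
// to ErnieMForSequenceClassification, which the zero-shot pipeline builds on.
// NOTE: assumes an ONNX-converted copy of this repo is available.
const classifier = await pipeline(
    'zero-shot-classification',
    'MoritzLaurer/ernie-m-base-mnli-xnli',
);

const result = await classifier(
    'I have a problem with my iphone that needs to be resolved asap!',
    ['urgent', 'not urgent', 'phone', 'computer'],
);
console.log(result.labels[0], result.scores[0]); // top label and its score
```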
src/tokenizers.js: 8 additions, 0 deletions
@@ -4372,6 +4372,13 @@ export class VitsTokenizer extends PreTrainedTokenizer {

export class CohereTokenizer extends PreTrainedTokenizer { }

export class ErnieMTokenizer extends PreTrainedTokenizer {
constructor(tokenizerJSON, tokenizerConfig) {
super(tokenizerJSON, tokenizerConfig);
console.warn('WARNING: `ErnieMTokenizer` is not yet supported by Hugging Face\'s "fast" tokenizers library. Therefore, you may experience slightly inaccurate results.')
}
}

/**
* Helper class which is used to instantiate pretrained tokenizers with the `from_pretrained` function.
* The chosen tokenizer class is determined by the type specified in the tokenizer config.
@@ -4425,6 +4432,7 @@ export class AutoTokenizer {
GemmaTokenizer,
Grok1Tokenizer,
CohereTokenizer,
ErnieMTokenizer,

// Base case:
PreTrainedTokenizer,
…
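With `ErnieMTokenizer` registered in `AutoTokenizer`, it is selected from the `tokenizer_class` field of `tokenizer_config.json` at load time. A minimal sketch using the tiny test repo referenced in `tests/generate_tests.py`; the constructor is expected to log the slow-tokenizer warning shown above:

```js
import { AutoTokenizer } from '@xenova/transformers';

// Resolves to ErnieMTokenizer via tokenizer_config.json and logs the
// "not yet supported ... 'fast' tokenizers" warning from the constructor.
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/tiny-random-ErnieMModel');

const { input_ids, attention_mask } = await tokenizer('test 你好世界');
console.log(input_ids.dims, attention_mask.dims); // matching shapes, e.g. [1, n]
```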
tests/generate_tests.py: 18 additions, 2 deletions
@@ -42,6 +42,9 @@
'gemma': [
'Xenova/gemma-tokenizer',
],
'ernie_m': [
'Xenova/tiny-random-ErnieMModel',
]
}

MODELS_TO_IGNORE = [
@@ -64,6 +67,9 @@
# - decoding with `skip_special_tokens=True`.
# - interspersing the pad token is broken.
'vits',

# TODO: remove when ErnieMTokenizerFast is implemented
'ernie_m',
]

TOKENIZERS_TO_IGNORE = [
@@ -184,6 +190,16 @@
# - New (correct): ['▁Hey', '▁', '</s>', '.', '▁how', '▁are', '▁you']
"Hey </s>. how are you",
],

"Xenova/tiny-random-ErnieMModel": [
'hello world',
'[UNK][SEP][PAD][CLS][MASK]', # Special tokens
'1 2 3 123', # Digit pretokenizer
'this,test',
'test 你好世界', # Chinese characters
"A\n'll !!to?'d''d of, can't.", # Punctuation
"test $1 R2 #3 €4 £5 ¥6 ₣7 ₹8 ₱9 test", # Unknown tokens
],
},
}

@@ -331,7 +347,7 @@ def generate_tokenizer_tests():

for data in TOKENIZER_TEXT_PAIR_TEST_DATA:
try:
- output = tokenizer(**data).data
+ output = tokenizer(**data, return_attention_mask=True).data
except Exception:
# Ignore testing tokenizers which fail in the python library
continue
@@ -347,7 +363,7 @@
# Run tokenizer on test cases
for text in shared_texts + custom_texts + custom_by_model_type_texts:
try:
- encoded = tokenizer(text).data
+ encoded = tokenizer(text, return_attention_mask=True).data
except Exception:
# Ignore testing tokenizers which fail in the python library
continue
…
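The switch to `return_attention_mask=True` makes the Python reference tokenizer always emit `attention_mask`, so the generated expectations line up with transformers.js, which includes the mask in its output by default. A sketch of the JavaScript side of that parity check, reusing one of the ErnieM test strings above:

```js
import { AutoTokenizer } from '@xenova/transformers';

// transformers.js returns `attention_mask` by default; the Python-generated
// expectations must therefore include it for the comparison to pass.
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/tiny-random-ErnieMModel');

const encoded = await tokenizer('1 2 3 123'); // digit-pretokenizer test case
console.log(encoded.input_ids.tolist());      // compare against generated expectations
console.log(encoded.attention_mask.tolist());
```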