
[SAMPLE] How to use the model "intfloat/multilingual-e5-small" to generate embeddings #352

Open
@qihui-liu

I need to use the intfloat/multilingual-e5-small model. However, when loading vocab.txt on the ARM64 architecture, I ran into a problem with missing tokens such as [UNK] and [SEP]. After some research, I found that intfloat/multilingual-e5-small uses XLMRobertaTokenizer, which depends on SentencePiece. In Microsoft.ML.Tokenizers I found SentencePieceTokenizer, but its usage is different from BertTokenizer's and I don't know how to use it. Could you provide a tutorial on how to use it? I opened the file with File.OpenRead, passed in the resulting Stream, and successfully loaded a SentencePieceTokenizer, but I don't know what to do with it from there.
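
For reference, the steps that usually sit around the tokenizer for this model can be sketched as below. This is a minimal, library-free illustration in Python, not the Microsoft.ML.Tokenizers API. It assumes the special-token layout used by Hugging Face's XLMRobertaTokenizer (`<s>`=0, `<pad>`=1, `</s>`=2, `<unk>`=3, with raw SentencePiece IDs shifted by 1), that the tokenizer returns raw piece IDs without adding its own begin/end tokens, and that the sentence vector is produced by mean pooling followed by L2 normalization, as in the E5 model card. The function names and the sample IDs are hypothetical.

```python
import math

# Assumed special-token IDs, following Hugging Face's XLMRobertaTokenizer.
BOS_ID, PAD_ID, EOS_ID, UNK_ID = 0, 1, 2, 3
FAIRSEQ_OFFSET = 1  # raw SentencePiece IDs are shifted by one in the model vocabulary


def to_model_ids(spm_ids):
    """Map raw SentencePiece IDs to model input IDs, wrapped as <s> ... </s>.

    Assumes spm_ids contains no begin/end tokens added by the tokenizer itself.
    Raw ID 0 is SentencePiece's <unk>, which maps to the model's <unk> ID.
    """
    body = [UNK_ID if i == 0 else i + FAIRSEQ_OFFSET for i in spm_ids]
    return [BOS_ID] + body + [EOS_ID]


def mean_pool(token_vectors, attention_mask):
    """Average the per-token output vectors, skipping padded positions."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m]
    dim = len(kept[0])
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


# Hypothetical raw piece IDs for a text such as "query: hello world".
input_ids = to_model_ids([5, 0, 9])
attention_mask = [1] * len(input_ids)
```

The resulting `input_ids` and `attention_mask` are what the ONNX or other runtime session would take as inputs; `mean_pool` and `l2_normalize` are then applied to the model's last hidden state. E5 models also expect each input text to start with `query: ` or `passage: `.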
