[SAMPLE] How to use the model "intfloat/multilingual-e5-small" to complete the embedding operation

I need to use the intfloat/multilingual-e5-small model. However, when loading vocab.txt on the ARM64 architecture, I ran into a problem with missing special tokens such as [UNK] and [SEP]. After some research, I found that intfloat/multilingual-e5-small uses XLMRobertaTokenizer, which depends on SentencePiece rather than a BERT-style vocab.txt. I found SentencePieceTokenizer in Microsoft.ML.Tokenizers, but its usage is different from BertTokenizer's and I don't know how to use it. I opened the model file with File.OpenRead, passed the resulting Stream to the tokenizer, and it loaded successfully, but I don't know what to do from there to produce embeddings. Could you provide a tutorial or sample showing how to use it?
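For reference, here is my understanding of what has to happen after tokenization, based on the model card for intfloat/multilingual-e5-small: the input text gets a "query: " or "passage: " prefix, the model's last hidden states are averaged over the non-padding tokens, and the result is L2-normalized. The sketch below is plain Python with no tokenizer or model in it, and the function names (`with_e5_prefix`, `mean_pool`, `l2_normalize`) are my own hypothetical helpers, not part of any library. What I am missing is the equivalent flow in .NET with SentencePieceTokenizer.

```python
import math


def with_e5_prefix(text, is_query=True):
    """Prepend the prefix that E5 models expect on every input text."""
    return ("query: " if is_query else "passage: ") + text


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding.

    token_embeddings: list of per-token vectors (lists of floats)
    attention_mask:   list of 0/1 flags, one per token
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


if __name__ == "__main__":
    # Toy hidden states for three tokens; the last one is padding.
    hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    print(with_e5_prefix("how do I embed text?"))
    print(l2_normalize(mean_pool(hidden, mask)))
```

In a real pipeline the `hidden` values would come from running the model on the token IDs and attention mask produced by the tokenizer, which is the part I cannot work out.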

[SAMPLE] How to use the model "intfloat/multilingual-e5-small" to complete the embedding operation #352

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[SAMPLE]How to use the model "intfloat/multilingual-e5-small" to complete the embedding operation #352

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions