When evaluating a model that uses SentencePiece with Transformers.js, I do not get the `▁` marker in the output as I do when running from Python. I'm using the qanastek/pos-french-camembert model to do POS tagging, and in some cases a single word, such as a verb with a tense suffix, is returned as two or more tokens. I'd like to process each group of tokens together and decide how to handle the different labels. I see that the `pre_tokenizer` and `decoder` fields of the model's `tokenizer.json` include references to the `Metaspace` parameter, but I'm unsure whether it can be configured to retain the space placeholder token.
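To make the goal concrete, here is a rough sketch of the grouping I have in mind, assuming the `▁` marker were preserved on each token. The helper name `groupIntoWords`, the `{ token, label }` shape, and the sample tokens and labels are made up for illustration; they are not real output from the model or part of the Transformers.js API.

```javascript
// Hypothetical helper: merge SentencePiece pieces into words using the "▁" marker.
const WORD_START = "\u2581"; // "▁", the SentencePiece space placeholder

function groupIntoWords(pieces) {
  const words = [];
  for (const { token, label } of pieces) {
    // A piece starts a new word if it carries the marker (or is the very first piece).
    if (token.startsWith(WORD_START) || words.length === 0) {
      words.push({ word: token.replace(WORD_START, ""), labels: [label] });
    } else {
      // Otherwise it continues the previous word; keep its label for later resolution.
      const last = words[words.length - 1];
      last.word += token;
      last.labels.push(label);
    }
  }
  return words;
}

// Illustrative input only, not actual model output.
const pieces = [
  { token: "\u2581Nous", label: "PRON" },
  { token: "\u2581mange", label: "VERB" },
  { token: "ons", label: "VERB" },
];
const words = groupIntoWords(pieces);
console.log(words);
```

With the marker available, each word keeps the full list of labels from its pieces, so the decision about which label wins can be made per word.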
But when playing around with this, it seems to me that the decoder should be applied to the full list of tokens rather than to each token individually: when I null out the tokenizer's decoder and run the decoder on the full list of tokens, I recover the original input.
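The difference can be illustrated with a simplified stand-in for Metaspace-style decoding. This is only a sketch of the idea, not the Transformers.js implementation; `decodeTokenList` and `decodeSingleToken` are hypothetical names, and the tokens are illustrative.

```javascript
// Simplified imitation of Metaspace decoding (assumption: marker -> space, leading space dropped).
const SPACE_MARKER = "\u2581"; // "▁"

function decodeTokenList(tokens) {
  // Join all pieces, turn every marker into a space, then drop one leading space.
  const text = tokens.join("").split(SPACE_MARKER).join(" ");
  return text.startsWith(" ") ? text.slice(1) : text;
}

function decodeSingleToken(token) {
  // Decoding one token at a time treats each token as its own sequence.
  return decodeTokenList([token]);
}

const tokens = ["\u2581Nous", "\u2581mange", "ons"];

// Per-token decoding: the marker is gone, so word starts and continuations look the same.
const perToken = tokens.map(decodeSingleToken);

// Whole-list decoding: word boundaries survive as spaces and the input is recovered.
const wholeList = decodeTokenList(tokens);

console.log(perToken, wholeList);
```

Decoded one at a time, `▁mange` and `ons` both come back as bare strings, so there is no way to tell that `ons` continues the previous word; decoded as a list, the spacing is preserved.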