Hi, I was planning to implement the same POS tagger architecture with the bertweet-base model via Hugging Face, but since that model is not supported by PreTrainedTokenizerFast, you cannot access offset_mapping, and thus cannot easily retrieve the embeddings for a given token for POS tagging (I planned to pool the subwords per token in Tweebank). The tokenizer doesn't seem to deviate from the Hugging Face RoBERTa tokenizers except for its normalization functionality, so is there any way to use this feature, or any chance of it being added (perhaps in a mode that doesn't use normalization)? It already works for the bertweet-large model, so I assume it's not impossible.
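For context, here is roughly the pooling I have in mind, sketched with bertweet-large (where the fast tokenizer already works); the example sentence, variable names, and mean-pooling choice are just illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of per-token sub-word pooling, using bertweet-large since its
# fast tokenizer already works. bertweet-base would be a drop-in swap
# once a fast tokenizer exists for it.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-large", use_fast=True)
model = AutoModel.from_pretrained("vinai/bertweet-large")

# A pre-tokenized, Tweebank-style example. is_split_into_words=True lets
# the fast tokenizer track which word each sub-word piece came from.
words = ["SOO", "PROUD", "of", "@user", "!!!"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_size)

# Mean-pool the sub-word embeddings belonging to each original word;
# word_ids() maps each piece to its word index (None for special tokens).
word_ids = enc.word_ids(0)
pooled = torch.stack([
    hidden[[i for i, w in enumerate(word_ids) if w == idx]].mean(dim=0)
    for idx in range(len(words))
])  # (len(words), hidden_size): one vector per Tweebank token
```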
@nu11us I recently developed the fast tokenizer for bertweet-base. You can experiment with it by installing transformers from the following branch:
git clone --single-branch --branch fast_tokenizers_BARTpho_PhoBERT_BERTweet https://github.com/datquocnguyen/transformers.git
If you find it useful, please comment on the thread at huggingface/transformers#17254 (comment), so that the fast tokenizer can be merged into the main transformers library soon.
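For example, once that branch is installed (e.g. `pip install -e transformers` after cloning), a quick check along these lines should confirm that offset mappings are available for bertweet-base; this is a sketch, not verified output:

```python
from transformers import AutoTokenizer

# Assumes transformers was installed from the branch above,
# e.g. `pip install -e transformers` after cloning.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=True)
print(tokenizer.is_fast)  # should be True with the development branch

# return_offsets_mapping is only supported by fast tokenizers, so this is
# exactly the feature the question asks about.
enc = tokenizer("SOO PROUD of @user !!!", return_offsets_mapping=True)
print(enc["offset_mapping"])  # character spans per sub-word piece
```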