Some questions about MacBert #13
Comments
Hello, the mailbox is working fine. You can send email to the address given in the paper or to ymcui@ieee.org.
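So, for example, when four words are to be masked with 10% probability, do those four words have to be ones that commonly occur together? For example, “自然/语言/处理/很酷/” (“natural / language / processing / is cool”).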
I'm not sure I understood your question correctly.
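Step 1: the pointer is at A. Suppose the sampling picks a 3-gram at this point; then A/B/C are all masked together (if similar-word replacement is applied, each word is replaced by its own similar word).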
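For illustration, the step described above can be sketched in Python. This is a minimal sketch under assumptions, not the MacBert implementation: `pick_span`, `mask_span`, and the `similar` lookup table are hypothetical names, and the `[MASK]` fallback is a placeholder chosen for this example. Only the 40%/30%/20%/10% split comes from the sentence of the paper quoted in this issue.

```python
import random

# N-gram lengths and their selection probabilities (unigram to 4-gram),
# taken from the sentence of the paper quoted in this issue.
NGRAM_LENGTHS = [1, 2, 3, 4]
NGRAM_WEIGHTS = [0.4, 0.3, 0.2, 0.1]


def pick_span(words, start, rng):
    """Sample an n-gram length, then take that many consecutive words from `start`."""
    n = rng.choices(NGRAM_LENGTHS, weights=NGRAM_WEIGHTS, k=1)[0]
    # Slicing truncates the span if it would run past the end of the sentence.
    return words[start:start + n]


def mask_span(span, similar=None):
    """Replace every word in the span by its own similar word.

    `similar` is a hypothetical word -> similar-word table. Words without an
    entry fall back to "[MASK]", which is an assumption made for this sketch.
    """
    similar = similar or {}
    return [similar.get(word, "[MASK]") for word in span]


if __name__ == "__main__":
    rng = random.Random(0)
    words = ["A", "B", "C", "D", "E"]
    span = pick_span(words, 0, rng)  # the pointer is at A
    print(span, "->", mask_span(span, {"A": "A'", "B": "B'", "C": "C'"}))
```

The span is always a run of adjacent words starting at the pointer, so the words do not need to be a common collocation; they only need to be next to each other in the sentence.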
Thank you very much, that is exactly what I wanted to ask.
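Hello, I ran into some confusion while reading the MacBert paper. I originally wanted to send an email, but it seems mail cannot be delivered to that address.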
Regarding “We use whole word masking as well as N-gram masking strategies for selecting candidate tokens for masking, with a percentage of 40%, 30%, 20%, 10% for word-level unigram to 4-gram.”: does this mean that a single word is replaced with a synonym with 40% probability, two words are replaced with synonyms with 30% probability, and so on?