Hello!
Thank you for the answer!
Hello author, while studying your work I ran into some confusion. Below are my understanding and my questions; I am not sure whether they are accurate:

1. In the Decoder you use ProbMask to mask the selected (active) queries so that each one only receives attention values from time steps before its own. What I don't understand is: since Generative Inference with the label part does not leak future information, why is the mask still applied? (To my limited understanding, a transformer uses the decoder mask to enable parallel computation during training, but why keep the mask at valid/test time? I have a similar confusion about this.)
2. In the ablation study you compare dynamic decoding with generative-style inference, reporting two sets of results for prediction lengths 336 and 480. At 336 the results are close, while at 480 generative-style inference has a clear advantage. When the prediction sequence is shorter, would dynamic decoding outperform generative-style inference?

Thank you very much for your work; I look forward to your reply!
May I ask what the statement above, "...since Generative Inference with the label part does not leak future information...", is based on? Where can I find supporting evidence for this? Thanks.
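For context, the decoder-side causal masking being discussed can be sketched as follows. This is a minimal single-head NumPy sketch of a standard upper-triangular causal mask, not the Informer ProbMask implementation (which masks only the selected queries); all names here are illustrative:

```python
import numpy as np

def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v: (L, d) arrays for a single attention head.
    Position i is allowed to attend only to positions j <= i.
    """
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (L, L) raw attention scores

    # Causal mask: every strictly-future position (j > i) is set to
    # -inf so it receives zero weight after the softmax.
    future = np.triu(np.ones((L, L), dtype=bool), k=1)
    scores[future] = -np.inf

    # Row-wise softmax (diagonal is always unmasked, so each row
    # has at least one finite score).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because row 0 may only attend to position 0, its output is exactly `v[0]`; later rows mix progressively more of the past. Applying the same mask at train, valid, and test time keeps the decoder's attention pattern identical across all three phases, which is one common argument for retaining it even when the inputs themselves contain no future information.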