GELU is used in the paper, but in this code the final GELU is not applied. Would this have a significant effect on the results? #16

Open
wmc1421910835 opened this issue Sep 18, 2023 · 1 comment

Comments

@wmc1421910835

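[Image]

One more question: your code runs fine for me, but precision, F-score, and the other metrics are all 0.0. Am I using something incorrectly?
[Image]
This is the GPU I used:
[Image]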

@yhcc
Owner

yhcc commented May 20, 2024

The last layer should not use GELU, because its output goes straight into a sigmoid.
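One plausible reading of this, sketched below in plain Python rather than with the repository's own code: GELU is bounded below (its minimum is about -0.17), so a sigmoid applied after it could never output a probability much below 0.46, and the model would be unable to express a confident negative. The helper names `gelu` and `sigmoid` and the scanned logit range are assumptions made for illustration only.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Scan logits in [-10, 10] (an arbitrary illustrative range)
# to find the lowest value GELU can produce on that range.
grid = [i / 1000.0 for i in range(-10000, 10001)]
min_gelu = min(gelu(x) for x in grid)

# Lowest probability reachable with and without GELU before the sigmoid.
min_prob_with_gelu = sigmoid(min_gelu)
min_prob_without_gelu = sigmoid(min(grid))

print(f"lowest GELU output on the grid:     {min_gelu:.4f}")
print(f"lowest sigmoid(GELU(x)) probability: {min_prob_with_gelu:.4f}")
print(f"lowest sigmoid(x) probability:       {min_prob_without_gelu:.6f}")
```

With a raw logit the sigmoid can get arbitrarily close to 0, whereas with GELU in front it stays just under 0.5 at best, which is likely why the final activation is dropped.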
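As for the second question, I don't think it is related to the GPU. You could try checking whether the model can overfit a small subset of the data.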
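The overfitting check can be sketched roughly as follows. This is a toy stand-in and not the repository's training loop: a hypothetical four-example subset and a tiny logistic regression replace the real data and model. The idea is only that a correctly wired pipeline (labels, loss, thresholding, metric) should reach near-perfect scores on a handful of training examples. If the scores stay at 0.0 even then, the problem is more likely in data preparation or evaluation than in the hardware.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical tiny training subset: (features, label).
subset = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5

# Train on the same few examples for many steps so the model can memorise them.
for _ in range(2000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for features, label in subset:
        prob = sigmoid(weights[0] * features[0] + weights[1] * features[1] + bias)
        error = prob - label  # gradient of binary cross-entropy w.r.t. the logit
        grad_w[0] += error * features[0]
        grad_w[1] += error * features[1]
        grad_b += error
    weights[0] -= learning_rate * grad_w[0] / len(subset)
    weights[1] -= learning_rate * grad_w[1] / len(subset)
    bias -= learning_rate * grad_b / len(subset)

# Evaluate on the SAME subset: this is a sanity check, not a generalisation test.
tp = fp = fn = 0
for features, label in subset:
    prob = sigmoid(weights[0] * features[0] + weights[1] * features[1] + bias)
    pred = 1 if prob > 0.5 else 0
    tp += int(pred == 1 and label == 1)
    fp += int(pred == 1 and label == 0)
    fn += int(pred == 0 and label == 1)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In the real setting the same pattern would apply: train on a few dozen sentences, evaluate on those same sentences, and confirm that precision and F-score move well away from 0.0.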
