
About the model code #3

Closed

bolin-chen opened this issue Aug 22, 2022 · 2 comments


bolin-chen commented Aug 22, 2022

Hi, there are two places in the model code that I don't quite understand. Could you help take a look?

  1. The forward function in class TargetNet:
    def forward(self, x, paras):

        q = self.fc1(x)
        # print(q.shape)
        q = self.bn1(q)
        q = self.relu1(q)
        q = self.drop1(q) 

        self.lin = nn.Sequential(TargetFC(paras['res_last_out_w'], paras['res_last_out_b']))
        q = self.lin(q)
        q = self.softmax(q)
        return q

Here res_last_out_w has shape [batch_size, 100] and res_last_out_b has shape [batch_size, 1], and the input tensor to self.lin has shape [batch_size, 100]. The output of self.lin therefore has shape [batch_size, batch_size], i.e. a tensor whose shape depends on batch_size. If batch_size is 1, the output of this function is always of shape [1, 1], and after the softmax its value is always 1, so the theme-network branch effectively outputs a constant value of 1. Isn't that a problem?
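For illustration, here is a minimal sketch of the shape argument above. It assumes TargetFC applies the per-sample weights as `x @ w.t() + b.t()` with the shapes described in the question; the repository's exact TargetFC implementation may differ.

```python
import torch

# Sketch only: TargetFC is assumed to compute x @ w.t() + b.t(),
# matching the shapes described in the question.
batch_size = 4
x = torch.randn(batch_size, 100)   # input to self.lin (after fc1/bn1/relu1/drop1)
w = torch.randn(batch_size, 100)   # paras['res_last_out_w']
b = torch.randn(batch_size, 1)     # paras['res_last_out_b']

out = x @ w.t() + b.t()            # shape [batch_size, batch_size]
print(out.shape)                   # torch.Size([4, 4]) -- depends on batch_size

# With batch_size == 1 the output is [1, 1], and softmax over a single
# element is always exactly 1:
print(torch.softmax(torch.randn(1, 1), dim=1))   # tensor([[1.]])
```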

  2. In the Attention function:
def Attention(x):
    batch_size, in_channels, h, w = x.size()
    quary = x.view(batch_size, in_channels, -1)
    key = quary
    quary = quary.permute(0, 2, 1)

    sim_map = torch.matmul(quary, key)

    ql2 = torch.norm(quary, dim=2, keepdim=True)
    kl2 = torch.norm(key, dim=1, keepdim=True)
    sim_map = torch.div(sim_map, torch.matmul(ql2, kl2).clamp(min=1e-8))

    return sim_map

The implementation here seems to differ from what the paper describes. What this does is divide the similarity map of the values by the similarity map of their norms (i.e. a cosine-similarity map), rather than the standard attention with V removed, as stated in the paper.
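For comparison, here is the repository's similarity computation next to a sketch of "standard attention with V removed". The second function is only an illustration of what the paper's description suggests; it is not code from the paper or the repository.

```python
import torch
import torch.nn.functional as F

def attention_as_implemented(x):
    # Cosine-similarity map, as in the repository's Attention(x):
    # pairwise dot products divided by the product of the L2 norms.
    b, c, h, w = x.size()
    q = x.view(b, c, -1)                      # [b, c, h*w]
    k = q
    q = q.permute(0, 2, 1)                    # [b, h*w, c]
    sim = torch.matmul(q, k)                  # [b, h*w, h*w]
    ql2 = torch.norm(q, dim=2, keepdim=True)  # [b, h*w, 1]
    kl2 = torch.norm(k, dim=1, keepdim=True)  # [b, 1, h*w]
    return sim / torch.matmul(ql2, kl2).clamp(min=1e-8)

def attention_without_v(x):
    # Sketch of "standard" self-attention with V removed:
    # softmax(Q K^T / sqrt(d)) over the key dimension.
    b, c, h, w = x.size()
    q = x.view(b, c, -1).permute(0, 2, 1)     # [b, h*w, c]
    k = x.view(b, c, -1)                      # [b, c, h*w]
    scores = torch.matmul(q, k) / (c ** 0.5)  # [b, h*w, h*w]
    return F.softmax(scores, dim=-1)

x = torch.randn(2, 8, 4, 4)
print(attention_as_implemented(x).shape)      # torch.Size([2, 16, 16])
print(attention_without_v(x).shape)           # torch.Size([2, 16, 16])
```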


Wunaiq commented Jan 13, 2023

I ran into the same two problems; in particular, issue 1 causes inference results to change when batch_size changes.
I am also curious why it was done this way: what difference would it make to simply define w and b as learnable parameters here, compared with the current approach?
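For concreteness, a minimal sketch of that alternative (hypothetical, not the repository's code): define the last layer as an ordinary nn.Linear so that w and b are learnable parameters of TargetNet and the output shape no longer depends on batch_size. The layer sizes other than 100 → 1 are assumptions.

```python
import torch.nn as nn

# Hypothetical alternative: own a learnable weight and bias via nn.Linear(100, 1)
# instead of building self.lin from res_last_out_w / res_last_out_b at forward time.
class TargetNetLearnable(nn.Module):
    def __init__(self, in_features=224):   # in_features is an assumption
        super().__init__()
        self.fc1 = nn.Linear(in_features, 100)
        self.bn1 = nn.BatchNorm1d(100)
        self.relu1 = nn.ReLU()
        self.drop1 = nn.Dropout(0.5)
        self.lin = nn.Linear(100, 1)        # learnable w and b

    def forward(self, x):
        q = self.drop1(self.relu1(self.bn1(self.fc1(x))))
        return self.lin(q)                  # [batch_size, 1], independent of batch_size
```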

woshidandan (Owner) commented

> [quotes the original question above]

Sorry, I've been busy with other work recently and hadn't logged into this account, so I didn't see the question.
For the first question: there is no particular reason. It is mainly a trade-off we made when embedding the model on mobile devices; we wanted to bring the dimensionality and the amount of computation down quickly, so we set the shape to 1. For the second question: this is a simplified form of attention, also motivated by mobile-side optimization. Strictly speaking it only computes similarity, but the IJCAI page limit made it inconvenient to explain, so in the paper we had to frame it as self-attention.
Also, we have run experiments: changing the shape or using the original self-attention does not affect performance much. You are welcome to try it yourself.
