# FM: Feature Crossing with Latent Vectors

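$FM = w_0 + \sum_{i=1}^n w_ix_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij}x_ix_j$

The first two terms are simply the first-order weighted features, with computational complexity $O(n)$. The weight $w_{ij}$ in the third term is obtained by matrix factorization, $W=V^TV$, where $v_i$ and $v_j$ are the latent vectors of $x_i$ and $x_j$.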

<img src="../data/img/fm.jpeg" style="zoom:50%"/>

$FM = w_0 + \sum_{i=1}^n w_ix_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^n \langle v_i, v_j \rangle x_ix_j$

$\langle v_i, v_j \rangle = v_i \cdot v_j = \sum_{f=1}^k v_{if}v_{jf}$

$\sum_{i=1}^n \sum_{j=1}^n x_ix_j = (\sum_{i=1}^n x_i)^2$

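Assume the latent vectors have length $k$. The cross term then has $kn$ parameters instead of $n(n-1)/2$. Computed directly, the time complexity is still $O(kn^2)$, but it can be reduced to $O(kn)$ as shown in the figure below: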

<img src="../data/img/fm_simple.webp" style="zoom:50%"/>
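
The steps behind this simplification can be written out as follows. This is a sketch of the standard derivation; it relies only on the fact that $\langle v_i, v_j \rangle x_ix_j$ is symmetric in $i$ and $j$, so the sum over $i<j$ is half of the full double sum minus the diagonal terms:

$$
\begin{aligned}
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_ix_j
&= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle v_i, v_j \rangle x_ix_j - \frac{1}{2} \sum_{i=1}^{n} \langle v_i, v_i \rangle x_i^2 \\
&= \frac{1}{2} \sum_{f=1}^{k} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} v_{if}v_{jf}x_ix_j - \sum_{i=1}^{n} v_{if}^2x_i^2 \right) \\
&= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{if}x_i \right)^2 - \sum_{i=1}^{n} v_{if}^2x_i^2 \right]
\end{aligned}
$$

Each of the $k$ inner sums runs over $n$ terms once, which is where the $O(kn)$ cost comes from.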

Final formula:
$FM = w_0 + \sum_{i=1}^n w_ix_i + \frac {1}{2} \sum_{f=1}^k [(\sum_{i=1}^n v_{if}x_i)^2 - \sum_{i=1}^n v_{if}^2x_i^2]$
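
As a quick sanity check, the explicit double sum and the simplified form can be compared numerically. The snippet below is a minimal pure-Python sketch, not part of the model code; the function names `fm_pairwise_naive` and `fm_pairwise_fast` and the random test data are assumptions made for illustration.

```python
import random


def fm_pairwise_naive(x, V):
    """Second-order FM term via the explicit double sum, O(k * n^2)."""
    n, k = len(x), len(V[0])
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same term via square-of-sum minus sum-of-squares, O(k * n)."""
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - sq
    return 0.5 * total


random.seed(0)
n, k = 6, 3  # hypothetical sizes
x = [random.random() for _ in range(n)]
V = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
print(abs(naive - fast) < 1e-9)
```

The two functions agree up to floating-point rounding, while the second one never forms the pairwise products.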

In [None]:
import numpy as np
import torch
import torch.nn as nn


class FactorizationMachine(nn.Module):
    """
        Factorization Machine
    """

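    def __init__(self, feature_fields, embed_dim):
        """
            feature_fields : array_like
                             number of distinct categories in each categorical field
            embed_dim      : int
                             length k of each latent vector
        """
        super(FactorizationMachine, self).__init__()

        # Inputs are label-encoded ids; an embedding with output size 1 acts as the linear part
        self.linear = torch.nn.Embedding(sum(feature_fields) + 1, 1)
        self.bias = torch.nn.Parameter(torch.zeros((1,)))

        self.embedding = torch.nn.Embedding(sum(feature_fields) + 1, embed_dim)
        # Start index of each field in the shared embedding table.
        # np.int64 replaces np.long, which is not available in every NumPy release.
        self.offset = np.array((0, *np.cumsum(feature_fields)[:-1]), dtype=np.int64)
        nn.init.xavier_uniform_(self.embedding.weight.data)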

    def forward(self, x):
        # x: [bs, num_fields] label-encoded ids (e.g. [bs, 22]); shift them into the global id space
        tmp = x + x.new_tensor(self.offset).unsqueeze(0)

        # Linear part
        # [bs, num_fields, 1] -> [bs, 1] (e.g. [bs, 22, 1] -> [bs, 1])
        linear_part = torch.sum(self.linear(tmp), dim=1) + self.bias

        # Inner-product (second-order) part
        # Embedding lookup: [bs, num_fields] -> [bs, num_fields, embed_dim] (e.g. [bs, 22, 8])
        tmp = self.embedding(tmp)
        # Square of sum: [bs, num_fields, embed_dim] -> [bs, embed_dim]
        square_of_sum = torch.sum(tmp, dim=1) ** 2
        # Sum of squares: [bs, num_fields, embed_dim] -> [bs, embed_dim]
        sum_of_square = torch.sum(tmp ** 2, dim=1)
        # Linear part plus the FM second-order part: [bs, 1]
        x = linear_part + 0.5 * torch.sum(square_of_sum - sum_of_square, dim=1, keepdim=True)
        # Sigmoid turns the score into a probability per sample: [bs]
        x = torch.sigmoid(x.squeeze(1))
        return x
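
The offset step in `forward` is easy to miss, so here is a small standalone illustration of what it does. This is a sketch in plain Python rather than the tensor code above; the helper names `field_offsets` and `to_global_index` and the field sizes are hypothetical.

```python
from itertools import accumulate


def field_offsets(field_dims):
    """Start position of each field in the shared embedding table."""
    return [0] + list(accumulate(field_dims))[:-1]


def to_global_index(sample, field_dims):
    """Shift per-field label-encoded ids into one global id space."""
    return [idx + off for idx, off in zip(sample, field_offsets(field_dims))]


field_dims = [2, 3, 4]  # hypothetical: 2 genders, 3 age buckets, 4 occupations
sample = [1, 0, 3]      # label-encoded id within each field

print(field_offsets(field_dims))            # [0, 2, 5]
print(to_global_index(sample, field_dims))  # [1, 2, 8]
```

Because every field starts at its own offset, id 0 of the second field and id 0 of the first field land on different rows of the embedding table.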

## FFM

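FFM improves on FM by introducing the concept of a feature field: a field groups the different values of one feature. FM treats crosses between different features as equally important, using one latent vector per feature no matter what it is crossed with. FFM instead holds that crosses with different features have different effects. A simple example:
Suppose there are three features: gender, age, and occupation. When crossed with the "cleaner" feature of the occupation field, the latent vector of "male" is $v_{\text{male},\text{occupation}}$; when crossed with the "middle-aged" feature of the age field, the latent vector of "male" is $v_{\text{male},\text{age}}$.
This view matches real-world scenarios better, since the weights of different feature crosses should indeed differ.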

<img src="../data/img/ffm_table.png"  style="zoom:50%"/>


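The FFM model holds that $v_i$ depends not only on $x_i$ but also on the field of the $x_j$ it is multiplied with. In other words, $v_i$ becomes a two-dimensional array $v_{F\times K}$, where $F$ is the total number of fields and $K$ is the latent dimension. The formula below keeps only the second-order term of FM.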

$\hat y =\sum_{i=1}^n\sum_{j=i+1}^n v_{i,fj}\cdot v_{j,fi}x_ix_j$
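
The field-aware lookup in this formula can be sketched in a few lines of plain Python. This is an illustrative sketch only: the name `ffm_pairwise`, the `field_of` list, and the nested-list layout `V[i][f]` (the latent vector feature $i$ uses against field $f$) are assumptions, and indices are 0-based here whereas the formulas are 1-based.

```python
import random


def ffm_pairwise(x, field_of, V):
    """Second-order FFM term.

    x        : list of n feature values
    field_of : field_of[i] is the field index of feature i
    V        : V[i][f] is the k-dim latent vector feature i uses
               against features that belong to field f
    """
    n = len(x)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if x[i] == 0 or x[j] == 0:
                continue  # inactive features contribute nothing
            v_i = V[i][field_of[j]]  # v_{i, f_j}
            v_j = V[j][field_of[i]]  # v_{j, f_i}
            total += sum(a * b for a, b in zip(v_i, v_j)) * x[i] * x[j]
    return total


random.seed(0)
k = 3
field_of = [0, 1, 1, 1]   # feature 0 is in field 0; features 1..3 share field 1
x = [1.0, 1.0, 0.0, 0.0]  # one active feature per field
V = [[[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(2)]
     for _ in range(len(x))]

y_hat = ffm_pairwise(x, field_of, V)
# Only the pair (0, 1) is active, so y_hat equals <V[0][1], V[1][0]>
expected = sum(a * b for a, b in zip(V[0][1], V[1][0]))
print(abs(y_hat - expected) < 1e-12)
```

Unlike FM, the vector picked for feature $i$ changes with the field of its partner, which is why the square-of-sum trick no longer applies.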

Taking the data in the table above as an example, compute $\hat{y}$ for user 1:

$\hat y =v_{1,f2} \cdot v_{2,f1} x_1x_2 + v_{1,f3}\cdot v_{3,f1}x_1x_3 + v_{1,f4}\cdot v_{4,f1}x_1x_4 + ⋯$
 

Since $x_2,x_3,x_4$ belong to the same field, $f2,f3,f4$ can be replaced by a single variable, say $f2$.

$\hat y =v_{1,f2} \cdot v_{2,f1} x_1x_2 + v_{1,f2}\cdot v_{3,f1}x_1x_3 + v_{1,f2}\cdot v_{4,f1}x_1x_4 + ⋯$

 
Now compute the partial derivative of $\hat{y}$ with respect to $v_{1,f2}$.

$ \frac {\partial \hat y}{\partial v_{1, f2}} = v_{2,f1}x_1x_2 + v_{3,f1}x_1x_3 + v_{4,f1}x_1x_4$

Note that $x_2,x_3,x_4$ are the one-hot encoding of a single attribute, so exactly one of them is 1 and the others are 0. In this example $x_3=x_4=0, x_2=1$, so

$ \frac {\partial \hat y}{\partial v_{1, f2}} = v_{2,f1}x_1x_2 $

In the general case, where $x_j$ is the active feature of field $fj$:
$ \frac {\partial \hat y}{\partial v_{i, fj}} = v_{j,fi}x_ix_j $


$z=\phi(v,x)=\sum_{i=1}^n \sum_{j=i+1}^n v_{i,fj}\cdot v_{j,fi}x_ix_j$

$ \frac {\partial z}{\partial v_{i, fj}} = v_{j,fi}x_ix_j $

$a=\sigma(z)=\frac {1}{1+ e^{-z}}= \frac{1}{1 + e^{-\phi(v,x)}}$

Let $y=0$ denote a negative sample and $y=1$ a positive sample, and let $C$ denote the cross-entropy loss.

$κ=\frac {\partial C}{\partial z} = a - y $
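
By the chain rule, the gradient used in training is $\frac{\partial C}{\partial v_{i,fj}} = \kappa \, v_{j,fi}x_ix_j$. The snippet below is a minimal sketch that checks this expression against a central finite difference on a toy sample. The names `phi`, `loss`, `grad_v`, the `field_of` list, and the nested-list layout `V[i][f]` are assumptions made for illustration, and indices are 0-based.

```python
import math
import random


def phi(x, field_of, V):
    """z = sum over i<j of <v_{i,f_j}, v_{j,f_i}> * x_i * x_j."""
    n = len(x)
    z = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i][field_of[j]], V[j][field_of[i]]))
            z += dot * x[i] * x[j]
    return z


def loss(x, field_of, V, y):
    """Binary cross-entropy on a = sigmoid(z)."""
    a = 1.0 / (1.0 + math.exp(-phi(x, field_of, V)))
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))


def grad_v(x, field_of, V, y, i, f):
    """Analytic dC/dv_{i,f}: kappa * sum of v_{j,f_i} x_i x_j over features j in field f."""
    a = 1.0 / (1.0 + math.exp(-phi(x, field_of, V)))
    kappa = a - y
    k = len(V[i][f])
    g = [0.0] * k
    for j in range(len(x)):
        if j != i and field_of[j] == f:
            for d in range(k):
                g[d] += kappa * V[j][field_of[i]][d] * x[i] * x[j]
    return g


random.seed(0)
field_of = [0, 1, 1, 1]   # feature 0 in field 0; features 1..3 share field 1
x = [1.0, 1.0, 0.0, 0.0]  # one-hot within field 1
k, num_fields, y = 3, 2, 1
V = [[[random.gauss(0.0, 0.5) for _ in range(k)] for _ in range(num_fields)]
     for _ in range(len(x))]

analytic = grad_v(x, field_of, V, y, i=0, f=1)

# Central finite difference on each component of V[0][1]
eps = 1e-6
numeric = []
for d in range(k):
    V[0][1][d] += eps
    up = loss(x, field_of, V, y)
    V[0][1][d] -= 2 * eps
    down = loss(x, field_of, V, y)
    V[0][1][d] += eps  # restore the original value
    numeric.append((up - down) / (2 * eps))

max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_err < 1e-6)
```

Because the inactive features have $x_j=0$, only one term survives in the gradient, matching the one-hot argument above.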

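Drawback: the FFM formula cannot be simplified the way the FM formula can, so its computational cost is higher. FFM has to learn a $k$-dimensional latent vector for each of the $n$ features in each of the $f$ fields, which gives $nfk$ parameters and a time complexity of $O(kn^2)$.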
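
To make the difference in size concrete, the cross-term parameter counts of the two models can be compared directly. The sizes below are hypothetical and chosen only for illustration; the helper names are not part of the model code.

```python
def fm_cross_params(n, k):
    """FM: one k-dim latent vector per feature."""
    return n * k


def ffm_cross_params(n, f, k):
    """FFM: one k-dim latent vector per feature per field."""
    return n * f * k


n, f, k = 1000, 22, 8  # hypothetical: 1000 features, 22 fields, latent size 8

print(fm_cross_params(n, k))      # 8000
print(ffm_cross_params(n, f, k))  # 176000
```

The FFM count grows by a factor of $f$ relative to FM for the same $n$ and $k$.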


In [None]:
import numpy as np
import torch
import torch.nn as nn


class FieldAwareFactorizationMachine(nn.Module):
    """
        FFM 
    """

    def __init__(self, field_dims, embed_dim):
        super(FieldAwareFactorizationMachine, self).__init__()

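        # Start index of each field in the shared embedding tables.
        # np.int64 replaces np.long, which is not available in every NumPy release.
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)

        # Linear part: inputs are label-encoded ids, so an embedding with
        # output size 1 acts as the linear weights
        self.linear = torch.nn.Embedding(sum(field_dims) + 1, 1)
        self.bias = torch.nn.Parameter(torch.zeros((1,)))

        # FFM part
        print("field_dims", field_dims)
        self.num_fields = len(field_dims)  # number of feature fields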
        self.embeddings = torch.nn.ModuleList([
            torch.nn.Embedding(sum(field_dims), embed_dim) for _ in range(self.num_fields)
        ])
        for embedding in self.embeddings:
            torch.nn.init.xavier_uniform_(embedding.weight.data)

    def forward(self, x):
        # x: [bs, num_fields] label-encoded ids (e.g. [bs, 22]); shift them into the global id space
        tmp = x + x.new_tensor(self.offsets).unsqueeze(0)
        # Linear part: [bs, num_fields, 1] -> [bs, 1] (e.g. [bs, 22, 1] -> [bs, 1])
        linear_part = torch.sum(self.linear(tmp), dim=1) + self.bias
        # FFM part: one embedding table per field.
        # Look up the offset ids (tmp), not the raw x; with raw x, ids from
        # different fields would collide on the same embedding rows.
        # Each xs[f] has shape [bs, num_fields, embed_dim] (e.g. [bs, 22, 8])
        xs = [self.embeddings[i](tmp) for i in range(self.num_fields)]
        ix = []
        for i in range(self.num_fields - 1):
            for j in range(i + 1, self.num_fields):
                # xs[j][:, i]: latent vector of field i's feature against field j, [bs, embed_dim]
                # xs[i][:, j]: latent vector of field j's feature against field i, [bs, embed_dim]
                ix.append(xs[j][:, i] * xs[i][:, j])
        # With 22 fields there are 22 * 21 / 2 = 231 pairs, each of shape [bs, embed_dim]
        ix = torch.stack(ix, dim=1)  # [bs, num_pairs, embed_dim]
        # Sum over pairs, then over the embedding dimension:
        # [bs, num_pairs, embed_dim] -> [bs, embed_dim] -> [bs, 1]
        ffm_part = torch.sum(torch.sum(ix, dim=1), dim=1, keepdim=True)

        x = linear_part + ffm_part
        x = torch.sigmoid(x.squeeze(1))
        return x