Factorization Machines (FM) are commonly used to model feature interactions in large-scale sparse data.

1. Linear model, which ignores interactions between features:
![](http://7xnkah.com1.z0.glb.clouddn.com/%E5%BE%AE%E4%BF%A1%E6%88%AA%E5%9B%BE_20171107153306.png)

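2. Degree-2 polynomial model, which adds pairwise feature interactions; with sparse data, however, the quadratic coefficients are hard to train, because few samples have both features of a pair non-zero: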
![](http://7xnkah.com1.z0.glb.clouddn.com/%E5%BE%AE%E4%BF%A1%E6%88%AA%E5%9B%BE_20171107151915.png)

3. FM model, which introduces an auxiliary (latent) vector V for each feature dimension:
![](http://7xnkah.com1.z0.glb.clouddn.com/%E5%BE%AE%E4%BF%A1%E6%88%AA%E5%9B%BE_20171107152131.png)
where
![](http://7xnkah.com1.z0.glb.clouddn.com/%E5%BE%AE%E4%BF%A1%E6%88%AA%E5%9B%BE_20171107152048.png)
The last term of the expression above can be rewritten as follows, which reduces its complexity from O(kn^2) to O(kn):
![](http://7xnkah.com1.z0.glb.clouddn.com/%E5%BE%AE%E4%BF%A1%E6%88%AA%E5%9B%BE_20171107152140.png)
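
For readers who cannot load the formula images, the three models and the simplification can be written out in LaTeX. This is the standard FM formulation; the symbols (n features, k latent factors, v_{i,f} for the f-th component of the vector of feature i) are assumed to match the images.

```latex
% Linear model
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i

% Degree-2 polynomial model
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
           + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} x_i x_j

% FM model: w_{ij} is replaced by an inner product of latent vectors
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
           + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j,
\qquad
\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} v_{j,f}

% Simplified interaction term, O(kn) instead of O(kn^2)
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{f=1}^{k} \left[
    \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^{2}
    - \sum_{i=1}^{n} v_{i,f}^{2} x_i^{2}
  \right]
```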

Only the main FM formulas are listed here, as a reference for the implementation below; for a more detailed introduction, see http://blog.csdn.net/jediael_lu/article/details/77772565
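
The claim that the pairwise interaction sum equals its simplified form can be checked numerically. The following is a minimal NumPy sketch; the function names `interactions_naive` and `interactions_fast` and the random test data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def interactions_naive(x, V):
    """Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, computed in O(k n^2)."""
    n = x.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.dot(V[i], V[j]) * x[i] * x[j]
    return total


def interactions_fast(x, V):
    """Same term via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2], in O(k n)."""
    xv = x @ V  # shape (k,)
    return 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))


# Hypothetical data: n = 4 features, k = 5 latent factors
rng = np.random.default_rng(0)
x = rng.normal(size=4)
V = rng.normal(size=(4, 5))

print(np.isclose(interactions_naive(x, V), interactions_fast(x, V)))
```

The fast form is what the TensorFlow code below computes for a whole batch at once, with `X` in place of the single sample `x`.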

In [1]:
import numpy as np
from sklearn.datasets import load_iris
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn import metrics



In [2]:
# Load the iris data
iris = load_iris()  
x = iris["data"]  
y = iris["target"].reshape(-1,1) 

# Drop the samples with label 2 to obtain a binary classification problem
idxs = (y !=2).flatten()
x, y = x[idxs], y[idxs]  

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

In [3]:
x_train.shape, x_test.shape

((70, 4), (30, 4))

In [4]:
import tensorflow as tf

In [5]:
n = x_train.shape[1]
# Number of latent factors: a larger k gives more fitting capacity,
# a smaller k tends to generalize better
k = 5

X = tf.placeholder('float', shape=[None, n])
y = tf.placeholder('float', shape=[None, 1])

# Initialize the bias (order-0 term), the linear weights (order-1 term) and the auxiliary vectors V
w0 = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.zeros([n])) 
V = tf.Variable(tf.random_normal([n, k], stddev=0.01))

In [6]:
# First two parts of the FM formula: bias plus linear terms
linear_terms = tf.add(w0,
                      tf.reduce_sum(tf.multiply(W, X),
                                    axis=1,
                                    keep_dims=True))

In [7]:
# Last part of the FM formula: pairwise interactions in the simplified O(kn) form
interactions = (tf.multiply(0.5,
                tf.reduce_sum(
                    tf.subtract(
                        tf.pow(tf.matmul(X, V), 2),
                        tf.matmul(tf.pow(X, 2), tf.pow(V, 2))),
                    1, keep_dims=True)))

In [8]:
# Alternative: MSE loss (for regression), left commented out
# y_hat = tf.add(linear_terms, interactions)
# loss = tf.reduce_mean(tf.square(tf.subtract(y, y_hat)))

In [9]:
# Cross-entropy loss
logits = tf.add(linear_terms, interactions)
y_hat = tf.sigmoid(logits)
loss = tf.losses.sigmoid_cross_entropy(y, logits)

INFO:tensorflow:logits.dtype=<dtype: 'float32'>.
INFO:tensorflow:multi_class_labels.dtype=<dtype: 'float32'>.
INFO:tensorflow:losses.dtype=<dtype: 'float32'>.
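
Note that the loss above is computed from `logits`, not from the probabilities `y_hat`. As a rough illustration of what such a loss computes, here is a NumPy sketch of the numerically stable per-example formula documented for `tf.nn.sigmoid_cross_entropy_with_logits`, averaged over the batch. The function name `sigmoid_cross_entropy` and the sample values are assumptions made for this example; this is not TensorFlow's implementation.

```python
import numpy as np


def sigmoid_cross_entropy(labels, logits):
    """Mean binary cross-entropy computed from raw logits.

    Uses max(z, 0) - z * t + log(1 + exp(-|z|)), which avoids
    overflow for logits of large magnitude.
    """
    labels = np.asarray(labels, dtype=float)
    logits = np.asarray(logits, dtype=float)
    per_example = (np.maximum(logits, 0.0)
                   - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))
    return per_example.mean()


# Hypothetical values: three samples with their logits and 0/1 labels
z = np.array([-2.0, 0.5, 3.0])
t = np.array([0.0, 1.0, 1.0])
print(sigmoid_cross_entropy(t, z))
```

For moderate logits this matches the textbook form `-mean(t * log(p) + (1 - t) * log(1 - p))` with `p = sigmoid(z)`, which is also what `metrics.log_loss` reports.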


In [10]:
eta = tf.constant(0.1)
optimizer = tf.train.AdagradOptimizer(eta).minimize(loss)

In [11]:
N_EPOCHS = 2000

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

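    for epoch in range(N_EPOCHS):
        # Each step feeds the whole training set (full-batch training),
        # so the shuffle only reorders rows within the batch
        indices = np.arange(x_train.shape[0])
        np.random.shuffle(indices)
        x_data, y_data = x_train[indices], y_train[indices]
        sess.run(optimizer, feed_dict={X: x_data, y: y_data})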

    # loss is the sigmoid cross-entropy defined above, not MSE
    print('train loss: ', sess.run(loss, feed_dict={X: x_train, y: y_train}))
    print('test loss: ', sess.run(loss, feed_dict={X: x_test, y: y_test}))
    test_pred = sess.run(y_hat, feed_dict={X: x_test}).flatten()
    print('test AUC: ', metrics.roc_auc_score(y_score=test_pred, y_true=y_test.flatten()))
    print('test Logloss: ', metrics.log_loss(y_pred=test_pred.tolist(), y_true=y_test.flatten()))

train loss:  9.71238e-05
test loss:  1.99851e-05
test AUC:  1.0
test Logloss:  1.99830922837e-05
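
The code above relies on the TensorFlow 1.x graph API (`tf.placeholder`, `tf.Session`). For readers without that environment, the same model can be sketched in plain NumPy with hand-derived gradients. Everything here is an assumption made for illustration: the function names, the synthetic data, and the use of plain full-batch gradient descent in place of the Adagrad optimizer used above.

```python
import numpy as np


def fm_logits(X, w0, w, V):
    """FM output before the sigmoid, using the O(kn) interaction term."""
    XV = X @ V
    inter = 0.5 * np.sum(XV ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return w0 + X @ w + inter


def log_loss_from_logits(z, y):
    """Mean sigmoid cross-entropy in a numerically stable form."""
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))


def train_fm(X, y, k=5, lr=0.05, n_epochs=300, seed=0):
    """Full-batch gradient descent on the FM log loss (not Adagrad)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w0, w = 0.0, np.zeros(n)
    V = rng.normal(scale=0.01, size=(n, k))
    losses = []
    for _ in range(n_epochs):
        z = fm_logits(X, w0, w, V)
        losses.append(log_loss_from_logits(z, y))
        # d(loss)/d(logit) for mean cross-entropy: (sigmoid(z) - y) / m
        d = (1.0 / (1.0 + np.exp(-z)) - y) / m
        XV = X @ V
        # d(logit)/dV[i, f] = x_i * (XV)_f - V[i, f] * x_i^2
        grad_V = X.T @ (d[:, None] * XV) - V * ((X ** 2).T @ d)[:, None]
        w0 -= lr * d.sum()
        w -= lr * (X.T @ d)
        V -= lr * grad_V
    return w0, w, V, losses


# Synthetic binary labels with one linear and one interaction effect
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] * X_demo[:, 2] > 0).astype(float)

w0, w, V, losses = train_fm(X_demo, y_demo)
print('initial loss:', losses[0], 'final loss:', losses[-1])
```

The gradient with respect to `V` follows directly from the simplified interaction term, so each epoch costs O(kn) per sample, the same as the forward pass.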


References:  
http://blog.csdn.net/jediael_lu/article/details/77772565  
http://nowave.it/factorization-machines-with-tensorflow.html  
http://blog.csdn.net/u013818406/article/details/70194575  
http://www.cnblogs.com/pinard/p/6370127.html  
https://tech.meituan.com/deep-understanding-of-ffm-principles-and-practices.html  