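### Data preprocessing

- UserID, Occupation (loaded as `JobID` in the code), and MovieID are left unchanged.
- Gender: 'F' and 'M' are converted to 0 and 1.
- Age: mapped to 7 consecutive integers, 0 to 6.
- Genres: a categorical field that must be converted to numbers. First build a dictionary from genre strings to integers, then convert each movie's Genres field into a list of integers, because some movies combine several genres.
- Title: handled the same way as Genres. First build a dictionary from words to integers, then convert each Title into a list of integers. The year in the Title is also removed.
- Genres and Title must be padded to a uniform length so that the neural network can process them easily. Empty positions are filled with the integer that corresponds to '<PAD>'.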
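
The Genres handling described in the list above can be sketched on a few rows. This is a minimal illustration, not the notebook's code: the sample rows, the `encode` helper, and the pad length of 4 are assumptions, and the vocabulary is sorted so the ids are reproducible (the notebook enumerates a `set`, so its ids may differ between runs).

```python
# Hypothetical sample rows in the movies.dat "Genres" format.
sample_genres = [
    "Animation|Children's|Comedy",
    "Adventure|Children's|Fantasy",
    "Comedy|Romance",
]

# Build the string-to-int dictionary; sorting keeps the ids stable.
vocab = sorted({g for row in sample_genres for g in row.split('|')})
vocab.append('<PAD>')
genres2int = {name: idx for idx, name in enumerate(vocab)}


def encode(row, length):
    """Map one 'A|B|C' string to a list of ids, right-padded with <PAD>."""
    ids = [genres2int[g] for g in row.split('|')]
    return ids + [genres2int['<PAD>']] * (length - len(ids))


pad_len = 4  # illustrative only; the notebook pads Genres to 18
encoded = [encode(row, pad_len) for row in sample_genres]
print(encoded)
```

Every encoded row has the same length, which is what lets the rows be stacked into a fixed-shape input later.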

In [1]:
import numpy as np
import pandas as pd
from collections import Counter
import tensorflow as tf
from tensorflow.python.ops import math_ops
from sklearn.model_selection import train_test_split
import re
import pickle



In [2]:
def load_data():
    """
    Load the MovieLens data files and preprocess them.
    """
    # Read the user data
    users_title = ['UserID', 'Gender', 'Age', 'JobID', 'Zip-code']
    users = pd.read_csv('./ml-1m/ml-1m/users.dat',
                        sep='::',
                        header=None,
                        names=users_title,
                        engine='python')
    users = users.filter(regex='UserID|Gender|Age|JobID')
    users_orig = users.values
    # Map gender and age in the user data to integers
    gender_map = {'F': 0, 'M': 1}
    users['Gender'] = users['Gender'].map(gender_map)

    age_map = {val: ii for ii, val in enumerate(set(users['Age']))}
    users['Age'] = users['Age'].map(age_map)

    # Read the movie data
    movies_title = ['MovieID', 'Title', 'Genres']
    movies = pd.read_csv('./ml-1m/ml-1m/movies.dat',
                         sep='::',
                         header=None,
                         names=movies_title,
                         engine='python')
    movies_orig = movies.values
    # Strip the year from Title
    pattern = re.compile(r'^(.*)\((\d+)\)$')

    title_map = {
        val: pattern.match(val).group(1)
        for ii, val in enumerate(set(movies['Title']))
    }
    movies['Title'] = movies['Title'].map(title_map)

    # Dictionary from genre name to integer
    genres_set = set()
    for val in movies['Genres'].str.split('|'):
        genres_set.update(val)

    genres_set.add('<PAD>')
    genres2int = {val: ii for ii, val in enumerate(genres_set)}

    # Convert each movie's genres into an equal-length list of integers (length 18)
    genres_map = {
        val: [genres2int[row] for row in val.split('|')]
        for ii, val in enumerate(set(movies['Genres']))
    }

    for key in genres_map:
        for cnt in range(max(genres2int.values()) - len(genres_map[key])):
            genres_map[key].insert(
                len(genres_map[key]) + cnt, genres2int['<PAD>'])

    movies['Genres'] = movies['Genres'].map(genres_map)

    # Dictionary from title word to integer
    title_set = set()
    for val in movies['Title'].str.split():
        title_set.update(val)

    title_set.add('<PAD>')
    title2int = {val: ii for ii, val in enumerate(title_set)}

    # Convert each Title into an equal-length list of integers (length 15)
    title_count = 15
    title_map = {
        val: [title2int[row] for row in val.split()]
        for ii, val in enumerate(set(movies['Title']))
    }

    for key in title_map:
        for cnt in range(title_count - len(title_map[key])):
            title_map[key].insert(
                len(title_map[key]) + cnt, title2int['<PAD>'])

    movies['Title'] = movies['Title'].map(title_map)

    # Read the ratings data
    ratings_title = ['UserID', 'MovieID', 'ratings', 'timestamps']
    ratings = pd.read_csv('./ml-1m/ml-1m/ratings.dat',
                          sep='::',
                          header=None,
                          names=ratings_title,
                          engine='python')
    ratings = ratings.filter(regex='UserID|MovieID|ratings')

    # Merge the three tables
    data = pd.merge(pd.merge(ratings, users), movies)

    # Split the data into features X and targets y
    target_fields = ['ratings']
    features_pd, targets_pd = data.drop(target_fields,
                                        axis=1), data[target_fields]

    features = features_pd.values
    targets_values = targets_pd.values

    return title_count, title_set, genres2int, features, targets_values, ratings, users, movies, data, movies_orig, users_orig
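
The year-stripping step in `load_data` can be checked in isolation. The sketch below reuses the same regular expression; the `strip_year` helper and its fallback for titles without a trailing year are assumptions added for illustration (the notebook's dict comprehension calls `.group(1)` directly and would raise `AttributeError` on such a title).

```python
import re

# Same pattern as in load_data: capture everything before a trailing "(digits)".
pattern = re.compile(r'^(.*)\((\d+)\)$')


def strip_year(title):
    """Return the title without its trailing '(year)', or unchanged if absent."""
    match = pattern.match(title)
    return match.group(1).strip() if match else title


titles = [
    'Toy Story (1995)',
    'Jumanji (1995)',
    'City of Lost Children, The (1995)',
]
# Splitting on whitespace yields the words that feed the title vocabulary.
words = [strip_year(t).split() for t in titles]
print(words)
```

Note that punctuation stays attached to words (for example `Children,`), so the vocabulary built this way treats `Children,` and `Children` as different tokens.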

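### Loading the data and saving it locally

- title_count: length of the Title field (15)
- title_set: the set of words that appear in Title
- genres2int: dictionary from genre name to integer
- features: the input X
- targets_values: the learning target y
- ratings: the ratings data as a Pandas object
- users: the user data as a Pandas object
- movies: the movie data as a Pandas object
- data: the three datasets merged into one Pandas object
- movies_orig: the original movie data, before preprocessing
- users_orig: the original user data, before preprocessing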

In [3]:
title_count, title_set, genres2int, features, targets_values, ratings, users, movies, data, movies_orig, users_orig = load_data()

# Use a context manager so the file is closed after writing
with open('preprocess.p', 'wb') as f:
    pickle.dump((title_count, title_set, genres2int, features, targets_values,
                 ratings, users, movies, data, movies_orig, users_orig), f)

### Preprocessed data

In [4]:
users.head()

|   | UserID | Gender | Age | JobID |
|---|--------|--------|-----|-------|
| 0 | 1 | 0 | 0 | 10 |
| 1 | 2 | 1 | 5 | 16 |
| 2 | 3 | 1 | 6 | 15 |
| 3 | 4 | 1 | 2 | 7 |
| 4 | 5 | 1 | 6 | 20 |


In [5]:
movies.head()

|   | MovieID | Title | Genres |
|---|---------|-------|--------|
| 0 | 1 | [2382, 3040, 4027, 4027, 4027, 4027, 4027, 402... | [4, 3, 2, 17, 17, 17, 17, 17, 17, 17, 17, 17, ... |
| 1 | 2 | [4563, 4027, 4027, 4027, 4027, 4027, 4027, 402... | [11, 3, 7, 17, 17, 17, 17, 17, 17, 17, 17, 17,... |
| 2 | 3 | [2783, 2857, 2260, 4027, 4027, 4027, 4027, 402... | [2, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17... |
| 3 | 4 | [4340, 2199, 4214, 4027, 4027, 4027, 4027, 402... | [2, 8, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,... |
| 4 | 5 | [2447, 992, 2892, 1195, 1191, 4404, 4027, 4027... | [2, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17... |


In [6]:
movies.values[0]

array([1,
       list([2382, 3040, 4027, 4027, 4027, 4027, 4027, 4027, 4027, 4027, 4027, 4027, 4027, 4027, 4027]),
       list([4, 3, 2, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17])],
      dtype=object)

In [7]:
# Load the preprocessed data from disk
with open('preprocess.p', mode='rb') as f:
    (title_count, title_set, genres2int, features, targets_values, ratings,
     users, movies, data, movies_orig, users_orig) = pickle.load(f)

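### Model design  
The usual approach is to one-hot encode these fields, but fields such as UserID and MovieID would become extremely sparse and the input dimension would blow up.  
So preprocessing converts these fields to integers, and the network uses each integer as an index into an embedding matrix. The first layer of the network is an embedding layer, with dimensions (N, 32) and (N, 16).  
Movie genres need one extra step. A movie can have several genres, so the embedding lookup returns an (n, 32) matrix, one row per genre. That matrix is summed over the genre axis to give a single (1, 32) vector.  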
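
The lookup-then-sum step for genres can be illustrated in plain Python. This is a sketch under stated assumptions: the embedding values are random, `embed_dim` is 4 instead of 32, and the vocabulary size and ids are hypothetical.

```python
import random

random.seed(0)
embed_dim = 4    # the notebook uses 32
num_genres = 6   # hypothetical vocabulary size; the last id (5) plays the role of <PAD>

# One row per genre id, as an embedding layer would hold.
embedding = [[random.uniform(-1, 1) for _ in range(embed_dim)]
             for _ in range(num_genres)]

genre_ids = [1, 3, 5, 5]                      # two real genres, then two <PAD> ids
rows = [embedding[i] for i in genre_ids]      # shape (n, embed_dim)
summed = [sum(col) for col in zip(*rows)]     # shape (embed_dim,)
print(summed)
```

The `<PAD>` rows take part in the sum here, which mirrors the notebook's `tf.reduce_sum` over the genre axis: it applies no mask, so the padding embedding is learned along with the real genres.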

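## Text convolutional network
The first layer of the network is a word embedding layer: an embedding matrix made up of the embedding vector of each word. The next layer convolves several kernels of different sizes (window sizes) over the embedding matrix, where the window size is the number of words each convolution covers. This differs from image convolution, which typically uses sizes such as 2x2, 3x3, or 5x5. A text kernel must span the whole embedding vector of each word, so its size is (number of words, embedding dimension), sliding over, for example, 3, 4, or 5 words at a time. The third layer is max pooling, which yields one long vector. Finally, dropout is applied for regularization, giving the features of the movie Title.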
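
To make the convolution and pooling steps concrete, here is a pure-Python sketch of one kernel sliding over a short title. The embeddings, the kernel values, and the `conv_maxpool` helper are hypothetical; the notebook itself uses `Conv2D` and `MaxPooling2D`.

```python
def conv_maxpool(embedded, kernel, bias=0.0):
    """Slide one (window, dim) kernel over the words, apply ReLU, then max-pool.

    Returns the pooled feature and the number of window positions.
    """
    window = len(kernel)
    outputs = []
    for start in range(len(embedded) - window + 1):
        total = bias
        # The kernel covers `window` consecutive words across the full embedding width.
        for k_row, e_row in zip(kernel, embedded[start:start + window]):
            total += sum(k * e for k, e in zip(k_row, e_row))
        outputs.append(max(total, 0.0))  # ReLU
    return max(outputs), len(outputs)


# Hypothetical 5-word title with 2-dimensional embeddings.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, -1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # window of 2 words

feature, positions = conv_maxpool(embedded, kernel)
print(feature, positions)
```

Each kernel contributes exactly one number after pooling, regardless of the window size, so kernels with different windows can be concatenated into one feature vector.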

### Helper functions

In [8]:
def save_params(params):
    '''
    Save parameters to a file
    '''
    with open('params.p', 'wb') as f:
        pickle.dump(params, f)


def load_params():
    '''
    Load parameters from a file
    '''
    with open('params.p', mode='rb') as f:
        return pickle.load(f)

### Implementation

In [9]:
# Embedding dimension
embed_dim = 32
# Number of user IDs
uid_max = max(features.take(0, 1)) + 1 # 6040 + 1 = 6041
# Number of genders
gender_max = max(features.take(2, 1)) + 1 # 1 + 1 = 2
# Number of age categories
age_max = max(features.take(3, 1)) + 1 # 6 + 1 = 7
# Number of occupations
job_max = max(features.take(4, 1)) + 1 # 20 + 1 = 21

# Number of movie IDs
movie_id_max = max(features.take(1, 1)) + 1 # 3952 + 1 = 3953
# Number of movie genres
movie_categories_max = max(genres2int.values()) + 1 # 18 + 1 = 19
# Number of words in movie titles
movie_title_max = len(title_set) # 5216

# Flag for summing the genre embedding vectors
combiner = 'sum'

# Length of a movie title
sentences_size = title_count # 15
# Text convolution window sizes: slide over 2, 3, 4, and 5 words
window_sizes = {2, 3, 4, 5}
# Number of text convolution kernels
filter_num = 8
# Dictionary from movie ID to row index; in the dataset the movie ID differs from the row index
movieid2idx = {val[0]:i for i, val in enumerate(movies.values)}
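
The reason for `movieid2idx` is easier to see on a toy table. The rows below are hypothetical; the only point is that an ID gap makes the ID and the row position diverge.

```python
# Hypothetical (MovieID, Title) rows with a gap in the IDs (no MovieID 3).
movie_rows = [(1, 'Toy Story'), (2, 'Jumanji'), (4, 'Waiting to Exhale')]

# Same construction as in the notebook: MovieID -> position in the table.
movieid2idx = {row[0]: i for i, row in enumerate(movie_rows)}

# Indexing the table by MovieID directly would fail or hit the wrong row;
# going through the dictionary finds the intended one.
print(movieid2idx[4], movie_rows[movieid2idx[4]][1])
```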

### Hyperparameters

In [10]:
# Number of training epochs and batch size
num_epochs = 5
batch_size = 256

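# Note: this value is passed to tf.keras.layers.Dropout as `rate` (the fraction dropped),
# not as a keep probability; at 0.5 the two readings coincide
dropout_keep = 0.5
learning_rate = 0.001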

show_every_n_batch = 20

save_dir = './save'

### Inputs
Define the input placeholders (Keras `Input` layers)

In [11]:
def get_inputs():
    uid = tf.keras.layers.Input(shape=(1, ), dtype='int32', name='uid')
    user_gender = tf.keras.layers.Input(shape=(1, ),
                                        dtype='int32',
                                        name='user_gender')
    user_age = tf.keras.layers.Input(shape=(1, ),
                                     dtype='int32',
                                     name='user_age')
    user_job = tf.keras.layers.Input(shape=(1, ),
                                     dtype='int32',
                                     name='user_job')

    movie_id = tf.keras.layers.Input(shape=(1, ),
                                     dtype='int32',
                                     name='movie_id')
    movie_categories = tf.keras.layers.Input(shape=(18, ),
                                             dtype='int32',
                                             name='movie_categories')
    movie_titles = tf.keras.layers.Input(shape=(15, ),
                                         dtype='int32',
                                         name='movie_titles')
    return uid, user_gender, user_age, user_job, movie_id, movie_categories, movie_titles

### Building the neural network
Define the user embedding matrices

In [12]:
def get_user_embedding(uid, user_gender, user_age, user_job):
    uid_embed_layer = tf.keras.layers.Embedding(uid_max,
                                                embed_dim,
                                                input_length=1,
                                                name='uid_embed_layer')(uid)
    gender_embed_layer = tf.keras.layers.Embedding(
        gender_max, embed_dim // 2, input_length=1,
        name='gender_embed_layer')(user_gender)
    age_embed_layer = tf.keras.layers.Embedding(
        age_max, embed_dim // 2, input_length=1,
        name='age_embed_layer')(user_age)
    job_embed_layer = tf.keras.layers.Embedding(
        job_max, embed_dim // 2, input_length=1,
        name='job_embed_layer')(user_job)
    return uid_embed_layer, gender_embed_layer, age_embed_layer, job_embed_layer

Pass the user embeddings through fully connected layers to produce the user features

In [13]:
def get_user_feature_layer(uid_embed_layer, gender_embed_layer,
                           age_embed_layer, job_embed_layer):
    # First fully connected layer
    uid_fc_layer = tf.keras.layers.Dense(embed_dim,
                                         name="uid_fc_layer",
                                         activation='relu')(uid_embed_layer)
    gender_fc_layer = tf.keras.layers.Dense(
        embed_dim, name="gender_fc_layer",
        activation='relu')(gender_embed_layer)
    age_fc_layer = tf.keras.layers.Dense(embed_dim,
                                         name="age_fc_layer",
                                         activation='relu')(age_embed_layer)
    job_fc_layer = tf.keras.layers.Dense(embed_dim,
                                         name="job_fc_layer",
                                         activation='relu')(job_embed_layer)

    # Second fully connected layer
    user_combine_layer = tf.keras.layers.concatenate(
        [uid_fc_layer, gender_fc_layer, age_fc_layer, job_fc_layer],
        2)  #(?, 1, 128)
    user_combine_layer = tf.keras.layers.Dense(200, activation='tanh')(
        user_combine_layer)  #(?, 1, 200)

    user_combine_layer_flat = tf.keras.layers.Reshape(
        [200], name="user_combine_layer_flat")(user_combine_layer)
    return user_combine_layer, user_combine_layer_flat
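
The shapes in the user tower can be traced with a little arithmetic. This sketch only counts dimensions and parameters; the `dense_params` helper is an assumption introduced for illustration (weights plus biases of a fully connected layer), not something the notebook defines.

```python
embed_dim = 32


def dense_params(in_dim, out_dim):
    """Weights plus biases of a fully connected layer."""
    return in_dim * out_dim + out_dim


# Embedding widths produced by get_user_embedding.
embed_dims = {
    'uid': embed_dim,
    'gender': embed_dim // 2,
    'age': embed_dim // 2,
    'job': embed_dim // 2,
}

# First layer: every embedding is projected to embed_dim.
fc_params = {name: dense_params(d, embed_dim) for name, d in embed_dims.items()}

# Second layer: the four projections are concatenated (4 * 32 = 128) and mapped to 200.
concat_dim = embed_dim * len(embed_dims)
combine_params = dense_params(concat_dim, 200)
print(fc_params, concat_dim, combine_params)
```

The concatenated width of 128 matches the `(?, 1, 128)` comment in the function above.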

Define the Movie ID embedding matrix

In [14]:
def get_movie_id_embed_layer(movie_id):
    movie_id_embed_layer = tf.keras.layers.Embedding(
        movie_id_max, embed_dim, input_length=1,
        name='movie_id_embed_layer')(movie_id)
    return movie_id_embed_layer

Combine the multiple embedding vectors of the movie genres

In [15]:
def get_movie_categories_layers(movie_categories):
    movie_categories_embed_layer = tf.keras.layers.Embedding(
        movie_categories_max,
        embed_dim,
        input_length=18,
        name='movie_categories_embed_layer')(movie_categories)
    movie_categories_embed_layer = tf.keras.layers.Lambda(
        lambda layer: tf.reduce_sum(layer, axis=1, keepdims=True))(
            movie_categories_embed_layer)
    #     movie_categories_embed_layer = tf.keras.layers.Reshape([1, 18 * embed_dim])(movie_categories_embed_layer)

    return movie_categories_embed_layer

Text convolutional network for the Movie Title

In [16]:
def get_movie_cnn_layer(movie_titles):
    # Look up the embedding vector of each word in the movie title
    movie_title_embed_layer = tf.keras.layers.Embedding(
        movie_title_max,
        embed_dim,
        input_length=15,
        name='movie_title_embed_layer')(movie_titles)
    sp = movie_title_embed_layer.shape
    movie_title_embed_layer_expand = tf.keras.layers.Reshape(
        [sp[1], sp[2], 1])(movie_title_embed_layer)
    # Apply convolution and max pooling to the text embeddings with kernels of different sizes
    pool_layer_lst = []
    for window_size in window_sizes:
        conv_layer = tf.keras.layers.Conv2D(
            filter_num, (window_size, embed_dim), 1,
            activation='relu')(movie_title_embed_layer_expand)
        maxpool_layer = tf.keras.layers.MaxPooling2D(
            pool_size=(sentences_size - window_size + 1, 1),
            strides=1)(conv_layer)
        pool_layer_lst.append(maxpool_layer)
    # Concatenate the pooled features, flatten them, then apply dropout
    pool_layer = tf.keras.layers.concatenate(pool_layer_lst,
                                             3,
                                             name="pool_layer")
    max_num = len(window_sizes) * filter_num
    pool_layer_flat = tf.keras.layers.Reshape(
        [1, max_num], name="pool_layer_flat")(pool_layer)

    dropout_layer = tf.keras.layers.Dropout(
        dropout_keep, name="dropout_layer")(pool_layer_flat)
    return pool_layer_flat, dropout_layer
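
The layer sizes in `get_movie_cnn_layer` follow from the hyperparameters. The sketch below recomputes them with plain arithmetic, using a list in place of the notebook's `window_sizes` set so the order is explicit; the variable names `conv_shapes` and `pooled_width` are assumptions for illustration.

```python
embed_dim, sentences_size, filter_num = 32, 15, 8
window_sizes = [2, 3, 4, 5]

conv_shapes = {}
for w in window_sizes:
    # A (w, embed_dim) kernel with "valid" padding leaves 15 - w + 1 positions.
    conv_len = sentences_size - w + 1
    # Kernel weights (w * embed_dim * 1 input channel * filter_num) plus one bias per filter.
    params = w * embed_dim * 1 * filter_num + filter_num
    conv_shapes[w] = ((conv_len, 1, filter_num), params)

# Max pooling over all positions leaves filter_num values per window size.
pooled_width = len(window_sizes) * filter_num
print(conv_shapes, pooled_width)
```

For the window of 2 this gives an output of (14, 1, 8) and 520 parameters, which agrees with the `conv2d` row in the model summary printed during training.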

Pass the movie layers together through fully connected layers

In [17]:
def get_movie_feature_layer(movie_id_embed_layer, movie_categories_embed_layer,
                            dropout_layer):
    # First fully connected layer
    movie_id_fc_layer = tf.keras.layers.Dense(
        embed_dim, name="movie_id_fc_layer",
        activation='relu')(movie_id_embed_layer)
    movie_categories_fc_layer = tf.keras.layers.Dense(
        embed_dim, name="movie_categories_fc_layer",
        activation='relu')(movie_categories_embed_layer)

    # Second fully connected layer
    movie_combine_layer = tf.keras.layers.concatenate(
        [movie_id_fc_layer, movie_categories_fc_layer, dropout_layer], 2)
    movie_combine_layer = tf.keras.layers.Dense(
        200, activation='tanh')(movie_combine_layer)

    movie_combine_layer_flat = tf.keras.layers.Reshape(
        [200], name="movie_combine_layer_flat")(movie_combine_layer)
    return movie_combine_layer, movie_combine_layer_flat

### Building the computation graph

In [18]:
import datetime
from tensorflow import keras
from tensorflow.python.ops import summary_ops_v2
import time
import os

In [19]:
MODEL_DIR = './models'

In [20]:
class mv_network(object):
    def __init__(self, batch_size=256):
        self.batch_size = batch_size
        self.best_loss = 9999
        self.losses = {'train': [], 'test': []}

        # Get the input placeholders
        uid, user_gender, user_age, user_job, movie_id, movie_categories, movie_titles = get_inputs(
        )
        # Get the four user embedding vectors
        uid_embed_layer, gender_embed_layer, age_embed_layer, job_embed_layer = get_user_embedding(
            uid, user_gender, user_age, user_job)
        # Build the user features
        user_combine_layer, user_combine_layer_flat = get_user_feature_layer(
            uid_embed_layer, gender_embed_layer, age_embed_layer,
            job_embed_layer)
        # Get the movie ID embedding vector
        movie_id_embed_layer = get_movie_id_embed_layer(movie_id)
        # Get the movie genre embedding vector
        movie_categories_embed_layer = get_movie_categories_layers(
            movie_categories)
        # Get the feature vector of the movie title
        pool_layer_flat, dropout_layer = get_movie_cnn_layer(movie_titles)
        # Build the movie features
        movie_combine_layer, movie_combine_layer_flat = get_movie_feature_layer(
            movie_id_embed_layer, movie_categories_embed_layer, dropout_layer)
        # Compute the rating
        # Option: multiply the user and movie features element-wise and sum (a per-row dot product) to predict the rating
        inference = tf.keras.layers.Lambda(
            lambda layer: tf.reduce_sum(layer[0] * layer[1], axis=1),
            name="inference")(
                (user_combine_layer_flat, movie_combine_layer_flat))
        inference = tf.keras.layers.Lambda(
            lambda layer: tf.expand_dims(layer, axis=1))(inference)

        # Option: feed the user and movie features through fully connected layers to output a single value
        #         inference_layer = tf.keras.layers.concatenate([user_combine_layer_flat, movie_combine_layer_flat],
        #                                                       1)  # (?, 400)
        # You can try the fully connected layer below and compare the results
        #inference_dense = tf.keras.layers.Dense(64, kernel_regularizer=tf.nn.l2_loss, activation='relu')(
        #    inference_layer)
        #         inference = tf.keras.layers.Dense(1, name="inference")(inference_layer)  # inference_dense

        self.model = tf.keras.Model(inputs=[
            uid, user_gender, user_age, user_job, movie_id, movie_categories,
            movie_titles
        ],
                                    outputs=[inference])

        self.model.summary()

        self.optimizer = tf.keras.optimizers.Adam(learning_rate)
        # MSE loss: regress the computed value onto the rating
        self.ComputeLoss = tf.keras.losses.MeanSquaredError()
        self.ComputeMetrics = tf.keras.metrics.MeanAbsoluteError()

        if tf.io.gfile.exists(MODEL_DIR):
            #             print('Removing existing model dir: {}'.format(MODEL_DIR))
            #             tf.io.gfile.rmtree(MODEL_DIR)
            pass
        else:
            tf.io.gfile.makedirs(MODEL_DIR)

        train_dir = os.path.join(MODEL_DIR, 'summaries', 'train')
        test_dir = os.path.join(MODEL_DIR, 'summaries', 'eval')

        #         self.train_summary_writer = summary_ops_v2.create_file_writer(train_dir, flush_millis=10000)
        #         self.test_summary_writer = summary_ops_v2.create_file_writer(test_dir, flush_millis=10000, name='test')

        checkpoint_dir = os.path.join(MODEL_DIR, 'checkpoints')
        self.checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt')
        self.checkpoint = tf.train.Checkpoint(model=self.model,
                                              optimizer=self.optimizer)

        # Restore variables on creation if a checkpoint exists.
        self.checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

    def compute_loss(self, labels, logits):
        return tf.reduce_mean(tf.keras.losses.mse(labels, logits))

    def compute_metrics(self, labels, logits):
        return tf.keras.metrics.mae(labels, logits)  #

    @tf.function
    def train_step(self, x, y):
        # Record the operations used to compute the loss, so that the gradient
        # of the loss with respect to the variables can be computed.
        #         metrics = 0
        with tf.GradientTape() as tape:
            logits = self.model([x[0], x[1], x[2], x[3], x[4], x[5], x[6]],
                                training=True)
            loss = self.ComputeLoss(y, logits)
            # loss = self.compute_loss(labels, logits)
            self.ComputeMetrics(y, logits)
            # metrics = self.compute_metrics(labels, logits)
        grads = tape.gradient(loss, self.model.trainable_variables)
        self.optimizer.apply_gradients(
            zip(grads, self.model.trainable_variables))
        return loss, logits

    def training(self, features, targets_values, epochs=5, log_freq=50):

        for epoch_i in range(epochs):
            # Split the data into training and test sets; random_state=0 fixes the seed, so every epoch uses the same split
            train_X, test_X, train_y, test_y = train_test_split(features,
                                                                targets_values,
                                                                test_size=0.2,
                                                                random_state=0)

            train_batches = get_batches(train_X, train_y, self.batch_size)
            batch_num = (len(train_X) // self.batch_size)

            train_start = time.time()
            #             with self.train_summary_writer.as_default():
            if True:
                start = time.time()
                # Metrics are stateful. They accumulate values and return a cumulative
                # result when you call .result(). Clear accumulated values with .reset_states()
                avg_loss = tf.keras.metrics.Mean('loss', dtype=tf.float32)
                #                 avg_mae = tf.keras.metrics.Mean('mae', dtype=tf.float32)

                # Datasets can be iterated over like any other Python iterable.
                for batch_i in range(batch_num):
                    x, y = next(train_batches)
                    categories = np.zeros([self.batch_size, 18])
                    for i in range(self.batch_size):
                        categories[i] = x.take(6, 1)[i]

                    titles = np.zeros([self.batch_size, sentences_size])
                    for i in range(self.batch_size):
                        titles[i] = x.take(5, 1)[i]

                    loss, logits = self.train_step([
                        np.reshape(x.take(0, 1), [self.batch_size, 1]).astype(
                            np.float32),
                        np.reshape(x.take(2, 1), [self.batch_size, 1]).astype(
                            np.float32),
                        np.reshape(x.take(3, 1), [self.batch_size, 1]).astype(
                            np.float32),
                        np.reshape(x.take(4, 1), [self.batch_size, 1]).astype(
                            np.float32),
                        np.reshape(x.take(1, 1), [self.batch_size, 1]).astype(
                            np.float32),
                        categories.astype(np.float32),
                        titles.astype(np.float32)
                    ],
                                                   np.reshape(
                                                       y,
                                                       [self.batch_size, 1
                                                        ]).astype(np.float32))
                    avg_loss(loss)
                    #                     avg_mae(metrics)
                    self.losses['train'].append(loss)

                    if tf.equal(self.optimizer.iterations % log_freq, 0):
                        #                         summary_ops_v2.scalar('loss', avg_loss.result(), step=self.optimizer.iterations)
                        #                         summary_ops_v2.scalar('mae', self.ComputeMetrics.result(), step=self.optimizer.iterations)
                        # summary_ops_v2.scalar('mae', avg_mae.result(), step=self.optimizer.iterations)

                        rate = log_freq / (time.time() - start)
                        print(
                            'Step #{}\tEpoch {:>3} Batch {:>4}/{}   Loss: {:0.6f} mae: {:0.6f} ({} steps/sec)'
                            .format(self.optimizer.iterations.numpy(), epoch_i,
                                    batch_i, batch_num, loss,
                                    (self.ComputeMetrics.result()), rate))
                        # print('Step #{}\tLoss: {:0.6f} mae: {:0.6f} ({} steps/sec)'.format(
                        #     self.optimizer.iterations.numpy(), loss, (avg_mae.result()), rate))
                        avg_loss.reset_states()
                        self.ComputeMetrics.reset_states()
                        # avg_mae.reset_states()
                        start = time.time()

            train_end = time.time()
            print('\nTrain time for epoch #{} ({} total steps): {}'.format(
                epoch_i + 1, self.optimizer.iterations.numpy(),
                train_end - train_start))
            #             with self.test_summary_writer.as_default():
            self.testing((test_X, test_y), self.optimizer.iterations)
            # self.checkpoint.save(self.checkpoint_prefix)
        # No leading slash: an absolute component would make os.path.join discard MODEL_DIR
        self.export_path = os.path.join(MODEL_DIR, 'export')
        tf.saved_model.save(self.model, self.export_path)

    def testing(self, test_dataset, step_num):
        """Perform an evaluation of `model` on the examples from `dataset`."""
        test_X, test_y = test_dataset
        test_batches = get_batches(test_X, test_y, self.batch_size)
        avg_loss = tf.keras.metrics.Mean('loss', dtype=tf.float32)
        #         avg_mae = tf.keras.metrics.Mean('mae', dtype=tf.float32)

        batch_num = (len(test_X) // self.batch_size)
        for batch_i in range(batch_num):
            x, y = next(test_batches)
            categories = np.zeros([self.batch_size, 18])
            for i in range(self.batch_size):
                categories[i] = x.take(6, 1)[i]

            titles = np.zeros([self.batch_size, sentences_size])
            for i in range(self.batch_size):
                titles[i] = x.take(5, 1)[i]

            logits = self.model([
                np.reshape(x.take(0, 1), [self.batch_size, 1]).astype(
                    np.float32),
                np.reshape(x.take(2, 1), [self.batch_size, 1]).astype(
                    np.float32),
                np.reshape(x.take(3, 1), [self.batch_size, 1]).astype(
                    np.float32),
                np.reshape(x.take(4, 1), [self.batch_size, 1]).astype(
                    np.float32),
                np.reshape(x.take(1, 1), [self.batch_size, 1]).astype(
                    np.float32),
                categories.astype(np.float32),
                titles.astype(np.float32)
            ],
                                training=False)
            test_loss = self.ComputeLoss(
                np.reshape(y, [self.batch_size, 1]).astype(np.float32), logits)
            avg_loss(test_loss)
            # Record the test loss
            self.losses['test'].append(test_loss)
            self.ComputeMetrics(
                np.reshape(y, [self.batch_size, 1]).astype(np.float32), logits)
            # avg_loss(self.compute_loss(labels, logits))
            # avg_mae(self.compute_metrics(labels, logits))

        print('Model test set loss: {:0.6f} mae: {:0.6f}'.format(
            avg_loss.result(), self.ComputeMetrics.result()))
        # print('Model test set loss: {:0.6f} mae: {:0.6f}'.format(avg_loss.result(), avg_mae.result()))
        #         summary_ops_v2.scalar('loss', avg_loss.result(), step=step_num)
        #         summary_ops_v2.scalar('mae', self.ComputeMetrics.result(), step=step_num)
        # summary_ops_v2.scalar('mae', avg_mae.result(), step=step_num)

        if avg_loss.result() < self.best_loss:
            self.best_loss = avg_loss.result()
            print("best loss = {}".format(self.best_loss))
            self.checkpoint.save(self.checkpoint_prefix)

    def forward(self, xs):
        predictions = self.model(xs)
        # logits = tf.nn.softmax(predictions)

        return predictions
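
The active `inference` layer in the class above reduces to a per-row dot product between the user and movie feature vectors, trained against the rating with mean squared error. A pure-Python sketch with made-up 3-dimensional features (the notebook's are 200-dimensional) follows; `predict_ratings` and `mse` are hypothetical helper names.

```python
def predict_ratings(user_feats, movie_feats):
    """Row-wise dot product, returned with shape (batch, 1)."""
    return [[sum(u * m for u, m in zip(u_row, m_row))]
            for u_row, m_row in zip(user_feats, movie_feats)]


def mse(targets, preds):
    """Mean squared error over a batch of single-value predictions."""
    return sum((t[0] - p[0]) ** 2 for t, p in zip(targets, preds)) / len(targets)


# Made-up feature vectors for a batch of two (user, movie) pairs.
user_feats = [[0.5, 1.0, -1.0], [1.0, 0.0, 2.0]]
movie_feats = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.5]]
targets = [[3.0], [4.0]]

preds = predict_ratings(user_feats, movie_feats)
loss = mse(targets, preds)
print(preds, loss)
```

Because the prediction is a dot product with no output activation, nothing constrains it to the 1 to 5 rating range; the MSE loss alone pulls it there.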

In [21]:
# Yield mini-batches
def get_batches(Xs, ys, batch_size):
    for start in range(0, len(Xs), batch_size):
        end = min(start + batch_size, len(Xs))
        yield Xs[start:end], ys[start:end]
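
A small run shows how `get_batches` behaves at the end of the data. The function body is copied from the cell above; the 10-element toy data is an assumption for illustration.

```python
def get_batches(Xs, ys, batch_size):
    for start in range(0, len(Xs), batch_size):
        end = min(start + batch_size, len(Xs))
        yield Xs[start:end], ys[start:end]


# Toy data: 10 samples with a batch size of 4.
Xs = list(range(10))
ys = [x * 2 for x in Xs]
batch_size = 4

sizes = [len(x) for x, _ in get_batches(Xs, ys, batch_size)]
full_batches = len(Xs) // batch_size  # what the training loop iterates over
print(sizes, full_batches)
```

The generator yields a shorter final batch, but the training and testing loops call `next()` only `len(X) // batch_size` times, so that partial batch is never consumed. This matters because both loops reshape every batch to `[batch_size, 1]`. As a rough check, assuming the 80% training split holds 800,167 of MovieLens 1M's 1,000,209 ratings (figures not stated in the notebook), `800167 // 256` is 3125, which matches the `Batch .../3125` lines in the training log.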

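### Training the network
Training with the option that feeds the user and movie features through fully connected layers to output a single value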

In [22]:
mv_net = mv_network()
mv_net.training(features, targets_values, epochs=5)

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
movie_titles (InputLayer)       [(None, 15)]         0                                            
__________________________________________________________________________________________________
movie_title_embed_layer (Embedd (None, 15, 32)       166880      movie_titles[0][0]               
__________________________________________________________________________________________________
reshape (Reshape)               (None, 15, 32, 1)    0           movie_title_embed_layer[0][0]    
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 14, 1, 8)     520         reshape[0][0]                    
______________________________________________________________________________________________

Step #18800	Epoch   0 Batch   49/3125   Loss: 0.763133 mae: 0.675904 (15.141735359699895 steps/sec)
Step #18850	Epoch   0 Batch   99/3125   Loss: 0.800664 mae: 0.643598 (44.4028984077305 steps/sec)
Step #18900	Epoch   0 Batch  149/3125   Loss: 0.632010 mae: 0.646578 (46.06202613857976 steps/sec)
Step #18950	Epoch   0 Batch  199/3125   Loss: 0.688822 mae: 0.642744 (44.96157104110116 steps/sec)
Step #19000	Epoch   0 Batch  249/3125   Loss: 0.760208 mae: 0.644846 (46.199023392150394 steps/sec)
Step #19050	Epoch   0 Batch  299/3125   Loss: 0.641236 mae: 0.641642 (45.02257513469834 steps/sec)
Step #19100	Epoch   0 Batch  349/3125   Loss: 0.689330 mae: 0.641615 (45.19771868280505 steps/sec)
Step #19150	Epoch   0 Batch  399/3125   Loss: 0.728691 mae: 0.642189 (46.28016737332412 steps/sec)
Step #19200	Epoch   0 Batch  449/3125   Loss: 0.648716 mae: 0.648071 (44.014041497384746 steps/sec)
Step #19250	Epoch   0 Batch  499/3125   Loss: 0.619456 mae: 0.632197 (43.935423588909885 steps/sec)
...

Step #22900	Epoch   1 Batch 1024/3125   Loss: 0.631318 mae: 0.617904 (44.92148734396162 steps/sec)
Step #22950	Epoch   1 Batch 1074/3125   Loss: 0.644288 mae: 0.622262 (44.85660189132698 steps/sec)
Step #23000	Epoch   1 Batch 1124/3125   Loss: 0.620328 mae: 0.632210 (43.53297752495284 steps/sec)
Step #23050	Epoch   1 Batch 1174/3125   Loss: 0.533237 mae: 0.626361 (45.65477304887341 steps/sec)
Step #23100	Epoch   1 Batch 1224/3125   Loss: 0.641421 mae: 0.618252 (45.80639735971579 steps/sec)
Step #23150	Epoch   1 Batch 1274/3125   Loss: 0.667527 mae: 0.632826 (44.40381976447143 steps/sec)
Step #23200	Epoch   1 Batch 1324/3125   Loss: 0.538211 mae: 0.629792 (45.005270621521305 steps/sec)
Step #23250	Epoch   1 Batch 1374/3125   Loss: 0.600053 mae: 0.616821 (44.800697276477386 steps/sec)
Step #23300	Epoch   1 Batch 1424/3125   Loss: 0.646433 mae: 0.616096 (44.77297880771175 steps/sec)
Step #23350	Epoch   1 Batch 1474/3125   Loss: 0.523652 mae: 0.622667 (45.42986382210114 steps/sec)
...

Step #27000	Epoch   2 Batch 1999/3125   Loss: 0.665007 mae: 0.619339 (58.02510365321377 steps/sec)
Step #27050	Epoch   2 Batch 2049/3125   Loss: 0.600302 mae: 0.619831 (59.18973238414598 steps/sec)
Step #27100	Epoch   2 Batch 2099/3125   Loss: 0.630337 mae: 0.611807 (58.22732443318545 steps/sec)
Step #27150	Epoch   2 Batch 2149/3125   Loss: 0.594576 mae: 0.611183 (58.77346949928465 steps/sec)
Step #27200	Epoch   2 Batch 2199/3125   Loss: 0.599276 mae: 0.619994 (57.89116421817949 steps/sec)
Step #27250	Epoch   2 Batch 2249/3125   Loss: 0.508298 mae: 0.618357 (58.704551358420595 steps/sec)
Step #27300	Epoch   2 Batch 2299/3125   Loss: 0.653273 mae: 0.628758 (56.97013128802657 steps/sec)
Step #27350	Epoch   2 Batch 2349/3125   Loss: 0.605656 mae: 0.615799 (58.91156068925744 steps/sec)
Step #27400	Epoch   2 Batch 2399/3125   Loss: 0.545579 mae: 0.614135 (58.635925811497906 steps/sec)
Step #27450	Epoch   2 Batch 2449/3125   Loss: 0.538630 mae: 0.613800 (59.05033556574541 steps/sec)
...

Step #31100	Epoch   3 Batch 2974/3125   Loss: 0.559344 mae: 0.613591 (60.04035610483426 steps/sec)
Step #31150	Epoch   3 Batch 3024/3125   Loss: 0.567439 mae: 0.605221 (60.54798159151408 steps/sec)
Step #31200	Epoch   3 Batch 3074/3125   Loss: 0.627377 mae: 0.605516 (59.4001164690394 steps/sec)
Step #31250	Epoch   3 Batch 3124/3125   Loss: 0.715656 mae: 0.601663 (58.42897354392986 steps/sec)

Train time for epoch #4 (31250 total steps): 53.22566604614258
Model test set loss: 0.806341 mae: 0.704461
Step #31300	Epoch   4 Batch   49/3125   Loss: 0.653065 mae: 0.698928 (57.560474602080646 steps/sec)
Step #31350	Epoch   4 Batch   99/3125   Loss: 0.678938 mae: 0.603214 (60.256824301881764 steps/sec)
Step #31400	Epoch   4 Batch  149/3125   Loss: 0.571468 mae: 0.610890 (60.54796411039199 steps/sec)
Step #31450	Epoch   4 Batch  199/3125   Loss: 0.637329 mae: 0.604941 (59.82548883181584 steps/sec)
Step #31500	Epoch   4 Batch  249/3125   Loss: 0.684213 mae: 0.605060 (60.69456523469324 steps/sec)


Training with the option that multiplies the user and movie features (a per-row dot product) to predict the rating

In [23]:
mv_net=mv_network()
mv_net.training(features, targets_values, epochs=5)

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
movie_titles (InputLayer)       [(None, 15)]         0                                            
__________________________________________________________________________________________________
movie_title_embed_layer (Embedd (None, 15, 32)       166880      movie_titles[0][0]               
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 15, 32, 1)    0           movie_title_embed_layer[0][0]    
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 14, 1, 8)     520         reshape_1[0][0]                  
____________________________________________________________________________________________

Step #21900	Epoch   0 Batch   24/3125   Loss: 0.609366 mae: 0.633084 (19.4568707139653 steps/sec)
Step #21950	Epoch   0 Batch   74/3125   Loss: 0.663428 mae: 0.628300 (42.021797252410465 steps/sec)
Step #22000	Epoch   0 Batch  124/3125   Loss: 0.612210 mae: 0.631136 (40.87523976161152 steps/sec)
Step #22050	Epoch   0 Batch  174/3125   Loss: 0.641168 mae: 0.631124 (39.82825523484243 steps/sec)
Step #22100	Epoch   0 Batch  224/3125   Loss: 0.598328 mae: 0.624879 (43.3538703186847 steps/sec)
Step #22150	Epoch   0 Batch  274/3125   Loss: 0.583821 mae: 0.629070 (43.62936740546199 steps/sec)
Step #22200	Epoch   0 Batch  324/3125   Loss: 0.704832 mae: 0.625740 (45.24837984998563 steps/sec)
Step #22250	Epoch   0 Batch  374/3125   Loss: 0.582943 mae: 0.630155 (45.12264314750334 steps/sec)
Step #22300	Epoch   0 Batch  424/3125   Loss: 0.714045 mae: 0.632422 (43.550343829652206 steps/sec)
Step #22350	Epoch   0 Batch  474/3125   Loss: 0.679895 mae: 0.624665 (43.44193444078415 steps/sec)
Step #2240

Step #26000	Epoch   1 Batch  999/3125   Loss: 0.725587 mae: 0.614716 (36.87385799547348 steps/sec)
Step #26050	Epoch   1 Batch 1049/3125   Loss: 0.691710 mae: 0.619049 (43.537288427643446 steps/sec)
Step #26100	Epoch   1 Batch 1099/3125   Loss: 0.599234 mae: 0.612265 (42.28754819765667 steps/sec)
Step #26150	Epoch   1 Batch 1149/3125   Loss: 0.580234 mae: 0.625496 (43.63211781856887 steps/sec)
Step #26200	Epoch   1 Batch 1199/3125   Loss: 0.677211 mae: 0.612743 (42.1990192507752 steps/sec)
Step #26250	Epoch   1 Batch 1249/3125   Loss: 0.697173 mae: 0.621404 (43.22362637872241 steps/sec)
Step #26300	Epoch   1 Batch 1299/3125   Loss: 0.566858 mae: 0.626397 (43.25533773844401 steps/sec)
Step #26350	Epoch   1 Batch 1349/3125   Loss: 0.599054 mae: 0.610217 (42.1797527725596 steps/sec)
Step #26400	Epoch   1 Batch 1399/3125   Loss: 0.618869 mae: 0.614029 (43.12486839334597 steps/sec)
Step #26450	Epoch   1 Batch 1449/3125   Loss: 0.628100 mae: 0.612924 (41.39741380654695 steps/sec)
Step #26500

Step #30100	Epoch   2 Batch 1974/3125   Loss: 0.581796 mae: 0.612082 (44.599625619979186 steps/sec)
Step #30150	Epoch   2 Batch 2024/3125   Loss: 0.719188 mae: 0.617075 (45.51120987675755 steps/sec)
Step #30200	Epoch   2 Batch 2074/3125   Loss: 0.674238 mae: 0.605665 (44.48971114756679 steps/sec)
Step #30250	Epoch   2 Batch 2124/3125   Loss: 0.545744 mae: 0.612997 (40.70864004950681 steps/sec)
Step #30300	Epoch   2 Batch 2174/3125   Loss: 0.568136 mae: 0.609232 (35.827359834546655 steps/sec)
Step #30350	Epoch   2 Batch 2224/3125   Loss: 0.587303 mae: 0.617807 (39.234619301173005 steps/sec)
Step #30400	Epoch   2 Batch 2274/3125   Loss: 0.640620 mae: 0.614160 (45.942620796018154 steps/sec)
Step #30450	Epoch   2 Batch 2324/3125   Loss: 0.506651 mae: 0.617400 (46.28870714659522 steps/sec)
Step #30500	Epoch   2 Batch 2374/3125   Loss: 0.557591 mae: 0.607143 (46.10328013044282 steps/sec)
Step #30550	Epoch   2 Batch 2424/3125   Loss: 0.571835 mae: 0.612330 (44.82077584119421 steps/sec)
Step #

Step #34200	Epoch   3 Batch 2949/3125   Loss: 0.711594 mae: 0.611860 (43.3087913985103 steps/sec)
Step #34250	Epoch   3 Batch 2999/3125   Loss: 0.617143 mae: 0.606910 (38.92084422973029 steps/sec)
Step #34300	Epoch   3 Batch 3049/3125   Loss: 0.607009 mae: 0.596963 (41.173984090363376 steps/sec)
Step #34350	Epoch   3 Batch 3099/3125   Loss: 0.572505 mae: 0.600058 (41.63199074435264 steps/sec)

Train time for epoch #4 (34375 total steps): 74.5869665145874
Model test set loss: 0.813084 mae: 0.703456
Step #34400	Epoch   4 Batch   24/3125   Loss: 0.554919 mae: 0.700476 (80.00091553782124 steps/sec)
Step #34450	Epoch   4 Batch   74/3125   Loss: 0.646027 mae: 0.602859 (44.670257494612585 steps/sec)
Step #34500	Epoch   4 Batch  124/3125   Loss: 0.578570 mae: 0.604482 (38.99296485514057 steps/sec)
Step #34550	Epoch   4 Batch  174/3125   Loss: 0.567312 mae: 0.603299 (40.675927899627546 steps/sec)
Step #34600	Epoch   4 Batch  224/3125   Loss: 0.543727 mae: 0.599157 (45.199004530133614 steps/sec)

### Rating a specified user and movie

This part runs a forward pass through the network to compute the predicted rating.

In [27]:
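def rating_movie(mv_net, user_id_val, movie_id_val):
    """Predict the rating of one (user, movie) pair with a forward pass."""
    # Genre IDs of the movie (length 18)
    categories = np.zeros([1, 18])
    categories[0] = movies.values[movieid2idx[movie_id_val]][2]

    # Title word IDs of the movie (length sentences_size)
    titles = np.zeros([1, sentences_size])
    titles[0] = movies.values[movieid2idx[movie_id_val]][1]

    # Assumes UserIDs start at 1 and are contiguous, so row user_id_val - 1
    # holds this user. Inputs: UserID, Gender, Age, JobID, MovieID, genres, title.
    inference_val = mv_net.model([
        np.reshape(users.values[user_id_val - 1][0], [1, 1]),
        np.reshape(users.values[user_id_val - 1][1], [1, 1]),
        np.reshape(users.values[user_id_val - 1][2], [1, 1]),
        np.reshape(users.values[user_id_val - 1][3], [1, 1]),
        np.reshape(movies.values[movieid2idx[movie_id_val]][0], [1, 1]),
        categories, titles
    ])

    return (inference_val.numpy())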

In [28]:
rating_movie(mv_net, 234, 1401)

array([[3.8851764]], dtype=float32)

### Generating the movie feature matrix

Combine the trained movie features into a movie feature matrix and save it locally.


In [29]:
movie_layer_model = keras.models.Model(
    inputs=[
        mv_net.model.input[4], mv_net.model.input[5], mv_net.model.input[6]
    ],
    outputs=mv_net.model.get_layer("movie_combine_layer_flat").output)
movie_matrics = []

for item in movies.values:
    categories = np.zeros([1, 18])
    categories[0] = item.take(2)

    titles = np.zeros([1, sentences_size])
    titles[0] = item.take(1)

    movie_combine_layer_flat_val = movie_layer_model(
        [np.reshape(item.take(0), [1, 1]), categories, titles])
    movie_matrics.append(movie_combine_layer_flat_val)

pickle.dump((np.array(movie_matrics).reshape(-1, 200)),
            open('movie_matrics.p', 'wb'))
movie_matrics = pickle.load(open('movie_matrics.p', mode='rb'))

In [30]:
movie_matrics = pickle.load(open('movie_matrics.p', mode='rb'))
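
The stack-and-save step can be tried in isolation with the following sketch. It assumes each per-movie output has shape `(1, 200)`, as the `reshape(-1, 200)` call above suggests; the toy values, the temporary directory, and the file name `toy_matrics.p` are hypothetical. Unlike the cell above, it opens the files with `with` so the handles are closed explicitly.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-ins for per-movie model outputs, each of shape (1, 200).
per_item_features = [np.full((1, 200), i, dtype=np.float32) for i in range(3)]

# Stack into one (num_items, 200) matrix, one row per item.
feature_matrix = np.array(per_item_features).reshape(-1, 200)

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_matrics.p")

with open(path, "wb") as f:  # the context manager closes the file handle
    pickle.dump(feature_matrix, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.shape)  # (3, 200)
```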

### Generating the user feature matrix

Combine the trained user features into a user feature matrix and save it locally.


In [31]:
user_layer_model = keras.models.Model(
    inputs=[
        mv_net.model.input[0], mv_net.model.input[1], mv_net.model.input[2],
        mv_net.model.input[3]
    ],
    outputs=mv_net.model.get_layer("user_combine_layer_flat").output)
users_matrics = []

for item in users.values:

    user_combine_layer_flat_val = user_layer_model([
        np.reshape(item.take(0), [1, 1]),
        np.reshape(item.take(1), [1, 1]),
        np.reshape(item.take(2), [1, 1]),
        np.reshape(item.take(3), [1, 1])
    ])
    users_matrics.append(user_combine_layer_flat_val)

pickle.dump((np.array(users_matrics).reshape(-1, 200)),
            open('users_matrics.p', 'wb'))
users_matrics = pickle.load(open('users_matrics.p', mode='rb'))

In [32]:
users_matrics = pickle.load(open('users_matrics.p', mode='rb'))

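### Recommending movies

Use the generated user feature matrix and movie feature matrix to recommend movies.

### Recommending movies of the same type

The idea is to compute the cosine similarity between the feature vector of the movie currently being watched and the entire movie feature matrix, then take the top_k most similar movies. Some random selection is added so that the recommendations differ slightly each time.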



In [33]:
def recommend_same_type_movie(movie_id_val, top_k=20):

    # L2-normalize every row of the movie feature matrix
    norm_movie_matrics = tf.sqrt(
        tf.reduce_sum(tf.square(movie_matrics), 1, keepdims=True))
    normalized_movie_matrics = movie_matrics / norm_movie_matrics

    # Recommend movies of the same type.
    # The query vector itself is not normalized, so the scores are the cosine
    # similarities scaled by one constant; the ranking is unaffected.
    probs_embeddings = (movie_matrics[movieid2idx[movie_id_val]]).reshape(
        [1, 200])
    probs_similarity = tf.matmul(probs_embeddings,
                                 tf.transpose(normalized_movie_matrics))
    sim = (probs_similarity.numpy())
    #     results = (-sim[0]).argsort()[0:top_k]
    #     print(results)

    print("The movie you watched: {}".format(movies_orig[movieid2idx[movie_id_val]]))
    print("Recommended for you:")
    p = np.squeeze(sim)
    # Zero out everything except the top_k scores, then turn them into probabilities
    p[np.argsort(p)[:-top_k]] = 0
    p = p / np.sum(p)
    results = set()
    # Draw until 5 distinct movies have been sampled (requires top_k >= 5)
    while len(results) != 5:
        c = np.random.choice(len(p), 1, p=p)[0]
        results.add(c)
    for val in (results):
        print(val)
        print(movies_orig[val])

    return results

In [34]:
recommend_same_type_movie(1401, 20)

The movie you watched: [1401 'Ghosts of Mississippi (1996)' 'Drama']
Recommended for you:
1380
[1401 'Ghosts of Mississippi (1996)' 'Drama']
3301
[3370 'Betrayed (1988)' 'Drama|Thriller']
1557
[1598 'Desperate Measures (1998)' 'Crime|Drama|Thriller']
792
[802 'Phenomenon (1996)' 'Drama|Romance']
346
[350 'Client, The (1994)' 'Drama|Mystery|Thriller']


{346, 792, 1380, 1557, 3301}
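
The same idea (cosine similarity, keep the top_k, then sample from them) can be checked on toy data without the trained model. The sketch below is a NumPy-only approximation under stated assumptions: the function name `sample_similar`, the 2-dimensional toy vectors, the clipping of negative similarities to zero, and the fixed seed are all choices made for this example and are not part of the notebook.

```python
import numpy as np


def sample_similar(matrix, query_idx, top_k=3, n_pick=2, seed=0):
    """Sample n_pick distinct rows among the top_k most cosine-similar rows."""
    rng = np.random.default_rng(seed)
    # L2-normalize every row so a dot product equals the cosine similarity.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    normalized = matrix / norms
    sim = normalized @ normalized[query_idx]
    # Indices of the top_k largest similarities.
    top = np.argsort(sim)[-top_k:]
    # Assumption: negative similarities get zero weight so that the
    # weights form a valid probability distribution.
    weights = np.clip(sim[top], 0.0, None)
    probs = weights / weights.sum()
    picks = rng.choice(top, size=n_pick, replace=False, p=probs)
    return set(int(i) for i in picks)


# Hypothetical 2-dimensional feature vectors; rows 0, 1 and 2 point the same way.
toy = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.8, 0.3],
                [0.0, 1.0],
                [-1.0, 0.0]])

picked = sample_similar(toy, query_idx=0, top_k=3, n_pick=2)
print(picked)
```

As in the output above, the query row is its own nearest neighbor, so it can appear among the sampled results unless it is excluded explicitly.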

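### Recommending movies you may like

The idea is to use the user feature vector and the movie feature matrix to compute a rating for every movie, then take the top_k highest-rated ones. Random selection is added here as well.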


In [35]:
def recommend_your_favorite_movie(user_id_val, top_k=10):

    # Recommend movies the user may like:
    # score every movie with the user's feature vector
    probs_embeddings = (users_matrics[user_id_val - 1]).reshape([1, 200])

    probs_similarity = tf.matmul(probs_embeddings, tf.transpose(movie_matrics))
    sim = (probs_similarity.numpy())
    #     print(sim.shape)
    #     results = (-sim[0]).argsort()[0:top_k]
    #     print(results)

    #     sim_norm = probs_norm_similarity.eval()
    #     print((-sim_norm[0]).argsort()[0:top_k])

    print("Recommended for you:")
    p = np.squeeze(sim)
    # Zero out everything except the top_k scores, then turn them into probabilities
    p[np.argsort(p)[:-top_k]] = 0
    p = p / np.sum(p)
    results = set()
    # Draw until 5 distinct movies have been sampled (requires top_k >= 5)
    while len(results) != 5:
        c = np.random.choice(len(p), 1, p=p)[0]
        results.add(c)
    for val in (results):
        print(val)
        print(movies_orig[val])

    return results

In [36]:
recommend_your_favorite_movie(234, 10)

Recommended for you:
3846
[3916 'Remember the Titans (2000)' 'Drama']
145
[147 'Basketball Diaries, The (1995)' 'Drama']
2133
[2202 'Lifeboat (1944)' 'Drama|Thriller|War']
2263
[2332 'Belly (1998)' 'Crime|Drama']
315
[318 'Shawshank Redemption, The (1994)' 'Drama']


{145, 315, 2133, 2263, 3846}

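### People who watched this movie also watched (liked) these movies
- First select the top_k users who like a given movie and get their user feature vectors.
- Then compute these users' ratings for all movies.
- Select each user's highest-rated movie as a recommendation.
- Random selection is added here as well.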
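
The first three steps can be sketched on toy data as follows. This is a NumPy-only illustration under assumptions: the function name `others_also_like`, the tiny 2-dimensional matrices, and the omission of the random-selection step are choices made for the example. The row indices returned by `argsort` are 0-based and are used directly to index the user matrix.

```python
import numpy as np


def others_also_like(movie_idx, user_matrix, movie_matrix, top_k=2):
    # Step 1: score every user against the movie and keep the top_k rows.
    user_scores = user_matrix @ movie_matrix[movie_idx]
    fan_rows = np.argsort(user_scores)[-top_k:]
    # Step 2: score every movie for those users.
    fan_scores = user_matrix[fan_rows] @ movie_matrix.T
    # Step 3: take each fan's single highest-scoring movie.
    favorites = np.argmax(fan_scores, axis=1)
    return fan_rows, set(int(m) for m in favorites)


# Hypothetical feature matrices: 4 users and 3 movies, 2 dimensions each.
toy_users = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.9, 0.2],
                      [-1.0, 0.0]])
toy_movies = np.array([[1.0, 0.0],
                       [2.0, 0.1],
                       [0.0, 3.0]])

fan_rows, favorites = others_also_like(0, toy_users, toy_movies, top_k=2)
print(fan_rows, favorites)
```

In this toy run both selected users have the same favorite movie, so the result set holds a single entry. The number of distinct recommendations can therefore be smaller than top_k.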

In [37]:
import random


def recommend_other_favorite_movie(movie_id_val, top_k=20):

    # Score every user against the movie and keep the top_k users
    probs_movie_embeddings = (
        movie_matrics[movieid2idx[movie_id_val]]).reshape([1, 200])
    probs_user_favorite_similarity = tf.matmul(probs_movie_embeddings,
                                               tf.transpose(users_matrics))
    favorite_user_id = np.argsort(
        probs_user_favorite_similarity.numpy())[0][-top_k:]
    #     print(normalized_users_matrics.numpy().shape)
    #     print(probs_user_favorite_similarity.numpy()[0][favorite_user_id])
    #     print(favorite_user_id.shape)

    print("The movie you watched: {}".format(movies_orig[movieid2idx[movie_id_val]]))

    # NOTE: favorite_user_id already holds 0-based row indices into
    # users_matrics, so the "- 1" below shifts every index by one row.
    # It is kept to match the recorded output; indexing with
    # favorite_user_id directly looks like the intended behavior.
    print("Users who like this movie: {}".format(users_orig[favorite_user_id - 1]))
    probs_users_embeddings = (users_matrics[favorite_user_id - 1]).reshape(
        [-1, 200])
    # Score every movie for each of the selected users
    probs_similarity = tf.matmul(probs_users_embeddings,
                                 tf.transpose(movie_matrics))
    sim = (probs_similarity.numpy())
    #     results = (-sim[0]).argsort()[0:top_k]
    #     print(results)

    #     print(sim.shape)
    #     print(np.argmax(sim, 1))
    # Highest-scoring movie of each selected user
    p = np.argmax(sim, 1)
    print("Users who like this movie also like:")

    # With fewer than 5 distinct favorites, return them all;
    # otherwise sample until 5 distinct movies are collected
    if len(set(p)) < 5:
        results = set(p)
    else:
        results = set()
        while len(results) != 5:
            c = p[random.randrange(top_k)]
            results.add(c)
    for val in (results):
        print(val)
        print(movies_orig[val])

    return results

In [38]:
recommend_other_favorite_movie(1401, 20)

The movie you watched: [1401 'Ghosts of Mississippi (1996)' 'Drama']
Users who like this movie: [[2517 'F' 35 9]
 [5690 'M' 18 4]
 [4849 'F' 18 4]
 [4127 'M' 50 17]
 [1880 'M' 35 0]
 [4800 'M' 18 4]
 [2390 'F' 25 6]
 [2696 'M' 25 7]
 [5050 'F' 18 4]
 [2065 'M' 25 6]
 [3764 'M' 25 1]
 [767 'M' 25 12]
 [4593 'F' 45 1]
 [100 'M' 35 17]
 [1415 'M' 45 14]
 [2294 'M' 56 13]
 [5296 'F' 1 0]
 [774 'M' 18 4]
 [4503 'M' 56 1]
 [4518 'M' 25 0]]
Users who like this movie also like:
643
[649 'Cold Fever (Á köldum klaka) (1994)' 'Comedy|Drama']
3822
[3892 'Anatomy (Anatomie) (2000)' 'Horror']
2995
[3064 'Poison Ivy: New Seduction (1997)' 'Thriller']
1180
[1198 'Raiders of the Lost Ark (1981)' 'Action|Adventure']
1790
[1859 'Taste of Cherry (1997)' 'Drama']


{643, 1180, 1790, 2995, 3822}

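Note that when the top-rated movies of the 20 users contain fewer than five distinct titles, only those titles are output. For example, if all 20 users favor the same two movies, only two results are shown.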