## About this module
In this module you will learn how to build a machine learning classifier to predict co-authorships in the citation graph.

At the end of this module, you should be able to:  
- Describe what link prediction is
- Use the link prediction functions in Neo4j
- Understand the challenges when building machine learning models on graph data
- Build a link prediction classifier using scikit-learn with features derived from the Neo4j Graph Algorithms library




## The Link Prediction problem

Link Prediction has been around for a long time, but was popularised by a paper written by Jon Kleinberg and David Liben-Nowell in 2004, titled [The Link Prediction Problem for Social Networks](https://www.cs.cornell.edu/home/kleinber/link-pred.pdf).  
![LinkPrediction.png](./images/LinkPrediction.png)  


Kleinberg and Liben-Nowell approach this problem from the perspective of social networks, asking this question:  

> Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future?
>
> We formalize this question as the Link Prediction problem, and develop approaches to Link Prediction based on measures for analyzing the “proximity” of nodes in a network.

For example, we could predict:

- Future associations between people in a terrorist network
- Associations between molecules in a biology network
- Potential co-authorships in a citation network
- Interest in an artist or artwork



In each of these examples, predicting a link means predicting some future behaviour. For example, in a citation network, we’re actually predicting that two people will collaborate on a paper.

## Link Prediction Algorithms
Kleinberg and Liben-Nowell describe a set of methods that can be used for Link Prediction. These methods compute a score for a pair of nodes, where the score can be considered a measure of proximity or “similarity” between those nodes based on the graph topology. The closer two nodes are, the more likely it is that a relationship will form between them.
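To make the idea of a proximity score concrete, here is a minimal pure-Python sketch of several standard measures (common neighbours, total neighbours, preferential attachment, Adamic-Adar and resource allocation) computed on a tiny co-authorship graph. The graph and the function names are hypothetical and chosen for illustration; the code follows the textbook definitions of these measures and is not the Neo4j implementation.

```python
from math import log

# Hypothetical, made-up co-authorship graph stored as an adjacency map:
# each author maps to the set of people they have co-authored with.
GRAPH = {
    "Alice": {"Bob", "Carol", "Dave"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Alice", "Bob", "Eve"},
    "Dave": {"Alice"},
    "Eve": {"Carol"},
}


def common_neighbors(g, x, y):
    """|N(x) ∩ N(y)|: how many neighbours x and y share."""
    return len(g[x] & g[y])


def total_neighbors(g, x, y):
    """|N(x) ∪ N(y)|: how many distinct neighbours x and y have together."""
    return len(g[x] | g[y])


def preferential_attachment(g, x, y):
    """|N(x)| * |N(y)|: the product of the two degrees."""
    return len(g[x]) * len(g[y])


def adamic_adar(g, x, y):
    """Sum of 1 / log(|N(z)|) over shared neighbours z (assumes x != y)."""
    # A shared neighbour of two distinct nodes has degree >= 2, so log() > 0.
    return sum(1 / log(len(g[z])) for z in g[x] & g[y])


def resource_allocation(g, x, y):
    """Sum of 1 / |N(z)| over shared neighbours z."""
    return sum(1 / len(g[z]) for z in g[x] & g[y])


# Score two pairs of authors who have not yet written together.
for x, y in [("Bob", "Dave"), ("Dave", "Eve")]:
    print(
        x, y,
        common_neighbors(GRAPH, x, y),
        total_neighbors(GRAPH, x, y),
        preferential_attachment(GRAPH, x, y),
        round(adamic_adar(GRAPH, x, y), 3),
        round(resource_allocation(GRAPH, x, y), 3),
    )
```

Bob and Dave have never collaborated but share Alice as a co-author, so they get a common neighbours score of 1, whereas Dave and Eve share nobody and score 0 on common neighbours, Adamic-Adar and resource allocation.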

## Exercise 1: Running Link Prediction algorithms
In this exercise, you will gain some experience running the Link Prediction algorithms. In the query edit pane of Neo4j Browser, execute the browser command `:play data-science-exercises` and follow the instructions for the Link Prediction exercise.


## Applying Link Prediction Algorithms
Now that you have learned how to execute the link prediction algorithms, you will learn what to do with the results. There are two approaches:  

### Using the measures directly
You can use the scores from the link prediction algorithms directly. With this approach, you set a threshold value, and any pair of nodes whose score is above that threshold is predicted to have a link.

For example, you might say that every pair of nodes that has a preferential attachment score above 3 would have a link, and any with 3 or less would not.  
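The threshold approach can be sketched in a few lines of Python. This assumes a made-up table of author degrees and uses the cut-off of 3 from the example above; because preferential attachment is just the product of the two nodes' degrees, no other graph information is needed. The names `DEGREES`, `pa_score` and `predict_link` are hypothetical.

```python
# Hypothetical degrees (number of co-authors) for a few authors.
DEGREES = {"Alice": 3, "Bob": 2, "Carol": 3, "Dave": 1}

THRESHOLD = 3  # cut-off taken from the example in the text


def pa_score(x, y):
    """Preferential attachment: the product of the two nodes' degrees."""
    return DEGREES[x] * DEGREES[y]


def predict_link(x, y, threshold=THRESHOLD):
    """Predict a link only when the score is strictly above the threshold."""
    return pa_score(x, y) > threshold


for x, y in [("Alice", "Carol"), ("Alice", "Bob"), ("Carol", "Dave"), ("Bob", "Dave")]:
    print(f"{x}-{y}: score={pa_score(x, y)}, link predicted={predict_link(x, y)}")
```

Carol and Dave score exactly 3, so under the "above 3" rule no link is predicted for them, while Alice-Carol (9) and Alice-Bob (6) are predicted to link.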

### Supervised learning
You can take a supervised learning approach where you use the scores as features to train a binary classifier. The binary classifier then predicts whether a pair of nodes will have a link.  
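A minimal sketch of the supervised approach with scikit-learn might look like the following. The feature rows and labels are invented for illustration, and `RandomForestClassifier` is just one reasonable choice of binary classifier; in practice each row would hold the scores returned by the link prediction functions for a real pair of nodes.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: each row holds link prediction scores for one
# pair of authors, in the order
# [common neighbours, preferential attachment, total neighbours].
# The label is 1 if the pair went on to co-author a paper, otherwise 0.
X_train = [
    [3, 12, 4], [2, 9, 4], [4, 20, 5], [2, 6, 3],  # pairs that did link
    [0, 2, 3], [0, 1, 2], [1, 2, 2], [0, 4, 4],    # pairs that did not
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

# Train a small random forest on the score features.
classifier = RandomForestClassifier(n_estimators=30, max_depth=3, random_state=0)
classifier.fit(X_train, y_train)

# Score two unseen candidate pairs and predict whether each will link.
X_new = [[3, 15, 5], [0, 1, 2]]
predictions = classifier.predict(X_new)
link_probabilities = classifier.predict_proba(X_new)[:, 1]  # column 1 = class 1
print(predictions)
print(link_probabilities)
```

Because the classifier learns how to weigh each score, you no longer have to pick a threshold for a single measure by hand as in the direct approach.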

In the next part of this module you will use the supervised learning approach.  

## Exercise 2: Building a binary classifier
In this exercise, you will use a notebook to build a binary classifier that predicts co-authorships.

[<button>Exercise 2</button>](https://colab.research.google.com/github/neo4j-contrib/training-v2/blob/master/Courses/DataScience/notebooks/04_Predictions.ipynb)

[<button>Exercise 2 (local translated version)</button>](Predictions_Exercise.ipynb)

## Summary
You should now be able to:  

- Describe what link prediction is
- Use the link prediction functions in Neo4j
- Understand the challenges when building machine learning models on graph data
- Build a link prediction classifier using scikit-learn with features derived from the Neo4j Graph Algorithms library  

