# Recommendations: Part 2

In the second part of our recommendations exercise, you will use the PageRank algorithm to recommend articles to an author.
Execute the code below to import the libraries (remember to clear the "Reset all runtimes" option before running):

In [1]:
from py2neo import Graph
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")
pd.set_option("display.float_format", lambda x: '% 3f' % x)
pd.set_option("display.max_colwidth",100)



Next, create a connection to your Neo4j Sandbox, just as you did previously when you set up your environment.

<div align="left">
    <img src="images/sandbox-citations.png" alt="Citation Sandbox"/>
</div>

Update the cell below to use the IP Address, Bolt Port, and Password of your Sandbox, as you did previously.

In [2]:
# Change the line of code below to use the IP Address, Bolt Port, and Password of your Sandbox.
# graph = Graph("<Bolt URL>", auth=("neo4j", "<Password>")) 
 
graph = Graph("bolt://100.25.48.12:37028", auth=("neo4j", "auto-development-gunnery"))
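
If you would rather not keep the Sandbox password in the notebook itself, one option is to read the connection details from environment variables. The sketch below is only an illustration: the variable names `NEO4J_BOLT_URL`, `NEO4J_USER`, and `NEO4J_PASSWORD` and the helper `sandbox_settings` are hypothetical names chosen for this example, not something the Sandbox defines for you.

```python
import os


def sandbox_settings(env=os.environ):
    """Return (bolt_url, auth) built from environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    url = env.get("NEO4J_BOLT_URL", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env["NEO4J_PASSWORD"]  # raises KeyError if the variable is unset
    return url, (user, password)


# Usage, with the Graph class imported in the first cell:
#   url, auth = sandbox_settings()
#   graph = Graph(url, auth=auth)
```

The result is passed to `Graph` in the same way as the hard-coded values above.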

## PageRank

PageRank is an algorithm that measures the transitive influence or connectivity of nodes. It can be computed either by iteratively distributing each node's rank (originally based on degree) over its neighbors, or by randomly traversing the graph and counting how often each node is hit during these walks.
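
To make the iterative formulation concrete, here is a minimal pure-Python sketch of PageRank on a toy citation graph. It is not the implementation used by the `algo.pageRank` procedure: it starts from a uniform rank rather than a degree-based one, it normalizes the scores so that they sum to 1, and the function name and the four-node graph are made up for this illustration.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Start with the same rank for every node.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives a small base share (the "random jump").
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source, targets in out_links.items():
            if targets:
                # Split the damped rank of a node evenly over the nodes it cites.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node that cites nothing spreads its rank over all nodes.
                share = damping * rank[source] / len(nodes)
                for node in nodes:
                    new_rank[node] += share
        rank = new_rank
    return rank


# Toy graph: ("B", "A") means article B cites article A.
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
scores = pagerank(citations)
most_influential = max(scores, key=scores.get)  # "A", which is cited by all the others
```

Because of the normalization, the absolute scores from this sketch are not on the same scale as the values the procedure writes to the graph; only the ordering is meant to be comparable.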

Run this PageRank code over the whole graph to find the most influential articles in terms of citations:

In [4]:
query = """
CALL algo.pageRank('Article', 'CITED')
"""

graph.run(query).data()

[{'nodes': 51956,
  'iterations': 20,
  'loadMillis': 174,
  'computeMillis': 40,
  'writeMillis': 222,
  'dampingFactor': 0.85,
  'write': True,
  'writeProperty': 'pagerank'}]

This query stores a 'pagerank' property on each node. Execute this code to view the most influential articles:

In [24]:
query = """
MATCH (a:Article)
RETURN a.title as article,
       a.pagerank as score
ORDER BY score DESC 
LIMIT 10
"""
graph.run(query).to_data_frame()

Unnamed: 0,article,score
0,A method for obtaining digital signatures and public-key cryptosystems,93.943105
1,Secure communications over insecure channels,79.869224
2,Rough sets,25.609092
3,An axiomatic basis for computer programming,23.029374
4,"Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems",21.46955
5,SCRIBE: The Design of a Large-Scale Event Notification Infrastructure,19.486296
6,A field study of the software design process for large systems,19.028154
7,Productivity factors and programming environments,18.499351
8,Analyzing medium-scale software development,16.452748
9,A Calculus of Communicating Systems,15.430588


## Personalized PageRank

Personalized PageRank is a variant of PageRank that allows us to find influential nodes relative to a set of source nodes.

For example, rather than finding the most influential articles overall, we could instead find the most influential articles with respect to a given author.
Execute this code to use a personalized PageRank algorithm:

In [9]:
query ="""
MATCH (a:Author {name:$author})<-[:AUTHOR]-(article)-[:CITED]->(other)
WITH collect(article) + collect(other) AS sourceNodes
CALL algo.pageRank.stream("Article", "CITED", {sourceNodes: sourceNodes})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).title AS article, score
ORDER BY score DESC
LIMIT 10
"""

author_name = "Peter G. Neumann"
graph.run(query, {"author":author_name}).to_data_frame()

Unnamed: 0,article,score
0,A technique for software module specification with examples,0.358528
1,A messy state of the union: taming the composite state machines of TLS,0.331688
2,Crypto policy perspectives,0.2775
3,Risks of automation: a cautionary total-system perspective of our cyberfuture,0.2775
4,The foresight saga,0.2775
5,Risks of e-voting,0.2775
6,Public interest and the NII,0.2775
7,Password security: a case history,0.2775
8,The challenges of partially automated driving,0.267938
9,Proof techniques for hierarchically structured programs,0.248436
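
The idea of seeding PageRank with source nodes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's code: only source nodes receive the base score of `1 - damping`, scores are left unnormalized, and rank that reaches a node without outgoing citations is simply not passed on. The function name and the toy graph are hypothetical.

```python
def personalized_pagerank(edges, source_nodes, damping=0.85, iterations=20):
    """PageRank in which only the source nodes receive the base score."""
    sources = set(source_nodes)
    nodes = {node for edge in edges for node in edge} | sources
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Nodes outside the source set start from (and are reset to) zero.
    base = {node: (1.0 - damping) if node in sources else 0.0 for node in nodes}
    rank = dict(base)
    for _ in range(iterations):
        new_rank = dict(base)
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Toy graph: the author's article "mine" cites "cited", which cites "classic".
edges = [("mine", "cited"), ("cited", "classic"), ("unrelated", "elsewhere")]
# As in the query above, the sources are the author's articles plus the articles they cite.
scores = personalized_pagerank(edges, source_nodes=["mine", "cited"])
```

In this toy graph "cited" scores 0.15 + 0.85 × 0.15 = 0.2775 and "classic" scores 0.85 × 0.2775 = 0.235875, while the articles that cannot be reached from the source nodes score 0. Scores of 0.2775 also appear in the table above, which is consistent with this formulation but does not confirm how the procedure computes them.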


## Topic Sensitive Search

You can also use Personalized PageRank to do 'Topic Specific PageRank'.

When authors search for articles to read, they want the search to take their own work into account. Two authors using the same search term would expect to see different results depending on their area of research.

Create a full text search index on the 'title' and 'abstract' properties of all nodes that have the label 'Article' by executing this code:

In [None]:
query = """
    CALL db.index.fulltext.createNodeIndex('articles', ['Article'], ['title', 'abstract'])
"""
graph.run(query).data()

Check that the full text index has been created by running the following query:

In [25]:
query = """
CALL db.indexes()
YIELD description, indexName, tokenNames, properties, state, type, progress
WHERE type = "node_fulltext"
RETURN *
"""
graph.run(query).to_data_frame()

Unnamed: 0,description,indexName,progress,properties,state,tokenNames,type
0,"INDEX ON NODE:Article(title, abstract)",articles,100.0,"[title, abstract]",ONLINE,[Article],node_fulltext


You can search the full text index like this:

In [13]:
query = """
CALL db.index.fulltext.queryNodes("articles", "open source")
YIELD node, score
RETURN node.title, score, [(author)<-[:AUTHOR]-(node) | author.name] AS authors
LIMIT 10
"""

graph.run(query).to_data_frame()

Unnamed: 0,authors,node.title,score
0,"[Rob Miller, Dean Nelson, Pankaj K. Garg, Jamie Dinkelacker]",Progressive open source,4.251786
1,"[Walt Scacchi, Joseph Feller, Brian Fitzgerald, Krishna K Lakhani, Scott A. Hissam]",Open source application spaces: the 5th workshop on open source software engineering,4.081271
2,"[Alan W. Brown, Grady Booch]",Reusing Open-Source Software and Practices: The Impact of Open-Source on Commercial Vendors,4.071275
3,[Roy T. Fielding],Software architecture in an open source world,3.815273
4,[Susan L. Graham],From Research Software to Open Source,3.784185
5,"[Stefan Baldi, Anett Mehler-Bicher, Hauke Heier]",Open courseware and open source software,3.693338
6,[Pamela Samuelson],IBM's pragmatic embrace of open source,3.689845
7,"[LiGuo Huang, Zeheng Li]",When to release in open source project,3.542871
8,"[Jaap-Henk Hoepman, Bart Jacobs]",Increased security through open source,3.515002
9,[Robert L. Glass],A sociopolitical look at open source,3.492101


Here is a query to find the authors who have published the most on 'open source', ranked by the summed search score of their matching articles:

In [15]:
query = """
CALL db.index.fulltext.queryNodes("articles", "open source")
YIELD node, score
MATCH (node)-[:AUTHOR]->(author)
RETURN author.name, sum(score) AS totalScore, collect(node.title) AS articles
ORDER BY totalScore DESC
LIMIT 20
 
"""

graph.run(query).to_data_frame()

Unnamed: 0,articles,author.name,totalScore
0,"[Open source application spaces: the 5th workshop on open source software engineering, The 3rd w...",Brian Fitzgerald,16.119015
1,"[Open source application spaces: the 5th workshop on open source software engineering, The 3rd w...",Joseph Feller,16.011879
2,"[Open source application spaces: the 5th workshop on open source software engineering, The futur...",Walt Scacchi,10.730826
3,"[Open source-style collaborative development practices in commercial projects using GitHub, Mach...",Daniel M. German,10.686845
4,"[Open source application spaces: the 5th workshop on open source software engineering, The 3rd w...",Scott A. Hissam,10.641744
5,"[A case study of a corporate open source development model, Managing a corporate open source sof...",James D. Herbsleb,10.4755
6,"[Machine learning-based detection of open source license exceptions, Recommending source code fo...",Denys Poshyvanyk,8.906547
7,"[Understanding broadcast based peer review on open source software projects, Peer Review on Open...",Margaret-Anne D. Storey,8.18079
8,"[Understanding broadcast based peer review on open source software projects, Peer Review on Open...",Peter C. Rigby,7.648662
9,"[An automated tool for generating change report from open-source software, Cross project change ...",Ruchika Malhotra,7.132015
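
Note that `totalScore` is a sum of relevance scores rather than a count of articles, so an author with a few highly relevant articles can outrank one with many weak matches. The aggregation itself can be sketched in plain Python as follows. The hit list, titles, author names, and the helper name are invented for this example and do not come from the dataset.

```python
from collections import defaultdict


def total_score_by_author(hits):
    """Sum search scores per author and sort authors by that total, highest first."""
    totals = defaultdict(float)
    titles = defaultdict(list)
    for hit in hits:
        for author in hit["authors"]:
            totals[author] += hit["score"]
            titles[author].append(hit["title"])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(author, total, titles[author]) for author, total in ranked]


# Made-up search hits, shaped like the rows the full text query yields.
hits = [
    {"title": "Paper 1", "score": 4.0, "authors": ["Ann", "Bob"]},
    {"title": "Paper 2", "score": 3.5, "authors": ["Ann"]},
    {"title": "Paper 3", "score": 5.0, "authors": ["Cid"]},
]
ranking = total_score_by_author(hits)
```

Here "Ann" comes first with a total of 7.5 from two articles, ahead of "Cid", whose single article has the highest individual score.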


Next, combine full text search with Personalized PageRank to find interesting articles for different authors:

In [18]:
query = """
MATCH (a:Author {name: $author})<-[:AUTHOR]-(article)-[:CITED]->(other)
WITH a, collect(article) + collect(other) AS sourceNodes
CALL algo.pageRank.stream(
  'CALL db.index.fulltext.queryNodes("articles", $searchTerm)
   YIELD node, score
   RETURN id(node) as id',
  'MATCH (a1:Article)-[:CITED]->(a2:Article) 
   RETURN id(a1) as source,id(a2) as target', 
  {sourceNodes: sourceNodes,graph:'cypher', params: {searchTerm: $searchTerm}})
YIELD nodeId, score
// Carry a through so the WHERE clause can exclude the author's own articles
WITH a, algo.getNodeById(nodeId) AS n, score
WHERE not(exists((a)<-[:AUTHOR]-(n))) AND score > 0
RETURN n.title as article, score, [(n)-[:AUTHOR]->(author) | author.name][..5] AS authors
ORDER BY score DESC
LIMIT 10
"""

params = {"author": "Tao Xie", "searchTerm": "open source"}
graph.run(query, params).to_data_frame()

Unnamed: 0,article,authors,score
0,Static detection of cross-site scripting vulnerabilities,"[Zhendong Su, Gary Wassermann]",0.385875
1,Concern graphs: finding and describing concerns using structural program dependencies,"[Gail C. Murphy, Martin P. Robillard]",0.2775
2,Characterizing logging practices in open-source software,"[Ding Yuan, Soyeon Park, Yuanyuan Zhou]",0.2775
3,"Automated, contract-based user testing of commercial-off-the-shelf components","[Lionel C. Briand, Yvan Labiche, Michal M. Sówka]",0.2775
4,Who should fix this bug,"[Lyndon Hiew, John Anvik, Gail C. Murphy]",0.2775
5,Conceptual module querying for software reengineering,"[Gail C. Murphy, Elisa L. A. Baniassad]",0.235875
6,Semantics-based code search,[Steven P. Reiss],0.15
7,Bandera: extracting finite-state models from Java source code,"[Matthew B. Dwyer, Hongjun Zheng, James C. Corbett, Shawn Laubach, John Hatcliff]",0.15
8,AsDroid: detecting stealthy behaviors in Android applications by user interface and program beha...,"[Lin Tan, Jianjun Huang, Xiangyu Zhang, Bin Liang, Peng Wang]",0.15
9,EXSYST: search-based GUI testing,"[Andreas Zeller, Gordon Fraser, Florian Gross]",0.1275


Execute the same query with a different author:

In [23]:
params = {"author": "Marco Aurélio Gerosa", "searchTerm": "open source"}
graph.run(query, params).to_data_frame()

Unnamed: 0,article,authors,score
0,Toward an understanding of the motivation of open source software developers,"[Yunwen Ye, Kouichi Kishida]",0.388425
1,Hipikat: recommending pertinent software development artifacts,"[Gail C. Murphy, Davor Cubranic]",0.322429
2,Version Sensitive Editing: Change History as a Programming Tool,[David L. Atkins],0.274065
3,Which bug should I fix: helping new developers onboard a new project,"[Jianguo Wang, Anita Sarma]",0.23925
4,Tesseract: Interactive visual exploration of socio-technical relationships in software development,"[Anita Sarma, Larry Maccherone, Patrick Wagstrom, James D. Herbsleb]",0.203363
5,Role Migration and Advancement Processes in OSSD Projects: A Comparative Case Study,"[Walt Scacchi, Chris Jensen]",0.1755
6,Does the initial environment impact the future of developers,"[Minghui Zhou, Audris Mockus]",0.1755
7,Unifying artifacts and activities in a visual tool for distributed software development teams,"[Jon Froehlich, Paul Dourish]",0.172858
8,A case study of open source software development: the Apache server,"[James D. Herbsleb, Audris Mockus, Roy Fielding]",0.110054
9,A case study of the evolution of Jun: an object-oriented open-source 3D multimedia library,"[Yoshiyuki Nishinaka, Atsushi Aoki, Kouichi Kishida, Y. Yamamoto, Kaoru Hayashi]",0.110054
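
The combination of search and Personalized PageRank used in the query above can be summarized in a small Python sketch. It rests on several assumptions: a case-insensitive substring match stands in for the full text index, citations with an endpoint outside the search hits are ignored, source nodes outside the search hits are ignored, and scores are unnormalized. The function name, article ids, and titles are all made up for this illustration.

```python
def topic_sensitive_recommendations(articles, citations, source_ids, own_ids,
                                    term, damping=0.85, iterations=20):
    """Rank articles matching a search term, seeded by an author's source articles."""
    # Step 1: the node set is the set of search hits (substring match as a stand-in).
    matching = {aid for aid, title in articles.items()
                if term.lower() in title.lower()}

    # Step 2: keep only the citations that connect two search hits.
    out_links = {aid: [] for aid in matching}
    for source, target in citations:
        if source in matching and target in matching:
            out_links[source].append(target)

    # Step 3: Personalized PageRank, where only source nodes get the base score.
    base = {aid: (1.0 - damping) if aid in source_ids else 0.0 for aid in matching}
    rank = dict(base)
    for _ in range(iterations):
        new_rank = dict(base)
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank

    # Step 4: drop the author's own articles and anything with a zero score.
    results = [(articles[aid], score) for aid, score in rank.items()
               if score > 0 and aid not in own_ids]
    return sorted(results, key=lambda row: row[1], reverse=True)


articles = {
    1: "Open source testing",
    2: "Open source licensing",
    3: "Compiler design",
    4: "Open source communities",
    5: "My open source study",
}
citations = [(5, 1), (1, 2), (3, 2), (4, 2)]
# The author wrote article 5; the sources are that article and the one it cites.
recommendations = topic_sensitive_recommendations(
    articles, citations, source_ids={5, 1}, own_ids={5}, term="open source")
```

With this toy data the recommendations are "Open source testing" followed by "Open source licensing". A different author would supply different `source_ids` and `own_ids`, and so would see a different ranking for the same search term.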
