In this Jupyter notebook, you'll explore the PageRank algorithm and how changing the topology of a network of webpages can affect the PageRank of individual pages. 

First, let's create a simple data abstraction to represent a webpage.

In [1]:
class Webpage:    
    # fixed_pagerank is used to model an external page whose PR should not be updated
    # The default, -1, means that the page's PR should be updated.
    def __init__(self, name, fixed_pagerank=-1):
        # create a new webpage, with no links
        self.name = name
        self.links = []
        self.backlinks = []
        self.fixed_pagerank = fixed_pagerank

    def add_link(self, target_page):
        # record the outbound link only once
        if target_page not in self.links:
            self.links.append(target_page)
        # when adding a link, add a backlink on the target
        target_page.add_back_link(self)

    def add_back_link(self, source_page):
        if source_page not in self.backlinks:
            self.backlinks.append(source_page)

    # debugging methods
    def print_links(self):
        print(", ".join(page.name for page in self.links))

    def print_back_links(self):
        print(", ".join(page.name for page in self.backlinks))

    def __str__(self):
        return self.name

We provide a simple iterative implementation of PageRank:

In [2]:
class PageRank:
    def __init__(self, pages, damping_factor=0.85, debug=False, supernode=True):
        self.page_rank_table = {}
        self.damping_factor = damping_factor
        self.debug=debug
        self.supernode=supernode

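        # Copy the list so that appending the supernode below doesn't change the
        # caller's list. The Webpage objects themselves are shared, so with
        # supernode=True they still gain links to and from the supernode.
        self.pages = pages.copy()
        # The "supernode" links to, and is linked from, every page whose PR is not fixed
        supernode = Webpage("supernode")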
        
        if self.supernode:
            for page in self.pages:
                if page.fixed_pagerank == -1:
                    page.add_link(supernode)
                    supernode.add_link(page)

            self.pages.append(supernode)
        for page in self.pages:
            # initialize each page's PR to be 1/n, where n is the total number of pages
            self.page_rank_table[page] = 1/len(self.pages)
          
    def run_page_rank(self, iterations):
        for ii in range(iterations):
            if self.debug:
                print("\nIteration #" + str(ii))
                self.print_table(show_supernode=True)
                
            new_page_rank_table = {}
            for page in self.page_rank_table:
                if page.fixed_pagerank == -1:
                    new_page_rank = 0
                    for backlink in page.backlinks:
                        new_page_rank += self.page_rank_table[backlink] / len(backlink.links)
                    new_page_rank_table[page] = (1-self.damping_factor) + self.damping_factor * new_page_rank
                else:
                    new_page_rank_table[page] = page.fixed_pagerank
            self.page_rank_table = new_page_rank_table

    # debugging & validation methods
    def calc_average_pagerank(self):
        sum_pagerank = 0
        for page in self.page_rank_table:
            sum_pagerank += self.page_rank_table[page]
        return sum_pagerank / len(self.page_rank_table)

    def print_table(self, show_supernode=False):
        for page in self.page_rank_table:
            if str(page) != "supernode" or show_supernode:
                print(str(page) + ": " + str(self.page_rank_table[page]))        
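For reference, `run_page_rank` applies the following update to every page $p$ whose PageRank is not fixed. Here $d$ is `damping_factor`, $B(p)$ is the set of pages linking to $p$ (its `backlinks`), and $L(q)$ is the number of outbound links on page $q$:

$$PR(p) = (1 - d) + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}$$

This version adds $1 - d$ rather than $(1 - d)/n$. Suppose there are no fixed pages and every page has at least one outbound link. Then each page passes on its whole PageRank, so summing the update over all $n$ pages gives $\sum_p PR(p) = n(1 - d) + d \sum_p PR(p)$. At convergence, therefore, $\sum_p PR(p) = n$.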

Create a simple network of four pages, in which A links to B and C, B links to C, C links to A, and D links to C, and calculate the PageRank of each one.

In [3]:
pages = []
pageA = Webpage("A")
pageB = Webpage("B")
pageC = Webpage("C")
pageD = Webpage("D")

pageA.add_link(pageB)
pageA.add_link(pageC)

pageB.add_link(pageC)

pageC.add_link(pageA)

pageD.add_link(pageC)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(100)
pageRank.print_table(show_supernode=True)


Iteration #0
A: 0.25
B: 0.25
C: 0.25
D: 0.25

Iteration #1
A: 0.36250000000000004
B: 0.25625000000000003
C: 0.68125
D: 0.15000000000000002

Iteration #2
A: 0.7290625000000001
B: 0.3040625
C: 0.6493750000000001
D: 0.15000000000000002

Iteration #3
A: 0.7019687500000001
B: 0.45985156250000003
C: 0.8458046875000002
D: 0.15000000000000002

Iteration #4
A: 0.8689339843750001
B: 0.4483367187500001
C: 0.966710546875
D: 0.15000000000000002

Iteration #5
A: 0.97170396484375
B: 0.519296943359375
C: 1.0278831542968754
D: 0.15000000000000002

Iteration #6
A: 1.0237006811523441
B: 0.5629741850585938
C: 1.1318765869140623
D: 0.15000000000000002

Iteration #7
A: 1.112095098876953
B: 0.5850727894897463
C: 1.191100846789551
D: 0.15000000000000002

Iteration #8
A: 1.1624357197711181
B: 0.622640417022705
C: 1.2474522880889891
D: 0.15000000000000002

Iteration #9
A: 1.2103344448756408
B: 0.6440351809027252
C: 1.3007795353720244
D: 0.15000000000000002

Iteration #10
A: 1.2556626050662207
B: 0.664392139072
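The debug output above is cut off at iteration 10, before the values settle. As a rough cross-check, here is a minimal, self-contained sketch. It re-implements the same update rule as a hypothetical `pagerank` function that works on a plain dictionary of links instead of the notebook's classes, and it runs long enough to converge.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Hypothetical stand-in for the notebook's PageRank class (supernode=False).

    `links` maps each page name to the list of page names it links to.
    Each iteration applies PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over the
    pages q that link to p, where L(q) is q's number of outbound links.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1 / len(pages) for p in pages}  # start every page at 1/n
    for _ in range(iterations):
        # build the whole new table from the old one, as run_page_rank does
        pr = {
            p: (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
    return pr


# The four-page network from In [3]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = pagerank(links)
for page in sorted(pr):
    print(f"{page}: {pr[page]:.4f}")
print(f"total: {sum(pr.values()):.4f}")
```

C should come out on top because it collects votes from A, B and D. D has no backlinks, so it stays at the floor value 1 - d = 0.15.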

# Assignment Exercises
Complete all three of the following exercises.

### Exercise 1
While the inputs to Google's overall ranking algorithm are secret, the output of PageRank is not. There are tools that allow you to enter a URL and see the Google PageRank of the corresponding page. For example, if you enter nuevaschool.org into [this one](https://dnschecker.org/pagerank.php), it will tell you that Nueva's homepage has a PR of 4.

Two notes:
1. Recall that this is not the raw output -- the PageRank algorithm can produce an arbitrarily high output, so to make it easier to understand, Google maps the output to a scale of 0-10, where 0 is low and 10 is high.
2. This tool is kind of a pain to use since it requires that you enter a captcha every time you submit. If you find a better one, let me know! Just be careful, as many of these tools have a bunch of spammy ads on them.

#### Questions
1. Play around with this tool and report back the values of some webpages. Can you find webpages of each rank between 0 and 10? 
2. Describe what kinds of webpages are represented in each rank. For example, you might say that pages of rank 9 are usually the homepages of the services of the biggest Internet companies in the world.


### Exercise 2
In the given implementation, we set the damping factor to 0.85 and we initialize each page's PageRank to be 1/n, where n is the total number of pages. 

1. What happens if you change the damping factor? How does the algorithm behave with a small damping factor vs a large one? 
2. What happens if you change the initial PageRank?

You may want to use the `debug=True` flag and the `calc_average_pagerank` method to get a better look at what's going on.
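One way to explore these questions is to rerun the update rule with several damping factors and starting values. For each run, watch the average PageRank (what `calc_average_pagerank` reports) and how many iterations the values take to settle. The sketch below is a self-contained re-implementation, not the notebook's `PageRank` class. The `pagerank` function, its `initial` and `tol` parameters, and the stopping rule are assumptions made for illustration. The graph is the four-page network from `In [3]`.

```python
def pagerank(links, damping=0.85, initial=None, tol=1e-12, max_iterations=10_000):
    """Hypothetical re-implementation of the notebook's update rule.

    Every page starts at `initial` (1/n when None). Iteration stops once no
    page's value changes by more than `tol`; returns (table, iterations run).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    start = 1 / len(pages) if initial is None else initial
    pr = {p: start for p in pages}
    for step in range(1, max_iterations + 1):
        new = {
            p: (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new, step
        pr = new
    return pr, max_iterations


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # network from In [3]
results = {}
for damping in (0.1, 0.5, 0.85, 0.99):
    for initial in (None, 1.0, 100.0):
        table, steps = pagerank(links, damping=damping, initial=initial)
        results[(damping, initial)] = (table, steps)
        average = sum(table.values()) / len(table)
        label = "1/n" if initial is None else f"{initial:g}"
        print(f"d={damping:<5} init={label:<5} average={average:.4f} steps={steps}")
```

Rows with the same `d` but different `init` show whether the starting values affect the final table. The `steps` column shows how quickly each damping factor settles.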

### Exercise 3
In class, we discussed the major advantage of PageRank compared to the search engine algorithms that came before it: since PageRank is determined by backlinks, which the owner of a page has less control over, it's highly resistant to *keyword stuffing*, the practice of putting a ton of keywords on your webpage so that the indexer thinks your page is relevant to all those keywords. In this exercise, we'll explore what kinds of practices can manipulate PageRank; this is known as search engine optimization, or SEO, and is an $80 billion industry in the US alone.


1. You've just created a startup, and you need to make a website for your startup. You have four pages to start with:
    1. Homepage
    2. About (the story of the company)
    3. Team (the bios of you and your cofounders)
    4. Blog
    
  How should you structure your website so that your homepage has the highest possible PageRank? List the PageRank values for each of the four webpages in this structure.
  
  
2. Your product is a hit! Glowing reviews are rolling in, and you want to add a page for Reviews that links to each of the 5 reviews so far. How does this affect your homepage's PageRank? Should you restructure your website, and if so, how? List the PageRank values for each of the five webpages you control (Home, About, Team, Blog, and Reviews).

3. It's been a month, and you've produced a blog post every week (total of 4 posts). Your homepage still has a link to the most recent post, and each post links to the post before it and after it, like so:

  `Home ==> Blog post 4 <==> Blog post 3 <==> Blog post 2 <== Blog post 1`
  
 How does this affect your homepage's PageRank? Should you restructure your website, and if so, how? (You may want to refer to your answer to question 1.) Do websites you visit structure their sites in this way? If not, what are some reasons why?
 
4. Copycats of your product are popping up all over the web, and your homepage is getting pushed off the first page of search results. You decide you need to take drastic action. What are some unscrupulous ways you might boost the PageRank of your homepage quickly? Model each of these and give their impact on the PageRank of your homepage.

5. Google figures out what you're doing, and they send you a nice cease and desist letter telling you to stop, or they'll remove your website from the search results entirely. What are some legitimate ways you might boost the PageRank of your homepage?


## Exercise 3
### 1

In [5]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD = Webpage("Blog")

pageB.add_link(pageA)
pageC.add_link(pageA)
pageD.add_link(pageA)

pageC.add_link(pageB)
pageD.add_link(pageC)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.25
About: 0.25
Team: 0.25
Blog: 0.25

Iteration #1
Home: 0.575
About: 0.25625000000000003
Team: 0.25625000000000003
Blog: 0.15000000000000002

Iteration #2
Home: 0.54046875
About: 0.25890625
Team: 0.21375000000000002
Blog: 0.15000000000000002

Iteration #3
Home: 0.5246640625000001
About: 0.24084375000000002
Team: 0.21375000000000002
Blog: 0.15000000000000002

Iteration #4
Home: 0.5093109375
About: 0.24084375000000002
Team: 0.21375000000000002
Blog: 0.15000000000000002

Iteration #5
Home: 0.5093109375
About: 0.24084375000000002
Team: 0.21375000000000002
Blog: 0.15000000000000002

Iteration #6
Home: 0.5093109375
About: 0.24084375000000002
Team: 0.21375000000000002
Blog: 0.15000000000000002

Iteration #7
Home: 0.5093109375
About: 0.24084375000000002
Team: 0.21375000000000002
Blog: 0.15000000000000002

Iteration #8
Home: 0.5093109375
About: 0.24084375000000002
Team: 0.21375000000000002
Blog: 0.15000000000000002

Iteration #9
Home: 0.5093109375
About: 0.240843750000000

In [6]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD = Webpage("Blog")

pageB.add_link(pageA)
pageC.add_link(pageA)
pageD.add_link(pageA)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.25
About: 0.25
Team: 0.25
Blog: 0.25

Iteration #1
Home: 0.7875
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteration #2
Home: 0.5325000000000001
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteration #3
Home: 0.5325000000000001
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteration #4
Home: 0.5325000000000001
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteration #5
Home: 0.5325000000000001
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteration #6
Home: 0.5325000000000001
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteration #7
Home: 0.5325000000000001
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteration #8
Home: 0.5325000000000001
About: 0.15000000000000002
Team: 0.15000000000000002
Blog: 0.15000000000000002

Iteratio

In [11]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD = Webpage("Blog")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

pageD.add_link(pageA)

pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.25
About: 0.25
Team: 0.25
Blog: 0.25

Iteration #1
Home: 0.7875
About: 0.22083333333333335
Team: 0.22083333333333335
Blog: 0.22083333333333335

Iteration #2
Home: 0.7131250000000001
About: 0.37312500000000004
Team: 0.37312500000000004
Blog: 0.37312500000000004

Iteration #3
Home: 1.1014687500000002
About: 0.3520520833333334
Team: 0.3520520833333334
Blog: 0.3520520833333334

Iteration #4
Home: 1.0477328125
About: 0.46208281250000005
Team: 0.46208281250000005
Blog: 0.46208281250000005

Iteration #5
Home: 1.3283111718750003
About: 0.44685763020833336
Team: 0.44685763020833336
Blog: 0.44685763020833336

Iteration #6
Home: 1.28948695703125
About: 0.5263548320312501
Team: 0.5263548320312501
Blog: 0.5263548320312501

Iteration #7
Home: 1.492204821679688
About: 0.5153546378255208
Team: 0.5153546378255208
Blog: 0.5153546378255208

Iteration #8
Home: 1.4641543264550783
About: 0.5727913661425783
Team: 0.5727913661425783
Blog: 0.5727913661425783

Iteration #9
Home: 1.61061798

Home: 1.9184253116761014
About: 0.6935624647775871
Team: 0.6935624647775871
Blog: 0.6935624647775871

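The highest homepage PageRank I found is about 1.918 (1.9184253116761014 after 50 iterations). It occurs when every other page links to the homepage and the homepage links back to some or all of them. Interestingly, the number of pages the homepage links back to makes no difference. As long as each of those pages links only to the homepage, the entire vote the homepage casts flows straight back to it.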
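This observation can be checked directly. The minimal sketch below re-implements the notebook's update rule as a hypothetical, self-contained `pagerank` function, not the `PageRank` class. It then varies how many of the three subpages the homepage links back to.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Hypothetical stand-in for the notebook's PageRank class (supernode=False):
    PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over the pages q linking to p."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
    return pr


subpages = ["About", "Team", "Blog"]
home_pr = {}
for k in range(len(subpages) + 1):
    links = {page: ["Home"] for page in subpages}  # every subpage links to Home
    links["Home"] = subpages[:k]                    # Home links back to k of them
    home_pr[k] = pagerank(links)["Home"]
    print(f"Home links to {k} page(s): Home PR = {home_pr[k]:.4f}")
```

For k ≥ 1 the fixed point can be solved by hand. Home's value h satisfies h = 0.15 + 0.85(3 × 0.15 + 0.85h), so h = 0.5325 / 0.2775 ≈ 1.919 whatever k is. With k = 0 the homepage only collects the three floor values: 0.15 + 0.85 × 0.45 = 0.5325, which matches the `In [6]` result.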

### 2

In [15]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD = Webpage("Blog")
pageE = Webpage("Reviews")
pageR1 = Webpage("R1")
pageR2 = Webpage("R2")
pageR3 = Webpage("R3")
pageR4 = Webpage("R4")
pageR5 = Webpage("R5")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

pageD.add_link(pageA)

pageE.add_link(pageA)
pageE.add_link(pageR1)
pageE.add_link(pageR2)
pageE.add_link(pageR3)
pageE.add_link(pageR4)
pageE.add_link(pageR5)


pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD)
pageA.add_link(pageE)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD)
pages.append(pageE)
pages.append(pageR1)
pages.append(pageR2)
pages.append(pageR3)
pages.append(pageR4)
pages.append(pageR5)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.1
About: 0.1
Team: 0.1
Blog: 0.1
Reviews: 0.1
R1: 0.1
R2: 0.1
R3: 0.1
R4: 0.1
R5: 0.1

Iteration #1
Home: 0.41916666666666674
About: 0.17125
Team: 0.17125
Blog: 0.17125
Reviews: 0.17125
R1: 0.16416666666666668
R2: 0.16416666666666668
R3: 0.16416666666666668
R4: 0.16416666666666668
R5: 0.16416666666666668

Iteration #2
Home: 0.6109479166666667
About: 0.2390729166666667
Team: 0.2390729166666667
Blog: 0.2390729166666667
Reviews: 0.2390729166666667
R1: 0.1742604166666667
R2: 0.1742604166666667
R3: 0.1742604166666667
R4: 0.1742604166666667
R5: 0.1742604166666667

Iteration #3
Home: 0.7935046006944445
About: 0.2798264322916667
Team: 0.2798264322916667
Blog: 0.2798264322916667
Reviews: 0.2798264322916667
R1: 0.18386866319444448
R2: 0.18386866319444448
R3: 0.18386866319444448
R4: 0.18386866319444448
R5: 0.18386866319444448

Iteration #4
Home: 0.9031994802517362
About: 0.31861972764756946
Team: 0.31861972764756946
Blog: 0.31861972764756946
Reviews: 0.31861972764756946
R1: 

In [26]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD = Webpage("Blog")
pageE = Webpage("Reviews")
pageR1 = Webpage("R1")
pageR2 = Webpage("R2")
pageR3 = Webpage("R3")
pageR4 = Webpage("R4")
pageR5 = Webpage("R5")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

pageD.add_link(pageA)

pageE.add_link(pageA)
pageE.add_link(pageR1)
pageE.add_link(pageR2)
pageE.add_link(pageR3)
pageE.add_link(pageR4)
pageE.add_link(pageR5)


pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD)
#pageA.add_link(pageE)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD)
pages.append(pageE)
pages.append(pageR1)
pages.append(pageR2)
pages.append(pageR3)
pages.append(pageR4)
pages.append(pageR5)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.1
About: 0.1
Team: 0.1
Blog: 0.1
Reviews: 0.1
R1: 0.1
R2: 0.1
R3: 0.1
R4: 0.1
R5: 0.1

Iteration #1
Home: 0.41916666666666674
About: 0.17833333333333334
Team: 0.17833333333333334
Blog: 0.17833333333333334
Reviews: 0.15000000000000002
R1: 0.16416666666666668
R2: 0.16416666666666668
R3: 0.16416666666666668
R4: 0.16416666666666668
R5: 0.16416666666666668

Iteration #2
Home: 0.6260000000000001
About: 0.2687638888888889
Team: 0.2687638888888889
Blog: 0.2687638888888889
Reviews: 0.15000000000000002
R1: 0.17125
R2: 0.17125
R3: 0.17125
R4: 0.17125
R5: 0.17125

Iteration #3
Home: 0.8565979166666667
About: 0.3273666666666667
Team: 0.3273666666666667
Blog: 0.3273666666666667
Reviews: 0.15000000000000002
R1: 0.17125
R2: 0.17125
R3: 0.17125
R4: 0.17125
R5: 0.17125

Iteration #4
Home: 1.0060350000000002
About: 0.3927027430555556
Team: 0.3927027430555556
Blog: 0.3927027430555556
Reviews: 0.15000000000000002
R1: 0.17125
R2: 0.17125
R3: 0.17125
R4: 0.17125
R5: 0.17125

Iteration #

Home: 1.2937444102087614
About: 0.42492062589340013
Team: 0.42492062589340013
Blog: 0.42492062589340013
Reviews: 0.42492062589340013
R1: 0.2101970742215138
R2: 0.2101970742215138
R3: 0.2101970742215138
R4: 0.2101970742215138
R5: 0.2101970742215138

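Linking the Reviews page to the five reviews lowers the homepage's PageRank from about 1.92 (question 1) to about 1.29. The homepage now sends a quarter of its vote to the Reviews page, which passes only one sixth of it back and spreads the rest among the five review pages. Those pages link nowhere, so that share never returns. Removing the homepage's link to the Reviews page mitigates this and raises the homepage's PageRank to 1.99. The catch is that the Reviews page can then no longer be reached from the homepage.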
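The two configurations (`In [15]` and `In [26]`) can be compared side by side with a minimal, self-contained sketch. The `pagerank` and `build_site` functions are hypothetical helpers written for this comparison. They use the same update rule as the notebook's `PageRank` class with `supernode=False`.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Hypothetical stand-in for the notebook's PageRank class (supernode=False):
    PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over the pages q linking to p."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
    return pr


subpages = ["About", "Team", "Blog"]
reviews = [f"R{i}" for i in range(1, 6)]


def build_site(home_links_to_reviews):
    """Question 2's site; the review pages R1..R5 have no outbound links."""
    links = {page: ["Home"] for page in subpages}
    links["Reviews"] = ["Home"] + reviews
    links["Home"] = subpages + (["Reviews"] if home_links_to_reviews else [])
    return links


with_link = pagerank(build_site(True))["Home"]      # like In [15]
without_link = pagerank(build_site(False))["Home"]  # like In [26]
print(f"Home -> Reviews kept:    {with_link:.4f}")
print(f"Home -> Reviews removed: {without_link:.4f}")
```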

### 3

In [29]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD1 = Webpage("Blog1")
pageD2 = Webpage("Blog2")
pageD3 = Webpage("Blog3")
pageD4 = Webpage("Blog4")
pageE = Webpage("Reviews")
pageR1 = Webpage("R1")
pageR2 = Webpage("R2")
pageR3 = Webpage("R3")
pageR4 = Webpage("R4")
pageR5 = Webpage("R5")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

pageD4.add_link(pageD3)
pageD3.add_link(pageD4)
pageD3.add_link(pageD2)
pageD2.add_link(pageD3)
pageD2.add_link(pageD1)
pageD1.add_link(pageD2)

pageE.add_link(pageA)
pageE.add_link(pageR1)
pageE.add_link(pageR2)
pageE.add_link(pageR3)
pageE.add_link(pageR4)
pageE.add_link(pageR5)


pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD4)
pageA.add_link(pageE)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD1)
pages.append(pageD2)
pages.append(pageD3)
pages.append(pageD4)
pages.append(pageE)
pages.append(pageR1)
pages.append(pageR2)
pages.append(pageR3)
pages.append(pageR4)
pages.append(pageR5)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.07692307692307693
About: 0.07692307692307693
Team: 0.07692307692307693
Blog1: 0.07692307692307693
Blog2: 0.07692307692307693
Blog3: 0.07692307692307693
Blog4: 0.07692307692307693
Reviews: 0.07692307692307693
R1: 0.07692307692307693
R2: 0.07692307692307693
R3: 0.07692307692307693
R4: 0.07692307692307693
R5: 0.07692307692307693

Iteration #1
Home: 0.29166666666666674
About: 0.16634615384615387
Team: 0.16634615384615387
Blog1: 0.1826923076923077
Blog2: 0.24807692307692308
Blog3: 0.24807692307692308
Blog4: 0.19903846153846155
Reviews: 0.16634615384615387
R1: 0.1608974358974359
R2: 0.1608974358974359
R3: 0.1608974358974359
R4: 0.1608974358974359
R5: 0.1608974358974359

Iteration #2
Home: 0.4563541666666667
About: 0.21197916666666672
Team: 0.21197916666666672
Blog1: 0.25543269230769233
Blog2: 0.4107211538461539
Blog3: 0.4246153846153847
Blog4: 0.317411858974359
Reviews: 0.21197916666666672
R1: 0.17356570512820516
R2: 0.17356570512820516
R3: 0.17356570512820516
R4: 0.173

In [30]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD1 = Webpage("Blog1")
pageD2 = Webpage("Blog2")
pageD3 = Webpage("Blog3")
pageD4 = Webpage("Blog4")
pageE = Webpage("Reviews")
pageR1 = Webpage("R1")
pageR2 = Webpage("R2")
pageR3 = Webpage("R3")
pageR4 = Webpage("R4")
pageR5 = Webpage("R5")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

pageD4.add_link(pageA)
pageD4.add_link(pageD3)
pageD3.add_link(pageD4)
pageD3.add_link(pageD2)
pageD2.add_link(pageD3)
pageD2.add_link(pageD1)
pageD1.add_link(pageD2)

pageE.add_link(pageA)
pageE.add_link(pageR1)
pageE.add_link(pageR2)
pageE.add_link(pageR3)
pageE.add_link(pageR4)
pageE.add_link(pageR5)


pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD4)
pageA.add_link(pageE)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD1)
pages.append(pageD2)
pages.append(pageD3)
pages.append(pageD4)
pages.append(pageE)
pages.append(pageR1)
pages.append(pageR2)
pages.append(pageR3)
pages.append(pageR4)
pages.append(pageR5)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.07692307692307693
About: 0.07692307692307693
Team: 0.07692307692307693
Blog1: 0.07692307692307693
Blog2: 0.07692307692307693
Blog3: 0.07692307692307693
Blog4: 0.07692307692307693
Reviews: 0.07692307692307693
R1: 0.07692307692307693
R2: 0.07692307692307693
R3: 0.07692307692307693
R4: 0.07692307692307693
R5: 0.07692307692307693

Iteration #1
Home: 0.32435897435897443
About: 0.16634615384615387
Team: 0.16634615384615387
Blog1: 0.1826923076923077
Blog2: 0.24807692307692308
Blog3: 0.2153846153846154
Blog4: 0.19903846153846155
Reviews: 0.16634615384615387
R1: 0.1608974358974359
R2: 0.1608974358974359
R3: 0.1608974358974359
R4: 0.1608974358974359
R5: 0.1608974358974359

Iteration #2
Home: 0.5409455128205128
About: 0.2189262820512821
Team: 0.2189262820512821
Blog1: 0.25543269230769233
Blog2: 0.39682692307692313
Blog3: 0.3400240384615385
Blog4: 0.31046474358974363
Reviews: 0.2189262820512821
R1: 0.17356570512820516
R2: 0.17356570512820516
R3: 0.17356570512820516
R4: 0.1735

In [31]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD1 = Webpage("Blog1")
pageD2 = Webpage("Blog2")
pageD3 = Webpage("Blog3")
pageD4 = Webpage("Blog4")
pageE = Webpage("Reviews")
pageR1 = Webpage("R1")
pageR2 = Webpage("R2")
pageR3 = Webpage("R3")
pageR4 = Webpage("R4")
pageR5 = Webpage("R5")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

#pageD4.add_link(pageA)
pageD4.add_link(pageD3)
pageD3.add_link(pageD4)
pageD3.add_link(pageD2)
pageD2.add_link(pageD3)
pageD2.add_link(pageD1)
pageD1.add_link(pageD2)
pageD1.add_link(pageA)

pageE.add_link(pageA)
pageE.add_link(pageR1)
pageE.add_link(pageR2)
pageE.add_link(pageR3)
pageE.add_link(pageR4)
pageE.add_link(pageR5)


pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD4)
pageA.add_link(pageE)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD1)
pages.append(pageD2)
pages.append(pageD3)
pages.append(pageD4)
pages.append(pageE)
pages.append(pageR1)
pages.append(pageR2)
pages.append(pageR3)
pages.append(pageR4)
pages.append(pageR5)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.07692307692307693
About: 0.07692307692307693
Team: 0.07692307692307693
Blog1: 0.07692307692307693
Blog2: 0.07692307692307693
Blog3: 0.07692307692307693
Blog4: 0.07692307692307693
Reviews: 0.07692307692307693
R1: 0.07692307692307693
R2: 0.07692307692307693
R3: 0.07692307692307693
R4: 0.07692307692307693
R5: 0.07692307692307693

Iteration #1
Home: 0.32435897435897443
About: 0.16634615384615387
Team: 0.16634615384615387
Blog1: 0.1826923076923077
Blog2: 0.2153846153846154
Blog3: 0.24807692307692308
Blog4: 0.19903846153846155
Reviews: 0.16634615384615387
R1: 0.1608974358974359
R2: 0.1608974358974359
R3: 0.1608974358974359
R4: 0.1608974358974359
R5: 0.1608974358974359

Iteration #2
Home: 0.5339983974358975
About: 0.2189262820512821
Team: 0.2189262820512821
Blog1: 0.24153846153846156
Blog2: 0.3330769230769231
Blog3: 0.4107211538461539
Blog4: 0.32435897435897443
Reviews: 0.2189262820512821
R1: 0.17356570512820516
R2: 0.17356570512820516
R3: 0.17356570512820516
R4: 0.17356

In [32]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD1 = Webpage("Blog1")
pageD2 = Webpage("Blog2")
pageD3 = Webpage("Blog3")
pageD4 = Webpage("Blog4")
pageE = Webpage("Reviews")
pageR1 = Webpage("R1")
pageR2 = Webpage("R2")
pageR3 = Webpage("R3")
pageR4 = Webpage("R4")
pageR5 = Webpage("R5")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

pageD4.add_link(pageA)
pageD4.add_link(pageD3)
pageD3.add_link(pageD4)
pageD3.add_link(pageA)
pageD3.add_link(pageD2)
pageD2.add_link(pageD3)
pageD2.add_link(pageA)
pageD2.add_link(pageD1)
pageD1.add_link(pageD2)
pageD1.add_link(pageA)

pageE.add_link(pageA)
pageE.add_link(pageR1)
pageE.add_link(pageR2)
pageE.add_link(pageR3)
pageE.add_link(pageR4)
pageE.add_link(pageR5)


pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD4)
pageA.add_link(pageE)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD1)
pages.append(pageD2)
pages.append(pageD3)
pages.append(pageD4)
pages.append(pageE)
pages.append(pageR1)
pages.append(pageR2)
pages.append(pageR3)
pages.append(pageR4)
pages.append(pageR5)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.07692307692307693
About: 0.07692307692307693
Team: 0.07692307692307693
Blog1: 0.07692307692307693
Blog2: 0.07692307692307693
Blog3: 0.07692307692307693
Blog4: 0.07692307692307693
Reviews: 0.07692307692307693
R1: 0.07692307692307693
R2: 0.07692307692307693
R3: 0.07692307692307693
R4: 0.07692307692307693
R5: 0.07692307692307693

Iteration #1
Home: 0.40064102564102566
About: 0.16634615384615387
Team: 0.16634615384615387
Blog1: 0.17179487179487182
Blog2: 0.2044871794871795
Blog3: 0.2044871794871795
Blog4: 0.18814102564102567
Reviews: 0.16634615384615387
R1: 0.1608974358974359
R2: 0.1608974358974359
R3: 0.1608974358974359
R4: 0.1608974358974359
R5: 0.1608974358974359

Iteration #2
Home: 0.7252029914529915
About: 0.23513621794871797
Team: 0.23513621794871797
Blog1: 0.20793803418803422
Blog2: 0.28095085470085474
Blog3: 0.28789797008547013
Blog4: 0.2930742521367522
Reviews: 0.23513621794871797
R1: 0.17356570512820516
R2: 0.17356570512820516
R3: 0.17356570512820516
R4: 0.1

In [33]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD1 = Webpage("Blog1")
pageD2 = Webpage("Blog2")
pageD3 = Webpage("Blog3")
pageD4 = Webpage("Blog4")
pageE = Webpage("Reviews")
pageR1 = Webpage("R1")
pageR2 = Webpage("R2")
pageR3 = Webpage("R3")
pageR4 = Webpage("R4")
pageR5 = Webpage("R5")

#pageC.add_link(pageB)
pageC.add_link(pageA)

pageB.add_link(pageA)

pageD4.add_link(pageA)
pageD3.add_link(pageA)
pageD2.add_link(pageA)
pageD1.add_link(pageA)

pageE.add_link(pageA)
pageE.add_link(pageR1)
pageE.add_link(pageR2)
pageE.add_link(pageR3)
pageE.add_link(pageR4)
pageE.add_link(pageR5)


pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD4)
pageA.add_link(pageD3)
pageA.add_link(pageD2)
pageA.add_link(pageD1)
pageA.add_link(pageE)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD1)
pages.append(pageD2)
pages.append(pageD3)
pages.append(pageD4)
pages.append(pageE)
pages.append(pageR1)
pages.append(pageR2)
pages.append(pageR3)
pages.append(pageR4)
pages.append(pageR5)

pageRank = PageRank(pages, debug=True, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)


Iteration #0
Home: 0.07692307692307693
About: 0.07692307692307693
Team: 0.07692307692307693
Blog1: 0.07692307692307693
Blog2: 0.07692307692307693
Blog3: 0.07692307692307693
Blog4: 0.07692307692307693
Reviews: 0.07692307692307693
R1: 0.07692307692307693
R2: 0.07692307692307693
R3: 0.07692307692307693
R4: 0.07692307692307693
R5: 0.07692307692307693

Iteration #1
Home: 0.5532051282051282
About: 0.15934065934065936
Team: 0.15934065934065936
Blog1: 0.15934065934065936
Blog2: 0.15934065934065936
Blog3: 0.15934065934065936
Blog4: 0.15934065934065936
Reviews: 0.15934065934065936
R1: 0.1608974358974359
R2: 0.1608974358974359
R3: 0.1608974358974359
R4: 0.1608974358974359
R5: 0.1608974358974359

Iteration #2
Home: 0.9852106227106228
About: 0.21717490842490844
Team: 0.21717490842490844
Blog1: 0.21717490842490844
Blog2: 0.21717490842490844
Blog3: 0.21717490842490844
Blog4: 0.21717490842490844
Reviews: 0.21717490842490844
R1: 0.1725732600732601
R2: 0.1725732600732601
R3: 0.1725732600732601
R4: 0.17

The homepage's PageRank drops sharply, to 0.7. The share of its vote that goes to the newest post flows into a chain of blog posts that never links back. Having the most recent post (Blog 4) link back to the homepage mitigates most of this, but it only restores the PageRank to 1.27. Having only the post at the end of the chain (Blog 1) link back helps even less, raising it to just 1.09. The homepage's vote is damped at every hop along the chain before it returns. If the chain structure must be kept, the best option is to have every post also link back to the homepage, which minimizes the loss from damping and gives a PageRank of 1.84.

I sometimes see blogs using this last strategy. However, what I see most often is the homepage linking to each post separately, with each post linking back to the homepage. Trying this out gives a PageRank of 2.57, the best of the structures above. Intuitively, this makes sense: it keeps the structure from question 1, which I found to be the best.
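All five blog structures (`In [29]` to `In [33]`) can be compared in one run. The minimal sketch below re-implements the update rule as a hypothetical `pagerank` function. `build` and `link_back` are hypothetical helpers that assemble each variant of question 3's site; they are not part of the notebook.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Hypothetical stand-in for the notebook's PageRank class (supernode=False):
    PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over the pages q linking to p."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
    return pr


CHAIN = {"Blog4": ["Blog3"], "Blog3": ["Blog4", "Blog2"],
         "Blog2": ["Blog3", "Blog1"], "Blog1": ["Blog2"]}
ALL_POSTS = ["Blog4", "Blog3", "Blog2", "Blog1"]


def build(blog_links, home_to_blogs):
    """About and Team link to Home; Reviews links to Home and to five review
    pages (which link nowhere); the blog links are supplied by the caller."""
    links = {"About": ["Home"], "Team": ["Home"],
             "Reviews": ["Home", "R1", "R2", "R3", "R4", "R5"]}
    links.update(blog_links)
    links["Home"] = ["About", "Team", "Reviews"] + home_to_blogs
    return links


def link_back(blog_links, posts):
    """Copy of `blog_links` in which each post in `posts` also links to Home."""
    return {post: targets + ["Home"] if post in posts else targets
            for post, targets in blog_links.items()}


structures = {
    "chain only (In [29])": build(CHAIN, ["Blog4"]),
    "Blog4 links Home (In [30])": build(link_back(CHAIN, {"Blog4"}), ["Blog4"]),
    "Blog1 links Home (In [31])": build(link_back(CHAIN, {"Blog1"}), ["Blog4"]),
    "every post links Home (In [32])": build(link_back(CHAIN, set(ALL_POSTS)), ["Blog4"]),
    "Home <-> each post (In [33])": build({p: ["Home"] for p in ALL_POSTS}, ALL_POSTS),
}
home = {name: pagerank(links)["Home"] for name, links in structures.items()}
for name, value in home.items():
    print(f"{name:<32} Home PR = {value:.3f}")
```

Running each structure for 200 iterations instead of 50 lets the values settle fully. They can therefore differ slightly in the second decimal place from the figures quoted above.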

### 4

In [37]:
pages = []
pageA = Webpage("Home")
pageB = Webpage("About")
pageC = Webpage("Team")
pageD = Webpage("Blog")
pageE = Webpage("Reviews")
page1 = Webpage("SEO1")
page2 = Webpage("SEO2")
page3 = Webpage("SEO3")
page4 = Webpage("SEO4")
page5 = Webpage("SEO5")
page6 = Webpage("SEO6")
page7 = Webpage("SEO7")
page8 = Webpage("SEO8")
page9 = Webpage("SEO9")
page10 = Webpage("SEO10")

pageB.add_link(pageA)
pageC.add_link(pageA)
pageD.add_link(pageA)
pageE.add_link(pageA)
page1.add_link(pageA)
page2.add_link(pageA)
page3.add_link(pageA)
page4.add_link(pageA)
page5.add_link(pageA)
page6.add_link(pageA)
page7.add_link(pageA)
page8.add_link(pageA)
page9.add_link(pageA)
page10.add_link(pageA)

pageA.add_link(pageB)
pageA.add_link(pageC)
pageA.add_link(pageD)
pageA.add_link(pageE)
# pageA.add_link(page1)
# pageA.add_link(page2)
# pageA.add_link(page3)
# pageA.add_link(page4)
# pageA.add_link(page5)
# pageA.add_link(page6)
# pageA.add_link(page7)
# pageA.add_link(page8)
# pageA.add_link(page9)
# pageA.add_link(page10)


pages.append(pageA)
pages.append(pageB)
pages.append(pageC)
pages.append(pageD)
pages.append(pageE)
pages.append(page1)
pages.append(page2)
pages.append(page3)
pages.append(page4)
pages.append(page5)
pages.append(page6)
pages.append(page7)
pages.append(page8)
pages.append(page9)
pages.append(page10)

pageRank = PageRank(pages, debug=False, damping_factor=0.85, supernode=False)
pageRank.run_page_rank(50)
pageRank.print_table(show_supernode=True)

Home: 6.970930331610803
About: 1.63123224077431
Team: 1.63123224077431
Blog: 1.63123224077431
Reviews: 1.63123224077431
SEO1: 0.15000000000000002
SEO2: 0.15000000000000002
SEO3: 0.15000000000000002
SEO4: 0.15000000000000002
SEO5: 0.15000000000000002
SEO6: 0.15000000000000002
SEO7: 0.15000000000000002
SEO8: 0.15000000000000002
SEO9: 0.15000000000000002
SEO10: 0.15000000000000002


Since PageRank rewards pages that many other pages vote for, it makes sense to create a lot of pages whose only purpose is to vote for the homepage. Adding ten superfluous pages that link only to the homepage raises its PageRank to 6.97. It also doesn't matter whether the homepage links to these pages, so they can be completely hidden from visitors.

This model of superfluous pages also covers links from other websites, another method of PageRank optimization. Such links can yield even higher PageRanks if the linking website already has a high PageRank. For example, you could pay to have your website linked from a review or op-ed in an online newspaper with a high PageRank.
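Because every farm page links only to the homepage, the homepage's converged value has a closed form. Suppose n pages each link only to Home, and Home links to at least one of them. Then h = 0.15 + 0.85(0.15n + 0.85h), so h = (0.15 + 0.1275n) / 0.2775, which grows linearly with n. With the four real subpages plus ten SEO pages (n = 14), this gives about 6.97. The minimal sketch below checks this with a hypothetical `pagerank` re-implementation of the update rule and a hypothetical `home_pr` helper.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Hypothetical stand-in for the notebook's PageRank class (supernode=False):
    PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over the pages q linking to p."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
    return pr


subpages = ["About", "Team", "Blog", "Reviews"]


def home_pr(farm_size, home_links_to_farm=False):
    """Home's PR with `farm_size` hypothetical SEO pages that link only to Home."""
    farm = [f"SEO{i}" for i in range(1, farm_size + 1)]
    links = {page: ["Home"] for page in subpages + farm}
    links["Home"] = subpages + (farm if home_links_to_farm else [])
    return pagerank(links)["Home"]


for size in (0, 10, 20):
    print(f"{size:>2} SEO pages: Home PR = {home_pr(size):.3f} "
          f"(Home also links to them: {home_pr(size, True):.3f})")
```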
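The notebook's `fixed_pagerank` argument exists for modeling external pages like this. The minimal sketch below mimics it with a hypothetical `fixed` dictionary whose pages keep their given value. It adds a hypothetical external "Newspaper" page that links to the homepage of the site from question 1. The referrer PageRank values are illustrative raw scores on this notebook's scale, not Google's 0-10 scale. The referrer's vote is also divided among all of its outbound links, so a link from a crowded page is worth less.

```python
def pagerank(links, damping=0.85, iterations=200, fixed=None):
    """Hypothetical re-implementation of the notebook's update rule. Pages in
    `fixed` keep their given value, like Webpage(..., fixed_pagerank=...)."""
    fixed = fixed or {}
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: fixed.get(p, 1 / len(pages)) for p in pages}
    for _ in range(iterations):
        pr = {
            p: fixed[p] if p in fixed else (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
    return pr


site = {"About": ["Home"], "Team": ["Home"], "Blog": ["Home"],
        "Home": ["About", "Team", "Blog"]}


def home_with_referrer(referrer_pr, referrer_outlinks=1):
    """Home's PR when a fixed-PR 'Newspaper' page links to Home and to
    (referrer_outlinks - 1) unrelated pages."""
    links = dict(site)
    links["Newspaper"] = ["Home"] + [f"Other{i}" for i in range(1, referrer_outlinks)]
    return pagerank(links, fixed={"Newspaper": referrer_pr})["Home"]


for referrer_pr, outlinks in [(0.15, 1), (5.0, 1), (5.0, 10)]:
    value = home_with_referrer(referrer_pr, outlinks)
    print(f"referrer PR {referrer_pr}, {outlinks} outbound link(s): Home PR = {value:.3f}")
```

Solving the fixed point by hand gives h = (0.5325 + 0.85F/k) / 0.2775, where F is the referrer's PageRank and k is its number of outbound links.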

### 5

Aside from turning each of your superfluous pages into a semi-legitimate page, you can have genuine reviews on other sites link to your page. You should also minimize the number of outbound links on your website.
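The advice to minimize outbound links can be illustrated with the site from question 1, where the homepage also links to m external pages that do not link back. For simplicity, this sketch models those external pages as having no outbound links at all. Any links they have that lead elsewhere would have the same effect on the homepage, as long as none of them return to the site. The `pagerank` and `home_pr` functions are hypothetical helpers, not the notebook's classes. By hand, h = 0.5325 / (1 - 2.1675 / (3 + m)), which falls as m grows.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Hypothetical stand-in for the notebook's PageRank class (supernode=False):
    PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over the pages q linking to p."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - damping) + damping * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
            for p in pages
        }
    return pr


subpages = ["About", "Team", "Blog"]


def home_pr(external_links):
    """Home's PR when it also links to `external_links` outside pages."""
    outside = [f"External{i}" for i in range(1, external_links + 1)]
    links = {page: ["Home"] for page in subpages}
    links["Home"] = subpages + outside  # outside pages never link back
    return pagerank(links)["Home"]


for m in (0, 1, 3, 10):
    print(f"{m:>2} outbound link(s): Home PR = {home_pr(m):.3f}")
```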