# INSTALLING NETWORKX PACKAGE

NetworkX is a Python library for creating, manipulating, and analyzing complex networks (graphs). It provides tools for studying the structure and behavior of networks along with many graph-theory algorithms. In this notebook it supplies the PageRank algorithm used to rank sentences.

In [4]:
%pip install networkx




# IMPORTING NECESSARY PACKAGES

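The cosine_distance function lives in the util module of the cluster package in the NLTK (Natural Language Toolkit) library. It computes the cosine distance between two vectors, defined as 1 minus their cosine similarity: a distance of 0 means the vectors point in the same direction, and larger values mean the vectors are less similar. It is commonly used for clustering and for comparing the similarity of texts. The stopwords corpus is imported as well, although the code below never uses it.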








In [6]:
from nltk.corpus import stopwords  # imported but not used below; could filter out stop words
from nltk.cluster.util import cosine_distance
import numpy as np
import networkx as nx
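To make the definition concrete, here is a small NumPy-only sketch of the formula NLTK's cosine_distance computes, 1 - (u·v) / (‖u‖‖v‖). The helper name `cosine_distance_np` is hypothetical and is not part of NLTK.

```python
import numpy as np


def cosine_distance_np(u, v):
    """Hypothetical helper: cosine distance = 1 - (u . v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Same direction -> distance 0; orthogonal (no shared words) -> distance 1
print(cosine_distance_np([2, 0], [2, 0]))  # 0.0
print(cosine_distance_np([1, 0], [0, 1]))  # 1.0
print(cosine_distance_np([1, 1, 0], [1, 0, 1]))  # about 0.5
```

Word-count vectors never contain negative entries, so for sentences this distance always falls between 0 and 1.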

# OPENING FILE AND SPLITTING INTO SENTENCES

In [10]:
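with open("C:/Users/teja/Downloads/Text1.txt", "r") as file:
    # This file contains one paragraph made of multiple sentences
    filedata = file.readlines()
article = filedata[0].split(". ")  # use only the first paragraph (first line)

sentences = []
for sentence in article:
    print(sentence)
    # str.replace() matches the literal text "[^a-zA-Z]", not a regex, so the
    # original replace() call changed nothing; splitting on spaces alone gives
    # the same tokens (punctuation stays attached to words)
    sentences.append(sentence.split(" "))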

It was the best of times
It was the worst of times
It was the age of wisdom
It was the age of foolishness
What is the importance of age
This is the best example.
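Because `str.replace()` treats `"[^a-zA-Z]"` as literal text, punctuation such as the period in `example.` survives tokenization. If stripping non-letters was the intent, a minimal sketch with `re.sub()` could look like this. The function name `tokenize_paragraph` is hypothetical, and splitting on `". "` mirrors the notebook's simple sentence-boundary rule.

```python
import re


def tokenize_paragraph(paragraph):
    """Hypothetical helper: split on ". " and keep only ASCII letters in each token."""
    tokenized = []
    for sentence in paragraph.split(". "):
        # re.sub interprets the pattern as a regular expression, unlike str.replace
        cleaned = re.sub(r"[^a-zA-Z]", " ", sentence)
        tokens = cleaned.split()  # split() with no argument drops empty strings
        if tokens:
            tokenized.append(tokens)
    return tokenized


# str.replace looks for the literal characters "[^a-zA-Z]", so nothing changes
print("example.".replace("[^a-zA-Z]", " "))  # example.
print(tokenize_paragraph("It was the best of times. This is the best example."))
# [['It', 'was', 'the', 'best', 'of', 'times'], ['This', 'is', 'the', 'best', 'example']]
```

Note that this also breaks hyphenated words apart (`SAM-I-AM!` becomes `SAM`, `I`, `AM`), which may or may not be desirable.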


# PRINTING LIST OF SENTENCES

In [11]:
print("Sentences are ", sentences)

Sentences are  [['It', 'was', 'the', 'best', 'of', 'times'], ['It', 'was', 'the', 'worst', 'of', 'times'], ['It', 'was', 'the', 'age', 'of', 'wisdom'], ['It', 'was', 'the', 'age', 'of', 'foolishness'], ['What', 'is', 'the', 'importance', 'of', 'age'], ['This', 'is', 'the', 'best', 'example.']]


# FUNCTION TO CALCULATE SIMILARITY

The sentence_similarity function measures how similar two sentences are by comparing their word-frequency vectors with the cosine distance metric. It first converts both sentences to lowercase, then builds one vector per sentence that counts how often each unique word occurs, and finally returns a similarity score equal to 1 minus the cosine distance between these vectors.

In [12]:
def sentence_similarity(sent1, sent2):
    # Lowercase so that "It" and "it" count as the same word
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    # Shared vocabulary: one vector position per unique word
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # Build the word-count vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # Build the word-count vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    # Cosine similarity = 1 - cosine distance
    return 1 - cosine_distance(vector1, vector2)
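The stopwords import suggests that stop-word filtering was considered but never wired in. Below is a sketch of how the similarity function could accept an optional stop-word set. It uses `collections.Counter` instead of `list.index` lookups, a small hand-picked stop-word set in place of NLTK's corpus (an assumption, so the example runs without downloading data), and returns 0.0 when either sentence has no words left, because cosine similarity is undefined for a zero vector. The name `sentence_similarity_sw` is hypothetical.

```python
import math
from collections import Counter


def sentence_similarity_sw(sent1, sent2, stop_words=frozenset()):
    """Hypothetical variant: cosine similarity of word counts, ignoring stop words."""
    counts1 = Counter(w.lower() for w in sent1 if w.lower() not in stop_words)
    counts2 = Counter(w.lower() for w in sent2 if w.lower() not in stop_words)
    # Dot product over the words the two sentences share
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # cosine similarity is undefined for an empty vector
    return dot / (norm1 * norm2)


a = "It was the best of times".split()
b = "It was the worst of times".split()
print(sentence_similarity_sw(a, b))  # about 0.8333 (5/6), same as entry [0][1] of the similarity matrix
# Assumed stop-word list, for illustration only (entries must be lowercase)
print(sentence_similarity_sw(a, b, stop_words={"it", "was", "the", "of"}))  # about 0.5
```

Dropping function words lowers the score here because the two sentences differ mainly in their content words ("best" versus "worst").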

# CREATING SIMILARITY MATRIX

The similarity_matrix, a numpy array filled initially with zeros, serves to quantify the similarity between pairs of sentences. As the code progresses, it assesses every possible sentence pairing, employing the sentence_similarity function to evaluate their similarity and recording these scores within the matrix. It disregards the matrix's diagonal elements, where idx1 equals idx2, since these indicate a sentence being compared to itself. Upon completion, the code outputs the fully populated similarity matrix.

In [13]:
similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:  # skip comparing a sentence with itself (diagonal stays 0)
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.83333333 0.66666667 0.66666667 0.33333333 0.36514837]
 [0.83333333 0.         0.66666667 0.66666667 0.33333333 0.18257419]
 [0.66666667 0.66666667 0.         0.83333333 0.5        0.18257419]
 [0.66666667 0.66666667 0.83333333 0.         0.5        0.18257419]
 [0.33333333 0.33333333 0.5        0.5        0.         0.36514837]
 [0.36514837 0.18257419 0.18257419 0.18257419 0.36514837 0.        ]]
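Cosine similarity is symmetric, so `similarity_matrix[i][j]` always equals `similarity_matrix[j][i]` (as the output shows), and the double loop computes every value twice. Here is a sketch that evaluates only the pairs with i < j and mirrors them; `build_similarity_matrix` is a hypothetical name, and the small `cosine_similarity` function stands in for sentence_similarity so the block runs on its own (it assumes no sentence is empty).

```python
import itertools
import math
from collections import Counter

import numpy as np


def cosine_similarity(sent1, sent2):
    """Stand-in for sentence_similarity: cosine of lowercase word-count vectors."""
    c1 = Counter(w.lower() for w in sent1)
    c2 = Counter(w.lower() for w in sent2)
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    return dot / (math.sqrt(sum(v * v for v in c1.values())) *
                  math.sqrt(sum(v * v for v in c2.values())))


def build_similarity_matrix(sentences, similarity=cosine_similarity):
    """Hypothetical helper: fill only i < j pairs and mirror them; diagonal stays 0."""
    n = len(sentences)
    matrix = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        matrix[i, j] = matrix[j, i] = similarity(sentences[i], sentences[j])
    return matrix


docs = [s.split() for s in ["It was the best of times",
                            "It was the worst of times",
                            "It was the age of wisdom"]]
m = build_similarity_matrix(docs)
print(np.round(m, 4))
```

With n sentences this makes n(n-1)/2 similarity calls instead of n(n-1).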


# GETTING PAGERANK SCORES

In [14]:
# Step 3 - Build a weighted graph from the similarity matrix and rank its nodes with PageRank
sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.19306449754902083, 1: 0.18095893645850156, 2: 0.1911855225552033, 3: 0.1911855225552033, 4: 0.14434636291527997, 5: 0.09925915796679063}
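nx.from_numpy_array turns each non-zero matrix entry into a weighted edge, and nx.pagerank (damping factor `alpha=0.85` by default) scores each sentence by how strongly it is connected to other well-connected sentences. The NumPy sketch below illustrates weighted PageRank by power iteration. It is a simplified illustration rather than NetworkX's implementation (for example, the convergence test differs), so its numbers may not match `nx.pagerank` to every decimal place. The name `pagerank_power` is hypothetical.

```python
import numpy as np


def pagerank_power(weights, alpha=0.85, tol=1e-10, max_iter=1000):
    """Hypothetical sketch of weighted PageRank via power iteration."""
    W = np.asarray(weights, dtype=float)
    n = W.shape[0]
    out_weight = W.sum(axis=1, keepdims=True)
    # Row-normalize edge weights; rows with no edges (dangling nodes) jump uniformly
    safe = np.where(out_weight > 0, out_weight, 1.0)
    P = np.where(out_weight > 0, W / safe, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Follow an edge with probability alpha, otherwise teleport to a random node
        new_rank = alpha * (rank @ P) + (1.0 - alpha) / n
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


# Star graph: node 0 is linked to every other node, which gives it the top score
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
scores = pagerank_power(star)
print(np.round(scores, 4), scores.sum())
```

Because every row of `P` sums to 1, the scores always sum to 1, just like the values in the `scores` dictionary.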


# SORTING SENTENCES BY PAGERANK

In [15]:
# Step 4 - Sort sentences by PageRank score, highest first
ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
print("Sentences ranked by PageRank score:\n\n", ranked_sentence)

Sentences ranked by PageRank score:

 [(0.19306449754902083, ['It', 'was', 'the', 'best', 'of', 'times']), (0.1911855225552033, ['It', 'was', 'the', 'age', 'of', 'wisdom']), (0.1911855225552033, ['It', 'was', 'the', 'age', 'of', 'foolishness']), (0.18095893645850156, ['It', 'was', 'the', 'worst', 'of', 'times']), (0.14434636291527997, ['What', 'is', 'the', 'importance', 'of', 'age']), (0.09925915796679063, ['This', 'is', 'the', 'best', 'example.'])]
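Sorting `(score, sentence)` tuples works, but when two scores tie, Python falls back to comparing the word lists themselves, and the original sentence positions are lost. Sorting sentence indices by score instead keeps the positions, and because Python's sort is stable even with `reverse=True`, tied sentences stay in document order. The scores below are copied from the PageRank output of this notebook.

```python
# PageRank scores for Text1, copied from the notebook output
scores = {0: 0.19306449754902083, 1: 0.18095893645850156, 2: 0.1911855225552033,
          3: 0.1911855225552033, 4: 0.14434636291527997, 5: 0.09925915796679063}

# Sort indices by score, highest first; ties keep their original order
ranked_indices = sorted(scores, key=scores.get, reverse=True)
print(ranked_indices)  # [0, 2, 3, 1, 4, 5]
```

Here this yields the same order as the tuple sort, but the result is a list of positions that can be mapped back to `sentences`.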


# PICKING TOP "N" SENTENCES

In [19]:
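# Step 5 - Choose how many sentences to include in the summary
n = int(input("How many sentences do you want in the summary? "))
# n = 2  # alternatively, hard-code n instead of prompting
summarize_text = []
for i in range(min(n, len(ranked_sentence))):  # never ask for more sentences than exist
    summarize_text.append(" ".join(ranked_sentence[i][1]))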

How many sentences do you want in the summary?  3
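The loop in Step 5 takes sentences in score order, so a summary can read out of sequence. One common variation, shown here as a sketch rather than as part of the original method, clamps `n` to the number of available sentences and then restores document order. `pick_summary` is a hypothetical helper.

```python
def pick_summary(sentences, ranked_indices, n):
    """Hypothetical helper: top-n sentences by rank, re-sorted into document order."""
    n = max(0, min(n, len(sentences)))  # clamp n to a valid range
    chosen = sorted(ranked_indices[:n])  # original positions of the top-n sentences
    return [" ".join(sentences[i]) for i in chosen]


sentences = [s.split() for s in ["It was the best of times",
                                 "It was the worst of times",
                                 "It was the age of wisdom",
                                 "It was the age of foolishness"]]
ranked_indices = [0, 2, 3, 1]  # e.g., indices sorted by PageRank score
print(pick_summary(sentences, ranked_indices, 3))
# ['It was the best of times', 'It was the age of wisdom', 'It was the age of foolishness']
print(len(pick_summary(sentences, ranked_indices, 10)))  # 4
```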


# PRINTING SUMMARY

In [20]:
# Step 6 - Output the summarized text
print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 It was the best of times. It was the age of wisdom. It was the age of foolishness
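Joining with `". "` adds a period even when a sentence already ends in punctuation. For this reason, the later summaries contain `SAM-I-AM!.` and `procedure..`. Below is a small sketch that appends a period only when one is needed; `join_sentences` is a hypothetical helper.

```python
def join_sentences(sentences):
    """Hypothetical helper: add a period only if a sentence lacks end punctuation."""
    parts = []
    for s in sentences:
        s = s.strip()
        if s and s[-1] not in ".!?":
            s += "."
        if s:
            parts.append(s)
    return " ".join(parts)


print(join_sentences(["I DO NOT LIKE THAT SAM-I-AM!",
                      "the appeals procedure.",
                      "It was the age of wisdom"]))
# I DO NOT LIKE THAT SAM-I-AM! the appeals procedure. It was the age of wisdom.
```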


# TEXTFILE-2

# OPENING FILE AND SPLITTING INTO SENTENCES

In [31]:
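with open("C:/Users/teja/Downloads/Text2.txt", "r") as file:
    # This file contains one paragraph made of multiple sentences
    filedata = file.readlines()
article = filedata[0].split(". ")  # use only the first paragraph (first line)

sentences = []
for sentence in article:
    print(sentence)
    # str.replace() matches the literal text "[^a-zA-Z]", not a regex, so the
    # original replace() call changed nothing; splitting on spaces alone gives
    # the same tokens (punctuation stays attached to words)
    sentences.append(sentence.split(" "))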

I AM SAM
I AM SAM
SAM I AM
THAT SAM-I-AM! THAT SAM-I-AM!
I DO NOT LIKE THAT SAM-I-AM!
DO WOULD YOU LIKE GREEN EGGS AND HAM?
I DO NOT LIKE THEM, SAM-I-AM
I DO NOT LIKE GREEN EGGS AND HAM
WOULD YOU LIKE THEM HERE OR THERE?
I WOULD NOT LIKE THEM HERE OR THERE
I WOULD NOT LIKE THEM ANYWHERE
I DO NOT LIKE GREEN EGGS AND HAM
I DO NOT LIKE THEM, SAM-I-AM
WOULD YOU LIKE THEM IN A HOUSE?
WOULD YOU LIKE THEN WITH A MOUSE?
I DO NOT LIKE THEM IN A HOUSE
I DO NOT LIKE THEM WITH A MOUSE
I DO NOT LIKE THEM HERE OR THERE
I DO NOT LIKE THEM ANYWHERE
I DO NOT LIKE GREEN EGGS AND HAM
I DO NOT LIKE THEM, SAM-I-AM.



# PRINTING LIST OF SENTENCES

In [32]:
print("Sentences are ", sentences)

Sentences are  [['I', 'AM', 'SAM'], ['I', 'AM', 'SAM'], ['SAM', 'I', 'AM'], ['THAT', 'SAM-I-AM!', 'THAT', 'SAM-I-AM!'], ['I', 'DO', 'NOT', 'LIKE', 'THAT', 'SAM-I-AM!'], ['DO', 'WOULD', 'YOU', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM?'], ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM'], ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM'], ['WOULD', 'YOU', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE?'], ['I', 'WOULD', 'NOT', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE'], ['I', 'WOULD', 'NOT', 'LIKE', 'THEM', 'ANYWHERE'], ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM'], ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM'], ['WOULD', 'YOU', 'LIKE', 'THEM', 'IN', 'A', 'HOUSE?'], ['WOULD', 'YOU', 'LIKE', 'THEN', 'WITH', 'A', 'MOUSE?'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'IN', 'A', 'HOUSE'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'WITH', 'A', 'MOUSE'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'ANYWHERE'], ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AN

# FUNCTION TO CALCULATE SIMILARITY

The sentence_similarity function assesses the similarity between two sentences by using their word frequency vectors and cosine distance as the measurement tool. It starts by converting the sentences to lowercase and then constructs vectors that denote the frequency of each unique word found in the sentences. In the end, the function provides a similarity score that is derived from the cosine distance between these two vectors.







In [33]:
def sentence_similarity(sent1, sent2):
    # Lowercase so that "It" and "it" count as the same word
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    # Shared vocabulary: one vector position per unique word
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # Build the word-count vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # Build the word-count vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    # Cosine similarity = 1 - cosine distance
    return 1 - cosine_distance(vector1, vector2)

# CREATING SIMILARITY MATRIX

The similarity_matrix, a NumPy array initialized with zeros, records the similarity between each pair of sentences. The code loops over every sentence pair, computes the pair's similarity with the sentence_similarity function, and stores the score in the matrix. It skips the diagonal elements (where idx1 equals idx2), since these would only compare a sentence with itself, and finally prints the completed similarity matrix.

In [34]:
similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:  # skip comparing a sentence with itself (diagonal stays 0)
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         1.         1.         0.         0.23570226 0.
  0.23570226 0.20412415 0.         0.20412415 0.23570226 0.20412415
  0.23570226 0.         0.         0.20412415 0.20412415 0.20412415
  0.23570226 0.20412415 0.23570226]
 [1.         0.         1.         0.         0.23570226 0.
  0.23570226 0.20412415 0.         0.20412415 0.23570226 0.20412415
  0.23570226 0.         0.         0.20412415 0.20412415 0.20412415
  0.23570226 0.20412415 0.23570226]
 [1.         1.         0.         0.         0.23570226 0.
  0.23570226 0.20412415 0.         0.20412415 0.23570226 0.20412415
  0.23570226 0.         0.         0.20412415 0.20412415 0.20412415
  0.23570226 0.20412415 0.23570226]
 [0.         0.         0.         0.         0.57735027 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.23570226 0.23570226 0.23570226 0.57735027 0.         0.28867513
 

# GETTING PAGERANK SCORES

In [35]:
# Step 3 - Build a weighted graph from the similarity matrix and rank its nodes with PageRank
sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.03492638917817515, 1: 0.03492638917817515, 2: 0.03492638917817515, 3: 0.010552425887368076, 4: 0.06321847556853293, 5: 0.03960609463088077, 6: 0.057289451630209653, 7: 0.056226314678287335, 8: 0.03486412173792891, 9: 0.053338934277549654, 10: 0.0566062415473087, 11: 0.056226314678287335, 12: 0.05728945163020967, 13: 0.0350182718613405, 14: 0.02940805086606966, 15: 0.056983599690836606, 16: 0.05702071446687991, 17: 0.057531865612924, 18: 0.06143601920162538, 19: 0.056226314678287335, 20: 0.05637816982094812}
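Sentence 3 (`THAT SAM-I-AM! THAT SAM-I-AM!`) receives by far the lowest score. Row 3 of the similarity matrix has only one non-zero entry, partly because tokens keep their punctuation: `SAM-I-AM!` and `SAM-I-AM` count as different words, as do `THEM,` and `THEM`. The sketch below compares the notebook's space-based tokenization with a hypothetical regex normalization (`normalize` is an illustrative name) on two sentences from this text.

```python
import math
import re
from collections import Counter


def cosine(tokens1, tokens2):
    """Cosine similarity of lowercase word-count vectors."""
    c1 = Counter(t.lower() for t in tokens1)
    c2 = Counter(t.lower() for t in tokens2)
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    return dot / (math.sqrt(sum(v * v for v in c1.values())) *
                  math.sqrt(sum(v * v for v in c2.values())))


def normalize(sentence):
    """Hypothetical normalization: replace non-letters with spaces, then split."""
    return re.sub(r"[^a-zA-Z]", " ", sentence).split()


s1 = "I DO NOT LIKE THAT SAM-I-AM!"
s2 = "I DO NOT LIKE THEM, SAM-I-AM"
print(round(cosine(s1.split(" "), s2.split(" ")), 4))  # 0.6667 (space-split tokens)
print(round(cosine(normalize(s1), normalize(s2)), 4))  # 0.9 (letters only)
```

Normalizing raises the overlap from the four shared words I, DO, NOT, LIKE to nine matching counts, which would change both the matrix and the ranking.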


# SORTING SENTENCES BY PAGERANK

In [36]:
# Step 4 - Sort sentences by PageRank score, highest first
ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
print("Sentences ranked by PageRank score:\n\n", ranked_sentence)

Sentences ranked by PageRank score:

 [(0.06321847556853293, ['I', 'DO', 'NOT', 'LIKE', 'THAT', 'SAM-I-AM!']), (0.06143601920162538, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'ANYWHERE']), (0.057531865612924, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE']), (0.05728945163020967, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM']), (0.057289451630209653, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM']), (0.05702071446687991, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'WITH', 'A', 'MOUSE']), (0.056983599690836606, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'IN', 'A', 'HOUSE']), (0.0566062415473087, ['I', 'WOULD', 'NOT', 'LIKE', 'THEM', 'ANYWHERE']), (0.05637816982094812, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM.\n']), (0.056226314678287335, ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM']), (0.056226314678287335, ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM']), (0.056226314678287335, ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM']), (0.053338934277549654, ['I', 

# PICKING TOP "N" SENTENCES

In [37]:
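# Step 5 - Choose how many sentences to include in the summary
n = int(input("How many sentences do you want in the summary? "))
# n = 2  # alternatively, hard-code n instead of prompting
summarize_text = []
for i in range(min(n, len(ranked_sentence))):  # never ask for more sentences than exist
    summarize_text.append(" ".join(ranked_sentence[i][1]))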


How many sentences do you want in the summary?  5


# PRINTING SUMMARY

In [38]:
# Step 6 - Output the summarized text
print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 I DO NOT LIKE THAT SAM-I-AM!. I DO NOT LIKE THEM ANYWHERE. I DO NOT LIKE THEM HERE OR THERE. I DO NOT LIKE THEM, SAM-I-AM. I DO NOT LIKE THEM, SAM-I-AM
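Text2 repeats several lines verbatim (for example, `I DO NOT LIKE GREEN EGGS AND HAM` appears three times). Identical sentences receive (nearly) identical scores, so this summary contains `I DO NOT LIKE THEM, SAM-I-AM` twice. Below is a sketch of dropping exact duplicates before building the similarity matrix, keeping the first occurrence. `dedupe_sentences` is a hypothetical helper, and this step is not part of the original pipeline.

```python
def dedupe_sentences(sentences):
    """Hypothetical helper: drop exact duplicate token lists, keeping first occurrences."""
    seen = set()
    unique = []
    for tokens in sentences:
        key = tuple(tokens)  # lists are unhashable, tuples can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(tokens)
    return unique


text2_start = [["I", "AM", "SAM"], ["I", "AM", "SAM"], ["SAM", "I", "AM"]]
print(dedupe_sentences(text2_start))  # [['I', 'AM', 'SAM'], ['SAM', 'I', 'AM']]
```

`SAM I AM` survives because its word order differs, even though its bag-of-words similarity to `I AM SAM` is still 1.0.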


# TEXTFILE-3

# OPENING FILE AND SPLITTING INTO SENTENCES

In [39]:
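with open("C:/Users/teja/Downloads/Text3.txt", "r") as file:
    # This file contains one paragraph made of multiple sentences
    filedata = file.readlines()
article = filedata[0].split(". ")  # use only the first paragraph (first line)

sentences = []
for sentence in article:
    print(sentence)
    # str.replace() matches the literal text "[^a-zA-Z]", not a regex, so the
    # original replace() call changed nothing; splitting on spaces alone gives
    # the same tokens (punctuation stays attached to words)
    sentences.append(sentence.split(" "))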

As an institution of higher learning, Sacred Heart University places special emphasis on academic integrity, which is a commitment to the fundamental values of honesty, trust, fairness, respect, and responsibility
Only when these values are widely respected and practiced by all members of the University students, faculty, administrators, and staff can the University maintain a culture that promotes free exploration of knowledge, constructive debate, genuine learning, effective research, fair assessment of student progress, and development of members characters
These aims of the University require that its members exercise mutual responsibilities
At its core, academic integrity is secured by a principled commitment to carry out these responsibilities, not by rules and penalties
Students and faculty should strive to create an academic environment that is honest, fair, and respectful of all
They do this by evaluating others work fairly, by responding to others ideas critically yet courteo

# PRINTING LIST OF SENTENCES

In [40]:
print("Sentences are ", sentences)

Sentences are  [['As', 'an', 'institution', 'of', 'higher', 'learning,', 'Sacred', 'Heart', 'University', 'places', 'special', 'emphasis', 'on', 'academic', 'integrity,', 'which', 'is', 'a', 'commitment', 'to', 'the', 'fundamental', 'values', 'of', 'honesty,', 'trust,', 'fairness,', 'respect,', 'and', 'responsibility'], ['Only', 'when', 'these', 'values', 'are', 'widely', 'respected', 'and', 'practiced', 'by', 'all', 'members', 'of', 'the', 'University', 'students,', 'faculty,', 'administrators,', 'and', 'staff', 'can', 'the', 'University', 'maintain', 'a', 'culture', 'that', 'promotes', 'free', 'exploration', 'of', 'knowledge,', 'constructive', 'debate,', 'genuine', 'learning,', 'effective', 'research,', 'fair', 'assessment', 'of', 'student', 'progress,', 'and', 'development', 'of', 'members', 'characters'], ['These', 'aims', 'of', 'the', 'University', 'require', 'that', 'its', 'members', 'exercise', 'mutual', 'responsibilities'], ['At', 'its', 'core,', 'academic', 'integrity', 'is', 

# FUNCTION TO CALCULATE SIMILARITY

The sentence_similarity function assesses the similarity between two sentences by using their word frequency vectors and cosine distance as the measurement tool. It starts by converting the sentences to lowercase and then constructs vectors that denote the frequency of each unique word found in the sentences. In the end, the function provides a similarity score that is derived from the cosine distance between these two vectors.

In [41]:
def sentence_similarity(sent1, sent2):
    # Lowercase so that "It" and "it" count as the same word
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    # Shared vocabulary: one vector position per unique word
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # Build the word-count vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # Build the word-count vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    # Cosine similarity = 1 - cosine distance
    return 1 - cosine_distance(vector1, vector2)
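With longer texts such as this one, calling sentence_similarity for every pair and looking up words with `list.index` gets slow. As a sketch, the whole matrix can be computed at once: build a sentence-by-word count matrix, scale each row to unit length, and multiply it by its transpose. `similarity_matrix_vectorized` is a hypothetical name, and rows for empty sentences are simply left as zeros.

```python
import numpy as np


def similarity_matrix_vectorized(sentences):
    """Hypothetical helper: all pairwise cosine similarities of word-count vectors."""
    lowered = [[w.lower() for w in s] for s in sentences]
    vocab = {w: k for k, w in enumerate(sorted({w for s in lowered for w in s}))}
    counts = np.zeros((len(lowered), len(vocab)))
    for row, tokens in enumerate(lowered):
        for w in tokens:
            counts[row, vocab[w]] += 1
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    unit = counts / np.where(norms > 0, norms, 1.0)  # all-zero rows stay zero
    sim = unit @ unit.T  # dot products of unit vectors are cosine similarities
    np.fill_diagonal(sim, 0.0)  # match the notebook: no self-similarity
    return sim


docs = [s.split() for s in ["It was the best of times",
                            "It was the worst of times",
                            "What is the importance of age"]]
print(np.round(similarity_matrix_vectorized(docs), 4))
```

For these three Text1 sentences the values agree with the loop-based matrix (5/6 and 1/3).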

# CREATING SIMILARITY MATRIX


similarity_matrix is a numpy array initialized with zeros, where each element represents the similarity between two sentences. The code iterates through all combinations of sentences, calculates the similarity between each pair using the sentence_similarity function, and stores the resulting similarity scores in the similarity_matrix. The diagonal elements (where idx1 == idx2) are ignored, as they represent comparisons of a sentence with itself. The final similarity matrix is printed out.

In [42]:
similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:  # skip comparing a sentence with itself (diagonal stays 0)
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.375      0.20412415 0.22116293 0.31622777 0.25315802
  0.30439039 0.33348648 0.26382243 0.25259074 0.3006689 ]
 [0.375      0.         0.40824829 0.17201562 0.31622777 0.36822985
  0.32780503 0.26274693 0.22613351 0.3127314  0.33407655]
 [0.20412415 0.40824829 0.         0.12038585 0.12909944 0.15032921
  0.22941573 0.14852213 0.18463724 0.29462783 0.21821789]
 [0.22116293 0.17201562 0.12038585 0.         0.2331262  0.35290144
  0.16571045 0.17879963 0.13336627 0.         0.11821656]
 [0.31622777 0.31622777 0.12909944 0.2331262  0.         0.26200013
  0.26655699 0.26843775 0.14301939 0.09128709 0.25354628]
 [0.25315802 0.36822985 0.15032921 0.35290144 0.26200013 0.
  0.27590308 0.22327214 0.19429458 0.1860229  0.22143052]
 [0.30439039 0.32780503 0.22941573 0.16571045 0.26655699 0.27590308
  0.         0.3180176  0.25415212 0.32444284 0.37546963]
 [0.33348648 0.26274693 0.14852213 0.17879963 0.26843775 0.22327214
  0.3180176  0.         0.32907259 0.31

# GETTING PAGERANK SCORES

In [44]:
# Step 3 - Build a weighted graph from the similarity matrix and rank its nodes with PageRank
sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.10052808533762261, 1: 0.1092617813187325, 2: 0.07743541734389628, 3: 0.06555131921078501, 4: 0.08353031285994028, 5: 0.09032582020361654, 6: 0.10094508912146115, 7: 0.10059598487085707, 8: 0.08058955768995534, 9: 0.08737172856802625, 10: 0.10386490347510732}


# SORTING SENTENCES BY PAGERANK

In [45]:
# Step 4 - Sort sentences by PageRank score, highest first
ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
print("Sentences ranked by PageRank score:\n\n", ranked_sentence)

Sentences ranked by PageRank score:

 [(0.1092617813187325, ['Only', 'when', 'these', 'values', 'are', 'widely', 'respected', 'and', 'practiced', 'by', 'all', 'members', 'of', 'the', 'University', 'students,', 'faculty,', 'administrators,', 'and', 'staff', 'can', 'the', 'University', 'maintain', 'a', 'culture', 'that', 'promotes', 'free', 'exploration', 'of', 'knowledge,', 'constructive', 'debate,', 'genuine', 'learning,', 'effective', 'research,', 'fair', 'assessment', 'of', 'student', 'progress,', 'and', 'development', 'of', 'members', 'characters']), (0.10386490347510732, ['All', 'matriculated', 'students', 'will', 'be', 'provided', 'with', 'a', 'full', 'description', 'of', 'the', 'University', 'standards', 'for', 'academic', 'integrity,', 'consequences', 'for', 'violations,', 'and', 'the', 'appeals', 'procedure.']), (0.10094508912146115, ['Appropriate', 'disciplinary', 'action', 'will', 'be', 'taken', 'for', 'violations', 'of', 'academic', 'integrity,', 'including', 'plagiari

# PICKING TOP "N" SENTENCES

In [47]:
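# Step 5 - Choose how many sentences to include in the summary
n = int(input("How many sentences do you want in the summary? "))
# n = 2  # alternatively, hard-code n instead of prompting
summarize_text = []
for i in range(min(n, len(ranked_sentence))):  # never ask for more sentences than exist
    summarize_text.append(" ".join(ranked_sentence[i][1]))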


How many sentences do you want in the summary?  5


# PRINTING SUMMARY

In [48]:
# Step 6 - Output the summarized text
print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 Only when these values are widely respected and practiced by all members of the University students, faculty, administrators, and staff can the University maintain a culture that promotes free exploration of knowledge, constructive debate, genuine learning, effective research, fair assessment of student progress, and development of members characters. All matriculated students will be provided with a full description of the University standards for academic integrity, consequences for violations, and the appeals procedure.. Appropriate disciplinary action will be taken for violations of academic integrity, including plagiarism, cheating, any use of materials for an assignment or exam that is not permitted by the instructor, and theft or mutilation of intellectual materials or other University equipment. Faculty will assign failing grades for violations of the University policy on academic integrity and students may immediately receive an F for a course in which they c