# INSTALLING NETWORKX PACKAGE

NetworkX is a Python library for creating, manipulating, and analyzing graphs and complex networks. It provides tools for examining network structure and dynamics, along with many graph-theory algorithms; this notebook uses its PageRank implementation.

In [1]:
pip install networkx

Note: you may need to restart the kernel to use updated packages.


# IMPORTING NECESSARY PACKAGES

The cosine_distance function lives in the util module of the cluster package in NLTK (Natural Language Toolkit). It computes the cosine distance between two vectors, defined as one minus the cosine of the angle between them, so a smaller distance means the vectors are more similar. It is commonly used in clustering and text-similarity tasks.
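
To make the definition concrete, the sketch below computes cosine distance directly with NumPy. The helper name `cosine_distance_np` and the two toy vectors are illustrative assumptions, not part of NLTK; the formula is the standard "one minus cosine similarity".

```python
import numpy as np

def cosine_distance_np(u, v):
    """Illustrative helper: 1 - cos(angle between u and v)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = [1, 1, 0]
b = [1, 0, 1]

# The vectors share one of their two non-zero components,
# so the distance comes out at approximately 0.5.
dist = cosine_distance_np(a, b)
print(dist)
```

Identical vectors give a distance of about 0, and vectors with no shared non-zero component give a distance of 1.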








In [2]:
from nltk.corpus import stopwords  # not used below; stop words could be filtered out for speed
from nltk.cluster.util import cosine_distance
import numpy as np
import networkx as nx

# OPENING FILE AND SPLITTING INTO SENTENCES

In [3]:
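# This file contains one paragraph of multiple sentences.
with open("C:/Users/teja/OneDrive/Desktop/text1.txt", "r") as file:
    filedata = file.readlines()
article = filedata[0].split(". ")  # only the first paragraph (first line) is used

sentences = []
for sentence in article:
    print(sentence)
    # Note: str.replace treats "[^a-zA-Z]" as literal text, not a regex,
    # so nothing is replaced here and punctuation stays attached to the words.
    sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))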

In the heart of an ancient forest, where the thick canopy barely let the sunlight touch the moss-covered ground, a small stream meandered through the underbrush, its gentle babble a constant in the otherwise silent expanse
This forest, untouched by time and human influence, held secrets from ages past, its trees standing as silent witnesses to the history that unfolded in its depths
Among these towering sentinels, wildlife flourished in a delicate balance, a symphony of life that played out in the rustling leaves and the soft footfalls on the forest floor.
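
The `replace` call in the cell above has no effect, which is why punctuation such as `forest,` survives in the printed sentences. If stripping non-letters is the intent, a regular-expression substitution is one option. The sketch below is an assumption about that intent and uses a short made-up string instead of the file; applying it to the notebook would change the tokens and therefore the scores reported in later cells.

```python
import re

# Made-up stand-in for the first line of the text file.
text = "A small stream, moss-covered. It flows on.\n"

# strip() drops the trailing newline before splitting into sentences.
raw_sentences = text.strip().split(". ")

# re.sub interprets the pattern as a regex: every non-letter becomes a space,
# and split() with no argument collapses runs of whitespace.
tokenized = [re.sub(r"[^a-zA-Z]", " ", s).split() for s in raw_sentences]
print(tokenized)
```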



# PRINTING LIST OF SENTENCES

In [4]:
print("Sentences are ", sentences)

Sentences are  [['In', 'the', 'heart', 'of', 'an', 'ancient', 'forest,', 'where', 'the', 'thick', 'canopy', 'barely', 'let', 'the', 'sunlight', 'touch', 'the', 'moss-covered', 'ground,', 'a', 'small', 'stream', 'meandered', 'through', 'the', 'underbrush,', 'its', 'gentle', 'babble', 'a', 'constant', 'in', 'the', 'otherwise', 'silent', 'expanse'], ['This', 'forest,', 'untouched', 'by', 'time', 'and', 'human', 'influence,', 'held', 'secrets', 'from', 'ages', 'past,', 'its', 'trees', 'standing', 'as', 'silent', 'witnesses', 'to', 'the', 'history', 'that', 'unfolded', 'in', 'its', 'depths'], ['Among', 'these', 'towering', 'sentinels,', 'wildlife', 'flourished', 'in', 'a', 'delicate', 'balance,', 'a', 'symphony', 'of', 'life', 'that', 'played', 'out', 'in', 'the', 'rustling', 'leaves', 'and', 'the', 'soft', 'footfalls', 'on', 'the', 'forest', 'floor.\n']]


# FUNCTION TO CALCULATE SIMILARITY

The sentence_similarity function measures how similar two sentences are using their word-frequency vectors and the cosine distance metric. It first lowercases both sentences, then builds a vector for each one that counts how often every unique word (from either sentence) occurs. It returns a similarity score equal to one minus the cosine distance between these vectors.

In [5]:
def sentence_similarity(sent1, sent2):
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)
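
As a small worked example of the same idea, the sketch below computes the bag-of-words cosine similarity for two three-word sentences without NLTK. The function name `bow_cosine_similarity` and the toy sentences are assumptions for illustration; it is meant to mirror the calculation above, not replace it.

```python
from collections import Counter
import math

def bow_cosine_similarity(sent1, sent2):
    """Cosine similarity of lowercase word-count vectors (illustrative helper)."""
    c1 = Counter(w.lower() for w in sent1)
    c2 = Counter(w.lower() for w in sent2)
    # Words missing from c2 count as 0, so only shared words contribute.
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)

# Vocabulary: the, cat, sat, ran -> vectors [1, 1, 1, 0] and [1, 1, 0, 1].
# Dot product is 2 and both norms are sqrt(3), so the similarity is about 2/3.
sim = bow_cosine_similarity(["The", "cat", "sat"], ["the", "cat", "ran"])
print(sim)
```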

# CREATING SIMILARITY MATRIX

The similarity_matrix is a NumPy array, initialized with zeros, that stores the similarity of every pair of sentences. The code loops over all pairs, calls sentence_similarity on each, and writes the score into the matrix. Diagonal entries (where idx1 equals idx2) are skipped and stay at zero, since they would compare a sentence with itself. Finally, the populated matrix is printed.

In [6]:
similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:  # skip comparing a sentence with itself
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.26633806 0.51675233]
 [0.26633806 0.         0.20814536]
 [0.51675233 0.20814536 0.        ]]
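
Because the similarity function is symmetric, the matrix above is symmetric as well, so each pair only needs to be scored once. The sketch below fills both triangles from one call per pair; `word_overlap` is a hypothetical stand-in similarity used only to keep the example self-contained.

```python
from itertools import combinations
import numpy as np

def word_overlap(a, b):
    """Hypothetical stand-in: shared words divided by total distinct words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

toy_sentences = [["a", "b"], ["b", "c"], ["c", "d"]]
n = len(toy_sentences)
matrix = np.zeros((n, n))

# combinations yields each unordered pair (i, j) with i < j exactly once,
# so the diagonal is never touched and stays at zero.
for i, j in combinations(range(n), 2):
    score = word_overlap(toy_sentences[i], toy_sentences[j])
    matrix[i, j] = matrix[j, i] = score

print(matrix)
```

For three sentences this makes 3 calls instead of 6; the saving grows with the number of sentences.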


# GETTING PAGERANK SCORES

In [7]:
# Step 3 - Rank sentences in similarity matrix
sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.38835725455122294, 1: 0.2504316620240267, 2: 0.36121108342474983}
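
For intuition about what these scores mean, the sketch below runs a plain power-iteration PageRank on the similarity matrix printed earlier. It assumes a damping factor of 0.85 (the documented default of `nx.pagerank`), treats each row, normalized to sum to one, as transition probabilities, and assumes no row is all zeros. It is an illustration of the idea, not NetworkX's actual implementation.

```python
import numpy as np

# Similarity matrix copied from the output above.
W = np.array([
    [0.0, 0.26633806, 0.51675233],
    [0.26633806, 0.0, 0.20814536],
    [0.51675233, 0.20814536, 0.0],
])

def pagerank_power(weights, damping=0.85, tol=1e-12, max_iter=1000):
    """Illustrative weighted PageRank via power iteration."""
    n = weights.shape[0]
    # Row-normalize so row i holds the probabilities of moving from i to each j.
    transition = weights / weights.sum(axis=1, keepdims=True)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * transition.T @ rank
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

ranks = pagerank_power(W)
print(ranks)
```

The result should agree closely with the scores printed above: sentence 0 ranks first, then sentence 2, then sentence 1.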


# SORTING SENTENCE BY PAGE RANK

In [8]:
# Step 4 - Sort the rank and pick top sentences
ranked_sentence = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)    
print("Indexes of top ranked_sentence order are \n\n", ranked_sentence)

Indexes of top ranked_sentence order are 

 [(0.38835725455122294, ['In', 'the', 'heart', 'of', 'an', 'ancient', 'forest,', 'where', 'the', 'thick', 'canopy', 'barely', 'let', 'the', 'sunlight', 'touch', 'the', 'moss-covered', 'ground,', 'a', 'small', 'stream', 'meandered', 'through', 'the', 'underbrush,', 'its', 'gentle', 'babble', 'a', 'constant', 'in', 'the', 'otherwise', 'silent', 'expanse']), (0.36121108342474983, ['Among', 'these', 'towering', 'sentinels,', 'wildlife', 'flourished', 'in', 'a', 'delicate', 'balance,', 'a', 'symphony', 'of', 'life', 'that', 'played', 'out', 'in', 'the', 'rustling', 'leaves', 'and', 'the', 'soft', 'footfalls', 'on', 'the', 'forest', 'floor.\n']), (0.2504316620240267, ['This', 'forest,', 'untouched', 'by', 'time', 'and', 'human', 'influence,', 'held', 'secrets', 'from', 'ages', 'past,', 'its', 'trees', 'standing', 'as', 'silent', 'witnesses', 'to', 'the', 'history', 'that', 'unfolded', 'in', 'its', 'depths'])]
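
Sorting `(score, sentence)` tuples works here, but if two scores were ever exactly equal, Python would fall back to comparing the word lists. An alternative, sketched below with the scores printed above, sorts the sentence indices by score instead; the variable name `ranked_indices` is introduced only for illustration.

```python
# PageRank scores copied from the output above.
scores = {0: 0.38835725455122294, 1: 0.2504316620240267, 2: 0.36121108342474983}

# Sort the sentence indices by their PageRank score, highest first.
ranked_indices = sorted(scores, key=scores.get, reverse=True)
print(ranked_indices)  # [0, 2, 1]
```

Keeping indices rather than word lists also makes it easy to restore the original sentence order later.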


# PICKING TOP "N" SENTENCES

In [11]:
# Step 5 - How many sentences to pick
n = int(input("How many sentences do you want in the summary? "))
# n = 2
summarize_text = []
for i in range(n):
    summarize_text.append(" ".join(ranked_sentence[i][1]))

How many sentences do you want in the summary?  2
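
If the requested number exceeds the number of sentences, indexing `ranked_sentence[i]` in the loop above raises an `IndexError`. The sketch below shows one way to guard against that by clamping `n`; `pick_top` and the short ranked list are hypothetical and only mimic the `(score, words)` shape used in this notebook.

```python
# Hypothetical ranked list in the same (score, words) shape as ranked_sentence.
ranked = [
    (0.39, ["First", "sentence"]),
    (0.36, ["Third", "sentence"]),
    (0.25, ["Second", "sentence"]),
]

def pick_top(ranked_pairs, n):
    """Return up to n top-ranked sentences as strings, never indexing past the end."""
    n = max(0, min(n, len(ranked_pairs)))
    return [" ".join(words) for _, words in ranked_pairs[:n]]

print(pick_top(ranked, 2))
print(len(pick_top(ranked, 10)))  # clamped to the 3 available sentences
```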


# PRINTING SUMMARY

In [12]:
# Step 6 - Of course, output the summarized text
print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 In the heart of an ancient forest, where the thick canopy barely let the sunlight touch the moss-covered ground, a small stream meandered through the underbrush, its gentle babble a constant in the otherwise silent expanse. Among these towering sentinels, wildlife flourished in a delicate balance, a symphony of life that played out in the rustling leaves and the soft footfalls on the forest floor.



# TEXTFILE-2

# OPENING FILE AND SPLITTING INTO SENTENCES

In [23]:
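# This file contains one paragraph of multiple sentences.
with open("C:/Users/abhil/OneDrive/Desktop/text2.txt", "r") as file:
    filedata = file.readlines()
article = filedata[0].split(". ")  # only the first paragraph (first line) is used

sentences = []
for sentence in article:
    print(sentence)
    # Note: str.replace treats "[^a-zA-Z]" as literal text, not a regex,
    # so nothing is replaced here and punctuation stays attached to the words.
    sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))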

Amidst a bustling city, where skyscrapers stretched towards the sky like fingers reaching for the stars, there existed a small, secluded park
This oasis of green was a stark contrast to the concrete jungle that surrounded it, offering a haven for those seeking solace from the relentless pace of urban life
In this park, a tranquil pond mirrored the few fluffy clouds that adorned the otherwise clear blue sky, while willow trees whispered secrets to the gentle breeze that danced through their branches.



# PRINTING LIST OF SENTENCES

In [24]:
print("Sentences are ", sentences)

Sentences are  [['Amidst', 'a', 'bustling', 'city,', 'where', 'skyscrapers', 'stretched', 'towards', 'the', 'sky', 'like', 'fingers', 'reaching', 'for', 'the', 'stars,', 'there', 'existed', 'a', 'small,', 'secluded', 'park'], ['This', 'oasis', 'of', 'green', 'was', 'a', 'stark', 'contrast', 'to', 'the', 'concrete', 'jungle', 'that', 'surrounded', 'it,', 'offering', 'a', 'haven', 'for', 'those', 'seeking', 'solace', 'from', 'the', 'relentless', 'pace', 'of', 'urban', 'life'], ['In', 'this', 'park,', 'a', 'tranquil', 'pond', 'mirrored', 'the', 'few', 'fluffy', 'clouds', 'that', 'adorned', 'the', 'otherwise', 'clear', 'blue', 'sky,', 'while', 'willow', 'trees', 'whispered', 'secrets', 'to', 'the', 'gentle', 'breeze', 'that', 'danced', 'through', 'their', 'branches.\n']]


# FUNCTION TO CALCULATE SIMILARITY

As before, sentence_similarity lowercases both sentences, builds word-frequency vectors over their combined vocabulary, and returns one minus the cosine distance between the two vectors.







In [25]:
def sentence_similarity(sent1, sent2):
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)

# CREATING SIMILARITY MATRIX

The similarity_matrix is a NumPy array, initialized with zeros, that holds the similarity of every pair of sentences. The code goes through every sentence pair, scores it with sentence_similarity, and stores the result in the matrix. Diagonal entries (where idx1 equals idx2) are skipped, since they would only compare a sentence with itself. At the end, the completed matrix is printed.

In [26]:
similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:  # skip comparing a sentence with itself
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.29834709 0.24806947]
 [0.29834709 0.         0.32071349]
 [0.24806947 0.32071349 0.        ]]


# GETTING PAGERANK SCORES

In [27]:
# Step 3 - Rank sentences in similarity matrix
sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.31700154440863204, 1: 0.35454424577017063, 2: 0.32845420982119716}


# SORTING SENTENCE BY PAGE RANK

In [28]:
# Step 4 - Sort the rank and pick top sentences
ranked_sentence = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)    
print("Indexes of top ranked_sentence order are \n\n", ranked_sentence)

Indexes of top ranked_sentence order are 

 [(0.35454424577017063, ['This', 'oasis', 'of', 'green', 'was', 'a', 'stark', 'contrast', 'to', 'the', 'concrete', 'jungle', 'that', 'surrounded', 'it,', 'offering', 'a', 'haven', 'for', 'those', 'seeking', 'solace', 'from', 'the', 'relentless', 'pace', 'of', 'urban', 'life']), (0.32845420982119716, ['In', 'this', 'park,', 'a', 'tranquil', 'pond', 'mirrored', 'the', 'few', 'fluffy', 'clouds', 'that', 'adorned', 'the', 'otherwise', 'clear', 'blue', 'sky,', 'while', 'willow', 'trees', 'whispered', 'secrets', 'to', 'the', 'gentle', 'breeze', 'that', 'danced', 'through', 'their', 'branches.\n']), (0.31700154440863204, ['Amidst', 'a', 'bustling', 'city,', 'where', 'skyscrapers', 'stretched', 'towards', 'the', 'sky', 'like', 'fingers', 'reaching', 'for', 'the', 'stars,', 'there', 'existed', 'a', 'small,', 'secluded', 'park'])]


# PICKING TOP "N" SENTENCES

In [29]:
# Step 5 - How many sentences to pick
n = int(input("How many sentences do you want in the summary? "))
# n = 2
summarize_text = []
for i in range(n):
    summarize_text.append(" ".join(ranked_sentence[i][1]))


How many sentences do you want in the summary?  2


# PRINTING SUMMARY

In [30]:
# Step 6 - Of course, output the summarized text
print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 This oasis of green was a stark contrast to the concrete jungle that surrounded it, offering a haven for those seeking solace from the relentless pace of urban life. In this park, a tranquil pond mirrored the few fluffy clouds that adorned the otherwise clear blue sky, while willow trees whispered secrets to the gentle breeze that danced through their branches.
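
Since the same cells are rerun unchanged for each text file, the steps could be gathered into a single function. The sketch below is one possible arrangement, not the notebook's code: the name `summarize`, the NumPy-only similarity and PageRank loop (standing in for NLTK's `cosine_distance` and `nx.pagerank`), and the sample text are all assumptions made to keep it self-contained.

```python
from collections import Counter
import numpy as np

def summarize(text, n, damping=0.85, iterations=100):
    """Return the n highest-ranked sentences of text, joined into one string."""
    raw = [s.strip().rstrip(".") for s in text.strip().split(". ") if s.strip()]
    counts = [Counter(s.lower().split()) for s in raw]
    size = len(raw)

    # Pairwise cosine similarity of word-count vectors (diagonal left at zero).
    sim = np.zeros((size, size))
    for i in range(size):
        for j in range(i + 1, size):
            dot = sum(counts[i][w] * counts[j][w] for w in counts[i])
            norm_i = np.sqrt(sum(v * v for v in counts[i].values()))
            norm_j = np.sqrt(sum(v * v for v in counts[j].values()))
            sim[i, j] = sim[j, i] = dot / (norm_i * norm_j)

    # Power-iteration PageRank; assumes every sentence shares a word with another,
    # otherwise a row sum of zero would cause a division by zero.
    transition = sim / sim.sum(axis=1, keepdims=True)
    rank = np.full(size, 1.0 / size)
    for _ in range(iterations):
        rank = (1.0 - damping) / size + damping * transition.T @ rank

    top = sorted(range(size), key=lambda i: rank[i], reverse=True)[:n]
    return ". ".join(raw[i] for i in top) + "."

# Made-up sample text; every sentence shares the word "the".
sample = "The cat sat on the mat. The dog sat on the log. The birds sing in the tree."
summary = summarize(sample, 2)
print(summary)
```

With a helper like this, each text file would need only one call after reading its first line.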



# TEXTFILE-3

# OPENING FILE AND SPLITTING INTO SENTENCES

In [31]:
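# This file contains one paragraph of multiple sentences.
with open("C:/Users/abhil/OneDrive/Desktop/text3.txt", "r") as file:
    filedata = file.readlines()
article = filedata[0].split(". ")  # only the first paragraph (first line) is used

sentences = []
for sentence in article:
    print(sentence)
    # Note: str.replace treats "[^a-zA-Z]" as literal text, not a regex,
    # so nothing is replaced here and punctuation stays attached to the words.
    sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))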

On the rugged coastline, where the ocean's mighty waves met the land with a symphony of roars and whispers, stood a solitary lighthouse
Its beacon, a guiding light in the veil of night, offered solace to the sailors braving the tumultuous sea
The lighthouse, weathered by storms and the salt-laden air, bore the marks of its unyielding vigil over the treacherous waters, a sentinel of safety in the vast, unpredictable ocean.



# PRINTING LIST OF SENTENCES

In [32]:
print("Sentences are ", sentences)

Sentences are  [['On', 'the', 'rugged', 'coastline,', 'where', 'the', "ocean's", 'mighty', 'waves', 'met', 'the', 'land', 'with', 'a', 'symphony', 'of', 'roars', 'and', 'whispers,', 'stood', 'a', 'solitary', 'lighthouse'], ['Its', 'beacon,', 'a', 'guiding', 'light', 'in', 'the', 'veil', 'of', 'night,', 'offered', 'solace', 'to', 'the', 'sailors', 'braving', 'the', 'tumultuous', 'sea'], ['The', 'lighthouse,', 'weathered', 'by', 'storms', 'and', 'the', 'salt-laden', 'air,', 'bore', 'the', 'marks', 'of', 'its', 'unyielding', 'vigil', 'over', 'the', 'treacherous', 'waters,', 'a', 'sentinel', 'of', 'safety', 'in', 'the', 'vast,', 'unpredictable', 'ocean.\n']]


# FUNCTION TO CALCULATE SIMILARITY

As before, sentence_similarity lowercases both sentences, builds word-frequency vectors over their combined vocabulary, and returns one minus the cosine distance between the two vectors.

In [33]:
def sentence_similarity(sent1, sent2):
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)

# CREATING SIMILARITY MATRIX


similarity_matrix is a numpy array initialized with zeros, where each element represents the similarity between two sentences. The code iterates through all combinations of sentences, calculates the similarity between each pair using the sentence_similarity function, and stores the resulting similarity scores in the similarity_matrix. The diagonal elements (where idx1 == idx2) are ignored, as they represent comparisons of a sentence with itself. The final similarity matrix is printed out.

In [34]:
similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:  # skip comparing a sentence with itself
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.43105272 0.50299545]
 [0.43105272 0.         0.56011203]
 [0.50299545 0.56011203 0.        ]]


# GETTING PAGERANK SCORES

In [35]:
# Step 3 - Rank sentences in similarity matrix
sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.3147858025181738, 1: 0.33176553948092746, 2: 0.3534486580008985}


# SORTING SENTENCE BY PAGE RANK

In [36]:
# Step 4 - Sort the rank and pick top sentences
ranked_sentence = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)    
print("Indexes of top ranked_sentence order are \n\n", ranked_sentence)

Indexes of top ranked_sentence order are 

 [(0.3534486580008985, ['The', 'lighthouse,', 'weathered', 'by', 'storms', 'and', 'the', 'salt-laden', 'air,', 'bore', 'the', 'marks', 'of', 'its', 'unyielding', 'vigil', 'over', 'the', 'treacherous', 'waters,', 'a', 'sentinel', 'of', 'safety', 'in', 'the', 'vast,', 'unpredictable', 'ocean.\n']), (0.33176553948092746, ['Its', 'beacon,', 'a', 'guiding', 'light', 'in', 'the', 'veil', 'of', 'night,', 'offered', 'solace', 'to', 'the', 'sailors', 'braving', 'the', 'tumultuous', 'sea']), (0.3147858025181738, ['On', 'the', 'rugged', 'coastline,', 'where', 'the', "ocean's", 'mighty', 'waves', 'met', 'the', 'land', 'with', 'a', 'symphony', 'of', 'roars', 'and', 'whispers,', 'stood', 'a', 'solitary', 'lighthouse'])]


# PICKING TOP "N" SENTENCES

In [37]:
# Step 5 - How many sentences to pick
n = int(input("How many sentences do you want in the summary? "))
# n = 2
summarize_text = []
for i in range(n):
    summarize_text.append(" ".join(ranked_sentence[i][1]))


How many sentences do you want in the summary?  2


# PRINTING SUMMARY

In [38]:
# Step 6 - Of course, output the summarized text
print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 The lighthouse, weathered by storms and the salt-laden air, bore the marks of its unyielding vigil over the treacherous waters, a sentinel of safety in the vast, unpredictable ocean.
. Its beacon, a guiding light in the veil of night, offered solace to the sailors braving the tumultuous sea
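
The stray line break and the `.` at the start of the second line above come from the trailing newline kept on the file's last sentence. One way to avoid this, sketched below with made-up token lists, is to strip whitespace and trailing periods from each sentence before joining; `join_summary` is a hypothetical helper name.

```python
def join_summary(token_lists):
    """Join tokenized sentences into one paragraph with clean '. ' separators."""
    cleaned = []
    for tokens in token_lists:
        sentence = " ".join(tokens).strip()  # drops a trailing "\n"
        sentence = sentence.rstrip(".")      # avoids a doubled period
        cleaned.append(sentence)
    return ". ".join(cleaned) + "."

# Made-up tokens; the first list ends with "ocean.\n" like the file's last sentence.
picked = [["in", "the", "ocean.\n"], ["Its", "beacon", "sea"]]
result = join_summary(picked)
print(result)  # in the ocean. Its beacon sea.
```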
