### Importing Libraries

In [1]:
import numpy as np
import networkx as nx

from sklearn.feature_extraction.text import TfidfVectorizer

### Cleaning Punctuation

In [2]:
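import re

def read_article(file_name):
    # Assumes the whole article sits on the first line of the file.
    with open(file_name, "r") as file:
        filedata = file.readlines()
    article = filedata[0].split(". ")
    # Replace every non-letter character with a space.
    # re.sub is required here: str.replace would treat the pattern as literal text.
    sentences = [re.sub("[^a-zA-Z]", " ", sentence) for sentence in article]
    return article, sentences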
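
A pattern such as `[^a-zA-Z]` only acts as a regular expression when it is passed to `re.sub`; `str.replace` searches for that exact literal text instead. The sketch below contrasts the two on a made-up sample string. `clean_sentence` is a hypothetical helper used for illustration only and is not part of the notebook.

```python
import re


def clean_sentence(sentence):
    # Hypothetical helper: replace every non-letter character with a space.
    return re.sub(r"[^a-zA-Z]", " ", sentence)


raw = "Sachin's 100th century, in 2012!"  # made-up sample text

# str.replace looks for the literal characters "[^a-zA-Z]", finds none,
# and returns the string unchanged.
literal = raw.replace("[^a-zA-Z]", " ")

# re.sub interprets the pattern as a character class and strips
# the apostrophe, digits, comma and exclamation mark.
cleaned = clean_sentence(raw)

print(literal == raw)
print(cleaned.split())
```

Note that this kind of cleaning also removes digits and apostrophes, so the cleaned sentences are best suited to similarity scoring rather than display.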

### Similarity Matrix
Cosine similarity between the TF-IDF vectors of the sentences measures how similar each pair of sentences is.

In [3]:
def build_similarity_matrix(sentences):
    # TfidfVectorizer L2-normalizes each row by default, so multiplying the
    # TF-IDF matrix by its transpose yields pairwise cosine similarities.
    vect = TfidfVectorizer(min_df=1, stop_words="english")
    tfidf = vect.fit_transform(sentences)
    pairwise_similarity = tfidf @ tfidf.T
    similarity_matrix = pairwise_similarity.toarray()

    return similarity_matrix
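
To make the cosine-similarity step concrete, here is a minimal standard-library sketch. It is a simplification: it uses raw term counts rather than TF-IDF weights and does not remove stop words, so the values differ from what `TfidfVectorizer` produces. The helper names `term_counts` and `cosine_similarity` and the sample sentences are illustrative assumptions.

```python
import math
import re
from collections import Counter


def term_counts(sentence):
    # Lowercase the sentence and count its alphabetic tokens.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of vector lengths.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


# Made-up sample sentences.
sentences = [
    "Sachin played cricket",
    "Cricket is popular in India",
    "The weather was sunny",
]
vectors = [term_counts(s) for s in sentences]
matrix = [[cosine_similarity(u, v) for v in vectors] for u in vectors]

for row in matrix:
    print([round(x, 3) for x in row])
```

Sentences that share a term get a positive score, sentences with no common terms score zero, and each sentence has similarity 1 with itself.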

### Generate Summary method

In [4]:
def generate_summary(file_name, top_n=5):
    # Step 1 – Read the text and split it into sentences
    article, sentences = read_article(file_name)

    # Step 2 – Generate similarity matrix across sentences
    sentence_similarity_matrix = build_similarity_matrix(sentences)

    # Step 3 – Rank sentences in the similarity matrix
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph)

    # Step 4 – Sort by rank and pick the top sentences
    ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
    # Slicing avoids an IndexError when the article has fewer than top_n sentences
    summarize_text = [s for _, s in ranked_sentence[:top_n]]

    summarized_text = ". ".join(summarize_text)
    article = ". ".join(article)

    return article, summarized_text
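
The ranking in Step 3 relies on `nx.pagerank`. As a rough illustration of the idea behind it, the sketch below runs a plain power iteration on a small weighted matrix. It is a simplified stand-in, not NetworkX's implementation: the `pagerank` function here is hypothetical, the damping factor of 0.85 mirrors NetworkX's default `alpha`, and the 3×3 similarity matrix is made up.

```python
def pagerank(weights, damping=0.85, tol=1e-9, max_iter=100):
    """Power-iteration PageRank on a non-negative weight matrix (list of lists)."""
    n = len(weights)
    strength = [sum(row) for row in weights]  # total outgoing weight per node
    scores = [1.0 / n] * n                    # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if strength[i] == 0)
        new_scores = []
        for j in range(n):
            # Each node i passes its score to j in proportion to weights[i][j].
            incoming = sum(
                scores[i] * weights[i][j] / strength[i]
                for i in range(n)
                if strength[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        change = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:
            break
    return scores


# Made-up symmetric similarity matrix for three sentences.
similarity = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
scores = pagerank(similarity)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The sentence with the strongest overall connections to the others (index 1 in this toy matrix) ends up ranked first, which is the property the summarizer uses to pick its top sentences.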

### Testing

In [5]:
article, summarized_text = generate_summary("../Data/SachinEssay.txt", 5)
print("Original Article: \n", article)
print("Summarized Text: \n", summarized_text)

Original Article: 
 We have often heard that ‘Cricket is religion in India and Sachin is God’. There is no better statement that can justify the status of Sachin Tendulkar in India. Sachin Tendulkar in India is not just a cricketer, he is God. Also for common people in India, he is an example of hard work and determination. Thus, he is worshipped by everyone in India. The essay on Sachin Tendulkar is a small insight into one of the greatest ever sportsman to have played any game. Sachin Tendulkar was born to a middle-class family and came to limelight with a performance in the school tournament. He held a record partnership at that time with Vinod Kambli playing for their school tournament. That innings changed Sachin’s life completely and he became famous in a fortnight. In those days, the school tournaments had a different fanbase and a good performance can bring you in limelight. Because of that inning, people all over India came to know who Sachin Tendulkar was! Thus, it also caugh