# Text Summarizer

In [1]:
import summarizer
import pickle
import torch

In [2]:
from summarizer import Summarizer

In [3]:
def summary_perc(initial_text, summarized_text):
    '''
    Print the fraction by which the summary is shorter than the initial text.
    
    args:
        initial_text (str) : The original body of text passed through the model
        summarized_text (str) : The summary produced by the model
    
    returns:
        None. The reduction is printed as a fraction between 0 and 1
        (for example, 0.83 means the summary is 83% shorter than the input).
        
    example:
        summary_perc(initial_text = body, summarized_text = result)
    '''
    percentage = len(summarized_text) / len(initial_text)
    print('The initial text was reduced by : ', 1 - percentage)

### Call Pre-Trained BERT Model

In [4]:
model = Summarizer()

### Pass in Text

In [10]:
text = '''
Clearly, the number of pieces of information that must be found and used for bauxite to become, say, the aluminum sheeting that forms the cas- ing of the printing press that produced the pages that you are now reading is staggeringly large. It is a number far larger than the mere one billion pieces of the jigsaw puzzle in my example.
It’s foolish to expect any one person (or small group of people) to find all the pieces of information necessary for the production of aluminum sheeting (and for the production of fuselages for airliners, the production of oven foil, the production of soda cans ... the list is long).
Not only is the mere finding of all the many pieces of information too difficult to entrust to a small group of people; so, too, is the task of putting these pieces together in a way that yields useful final products.
Let’s now amend the example to make the jigsaw puzzle an even bet- ter metaphor for economic reality. Suppose that, unlike with regular jigsaw puzzles, each piece of this puzzle can be made to fit snugly and smoothly with any other piece. In this case, merely assembling all of the one billion puzzle pieces so that they fit together neatly is easy. But note that it is possible to create an unfathomably large number of scenes with these pieces.
Trouble is, only a tiny handful of these scenes will please the human eye. Most of the scenes will be visual gibberish. The challenge is to arrange the pieces together so that the final result is a recognizable scene—say, of a field of sunflowers or of a bustling city street. Only if the scene is recognizable is the assembled puzzle valuable.
Now imagine yourself standing alone before a gigantic table covered with these one billion puzzle pieces. What are the chances that you alone can put these pieces together so that the final result is a coherent visual image—a useful and valuable final result?
'''


In [6]:
result = model(text, min_length=60)

In [7]:
result

'Clearly, the number of pieces of information that must be found and used for bauxite to become, say, the aluminum sheeting that forms the cas- ing of the printing press that produced the pages that you are now reading is staggeringly large. Trouble is, only a tiny handful of these scenes will please the human eye.'

In [11]:
'The Summarized Text : {}'.format(result)

'The Summarized Text : Clearly, the number of pieces of information that must be found and used for bauxite to become, say, the aluminum sheeting that forms the cas- ing of the printing press that produced the pages that you are now reading is staggeringly large. Trouble is, only a tiny handful of these scenes will please the human eye.\n'

In [12]:
len(text)

1892

In [13]:
len(result)

315

In [14]:
summary_perc(initial_text = text, summarized_text = result)


The initial text was reduced by :  0.8335095137420718
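
The printed value follows directly from the two character counts shown earlier (1892 for the input, 315 for the summary). As a quick check, here is a minimal sketch that reproduces the arithmetic; `reduction_ratio` is a hypothetical helper introduced only for illustration and is not part of the notebook or the `summarizer` package.

```python
def reduction_ratio(initial_length, summarized_length):
    """Fraction of the original length removed by summarization."""
    if initial_length <= 0:
        raise ValueError("initial_length must be positive")
    return 1 - summarized_length / initial_length


# Character counts taken from the len(text) and len(result) cells.
ratio = reduction_ratio(1892, 315)
print(f"The initial text was reduced by: {ratio:.1%}")
```

Note that this measures characters, not sentences or tokens, so it says how much shorter the summary is but nothing about how much of the meaning it keeps.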


### Save Model

In [15]:
filename = '../summarizer_model.pt'

In [16]:
torch.save(model, filename)

In [17]:
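# torch.save pickles the whole object, so load it back as a full pickle.
# Recent PyTorch releases default torch.load to weights_only=True, which
# rejects such files; only pass weights_only=False for files you trust.
model = torch.load(filename, weights_only=False)
# model.eval()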

# Text Summarizer

In [1]:
'''
https://towardsdatascience.com/understand-text-summarization-and-create-your-own-summarizer-in-python-b26a9f09fc70
https://github.com/rohithreddy024/Text-Summarizer-Pytorch/blob/master/model.py
'''
import numpy as np
import networkx as nx

In [2]:
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance

### Functions

In [3]:
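def read_article(data):
    # Split the text into sentences on ". " and tokenize each on single spaces.
    # Punctuation is kept: tokens are the raw space-separated pieces.
    article = data.split(". ")
    sentences = []
    for sentence in article:
        sentences.append(sentence.split(" "))
    # Drop the trailing piece after the final ". " separator
    # (if the text does not end with ". ", this is the last sentence).
    sentences.pop()
    
    return sentences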


def sentence_similarity(sent1, sent2, stopwords=None):
    if stopwords is None:
        stopwords = []
 
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
 
    all_words = list(set(sent1 + sent2))
 
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
 
    # build the vector for the first sentence
    for w in sent1:
        if w in stopwords:
            continue
        vector1[all_words.index(w)] += 1
 
    # build the vector for the second sentence
    for w in sent2:
        if w in stopwords:
            continue
        vector2[all_words.index(w)] += 1
 
    return 1 - cosine_distance(vector1, vector2)
 
def build_similarity_matrix(sentences, stop_words):
    # Create an empty similarity matrix
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
 
    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            if idx1 == idx2: #ignore if both are same sentences
                continue 
            similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2], stop_words)

    return similarity_matrix


def generate_summary(data, top_n=5):
    stop_words = stopwords.words('english')
    summarize_text = []

    # Step 1 - Read the text and split it into sentences
    sentences = read_article(data)

    # Step 2 - Generate the similarity matrix across sentences
    sentence_similarity_matrix = build_similarity_matrix(sentences, stop_words)

    # Step 3 - Rank sentences with PageRank on the similarity graph
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph)

    # Step 4 - Sort by score and pick the top sentences
    ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
    # print("Indexes of top ranked_sentence order are ", ranked_sentence)

    # Never request more sentences than the text contains
    for i in range(min(top_n, len(ranked_sentence))):
        summarize_text.append(" ".join(ranked_sentence[i][1]))

    # Step 5 - Output the summarized text
    print("Summarize Text: \n", ". ".join(summarize_text))
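
The functions above combine two ideas: bag-of-words cosine similarity between sentences, and PageRank over the resulting similarity graph. The sketch below walks through both in plain Python so that each step is visible without `nltk` or `networkx`. It is a simplified illustration, not a drop-in replacement: `split_sentences`, `cosine_similarity`, and `rank_sentences` are hypothetical names, the damping factor and iteration count are assumed values, and unlike `nx.pagerank` it does not redistribute the score of sentences that have no similar neighbours.

```python
import math
from collections import Counter


def split_sentences(text):
    """Split on '. ' and tokenize on whitespace (an approximation of read_article)."""
    return [part.split() for part in text.split(". ") if part.strip()]


def cosine_similarity(tokens_a, tokens_b, stop_words=frozenset()):
    """Cosine similarity between the word-count vectors of two token lists."""
    counts_a = Counter(w.lower() for w in tokens_a if w.lower() not in stop_words)
    counts_b = Counter(w.lower() for w in tokens_b if w.lower() not in stop_words)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a simplified weighted PageRank power iteration."""
    n = len(sentences)
    if n == 0:
        return []
    # Pairwise similarities, with zeros on the diagonal.
    weights = [[0.0 if i == j else cosine_similarity(sentences[i], sentences[j])
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Each neighbour passes on its score in proportion to edge weight.
            incoming = sum(scores[i] * weights[i][j] / out_weight[i]
                           for i in range(n) if out_weight[i] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


# Hypothetical sample text, used only to exercise the sketch.
demo = "The cat sat on the mat. The cat lay on the rug. Stock prices fell sharply today"
sentences = split_sentences(demo)
scores = rank_sentences(sentences)
for score, tokens in sorted(zip(scores, sentences), reverse=True):
    print(round(score, 3), " ".join(tokens))
```

In this sample the first two sentences share "the", "cat", and "on", so they reinforce each other and rank above the unrelated third sentence. That is the behaviour the summarizer relies on: sentences that resemble many others are treated as central to the text.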

### Run Predictions

In [4]:
text = '''
In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills. 
Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services. As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses. 
The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning.According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, "With AI being the defining technology of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset. This will require more collaborations and training and working with AI. That’s why it has become more critical than ever for educational institutions to integrate new cloud and AI technologies. The program is an attempt to ramp up the institutional set-up and build capabilities among the educators to educate the workforce of tomorrow." The program aims to build up the cognitive skills and in-depth understanding of developing intelligent cloud connected solutions for applications across industry. Earlier in April this year, the company announced Microsoft Professional Program In AI as a learning track open to the public. The program was developed to provide job ready skills to programmers who wanted to hone their skills in AI and data science with a series of online courses which featured hands-on labs and expert instructors as well. This program also included developer-focused AI school that provided a bunch of assets to help build AI skills.
'''
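
Before ranking, `read_article` splits this text on the two-character separator `". "`. A period with no following space, as in "Machine Learning.According" above, is therefore not treated as a sentence boundary. The short sketch below shows that behaviour on a hypothetical sample string; `sample` and `parts` are names introduced only for this illustration.

```python
# Hypothetical sample that mimics the missing space after a period.
sample = ("Azure Machine Learning.According to the manager, AI matters. "
          "It is growing. Last one.")

# Same separator that read_article uses.
parts = sample.split(". ")

for index, part in enumerate(parts):
    print(index, repr(part))
```

The first piece keeps both sentences glued together because the separator never matches inside "Learning.According". A tokenizer that understands sentence boundaries, such as `nltk.sent_tokenize`, would be one way to handle such cases, at the cost of downloading its tokenizer data first.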

In [5]:
generate_summary(data = text, top_n = 5)

Summarize Text: 
 
Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services. 
The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning.According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, "With AI being the defining technology of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset. Earlier in April this year, the company announced Microsoft Professional Program In AI as a learning track open to the public. 
In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills. As part of the program, t