## Text Summarization: An NLP-Based Approach

### Creating an Article Summarizer

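```python
# Importing the necessary libraries
import bs4 as bs
import urllib.request
import re
import nltk

# Stopword list, plus the Punkt sentence/word tokenizer models that
# nltk.sent_tokenize and nltk.word_tokenize rely on
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # resource name used by newer NLTK releases
```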


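```python
# Getting the data
url = 'https://en.wikipedia.org/wiki/Global_warming'

# Wikipedia may reject requests without a descriptive User-Agent header;
# the value below is a placeholder
request = urllib.request.Request(url, headers={'User-Agent': 'article-summarizer-demo/0.1'})
source = urllib.request.urlopen(request).read()

# Parse the raw HTML into a navigable tree ('lxml' requires the lxml package)
soup = bs.BeautifulSoup(source, 'lxml')
```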

```python
# Collect the text of every <p> (paragraph) element on the page
text = ""
for paragraph in soup.find_all('p'):
    text += paragraph.text
```
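The paragraph-extraction step can be tried offline. The sketch below swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` (an assumption made so it runs without network access or third-party packages) and feeds it a small hypothetical HTML snippet. `ParagraphExtractor` is an illustrative name, not part of any library; it only handles explicitly closed `<p>` tags, which is a simplification compared with a full HTML parser.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: collect text inside <p> elements, similar in spirit to soup.find_all('p')."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False  # True while between <p> and </p>
        self.paragraphs = []       # text of each completed paragraph
        self._buffer = []          # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == 'p' and self.in_paragraph:
            self.paragraphs.append(''.join(self._buffer))
            self.in_paragraph = False

    def handle_data(self, data):
        # Text outside <p> (headings, divs) is ignored
        if self.in_paragraph:
            self._buffer.append(data)


# Hypothetical HTML snippet standing in for the downloaded Wikipedia page
html = ("<html><body><h1>Title</h1>"
        "<p>Global warming is <b>rising</b>.[1]</p>"
        "<div>Not a paragraph.</div>"
        "<p>Second paragraph.</p></body></html>")

parser = ParagraphExtractor()
parser.feed(html)
parser.close()

# Joining with a space keeps adjacent paragraphs from being glued together
text = " ".join(parser.paragraphs)
print(text)  # Global warming is rising.[1] Second paragraph.
```

Note that inline tags such as `<b>` are flattened into the surrounding paragraph text, and the citation marker `[1]` survives; the pre-processing step removes it.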

### Pre-processing the text 

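```python
# Remove citation markers such as [12] and collapse repeated whitespace;
# `text` keeps its punctuation so it can still be split into sentences
text = re.sub(r"\[[0-9]*\]", " ", text)
text = re.sub(r'\s+', ' ', text)

# Build a cleaned, lower-case copy used only for word counting:
# non-word characters and digits become spaces, then whitespace is collapsed
clean_text = text.lower()
clean_text = re.sub(r'\W', ' ', clean_text)
clean_text = re.sub(r'\d', ' ', clean_text)
clean_text = re.sub(r'\s+', ' ', clean_text)
```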
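To see what each regular expression does, here is the same pipeline applied to a short hypothetical sentence (the input string is invented for illustration). It shows the two outputs side by side: `text` loses only the citation markers, while `clean_text` keeps letters only.

```python
import re

# Hypothetical input containing citation markers, extra spaces, digits and punctuation
raw = "Global warming[12] is   the long-term rise (since 1850) in Earth's temperature.[3]"

# Step 1: drop citation markers such as [12], then collapse whitespace
text = re.sub(r"\[[0-9]*\]", " ", raw)
text = re.sub(r"\s+", " ", text)

# Step 2: lower-case, letters-only copy used for counting words
clean_text = text.lower()
clean_text = re.sub(r"\W", " ", clean_text)   # punctuation -> space
clean_text = re.sub(r"\d", " ", clean_text)   # digits -> space
clean_text = re.sub(r"\s+", " ", clean_text)  # collapse runs of spaces

print(text)
print(clean_text)
```

One side effect worth noticing: `\W` splits contractions and possessives, so "Earth's" becomes the two tokens `earth` and `s`.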

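```python
# Split the original (punctuated) text into sentences
sentences = nltk.sent_tokenize(text)

# A set makes the membership test below fast
stop_words = set(nltk.corpus.stopwords.words('english'))

# Count how often each non-stopword occurs in the cleaned text
word2count = {}
for word in nltk.word_tokenize(clean_text):
    if word not in stop_words:
        word2count[word] = word2count.get(word, 0) + 1

# Normalize counts to [0, 1]. Compute the maximum once: recomputing it inside
# the loop changes the denominator after the most frequent word is scaled to 1.0
max_count = max(word2count.values())
for key in word2count:
    word2count[key] = word2count[key] / max_count
```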
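Why the maximum must be computed once can be shown on a toy example. This sketch uses `collections.Counter`, a whitespace tokenizer and a tiny hypothetical stopword set in place of `nltk.word_tokenize` and the NLTK stopword corpus; the sample text is invented. It contrasts the correct normalization with the naive version that calls `max()` inside the loop.

```python
from collections import Counter

# Hypothetical stand-ins for this illustration: a tiny stopword set and a
# whitespace tokenizer replace nltk.corpus.stopwords and nltk.word_tokenize.
STOP_WORDS = {"the", "is", "and", "a", "of", "in"}
clean_text = "the climate is warming and the climate system is changing climate policy"

# Count every token that is not a stopword
word2count = Counter(w for w in clean_text.split() if w not in STOP_WORDS)

# Correct: take the maximum once, so every word shares the same denominator
max_count = max(word2count.values())
word_freq = {word: count / max_count for word, count in word2count.items()}

# Naive: recomputing max() inside the loop. Once the top word has been scaled
# to 1.0, the maximum changes and later words are divided by a different value.
buggy = dict(word2count)
for key in buggy:
    buggy[key] = buggy[key] / max(buggy.values())

print(word_freq)  # 'warming' -> 0.333...
print(buggy)      # 'warming' -> 1.0
```

With the naive loop, every word seen after the most frequent one ends up scored as if it were equally important, which flattens the sentence scores computed next.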

```python
# Score each sentence by summing the normalized frequencies of its words
sent2score = {}
for sentence in sentences:
    # Skip sentences of 25 or more words so the summary stays concise
    if len(sentence.split(' ')) >= 25:
        continue
    for word in nltk.word_tokenize(sentence.lower()):
        if word in word2count:
            sent2score[sentence] = sent2score.get(sentence, 0) + word2count[word]
```
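The scoring rule can be checked by hand on a few hypothetical sentences. In this sketch the word frequencies are invented, and `re.findall(r"\w+", ...)` stands in for `nltk.word_tokenize` so the block runs with the standard library only. A sentence's score is simply the sum of the normalized frequencies of the words it contains.

```python
import re

# Hypothetical normalized word frequencies, as produced by the previous step
word_freq = {"climate": 1.0, "warming": 0.5, "emissions": 0.5, "policy": 0.25}

sentences = [
    "Climate change is driven by emissions.",
    "Warming affects climate policy.",
    "This sentence mentions nothing relevant.",
]

MAX_WORDS = 25  # same length cut-off the notebook uses

sent2score = {}
for sentence in sentences:
    if len(sentence.split(' ')) >= MAX_WORDS:
        continue  # long sentences are skipped entirely
    # A regex tokenizer stands in for nltk.word_tokenize here (assumption)
    for word in re.findall(r"\w+", sentence.lower()):
        if word in word_freq:
            sent2score[sentence] = sent2score.get(sentence, 0) + word_freq[word]

print(sent2score)
```

The first sentence scores 1.0 + 0.5 = 1.5 and the second 0.5 + 1.0 + 0.25 = 1.75; the third contains no scored words and never enters the dictionary. Because scores are sums, longer sentences tend to score higher, which is one reason for the 25-word cut-off.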

```python
import heapq

# Pick the 5 highest-scoring sentences (returned highest score first)
best_sentences = heapq.nlargest(5, sent2score, key=sent2score.get)

print('-------------------------------------------------------------------------------------------------------------------------------\n')
for sentence in best_sentences:
    print(sentence, end=" ")
print('\n-------------------------------------------------------------------------------------------------------------------------------')
```

```
-------------------------------------------------------------------------------------------------------------------------------

People who regard climate change as catastrophic, irreversible, or rapid might label climate change as a climate crisis or a climate emergency. Abrupt climate change, tipping points in the climate system: Climate change could result in global, large-scale changes. Scientists have determined that the major factors in the current climate change are greenhouse gases, land use changes, and aerosols and soot. In the late 19th century, scientists first argued that human emissions of greenhouse gases could change the climate. One potential source of abrupt climate change would be the rapid release of methane and carbon dioxide from permafrost, which would amplify global warming. 
-------------------------------------------------------------------------------------------------------------------------------
```


#### This is the extractive summary produced by the word-frequency approach above
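The summary above prints sentences in score order, so they may not follow the article's reading order. A common variant, sketched here under the assumption that `sent2score` preserves document order (Python dicts keep insertion order, and the notebook fills `sent2score` while iterating over the sentences in order), re-sorts the selected sentences by their original position. The sentences and scores below are hypothetical.

```python
import heapq

# Hypothetical sentence scores; dict order matches the order the sentences
# appear in the article
sent2score = {
    "First sentence.": 2.0,
    "Second sentence.": 4.0,
    "Third sentence.": 1.0,
    "Fourth sentence.": 5.0,
}

k = 2
# Highest score first, exactly as heapq.nlargest returns them
best = heapq.nlargest(k, sent2score, key=sent2score.get)

# Re-sort the chosen sentences by their position in the original text
position = {sentence: i for i, sentence in enumerate(sent2score)}
summary = " ".join(sorted(best, key=position.get))

print(best)     # ['Fourth sentence.', 'Second sentence.']
print(summary)  # Second sentence. Fourth sentence.
```

The selection itself is unchanged; only the presentation order differs, which usually makes an extractive summary read more naturally.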