**Getting our document / text.**

In [None]:
text = """
Python is an interpreted high-level general-purpose programming language. Its design philosophy emphasizes code readability with its use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[30]

Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.[31]

Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.[32] Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting. Python 3.0 was released in 2008 and was a major revision of the language that is not completely backward-compatible. Python 2 was discontinued with version 2.7.18 in 2020.[33]

Python consistently ranks as one of the most popular programming languages."""

Importing the required libraries.

In [None]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [None]:
# Stop words are a set of commonly used words in any language.
stopwords = list(STOP_WORDS)
print(stopwords)

['last', 'using', 'whoever', 'yet', 'indeed', 'his', 'have', 'many', 'same', '‘ll', 'himself', 'must', 'please', 'our', 'over', 'whereupon', 'us', 'else', 'their', 'seemed', 'somewhere', 'whereafter', 'everywhere', 'by', 'very', 'it', 'themselves', '‘re', 'along', 'latterly', 'mine', 'me', 'under', 'whom', 'noone', 'even', 'one', "n't", 'nevertheless', '’s', 'should', 'afterwards', 'few', 'he', 'n’t', 'toward', '’d', '‘ve', 'whence', 'unless', 'third', 'amount', 'put', 'go', 'both', 'five', '‘s', 'why', 'during', 'had', 'four', 'every', 'ours', 'twenty', 'beforehand', 'someone', 'thence', 'an', 'six', 'them', 'was', 'though', 'were', 'myself', 'nor', 'is', 'besides', 'because', 'everyone', 'this', 'not', 'beside', 'hereafter', 'than', 'see', 'next', 'around', 'itself', 'became', 'since', 'really', 'much', '’ll', 'via', 'thus', 'here', 'to', 'herein', 'thereupon', 'no', 'eleven', 'give', 'eight', 'sometime', 'together', 'ten', 'becoming', 'former', 'meanwhile', 'per', 'against', 'everyt

**Let’s look at what the terms tokenizer, tagger, parser and NER mean.**

Tokenization: Segmenting a document, paragraph or text into words, sentences, punctuation marks, etc.

Part-of-speech (POS) Tagging: Assigning word types to tokens, like verb or noun.

Dependency Parsing: Assigning syntactic dependency labels, describing the relations between individual tokens, like subject or object.

Named Entity Recognition (NER): Labelling named “real-world” objects, like persons, companies or locations.

In [None]:
nlp = spacy.load('en_core_web_sm')

Calling the ‘nlp’ object on a string of text returns a processed document.

In [None]:
# A Doc is a sequence of Token objects
doc = nlp(text)
doc


Python is an interpreted high-level general-purpose programming language. Its design philosophy emphasizes code readability with its use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[30]

Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.[31]

Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.[32] Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting. Python 3.0 was released in 2008 and was a major revision of the language that is

Performing word tokenization to check the tokens.

In [None]:
tokens = [token.text for token in doc]
print(tokens)

['\n', 'Python', 'is', 'an', 'interpreted', 'high', '-', 'level', 'general', '-', 'purpose', 'programming', 'language', '.', 'Its', 'design', 'philosophy', 'emphasizes', 'code', 'readability', 'with', 'its', 'use', 'of', 'significant', 'indentation', '.', 'Its', 'language', 'constructs', 'as', 'well', 'as', 'its', 'object', '-', 'oriented', 'approach', 'aim', 'to', 'help', 'programmers', 'write', 'clear', ',', 'logical', 'code', 'for', 'small', 'and', 'large', '-', 'scale', 'projects.[30', ']', '\n\n', 'Python', 'is', 'dynamically', '-', 'typed', 'and', 'garbage', '-', 'collected', '.', 'It', 'supports', 'multiple', 'programming', 'paradigms', ',', 'including', 'structured', '(', 'particularly', ',', 'procedural', ')', ',', 'object', '-', 'oriented', 'and', 'functional', 'programming', '.', 'It', 'is', 'often', 'described', 'as', 'a', '"', 'batteries', 'included', '"', 'language', 'due', 'to', 'its', 'comprehensive', 'standard', 'library.[31', ']', '\n\n', 'Guido', 'van', 'Rossum', 'be

Adding the newline character to the punctuation characters.

In [None]:
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'
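
Because `punctuation` is a plain string, `x in punctuation` is a substring test rather than an exact-match test. The short sketch below uses only the standard library and shows which tokens that check would and would not filter; the names `punct_set` and `is_only_punct` are introduced here purely for illustration.

```python
from string import punctuation

# Same extension as in the cell above: add the newline character.
punctuation = punctuation + '\n'

# `in` on a string checks for a substring, not for an exact element.
print('-' in punctuation)             # single punctuation mark
print('\n' in punctuation)            # single newline
print('\n\n' in punctuation)          # a double newline is not a substring
print('projects.[30' in punctuation)  # a mixed token is not a substring

# A set of characters allows an exact, per-character check instead.
punct_set = set(punctuation)
is_only_punct = all(ch in punct_set for ch in '\n\n')
print(is_only_punct)
```

This would explain why the paragraph break `'\n\n'` and tokens such as `'projects.[30'` still appear as keys in `word_frequencies` further down.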

Text preprocessing and cleaning: counting how often each word occurs, skipping stop words and punctuation.

In [None]:
word_frequencies = {}
for word in doc:
    # Skip stop words and punctuation; count every other token by its original text
    if word.text.lower() not in stopwords and word.text.lower() not in punctuation:
        word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1

print(word_frequencies)
print(len(word_frequencies))

{'Python': 8, 'interpreted': 1, 'high': 1, 'level': 1, 'general': 1, 'purpose': 1, 'programming': 5, 'language': 5, 'design': 1, 'philosophy': 1, 'emphasizes': 1, 'code': 2, 'readability': 1, 'use': 1, 'significant': 1, 'indentation': 1, 'constructs': 1, 'object': 2, 'oriented': 2, 'approach': 1, 'aim': 1, 'help': 1, 'programmers': 1, 'write': 1, 'clear': 1, 'logical': 1, 'small': 1, 'large': 1, 'scale': 1, 'projects.[30': 1, '\n\n': 3, 'dynamically': 1, 'typed': 1, 'garbage': 2, 'collected': 1, 'supports': 1, 'multiple': 1, 'paradigms': 1, 'including': 1, 'structured': 1, 'particularly': 1, 'procedural': 1, 'functional': 1, 'described': 1, 'batteries': 1, 'included': 1, 'comprehensive': 1, 'standard': 1, 'library.[31': 1, 'Guido': 1, 'van': 1, 'Rossum': 1, 'began': 1, 'working': 1, 'late': 1, '1980s': 1, 'successor': 1, 'ABC': 1, 'released': 3, '1991': 1, '0.9.0.[32': 1, '2.0': 1, '2000': 1, 'introduced': 1, 'new': 1, 'features': 1, 'list': 1, 'comprehensions': 1, 'collection': 1, 'sy
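
The same counting step can also be written with `collections.Counter`. This is a minimal sketch on a hand-written token list, so it runs without spaCy; `sample_tokens`, `sample_stopwords` and `keep` are made-up stand-ins and not part of the notebook.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-ins for the spaCy tokens and the stop-word list.
sample_tokens = ['Python', 'is', 'an', 'interpreted', 'language', '.',
                 'Python', 'is', 'dynamically', '-', 'typed', '.']
sample_stopwords = {'is', 'an'}
punct_chars = set(punctuation + '\n')

def keep(token):
    """Keep tokens that are neither stop words nor made only of punctuation."""
    lowered = token.lower()
    only_punct = all(ch in punct_chars for ch in lowered)
    return lowered not in sample_stopwords and not only_punct

counts = Counter(tok for tok in sample_tokens if keep(tok))
print(counts)
```

`Counter` removes the need for the explicit "first time seen" branch, and `counts.most_common(n)` returns the `n` most frequent words directly.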

In [None]:
max_frequency = max(word_frequencies.values())
max_frequency

8

Normalizing frequency counts.

In [None]:
# Divide each count by the maximum count, so the most frequent word gets 1.0
for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / max_frequency

print(word_frequencies)

{'Python': 1.0, 'interpreted': 0.125, 'high': 0.125, 'level': 0.125, 'general': 0.125, 'purpose': 0.125, 'programming': 0.625, 'language': 0.625, 'design': 0.125, 'philosophy': 0.125, 'emphasizes': 0.125, 'code': 0.25, 'readability': 0.125, 'use': 0.125, 'significant': 0.125, 'indentation': 0.125, 'constructs': 0.125, 'object': 0.25, 'oriented': 0.25, 'approach': 0.125, 'aim': 0.125, 'help': 0.125, 'programmers': 0.125, 'write': 0.125, 'clear': 0.125, 'logical': 0.125, 'small': 0.125, 'large': 0.125, 'scale': 0.125, 'projects.[30': 0.125, '\n\n': 0.375, 'dynamically': 0.125, 'typed': 0.125, 'garbage': 0.25, 'collected': 0.125, 'supports': 0.125, 'multiple': 0.125, 'paradigms': 0.125, 'including': 0.125, 'structured': 0.125, 'particularly': 0.125, 'procedural': 0.125, 'functional': 0.125, 'described': 0.125, 'batteries': 0.125, 'included': 0.125, 'comprehensive': 0.125, 'standard': 0.125, 'library.[31': 0.125, 'Guido': 0.125, 'van': 0.125, 'Rossum': 0.125, 'began': 0.125, 'working': 0.1
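
Dividing by the maximum count rescales every value into the range (0, 1], with the most frequent word at exactly 1.0. The results are relative frequencies rather than probabilities, since they do not sum to 1. A small worked example with three counts taken from the output above (the variable names are illustrative):

```python
# Three counts taken from the word-frequency output above.
counts = {'Python': 8, 'programming': 5, 'code': 2}

max_count = max(counts.values())

# Build a new dict so the raw counts stay available.
normalized = {word: count / max_count for word, count in counts.items()}
print(normalized)  # {'Python': 1.0, 'programming': 0.625, 'code': 0.25}
```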

Sentence Tokenization

In [None]:
sentence_tokens = [sent for sent in doc.sents]
print(sentence_tokens)
print(len(sentence_tokens))

[
Python is an interpreted high-level general-purpose programming language., Its design philosophy emphasizes code readability with its use of significant indentation., Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[30]

Python is dynamically-typed and garbage-collected., It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming., It is often described as a "batteries included" language due to its comprehensive standard library.[31]

Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.[32] Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting., Python 3.0 was released in 2008 and was a major revision of the language t

Finding sentence scores: each sentence's score is the sum of the normalized frequencies of its words.

In [None]:
sentence_scores = {}
for sent in sentence_tokens:
    for word in sent:
        if word.text.lower() in word_frequencies.keys():
            if sent not in sentence_scores.keys():
                sentence_scores[sent] = word_frequencies[word.text.lower()]
            else:
                sentence_scores[sent] += word_frequencies[word.text.lower()]
                
sentence_scores

{
 Python is an interpreted high-level general-purpose programming language.: 1.875,
 Its design philosophy emphasizes code readability with its use of significant indentation.: 1.125,
 Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[30]
 
 Python is dynamically-typed and garbage-collected.: 3.875,
 It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming.: 2.75,
 It is often described as a "batteries included" language due to its comprehensive standard library.[31]
 
 Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.[32] Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting.: 6.375,
 Python 3.0 was released in 200
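
One detail in the scoring cell is worth checking: the keys of `word_frequencies` keep their original case (for example `'Python'`), while the lookup uses `word.text.lower()`. The sketch below replays the first sentence with plain strings, using token texts and normalized frequencies copied from the outputs above; the two helper functions are invented for this illustration.

```python
# Normalized frequencies copied from the output above (original-case keys).
word_frequencies = {
    'Python': 1.0, 'interpreted': 0.125, 'high': 0.125, 'level': 0.125,
    'general': 0.125, 'purpose': 0.125, 'programming': 0.625, 'language': 0.625,
}

# Token texts of the first sentence, as printed by the tokenization cell.
first_sentence = ['Python', 'is', 'an', 'interpreted', 'high', '-', 'level',
                  'general', '-', 'purpose', 'programming', 'language', '.']

def score_lowercased(tokens):
    """Look words up by their lowercased text, as the cell above does."""
    return sum(word_frequencies.get(tok.lower(), 0.0) for tok in tokens)

def score_same_case(tokens):
    """Look words up by the same text that was used as the dictionary key."""
    return sum(word_frequencies.get(tok, 0.0) for tok in tokens)

print(score_lowercased(first_sentence))  # 1.875, 'python' is not a key
print(score_same_case(first_sentence))   # 2.875, 'Python' contributes 1.0
```

The first value matches the 1.875 shown above, which suggests that capitalized words such as `Python` do not contribute to the scores as written. Lowercasing the text both when counting and when scoring would be one way to make the two steps agree.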

In [None]:
from heapq import nlargest

In [None]:
select_length = int(len(sentence_tokens)*0.3)
select_length

2

In [None]:
summary = nlargest(select_length, sentence_scores, key = sentence_scores.get)
summary

[It is often described as a "batteries included" language due to its comprehensive standard library.[31]
 
 Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.[32] Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting.,
 Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[30]
 
 Python is dynamically-typed and garbage-collected.]
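
`heapq.nlargest(n, iterable, key=...)` returns the `n` items with the largest key values, highest first. Passing the dictionary iterates over its keys, and `sentence_scores.get` ranks each key by its score. The sketch below uses short placeholder labels instead of spaCy spans, together with five of the scores printed earlier; the labels and the `in_document_order` step are illustrative additions.

```python
from heapq import nlargest

# Placeholder labels mapped to five of the sentence scores printed above.
scores = {'sent_0': 1.875, 'sent_1': 1.125, 'sent_2': 3.875,
          'sent_3': 2.75, 'sent_4': 6.375}

top = nlargest(2, scores, key=scores.get)
print(top)  # ['sent_4', 'sent_2'], highest score first

# Optional: put the selected sentences back into their original reading order.
order = list(scores)
in_document_order = sorted(top, key=order.index)
print(in_document_order)  # ['sent_2', 'sent_4']
```

Since the result is ordered by score, the summary above starts with a later passage of the text; sorting by original position is one option if reading order matters.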

In [None]:
# Each selected item is a sentence span; join the sentence texts into one string
final_summary = [sent.text for sent in summary]
summary = ' '.join(final_summary)

In [None]:
print(text)


Python is an interpreted high-level general-purpose programming language. Its design philosophy emphasizes code readability with its use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[30]

Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.[31]

Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.[32] Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting. Python 3.0 was released in 2008 and was a major revision of the language that is

In [None]:
print(summary)

It is often described as a "batteries included" language due to its comprehensive standard library.[31]

Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.[32] Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a garbage collection system using reference counting. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[30]

Python is dynamically-typed and garbage-collected.


In [None]:
len(text)

1172

In [None]:
len(summary)

616
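
The two lengths give a quick measure of how much the text was shortened. A small calculation with the numbers above (the variable names are illustrative):

```python
text_length = 1172     # len(text) from the cell above
summary_length = 616   # len(summary) from the cell above

ratio = summary_length / text_length
print(f"{ratio:.1%}")  # 52.6%
```

The summary keeps roughly half of the characters even though only 30% of the sentence spans were requested, most likely because each of the two selected spans contains more than one sentence, as the `summary` output above shows.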