Make a text summarizer using the Natural Language Toolkit (NLTK) in Python. The script takes a block of text and the desired number of summary sentences as user input, then generates and prints an extractive summary: it scores every sentence by the frequencies of its words and keeps the highest-scoring sentences.

First, install the NLTK library with "pip install nltk", then download the NLTK data the script needs:

In [None]:
import nltk

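nltk.download('punkt')      # tokenizer models used by sent_tokenize / word_tokenize
nltk.download('punkt_tab')  # tokenizer data that newer NLTK releases look for instead of 'punkt'
nltk.download('stopwords')  # stopword lists, including English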

Importing necessary libraries

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from collections import defaultdict
import string

The compute_word_frequencies function:

1. Tokenizes the text into words and converts each word to lowercase.
2. Removes stopwords and punctuation (the ASCII symbols in string.punctuation).
3. Builds a frequency table (dictionary) whose keys are the remaining words and whose values are how often each word occurs in the text.
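As a dependency-free illustration of the three steps above, here is a minimal sketch using collections.Counter. It assumes a crude regex tokenizer and a tiny hand-picked stopword set (TOY_STOPWORDS, hypothetical) as stand-ins for NLTK's word_tokenize and stopwords.words('english'), so its counts can differ from the NLTK version on real text.

```python
import re
from collections import Counter

# Hypothetical stand-in for stopwords.words('english'); NLTK's real list is far longer.
TOY_STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "no", "can"}

def toy_word_frequencies(text):
    # Crude stand-in for word_tokenize: lowercase the text, then keep runs of
    # letters and digits, which also drops punctuation.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Counter maps each remaining word to the number of times it occurs.
    return Counter(w for w in words if w not in TOY_STOPWORDS)

freqs = toy_word_frequencies("The Moon belongs to everyone. No nation can own the Moon.")
print(dict(freqs))  # {'moon': 2, 'belongs': 1, 'everyone': 1, 'nation': 1, 'own': 1}
```

For counting, Counter plays the same role as the defaultdict(int) in the notebook, and its most_common(k) method is a convenient way to inspect the top words.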

In [None]:
def compute_word_frequencies(text):
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text)
    
    freq_table = defaultdict(int)
    for word in words:
        word = word.lower()
        if word not in stop_words and word not in string.punctuation:
            freq_table[word] += 1
    
    return freq_table

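The score_sentences function tokenizes each sentence (lowercased) into words and scores the sentence by summing the frequencies of those words from the frequency table; words that are not in the table, such as stopwords and punctuation, contribute nothing.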
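To see what a summed score rewards, here is a small worked sketch. The frequency table is made up for illustration, and whitespace splitting replaces word_tokenize (so punctuation would stay attached to words; the example sentences have none). It also shows normalized_score, a hypothetical length-normalized variant that is not part of the original script: dividing by the token count keeps a sentence from winning just because it contains more words.

```python
def summed_score(sentence, freq_table):
    # Same rule as score_sentences: add up the frequency of every known word.
    return sum(freq_table.get(word, 0) for word in sentence.lower().split())

def normalized_score(sentence, freq_table):
    # Hypothetical variant: divide by the token count to get an average per word.
    words = sentence.lower().split()
    return summed_score(sentence, freq_table) / len(words) if words else 0.0

# Made-up frequency table for illustration only.
freq_table = {"moon": 3, "treaty": 2, "space": 2, "nations": 1}
short_sentence = "moon treaty"
long_sentence = "the treaty says space and the moon belong to all nations"

print(summed_score(short_sentence, freq_table), summed_score(long_sentence, freq_table))  # 5 8
print(normalized_score(short_sentence, freq_table), round(normalized_score(long_sentence, freq_table), 3))  # 2.5 0.727
```

Under the summed score the long sentence wins; under the average the short one does. Averaging tends to favour short, keyword-dense sentences instead, so which rule gives better summaries depends on the text.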

In [None]:
def score_sentences(sentences, freq_table):
    sentence_scores = defaultdict(int)
    
    for sentence in sentences:
        words = word_tokenize(sentence.lower())
        for word in words:
            if word in freq_table:
                sentence_scores[sentence] += freq_table[word]
    
    return sentence_scores

The generate_summary function:

1. Splits the text into sentences.
2. Computes word frequencies for the text.
3. Scores each sentence based on those word frequencies.
4. Sorts the sentences by score in descending order.
5. Selects the top n sentences and joins them into a single string to form the summary.
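Because step 5 keeps the selected sentences in score order, the sample run at the end of this notebook prints the input's fourth sentence first. A common adjustment, sketched here as an assumption rather than part of the original script, is to choose the top n by score and then restore the original order. top_n_in_order is a hypothetical helper that works on parallel lists of sentences and scores, which also means two identical sentences no longer collapse into one dictionary key.

```python
import heapq

def top_n_in_order(sentences, scores, n):
    # Indices of the n highest-scoring sentences; ties go to the earlier sentence.
    best = heapq.nlargest(n, range(len(sentences)), key=lambda i: scores[i])
    # Sort the chosen indices so the summary follows the original text order.
    return ' '.join(sentences[i] for i in sorted(best))

sentences = ["First.", "Second.", "Third.", "Fourth."]
scores = [4, 9, 7, 12]
print(top_n_in_order(sentences, scores, 3))  # Second. Third. Fourth.
```

Inside generate_summary, the scores list could be built as [sentence_scores[s] for s in sentences]; because sentence_scores is a defaultdict(int), a sentence that never received a score simply counts as 0.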

In [None]:
def generate_summary(text, n):
    sentences = sent_tokenize(text)
    word_frequencies = compute_word_frequencies(text)
    sentence_scores = score_sentences(sentences, word_frequencies)
    
    # Get the top n sentences with the highest scores
    summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:n]
    
    return ' '.join(summary_sentences)

Prompt the user for the text to summarize and for the number of sentences the summary should contain.
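int(input(...)) raises ValueError for input such as "three" and silently accepts 0 or negative numbers; with n = -1, for example, the slice [:n] in generate_summary returns every ranked sentence except the lowest-scoring one. Below is a minimal validation sketch; parse_sentence_count is a hypothetical helper that takes the raw string, so it can be tested without an interactive prompt.

```python
def parse_sentence_count(raw):
    # Accept only a positive whole number, e.g. "4" or " 4 ".
    try:
        n = int(raw.strip())
    except ValueError:
        raise ValueError(f"expected a whole number, got {raw!r}") from None
    if n < 1:
        raise ValueError(f"number of sentences must be at least 1, got {n}")
    return n

# In the notebook cell this would wrap the prompt (commented out so the sketch runs non-interactively):
# num_sentences = parse_sentence_count(input("Enter the number of sentences you want in the summary: "))
print(parse_sentence_count(" 4 "))  # 4
```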

In [None]:
text = input("Enter the text you want to summarize: ")
num_sentences = int(input("Enter the number of sentences you want in the summary: "))

Generate and print the summary

In [None]:
# Generate summary
summary = generate_summary(text, num_sentences)

# Print summary
print("\nSummary:")
print(summary)

Here is the complete code:

In [4]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from collections import defaultdict
import string

# Function to compute word frequencies
def compute_word_frequencies(text):
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text)
    
    freq_table = defaultdict(int)
    for word in words:
        word = word.lower()
        if word not in stop_words and word not in string.punctuation:
            freq_table[word] += 1
    
    return freq_table

# Function to score sentences based on word frequencies
def score_sentences(sentences, freq_table):
    sentence_scores = defaultdict(int)
    
    for sentence in sentences:
        words = word_tokenize(sentence.lower())
        for word in words:
            if word in freq_table:
                sentence_scores[sentence] += freq_table[word]
    
    return sentence_scores

# Function to generate summary
def generate_summary(text, n):
    sentences = sent_tokenize(text)
    word_frequencies = compute_word_frequencies(text)
    sentence_scores = score_sentences(sentences, word_frequencies)
    
    # Get the top n sentences with the highest scores
    summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:n]
    
    return ' '.join(summary_sentences)

# User input
text = input("Enter the text you want to summarize: ")
num_sentences = int(input("Enter the number of sentences you want in the summary: "))

# Generate summary
summary = generate_summary(text, num_sentences)

# Print summary
print("\nSummary:")
print(summary)

Enter the text you want to summarize:  A UN agreement from 1967 says no nation can own the Moon. Instead, the fantastically named Outer Space Treaty says it belongs to everyone, and that any exploration has to be carried out for the benefit of all humankind and in the interests of all nations. While it sounds very peaceful and collaborative - and it is - the driving force behind the Outer Space Treaty wasn’t cooperation, but the politics of the Cold War. As tensions grew between the US and Soviet Union after World War Two, the fear was that space could become a military battleground, so the key part of the treaty was that no nuclear weapons could be sent into space. More than 100 nations signed up.
Enter the number of sentences you want in the summary:  4



Summary:
As tensions grew between the US and Soviet Union after World War Two, the fear was that space could become a military battleground, so the key part of the treaty was that no nuclear weapons could be sent into space. Instead, the fantastically named Outer Space Treaty says it belongs to everyone, and that any exploration has to be carried out for the benefit of all humankind and in the interests of all nations. While it sounds very peaceful and collaborative - and it is - the driving force behind the Outer Space Treaty wasn’t cooperation, but the politics of the Cold War. A UN agreement from 1967 says no nation can own the Moon.
