In [1]:
import nltk

## SUMMA Extractive Summarizer
Summarize the text with the SUMMA extractive summarizer, which selects the most important sentences from the full text instead of generating new ones.

In [2]:
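# Try UTF-8 first. latin-1 maps every byte to a character, so it never fails
# and acts as the last-resort fallback (ASCII is already a subset of UTF-8).
encodings = ['utf-8', 'latin-1']
for encoding in encodings:
    try:
        with open('volcano.txt', encoding=encoding) as file:
            full_text = file.read()
        break
    except UnicodeDecodeError:
        continue
print(full_text)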

A volcano is a rupture in the crust of a planetary-mass object, such as Earth, that allows hot lava, volcanic ash, and gases to escape from a magma chamber below the surface.

On Earth, volcanoes are most often found where tectonic plates are diverging or converging, and most are found underwater. For example, a mid-ocean ridge, such as the Mid-Atlantic Ridge, has volcanoes caused by divergent tectonic plates whereas the Pacific Ring of Fire has volcanoes caused by convergent tectonic plates. Volcanoes can also form where there is stretching and thinning of the crust's plates, such as in the East African Rift and the Wells Gray-Clearwater volcanic field and Rio Grande rift in North America. Volcanism away from plate boundaries has been postulated to arise from upwelling diapirs from the core–mantle boundary, 3,000 kilometers (1,900 mi) deep in the Earth. This results in hotspot volcanism, of which the Hawaiian hotspot is an example. Volcanoes are usually not created where two tectonic 

In [3]:
from summa.summarizer import summarize
# summarize() already returns the summary as a single string, so no join is needed.
summarized_text = summarize(full_text)
print(summarized_text)

A volcano is a rupture in the crust of a planetary-mass object, such as Earth, that allows hot lava, volcanic ash, and gases to escape from a magma chamber below the surface.
On Earth, volcanoes are most often found where tectonic plates are diverging or converging, and most are found underwater.
For example, a mid-ocean ridge, such as the Mid-Atlantic Ridge, has volcanoes caused by divergent tectonic plates whereas the Pacific Ring of Fire has volcanoes caused by convergent tectonic plates.
Volcanoes can also form where there is stretching and thinning of the crust's plates, such as in the East African Rift and the Wells Gray-Clearwater volcanic field and Rio Grande rift in North America.
Most divergent plate boundaries are at the bottom of the oceans, and so most volcanic activity on the Earth is submarine, forming new seafloor.
When it does reach the surface, however, a volcano is formed.
However, rifting often fails to completely split the continental lithosphere (such as in an aul
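
SUMMA ranks sentences with a TextRank-style graph algorithm. The sketch below is not that algorithm. It is a much simpler word-frequency scorer, included only to illustrate what "extractive" means: sentences are selected verbatim and never rewritten. The names `frequency_summary`, `split_sentences`, and `STOP_WORDS` are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "to", "into"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    # Lowercased words with stop words removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def frequency_summary(text, max_sentences=2):
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the top-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


sample = (
    "Volcanoes erupt lava. Lava flows shape volcanoes. "
    "The weather was pleasant yesterday. Lava cools into rock."
)
summary = frequency_summary(sample, max_sentences=2)
print("\n".join(summary))
```

Sentences built from frequently repeated words score highest here, so the off-topic weather sentence is dropped. A graph-based ranker such as the one in SUMMA compares sentences with each other instead, which usually handles longer texts better.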

## Keyword Extraction
Extract important keywords and keep only those that appear in the summarized text. Here the extractor runs directly on the summary, so every keyword it returns already satisfies that condition.

In [4]:
import yake

def get_nouns_multipartite(text):
    out=[]

    # Raise or lower `top` (the number of keywords) to generate more or fewer MCQs at the end.
    # `n` is the maximum number of words in a keyphrase.
    kw_extractor = yake.KeywordExtractor(lan="en", n=2, top=20, features=None)

    # Extract keywords with YAKE (despite the function name, MultipartiteRank is not used).
    keyphrases = kw_extractor.extract_keywords(text)

    for key in keyphrases:
        out.append(key[0])

    return out

keywords = get_nouns_multipartite(summarized_text) 
print ("Keywords extracted from the summarized text are")
print(keywords)

Keywords extracted from the summarized text are
['lava', 'volcanoes', 'volcano', 'volcanic', 'Shield volcanoes', 'Earth', 'tectonic plates', 'Main article', 'magma', 'submarine volcanoes', 'Main', 'eruptions', 'submarine', 'planetary-mass object', 'Shield', 'erupted', 'Pacific Ring', 'eruption', 'plates', 'silica']
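
If the keywords were extracted from the full text instead of the summary, the filtering step described above would have to be done explicitly. The following is a minimal sketch of that step using only the standard library. The helper name `filter_keywords_in_text` and the sample strings are assumptions made for illustration and are not part of the notebook.

```python
import re


def filter_keywords_in_text(keywords, text):
    """Keep the keywords that occur in `text` as whole words or phrases.

    Matching is case-insensitive and the original keyword order is preserved.
    """
    kept = []
    for keyword in keywords:
        # re.escape guards against keywords containing regex metacharacters.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            kept.append(keyword)
    return kept


candidate_keywords = ["lava", "tectonic plates", "caldera", "ash"]
summary_text = "Volcanoes release hot lava. They often form where tectonic plates meet."
filtered = filter_keywords_in_text(candidate_keywords, summary_text)
print(filtered)
```

The word boundaries (`\b`) keep a short keyword from matching inside a longer word, which a plain substring test would not prevent.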


## Sentence Mapping
For each keyword, collect the sentences from the summarized text that contain it.

In [5]:
from nltk.tokenize import sent_tokenize
from flashtext import KeywordProcessor

def tokenize_sentences(text):
    sentences = sent_tokenize(text)
    # Drop short sentences (20 characters or fewer).
    sentences = [sentence.strip() for sentence in sentences if len(sentence) > 20]
    return sentences

def get_sentences_for_keyword(keywords, sentences):
    keyword_processor = KeywordProcessor()
    keyword_sentences = {}
    for word in keywords:
        keyword_sentences[word] = []
        keyword_processor.add_keyword(word)
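    for sentence in sentences:
        # A keyword can occur several times in one sentence; use a set so the
        # sentence is recorded only once per keyword.
        keywords_found = set(keyword_processor.extract_keywords(sentence))
        for key in keywords_found:
            keyword_sentences[key].append(sentence)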

    for key in keyword_sentences.keys():
        values = keyword_sentences[key]
        values = sorted(values, key=len, reverse=True)
        keyword_sentences[key] = values
    return keyword_sentences

sentences = tokenize_sentences(summarized_text)
keyword_sentence_mapping = get_sentences_for_keyword(keywords, sentences)
        
print (keyword_sentence_mapping)

{'lava': ['Vents that issue volcanic material (including lava and ash) and gases (mainly steam and magmatic gases) can develop anywhere on the landform and may give rise to smaller cones such as Puʻu ʻŌʻō on a flank of Kīlauea in Hawaii.', 'However, rifting often fails to completely split the continental lithosphere (such as in an aulacogen), and failed rifts are characterized by volcanoes that erupt unusual alkali lava or carbonatites.', 'Layers of lava emitted by the volcano\nStratovolcanoes (composite volcanoes) are tall conical mountains composed of lava flows and tephra in alternate layers, the strata that gives rise to the name.', 'Layers of lava emitted by the volcano\nStratovolcanoes (composite volcanoes) are tall conical mountains composed of lava flows and tephra in alternate layers, the strata that gives rise to the name.', 'The most common perception of a volcano is of a conical mountain, spewing lava and poisonous gases from a crater at its summit; however, this describes 
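
The same keyword-to-sentence mapping can be sketched with the standard library alone, which may help when checking what the flashtext lookup is expected to produce. This is an illustrative stand-in under stated assumptions and not the notebook's implementation: `map_keywords_to_sentences` and the sample data are hypothetical, and regular expressions replace the `KeywordProcessor`.

```python
import re


def map_keywords_to_sentences(keywords, sentences):
    """Return {keyword: [sentences containing it]}, longest sentence first."""
    mapping = {keyword: [] for keyword in keywords}
    for keyword in keywords:
        # Whole-word, case-insensitive match for the keyword or phrase.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for sentence in sentences:
            # search() stops at the first hit, so each sentence is added once.
            if pattern.search(sentence):
                mapping[keyword].append(sentence)
        mapping[keyword].sort(key=len, reverse=True)
    return mapping


sample_sentences = [
    "Lava flows build layers of lava over time.",
    "Magma rises through the crust.",
    "Shield volcanoes form from runny lava.",
]
sample_mapping = map_keywords_to_sentences(["lava", "magma", "silica"], sample_sentences)
for keyword, matched in sample_mapping.items():
    print(keyword, "->", matched)
```

Keywords with no matching sentence keep an empty list, which mirrors how the later MCQ step skips keywords without sentences.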

## Generate MCQ
Get distractors (wrong answer choices) from WordNet, falling back to ConceptNet when WordNet returns none, and generate the multiple-choice questions.

In [6]:
import requests
import json
import re
import random
from pywsd.similarity import max_similarity
from pywsd.lesk import adapted_lesk
from pywsd.lesk import simple_lesk
from pywsd.lesk import cosine_lesk
from nltk.corpus import wordnet as wn

# Distractors from Wordnet
def get_distractors_wordnet(syn, word):
    distractors = []
    # WordNet lemma names use underscores instead of spaces.
    word = word.lower().replace(" ", "_")
    hypernym = syn.hypernyms()
    if len(hypernym) == 0:
        return distractors
    # Co-hyponyms (siblings under the first hypernym) serve as distractors.
    for item in hypernym[0].hyponyms():
        name = item.lemmas()[0].name()
        # Skip the answer itself; compare in WordNet's underscore form.
        if name.lower() == word:
            continue
        name = name.replace("_", " ")
        name = " ".join(w.capitalize() for w in name.split())
        if name not in distractors:
            distractors.append(name)
    return distractors

def get_wordsense(sent, word):
    # WordNet lemma names use underscores instead of spaces.
    word = word.lower().replace(" ", "_")

    synsets = wn.synsets(word, 'n')
    if not synsets:
        return None

    # Disambiguate with two methods and keep whichever sense appears earlier
    # in WordNet's own ordering of the word's noun senses.
    wup = max_similarity(sent, word, 'wup', pos='n')
    adapted_lesk_output = adapted_lesk(sent, word, pos='n')
    lowest_index = min(synsets.index(wup), synsets.index(adapted_lesk_output))
    return synsets[lowest_index]

# Distractors from http://conceptnet.io/
def get_distractors_conceptnet(word):
    original_word = word.lower()
    word = original_word.replace(" ", "_")
    distractor_list = []
    # Step 1: find the wholes that this word is a part of.
    url = "http://api.conceptnet.io/query?node=/c/en/%s/n&rel=/r/PartOf&start=/c/en/%s&limit=5"%(word,word)
    obj = requests.get(url).json()

    for edge in obj['edges']:
        link = edge['end']['term']

        # Step 2: collect the other parts of the same whole as distractors.
        url2 = "http://api.conceptnet.io/query?node=%s&rel=/r/PartOf&end=%s&limit=10"%(link,link)
        obj2 = requests.get(url2).json()
        for part_edge in obj2['edges']:
            word2 = part_edge['start']['label']
            if word2 not in distractor_list and original_word not in word2.lower():
                distractor_list.append(word2)

    return distractor_list

key_distractor_list = {}

for keyword in keyword_sentence_mapping:
    if keyword_sentence_mapping[keyword]:
        wordsense = get_wordsense(keyword_sentence_mapping[keyword][0],keyword)
        if wordsense:
            distractors = get_distractors_wordnet(wordsense,keyword)
            if len(distractors) ==0:
                distractors = get_distractors_conceptnet(keyword)
            if len(distractors) != 0:
                key_distractor_list[keyword] = distractors
        else:

            distractors = get_distractors_conceptnet(keyword)
            if len(distractors) != 0:
                key_distractor_list[keyword] = distractors

index = 1
print ("#############################################################################")
print ("NOTE::::::::  Since the algorithm might have errors along the way, wrong answer choices generated might not be correct for some questions. ")
print ("#############################################################################\n\n")
for each in key_distractor_list:
    sentence = keyword_sentence_mapping[each][0]
    # Escape the keyword so characters such as "-" or "." are matched literally.
    pattern = re.compile(re.escape(each), re.IGNORECASE)
    output = pattern.sub( " _______ ", sentence)
    print ("%s)" % (index), output)
    choices = [each.capitalize()] + key_distractor_list[each]
    top4choices = choices[:4]
    random.shuffle(top4choices)
    optionchoices = ['a', 'b', 'c', 'd']
    for idx, choice in enumerate(top4choices):
        if choice == each.capitalize():
            print("\t", optionchoices[idx], ")", "\033[1m", choice, "\033[0m")
        else:
            print("\t", optionchoices[idx], ")", choice)
    print ("\nMore options:", choices[4:], "\n\n")
    index = index + 1
    


Warming up PyWSD (takes ~10 secs)... took 7.462974548339844 secs.


#############################################################################
NOTE::::::::  Since the algorithm might have errors along the way, wrong answer choices generated might not be correct for some questions. 
#############################################################################


1) Vents that issue volcanic material (including  _______  and ash) and gases (mainly steam and magmatic gases) can develop anywhere on the landform and may give rise to smaller cones such as Puʻu ʻŌʻō on a flank of Kīlauea in Hawaii.
	 a ) **Lava**
	 b ) Agglomerate
	 c ) Basalt
	 d ) Amygdaloid

More options: ['Dacite', 'Tuff', 'Volcanic Glass'] 


2)  _______  can also form where there is stretching and thinning of the crust's plates, such as in the East African Rift and the Wells Gray-Clearwater volcanic field and Rio Grande rift in North America.
	 a ) **Volcanoes**
	 b ) Chink
	 c ) Chap
	 d ) Crevasse

More options: ['Fatigue Crack', 'Fault', 'Rift', 'Slit', 'Split', 'Vent']
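
The question-assembly step can be isolated into a small function, which makes it easier to test separately from the WordNet and ConceptNet lookups. The sketch below is one possible arrangement and not the notebook's code: `build_question`, its parameters, and the sample distractors are hypothetical. It blanks the answer on whole-word matches only, removes distractors that repeat the answer, and accepts a seed so the shuffle is reproducible.

```python
import random
import re


def build_question(sentence, answer, distractors, num_choices=4, seed=None):
    """Blank out `answer` in `sentence` and return shuffled answer choices."""
    # Whole-word, case-insensitive match; re.escape keeps the answer literal.
    pattern = re.compile(r"\b" + re.escape(answer) + r"\b", re.IGNORECASE)
    stem = pattern.sub("_______", sentence)

    # Drop distractors that repeat the answer or each other.
    unique = []
    for distractor in distractors:
        if distractor.lower() != answer.lower() and distractor not in unique:
            unique.append(distractor)

    choices = [answer] + unique[: num_choices - 1]
    random.Random(seed).shuffle(choices)
    return {"stem": stem, "choices": choices, "answer_index": choices.index(answer)}


question = build_question(
    "Vents can issue lava and ash.",
    "lava",
    ["Basalt", "Tuff", "Lava", "Dacite", "Agglomerate"],
    seed=0,
)
print(question["stem"])
for letter, choice in zip("abcd", question["choices"]):
    marker = " (correct)" if choice == "lava" else ""
    print(f"\t{letter}) {choice}{marker}")
```

Returning the index of the correct answer, instead of printing bold escape codes, lets the caller decide how to render or grade the question.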