# Document Summarisation

## Import Data

In [137]:
report_url = "http://hansardpublic.parliament.sa.gov.au/Pages/HansardResult.aspx#/docid/HANSARD-10-25756"
report_text = "When I tried to get a copy of the commissioner's report after being tabled, why was I basically told that there was a very limited— The PRESIDENT: This is a matter of personal explanation in a supplementary. Just please, the Hon. Mr Wortley, ask your supplementary. The Hon. R.P. WORTLEY: Why weren't all members of parliament given a copy of the royal commission's report? The Hon. D.W. Ridgway: But you told us before you never read reports. The Hon. R.I. LUCAS (Treasurer) (15:26): Mr President, I won't go down that particular path, as delicious as that interjection might have been in relation to the Hon. Mr Wortley saying he couldn't trust himself to read his own reports. I don't know why the Hon. Mr Wortley was unable to get a copy of the royal commission report. It was certainly publicly available. If it pleases the member, I will see whether there is not a spare copy somewhere. If we do find a spare copy and give it to him, I will be asking questions afterwards of the honourable member just to make sure he did read it. The Hon. D.W. Ridgway: Do you want it delivered to Scuzzi or something more convenient for you? The PRESIDENT: Are you finished, the Hon. Mr Ridgway? The Hon. R.P. WORTLEY: You just worry about our trade exports, mate, for the state. The PRESIDENT: The Hon. Mr Wortley, I am waiting patiently here to give you the call for your question. Have you finished your private conversation with the Hon. Mr Ridgway? Yes? The Hon. Mr Wortley."
report_title = "Murray-Darling Basin Royal Commission"
ai_text = "In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills. Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services. As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses. The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning. According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, \"With AI being the defining technology of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset. This will require more collaborations and training and working with AI. That’s why it has become more critical than ever for educational institutions to integrate new cloud and AI technologies. The program is an attempt to ramp up the institutional set-up and build capabilities among the educators to educate the workforce of tomorrow.\" The program aims to build up the cognitive skills and in-depth understanding of developing intelligent cloud connected solutions for applications across industry. Earlier in April this year, the company announced Microsoft Professional Program In AI as a learning track open to the public. The program was developed to provide job ready skills to programmers who wanted to hone their skills in AI and data science with a series of online courses which featured hands-on labs and expert instructors as well. This program also included developer-focused AI school that provided a bunch of assets to help build AI skills."

In [17]:
# Import data from spreadsheet
import pandas as pd
import numpy as np

data = pd.read_excel ("..\\data\\Hansard1102019.xlsx", sheet_name="Text")
df = pd.DataFrame(data, columns= ['HansardID','Text'])
df = df.astype({"HansardID":'str', "Text":'str'}) 

#df.dtypes 
df.head(2)

Unnamed: 0,HansardID,Text
0,HANSARD-10-26147.xml,Climate Change
1,HANSARD-10-26147.xml,The Hon. M.C. PARNELL (14:55): I seek leave t...


In [None]:
#hansardFilesInfo = pd.read_excel ("..\\data\\HANSARDfullDataset.xlsx", sheet_name="HANSARDFilesInfo")
#hansardFilesInfo = pd.DataFrame(hansardFilesInfo, columns= ['FileName','URL'])
#hansardFilesInfo = hansardFilesInfo.astype({"FileName":'str', "URL":'str'}) 
#hansardFilesInfo.head(2)

In [None]:
#header = pd.read_excel ("..\\data\\HANSARDfullDataset.xlsx", sheet_name="header")
#header = pd.DataFrame(header)
#header.head(2)

In [None]:
#bill = pd.read_excel ("..\\data\\HANSARDfullDataset.xlsx", sheet_name="bill")
#bill = pd.DataFrame(bill, columns= ['question','bname'])
#bill.head(2)

In [18]:
# Group text into one document
grouped_text = df.groupby('HansardID')['Text'].agg(lambda col: '. '.join(col))
grouped_text_df = pd.DataFrame(grouped_text, columns= ['Text'])
grouped_text_df.head(5)

Unnamed: 0_level_0,Text
HansardID,Unnamed: 1_level_1
HANSARD-10-21289.xml,Bills. Children and Young People (Safety) Bill...
HANSARD-10-21290.xml,Sentencing Bill. Assent. His Excellency the Go...
HANSARD-10-21291.xml,Statutes Amendment (Possession of Firearms and...
HANSARD-10-21292.xml,Public Interest Disclosure Bill. Conference. T...
HANSARD-10-21310.xml,Bills. Statutes Amendment (Heavy Vehicles Regi...


In [19]:
grouped_text.iloc[4].replace("..",".")

"Bills. Statutes Amendment (Heavy Vehicles Registration Fees) Bill. Second Reading. Adjourned debate on second reading. (Continued from 1 June 2017.). The Hon. J.M.A. LENSINK (16:35):  I rise to indicate opposition support for this bill, which is part of the harmonisation of national laws which relate to heavy vehicles. In particular, this bill amends the Highways Act and the Motor Vehicles Act so that South Australia can meet its obligations under the Heavy Vehicle National Law (South Australia) Act 2013, which contains the national law as a schedule. This bill provides for the creation of a national heavy vehicle regulator. For the benefit of readers, heavy vehicles are defined as trucks with a gross vehicle mass of 4.5 tonnes or more. The section relating to registration of the national law has not commenced yet, so heavy vehicle registrations remain under state legislation; however, those jurisdictions which are participants, which I understand includes all states and territories e

## Basic Text Analytics

### Word Count

In [20]:
import re

# Count number of words in sentence using regex
grouped_text_df['WordCount'] = grouped_text_df.apply(lambda x: len(re.findall(r'\w+', x.Text)), axis=1)
grouped_text_df.head(2)

Unnamed: 0_level_0,Text,WordCount
HansardID,Unnamed: 1_level_1,Unnamed: 2_level_1
HANSARD-10-21289.xml,Bills. Children and Young People (Safety) Bill...,16
HANSARD-10-21290.xml,Sentencing Bill. Assent. His Excellency the Go...,11


### Length of Sentences

In [73]:
from statistics import median

all_text = df['Text'].agg(lambda col: ''.join(col))
all_text = ". ".join(all_text)

# Average number of words in a sentence
parts = [len(l.split()) for l in re.split(r'[?!.] ', ' '.join(all_text)) if l.strip()]
print("Average = ", sum(parts)/len(parts))

# Median number of words in a sentence
print("Median = ", median(parts))
print("Min = ", min(parts))
print("Max = ", max(parts))
print("Q1 = ", np.percentile(parts,25))
print("Q3 = ", np.percentile(parts,75))

Average =  70.2002116730459
Median =  46
Min =  1
Max =  5479
Q1 =  6.0
Q3 =  110.0


In [127]:
# Number of sentences
total_sentences = len(parts)
less_than_five = sum(i < 5 for i in parts)
greater_than_50 = sum(i > 50 for i in parts)
greater_than_100 = sum(i > 100 for i in parts)
greater_than_200 = sum(i > 200 for i in parts)
greater_than_1000 = sum(i > 1000 for i in parts)

print("Number of sentences = ", total_sentences)
print("Number of sentences with less than 5 words =", less_than_five, ":", (less_than_five / total_sentences) * 100)
print("Number of sentences with more than 50 words =", greater_than_50, ":", (greater_than_50 / total_sentences) * 100)
print("Number of sentences with more than 100 words =", greater_than_100, ":", (greater_than_100 / total_sentences) * 100)
print("Number of sentences with more than 200 words =", greater_than_200, ":", (greater_than_200 / total_sentences) * 100)
print("Number of sentences with more than 1000 words =", greater_than_1000, ":", (greater_than_1000 / total_sentences) * 100)

remaining_sentences = total_sentences - less_than_five - greater_than_200
print("Remaining sentences [5, 200] =", remaining_sentences, ":", remaining_sentences/total_sentences)

remaining_sentences = total_sentences - less_than_five - greater_than_100
print("Remaining sentences [5, 100] =", remaining_sentences, ":", remaining_sentences/total_sentences)

remaining_sentences = total_sentences - less_than_five - greater_than_50
print("Remaining sentences [5, 50] =", remaining_sentences, ":", remaining_sentences/total_sentences)

Number of sentences =  470537
Number of sentences with less than 5 words = 83091 : 17.658760097505617
Number of sentences with more than 50 words = 224979 : 47.81324316684979
Number of sentences with more than 100 words = 132153 : 28.085570316468207
Number of sentences with more than 200 words = 30845 : 6.555276205696896
Number of sentences with more than 1000 words = 18 : 0.0038254164922205906
Remaining sentences [5, 200] = 356601 : 0.7578596369679749
Remaining sentences [5, 100] = 255293 : 0.5425566958602618
Remaining sentences [5, 50] = 162467 : 0.34527996735644595


### Trim Tags

In [98]:
import re
s = "1.&#x9;TEST"
s = re.sub(r'[0-9]?.?&#x9;','', s) 
s

'TEST'

## Document Summarisation

Articles and libraries to look into further: 
* https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/
* https://stackabuse.com/text-summarization-with-nltk-in-python/
* https://github.com/alanbuxton/PyTeaserPython3
* https://github.com/abisee/pointer-generator
* https://github.com/DerwenAI/pytextrank
* https://github.com/tensorflow/models/tree/master/research/textsum
* https://radimrehurek.com/gensim/models/lsimodel.html
* https://towardsdatascience.com/text-summarization-in-python-76c0a41f0dc4 (additional links to articles at the end)

### Feature Base Text Summarisation

In [None]:
# The feature base model extracts the features of the sentence, then evaluate its importance
# Feature base text summarization by TextTeaser
#from pyteaser import SummarizeUrl
#url = 'http://www.huffingtonpost.com/2013/11/22/twitter-forward-secrecy_n_4326599.html'
#summaries = SummarizeUrl(url)
#print summaries

In [None]:
# TextTeasor - automatic summarization algorithm that combines the power of natural language processing and machine learning
#from textteaser import TextTeaser
#tt = TextTeaser()
#tt.summarize(title, text)

### Topic Model Summary

In [None]:
# Topic Model summarisation
from gensim.test.utils import common_dictionary, common_corpus
from gensim.models import LsiModel
model = LsiModel(common_corpus, id2word=common_dictionary)
vectorized_corpus = model[common_corpus]
#print(vectorized_corpus)
#model.print_topics(1)

### Truncated Sentences

In [36]:
def smart_truncate(content, length=200, suffix='...'):
    if len(content) <= length:
        return content
    else:
        return ' '.join(content[:length+1].split(' ')[0:-1]) + suffix

smart_truncate(report_text)

"When I tried to get a copy of the commissioner's report after being tabled, why was I basically told that there was a very limited— The PRESIDENT: This is a matter of personal explanation in a..."

In [37]:
grouped_text_df['Truncated'] = grouped_text_df.apply(lambda x: smart_truncate(x.Text), axis=1)
grouped_text_df.head(5)

Unnamed: 0_level_0,Text,WordCount,Truncated
HansardID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
HANSARD-10-21289.xml,Bills. Children and Young People (Safety) Bill...,16,Bills. Children and Young People (Safety) Bill...
HANSARD-10-21290.xml,Sentencing Bill. Assent. His Excellency the Go...,11,Sentencing Bill. Assent. His Excellency the Go...
HANSARD-10-21291.xml,Statutes Amendment (Possession of Firearms and...,18,Statutes Amendment (Possession of Firearms and...
HANSARD-10-21292.xml,Public Interest Disclosure Bill. Conference. T...,69,Public Interest Disclosure Bill. Conference. T...
HANSARD-10-21310.xml,Bills. Statutes Amendment (Heavy Vehicles Regi...,327,Bills. Statutes Amendment (Heavy Vehicles Regi...


### Extract Keywords

In [None]:
# https://rare-technologies.com/text-summarization-with-gensim/
#
from gensim.summarization import keywords

def get_keywords(text):
    keyword_list = keywords(text, split=True, lemmatize=True, deacc=True)
    return ', '.join(keyword_list[:10])

grouped_text_df['KeyWords'] = grouped_text_df.apply(lambda x: 
                                                    get_keywords(x.Text), 
                                                    axis=1)
grouped_text_df.head(5)

In [None]:
get_keywords(report_text)

### PyTextRank

In [None]:
# https://github.com/DerwenAI/pytextrank
# https://github.com/DerwenAI/pytextrank/blob/master/example.ipynb
# Requires JSON input

### TextRank Summary

In [None]:
from gensim.summarization.summarizer import summarize

#  TextRank summarization with default parameters
grouped_text_df['TextRank'] = grouped_text_df.apply(lambda x: 
                                                    summarize(x.Text).replace("\n"," ").replace("..","."), 
                                                    axis=1)

#  TextRank summarization with no more than 50 words for the summary
grouped_text_df['TextRank50'] = grouped_text_df.apply(lambda x: 
                                                      summarize(x.Text, word_count = 50).replace("\n"," ").replace("..","."), 
                                                      axis=1)
grouped_text_df.head(5)

In [None]:
print(summarize(report_text)) #  TextRank summarization

In [None]:
print(summarize(report_text, word_count = 50)) #  TextRank summarization - no more than 50 words for summary

In [None]:
print(summarize(report_text, ratio = 0.2)) #  TextRank summarization - use no more than 20% of original text for summary

In [None]:
summarize(grouped_text.iloc[4], word_count = 50)

In [None]:
summarize(grouped_text.iloc[2], word_count = 50)

In [None]:
def generate_text_rank_summary(text):
    # Return original text if less than 200 characters long
    if len(text) <= 200:
        return text
    
    sentences = []    
    text_sentences = re.split(r'[?!.] ', text)
    
    for sentence in text_sentences:
        processed = sentence.replace("[^a-zA-Z]", " ")
        word_count = len(re.findall(r'\w+', processed)) 
        if word_count > 1: # Include sentences with more than one word
            sentences.append(processed)
    
    summary = summarize('. '.join(sentences))             
    return summary.replace("\n"," ").replace("..",".")


In [None]:
#  TextRank summarisation with no more than 50 words for the summary
grouped_text_df['TextRankProcessed'] = grouped_text_df.apply(lambda x: generate_text_rank_summary(x.Text), axis=1)
grouped_text_df.head(5)

In [None]:
generate_text_rank_summary(grouped_text.iloc[4])

In [None]:
generate_text_rank_summary(grouped_text.iloc[2])

In [None]:
generate_text_rank_summary(report_text)

In [148]:
# https://towardsdatascience.com/understand-text-summarization-and-create-your-own-summarizer-in-python-b26a9f09fc70
# Approach uses TextRank algorithm
# TextRank does not rely on any previous training data and can work with any arbitrary piece of text. 
# TextRank is a general purpose graph-based ranking algorithm for NLP

# Import all necessary libraries
import os
import nltk
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance
from nltk.stem import WordNetLemmatizer
from nltk.stem import SnowballStemmer # is based on The Porter Stemming Algorithm
import numpy as np
import networkx as nx
import re 
import string

nltk.download('stopwords')
nltk.download('wordnet')

# Generate clean sentences
def create_long_sentences(article, output=True):
    sentences = []
    
    for sentence in article:
        sentence = re.sub(r'[0-9]?.?&#x9;','', sentence) # Remove tags
        word_count = len(re.findall(r'\w+', sentence)) 
        if 4 < word_count <= 100: # Include sentences with 5 to 100 words
            sentences.append(sentence.split(" "))
        if output: 
            print(sentence, ": words = ", word_count)
        
    return sentences   


# Process word by converting to lowercase, removing punctuation, lemmatising, and stemming. 
# https://medium.com/@pemagrg/pre-processing-text-in-python-ad13ea544dae
def process_word(word):
    
    wordnet_lemmatizer = WordNetLemmatizer()
    snowball_stemmer = SnowballStemmer('english')
    
    word = word.lower() # convert to lowercase
    word = word.translate(str.maketrans('', '', string.punctuation)) # remove punctuation 
    word = wordnet_lemmatizer.lemmatize(word) # lemmatise words
    word = snowball_stemmer.stem(word) # stemming
    
    return word


# Process sentence by converting to lowercase, removing punctuation, lemmatising, stemming and removing stopwords. 
# Sentence must be a list of words that make up the sentence
# https://medium.com/@pemagrg/pre-processing-text-in-python-ad13ea544dae
def process_sentence(sentence):
    
    # Process words in sentence
    processed_sentence = [process_word(word) for word in sentence]
    
    # Remove stopwords
    stop_words = stopwords.words('english')
    processed_sentence = [word for word in processed_sentence if word not in stop_words]
    
    return processed_sentence


# Similarity matrix
def sentence_similarity(sent1, sent2):
        
    all_words = list(set(sent1 + sent2)) 
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
 
    # build the vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
 
    # build the vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    
    # Vectors must be non-zero to calculate cosine similarity
    if np.count_nonzero(vector1) == 0 or np.count_nonzero(vector2) == 0:
        return 1
 
    return 1 - cosine_distance(vector1, vector2)


def build_similarity_matrix(sentences):
    # Create an empty similarity matrix
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    
    processed_sentences = [process_sentence(sentence) for sentence in sentences]
     
    for idx1 in range(len(processed_sentences)):
        for idx2 in range(len(processed_sentences)):
            if idx1 == idx2: # ignore if both are same sentences
                continue 
            sentences_similarity = sentence_similarity(processed_sentences[idx1], processed_sentences[idx2])
            similarity_matrix[idx1][idx2] = sentences_similarity

    return similarity_matrix


# Generate Summary Method
def generate_summary_from_file(file_path, top_n=5, output=True):
    # Read text and split it into sentences
    file = open(file_path, "r")
    file_data = file.readlines()
    sentences = re.split(r'[?!.] ', file_data[0])
    
    long_sentences = create_long_sentences(sentences, output)
    return generate_summary(long_sentences, top_n, output)


def generate_summary_from_text(text, top_n=5, output=True):    
    
    if len(text) <= 200:
        # If text is short, return the original text rather than a summary
        return text
    else:  
        # Read text and split it into sentences
        sentences = re.split(r'[?!.] ', text)
        long_sentences = create_long_sentences(sentences, output)
        return generate_summary(long_sentences, top_n, output)  
    

def generate_summary(sentences, top_n=5, output=True):
    summarize_text = []
    
    # Generate Similarly Matrix across sentences
    sentence_similarity_matrix = build_similarity_matrix(sentences)

    # Rank sentences in similarity matrix
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    try: 
        # https://stackoverflow.com/questions/13040548/networkx-differences-between-pagerank-pagerank-numpy-and-pagerank-scipy
        #scores = nx.pagerank(sentence_similarity_graph) # The eigenvector calculation is done by the power iteration method and has no guarantee of convergence
        scores = nx.pagerank_numpy(sentence_similarity_graph) # The eigenvector calculation uses NumPy’s interface to the LAPACK eigenvalue solvers. This will be the fastest and most accurate for small graphs.
        #scores = nx.pagerank_scipy(sentence_similarity_graph) # SciPy sparse-matrix implementation of the power-method
    except nx.NetworkXError:
        print ("NetworkXError")
        return ""
    except nx.PowerIterationFailedConvergence:
        print ("PowerIterationFailedConvergence")
        return ""
    
    # Sort the rank and pick top sentences
    ranked_sentences = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)
    if output:
        print("Indexes of top ranked_sentence order are ", ranked_sentences)    

    if len(ranked_sentences) < top_n:
        top_n = len(ranked_sentences)
        
    for i in range(top_n):
        summarize_text.append(" ".join(ranked_sentences[i][1]))

    # Output the summary text        
    return '. '.join(summarize_text).replace("..",".")


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\noaka\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\noaka\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [140]:
sentence = ['The', 'company', 'will', 'provide', 'AI', 'development', 'tools', 'and', 'Azure', 'AI', 'services', 'such', 'as', 'Microsoft', 'Cognitive', 'Services,', 'Bot', 'Services', 'and', 'Azure', 'Machine', 'Learning.According', 'to', 'Manish', 'Prakash,', 'Country', 'General', 'Manager-PS,', 'Health', 'and', 'Education,', 'Microsoft', 'India,', 'said,', '"With', 'AI', 'being', 'the', 'defining', 'technology', 'of', 'our', 'time,', 'it', 'is', 'transforming', 'lives', 'and', 'industry', 'and', 'the', 'jobs', 'of', 'tomorrow', 'will', 'require', 'a', 'different', 'skillset']
processed_sentence = process_sentence(sentence)
print(processed_sentence)

['compani', 'provid', 'ai', 'develop', 'tool', 'azur', 'ai', 'servic', 'microsoft', 'cognit', 'servic', 'bot', 'servic', 'azur', 'machin', 'learningaccord', 'manish', 'prakash', 'countri', 'general', 'managerp', 'health', 'educ', 'microsoft', 'india', 'said', 'ai', 'defin', 'technolog', 'time', 'transform', 'life', 'industri', 'job', 'tomorrow', 'requir', 'differ', 'skillset']


In [141]:
# let's begin
generate_summary_from_text(ai_text, 3, False)

'Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services. This program also included developer-focused AI school that provided a bunch of assets to help build AI skills. The program is an attempt to ramp up the institutional set-up and build capabilities among the educators to educate the workforce of tomorrow." The program aims to build up the cognitive skills and in-depth understanding of developing intelligent cloud connected solutions for applications across industry'

In [142]:
generate_summary_from_text(grouped_text.iloc[4], 3, False)

'This bill provides for the creation of a national heavy vehicle regulator. Statutes Amendment (Heavy Vehicles Registration Fees) Bill. In particular, this bill amends the Highways Act and the Motor Vehicles Act so that South Australia can meet its obligations under the Heavy Vehicle National Law (South Australia) Act 2013, which contains the national law as a schedule'

In [143]:
generate_summary_from_text(grouped_text.iloc[2], 3, True)

'Statutes Amendment (Possession of Firearms and Prohibited Weapons) Bill. Assent. His Excellency the Governor assented to the bill.'

In [144]:
generate_summary_from_text(report_text, 3, False)

"Mr Wortley was unable to get a copy of the royal commission report. WORTLEY: Why weren't all members of parliament given a copy of the royal commission's report. Mr Wortley saying he couldn't trust himself to read his own reports"

In [145]:
# Iterate over all text in data frame and add summary
import time
t0 = time.time()

grouped_text_df['TextRankCustom'] = grouped_text_df.apply(lambda x: generate_summary_from_text(x.Text, 3, False), axis=1)

t1 = time.time()
total_time = t1-t0
print("Time =", total_time)

Time = 5644.5772132873535


In [146]:
grouped_text_df.head(5)

Unnamed: 0_level_0,Text,WordCount,Truncated,TextRankCustom
HansardID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
HANSARD-10-21289.xml,Bills. Children and Young People (Safety) Bill...,16,Bills. Children and Young People (Safety) Bill...,Bills. Children and Young People (Safety) Bill...
HANSARD-10-21290.xml,Sentencing Bill. Assent. His Excellency the Go...,11,Sentencing Bill. Assent. His Excellency the Go...,Sentencing Bill. Assent. His Excellency the Go...
HANSARD-10-21291.xml,Statutes Amendment (Possession of Firearms and...,18,Statutes Amendment (Possession of Firearms and...,Statutes Amendment (Possession of Firearms and...
HANSARD-10-21292.xml,Public Interest Disclosure Bill. Conference. T...,69,Public Interest Disclosure Bill. Conference. T...,That the sitting of the council be not suspend...
HANSARD-10-21310.xml,Bills. Statutes Amendment (Heavy Vehicles Regi...,327,Bills. Statutes Amendment (Heavy Vehicles Regi...,This bill provides for the creation of a natio...


### Excel Output of Document Summaries

In [147]:
grouped_text_df.to_excel('.\\DocumentSummary.xlsx', sheet_name='TextRank', index=True)