Unsupervised-Comment-based-Multi-document-Extractive-Summarization

This work proposes a novel multi-objective optimization-based framework for unsupervised, comment-based, multi-document extractive summarization. A subset of relevant news sentences is automatically selected from the available set of sentences by utilizing user comments. Several statistical quality functions measuring different aspects of a summary, namely diversity, user-attention score, density-based score, and user-attention with syntactic score, are optimized simultaneously using the search capability of a multi-objective binary differential evolution technique.
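Optimizing several objectives simultaneously means that candidate summaries are compared by Pareto dominance rather than by a single score. The following is a minimal sketch of that comparison, assuming every objective is maximized. The function names `dominates` and `pareto_front` and the toy scores are hypothetical and are not taken from the repository code.

```python
def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (maximization)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores):
    """Return the indices of solutions that no other solution dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Toy objective vectors (made-up numbers), one tuple per candidate summary:
# (diversity, user attention, density based, user attention with syntactic)
scores = [
    (0.9, 0.2, 0.5, 0.4),
    (0.4, 0.8, 0.6, 0.7),
    (0.3, 0.1, 0.4, 0.3),  # worse than the first candidate on every objective
]
front = pareto_front(scores)
print(front)
```

The first two candidates trade off against each other, so both stay on the front; the third is dropped because the first is at least as good on all four objectives.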

Input Files:

WMD matrix: the distance matrix holding tweet-to-tweet distances in semantic space [Line 28]
Reader attention score of each news sentence [Line 38]
Density-based score of each news sentence [Line 49]
Reader attention with syntactic score of each news sentence [Line 60]
Length of each news sentence [Line 84]
Original set of news sentences [Line 96]
Reference/actual/gold summaries [Line 123, Line 152, Line 177, Line 202]
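To illustrate how a precomputed distance matrix can feed a diversity objective, here is a small sketch that scores a binary solution by the average pairwise distance between its selected sentences. This is only one plausible formulation, chosen for illustration; the exact formula used in the repository is not stated in this README, and the name `diversity` and the toy matrix are hypothetical.

```python
from itertools import combinations


def diversity(selected, dist):
    """Average pairwise distance between the selected sentence indices."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 0.0  # fewer than two sentences: no pair to compare
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


# Toy symmetric distance matrix for three sentences (made-up values).
dist = [
    [0.0, 0.8, 0.3],
    [0.8, 0.0, 0.5],
    [0.3, 0.5, 0.0],
]

# A binary solution: 1 means the sentence is included in the summary.
solution = [1, 1, 0]
selected = [i for i, bit in enumerate(solution) if bit]
score = diversity(selected, dist)
print(score)
```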

Note: All the above input files are present in the preprocessing directory.

Embeddings Used:

Word embeddings are obtained from word2vec models.

For the English dataset, 'word2vec-google-news-300' is used. https://github.com/RaRe-Technologies/gensim-data/releases/tag/word2vec-google-news-300
For the French dataset, 'frWac non lem no postag no 200 cbow cut0' is used. https://fauconnier.github.io/
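As a rough sketch of how these embeddings could be loaded and turned into a distance matrix, the code below uses gensim's downloader for the English vectors and `KeyedVectors.load_word2vec_format` for a locally downloaded French file. It assumes gensim is installed, that the French file is in binary word2vec format, and that sentences are already tokenized. The helper names are hypothetical, and the repository's own preprocessing may differ.

```python
def load_english_vectors():
    """Download (on first use) and load the English word2vec vectors."""
    import gensim.downloader as api  # assumes `pip install gensim`
    return api.load("word2vec-google-news-300")


def load_french_vectors(path):
    """Load French vectors from a local file (binary format is assumed)."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)


def wmd_matrix(token_lists, distance):
    """Build a symmetric matrix of pairwise distances between token lists."""
    n = len(token_lists)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(token_lists[i], token_lists[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

With real vectors, the distance function would be the `wmdistance` method of the loaded `KeyedVectors` object, for example `wmd_matrix(token_lists, load_english_vectors().wmdistance)`. Note that `wmdistance` needs an extra optimal-transport dependency, which depends on the gensim version installed.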

User input:

Because the code runs automatically over multiple topics, update the values below before running the main program.

Population size [Line 229]
Mating pool size [Line 232]
Minimum number of tweets to be in the summary [Line 237]
Maximum number of tweets to be in the summary [Line 240]
Maximum number of generations [Line 243]
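The sketch below shows how these five values might fit together: it checks them for consistency and builds a random initial population of binary solutions whose number of selected items stays within the minimum and maximum bounds. All names and numeric values here are hypothetical placeholders; the actual variables live at the line numbers listed above in the main script, and its initialization scheme may differ.

```python
import random

# Hypothetical values for illustration only.
params = {
    "population_size": 20,
    "mating_pool_size": 4,
    "min_summary_items": 3,
    "max_summary_items": 6,
    "max_generations": 25,
}


def validate(params, n_sentences):
    """Raise ValueError if the settings cannot work for n_sentences."""
    if any(value <= 0 for value in params.values()):
        raise ValueError("all parameters must be positive")
    lo, hi = params["min_summary_items"], params["max_summary_items"]
    if not lo <= hi <= n_sentences:
        raise ValueError("need min <= max <= number of sentences")


def random_solution(n_sentences, lo, hi, rng):
    """Binary vector with between lo and hi ones (selected sentences)."""
    k = rng.randint(lo, hi)  # inclusive on both ends
    chosen = set(rng.sample(range(n_sentences), k))
    return [1 if i in chosen else 0 for i in range(n_sentences)]


n_sentences = 30
validate(params, n_sentences)
rng = random.Random(0)  # fixed seed so runs are repeatable
population = [
    random_solution(
        n_sentences,
        params["min_summary_items"],
        params["max_summary_items"],
        rng,
    )
    for _ in range(params["population_size"])
]
```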

Output Files:

Folder ‘generation_wise_details’: contains the summary obtained for each solution in the population, together with the ROUGE scores of each summary.
Folder ‘Pareto_front’: contains the Pareto front obtained at the end of each generation.
Files (a) ‘Annotator1_solutionwise_summary_score_overview’, (b) ‘Annotator2_solutionwise_summary_score_overview’, (c) ‘Annotator3_solutionwise_summary_score_overview’, (d) ‘Annotator4_solutionwise_summary_score_overview’: these files contain the scores computed against the gold summaries for each solution in the final population (at the end of the execution).
(e) Plots:
i) ‘Generation_wise_Objective_values’: the maximum values of the objective functions at each generation.
ii) ‘New Sols_vs_Generations’: the number of new good solutions obtained at the end of each generation.
iii) ‘Generation Wise Rouge score’: the maximum ROUGE score values (obtained using the gold summary) at each generation.

How to Run:

Install Python 3.6.
Create a text file listing all topic names, one per line, and set its path at [Line 618]. All outputs are stored in the output folder, in a subfolder with the same name as the topic given in the text file.
To run the program, go to the ‘examples’ folder, set the required parameters (see User input), and run ‘comment_based_summarization_main.py’. Two datasets are used, one in English and one in French. For testing purposes, results for only 3 of the 45 topics of the English dataset are included. To run the code on the French dataset, execute 'french_dataset_comment_based_summarization_main.py' and provide the paths of all required input files to the program.
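The topics file described above can be prepared and read as in the following sketch. The file name `my_topics.txt`, the topic names, the helper functions, and the `Output` root folder name are illustrative assumptions; the main script reads the path configured at [Line 618] in its own way.

```python
import os
import tempfile


def read_topics(path):
    """Return topic names from a text file with one topic per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def output_dir_for(topic, output_root="Output"):
    """Folder where results for a topic would be stored (assumed layout)."""
    return os.path.join(output_root, topic)


# Write a small example topics file in a temporary directory.
topics_path = os.path.join(tempfile.mkdtemp(), "my_topics.txt")
with open(topics_path, "w", encoding="utf-8") as handle:
    handle.write("topic_one\ntopic_two\n")

topics = read_topics(topics_path)
for topic in topics:
    print(topic, "->", output_dir_for(topic))
```

Blank lines in the file are skipped, so a trailing newline at the end of the file does not produce an empty topic.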
