Hallucination Reduction in Long Input Text Summarization

Hallucination in text summarization occurs when the model generates information that is not supported by the input source document. It is a significant obstacle to the accuracy and reliability of generated summaries. In this paper, we aim to reduce hallucinations in summaries of long-form text documents. We use the PubMed dataset, which contains long scientific research documents and their abstracts. We incorporate data filtering and joint entity and summary generation (JAENS) into the fine-tuning of the Longformer Encoder-Decoder (LED) model to minimize hallucinations and thereby improve the quality of the generated summary. We measure factual consistency at the entity level with two metrics: precision-source and F1-target. Our experiments show that the fine-tuned LED model performs well in generating the paper abstract. Data filtering applied during preprocessing reduces entity-level hallucinations in the generated summaries on some of the factual consistency metrics.
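To make the two entity-level metrics concrete, here is a minimal sketch of how they are commonly defined: precision-source is the fraction of named entities in the generated summary that also appear in the source document, and F1-target is the harmonic mean of entity precision and recall measured against the reference summary. The function names, the case-insensitive substring matching, and the handling of empty entity sets are assumptions made for illustration; this is not the code in EntityScore.ipynb. Entity extraction (for example with an NER tagger) is assumed to have been done beforehand.

```python
from typing import Iterable, Set


def _normalize(entities: Iterable[str]) -> Set[str]:
    """Lowercase and strip entity strings; drop empty ones."""
    return {e.strip().lower() for e in entities if e.strip()}


def precision_source(summary_entities: Iterable[str], source_text: str) -> float:
    """Fraction of summary entities that occur in the source document.

    Assumption: an entity counts as supported if it appears as a
    case-insensitive substring of the source text.
    """
    ents = _normalize(summary_entities)
    if not ents:
        # Assumption: a summary with no entities has nothing to hallucinate.
        return 1.0
    source = source_text.lower()
    supported = sum(1 for e in ents if e in source)
    return supported / len(ents)


def f1_target(summary_entities: Iterable[str],
              reference_entities: Iterable[str]) -> float:
    """Harmonic mean of entity precision and recall against the reference summary."""
    hyp = _normalize(summary_entities)
    ref = _normalize(reference_entities)
    if not hyp or not ref:
        # Assumption: the score is 0 when either side has no entities.
        return 0.0
    overlap = len(hyp & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "Stanford" is not in the source, so it is hallucinated.
source_doc = "Metformin lowered HbA1c in patients treated at Mayo Clinic."
generated = ["Metformin", "HbA1c", "Stanford"]
reference = ["Metformin", "HbA1c", "Mayo Clinic"]

print(precision_source(generated, source_doc))
print(f1_target(generated, reference))
```

In this example two of the three generated entities are supported by the source, and two of the three match the reference, so both scores are pulled down by the single unsupported entity.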

Folders and files (22 commits in history):

LED-Filtereing
LED-Filtering-JAENS
LED
EntityScore.ipynb
EntityScore.txt
README.md
Test_EntityScore.csv

tohidarehman/Hallucination-Reduction-Text-Summarization
