zurlog/abs-embeddings-eval

Abstracts Embeddings Evaluation

Data and Code for the paper:

Zurlo, G., Ronchieri, E. (2024). Abstracts Embeddings Evaluation: A Case Study of Artificial Intelligence and Medical Imaging for the COVID-19 Infection. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14365. Springer, Cham. https://doi.org/10.1007/978-3-031-51023-6_18


Summary

The SARS-CoV-2 pandemic triggered unprecedented research efforts across many disciplines, and artificial intelligence (AI) applied to medical imaging has been prominently involved. Given the scarcity of resources available to fight the disease, AI-based tools have emerged as potentially valuable assets. Natural Language Processing (NLP) offers a means to analyze scientific articles at scale and has long been recognized as a way to mitigate information overload in biomedical research. Since the beginning of the pandemic, the NLP community has consistently addressed the needs of domain experts by applying state-of-the-art methods to improve comprehension and knowledge discovery.

The primary objective of this study is to assess the adequacy of commonly employed biomedical transformer-based models, trained on pre-pandemic corpora, in capturing the semantic features present in medical imaging literature. Concurrently, we aim to observe the potential advantage of continual and citation-informed pretraining on COVID-19 literature.

To accomplish this, we introduce an independent test set focused specifically on the medical imaging domain. This dataset serves as a resource for the extrinsic evaluation of contextual embeddings, comprising realistic text classification tasks based on 560 gold labels for two target variables: the clinical task and the imaging modality.

Installation

This project depends on Python ($\geq$ 3.7). It can be installed with pip from the project root; the commands below use an editable install (pip install -e .):

git clone https://github.com/zurlog/abs-embeddings-eval
cd abs-embeddings-eval
pip install -e .

Contents

Notebooks in scripts/:

  • Embeddings_Extraction: Computes the abstract embeddings from 15 BERT models.
  • Embeddings_Comparison_Modality: Metric calculations for the prediction of the imaging modality employed.
  • Embeddings_Comparison_Task: Metric calculations for the prediction of the clinical task.
  • Setup: Dependencies and utility functions.

Files in results/:

  • Modality_accuracy.csv and Modality_balanced_acc.csv : Results of the imaging modality prediction comparison.
  • Task_(primary)_accuracy.csv and Task_(primary)_balanced_acc.csv : Results of the clinical task prediction comparison.
  • 📁 embeddings: Pre-computed vectors stored as serialized Pandas Series.
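The result file names suggest that two metrics are reported: accuracy and balanced accuracy. As a minimal sketch of how the two differ, assuming the standard definitions (balanced accuracy as the unweighted mean of per-class recall, which is also the default behaviour of scikit-learn's `balanced_accuracy_score`), the pure-Python example below uses hypothetical toy labels that are not taken from the dataset. The notebooks may compute these values differently.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (standard definition)."""
    hits = defaultdict(int)    # correct predictions per gold class
    totals = defaultdict(int)  # number of gold examples per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (illustration only).
y_true = ["CT"] * 8 + ["X-ray"] * 2
y_pred = ["CT"] * 10  # a classifier that always predicts the majority class

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On imbalanced labels, a majority-class predictor can score well on accuracy while balanced accuracy stays at chance level, which is one reason to report both.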

Files in data/:

  • subset_wlabels.csv : 560 records subset with gold labels.

3D Visualization

With the TensorBoard Embedding Projector, we graphically represented SPECTER embeddings against the corresponding labels. The interactive dashboard lets users search for specific terms in abstracts and highlights articles that are adjacent to each other in the low-dimensional embedding space. Users can choose and tune three popular dimensionality reduction methods (UMAP, t-SNE, PCA).

Embeddings 3D Visualization
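To try a similar visualization with other vectors, one option is the Embedding Projector's TSV upload: one file with tab-separated vector components per row, and one metadata file with a header row when it has more than one column. The sketch below is an assumption about a workable export path, not the code used for the dashboard above; the function name `export_for_projector`, the column names, and the toy values are all hypothetical.

```python
import csv
import os
import tempfile


def export_for_projector(vectors, metadata, header, out_dir):
    """Write vectors.tsv and metadata.tsv for manual upload to the projector.

    Hypothetical helper: `vectors` is a list of equal-length float lists,
    `metadata` is a list of label rows aligned with `vectors`.
    """
    if len(vectors) != len(metadata):
        raise ValueError("vectors and metadata must have the same number of rows")

    def clean(value):
        # Collapse tabs and newlines so each record stays on a single TSV line.
        return " ".join(str(value).split())

    vec_path = os.path.join(out_dir, "vectors.tsv")
    meta_path = os.path.join(out_dir, "metadata.tsv")

    with open(vec_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(vectors)

    with open(meta_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)  # header row is needed for multi-column metadata
        writer.writerows([[clean(v) for v in row] for row in metadata])

    return vec_path, meta_path


# Toy data (illustration only; not real embeddings or gold labels).
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
metadata = [["CT", "Diagnosis"], ["X-ray", "Segmentation"]]
out_dir = tempfile.mkdtemp()
vec_path, meta_path = export_for_projector(
    vectors, metadata, ["modality", "task"], out_dir
)
print(vec_path, meta_path)
```

The two files can then be loaded through the projector's data upload dialog, after which the dimensionality reduction method can be chosen interactively.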
