Research of Current Trends in Bioinformatics Software Sharing and Archiving

Software plays a vital role in modern scientific research, so scientific software must remain both accessible and of high quality. To support sustainable and reproducible science, Software Heritage (https://www.softwareheritage.org/) serves as a global archive for software preservation. This project examines current trends in the development of bioinformatics software, starting from the abstracts of articles published on PubMed (https://pubmed.ncbi.nlm.nih.gov/). Using the PubMed, GitHub, and Software Heritage APIs, we collect information on approximately 10,000 scientific software packages. The analysis then determines the proportion of archived software, assesses development dynamics, and evaluates whether the software is accessible through the links provided in the publications. The workflow is implemented with Snakemake, so the whole analysis can be rerun from scratch.
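As an illustration of the first steps described above, the following minimal sketch pulls GitHub repository links out of an abstract and builds the matching Software Heritage origin lookup URL. It is not the code used by the workflow: the function names `extract_github_repos` and `swh_origin_lookup_url` are hypothetical, the regular expression is an assumption about how links appear in abstracts, and the sample abstract is invented for the example.

```python
import re

# Assumed pattern: github.com/<owner>/<repo>, with or without a scheme.
GITHUB_URL = re.compile(
    r"github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)", re.IGNORECASE
)


def extract_github_repos(abstract: str) -> list[str]:
    """Return unique, normalized GitHub repository URLs found in an abstract."""
    repos = []
    for owner, repo in GITHUB_URL.findall(abstract):
        repo = repo.rstrip(".")      # drop sentence-ending punctuation
        if repo.endswith(".git"):    # drop an optional clone suffix
            repo = repo[:-4]
        url = f"https://github.com/{owner}/{repo}"
        if url not in repos:
            repos.append(url)
    return repos


def swh_origin_lookup_url(origin_url: str) -> str:
    """Build the Software Heritage API URL that looks up an archived origin."""
    return f"https://archive.softwareheritage.org/api/1/origin/{origin_url}/get/"


# Invented sample abstract, for illustration only.
sample = (
    "The source code is available at "
    "https://github.com/zhukovanadezhda/bioinformatics-software."
)
for repo_url in extract_github_repos(sample):
    print(repo_url, "->", swh_origin_lookup_url(repo_url))
```

A request to the lookup URL that fails with a "not found" status would suggest that the repository is not yet archived, which is one possible way to estimate the proportion of archived software.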

Create the environment

Clone the repository

Clone the repository:

git clone https://github.com/zhukovanadezhda/bioinformatics-software.git
cd bioinformatics-software

Set up the conda environment

Install Miniconda and mamba, then create the bioinfosoft conda environment:

mamba env create -f binder/environment.yml

Load the environment

conda activate bioinfosoft

Remark: to deactivate an active environment, use:

conda deactivate

Get API keys

The workflow requires API keys for PubMed, GitHub, and Software Heritage.

To get API keys:

For PubMed, go to the bottom of the NCBI Account Settings page.
For GitHub, go to the Personal access tokens page of your account. There is no need to select specific scopes.

Create the file .env to store API keys in the following format:

GITHUB_TOKEN=...
PUBMED_TOKEN=...
SWH_TOKEN=...
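Before launching the workflow, it can help to check that the .env file defines all three keys. The sketch below is one way to do that with the standard library only. How the workflow itself reads .env is not stated here, so `parse_env`, `missing_keys`, and `github_headers` are hypothetical helpers, and the token values in the example are placeholders.

```python
REQUIRED_KEYS = ("GITHUB_TOKEN", "PUBMED_TOKEN", "SWH_TOKEN")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def github_headers(token: str) -> dict[str, str]:
    """Headers for an authenticated GitHub REST API request."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }


# Placeholder content; in practice, read the text from the .env file.
example = "GITHUB_TOKEN=abc\nPUBMED_TOKEN=\n# SWH_TOKEN is not set yet\n"
print(missing_keys(parse_env(example)))
```

With the placeholder content above, the check reports PUBMED_TOKEN and SWH_TOKEN as missing, because the first is empty and the second appears only in a comment.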

Run the analysis

Run the analysis with the Snakemake workflow:

snakemake --cores 1 --use-conda

All results are stored in the data folder.
