
Research of Current Trends in Bioinformatics Software Sharing and Archiving

Software plays a vital role in modern scientific research, so scientific software must remain both accessible and of high quality. To support sustainable and reproducible science, Software Heritage (https://www.softwareheritage.org/) serves as a global archive for software preservation. This project examines current trends in bioinformatics software development using information gathered from the abstracts of articles published on PubMed (https://pubmed.ncbi.nlm.nih.gov/). Using the PubMed, GitHub, and Software Heritage APIs, we collect information on approximately 10,000 scientific software packages. We then determine the proportion of this software archived in Software Heritage, assess its development dynamics, and evaluate whether it is still accessible through the links provided in the publications. The workflow is implemented with Snakemake, so the entire analysis can be rerun from scratch.
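This README does not show how software is identified in the abstracts. A common approach, assumed here rather than taken from the repository, is to search each abstract for GitHub repository links. The sketch below uses a hypothetical helper, `extract_github_repos`, whose regular expression matches `github.com/<owner>/<repo>` with or without a scheme and normalizes each hit to a canonical repository URL:

```python
import re

# Hypothetical pattern: github.com/<owner>/<repo>, with or without "https://" or "www.".
# Owner names allow letters, digits and hyphens; repository names also allow "_" and ".".
GITHUB_RE = re.compile(
    r"(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9_.-]+)",
    re.IGNORECASE,
)


def extract_github_repos(abstract: str) -> list[str]:
    """Return unique https://github.com/<owner>/<repo> URLs in order of appearance."""
    repos: list[str] = []
    for owner, name in GITHUB_RE.findall(abstract):
        # A sentence-ending period is often glued to the link; drop it.
        name = name.rstrip(".")
        # Clone URLs end with ".git"; the repository URL itself does not.
        if name.endswith(".git"):
            name = name[: -len(".git")]
        url = f"https://github.com/{owner}/{name}"
        if name and url not in repos:
            repos.append(url)
    return repos


if __name__ == "__main__":
    abstract = "Source code is available at https://github.com/owner/tool."
    print(extract_github_repos(abstract))  # ['https://github.com/owner/tool']
```

Normalizing every match to one URL form means the same repository cited as a clone URL and as a web link is counted once, which matters when computing the proportion of archived software.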


Create the environment

Clone the repository

Clone the repository:

git clone https://github.com/zhukovanadezhda/bioinformatics-software.git
cd bioinformatics-software

Setup the conda environment

Install Miniconda and mamba, then create the bioinfosoft conda environment:

mamba env create -f binder/environment.yml

Load the environment

conda activate bioinfosoft

Remark: to deactivate an active environment, use:

conda deactivate

Get API keys

The workflow requires API keys for PubMed, GitHub, and Software Heritage.
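Each service expects its key in a different place. The sketch below builds (but does not send) one authenticated request per service with the standard library's `urllib.request.Request`. It assumes the documented conventions of each API: NCBI E-utilities take the key as an `api_key` query parameter, the GitHub REST API takes a token in an `Authorization: Bearer` header, and the Software Heritage API also accepts a bearer token. The function names are hypothetical and not taken from the repository:

```python
from urllib.parse import urlencode
from urllib.request import Request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
GITHUB_API = "https://api.github.com"
SWH_API = "https://archive.softwareheritage.org/api/1"


def pubmed_search_request(term: str, api_key: str, retmax: int = 100) -> Request:
    """ESearch query on PubMed; the NCBI key goes in the query string."""
    query = urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax,
         "retmode": "json", "api_key": api_key}
    )
    return Request(f"{EUTILS}/esearch.fcgi?{query}")


def github_repo_request(owner: str, repo: str, token: str) -> Request:
    """Repository metadata (e.g. pushed_at, stargazers_count); token in a header."""
    return Request(
        f"{GITHUB_API}/repos/{owner}/{repo}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def swh_origin_request(origin_url: str, token: str) -> Request:
    """Software Heritage origin lookup; the origin URL is embedded as-is."""
    return Request(
        f"{SWH_API}/origin/{origin_url}/get/",
        headers={"Authorization": f"Bearer {token}"},
    )
```

A request is sent with `urllib.request.urlopen(req)`. For the Software Heritage origin lookup, a 404 response (raised as `urllib.error.HTTPError` with code 404) means the repository is not in the archive. Note that the key stored as `PUBMED_TOKEN` is what NCBI calls an API key.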

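Obtain a personal access token from GitHub, an API key from your NCBI account (used for PubMed), and an API token from Software Heritage.

Then create the file .env to store the keys in the following format: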

GITHUB_TOKEN=...
PUBMED_TOKEN=...
SWH_TOKEN=...
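To check the file before launching the workflow, the sketch below reads it with the standard library only. The helper names (`parse_env`, `missing_keys`, `check_env_file`) are hypothetical, and the workflow itself may load the keys differently, for example with the python-dotenv package:

```python
from pathlib import Path

# Key names expected in .env, as listed above.
REQUIRED_KEYS = ("GITHUB_TOKEN", "PUBMED_TOKEN", "SWH_TOKEN")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: str = ".env") -> dict[str, str]:
    """Load the file and stop early if any required key is missing."""
    values = parse_env(Path(path).read_text())
    missing = missing_keys(values)
    if missing:
        raise SystemExit(f"{path} is missing: {', '.join(missing)}")
    return values
```

Calling `check_env_file()` from the repository directory returns the parsed keys when all three are set, and exits with a message naming the missing ones otherwise.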

Run the analysis

Run the analysis with the Snakemake workflow:

snakemake --cores 1 --use-conda

All the results will be stored in the data folder.