EDEN: Multiscale Expected Density of Nucleotide Encoding for Enhanced DNA Sequence Classification with Hybrid Deep Learning

🧬 Abstract

Background: DNA sequences are fundamental carriers of genetic information. Accurate classification is essential for understanding gene regulation and disease mechanisms. Existing encoding methods often struggle to capture both local and long-range dependencies simultaneously.

Results: We introduce EDEN (Expected Density of Nucleotide Encoding), a unified multiscale encoding framework based on kernel density estimation. EDEN captures position-specific and context-dependent nucleotide patterns and integrates them into a hybrid deep learning architecture. Across sixteen benchmark datasets, EDEN achieves state-of-the-art performance with significantly fewer parameters than competing models.

Conclusions: EDEN provides an efficient, biologically informed representation for genomic sequence classification, demonstrating high practicality for large-scale applications.
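
The abstract describes EDEN as a multiscale encoding built on kernel density estimation. As a rough illustration of that general idea only, the sketch below smooths nucleotide occurrences with Gaussian kernels of several bandwidths, so that narrow kernels reflect local context and wide kernels reflect longer-range composition. This is not the formulation from the paper or the code in utils.py: the function names, the choice of a Gaussian kernel, and the bandwidth values are all assumptions made for the example.

```python
import math

NUCLEOTIDES = "ACGT"


def gaussian_kernel(distance: float, bandwidth: float) -> float:
    """Unnormalised Gaussian weight for a positional distance (assumed kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def density_encode(sequence: str, bandwidths=(1.0, 3.0, 9.0)):
    """Return one feature vector per sequence position.

    For every bandwidth and every nucleotide, the vector holds the
    kernel-weighted share of that nucleotide around the position.
    The bandwidths are illustrative values, not taken from the paper.
    """
    sequence = sequence.upper()
    length = len(sequence)
    encoded = []
    for center in range(length):
        features = []
        for bandwidth in bandwidths:
            # Weight of every position as seen from `center` at this scale.
            weights = [gaussian_kernel(center - pos, bandwidth) for pos in range(length)]
            total = sum(weights)
            for nucleotide in NUCLEOTIDES:
                mass = sum(w for w, base in zip(weights, sequence) if base == nucleotide)
                features.append(mass / total)
        encoded.append(features)
    return encoded


if __name__ == "__main__":
    matrix = density_encode("TATAAAGGCGC")
    # One row per position, len(bandwidths) * 4 columns per row.
    print(len(matrix), len(matrix[0]))
```

Each row has one block of four values (A, C, G, T) per bandwidth. Stacking several bandwidths is what makes the representation multiscale in this sketch.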

🛠 Project Structure

The repository is modularized for academic reproducibility and clear separation of concerns:

models.py: Implementation of the Hybrid_CNN architecture (Dual-branch CNN).
utils.py: Core engine for Multiscale EDN Encoding, data loading, and evaluation metrics.
predict.py: Command-line interface (CLI) for performing inference on datasets.
datasets/: Genomic benchmark data in CSV format. All datasets used in this study are publicly available as part of the Genome Understanding Evaluation (GUE) benchmark: https://huggingface.co/datasets/leannmlindsey/GUE
models/: Pretrained .pth weights for various genomic tasks.
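
Before running inference, it can help to sanity-check a CSV file from datasets/. The helper below is a minimal sketch using only the standard library. It assumes each CSV has a header row with a `sequence` column and a `label` column; those column names, and the file path in the usage note, are assumptions and may need adjusting to the actual files.

```python
import csv
from collections import Counter


def summarize_split(csv_file, sequence_column="sequence", label_column="label"):
    """Count rows, label frequencies, and sequence lengths in one CSV file.

    `csv_file` is an open text file (or any iterable of lines).
    The default column names are assumptions, not confirmed by the repository.
    """
    reader = csv.DictReader(csv_file)
    labels = Counter()
    lengths = Counter()
    for row in reader:
        labels[row[label_column]] += 1
        lengths[len(row[sequence_column])] += 1
    return {
        "rows": sum(labels.values()),
        "labels": dict(labels),
        "lengths": dict(lengths),
    }
```

A call might look like `summarize_split(open("datasets/human_tf0/test.csv", newline=""))`, where the file name `test.csv` is hypothetical. The `lengths` entry gives a quick view of how long the sequences in that file are.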

🚀 Installation & Usage

1. Requirements

Install the required Python packages with pip (for example, inside a conda environment):

pip install torch pandas numpy scikit-learn
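
To confirm that the installation succeeded, a short check like the one below can be run. It is a small sketch, not part of the repository; the helper name `missing_packages` is made up for this example. Note that the scikit-learn package is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "torch": "torch",
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be located."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```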

2. Running Inference (CLI)

The predict.py script allows you to run predictions for specific datasets directly from the terminal.

Basic Syntax:

python predict.py --dataset <dataset_folder_name> --limit <integer_value>

Examples:

Core Promoter Detection (TATA):

python predict.py --dataset human_prom_core_tata --limit 70

Transcription Factor Binding (TF0):

python predict.py --dataset human_tf0 --limit 100

📊 CLI Parameters Table

Use the following `--dataset` and `--limit` values for each benchmark dataset:

Dataset Category	`--dataset` value	`--limit` value
Core Promoter	`human_prom_core_all`	70
Core Promoter	`human_prom_core_notata`	70
Core Promoter	`human_prom_core_tata`	70
Promoter (300bp)	`human_prom_300_all`	300
Promoter (300bp)	`human_prom_300_notata`	300
Promoter (300bp)	`human_prom_300_tata`	300
TF Binding (Human)	`human_tf0`	100
TF Binding (Human)	`human_tf1`	100
TF Binding (Human)	`human_tf2`	100
TF Binding (Human)	`human_tf3`	100
TF Binding (Human)	`human_tf4`	100
TF Binding (Mouse)	`mouse_tf0`	100
TF Binding (Mouse)	`mouse_tf1`	100
TF Binding (Mouse)	`mouse_tf2`	100
TF Binding (Mouse)	`mouse_tf3`	100
TF Binding (Mouse)	`mouse_tf4`	100
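
To run every dataset in the table in one go, the commands can be generated from a small script. The sketch below rebuilds the sixteen `--dataset`/`--limit` pairs listed above and, by default, only prints the commands. It assumes predict.py is in the current working directory; the helper names (`build_runs`, `build_command`, `run_all`) are made up for this example and are not part of the repository.

```python
import subprocess
import sys


def build_runs():
    """Return the (dataset, limit) pairs listed in the CLI parameters table."""
    runs = []
    for variant in ("all", "notata", "tata"):
        runs.append((f"human_prom_core_{variant}", 70))
    for variant in ("all", "notata", "tata"):
        runs.append((f"human_prom_300_{variant}", 300))
    for species in ("human", "mouse"):
        for index in range(5):
            runs.append((f"{species}_tf{index}", 100))
    return runs


def build_command(dataset, limit):
    """Build the predict.py command line for one dataset."""
    return [sys.executable, "predict.py", "--dataset", dataset, "--limit", str(limit)]


def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for dataset, limit in build_runs():
        command = build_command(dataset, limit)
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` executes the commands one after another instead of only printing them.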

📧 Contact

Saman Zabihi
Email: szabihi@hotmail.com
GitHub: https://github.com/zabihis/EDEN

📚 Citation

If you use EDEN in your research, please cite our work:

Zabihi, S., Hashemi, S. & Mansoori, E. EDEN: multiscale expected density of nucleotide encoding for enhanced DNA sequence classification with hybrid deep learning. BMC Bioinformatics 27, 40 (2026). https://doi.org/10.1186/s12859-026-06367-6

Or use the BibTeX entry:

@article{zabihi2026eden,
  title={EDEN: multiscale expected density of nucleotide encoding for enhanced DNA sequence classification with hybrid deep learning},
  author={Zabihi, S. and Hashemi, S. and Mansoori, E.},
  journal={BMC Bioinformatics},
  volume={27},
  number={40},
  year={2026},
  publisher={BioMed Central},
  doi={10.1186/s12859-026-06367-6}
}
