Sequence similarity algorithms
To improve efficiency, an additional step filters the alignment windows, retaining only those whose dissimilarity rate exceeds the user-defined threshold set in the YAML parameter file. The retained windows represent the most significant sequence alignments.
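The filtering step can be illustrated with a minimal sketch. The function name `filter_windows`, the window labels, and the dictionary layout are hypothetical and chosen only for illustration; they are not taken from the project's code. The sketch assumes that "surpassing" means strictly greater than the threshold.

```python
def filter_windows(window_rates, threshold):
    """Keep only windows whose dissimilarity rate is strictly above the threshold.

    window_rates: dict mapping a window label (hypothetical, e.g. "10_20")
                  to its dissimilarity rate in [0, 1].
    threshold:    user-defined cutoff, assumed to be read from the YAML
                  parameter file elsewhere.
    """
    return {
        window: rate
        for window, rate in window_rates.items()
        if rate > threshold
    }


# Illustrative values only, not real results.
rates = {"0_10": 0.05, "10_20": 0.42, "20_30": 0.30}
kept = filter_windows(rates, threshold=0.30)
print(kept)
```

With these made-up values, only the window labelled `"10_20"` is retained, since the rate of `"20_30"` equals the threshold and does not exceed it.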
Edit-based algorithms, often referred to as distance-based algorithms, quantify the minimum number of single-character operations (insertions, deletions, or substitutions) needed to convert one sequence into another. The more operations required, the greater the distance and the lower the similarity between the sequences. In bioinformatics, these algorithms are central to phylogeny and phylogeography, where they are used to compare genetic sequences and infer their evolutionary relationships. They serve as a fundamental metric for many sequence similarity techniques and are widely applied to DNA sequence alignment, phylogenetic analysis, and molecular evolution studies.
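As an illustration of counting insertions, deletions, and substitutions, here is a minimal dynamic-programming sketch of the classic Levenshtein-style edit distance. It is a generic textbook formulation with unit cost per operation, not necessarily the implementation used in this project, and the function name `edit_distance` is chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b` (unit costs)."""
    # prev[j] holds the distance between the first i-1 characters of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("ACGT", "AGT"))        # one deletion
print(edit_distance("kitten", "sitting"))  # classic textbook example
```

Keeping only two rows of the table keeps memory linear in the length of the second sequence, which matters for long genetic sequences.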
The Hamming distance, denoted as
The Levenshtein distance, denoted as
The Damerau-Levenshtein distance, denoted as
Jaro similarity, denoted as
Jaccard similarity, denoted as
Sørensen-Dice similarity, also known as the Dice coefficient, is another measure of the similarity between two sets. It is the ratio of twice the size of the intersection to the sum of the sizes of the two sets.
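The ratio described above can be sketched as follows. How a sequence is turned into a set is not specified here, so this example assumes sets of overlapping 2-mers (bigrams); the helper names `kmers` and `dice` are hypothetical, and returning 1.0 for two empty sets is a convention adopted for this sketch.

```python
def kmers(seq: str, k: int = 2) -> set:
    """Set of overlapping substrings of length k (an assumed tokenization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def dice(a: set, b: set) -> float:
    """Sørensen-Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention for this sketch: two empty sets are identical
    return 2 * len(a & b) / (len(a) + len(b))


x = kmers("ACGT")  # {"AC", "CG", "GT"}
y = kmers("ACGA")  # {"AC", "CG", "GA"}
score = dice(x, y)
print(round(score, 4))
```

Here the two sets share two of their three bigrams, so the coefficient is 2 × 2 / (3 + 3) = 2/3.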
Jaro-Winkler similarity, denoted as
Smith–Waterman similarity, denoted as
Please email us at: Nadia.Tahiri@USherbrooke.ca for any questions or feedback.