No mean distance calculated between samples #37
Dear @AndreaAguadoM, can you please provide an example?
Make sure you specify the -L argument to compare sequences that overlap by fewer than the default 100 nucleotides.
Best,
Sergei
Dear Sergei
I apologize for the delayed response; unfortunately, your previous message
got lost in the shuffle of incoming emails, and I mistakenly thought you
hadn't replied.
As requested, here is an example:
{
"Actual comparisons performed" :0,
"Comparisons accounting for copy numbers " :0,
"Total comparisons possible" : 1,
"Links found" : 0,
"Maximum distance" : 0,
"Sequences" : 2,
"Mean distance" : -nan,
The primary issue I've encountered relates to the -l parameter threshold
setting. I've set the threshold to the minimum value of 1. When comparing
these two sequences, it appears there is no overlap at any position.
Consequently, I still obtain this "nan" result.
On the other hand, in my experiments, I used tn93 to compare two
sequences that differ for the most part (except for one common
nucleotide):
sample1
NNNNNNNTGGCGAVATGTCTAGTAGCCAGCTGTGATAAATGTCAGCAAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCCAGGAATATGGCAACTAGATTGTACACACTTAGAAGACAAAATTATCCTGGTAGCAGTTCATGTAGCCAGTGGATATATAGAAGCAGAAGTTATTCCAGCAGAAACAGGGCAGGAAACAGCATACTTCATCCTAAAGTTAGCAGGAAGATGGCCAGTAAAAACAATACATACAGACAATGGTAGAAATTTTACCAGTAGTGCTGTGAAGGCAGCCTGTTGGTGGGCAGGGATCCAGCAGGAATTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATGAATAAAGAATTAAAGAAAATCATAGGACAAGTAAGAGATCAAGCTGAACATCTTAAGACAGCAGTACAAATGGCGGTGTTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGAGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTATAAAAATTCAAAATTTCCGGGTTTATTACAGGGACAGCAGAGACCCAATTTGGAAAGGACCAGCAAAGCTGCTCTGGAAAGGTGAAGGGGCAGTAGTCATACAAGATAATAGTGAAATAAAAGTAGTGCCAAGAAGAAAAGCAAAGATCATTAGGGATTATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGATTAGAACATGGAAGGCAAGTAGACNNNNNN
sample2
AAAAAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAAAAA
Nevertheless, I obtain the following distance as a result:
$tn93 -t 1 -l 1 -o tn93_${sample1}_${sample2}.txt ${sample1}_${sample2}_alignment.fasta
"Actual comparisons performed" :1,
"Comparisons accounting for copy numbers " :1,
"Total comparisons possible" : 1,
"Links found" : 1,
"Maximum distance" : 0.0025491,
"Sequences" : 2,
"Mean distance" : 0.0025491,
The resulting value of 0.0025 is remarkably low. I am puzzled by this
outcome, given that the sequences differ at almost every nucleotide
position.
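One way to check how much of an aligned pair is actually comparable is to count the columns where both sequences carry a resolved nucleotide, and how many of those columns differ. The sketch below is a diagnostic only: the helper name `overlap_stats` is hypothetical, and treating only A/C/G/T as resolved is an assumption that may not match how tn93 counts overlap or handles ambiguity codes.

```python
RESOLVED = set("ACGT")  # assumption: everything else (N, V, gaps, ...) is ignored


def overlap_stats(seq_a, seq_b):
    """Return (shared, mismatches) for two aligned sequences.

    shared     -- columns where both sequences have A, C, G or T
    mismatches -- shared columns where the two nucleotides differ
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    shared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in RESOLVED and b in RESOLVED:
            shared += 1
            if a != b:
                mismatches += 1
    return shared, mismatches


# Toy pair: only the last-but-one two columns (G/G and T/T) are jointly resolved.
toy_stats = overlap_stats("NNACGTN", "AANNGTA")
print(toy_stats)
```

Running this on the real alignment shows how many positions a distance estimate could be based on, which helps when judging whether a very small distance is meaningful.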
Your insights on this matter would be really appreciated. Thank you so much
in advance.
Best regards!
On Tue, 5 Dec 2023 at 14:14, Sergei Pond (***@***.***)
wrote:
Dear @AndreaAguadoM <https://github.com/AndreaAguadoM>,
Can you please provide an example? nan will only arise if *no*
comparisons were performed, i.e. something like this occurs (Actual
comparisons performed = 0).
{
"Actual comparisons performed" :0,
"Comparisons accounting for copy numbers " :0,
"Total comparisons possible" : 10,
"Links found" : 0,
"Maximum distance" : 0,
"Sequences" : 5,
"Mean distance" : nan
...
Make sure you specify the -L argument to compare sequences that overlap
by fewer than the default 100 nucleotides as well (which is the case for
the example above).
Best,
Sergei
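The nan in the summary follows from simple arithmetic: with zero comparisons performed, the mean is 0/0. A minimal Python sketch of that bookkeeping, assuming a hypothetical `summarize` helper (this is not tn93's code; only the field names are borrowed from the output above):

```python
import math


def summarize(distances, total_possible):
    """Build a summary loosely mirroring the JSON fields shown above.

    `distances` holds one value per pair that was actually compared;
    pairs that were skipped (e.g. for insufficient overlap) are absent.
    """
    performed = len(distances)
    return {
        "Actual comparisons performed": performed,
        "Total comparisons possible": total_possible,
        "Maximum distance": max(distances, default=0),
        # With no comparisons the mean is 0/0, reported here as NaN.
        "Mean distance": sum(distances) / performed if performed else math.nan,
    }


no_pairs = summarize([], total_possible=10)
one_pair = summarize([0.0025491], total_possible=1)
print(no_pairs["Mean distance"], one_pair["Mean distance"])
```

So a nan mean is a sign that every pair was filtered out before any distance was computed, not that a distance calculation itself failed.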
Dear @AndreaAguadoM, In default run mode, [...] If you want to treat [...] Best,
Thank you so much! I've noticed that when using this -a setting (-a average), some distance calculations return 1000 as the mean distance. As far as I know, the Tamura-Nei distance ranges between 0 and 2. Why am I getting these results? Thanks again in advance!
Dear @AndreaAguadoM,
It requires some serious data pathology, but it could occur. In fact, Line 756 in 728bb98
Best,
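For intuition on how a log-corrected distance can leave its usual range: such corrections take the logarithm of a quantity that shrinks as sequences diverge, and the result is undefined once that quantity reaches zero or goes negative. The sketch below uses the simpler Jukes-Cantor correction as a stand-in, not the Tamura-Nei formula, and the `UNDEFINED_DISTANCE = 1000.0` fallback is an assumption chosen to mirror the value reported above; the thread does not confirm how tn93 arrives at it.

```python
import math

UNDEFINED_DISTANCE = 1000.0  # hypothetical sentinel, for illustration only


def jc69_distance(p):
    """Jukes-Cantor distance for a proportion p of differing sites.

    d = -3/4 * ln(1 - 4p/3), which is only defined for p < 0.75.
    """
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        # Saturated pair: the logarithm has no real value here.
        return UNDEFINED_DISTANCE
    d = -0.75 * math.log(arg)
    return d if d > 0.0 else 0.0  # avoid returning -0.0 for identical pairs


for p in (0.0, 0.1, 0.75, 0.9):
    print(p, jc69_distance(p))
```

With averaged ambiguities, a pair dominated by ambiguous characters can accumulate a large fraction of expected mismatches, which is one way such a saturated case could arise.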
Okay, thank you very much! Your response has been very helpful.
Hello!
My name is Andrea and I am a bioinformatician from Spain. I have been using tn93 for a while and have included it in some of the pipelines I am developing to analyze HIV sequences more effectively. I am trying to generate a distance matrix for a large number of HIV samples, and while analyzing my results I found that some sequence pairs do not seem to have been assigned a distance (the reported mean distance is -nan). How is this possible when the -t parameter in my pipeline is set to 1?
Thanks in advance!