
No mean distance calculated between samples #37

Open
AndreaAguadoM opened this issue Dec 5, 2023 · 6 comments

Comments

@AndreaAguadoM

Hello!
My name is Andrea and I am a bioinformatician from Spain. I have been using tn93 for a while and have included it in some of the pipelines I am developing to analyze HIV sequences more effectively. I am trying to generate a distance matrix for a large number of HIV samples and, while analyzing my results, I found that some sequence pairs do not seem to have a distance assigned (the mean distance is reported as -nan). How is this possible if the t parameter in my pipeline is set to 1?

Thanks in advance!

@spond
Member

spond commented Dec 5, 2023

Dear @AndreaAguadoM,

Can you please provide an example? nan will only arise if no comparisons were performed, i.e. something like this occurs (Actual comparisons performed = 0).

{
	"Actual comparisons performed" :0,
	"Comparisons accounting for copy numbers " :0,
	"Total comparisons possible" : 10,
	"Links found" : 0,
	"Maximum distance" : 0,
	"Sequences" : 5,
	"Mean distance" : nan
...

Make sure you specify the -L argument to compare sequences that overlap by fewer than the default 100 nucleotides as well (which is the case for the example above).

Best,
Sergei
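
The point about zero comparisons can be illustrated with a minimal sketch. It is not tn93's code: the function names, the set of "missing" characters, and the use of a plain proportion of mismatches instead of the TN93 formula are all assumptions made for illustration. It only shows how a minimum-overlap filter can leave zero comparisons, so that the mean has nothing to average.

```python
import math
from itertools import combinations

# Hypothetical: characters treated as "no data" when counting overlap.
GAP_OR_MISSING = set("-?")


def overlap_and_mismatches(a, b):
    """Count sites where both sequences have data, and how many of those differ."""
    overlap = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x in GAP_OR_MISSING or y in GAP_OR_MISSING:
            continue
        overlap += 1
        if x != y:
            mismatches += 1
    return overlap, mismatches


def summarize(seqs, min_overlap=100):
    """Mean pairwise p-distance (a stand-in for TN93) over pairs that overlap enough."""
    possible = 0
    performed = 0
    total = 0.0
    for a, b in combinations(seqs, 2):
        possible += 1
        overlap, mismatches = overlap_and_mismatches(a, b)
        if overlap == 0 or overlap < min_overlap:
            continue  # pair skipped: too little shared data
        performed += 1
        total += mismatches / overlap
    # With no comparisons performed there is nothing to average.
    mean = total / performed if performed else math.nan
    return {"possible": possible, "performed": performed, "mean": mean}


# Five identical 50-nt sequences: 10 possible pairs, none reaching 100 nt of overlap.
short_reads = ["ACGTACGTAC" * 5] * 5
print(summarize(short_reads))                  # every pair is filtered out
print(summarize(short_reads, min_overlap=10))  # every pair is compared
```

With the stricter threshold every pair is skipped and the mean is nan; lowering the threshold lets all ten pairs through.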

@AndreaAguadoM
Author

AndreaAguadoM commented Dec 11, 2023 via email

@spond
Member

spond commented Dec 11, 2023

Dear @AndreaAguadoM,

In the default run mode, N means "match everything". A sequence that contains N at a position will match any character at that position (distance 0).

If you want to treat N differently, you should adjust the -a command line argument, for example -a average.

Best,
Sergei
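
As a rough illustration of the difference between "match everything" and averaging, here is a small sketch of a per-site mismatch score under the two policies. It is an assumption-laden toy, not tn93's implementation: the mode names are borrowed from the thread, the IUPAC table is deliberately partial, and tn93's actual handling of ambiguities may differ in detail.

```python
# Partial IUPAC table (illustrative): each code maps to the bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def site_mismatch(x, y, mode):
    """Mismatch contribution of one aligned site under a given ambiguity policy."""
    xs, ys = IUPAC[x], IUPAC[y]
    if mode == "resolve":
        # "Match everything": no mismatch if the two codes share any resolution.
        return 0.0 if set(xs) & set(ys) else 1.0
    if mode == "average":
        # Treat every resolution as equally likely and average the mismatches.
        pairs = [(p, q) for p in xs for q in ys]
        return sum(p != q for p, q in pairs) / len(pairs)
    raise ValueError(f"unknown mode: {mode}")


print(site_mismatch("N", "A", "resolve"))  # N is allowed to be A
print(site_mismatch("N", "A", "average"))  # three of four resolutions differ from A
```

Under this toy model a column of N contributes nothing in the first mode but adds a sizeable expected mismatch in the second, which is why N-rich sequences can look much more distant once averaging is switched on.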

@AndreaAguadoM
Author

Thank you so much! I have noticed that with this parameter setting (-a average), I obtain 1000 as the mean distance in some distance calculations. As far as I know, the Tamura-Nei distance ranges between 0 and 2. Why am I obtaining these results? Thanks in advance again!

@spond
Member

spond commented Dec 14, 2023

Dear @AndreaAguadoM,

1000 is the upper bound that tn93 reports for all distances. Most genetic distances, including the TN93 distance, can range from 0 to ∞.

Reaching such a large value requires some serious data pathology, but it could occur. In fact, tn93 will "downgrade" to a K2P distance if the input data do not contain one of the four characters. That's because TN93 may become undefined in this case.

if (useK2P) {

Best,
Sergei
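
To see why a corrected distance is unbounded and why a cap is useful, consider the textbook K2P (Kimura two-parameter) formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The sketch below is only an illustration: the function name and the choice to return the cap when a logarithm is undefined are assumptions, and the value 1000 is taken from the comment above rather than from tn93's source.

```python
import math

# Cap taken from the thread; how tn93 applies it internally is not shown here.
MAX_REPORTED = 1000.0


def k2p_distance(p_transitions, q_transversions, cap=MAX_REPORTED):
    """Textbook K2P distance, capped when the correction blows up."""
    a = 1.0 - 2.0 * p_transitions - q_transversions
    b = 1.0 - 2.0 * q_transversions
    if a <= 0.0 or b <= 0.0:
        # The logarithm is undefined: the sequences are saturated with changes.
        return cap
    d = -0.5 * math.log(a) - 0.25 * math.log(b)
    return min(d, cap)


print(k2p_distance(0.0, 0.0))    # identical sequences
print(k2p_distance(0.10, 0.05))  # modest divergence
print(k2p_distance(0.40, 0.40))  # saturated: the formula has no finite answer
```

As the arguments of the logarithms approach zero the distance grows without bound, so a heavily mismatched pair (for example, one dominated by averaged ambiguities) can end up at the cap.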

@AndreaAguadoM
Author

Okay, thank you very much! Your response has been very helpful.
Best,
Andrea.
