Results issue and running error #1668

Closed
fatima-akhtar113 opened this issue Nov 27, 2023 · 20 comments

Comments

@fatima-akhtar113

I cannot run my orthologue file in MEME. There are 11 sequences, and it runs fine in FEL and SLAC.
Also, can we say a gene is under positive selection if there is selection on one or two codons? How do we interpret Datamonkey results?

@spond
Member

spond commented Nov 27, 2023

Dear @fatima-akhtar113,

  1. I am afraid I can't help you unless you provide more information about the MEME analysis. If you ran it in Datamonkey, please include the URL for the results page.
  2. No, you cannot conclude that a gene is under selection if one or two sites are under selection. See https://academic.oup.com/mbe/article/32/5/1365/1134918. Use BUSTED to look for gene-level selection.

Best,
Sergei

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 27, 2023 via email

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 27, 2023 via email

@spond
Member

spond commented Nov 27, 2023

Dear @fatima-akhtar113,

As I said previously, if the goal is to identify selection at the level of a gene, you should use BUSTED. However, the sequences you submitted to Datamonkey have not been properly aligned. Datamonkey will "pad" sequences of unequal length with `?` at the end, and this is what happened here (https://www.datamonkey.org/meme/656497ed1fdac30a835a1cd3/fasta).
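Before uploading, it can help to confirm that the file really is an alignment. Below is a minimal Python sketch that assumes a plain FASTA file. The helper names `read_fasta` and `alignment_report` are hypothetical and are not part of HyPhy or Datamonkey. The sketch reports whether all sequences have the same length and which ones end in `?` padding, for example in the FASTA that Datamonkey echoes back. Equal lengths are necessary but not sufficient, because sequences of equal length can still be unaligned.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; multi-line sequences are joined."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_report(records: Dict[str, str]) -> dict:
    """Flag inputs whose sequences differ in length (i.e. not an alignment)
    and list sequences that end in '?' padding."""
    lengths = {n: len(s) for n, s in records.items()}
    return {
        "equal_lengths": len(set(lengths.values())) <= 1,
        "lengths": lengths,
        "trailing_question_marks": sorted(n for n, s in records.items() if s.endswith("?")),
    }


fasta = ">human\nATGAAACCC\n>mouse\nATGAAA\n"
print(alignment_report(read_fasta(fasta)))
```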

Datamonkey requires codon-aware multiple sequence alignments. If you are not familiar with how to obtain those, you may want to take a look elsewhere, e.g. https://github.com/veg/hyphy-analyses/blob/master/codon-msa/README.md and #1477
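A codon-aware alignment keeps every codon intact: each aligned triplet is either fully gapped (`---`) or contains no gaps at all. The sketch below checks that property for one aligned sequence and also looks for stop codons. It assumes the standard genetic code's stop codons (TAA, TAG, TGA) and treats any stop codon as a problem. It is an illustrative check only, not the codon-msa workflow itself, and `codon_alignment_problems` is a hypothetical helper name.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def codon_alignment_problems(name: str, aligned_seq: str) -> List[str]:
    """List codon-level problems in one aligned nucleotide sequence."""
    seq = aligned_seq.upper()
    if len(seq) % 3:
        return [f"{name}: aligned length {len(seq)} is not a multiple of 3"]
    problems = []
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        position = start // 3 + 1  # 1-based codon index
        if "-" in codon and codon != "---":
            # a gap inside a codon breaks it (and usually the reading frame)
            problems.append(f"{name}: codon {position} ({codon}) is split by a gap")
        elif codon in STOP_CODONS:
            problems.append(f"{name}: stop codon {codon} at codon {position}")
    return problems


print(codon_alignment_problems("human", "ATGAAA---CCC"))  # []
print(codon_alignment_problems("mouse", "ATG-AACCCTGA"))
```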

I attach an aligned version of your data (using the codon-msa workflow I linked to above).

If you run it through BUSTED in HyPhy, like so:

```
hyphy busted --alignment /Users/sergei/Desktop/seqs.msa --tree neighbor-joining --starting-points 5
```

You will get a significant result for positive selection (p ~ 0), but a very odd-looking ω distribution.


A dN/dS of 3000 is indicative of some pathologies with the data or the model. For example, here's one site that shows several multi-nucleotide substitutions.


If you then run BUSTED with support for multiple hits (see https://academic.oup.com/mbe/article/40/7/msad150/7217158):

```
hyphy busted --alignment /Users/sergei/Desktop/seqs.msa.gz --starting-points 5 --tree neighbor-joining --multiple-hits Double+Triple
```

you obtain a very odd result:

### Partition-level rates for multiple-hit substitutions
* rate at which 2 nucleotides are changed instantly within a single codon :   1.9304
* Corresponding fraction of substitutions : 45.463%
* rate at which 3 nucleotides are changed instantly within a single codon :   1.9649
* Corresponding fraction of substitutions :  5.696%

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.967     |    0.000    |       Not supported by data       |
|        Negative selection         |     0.999     |    0.000    |       Not supported by data       |
|      Diversifying selection       |    244.639    |   100.000   |                                   |

Having more than 50% of the substitutions occur due to multiple hits is very odd.
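Adding the two multiple-hit fractions reported in the output above shows where the "more than 50%" figure comes from:

```math
\underbrace{45.463\%}_{\text{double hits}} + \underbrace{5.696\%}_{\text{triple hits}} = 51.159\% > 50\%
```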

May I ask where these sequences come from (unless they are simulated)?

Best,
Sergei

seqs.msa.gz

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 28, 2023 via email

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 28, 2023 via email

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 28, 2023 via email

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 28, 2023 via email

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 28, 2023 via email

@spond
Member

spond commented Nov 28, 2023

Dear @fatima-akhtar113,

I am not sure what you mean by "reverse-translate". There is no one-to-one way to reverse-translate a protein sequence, because the genetic code is redundant: for example, Serine can be encoded by any of 6 codons. You need to find the underlying CDS sequences for each corresponding species.
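To make the redundancy point concrete, here is a minimal Python sketch, not part of HyPhy or Datamonkey. It builds the standard genetic code (NCBI translation table 1, assumed here) and counts how many distinct coding sequences could encode a given protein. Even a four-residue peptide has hundreds of candidate back-translations, which is why a CDS cannot be recovered from the protein sequence alone. `count_back_translations` is a hypothetical helper name.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Number of synonymous codons per amino acid ("*" = stop).
SYNONYMOUS_CODONS = Counter(CODON_TABLE.values())


def count_back_translations(protein: str) -> int:
    """Number of distinct CDS (excluding the stop codon) that encode `protein`."""
    total = 1
    for residue in protein.upper():
        if residue == "*" or residue not in SYNONYMOUS_CODONS:
            raise ValueError(f"unexpected residue: {residue!r}")
        total *= SYNONYMOUS_CODONS[residue]
    return total


print(SYNONYMOUS_CODONS["S"])          # 6 codons for Serine
print(count_back_translations("MSLR"))  # 1 * 6 * 6 * 6 = 216
```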

If you take your protein sequence for homo and run blastp on it, you get a sensible result.


However, if you take the nucleotide sequence you provided and run blastn on it, you get complete nonsense.


This nucleotide sequence does not exist in nature.

You should instead pull out the corresponding CDS for each sequence, for example https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS8793.1 will give you the human sequences.
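Once you have pulled the CDS for each species, a quick sanity check is to translate it and compare the result with the protein you started from. The sketch below uses hypothetical helpers (`translate`, `cds_matches_protein`) that are not NCBI or HyPhy tools. It assumes the standard genetic code and tolerates a single terminal stop codon.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a CDS codon by codon; ambiguous codons (e.g. containing N) become 'X'."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError(f"CDS length {len(cds)} is not a multiple of 3")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))


def cds_matches_protein(cds: str, protein: str) -> bool:
    """True if `cds` translates exactly to `protein` (one terminal stop allowed)."""
    translated = translate(cds)
    if translated.endswith("*"):
        translated = translated[:-1]
    return translated == protein.upper().rstrip("*")


print(translate("ATGAGCTGA"))                  # MS*
print(cds_matches_protein("ATGAGCTGA", "MS"))  # True
print(cds_matches_protein("ATGCCC", "MS"))     # False (CCC encodes Proline)
```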

Best,
Sergei

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 28, 2023 via email

@fatima-akhtar113
Author

fatima-akhtar113 commented Nov 29, 2023 via email

@spond
Member

spond commented Nov 29, 2023

Dear @fatima-akhtar113,

How you collect your data is really up to you, and depends on the problem at hand.
But based on what you describe, this seems sensible. The database will have underlying CDS sequences for your proteins.

Best,
Sergei

@fatima-akhtar113
Author

fatima-akhtar113 commented Jan 4, 2024 via email

@fatima-akhtar113
Author

fatima-akhtar113 commented Jan 4, 2024 via email

@spond
Member

spond commented Jan 4, 2024

Dear @fatima-akhtar113,

I am afraid I don't fully understand what you are asking. If you are including attachments, you should do it via a web-browser (not e-mail), because otherwise the attachments will be stripped out.

Best,
Sergei

@fatima-akhtar113
Author

fatima-akhtar113 commented Jan 5, 2024 via email

@spond
Member

spond commented Jan 5, 2024

Dear @fatima-akhtar113,

Yes, based on your alignment, aBSREL obtained a p-value of ~0.0 on the human branch (also see https://observablehq.com/@spond/absrel?url=https://www.datamonkey.org/absrel/65978ad3ba6f2072cc42906e/results for a newer visualization).

However, I would encourage you to check the alignment for robustness. Some of the "hotspots" for positive selection signal, e.g. codons around position 1150, seem to correspond to a gappy region that may have been misaligned.

See https://www.ebi.ac.uk/Tools/services/web/toolresult.ebi?jobId=mview-I20240105-132233-0645-54922136-p1m
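As a rough screen to complement visual inspection, the Python sketch below lists codon sites where a large fraction of sequences are gapped. The 50% threshold and the helper name `gappy_codon_sites` are arbitrary choices for illustration, not Datamonkey or aBSREL settings. A gappy column is only a hint of possible misalignment, not proof. Flagged sites that overlap selection hotspots (such as the codons around position 1150 here) deserve a closer look in an alignment viewer before the signal is trusted.

```python
from typing import Dict, List


def gappy_codon_sites(alignment: Dict[str, str], threshold: float = 0.5) -> List[int]:
    """Return 1-based codon positions where at least `threshold` of the
    sequences have a gap character somewhere in that codon."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs) or length % 3:
        raise ValueError("expected equal-length sequences with a length divisible by 3")
    flagged = []
    for start in range(0, length, 3):
        gapped = sum(1 for s in seqs if "-" in s[start:start + 3])
        if gapped / len(seqs) >= threshold:
            flagged.append(start // 3 + 1)
    return flagged


aln = {
    "human": "ATGAAA---CCC",
    "chimp": "ATGAAA---CCC",
    "mouse": "ATGAAAGGGCCA",
}
print(gappy_codon_sites(aln))  # [3]
```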

Best,
Sergei


github-actions bot commented Mar 6, 2024

Stale issue message

@fatima-akhtar113
Author

fatima-akhtar113 commented Jun 4, 2024 via email
