Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RELAX with branch specific equilibrium frequencies #682

Closed
saurabh-mk opened this issue Oct 30, 2017 · 11 comments
Closed

RELAX with branch specific equilibrium frequencies #682

saurabh-mk opened this issue Oct 30, 2017 · 11 comments

Comments

@saurabh-mk
Copy link

Hello,

Sorry to revisit an old issue which remained unresolved. I hope it was ok to start a new one.

  1. I want to test for relaxed selection on some branches while simultaneously assigning separate equilibrium nucleotide frequencies to the same branches.
  2. If possible, I would like to do this for two branches in the same tree i.e. test for relaxation with different equilibrium frequencies on multiple sets of "test" branches (different from background).

Can you guide me on how to accomplish this? What I have in mind is similar to Wertheim et al (MBE 2015) in the case of endosymbionts.

Thanks!
Saurabh Mahajan

@spond
Copy link
Member

spond commented Nov 6, 2017

Dear @saurabh-mk,

You can't assign equilibrium frequencies to branches; you can parameterize different substitution models that will induce different frequencies, and assign them to different branches.

Where are you getting the frequencies for the branches that are being tested? Typically these frequencies are estimated by counts from the data.

In order to implement this, you will need to edit the file RELAX.bf which is located inside TemplateBatchFiles/SelectionAnalyses.

Around line 200 in that file you will find the following statement, which defines the substitution model for the test branches.
RELAX.test.model = relax.io.define_a_bsrel_model (RELAX.test, relax.codon_frequencies, estimators.GetGlobalMLE (relax.mg_results, RELAX.test) ,1);

The equilibrium frequencies are passed in the relax.codon_frequencies which is itself defined around line 100

relax.codon_frequencies = frequencies._aux.CF3x4(frequencies._aux.empirical.collect_data ("RELAX.codon_filter",3,1,1), models.DNA.alphabet, relax.codon_lists["sense"], relax.codon_lists["stop"]);

You could define a different set of frequencies by creating another variable

relax.codon_frequencies_user = frequencies._aux.CF3x4(freq_matrix, models.DNA.alphabet, relax.codon_lists["sense"], relax.codon_lists["stop"]);

where freq_matrix is the 4x3 matrix of positional nucleotide frequencies (A,C,G,T in rows, codon position 1, codon position 2, codon position 3 in columns). This will apply the CF3x4 estimator to the set of nucleotide frequencies, correcting for unobservable stop codons.

Then pass relax.codon_frequencies_user as an argument to relax.io.define_a_bsrel_model to create a model with different base frequencies.

Another issue that will occur at this point is that if you define a tree where models assigned to different branches have different equilibrium frequencies, HyPhy will require that you tell it which frequencies to use at the root of the tree (since the model is no longer in equilibrium).

You can do so by changing the line

LikelihoodFunction relax.LF = (RELAX.codon_filter, RELAX.tree);

to

relax.root_freqs = relax.codon_frequencies_user [terms.codons]); // or relax.root_freqs = relax.codon_frequencies [terms.codons]) LikelihoodFunction3 relax.LF = (RELAX.codon_filter, RELAX.tree, relax.root_freqs);

Regarding your question 2: yes, something like this can be implemented, but it requires considerably more work if you start with RELAX.bf, because you actually have to redefine the entire testing procedure, seeing how you now have two relaxation parameters (K_1 and K_2) and could runs tests like

  1. K_1 ≠ K_2
  2. K_1 ≠ 1
  3. K_2 ≠ 1

...

Best,
Sergei

@saurabh-mk
Copy link
Author

Hi,

I am sorry to report that I couldn't find these (or even similar) lines in the RELAX.bf file. I tried the current master as well as 2.2.6 and 2.3.4 versions.

I found that lines similar to what you suggest already appear RELAX-Groups.bf, but I am not sure if they work the same way.

For example, this is line 99 in RELAX-Groups.bf

relax.codon_frequencies_test = frequencies._aux.CF3x4(frequencies._aux.empirical.collect_data ("RELAX.codon_filter.test",3,1,1), models.DNA.alphabet, relax.codon_lists["sense"], relax.codon_lists["stop"]);

What am I missing?

Saurabh

@saurabh-mk
Copy link
Author

Hi,
apologies for persisting despite the in progress label. And thanks for taking this up.

Would it be possible to say when this issue is likely to be answered? Meanwhile, I am wondering if you could make available the scripts use for non-homogeneous models in the RELAX paper (referred on pg 826, right column, first para of the paper)?

Saurabh

@spond
Copy link
Member

spond commented Feb 6, 2018

Dear @saurabh-mk,

Apologies for the delay. Unfortunately, the original RELAX implementation from the paper is not compatible with the current release of HyPhy (stale code).

I am attaching a version of RELAX (RELAX-het.txt) which allows you to specify a separate set of frequencies on test branches.

Please place it inside res/TemplateBatchFiles/SelectionAnalyses and run in from the command line like so (assuming you are doing it from the HyPhy source directory where binaries have been compiled)

./HYPHYMP LIBPATH=`pwd`/res/ res/TemplateBatchFiles/SelectionAnalyses/RELAX-het.txt

Currently, the analysis is set up to use the frequencies collected from sequences included in the test set, but you can look around line 300 to see other examples.

Best,
Sergei

RELAX-het.txt

@saurabh-mk
Copy link
Author

saurabh-mk commented Feb 7, 2018

Hello!

Thanks a lot for taking time to do this. Unfortunately, I get this error, after I run it as suggested (using the beta version)-

"Expression appears to be incomplete/syntax error. Scope: 1, paretheses depth: 0, matrix scope: 0.
for(relax.i=1;relax.i<relax.rate_classes;relax.i+=1){parameters.SetRange(model.generic.GetGlobalParameter(relax.reference.bsrel_model,terms.AddCategory(terms.parameters.omega_ratio,relax.i)),terms.range01);"

Saurabh

@spond
Copy link
Member

spond commented Feb 7, 2018

Dear @saurabh-mk,

Hmm, can you check out the master branch (v2.3.11) and confirm the error? I am not able to replicate it.

Best,
Sergei

@saurabh-mk
Copy link
Author

Cant seem to get the 2.3.11 version. This

git clone --branch master --single-branch https://github.com/veg/hyphy.git

gives me this- HYPHY 2.3.1020180207beta(MP) for Linux on x86_64

@spond
Copy link
Member

spond commented Feb 7, 2018

Dear @saurabh-mk,

2.3.10 should also work.

Best,
Sergei

@saurabh-mk
Copy link
Author

saurabh-mk commented Feb 7, 2018

Thanks for your responses. Really appreciate it.

I get the same error with all the versions-

"Expression appears to be incomplete/syntax error. Scope: 1, paretheses depth: 0, matrix scope: 0.
for(relax.i=1;relax.i<relax.rate_classes;relax.i+=1){parameters.SetRange(model.generic.GetGlobalParameter(relax.reference.bsrel_model,terms.AddCategory(terms.parameters.omega_ratio,relax.i)),terms.range01);"

Also on a different machine. I am at a loss how to trouble shoot/debug further. Can I provide any more details which may help?

Saurabh

@spond
Copy link
Member

spond commented Feb 7, 2018

Dear @saurabh-mk,

Many apologies for this. I made a syntax error when commenting out example usage. See the attached file.

Best,
Sergei
RELAX-het.txt

@saurabh-mk
Copy link
Author

saurabh-mk commented Feb 8, 2018

Hi,

Thanks so much! Its now running, but it gives further errors during likelihood calculations

### Fitting the alternative model to test K != 1
Error:
Internal error in ComputeBranchCache (branch UCkHTCla.tree_id_0.GCF_000007725_1_ASM772v1 ) reversible model cached likelihood = -78130.1, directly computed likelihood = -77888.2. This is most likely because a non-reversible model was incorrectly auto-detected (or specified by the model file in environment variables).
 
Function call stack
1 : Optimize storing into, mles, the following likelihood function:likelihoodFunction ; 
-------
2 : relax.alternative_model.fit=estimators.FitLF(relax.filter_names,relax.trees,{"0":relax.model_map},relax.general_descriptive.fit,relax.model_object_map,{terms.run_options.retain_lf_object:TRUE})
-------

I have tried both analysis types- All and Minimal. They give the same errors.

The likelihoods and the global estimates under the GTR model and the full codon models is exactly the same as the homogenous versions. So, I assume at that point its showing the output of fitting the homogenous model.

I will try and dig into the RELAX-het.txt file to understand more. But do you have any idea of why the likelihood errors?

I have attached the alignment file and the annotated tree.

annotated_phylogeny.txt
combined_alignment.txt

Saurabh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants