Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When does RELAX collapse rate classes? #1708

Closed
NatJWalker-Hale opened this issue May 13, 2024 · 4 comments
Closed

When does RELAX collapse rate classes? #1708

NatJWalker-Hale opened this issue May 13, 2024 · 4 comments

Comments

@NatJWalker-Hale
Copy link

Hi guys,

As always, thanks so much for developing and supporting such useful methods. I just wanted to ask when and how RELAX chooses to collapse rate classes? From RELAX.bf I can see if (relax.inferred_ge_distribution[0][1] < 1e-5 || relax.inferred_ge_distribution[1][1] < 1e-5) - is this checking the inferred dN/dS of a rate class, or the inferred proportion of sites?

Thanks very much for the help!

Nat

@spond
Copy link
Member

spond commented May 13, 2024

Dear @NatJWalker-Hale,

RELAX checks the proportion of sites in ω categories 1 or 2 but just for the general exploratory model. Are you seeing this happen in your data?

Best,
Sergei

@NatJWalker-Hale
Copy link
Author

Dear @spond,

Thanks very much for the reply! I think that I'm understanding the output a little bit better now. I also ought to have mentioned that this is v2.5.60. For my example, the stdout contains:

### Fitting the general descriptive (separate k per branch) model

### * Log(L) = -6522.20, AIC-c = 13138.83 (47 estimated parameters)
* The following baseline rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |    4.060    |                                   |
|        Negative selection         |     0.000     |   93.895    |                                   |
|      Diversifying selection       |    48.917     |    2.044    |                                   |

and then

### Fitting the alternative model to test K != 1
* Log(L) = -6527.00, AIC-c = 13132.30 (39 estimated parameters)
* Relaxation/intensification parameter (K) =     0.64
* The following rate distribution was inferred for **test** branches

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.572     |   79.236    |                                   |
|         Neutral evolution         |     1.000     |   20.764    |                                   |
|      Diversifying selection       |     1.007     |    0.000    |       Not supported by data       |

* The following rate distribution was inferred for **reference** branches

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.418     |   79.236    |                                   |
|         Neutral evolution         |     1.000     |   20.764    |                                   |
|      Diversifying selection       |     1.011     |    0.000    |       Not supported by data       |

followed by

### Fitting the null (K := 1) model
* Log(L) = -6527.12, AIC-c = 13130.52 (38 estimated parameters)
* The following rate distribution for test/reference branches was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.457     |   77.869    |                                   |
|        Negative selection         |     0.878     |   22.131    |                                   |
|      Diversifying selection       |     1.014     |    0.000    |       Not supported by data       |

(so no significant evidence)

But just to be clear, even though the proportion is 0, the alternative and null are still fitting that site class, correct?

The Not supported by data message presumably appears when any proportion is estimated as 0. Is this also the case for the output JSON? For this example, none of the model fits have p2 or omega2 in the output.

On the other hand, I have just noticed that some of the sites in the alignment were empty (due to a pruning error), and now that I've removed those sites and re-run, the output JSON does have p2 and omega2 for each model (with p2 still 0). So I'm not sure what is going on there, but sanitising the input seems to have fixed it anyway.

Thanks again!

Nat

@spond
Copy link
Member

spond commented May 15, 2024

Dear @NatJWalker-Hale,

The intended behavior for Not supported by data screen output is as you surmised: the specified number of rates are always fitted, but if one (or more) of them have no support (very low weight) or essentially the same dN/dS values, the code will simply flag them as such for used information.

One exception is that if you fit All models (including the general exploratory), then the code can kick down the number of rates if it finds lack of support for the user-specified number of rates

io.ReportProgressMessageMD("RELAX", "ge", "\n ### Because some of the rate classes were collapsed to 0, the model is likely overparameterized. RELAX will reduce the number of site rate classes by one and repeat the fit now.\n----\n");

This will be indicated by the following echo to stdout

### Because some of the rate classes were collapsed to 0, the model is likely overparameterized. RELAX will reduce the number of site rate classes by one and repeat the fit now.
----

My guess is that's what happened when you had no corresponding rates in the JSON file (otherwise they should have been there, just with near 0 weights).

Best,
Sergei

@NatJWalker-Hale
Copy link
Author

Dear @spond,

Okay, that's good to know! Thanks very much for clarifying.

Cheers!

Nat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants