Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

negative numbers in pmatrix #125

Closed
xflouris opened this issue Mar 7, 2017 · 6 comments
Closed

negative numbers in pmatrix #125

xflouris opened this issue Mar 7, 2017 · 6 comments
Assignees

Comments

@xflouris
Copy link
Owner

xflouris commented Mar 7, 2017

The parameter optimization routine may evaluate non-sensible parameters that cause the pmatrix to be close to the identity matrix but, due to numerical underflow, cause some cells to have small negative values.

Problem:

After computing the expd[1..states] array (exponential of rate*branch*eigenvalue[state]), we may end up with expd = {1,1,...,1} where each value is 1 +- a very small number. Hence we will be multiplying the eigenvectors matrix with the inverse eigenvectors matrix, with the expectation of obtaining the identity matrix. However, since each column of the inverse eigenvectors matrix was multiplied with the respective element from expd, the floating point representation changes and the resulting pmatrix may contain small negative numbers.

Proposed fix:

If for all elements x of expd[1..states] holds that 1 - T < x < 1 + T, where T is a threshold value of, for example, 1e-6, then set the identity matrix as the pmatrix.

xflouris pushed a commit that referenced this issue Mar 9, 2017
… of branchlength*rate*eigenvalue[1..states] are approximately one - issue #125
@xflouris
Copy link
Owner Author

xflouris commented Mar 9, 2017

Added the check. I've set the threshold T to 1e-15 which seems to be the smallest value of T that can still work for the datasets I tried. For T=1e-16 we still get negative numbers.
I've also noticed that we obtain the log-likelihood retains its value (upto the 8th decimal point) even for larger thresholds, upto the value of T=1e-12. For larger values (e.g T=1e-11) I observe a difference of several integer units in the log-L.

@amkozlov : would be good if you could double-check the implementation.

@amkozlov
Copy link
Collaborator

@xflouris: I just added a test for negative values (#129), and unfortunately, it shows that current fix doesn't work in all situations.

In particular, I still get negative values with substitution rates 0.001000 1000.000000 0.001000 1000.000000 0.001000 1.000000. Although this could be fixed with increasing T to e.g. 10e-12, it will still fail with other settings (e.g., on AA data I got negative values even with T=10e-8).

The only way I found to get rid of all negative values in all my test cases was to use 'brute force' a posteriori check for negative values after pmatrix was computed, and then to set pmatrix to identity if at least one negative value was detected. Of course, this "solution" could also lead to logL differences, and could potentially conceal bugs if we get negative values in pmatrix for some other reason.

@stamatak
Copy link
Collaborator

stamatak commented May 25, 2017 via email

@amkozlov
Copy link
Collaborator

@xflouris @stamatak: I think we should implement an extremely simple and elegant solution suggested by Ben here:
#129 (comment)

I tested it with the scalar pmatrix kernel, and it fixed all the negs.

The only small caveat is that expm1 is not part of ANSI C (although it is available in GNU/gcc). But this could be easily solved by including the back-up implementation in libpll. e.g.
https://github.com/SurajGupta/r-source/blob/master/src/nmath/expm1.c

@amkozlov
Copy link
Collaborator

I committed the above fix for all pmatrix kernels (4498719), seems to work fine. Just in case, I left the old threshold-checking code in the comments, we can remove it later.

@xflouris: could you please double-check the implementation, and then hopefully we can close this nasty issue. I'm still impressed how simple the solution was, we should use math magic more often :)

@xflouris
Copy link
Owner Author

checked, seems to work fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants