
crashes due to L=-INF on larger datasets #1

Closed

stephens999 opened this issue Sep 22, 2015 · 2 comments
stephens999 commented Sep 22, 2015

We're running maptpx on some larger datasets. When we do, L sometimes becomes infinite, which triggers this fallback:

```r
if(is.infinite(L)){ L <- sum( (log(Q0)*col_sums(X))[Q0>0] )}
```

but the problem is that the `sum()` returns NA. This then causes a crash later at

```r
if(QNup$L < L){
```

because L is NA.
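A quick illustration of that failure mode in plain R (not package code): comparisons against NA/NaN yield NA, and `if()` on NA is an error; note also that `is.na(NaN)` is TRUE, so a NaN sum shows up as "L is NA".

```r
L <- sum(c(-100, NaN))   # NaN, e.g. from an upstream 0 * -Inf term
-99 < L                  # NA
# if (-99 < L) { }       # Error: missing value where TRUE/FALSE needed
```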

We tried to fix this by:
i) commenting out the `if(is.infinite(L))` line
ii) commenting out reldif: `# reldif <- dif/L`
iii) changing the tolerance check to use dif instead of reldif: `if(abs(dif) < tol){` (sketched below)
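
For concreteness, the patched check would look roughly like this (a sketch only; the enclosing EM loop is assumed, and we're assuming dif is the per-iteration change in log-likelihood as in the snippets above):

```r
# if(is.infinite(L)){ ... }       # (i) fallback commented out
dif <- QNup$L - L                 # absolute change in log-likelihood (assumed form)
# reldif <- dif/L                 # (ii) relative change commented out
if (abs(dif) < tol) break         # (iii) converge on the absolute change
```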

This seemed to work except that it crashed (presumably after 1000 iterations) at

```r
if(((iter+1)%%1000)==0){
  cat(sprintf("p %d iter %d diff %g\n",
              nrow(theta), iter+1, round(diff)))
```

due to a bug: round(diff) should be round(dif).
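
That is, the corrected call would read (surrounding loop assumed):

```r
cat(sprintf("p %d iter %d diff %g\n",
            nrow(theta), iter+1, round(dif)))  # dif, not diff
```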

We've fixed that and are now re-running - currently waiting for results.

What do you think? I think that the tolerance check should be on the difference in log-likelihood anyway; the relative change in log-likelihood is nonsense (at least, so I always tell my students!)
But maybe there is also something to understand about why that sum() produces NA?
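
One plausible mechanism (a guess, not verified against our data): if Q0 itself picks up a NaN somewhere, e.g. from underflow, then `Q0 > 0` is NA at that position, and logical subsetting with NA produces an NA element that `sum()` propagates:

```r
Q0 <- c(0.5, NaN, 0.2)        # NaN standing in for a corrupted entry
cs <- c(10, 10, 10)           # stand-in for col_sums(X)
Q0 > 0                        # TRUE NA TRUE
sum((log(Q0) * cs)[Q0 > 0])   # NA: the value that later breaks if()
```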

We can prepare a formal pull request with these changes if you like.

mataddy (Member) commented Sep 22, 2015

Your fix makes a ton of sense; I don't know why that diff -> dif bug wasn't causing problems before. I'll fix it now.

I agree that checking for convergence on the raw log-likelihood is the right thing to do. I used to do it that way (and argue for it in my paper at http://jmlr.csail.mit.edu/proceedings/papers/v22/taddy12/taddy12.pdf) but switched to relative differences because that's how most of the other packages out there work (especially the lda package). I have no strong feelings about it and could easily switch back; keep me posted on whether you get reasonable results. Note that if you keep the tolerance the same, you'll of course need to run many more iterations to converge on the absolute rather than the relative log-likelihood difference.
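
To see the scale of that caveat (illustrative magnitudes only, not numbers from this dataset):

```r
# On a large corpus the log-likelihood is hugely negative, so a sizable
# absolute improvement still looks tiny relative to |L|.
L   <- -5e6                 # hypothetical log-likelihood
dif <- 0.5                  # absolute improvement in one iteration
abs(dif / L)                # 1e-07: already below a typical relative tol
abs(dif)                    # 0.5:   far above an absolute tol of, say, 1e-3
```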

mataddy added a commit that referenced this issue Sep 22, 2015
mataddy (Member) commented Sep 22, 2015

I merged the pull in #2 and just pushed a small doc update. Thanks for this!

mataddy closed this as completed Sep 22, 2015