BUG: density function returns NaN for certain values of the parameters. #80

martinjankowiak · 2021-05-28T00:55:06Z

hello @zoj613 thanks for the great library! i've been able to use it for sampling with great ease. recently i've been looking at using log pdfs to compute MH ratios and the like. i am encountering nans in strange places. for example:

from polyagamma import polyagamma_pdf

x = polyagamma_pdf(25.80653973, 100.0, return_log=1)
y = polyagamma_pdf(25.10653973, 100.0, return_log=1)
print("x", x)
print("y", y)

yields

x nan
y -1.766024369081947

have you encountered anything like this? is the non-log version more stable and better tested?

i'm using pip install --pre -U polyagamma and thus version 1.3.1.

incidentally, would it be possible to support some sort of precision argument to the pdf methods so that users can make trade offs between speed and accuracy? or otherwise just dictate a fixed number of terms in the series to use? i'm not sure if that can help with parallelization....

The text was updated successfully, but these errors were encountered:

zoj613 · 2021-05-28T10:18:44Z

Hi @martinjankowiak, thank you for the bug report. Im glad to hear you have found the library useful for sampling.

This is definitely unexpected behaviour. From my quick analysis, it appears that the implementation breaks down for sufficiently large values of the scale parameter h. Here is a plot of the pdf (optained by calling exp() on the result of the logpdf) for h=100 compared to much smaller values:

As can be seen, for values of x just above 25, things suddenly go wrong. I did a quick print-statement debug of the C function and as of now the problem occurs on this line:

polyagamma/src/pgm_density.c

Line 117 in da31ac8

sum += sign * exp(curr - first) * t / h;

Here is the output of the line when using the input you provided:

i=0, curr=309.346381
i=1, sum=-12.647341
i=2, sum=77.701459
i=3, sum=-309.133375
i=4, sum=895.831478
i=5, sum=-2016.643901
i=6, sum=3672.975365
i=7, sum=-5566.288155
i=8, sum=7164.219293
i=9, sum=-7954.477246
i=10, sum=7713.228093
i=11, sum=-6597.153887
i=12, sum=5017.974055
i=13, sum=-3417.677855
i=14, sum=2096.509822
i=15, sum=-1164.135168
i=16, sum=587.682449
i=17, sum=-270.753110
i=18, sum=114.224948
i=19, sum=-44.259741
i=20, sum=15.793703
i=21, sum=-5.202792
i=22, sum=1.585683
i=23, sum=-0.448009
i=24, sum=0.117553
i=25, sum=-0.028693
i=26, sum=0.006525
i=27, sum=-0.001384
i=28, sum=0.000274
i=29, sum=-0.000051
i=30, sum=0.000009
i=31, sum=-0.000001
i=32, sum=0.000000
i=33, sum=-0.000000
i=34, sum=0.000000
i=35, sum=0.000000
i=36, sum=0.000000
i=37, sum=0.000000
i=38, sum=0.000000
i=39, sum=0.000000
i=40, sum=0.000000

The sum eventually vanishes after many iterations. It doesn't help that there is a division by h on that line, which I suspect causes underflow for when h is big enough compared to the numerator. After exiting the loop, the logarithm function is called with sum, which then returns -inf.

As a quick workaround, I wrote a function that calculates the logpdf using normal approximation. Since h is large, I suspect that its accuracy for computing the density wont be too bad?

import math

def polyagamma_logpdf_large_h(x, h=1, z=0):
    if z == 0:
        mean = 0.25 * h
        var = 0.041666688 * h
    else:
        x = math.tanh(0.5 * z)
        mean = 0.5 * h * x / z
        var = 0.25 * h * (math.sinh(z) - z) * (1 - x * x) / (z ** 3)

    return -math.log(math.sqrt(var)) - 0.5 * math.log(2 * math.pi) - 0.5 * (x - mean) ** 2 / var

and using it with the input,we get:

>>> polyagamma_logpdf_large_h(25.80653973, h=100.0)
-1.7105576873858082
>>> polyagamma_logpdf_large_h(25.10653973, h=100.0)
-1.6338590520155094

Below is the pdf plot of the above normal approximation vs the pdf as it stands in the main branch for large scale parameter values and z=0:

I will have to investigate the real cause of numerical instability because I am sure it happens with other combinations of (h, z).

zoj613 · 2021-05-28T10:34:23Z

Also worth noting that this behaviour happens even if we call polyagamma_pdf without the return_log flag:

Also note that for larger values of the exponential tilting parameter (e.g z>=5), the numerical instability of the pdf seems to go away:

zoj613 · 2021-06-05T19:13:17Z

@martinjankowiak I submitted a re-write at #81 that addresses part of the issue and ensures nans are not returned. However the instability still persists for large h when z=0, leading to the wrong value being returned. For z >> 0 This seems to not be an issue, for example:

Do you have suggestions on how to address this? The implementation seems to break down at around x>=20 for very small values of z parameter. I have tried all the obvious tricks I know of to deal with instability for extreme values but nothing works. I'm interested in what use case do you require z=0?. In most applications I have seen z tends to be reasonably larger than 1. So I suspect that many users who use this library for computing the logpdf will not run into this issue.

martinjankowiak · 2021-06-05T19:33:37Z

@zoj613 thanks for taking a closer look at this! for my present use case i can actually compute the necessary MH ratio without explicitly evaluating the density. so i myself don't have a pressing need for this tricky regime at present.

i'm not sure how best to address the remaining instabilities as i haven't stared at the necessary formulae long enough. for what it's worth when i implemented a somewhat rough approximation to PG(1, 0) in pyro i found it helpful to group even and odd terms together. i have no idea if that would be helpful here. i guess one place to look for ideas might be generic references about summing alternating series.

zoj613 · 2021-07-16T12:16:11Z

I considered a saddle point approximation of the density as a viable solution for large values of (x, h). I came to the conclusion that it might not be the best choice. The derivative of the cumulant function of a PG(h, 0) is

K'(t) = 0.25 * h * tanh(sqrt(-0.5 * t)) / sqrt(-0.5 * t).

When computing the saddle point approximation, we have to solve for t in K'(t) = x. I noticed that for x >= 0.25 * h there is no real solution since tanh(x) / x has range (0, 1). This might be related to why the numerical instubilities manifest at around x=25 when h=100 in the plots above. The math could be wrong, of course. I'm hoping that someone more knowledgeable might chime in and offer some other alternatives.

zoj613 assigned zoj613 and unassigned zoj613 May 28, 2021

zoj613 added the bug Something isn't working label May 28, 2021

zoj613 changed the title ~~nans in log pdfs~~ BUG: nans in log pdfs for certain values of the parameters. May 28, 2021

zoj613 added the polyagamma_pdf issue related to the density function of the distribution label May 28, 2021

zoj613 changed the title ~~BUG: nans in log pdfs for certain values of the parameters.~~ BUG: density function returns NaN for certain values of the parameters. May 28, 2021

zoj613 mentioned this issue Jun 1, 2021

BUG: rewrite logpdf implementation to address numerical instability #81

Draft

zoj613 added the help wanted Extra attention is needed label Jun 5, 2021

zoj613 mentioned this issue Jun 13, 2021

BUG: sampler occasionally hangs #83

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: density function returns NaN for certain values of the parameters. #80

BUG: density function returns NaN for certain values of the parameters. #80

martinjankowiak commented May 28, 2021 •

edited

Loading

zoj613 commented May 28, 2021 •

edited

Loading

zoj613 commented May 28, 2021 •

edited

Loading

zoj613 commented Jun 5, 2021 •

edited

Loading

martinjankowiak commented Jun 5, 2021

zoj613 commented Jul 16, 2021 •

edited

Loading

BUG: density function returns NaN for certain values of the parameters. #80

BUG: density function returns NaN for certain values of the parameters. #80

Comments

martinjankowiak commented May 28, 2021 • edited Loading

zoj613 commented May 28, 2021 • edited Loading

zoj613 commented May 28, 2021 • edited Loading

zoj613 commented Jun 5, 2021 • edited Loading

martinjankowiak commented Jun 5, 2021

zoj613 commented Jul 16, 2021 • edited Loading

martinjankowiak commented May 28, 2021 •

edited

Loading

zoj613 commented May 28, 2021 •

edited

Loading

zoj613 commented May 28, 2021 •

edited

Loading

zoj613 commented Jun 5, 2021 •

edited

Loading

zoj613 commented Jul 16, 2021 •

edited

Loading