
rootrem functions in ulong_extras module #119

Merged: 42 commits merged into wbhart:trunk on May 4, 2015. 3 participants.

kush789 commented Feb 24, 2015

Hey,

I have tried to implement two algorithms to calculate the root and remainder.

  • One is the Newton iteration method described here: http://en.wikipedia.org/wiki/Nth_root_algorithm. I can't figure out why it is taking much more time than expected. Maybe the required precision (small_float in the code) is too small, but increasing it gives wrong answers. Because of this, the number of iterations is quite large, almost equal to the value of the input number, and growing as n increases.
  • The second is an algorithm from the book Modern Computer Arithmetic by Richard P. Brent and Paul Zimmermann, available here: http://arxiv.org/pdf/1004.4710.pdf (section 1.5.2, page 29). I found a reference to this book in the source code of mpn_rootrem. It is turning out to be comparable to the mpn_rootrem function when called from flint (including the overhead of converting data types). I have added some preliminary tests before the actual algorithm to avoid overflow when calling n_pow, and so that time can be saved by using the fast n_pow available.
  • In both functions the answers match those produced by mpz_rootrem. However, the Newton iteration method is horribly slow, taking almost half a second for n from 1 to 1000. The second algorithm is quite competitive in speed.
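For concreteness, the Newton iteration from the cited page can be sketched as a standalone function. This is an illustrative sketch, not the code under review: it uses repeated multiplication for x^(n-1), stops once the update drops below half a unit, and fixes the truncated result with an integer-style adjustment. It is intended for moderate inputs; a production version would verify the result in integer arithmetic.

```c
#include <stdint.h>

/* x^n by repeated multiplication, in double precision (sketch helper) */
double dpow(double x, unsigned int n)
{
    double p = 1.0;
    while (n--)
        p *= x;
    return p;
}

/* Sketch of the Newton iteration x <- ((n-1)*x + a/x^(n-1)) / n for the
   integer nth root. Not the FLINT implementation: starting from x = a it
   converges geometrically, then quadratically; the two loops at the end
   repair any off-by-one left by floating point truncation. */
uint64_t nth_root_newton(uint64_t a, unsigned int n)
{
    double x = (double) a;
    double step;
    uint64_t r;

    if (a < 2 || n == 1)
        return a;

    do
    {
        step = (x - (double) a / dpow(x, n - 1)) / n;  /* x - new_x */
        x -= step;
    } while (step > 0.5);      /* stop once the update is below half a unit */

    r = (uint64_t) x;
    while (dpow((double) r, n) > (double) a)           /* estimate too high */
        r--;
    while (dpow((double) (r + 1), n) <= (double) a)    /* estimate too low  */
        r++;
    return r;
}
```

The final adjustment loops matter: the iteration can exit slightly above or below the true root, as the discussion below explores.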
wbhart commented Feb 24, 2015

Thanks Kushagra. Here are a few comments we came up with:

  1. The n_sqrt function had terrible documentation, which I've now fixed: it always computes the integer square root. The comment about "one too high" referred to implementation details, not the output of the function. The documentation was wrong in other ways too. If you pull from me you will get the updated docs.

  2. In the n = 2 case, you may as well just use n_sqrtrem, since it already does exactly the right thing (modulo bugs in the libc).

  3. You are using an extremely expensive call to pow in each iteration. That function has to handle arbitrary powers, not just integer powers, so it is extremely inefficient for this purpose. This is likely the main problem with the implementation.

It is likely much better to replace it with a simple loop that multiplies the value by itself the required number of times. This will let errors accumulate in the final bits, but that probably doesn't matter, since you divide by n anyway.

  4. If you call the pow function even once, you may as well call it right at the start with pow(a, 1.0/n) and then just adjust the result, similar to the way the n_sqrt function works. Comment 3 eliminates this, but it might be worth trying an implementation along these lines anyway.

You can't guarantee anything about the output of pow, so it would have to keep looping until the result was correct.

The difficulty is handling overflow if you check the result with integer multiplication.

  5. The division by n can possibly be replaced by multiplication by 1/n, which you can precompute, since you use it every iteration of the loop. This should be faster.

  6. How did you arrive at the definition of small_float? It would be useful if you could write some code comments about why these values always work. Also, is there some cheap function you can compute to give you small_float that doesn't cost as much as the if statements you currently have?

kush789 commented Feb 24, 2015

Hey,

Thanks for the review, I'll try to make the newton iteration method more efficient over the next few days. Could you also review the second function?

There's no math behind the value of small_float; I tried testing it for different values of 10^-k and the answer turned out to be wrong for larger values.

Thanks,
Kushagra

wbhart commented Feb 24, 2015

For cube root you could write a special n_cbrt function. It can use the iteration:

x <- x - (x^3 - a)x/(2x^3 + a)

http://www.cims.nyu.edu/~dbindel/class/cs279/qbrt.pdf

Note the multiplication by 2 can be done with an addition.
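The iteration might be sketched as follows (illustrative only: a/3 is a deliberately crude starting value, and a real n_cbrt would use a proper bootstrap; note 2*x^3 is formed as x3 + x3, i.e. an addition):

```c
#include <math.h>     /* fabs; link with -lm */

/* Sketch of the quoted cube-root iteration
       x <- x - (x^3 - a) * x / (2*x^3 + a)
   which converges cubically once the estimate is close. Far from the
   root the step roughly halves x, so even the crude start terminates. */
double cbrt_iterate(double a)
{
    double x = a < 8.0 ? a : a / 3.0;   /* crude starting guess */
    int i;

    for (i = 0; i < 100; i++)
    {
        double x3 = x * x * x;
        double next = x - (x3 - a) * x / (x3 + x3 + a);  /* *2 as addition */
        if (fabs(next - x) < 1e-12 * x)
            return next;
        x = next;
    }
    return x;
}
```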

One important thing for an efficient nth root is to get a good starting approximation.

Apparently the following bit hack offers a pretty good starting point when taking an nth root of a double a. Unfortunately, as written it only works on a 64 bit machine (I'm not sure whether int64_t is always available on a 32 bit machine these days, even in C90; if so, just use that instead of slong and it will work).

The hack takes advantage of the floating point machine representation of the double a and manipulates its bits as an integer. The first and last lines are a trick to reinterpret the value's bits as an integer; note that this is not the same thing as casting from a double to an integer!

static inline
double nth_root_bootstrap(double a, int n)
{ 
   slong i = *((slong *) &a);
   const slong s = (1 << 10) - 1;
   i = (i - (s << 52)) / n + (s << 52);
   return *((double *) &i);
}

Instead of /n there should probably be an explicit precomputed inverse worked out for each of the ~64 possible values of n. Note that's an integer divide, not floating point divide.

You can compute some precomputed inverses with the n_preinvert_limb function and print them out, then put the values in a table.

You'll have to look at n_mod2_preinv for example to see how to use these precomputed inverses to do division (you don't need the remainder of course, just the quotient).

Lots of steps here. Ask if you get stuck.

If my code isn't totally rubbish, the following table shows how close it comes to the nth root for a random floating point value (which happens to be an integer), for n = 1..63

n | a | approx a^(1/n) | actual a^(1/n)
1: 1804289383.000000, 1804289383.000000, 1804289383.000000
2: 1804289383.000000, 43915.271103, 42476.927655
3: 1804289383.000000, 1256.234815, 1217.405888
4: 1804289383.000000, 213.772014, 206.099315
5: 1804289383.000000, 72.708806, 71.000472
6: 1804289383.000000, 35.628669, 34.891344
7: 1804289383.000000, 22.126572, 21.005327
8: 1804289383.000000, 14.680375, 14.356159
9: 1804289383.000000, 11.271445, 10.677718
10: 1804289383.000000, 8.544300, 8.426178
11: 1804289383.000000, 7.156500, 6.941964
12: 1804289383.000000, 6.226792, 5.906890
13: 1804289383.000000, 5.440116, 5.152566
14: 1804289383.000000, 4.765822, 4.583157
15: 1804289383.000000, 4.181433, 4.140827
16: 1804289383.000000, 3.835047, 3.788952
17: 1804289383.000000, 3.609456, 3.503391
18: 1804289383.000000, 3.408931, 3.267678
19: 1804289383.000000, 3.229513, 3.070252
20: 1804289383.000000, 3.068038, 2.902788
21: 1804289383.000000, 2.921941, 2.759157
22: 1804289383.000000, 2.789125, 2.634761
23: 1804289383.000000, 2.667859, 2.526086
24: 1804289383.000000, 2.556698, 2.430409
25: 1804289383.000000, 2.454430, 2.345591
26: 1804289383.000000, 2.360029, 2.269926
27: 1804289383.000000, 2.272620, 2.202045
28: 1804289383.000000, 2.191455, 2.140831
29: 1804289383.000000, 2.115888, 2.085369
30: 1804289383.000000, 2.045358, 2.034902
31: 1804289383.000000, 1.989690, 1.988797
32: 1804289383.000000, 1.958762, 1.946523
33: 1804289383.000000, 1.929708, 1.907630
34: 1804289383.000000, 1.902364, 1.871735
35: 1804289383.000000, 1.876582, 1.838510
36: 1804289383.000000, 1.852233, 1.807672
37: 1804289383.000000, 1.829199, 1.778977
38: 1804289383.000000, 1.807378, 1.752213
39: 1804289383.000000, 1.786676, 1.727194
40: 1804289383.000000, 1.767009, 1.703757
41: 1804289383.000000, 1.748302, 1.681758
42: 1804289383.000000, 1.730485, 1.661071
43: 1804289383.000000, 1.713497, 1.641583
44: 1804289383.000000, 1.697281, 1.623195
45: 1804289383.000000, 1.681786, 1.605816
46: 1804289383.000000, 1.666965, 1.589366
47: 1804289383.000000, 1.652774, 1.573775
48: 1804289383.000000, 1.639174, 1.558977
49: 1804289383.000000, 1.626130, 1.544914
50: 1804289383.000000, 1.613608, 1.531532
51: 1804289383.000000, 1.601576, 1.518785
52: 1804289383.000000, 1.590007, 1.506627
53: 1804289383.000000, 1.578875, 1.495021
54: 1804289383.000000, 1.568155, 1.483929
55: 1804289383.000000, 1.557825, 1.473318
56: 1804289383.000000, 1.547864, 1.463158
57: 1804289383.000000, 1.538252, 1.453421
58: 1804289383.000000, 1.528972, 1.444081
59: 1804289383.000000, 1.520006, 1.435114
60: 1804289383.000000, 1.511340, 1.426500
61: 1804289383.000000, 1.502957, 1.418217
62: 1804289383.000000, 1.494845, 1.410247
63: 1804289383.000000, 1.486990, 1.402573

Seems to not be insanely bad as a starting point.

wbhart commented Feb 24, 2015

It should be possible to prove that certain values of small_float work. Code in flint needs to be guaranteed to work, not just heuristically (modulo unintended coding bugs of course, which we'll never completely eliminate).

What you should be able to prove is something about the number of bits it takes to represent the correct root. Once you know that, you have an idea how small the adjustment needs to be before it can only affect the last bit of the result (i.e. it is smaller than that last bit).

We do, however, need to think about the case where that last bit could be one off because of this. So there might be a final adjustment missing from the current code.

Bill.


fredrik-johansson commented Feb 24, 2015

Here is a reference implementation using binary search. It seems to be a bit slower than a call to pow() in the worst case and a bit faster in the best case. It should always be faster if you could figure out a quick way to rule out overflow without umul_ppmm. A proper Newton iteration should be slightly faster, but maybe this is good enough?

mp_limb_t
n_rootrem(mp_limb_t * r, mp_limb_t x, unsigned int k)
{
    mp_limb_t a, b, m, p, hi, lo;
    unsigned int bc;
    int i;

    if (k <= 1 || x <= 1)
    {
        *r = 0;
        return x;
    }

    if (k == 2)
    {
        return n_sqrtrem(r, x);
    }

    if (k >= FLINT_BITS || (UWORD(1) << k) > x)
    {
        *r = x - 1;
        return 1;
    }

    a = 2;
    b = UWORD(1) << ((FLINT_BIT_COUNT(x) + k - 1) / k);

    while (a < b)
    {
        m = a + (b - a) / 2;

        p = m + 1;

        for (i = 1; i < k; i++)
        {
            umul_ppmm(hi, lo, p, m + 1);

            if (hi != 0)
                goto too_large;
            else
                p = lo;
        }

        if (p == x)
        {
            *r = 0;
            return m + 1;
        }
        else if (p > x)
        {
            too_large:
            b = m;
        }
        else
        {
            a = m + 1;
        }
    }

    p = a;
    for (i = 1; i < k; i++)
        p *= a;

    *r = x - p;
    return a;
}
fredrik-johansson commented Feb 24, 2015

Another possibility is to compute x^(1/k) approximately as exp(log(x)/k) using Taylor series for exp and log. This would still just give an approximation, i.e. a final adjustment is necessary, but it might be faster than the libm pow() since we only need 7 digit accuracy in the worst case. This should cost ~25 floating-point multiplications (and about as many additions), and 3 floating-point divisions (2 of which can use precomputed inverses). The code would also be nearly branch free and permit some instruction level parallelism.

wbhart commented Feb 24, 2015

That doesn't sound too bad at all.

I don't think it's competitive with the cubic iteration I posted (I'm probably miscounting, but from the article it looks like, including the 3-4 bit approximate bootstrap, there are 6 multiplications, 3 divisions, one of which is precomputed, and 10 add/subs). There are two additional multiplications and some adds to check the result as well, I guess.

But for higher nth roots it looks pretty efficient.


fredrik-johansson commented Feb 24, 2015

By the way, it's trivial to avoid the extra loop that computes the remainder at the end of the binary search algorithm I posted. That makes it a fair bit faster.

wbhart commented Feb 24, 2015

The bootstrap algorithm above is a version of the approximate inverse square root code here:

http://en.wikipedia.org/wiki/Fast_inverse_square_root

(see the section "First approximation of the result" and the end of the section before it).

In our case (53 bit double) B = 2^52 and L = 2^10 - 1

We don't want the inverse square root, but the nth root, so we want to compute

1/n log_2(x)

The approximation to that value (with sigma = 0 so that the resulting value is always too large -- required by some of the algorithms that use the bootstrap) is given by

Iy = LB + (Ix - LB)/n

which is precisely what the code computes.

We can get a better approximation by tuning the value sigma, but then the value will sometimes be too low, sometimes too high, which the algorithms that use this don't necessarily want.
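The formula can be checked with a standalone version of the bootstrap (memcpy replaces the pointer casts so the bit reinterpretation is well-defined under strict aliasing; int64_t is assumed available):

```c
#include <stdint.h>
#include <string.h>

/* Standalone version of the bootstrap above, computing
       Iy = L*B + (Ix - L*B) / n
   with B = 2^52 and L = 2^10 - 1 = 1023, i.e. dividing the unbiased
   exponent-plus-mantissa bits of a by n. With sigma = 0 the result
   always overestimates the true nth root. */
double nth_root_bootstrap(double a, int n)
{
    int64_t i;
    const int64_t lb = ((int64_t) 1023) << 52;   /* L * B */

    memcpy(&i, &a, sizeof i);    /* reinterpret the double's bits */
    i = (i - lb) / n + lb;       /* integer divide, as noted above */
    memcpy(&a, &i, sizeof a);
    return a;
}
```

Run on the table's input 1804289383, this reproduces the "approx" column: about 43915 for n = 2 and about 1256 for n = 3, both above the true roots.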

kush789 commented Feb 27, 2015

Hi,

Sorry for the delayed follow-up; I had exams from Monday till this morning. I worked on the suggestions given. I tried various combinations of the bootstrap code with a certain number of Newton iterations. The best speed came from using the bootstrap code to get an estimate, followed by one Newton iteration to get a closer value, and then simply increasing/decreasing it until the correct result.

  • The bootstrap code is far off only for small values of root (to be precise, only 2 and 3). And since we are using sqrtrem for root = 2, I guess the first estimate isn't a lot off.
  • I removed the pow() calls. They were certainly the reason for the inefficiency. What I hadn't realized earlier was that floating point precision isn't really required for this function, as we set the base to the integral part of n^(1/root) anyway.
  • The answers are correct when compared to mpz_rootrem.
  • The code snippet which Fredrik sir commented above is faster than n_rootrem for root <= 6 (by a factor of 2), but becomes slower after that.
  • Also, I was thinking that we could have a different function that checks for an exact root. This would be better for the perfect power testing function, since we don't really need the remainder there.

These are the times I recorded on my machine (looping n from 1 to 1 000 000):

root | n_rootrem | mpz_rootrem

1 | 0.0158970000 | 0.0621330000
2 | 0.0232890000 | 1.1546490000
3 | 0.2007970000 | 1.2319320000
4 | 0.2121500000 | 1.0352970000
5 | 0.2018700000 | 0.8864130000
6 | 0.2341120000 | 0.8323570000
7 | 0.2186210000 | 0.6710600000
8 | 0.2282150000 | 0.7152760000
9 | 0.2234660000 | 0.5983430000
10 | 0.2449520000 | 0.4469660000
11 | 0.2389620000 | 0.4202950000
12 | 0.2555650000 | 0.4399070000
13 | 0.3040590000 | 0.4715030000
14 | 0.2857570000 | 0.4838650000
15 | 0.2597950000 | 0.4629330000
16 | 0.2469840000 | 0.4150980000
17 | 0.2121830000 | 0.3738700000
18 | 0.1592150000 | 0.3402840000
19 | 0.0541930000 | 0.2693090000
20 | 0.0473350000 | 0.1194350000
21 | 0.0484500000 | 0.1180460000
22 | 0.0459520000 | 0.1184810000
23 | 0.0491670000 | 0.1191340000
24 | 0.0463980000 | 0.1187060000
25 | 0.0479300000 | 0.1150740000
26 | 0.0455520000 | 0.1235550000
27 | 0.0476480000 | 0.1223900000
28 | 0.0476310000 | 0.1233700000
29 | 0.0472150000 | 0.1229540000
30 | 0.0463600000 | 0.1205380000
31 | 0.0467580000 | 0.1180830000
32 | 0.0484000000 | 0.1161420000
33 | 0.0471450000 | 0.1196790000
34 | 0.0458660000 | 0.1204910000
35 | 0.0445870000 | 0.1194120000
36 | 0.0502830000 | 0.1171860000
37 | 0.0488970000 | 0.1166440000
38 | 0.0500780000 | 0.1188190000
39 | 0.0469700000 | 0.1180660000
40 | 0.0509750000 | 0.1208670000
41 | 0.0492990000 | 0.1225280000
42 | 0.0482460000 | 0.1205570000
43 | 0.0450340000 | 0.1199850000
44 | 0.0455790000 | 0.1237130000
45 | 0.0471540000 | 0.1217970000
46 | 0.0535110000 | 0.1171140000
47 | 0.0492430000 | 0.1215960000
48 | 0.0493000000 | 0.1198310000
49 | 0.0498940000 | 0.1217440000
50 | 0.0472360000 | 0.1192850000
51 | 0.0455480000 | 0.1212650000
52 | 0.0488590000 | 0.1206530000
53 | 0.0480320000 | 0.1193290000
54 | 0.0464660000 | 0.1228550000
55 | 0.0470500000 | 0.1194870000
56 | 0.0465240000 | 0.1188190000
57 | 0.0485680000 | 0.1193820000
58 | 0.0458630000 | 0.1178250000
59 | 0.0494700000 | 0.1213760000
60 | 0.0460460000 | 0.1218320000
61 | 0.0489670000 | 0.1212340000
62 | 0.0462610000 | 0.1189640000
63 | 0.0495940000 | 0.1197950000
64 | 0.0482140000 | 0.1199730000

wbhart commented Mar 1, 2015

Hi Kushagra,

It sounds like we are getting somewhere with this. Thanks for the hard work on it.

I have been travelling the last few days and so now have a big overflow of work in my inbox. It may take a few days for me to really look at this properly. But I will get back to you on it soon, if Fredrik doesn't beat me to it.

Bill.


kush789 commented Mar 3, 2015

Hey,

  • I have implemented the n_cbrt function. I am using an algorithm similar to n_rootrem: first getting a good estimate from the bootstrap function you had suggested, then carrying out a Newton iteration, and then verifying it by simply increasing or decreasing it. The method you had suggested here - http://www.cims.nyu.edu/~dbindel/class/cs279/qbrt.pdf - wasn't that effective, as we are looking for only the truncated part, not the exact decimal value. What makes me happier is that the function gives a better time than a call to pow(n, 1./3.).
  • I am checking for overflow after the iteration stage. If the value is greater than 2642245 (or 1626 in the case of a 32 bit machine), I am calling n_pow, typecasting as a double, adding one to the answer and adjusting it accordingly. I had to add one because pow(n, 1./3.) was giving a wrong answer for perfect cubes (one less than the correct answer).
  • I have added the test function and docs, calling mpz_root(ans, n, 3) to verify the answers. It is quite similar to the way n_sqrt is tested presently.
  • I have made some minor changes in n_rootrem(). There was a small mistake in the Newton iteration. Now it gives an even closer answer, which means fewer iterations before returning. This has resulted in better speeds, especially for root = 3.
  • I made some minor changes in the ulong_extras docs. A couple of entries had me confused for a few minutes initially; they are more intuitive now.

Thanks,
Kushagra

kush789 commented Mar 3, 2015

If the n_rootrem function seems OK, should I go ahead and implement the same in the fmpz module? As you said, we can call this function if n is small, and otherwise make a call to gmp_rootrem().

kush789 commented Mar 3, 2015

I got these timings on my machine while looping n from 1 to 1 000 000 000 :

  • pow(n, 1./3.) took 152.069542 seconds to execute
  • n_cbrt(n) took 102.191376 seconds to execute
fredrik-johansson commented Mar 3, 2015

You should be testing and profiling inputs all the way up to 2^64 - 1. It's also useful to generate random inputs of the form x^n, x^n + 1, x^n - 1.
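Such inputs could be generated along these lines (a sketch: rand() stands in for a FLINT random state, and the caller retries when x^n would overflow):

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of the suggested test-case generator: pick a random x, build x^n
   while it still fits in 64 bits, then emit x^n - 1, x^n, x^n + 1 -- the
   values where rounding errors in a root routine are most likely to show.
   Returns 0 (caller retries with another x or n) if x^n would overflow. */
int make_power_inputs(unsigned int n, uint64_t out[3])
{
    uint64_t x = 2 + (uint64_t) (rand() % 1000);
    uint64_t p = 1;
    unsigned int i;

    for (i = 0; i < n; i++)
    {
        if (p > UINT64_MAX / x)
            return 0;            /* x^n would overflow 64 bits */
        p *= x;
    }
    out[0] = p - 1;
    out[1] = p;
    out[2] = p + 1;
    return 1;
}
```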

wbhart commented Mar 16, 2015

Did you mean ulong_extras.h instead of ulongextras.h? Also, you need to include ulong_extras.h in the files from which this function is called, not just in the implementation .c file for the function itself.

Bill.

On 16 March 2015 at 08:21, Kushagra Singh wrote:

Yes, the header file has FLINT_DLL double n_cbrt_estimate(double a); the file containing this function includes both longlong.h and ulongextras.h.

The weird part is that when I copy the same function into a C file and call it in main from that file, it returns a correct value. I am totally confused about this.

kush789 commented Mar 16, 2015

Yes, sorry that was a typo.

wbhart commented Mar 16, 2015

If my amended comment is no help, can you commit what you have and add it to the pull request so I can take a look.

wbhart commented Mar 16, 2015

Also check you are including flint.h in cbrt_estimate.c.

wbhart commented Mar 16, 2015

Something you can also try, though it isn't a fix, is to put the prototype for n_cbrt_estimate in the files where it is called. If that works, then somehow it isn't picking up the prototype in ulong_extras.h.

I don't see how, but that's the only possible diagnosis I can think of.

kush789 commented Mar 16, 2015

Adding a prototype did work. Do you know why it might not be picking it up?

wbhart commented Mar 16, 2015

No idea unfortunately. Can you push your code and I'll try to figure it out.

kush789 commented Mar 16, 2015

I'll make the other changes and do it.

Commit summary:
- Added separate functions n_root_estimate and cbrt_estimate.
- Removed duplicated estimate codes.
- Renamed files as per convention.
kush789 commented Mar 16, 2015

I have pushed the code. I made one more observation: when called inside n_cbrt_newton_iteration() and in n_cbrt(), the estimate that n_cbrt_estimate() returns is correct. Surprisingly, however, if I call n_cbrt_estimate() as a normal function from main in a new file, the returned value is wrong. So although the cube root functions are not affected, n_cbrt_estimate(), if called otherwise from main, will give wrong answers.

wbhart commented Mar 16, 2015

And the file in which you have your main function #includes ulong_extras.h and flint.h? And the location you are specifying for the includes when you compile is the correct one (the flint source tree, not the installed location of flint -- e.g. maybe you need to do make install)?

kush789 commented Mar 16, 2015

I kept all these things in mind while I was checking (header files and make install).

wbhart commented Mar 16, 2015

Can you email me a copy of the external program you are using to check this, and I'll see if I can replicate it.

wbhart commented Mar 16, 2015

Well, I am personally ready to sign off on this code. Only Fredrik needs to agree and it can be merged.

I'm pretty sure the prototype problem is a compilation issue. I don't think anything is wrong in the code you've written for flint itself.

We can keep trying to figure it out. Try deleting every ulong_extras.h you can find on your system except the one you think you are compiling against. Check that ulong_extras.h definitely has the right prototype in it. Then recompile your external program and see if anything shakes loose.

wbhart commented Mar 16, 2015

My version of gcc gives a few compilation warnings, which are worth fixing:

In file included from root_estimate.c:33:0:
root_estimate.c: In function ‘n_root_estimate’:
/home/wbhart/flint2/flint.h:95:15: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
#define slong mp_limb_signed_t
^
root_estimate.c:114:5: note: in expansion of macro ‘slong’
slong s = ((1 << 10) - 1);
^
root.c: In function ‘n_root’:
root_estimate.c:109:22: warning: unused variable ‘ret’ [-Wunused-variable]
ulong i, hi, lo, ret;
^
root.c:109:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
const mp_limb_t upper_limit = max_base[root]; /* n <= upper_limit^root */
^

rootrem.c: In function ‘n_rootrem’:
rootrem.c:117:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
const mp_limb_t upper_limit = max_base[root]; /* n <= upper_limit^root */
^

Some of the mixed declaration ones might be fun to fix. All the variables need to be declared at the top of the function in C90. You can't insert other statements between variable declarations.
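For illustration, a C90-clean version of the flagged pattern (the max_base values here are placeholders, not FLINT's table):

```c
#include <stdint.h>

/* In C90 all declarations in a block must precede the first statement, so
   a const initialized mid-function after other statements triggers the
   "mixed declarations and code" warning. Declaring at the top and
   assigning after the range check avoids it (at the cost of const).
   The table values below are illustrative placeholders only. */
static const uint64_t max_base[] = { 0, 0, 4294967295ULL, 2642245ULL };

uint64_t upper_limit_for(unsigned int root)
{
    uint64_t upper_limit;          /* declared at the top of the block */

    if (root < 2 || root > 3)      /* statements may now follow */
        return 0;

    upper_limit = max_base[root];  /* assigned after the bound check */
    return upper_limit;
}
```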

kush789 commented Mar 16, 2015

I have changed the code that was giving warnings. In some cases assigning a constant before some code was not possible, for example

rootrem.c: In function ‘n_rootrem’:
rootrem.c:117:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
const mp_limb_t upper_limit = max_base[root]; /* n <= upper_limit^root */
^

as there was an upper bound on root which had to be checked earlier on. So I removed the const keyword. make check looks cleaner now.

wbhart commented Mar 16, 2015

Thanks Kushagra. I'm completely happy with this code now. It's over to Fredrik.

fredrik-johansson commented Mar 16, 2015

I should have time to review this comprehensively later in the week.

kush789 commented Mar 29, 2015

Fredrik sir, were you able to have a look at the commits recently?

kush789 commented Apr 4, 2015

I noticed that in one place we were calculating constants at runtime. I have changed that code, as it is not necessary.

fredrik-johansson commented Apr 4, 2015

Regarding the last commits: this is not necessary, since the compiler can fold such constants. Also, you need to use UWORD() for the large literal. And why are s and uword_val slong and not ulong?

fredrik-johansson commented Apr 4, 2015

There is still a spelling error in the function name n_cbrt_chebyshef_approx.

kush789 commented Apr 4, 2015

There was no particular reason for s and uword_val to be slong; I have changed them, and fixed the spelling of chebyshev as well.

wbhart added a commit that referenced this pull request May 4, 2015

Merge pull request #119 from kush789/trunk
rootrem funtctions in ulong_extras module

@wbhart wbhart merged commit 40c26d4 into wbhart:trunk May 4, 2015
