
# wbhart / flint2

forked from fredrik-johansson/flint2


# rootrem functions in ulong_extras module #119

Merged

merged 42 commits into wbhart:trunk from kush789/trunk May 4, 2015

## Conversation

### kush789 commented Feb 24, 2015

Hey, I have tried to implement two algorithms to calculate the root and remainder.

The first is the Newton iteration method described here: http://en.wikipedia.org/wiki/Nth_root_algorithm. I can't figure out why it is taking much more time than expected. Maybe the required precision (small_float in the code) is set too small; however, increasing it gives wrong answers. Because of this the number of iterations seems to be very large, almost equal to the value of the input number, and it grows as n increases. This method is horribly slow: it takes almost half a second for inputs 1 to 1000.

The second is an algorithm from the book Modern Computer Arithmetic by Richard P. Brent and Paul Zimmermann, available here: http://arxiv.org/pdf/1004.4710.pdf (section 1.5.2, page 29). I found a reference to this book in the source code of mpn_rootrem. It is turning out to be comparable to mpn_rootrem when called from flint (including the overhead of converting data types). I have added some preliminary tests before the actual algorithm, both to avoid overflow when calling n_pow and to save time by using the fast n_pow. This algorithm is pretty competitive in terms of speed.

In both functions the answers match those produced by mpz_rootrem.
added 2 commits Feb 24, 2015
``` Added rootrem function ```
``` daab978 ```
``` Added rootrem method using newton iteration method ```
``` a021236 ```

### wbhart commented Feb 24, 2015

Thanks Kushagra. Here are a few comments we came up with:

1. The n_sqrt function had terrible documentation, which I've now fixed. It always computes the integer square root. The comment about "one too high" referred to the implementation details, not the output of the function. The documentation was wrong in other ways too; if you pull from me you will get the update.

2. In the n = 2 case you may as well just use n_sqrtrem, since it already does exactly the right thing (modulo bugs in the libc).

3. You are using an extremely expensive call to pow in each iteration. That function has to handle arbitrary powers, not just integer powers, so it is extremely inefficient for this purpose. This is likely the main problem with the implementation. It is likely much better to replace it with a simple loop that multiplies the value by itself the required number of times. Errors will accumulate in the final bits, but that probably doesn't matter, since you divide by n anyway.

4. If you call the pow function even once, you may as well call it right at the start with pow(a, 1.0/n) and then just adjust the result, similar to the way the n_sqrt function works. Comment 3 eliminates this, but it might be worth trying an implementation along these lines anyway. You can't guarantee anything about the output of pow, so it would have to keep looping until the result was correct; the difficulty is handling overflow if you check the result with integer multiplication.

5. The division by n can possibly be replaced by multiplication by 1/n, which you can precompute since you use it in every iteration of the loop. This should be faster.

6. How did you arrive at the definition of small_float? It would be useful if you could write some code comments explaining why those values always work. Also, is there some cheap function you can compute to give you small_float that doesn't cost as much as the if statements you currently have?
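The pow-free Newton step with a precomputed 1/n suggested above can be sketched as follows. This is a hedged illustration, not the flint implementation: the function name, the crude starting value, and the fixed iteration cap are all my own assumptions.

```c
/* Sketch of the suggestions above: compute x^(n-1) by a plain multiply
   loop instead of calling pow(), and multiply by a precomputed 1.0/n
   instead of dividing by n inside the loop.  Illustrative only. */
static double nth_root_newton(double a, unsigned int n)
{
    double x = (a / n > 1.0) ? a / n : 1.0;  /* crude starting value (assumption) */
    double ninv = 1.0 / n;                   /* precomputed once, used every iteration */
    int iter;

    for (iter = 0; iter < 200; iter++)       /* fixed cap; real code would test convergence */
    {
        double p = 1.0;
        unsigned int j;

        for (j = 0; j + 1 < n; j++)          /* p = x^(n-1), no pow() call */
            p *= x;

        x = ((n - 1) * x + a / p) * ninv;    /* Newton step for f(x) = x^n - a */
    }

    return x;
}
```

Rounding errors accumulate in the low bits of p, but as the comment above notes, that is tolerable because the step divides by n anyway; an integer adjustment of the final result would still be needed.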

### kush789 commented Feb 24, 2015

Hey, thanks for the review. I'll try to make the Newton iteration method more efficient over the next few days. Could you also review the second function? There's no math behind the value of small_float; I tried testing it for different values of 10^-k and the answer seemed to be wrong for larger values. Thanks, Kushagra

### wbhart commented Feb 24, 2015

For the cube root you could write a special n_cbrt function. It can use the iteration

```
x <- x - (x^3 - a)*x / (2*x^3 + a)
```

http://www.cims.nyu.edu/~dbindel/class/cs279/qbrt.pdf

Note that the multiplication by 2 can be done with an addition.

One important thing for an efficient nth root is to get a good starting approximation. Apparently the following bit hack offers a pretty good starting point when taking an nth root of a double a. Unfortunately, as written it only works on a 64-bit machine (I'm not sure if int64_t is always available on a 32-bit machine these days, even in C90; if so, just use that instead of slong and it will work). The hack takes advantage of the floating-point machine representation of the double a and manipulates its bits as an integer. The first and last lines are a hack to reinterpret the value as an integer, not a double; note that this is not the same thing as casting from a double to an integer!

```c
static inline double nth_root_bootstrap(double a, int n)
{
    slong i = *((slong *) &a);
    const slong s = (1 << 10) - 1;

    i = (i - (s << 52)) / n + (s << 52);

    return *((double *) &i);
}
```

Instead of /n there should probably be an explicit precomputed inverse, worked out for each of the ~64 possible values of n. Note that is an integer divide, not a floating-point divide. You can compute the precomputed inverses with the n_preinvert_limb function and print them out, then put the values in a table. Look at n_mod2_preinv, for example, to see how to use these precomputed inverses to do division (you don't need the remainder of course, just the quotient).

Lots of steps here. Ask if you get stuck.
If my code isn't totally rubbish, the following table shows how close to the nth root it comes for a random floating point value (which happens to be an integer), for n = 1..64:

```
 n   a                   approx a^(1/n)     actual a^(1/n)
 1:  1804289383.000000,  1804289383.000000, 1804289383.000000
 2:  1804289383.000000,  43915.271103,      42476.927655
 3:  1804289383.000000,  1256.234815,       1217.405888
 4:  1804289383.000000,  213.772014,        206.099315
 5:  1804289383.000000,  72.708806,         71.000472
 6:  1804289383.000000,  35.628669,         34.891344
 7:  1804289383.000000,  22.126572,         21.005327
 8:  1804289383.000000,  14.680375,         14.356159
 9:  1804289383.000000,  11.271445,         10.677718
10:  1804289383.000000,  8.544300,          8.426178
11:  1804289383.000000,  7.156500,          6.941964
12:  1804289383.000000,  6.226792,          5.906890
13:  1804289383.000000,  5.440116,          5.152566
14:  1804289383.000000,  4.765822,          4.583157
15:  1804289383.000000,  4.181433,          4.140827
16:  1804289383.000000,  3.835047,          3.788952
17:  1804289383.000000,  3.609456,          3.503391
18:  1804289383.000000,  3.408931,          3.267678
19:  1804289383.000000,  3.229513,          3.070252
20:  1804289383.000000,  3.068038,          2.902788
21:  1804289383.000000,  2.921941,          2.759157
22:  1804289383.000000,  2.789125,          2.634761
23:  1804289383.000000,  2.667859,          2.526086
24:  1804289383.000000,  2.556698,          2.430409
25:  1804289383.000000,  2.454430,          2.345591
26:  1804289383.000000,  2.360029,          2.269926
27:  1804289383.000000,  2.272620,          2.202045
28:  1804289383.000000,  2.191455,          2.140831
29:  1804289383.000000,  2.115888,          2.085369
30:  1804289383.000000,  2.045358,          2.034902
31:  1804289383.000000,  1.989690,          1.988797
32:  1804289383.000000,  1.958762,          1.946523
33:  1804289383.000000,  1.929708,          1.907630
34:  1804289383.000000,  1.902364,          1.871735
35:  1804289383.000000,  1.876582,          1.838510
36:  1804289383.000000,  1.852233,          1.807672
37:  1804289383.000000,  1.829199,          1.778977
38:  1804289383.000000,  1.807378,          1.752213
39:  1804289383.000000,  1.786676,          1.727194
40:  1804289383.000000,  1.767009,          1.703757
41:  1804289383.000000,  1.748302,          1.681758
42:  1804289383.000000,  1.730485,          1.661071
43:  1804289383.000000,  1.713497,          1.641583
44:  1804289383.000000,  1.697281,          1.623195
45:  1804289383.000000,  1.681786,          1.605816
46:  1804289383.000000,  1.666965,          1.589366
47:  1804289383.000000,  1.652774,          1.573775
48:  1804289383.000000,  1.639174,          1.558977
49:  1804289383.000000,  1.626130,          1.544914
50:  1804289383.000000,  1.613608,          1.531532
51:  1804289383.000000,  1.601576,          1.518785
52:  1804289383.000000,  1.590007,          1.506627
53:  1804289383.000000,  1.578875,          1.495021
54:  1804289383.000000,  1.568155,          1.483929
55:  1804289383.000000,  1.557825,          1.473318
56:  1804289383.000000,  1.547864,          1.463158
57:  1804289383.000000,  1.538252,          1.453421
58:  1804289383.000000,  1.528972,          1.444081
59:  1804289383.000000,  1.520006,          1.435114
60:  1804289383.000000,  1.511340,          1.426500
61:  1804289383.000000,  1.502957,          1.418217
62:  1804289383.000000,  1.494845,          1.410247
63:  1804289383.000000,  1.486990,          1.402573
```

Seems to not be insanely bad as a starting point.
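A minimal sketch of the cube-root iteration quoted above. The helper names, the deliberately poor starting value, and the iteration cap are my own assumptions; this is not flint code.

```c
/* One step of the iteration x <- x - (x^3 - a)*x/(2*x^3 + a), which
   simplifies to x*(x^3 + 2a)/(2*x^3 + a).  Hypothetical sketch only. */
static double cbrt_step(double x, double a)
{
    double x3 = x * x * x;
    double t = x3 + x3;                /* the multiplication by 2 done as an addition */
    return x * (x3 + a + a) / (t + a);
}

static double cbrt_sketch(double a)
{
    double x = (a > 1.0) ? a : 1.0;    /* deliberately poor start; the bit hack
                                          above would give a much better one */
    int i;

    for (i = 0; i < 80; i++)           /* far more steps than a good start needs */
        x = cbrt_step(x, a);

    return x;
}
```

With the bootstrap above as the starting value, only a handful of steps would be needed; from this poor start, each early step roughly halves x until it reaches the vicinity of the root.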

### wbhart commented Feb 24, 2015

It should be possible to prove that certain values of small_float work. Code in flint needs to be guaranteed to work, not just heuristically work (modulo unintended coding bugs of course, which we'll never completely eliminate). What you should be able to prove is something about the number of bits it takes to represent the correct root. Once you know that, you have an idea of how small the adjustment needs to be before it can only affect the last bit of the result (i.e. it is smaller than that last bit). We do of course need to think about the case where that last bit could be one off because of this, so there might be a final adjustment missing from the current code.

Bill.

### fredrik-johansson commented Feb 24, 2015

Here is a reference implementation using binary search. It seems to be a bit slower than a call to pow() in the worst case and a bit faster in the best case. It should always be faster if you could figure out a quick way to rule out overflow without umul_ppmm. A proper Newton iteration should be slightly faster, but maybe this is good enough?

```c
mp_limb_t
n_rootrem(mp_limb_t * r, mp_limb_t x, unsigned int k)
{
    mp_limb_t a, b, m, p, hi, lo;
    unsigned int bc;
    int i;

    if (k <= 1 || x <= 1)
    {
        *r = 0;
        return x;
    }

    if (k == 2)
    {
        return n_sqrtrem(r, x);
    }

    if (k >= FLINT_BITS || (UWORD(1) << k) > x)
    {
        *r = x - 1;
        return 1;
    }

    a = 2;
    b = UWORD(1) << ((FLINT_BIT_COUNT(x) + k - 1) / k);

    while (a < b)
    {
        m = a + (b - a) / 2;
        p = m + 1;

        for (i = 1; i < k; i++)
        {
            umul_ppmm(hi, lo, p, m + 1);
            if (hi != 0)
                goto too_large;
            else
                p = lo;
        }

        if (p == x)
        {
            *r = 0;
            return m + 1;
        }
        else if (p > x)
        {
too_large:
            b = m;
        }
        else
        {
            a = m + 1;
        }
    }

    p = a;
    for (i = 1; i < k; i++)
        p *= a;

    *r = x - p;
    return a;
}
```

### fredrik-johansson commented Feb 24, 2015

 Another possibility is to compute x^(1/k) approximately as exp(log(x)/k) using Taylor series for exp and log. This would still just give an approximation, i.e. a final adjustment is necessary, but it might be faster than the libm pow() since we only need 7 digit accuracy in the worst case. This should cost ~25 floating-point multiplications (and about as many additions), and 3 floating-point divisions (2 of which can use precomputed inverses). The code would also be nearly branch free and permit some instruction level parallelism.

### wbhart commented Feb 24, 2015

That doesn't sound too bad at all, though I don't think it's competitive with the cubic iteration I posted (I'm probably miscounting, but from the article it looks like, including the 3-4 bit approximate bootstrap, there are 6 multiplications, 3 divisions (one of which is precomputed) and 10 add/subs). There are also two additional multiplications and some adds to check the result, I guess. But for higher nth roots it looks pretty efficient.

### fredrik-johansson commented Feb 24, 2015

By the way, it's trivial to avoid the extra loop to compute the remainder at the end of the binary search algorithm I posted. That makes it a fair bit faster.

### wbhart commented Feb 24, 2015

The bootstrap algorithm above is a version of the approximate inverse square root code here: http://en.wikipedia.org/wiki/Fast_inverse_square_root (see the section "First approximation of the result" and the end of the section before it).

In our case (53-bit double) B = 2^52 and L = 2^10 - 1. We don't want the inverse square root but the nth root, so we want to compute (1/n) log_2(x). The approximation to that value (with sigma = 0, so that the resulting value is always too large -- required by some of the algorithms that use the bootstrap) is given by

```
Iy = L*B + (Ix - L*B)/n
```

which is precisely what the code computes. We could get a better approximation by tuning the value of sigma, but then the result would sometimes be too low and sometimes too high, which the algorithms that use this don't necessarily want.
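The relation between the bit pattern and log_2 can be checked numerically with a small sketch like the following. The function name is mine, and it uses memcpy rather than the pointer cast so the reinterpretation stays within strict-aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Check of the relation above: for a normalised positive double a, the bit
   pattern Ix read as an integer satisfies Ix/B - L ~= log2(a), where
   B = 2^52 and L = 2^10 - 1 = 1023 (the sigma = 0 case).  Sketch only. */
static double approx_log2(double a)
{
    uint64_t ix;
    const double B = 4503599627370496.0;  /* 2^52 */

    memcpy(&ix, &a, sizeof ix);           /* reinterpret bits, aliasing-safe */
    return (double) ix / B - 1023.0;
}
```

The approximation is exact on powers of two and off by at most about 0.086 elsewhere, which is why the bootstrap gives a usable number of correct leading bits.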
added 5 commits Feb 27, 2015
``` Made n_rootrem using Newton's method more efficient. ```
``` f5e4c6e ```
``` Modified header file ```
``` cefb3b5 ```
``` Added n_rootrem test ```
``` 949bef4 ```
``` Added documentation for n_rootrem, edited documentation of n_sqrtrem ```
``` 75a9dce ```
``` Removed old n_rootrem function ```
``` 689d4c8 ```

### kush789 commented Feb 27, 2015

Hi, sorry for the delayed follow up; I had exams from Monday till this morning.

I worked on the suggestions given. I tried various combinations of the bootstrap code with different numbers of Newton iterations. The best speed came from using the bootstrap code to get an estimate, followed by one Newton iteration to get a closer value, and then simply increasing/decreasing it until the correct result. The bootstrap estimate is only far off for small values of root (to be precise, only 2 and 3), and since we are using sqrtrem for root = 2, I guess the first estimate isn't far off.

I removed the pow() calls; they were certainly the reason for the inefficiency. What I hadn't realized earlier was that floating point precision isn't really required for this function, as we set base to the integral part of n^(1/root) anyway. The answers are correct when compared to mpz_rootrem.

The code snippet Fredrik posted above has better speeds than n_rootrem for root <= 6 (by a factor of 2), but becomes slower after that.

Also, I was thinking that we could have a separate function which checks for an exact root. That would be better for the perfect power testing function; we don't really need the remainder there.
These are the times I recorded on my machine (looping n from 1 to 1 000 000):

```
root | n_rootrem    | mpz_rootrem
   1 | 0.0158970000 | 0.0621330000
   2 | 0.0232890000 | 1.1546490000
   3 | 0.2007970000 | 1.2319320000
   4 | 0.2121500000 | 1.0352970000
   5 | 0.2018700000 | 0.8864130000
   6 | 0.2341120000 | 0.8323570000
   7 | 0.2186210000 | 0.6710600000
   8 | 0.2282150000 | 0.7152760000
   9 | 0.2234660000 | 0.5983430000
  10 | 0.2449520000 | 0.4469660000
  11 | 0.2389620000 | 0.4202950000
  12 | 0.2555650000 | 0.4399070000
  13 | 0.3040590000 | 0.4715030000
  14 | 0.2857570000 | 0.4838650000
  15 | 0.2597950000 | 0.4629330000
  16 | 0.2469840000 | 0.4150980000
  17 | 0.2121830000 | 0.3738700000
  18 | 0.1592150000 | 0.3402840000
  19 | 0.0541930000 | 0.2693090000
  20 | 0.0473350000 | 0.1194350000
  21 | 0.0484500000 | 0.1180460000
  22 | 0.0459520000 | 0.1184810000
  23 | 0.0491670000 | 0.1191340000
  24 | 0.0463980000 | 0.1187060000
  25 | 0.0479300000 | 0.1150740000
  26 | 0.0455520000 | 0.1235550000
  27 | 0.0476480000 | 0.1223900000
  28 | 0.0476310000 | 0.1233700000
  29 | 0.0472150000 | 0.1229540000
  30 | 0.0463600000 | 0.1205380000
  31 | 0.0467580000 | 0.1180830000
  32 | 0.0484000000 | 0.1161420000
  33 | 0.0471450000 | 0.1196790000
  34 | 0.0458660000 | 0.1204910000
  35 | 0.0445870000 | 0.1194120000
  36 | 0.0502830000 | 0.1171860000
  37 | 0.0488970000 | 0.1166440000
  38 | 0.0500780000 | 0.1188190000
  39 | 0.0469700000 | 0.1180660000
  40 | 0.0509750000 | 0.1208670000
  41 | 0.0492990000 | 0.1225280000
  42 | 0.0482460000 | 0.1205570000
  43 | 0.0450340000 | 0.1199850000
  44 | 0.0455790000 | 0.1237130000
  45 | 0.0471540000 | 0.1217970000
  46 | 0.0535110000 | 0.1171140000
  47 | 0.0492430000 | 0.1215960000
  48 | 0.0493000000 | 0.1198310000
  49 | 0.0498940000 | 0.1217440000
  50 | 0.0472360000 | 0.1192850000
  51 | 0.0455480000 | 0.1212650000
  52 | 0.0488590000 | 0.1206530000
  53 | 0.0480320000 | 0.1193290000
  54 | 0.0464660000 | 0.1228550000
  55 | 0.0470500000 | 0.1194870000
  56 | 0.0465240000 | 0.1188190000
  57 | 0.0485680000 | 0.1193820000
  58 | 0.0458630000 | 0.1178250000
  59 | 0.0494700000 | 0.1213760000
  60 | 0.0460460000 | 0.1218320000
  61 | 0.0489670000 | 0.1212340000
  62 | 0.0462610000 | 0.1189640000
  63 | 0.0495940000 | 0.1197950000
  64 | 0.0482140000 | 0.1199730000
```

### wbhart commented Mar 1, 2015

Hi Kushagra,

It sounds like we are getting somewhere with this. Thanks for the hard work
on it.

I have been travelling the last few days and so now have a big overflow of
work in my inbox. It may take a few days for me to really look at this
properly. But I will get back to you on it soon, if Fredrik doesn't beat me
to it.

Bill.


added 6 commits Mar 3, 2015
``` Changed documentation ```
``` a3b6bc2 ```
``` Added cube root function in ulong_extras ```
``` 6c19ead ```
``` Added test for n_cbrt. ```
``` 2cd4cec ```
``` Added docimentation for n_cbrt. ```
``` 2e0b097 ```
``` Changed n_rootrem to get better intial estimates. ```
``` e55c763 ```
``` Added n_cbrt to the header file ulong_extras.h ```
``` 4b0c859 ```

### kush789 commented Mar 3, 2015

If the n_rootrem function seems OK, should I go ahead and implement the same in the fmpz module? As you said, we can call this function when n is small and otherwise make a call to gmp_rootrem().

### kush789 commented Mar 3, 2015

I got these timings on my machine while looping n from 1 to 1 000 000 000:

```
pow(n, 1./3.)  took 152.069542 seconds to execute
n_cbrt(n)      took 102.191376 seconds to execute
```

### fredrik-johansson commented Mar 3, 2015

 You should be testing and profiling input all the way up to 2^64-1. It's also useful to generate random input of the form x^n, x^n + 1, x^n - 1.
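Test input of that shape can be generated with a small helper like the following. The function names are hypothetical, just to illustrate the suggestion.

```c
#include <stdint.h>

/* upow: x^n by repeated multiplication; the caller must pick (x, n) so the
   product fits in 64 bits.  Sketch to illustrate the testing suggestion. */
static uint64_t upow(uint64_t x, unsigned int n)
{
    uint64_t p = 1;
    while (n--)
        p *= x;
    return p;
}

/* boundary_input: a value of the form x^n, x^n + 1 or x^n - 1
   (kind = 0, 1, 2), the cases where an integer n-th root routine is most
   likely to be off by one. */
static uint64_t boundary_input(uint64_t x, unsigned int n, int kind)
{
    uint64_t p = upow(x, n);
    return (kind == 1) ? p + 1 : (kind == 2) ? p - 1 : p;
}
```

In a test loop, x would be drawn at random subject to x^n fitting in a limb, and the expected root/remainder checked against mpz_rootrem.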
added 2 commits Mar 3, 2015
``` Stronger tests for n_rootrem and n_cbrt ```
``` 98d92d7 ```
``` Fixed minor bug in n_rootrem and n_cbrt. ```
``` e66a0f4 ```

### wbhart commented Mar 16, 2015

Did you mean ulong_extras.h instead of ulongextras.h? Also, you need to include ulong_extras.h in the files from which this function is called, not just in the implementation .c file for the function itself.

Bill.

On 16 March 2015 at 08:21, Kushagra Singh wrote:

> Yes, the header file has FLINT_DLL double n_cbrt_estimate(double a); the file containing this function has both longlong.h and ulongextras.h. The weird part is that when I copy the same function into a C file and call it in main from that file, it returns a correct value. I am totally confused about this.

### kush789 commented Mar 16, 2015

 Yes, sorry that was a typo.

### wbhart commented Mar 16, 2015

 If my amended comment is no help, can you commit what you have and add it to the pull request so I can take a look.

### wbhart commented Mar 16, 2015

 Also check you are including flint.h in cbrt_estimate.c.

### wbhart commented Mar 16, 2015

 Something you can also try, though it isn't a fix, is to put the prototype for n_cbrt_estimate in the files where it is called. If that works, then somehow, it isn't picking up the prototype in ulong_extras.h. I don't see how. But that's the only possible diagnosis I can think of.

### kush789 commented Mar 16, 2015

 Adding a prototype did work. Do you know why it might not be picking it up?

### wbhart commented Mar 16, 2015

 No idea unfortunately. Can you push your code and I'll try to figure it out.

### kush789 commented Mar 16, 2015

I'll make the other changes and do it.
``` Commit summary: ```
``` 8f3dc1c ```
```- Added seperate function n_root_estimate and cbrt_estimate.
- Removed duplicated estimate codes.
- Renamed files as per convention.```

### kush789 commented Mar 16, 2015

I have pushed the code. I made one more observation: when called inside n_cbrt_newton_iteration() and in n_cbrt(), the estimate that n_cbrt_estimate() returns is correct. Surprisingly, however, if I call n_cbrt_estimate() as a normal function from a main() in a new file, the returned value is wrong. So although the cube root functions are not affected, n_cbrt_estimate(), if called elsewhere, will give wrong answers.

### wbhart commented Mar 16, 2015

 And the file in which you have your main function #includes ulong_extras.h and flint.h? And the location you are specifying for the includes when you compile it is the correct one (the flint source tree, not the installed location of flint, e.g. maybe you need to do make install?)

### kush789 commented Mar 16, 2015

I kept all these things in mind while I was checking (header files and make install).

### wbhart commented Mar 16, 2015

 Can you email a copy of your external program that you are using to check this and I'll see if I can replicate it.

### wbhart commented Mar 16, 2015

 Well, I am personally ready to sign off on this code. Only Fredrik needs to agree and it can be merged. I'm pretty sure the issue with the prototype is a compilation issue. I don't think anything is wrong in the code you've written for flint itself. We can keep trying to figure it out. Try deleting every ulong_extras.h you can find on your system, except the one you think you are compiling against. Check that ulong_extras.h definitely has the right prototype in it. Recompile your external program and see if anything shakes loose.

### wbhart commented Mar 16, 2015

My version of gcc gives a few compilation warnings, which are worth fixing:

```
In file included from root_estimate.c:33:0:
root_estimate.c: In function ‘n_root_estimate’:
/home/wbhart/flint2/flint.h:95:15: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
 #define slong mp_limb_signed_t
               ^
root_estimate.c:114:5: note: in expansion of macro ‘slong’
     slong s = ((1 << 10) - 1);
     ^
root.c: In function ‘n_root’:
root_estimate.c:109:22: warning: unused variable ‘ret’ [-Wunused-variable]
     ulong i, hi, lo, ret;
                      ^
root.c:109:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
     const mp_limb_t upper_limit = max_base[root]; /* n <= upper_limit^root */
     ^
rootrem.c: In function ‘n_rootrem’:
rootrem.c:117:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
     const mp_limb_t upper_limit = max_base[root]; /* n <= upper_limit^root */
     ^
```

Some of the mixed-declaration ones might be fun to fix. All the variables need to be declared at the top of the function in C90; you can't insert other statements between variable declarations.
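For illustration, the C90 rule those warnings complain about looks like this in practice. This is a hypothetical function, not the flint code.

```c
/* C90 requires every declaration in a block to precede the first statement
   of that block.  Hypothetical example of the warned-about rule. */
static int sum_first(const int *tab, int n)
{
    int total = 0;   /* declarations first ... */
    int i;

    if (n <= 0)      /* ... statements only afterwards */
        return 0;

    for (i = 0; i < n; i++)
        total += tab[i];

    return total;
}
```

Declaring `upper_limit` before the earlier statements (or dropping the initializer and assigning later) is the usual way to silence the -Wpedantic warning.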
``` Cleaned up code, changed code which gcc didn't like. ```
``` 7fe5e22 ```

### kush789 commented Mar 16, 2015

I have changed the code which was giving warnings. In some cases assignment of a constant before other code was not possible, for example:

```
rootrem.c: In function ‘n_rootrem’:
rootrem.c:117:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
     const mp_limb_t upper_limit = max_base[root]; /* n <= upper_limit^root */
     ^
```

since there was an upper bound on root which was being checked earlier on. So I removed the const keyword. Make check looks cleaner now.

### wbhart commented Mar 16, 2015

 Thanks Kushagra. I'm completely happy with this code now. It's over to Fredrik now.

### fredrik-johansson commented Mar 16, 2015

 I should have time to review this comprehensively later in the week.

### kush789 commented Mar 29, 2015

 Fredrik sir, were you able to have a look at the commits recently?
added 2 commits Apr 4, 2015
``` No need to calculate constants. ```
``` 237e573 ```
``` Minor changes. ```
``` cce6b50 ```

### kush789 commented Apr 4, 2015

I noticed that in one place we were calculating constants. I have changed that code, as it is not necessary to do so.

### fredrik-johansson commented Apr 4, 2015

Regarding the last commits: this is not necessary, since the compiler can fold such constants. Also, you need to use UWORD() for the large literal. Why are s and uword_val slong and not ulong?

### fredrik-johansson commented Apr 4, 2015

 There is still a spelling error in the function name n_cbrt_chebyshef_approx
added 2 commits Apr 4, 2015
``` Fixed chebyshev spelling. ```
``` b393994 ```
``` Used UWORD while declaring large integers. ```
``` 426cbbf ```

### kush789 commented Apr 4, 2015

There was no particular reason for s and uword_val to be slong; I have changed them, and fixed the spelling of chebyshev as well.
added a commit that referenced this pull request May 4, 2015
``` Merge pull request #119 from kush789/trunk ```
``` 40c26d4 ```
`rootrem funtctions in ulong_extras module`
merged commit `40c26d4` into wbhart:trunk May 4, 2015