some question about rotmg #1452

Closed
wxcstc opened this issue Feb 8, 2018 · 42 comments · Fixed by #1454

Comments

@wxcstc

wxcstc commented Feb 8, 2018

Well, when I used the rotmg function, the same input gave different results with cublas and openblas.
The input is as follows:
d1=5.9e-8
d2=5.960464e-8
x1=1.0
y1=150.0
param[5]={0.0, 0.0, 0.0, 0.0, 0.0}

And the output of openblas:
d1=0.999956
d2=0.989812
x1=0.036623
y1=150.0
param[5]={-1.0, 0.000002, -0.000244, 0.000244, 0.000002}

And the output of cublas:
d1=0.999956
d2=0.989812
x1=0.036623
y1=150.0
param[5]={-1.0, 0.000002, -0.000244, 1.0, 0.000002}

I referred to some papers to understand how the function is implemented, and I found that in some situations this function performs a scaling operation. It seems that cublas does not have this operation? Is it a bug in cublas? I think it is, but I'm not sure.

@martin-frbg
Collaborator

Ahem, did you notice the reply to your identical message in the nvidia cuda forum where the moderator asked you for a standalone test case to replicate the issue ?

@martin-frbg
Collaborator

On the other hand, a testcase with your parameters linked against the netlib reference BLAS does give the same result as you got with cublas, so you may be onto something.

@martin-frbg
Collaborator

The changed result comes from 025fc91 in May 2014, which must have been #365. There, the problem was repeated execution of a code segment based on the flag value set by its first invocation; now the conditional added to prevent this also seems to block a valid action in the first pass. Perhaps what is needed is a second flag, or perhaps I fail to understand the logic - this function was rewritten from old fortran-style code that jumped back and forth, and neither the old nor the new version is easy to read.
Pinging @btracey as he suggested the fix that seems to cause problems now.

@martin-frbg
Collaborator

I further notice that ATLAS - which was given as reference in #365 - seems to avoid setting any flags within while() loops, and indeed the scaling there is done in dedicated inner loops as needed, rather than looping over the entire logic...

@brada4
Contributor

brada4 commented Feb 8, 2018

Sorry, but are you using SROTMG or DROTMG? With the former, the inputs look marginal with respect to precision and may overflow something. There is trivial geometry behind this function. Can you try netlib BLAS, like the one built by your linux distro, for the ultimate truth?

@martin-frbg
Collaborator

martin-frbg commented Feb 8, 2018

@brada4 drotmg, I already compared with netlib above (which agrees with cublas but not openblas). The supposed "fix" from 365 that broke it looks more like a bad workaround for a fundamental problem: IMHO rotmg should have "if" where it has "while" and do the while loop for scaling inside that "if" after reacting to (and resetting) dflag. What it does now is potentially flip dflag around with every iteration of the rescaling, which simply does not make sense.
I will do a PR tomorrow or over the weekend as time permits.

@brada4
Contributor

brada4 commented Feb 8, 2018

Yes, it looks like manual f2c from reference function...

@wxcstc
Author

wxcstc commented Feb 10, 2018

@martin-frbg @brada4 I don't think the cause is dflag being set inside the while loop. In my understanding, if there are multiple cycles, dflag and the other variables are changed in the first cycle. In the next cycle dflag is already -1, so it will not enter the branch that sets dflag and the other variables. So using if or while to set dflag and the other variables doesn't affect the final result. What do you think?

@martin-frbg
Collaborator

The code for "in next cycle, dflag has been -1, it will not go into the branch" was added in #365 to prevent this unwanted side effect of cycling. The problem with it is that in your case, dflag is -1 from the start and the added code completely prevents it from changing the variables accordingly. My change separates the scaling cycle from the dflag setting code.

@wxcstc
Author

wxcstc commented Feb 11, 2018

OK, I now understand how this result occurred. Now I have another point of confusion: why should dflag be checked twice? Can I extract the two checks and put them in front of if(*dd1 != ZERO)? I am not very clear about the principle. With reference to the paper "Basic Linear Algebra Subprograms for Fortran Usage" http://www.cs.utexas.edu/users/kincaid/blas.pdf, I can understand most of the code, but there are still parts I do not understand. I did not find a statement there that supports the flag checks. Maybe my English is poor 😔. Can you answer my doubts? Thank you very much~😀

@kortschak
Contributor

I don't think this is completely fixed.

In attempting to fix our implementation I find that the whole situation is SNAFU; to get my head around it, I have done a literal translation of the FORTRAN77 code here rendered as this Go code. This fails to agree with my translation of the current NETLIB implementation here rendered as this Go code under our test suite:

--- FAIL: TestDrotmg (0.00s)
	level1double.go:2007: drotmg/drotmg_f77 H mismatch RD1_Big_RD2_Big_Flag_0: expected -1 [4096 -3584 1792 4096], found -1 [4096 -4096 1 4096]
	level1double.go:2007: drotmg/drotmg_f77 H mismatch RD1_Big_RD2_Big_Flag_1: expected -1 [2340.5714285714284 -4096 4096 4681.142857142857], found -1 [2340.5714285714284 -4096 1 4681.142857142857]
	level1double.go:2007: drotmg/drotmg_f77 H mismatch D1_Big_D2_Small_Flag_1: expected -1 [2.8671999999999997e-26 -0.000244140625 4096 2.44140625e-16], found -1 [2.8671999999999997e-26 -0.000244140625 1 2.44140625e-16]
	level1double.go:2007: drotmg/drotmg_f77 H mismatch D1_Small_D2_Big_Flag_1: expected -1 [2.3731773997569866e+10 -1.6777216e+07 0.000244140625 1.6777216e-07], found -1 [2.3731773997569866e+10 -4096 1 1.6777216e-07]
	level1double.go:2007: drotmg/drotmg_f77 H mismatch OpenBLAS#1452: expected -1 [1.6110934624105326e-06 -0.000244140625 0.000244140625 1.6276041666666668e-06], found -1 [1.6110934624105326e-06 -0.000244140625 1 1.6276041666666668e-06]

This makes the whole situation fairly confusing, given that neither of these actually satisfies the documentation's claims AFAICS (though the documentation is unclear and seems to be incorrect - viz. the reference to DY2, which does not exist - though using DY1 does not help).

The situation is worse with the OpenBLAS implementation used as a CGO back-end for Go tests (some of these appear to be simple scaling differences, but others not):

--- FAIL: TestDrotmg (0.00s)
	level1double.go:1994: drotmg/drotmg_f77 rd1 mismatch RD1_Big_RD2_Big_Flag_0: expected 68.96627824858757, found 1.1570621468926554e+09
	level1double.go:1997: drotmg/drotmg_f77 rd2 mismatch RD1_Big_RD2_Big_Flag_0: expected 34.483139124293785, found 5.785310734463277e+08
	level1double.go:2000: drotmg/drotmg_f77 rx1 mismatch RD1_Big_RD2_Big_Flag_0: expected 45312, found 11.0625
	level1double.go:2007: drotmg/drotmg_f77 H mismatch RD1_Big_RD2_Big_Flag_0: expected -1 [4096 -3584 1792 4096], found -1 [1 -1 1 1]
	level1double.go:1994: drotmg/drotmg_f77 rd1 mismatch RD1_Big_RD2_Big_Flag_1: expected 57.6914092640818, found 9.679012345679014e+08
	level1double.go:1997: drotmg/drotmg_f77 rd2 mismatch RD1_Big_RD2_Big_Flag_1: expected 28.8457046320409, found 4.839506172839507e+08
	level1double.go:2000: drotmg/drotmg_f77 rx1 mismatch RD1_Big_RD2_Big_Flag_1: expected 47396.57142857142, found 11.57142857142857
	level1double.go:2007: drotmg/drotmg_f77 H mismatch RD1_Big_RD2_Big_Flag_1: expected -1 [2340.5714285714284 -4096 4096 4681.142857142857], found -1 [0.5714285714285714 -1 1 1.1428571428571428]
	level1double.go:1994: drotmg/drotmg_f77 rd1 mismatch RD1_Big_RD2_Med_Flag_0: expected 1.1920927762985347, found 1.9999998000000197e+07
	level1double.go:2000: drotmg/drotmg_f77 rx1 mismatch RD1_Big_RD2_Med_Flag_0: expected 32768.0032768, found 8.0000008
	level1double.go:2007: drotmg/drotmg_f77 H mismatch RD1_Big_RD2_Med_Flag_0: expected -1 [4096 -1 0.0004096 1], found -1 [1 -1 1e-07 1]
	level1double.go:1994: drotmg/drotmg_f77 rd1 mismatch RD1_Big_RD2_Med_Flag_1: expected 1192.0928955078125, found 2e+10
	level1double.go:2000: drotmg/drotmg_f77 rx1 mismatch RD1_Big_RD2_Med_Flag_1: expected 3.2768e+14, found 8e+10
	level1double.go:2007: drotmg/drotmg_f77 H mismatch RD1_Big_RD2_Med_Flag_1: expected -1 [4.096e-17 -1 4096 1e-10], found -1 [1e-20 -1 1 1e-10]
	level1double.go:1994: drotmg/drotmg_f77 rd1 mismatch D1_Big_D2_Small_Flag_1: expected 119.20928955078125, found 2e+09
	level1double.go:2000: drotmg/drotmg_f77 rx1 mismatch D1_Big_D2_Small_Flag_1: expected 3.2768e+10, found 8e+06
	level1double.go:2007: drotmg/drotmg_f77 H mismatch D1_Big_D2_Small_Flag_1: expected -1 [2.8671999999999997e-26 -0.000244140625 4096 2.44140625e-16], found -1 [6.999999999999999e-30 -0.000244140625 1 2.44140625e-16]
	level1double.go:1997: drotmg/drotmg_f77 rd2 mismatch RD1_Med_RD2_Big_Flag_0: expected 1191.9736981379988, found 1.9998000199980003e+10
	level1double.go:2007: drotmg/drotmg_f77 H mismatch RD1_Med_RD2_Big_Flag_0: expected -1 [1 -0.0004096 1000 4096], found -1 [1 -1e-07 1000 1]
	level1double.go:1997: drotmg/drotmg_f77 rd2 mismatch D1_Med_D2_Big_Flag_1: expected 1192.092835903171, found 1.9999999000000053e+10
	level1double.go:2007: drotmg/drotmg_f77 H mismatch D1_Med_D2_Big_Flag_1: expected -1 [50 -4096 1 4.096e-06], found -1 [50 -1 1 1e-09]
	level1double.go:1997: drotmg/drotmg_f77 rd2 mismatch D1_Small_D2_Big_Flag_1: expected 216.1836123957717, found 6.085027726432532e+16
	level1double.go:2007: drotmg/drotmg_f77 H mismatch D1_Small_D2_Big_Flag_1: expected -1 [2.3731773997569866e+10 -1.6777216e+07 0.000244140625 1.6777216e-07], found -1 [2.3731773997569866e+10 -1 1 1e-14]
	level1double.go:2007: drotmg/drotmg_f77 H mismatch OpenBLAS#1452: expected -1 [1.6110934624105326e-06 -0.000244140625 0.000244140625 1.6276041666666668e-06], found -1 [1.6110934624105326e-06 -0.000244140625 1 1.6276041666666668e-06]

Note that in all cases the mismatched H is in the DFLAG=-1 state, so all elements must satisfy the check.

@martin-frbg martin-frbg reopened this Feb 17, 2018
@martin-frbg
Collaborator

Reopening, but for me personally a C testcase would be far preferable to any transliteration into Go. Actually I was fairly confident that my simple fix was consistent with what ATLAS uses, but I may certainly have missed something.

@kortschak
Contributor

What troubles me is that the two Fortran implementations disagree, and that their documentation is incorrect in at least two places, making it difficult to define a proper set of tests.

If/when I get it sorted out in Go, I'm happy to send a C rendition.

@btracey
Contributor

btracey commented Feb 17, 2018

Sorry, I'm on travel at the moment, but ATLAS had a different open bug (in my opinion) in this function. I'll find the report and check if it's still present when I get a chance

@MigMuc

MigMuc commented Feb 18, 2018

The solution is just to replace the if-statement in line 139 with the original while-statement from before.
The real issue was that in the Fortran reference function there never was a check for dflag == ONE.
I checked the input arguments given above with the Fortran reference function, which gives the same result as the cublas version, as does the current openblas version with the if changed back to while.

@kortschak
Contributor

As another data point, the OpenBLAS/netlib-lapack/BLAS/TESTING fails for DROTMG at 0391c07:

 Real BLAS Test Program Results


 Test of subprogram number  1             DDOT 
                                    ----- PASS -----

 Test of subprogram number  2            DAXPY 
                                    ----- PASS -----

 Test of subprogram number  3            DROTG 
                                    ----- PASS -----

 Test of subprogram number  4             DROT 
                                    ----- PASS -----

 Test of subprogram number  5            DCOPY 
                                    ----- PASS -----

 Test of subprogram number  6            DSWAP 
                                    ----- PASS -----

 Test of subprogram number  7            DNRM2 
                                    ----- PASS -----

 Test of subprogram number  8            DASUM 
                                    ----- PASS -----

 Test of subprogram number  9            DSCAL 
                                    ----- PASS -----

 Test of subprogram number 10            IDAMAX
                                    ----- PASS -----

 Test of subprogram number 11            DROTMG
                                       FAIL

 CASE  N INCX INCY  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

   11  6 9999 9999  1                      0.26666667D+11                      0.15894572D+04  0.2667D+11  0.1589D+04
   11  6 9999 9999  3                      0.15000000D-04                      0.61440000D-01 -0.6143D-01  0.6144D-01
   11  6 9999 9999  6                      0.10000000D+01                      0.40960000D+04 -0.4095D+04  0.4096D+04
   11  6 9999 9999  8                      0.50000000D-06                      0.20480000D-02 -0.2047D-02  0.2048D-02
   11  8 9999 9999  2                      0.13333333D+11                      0.79472860D+03  0.1333D+11  0.7947D+03
   11  8 9999 9999  7                     -0.10000000D+01                     -0.40960000D+04  0.4095D+04 -0.4096D+04
   11  8 9999 9999  9                      0.10000000D-05                      0.40960000D-02 -0.4095D-02  0.4096D-02

 Test of subprogram number 12            DROTM 
                                    ----- PASS -----

 Test of subprogram number 13            DSDOT 
                                    ----- PASS -----

@martin-frbg
Collaborator

Seems to me the actual problem with my PR was that I copy-pasted the wrong conditional for rescaling dd1,dd2 against GAMSQ, so that the while loop would never be invoked. With the new fix, both the original test case (to be added to the utests once I get those to build on all platforms again) and the netlib test pass.

@kortschak
Contributor

Thank you for pushing that fix. Our implementation now disagrees in only one of our test cases by a value that means it should be straightforward to track down.

@martin-frbg
Collaborator

I only pushed that as it was certain to be better than what was there before. It also passes the rotmg test from ATLAS, so if your test case is small enough it would perhaps make sense to keep it available for regression testing once this is solved for good.

@kortschak
Contributor

Yes, I will ping this when I understand what is going on with that case. Thanks.

@martin-frbg
Collaborator

martin-frbg commented Feb 18, 2018

WRT the possible bug in ATLAS that btracey mentioned, the only entry I can find is
https://sourceforge.net/p/math-atlas/support-requests/932/ from 2014 that had no clear solution (though somewhat later an erratum for 3.10.2 was published, noting "Error in ROTMG causes failures in modern lapack tests" without any details). So far I have only checked that current develop passes the first of the four tests included with that ticket.

@kortschak
Contributor

kortschak commented Feb 18, 2018

OK, I have found where the difference is between our implementations.

In the scaling checks in NETLIB 3.8.0 (and the Gonum code) the form is

if d != 0:
    while d not in range:
        update flag and H details
        if too low:
            scale up
        else:
            scale down

as seen here

 *     PROCEDURE..SCALE-CHECK
          IF (dd1.NE.zero) THEN
             DO WHILE ((dd1.LE.rgamsq) .OR. (dd1.GE.gamsq))
                IF (dflag.EQ.zero) THEN
                   dh11 = one
                   dh22 = one
                   dflag = -one
                ELSE
                   dh21 = -one
                   dh12 = one
                   dflag = -one
                END IF
                IF (dd1.LE.rgamsq) THEN
                   dd1 = dd1*gam**2
                   dx1 = dx1/gam
                   dh11 = dh11/gam
                   dh12 = dh12/gam
                ELSE
                   dd1 = dd1/gam**2
                   dx1 = dx1*gam
                   dh11 = dh11*gam
                   dh12 = dh12*gam
                END IF
             ENDDO
          END IF
 
          IF (dd2.NE.zero) THEN
             DO WHILE ( (dabs(dd2).LE.rgamsq) .OR. (dabs(dd2).GE.gamsq) )
                IF (dflag.EQ.zero) THEN
                   dh11 = one
                   dh22 = one
                   dflag = -one
                ELSE
                   dh21 = -one
                   dh12 = one
                   dflag = -one
                END IF
                IF (dabs(dd2).LE.rgamsq) THEN
                   dd2 = dd2*gam**2
                   dh21 = dh21/gam
                   dh22 = dh22/gam
                ELSE
                   dd2 = dd2/gam**2
                   dh21 = dh21*gam
                   dh22 = dh22*gam
                END IF
             END DO
          END IF

The OpenBLAS code does this instead:

if d != 0:
    if d not in range:
        update flag and H details
        if too low:
            while too low:
                scale up
        else:
            while too high:
                scale down

The difference stems from the fact that the NETLIB code resets H elements to unit values on each iteration of the outer loop, while this does not happen in the OpenBLAS code. Which is correct? Since the splitting of flag/H updates and scaling is a recent addition in OpenBLAS, I suspect the NETLIB way is the right approach (but have no other basis for this claim - and the NETLIB operations seem intuitively odd).
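To make the difference concrete, here is a minimal Python sketch of the two structures for the d1 check only. It is modelled on the pseudocode above, not on the actual NETLIB or OpenBLAS sources; the function names and the starting values are made up for illustration, and it assumes d1 is positive and finite. The input is chosen so that two scaling steps are needed, which is where the two structures diverge.

```python
GAM = 4096.0
GAMSQ = GAM * GAM      # 16777216.0
RGAMSQ = 1.0 / GAMSQ


def _scale_once(d1, x1, h11, h12):
    """One rescaling step for d1; returns the updated (d1, x1, h11, h12)."""
    if d1 <= RGAMSQ:
        return d1 * GAMSQ, x1 / GAM, h11 / GAM, h12 / GAM
    return d1 / GAMSQ, x1 * GAM, h11 * GAM, h12 * GAM


def _set_units(flag, h11, h21, h12, h22):
    """Write the implied unit elements into H and switch to flag = -1."""
    if flag == 0.0:
        h11, h22 = 1.0, 1.0
    else:
        h21, h12 = -1.0, 1.0   # also taken when flag is already -1
    return -1.0, h11, h21, h12, h22


def scale_d1_looped(d1, x1, flag, h):
    """Looped style: the flag/H update sits inside the scaling loop."""
    h11, h21, h12, h22 = h
    while d1 != 0.0 and (d1 <= RGAMSQ or d1 >= GAMSQ):
        flag, h11, h21, h12, h22 = _set_units(flag, h11, h21, h12, h22)
        d1, x1, h11, h12 = _scale_once(d1, x1, h11, h12)
    return d1, x1, flag, (h11, h21, h12, h22)


def scale_d1_split(d1, x1, flag, h):
    """Split style: update flag/H once, then loop over the scaling only."""
    h11, h21, h12, h22 = h
    if d1 != 0.0 and (d1 <= RGAMSQ or d1 >= GAMSQ):
        flag, h11, h21, h12, h22 = _set_units(flag, h11, h21, h12, h22)
        while d1 <= RGAMSQ or d1 >= GAMSQ:
            d1, x1, h11, h12 = _scale_once(d1, x1, h11, h12)
    return d1, x1, flag, (h11, h21, h12, h22)


# d1 = 2**49 needs two divisions by GAMSQ = 2**24 to land in range.
# H is given as (h11, h21, h12, h22); with flag = 0 only h21 and h12 matter.
start = (2.0 ** 49, 1.0, 0.0, (0.0, -0.25, 0.5, 0.0))
looped = scale_d1_looped(*start)
split = scale_d1_split(*start)
print(looped[3])  # (16777216.0, -1.0, 4096.0, 1.0)
print(split[3])   # (16777216.0, -0.25, 8388608.0, 1.0)
```

In the looped version the second pass overwrites h21 and h12 with -1 and 1 before scaling, so the previously computed values are lost; the split version keeps them and only multiplies by gam.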

The test case we have that shows this difference is (with our expectations):

        {
                Name: "D1_Small_D2_Big_Flag_1",
                P: &blas.DrotmParams{
                        Flag: blas.Rescaling,
                        H:    [4]float64{2.3731773997569866e+10, -4096, 1, 1.6777216e-07},
                },
                D1:  120000000000000000,
                D2:  0.000000000012345,
                X1:  0.08,
                Y1:  8000000000000,
                Rd1: 0.00010502490698765249,
                Rd2: 216.1836123957717,
                Rx1: 3.8516669198055897e+09,
        },

@kortschak
Contributor

kortschak commented Feb 19, 2018

Having now put in a drotm check to our tests, I'm pretty confident that the OpenBLAS/ATLAS approach (which as said above looks intuitively more likely) is correct (cf. the NETLIB code). There is still a failing case from our test suite, but I will look into that further (drotm fails to zero out the y1):

        {
                Name: "RD1_Big_RD2_Big_Flag_0",
                P: &blas.DrotmParams{
                        Flag: blas.Rescaling,
                        H:    [4]float64{4096, -4096, 1, 4096},
                },
                D1:  1600000000,
                D2:  800000000,
                X1:  8,
                Y1:  7,
                Rd1: 68.96627824858757,
                Rd2: 34.483139124293785,
                Rx1: 45312,
        },

@wxcstc
Author

wxcstc commented Feb 23, 2018

So, in the rescaling situation when flag is -1, dh21 is set to -1 and dh12 is set to 1; I just think that is unreasonable. And after the recent commit, I think there is still a difference between openblas and netlib: if d1 is small enough to need scaling twice, the results will be different. I'm not sure about this; I have not constructed data for this situation, as there are some difficulties.

@kortschak
Contributor

kortschak commented Feb 28, 2018

We have fixed our Go implementation by porting the code in the appendix of this paper. We're testing against the documented behaviour rather than only a set of golden values, so we are confident that it is correct. Using OpenBLAS as a backend gives a number of failures against our golden values, but more significantly fails to provide values that zero out y1 for the following input to ROTMG: D1=1600000000, D2=800000000, X1=8, Y1=7.

level1double.go:2119: drotm y_0 mismatch RD1_Big_RD2_Big_Flag_0: expected 0, found -24052.672896090524

(the drotm test here checks that the rotation in H returned by ROTMG zeros out the second element for [x1;y1]).
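The check itself is easy to reproduce by hand. The following Python sketch applies a full H (flag = -1) to [x1; y1] for the two H matrices reported for RD1_Big_RD2_Big_Flag_0 in the first log above. The helper name is made up, and the column-major element order (h11, h21, h12, h22) is an assumption, though it is consistent with the expected rx1 of 45312.

```python
import math


def apply_h(h, x, y):
    """Apply the full 2x2 matrix H (flag = -1 state) to the pair (x, y)."""
    h11, h21, h12, h22 = h  # assumed column-major order
    return h11 * x + h12 * y, h21 * x + h22 * y


x1, y1 = 8.0, 7.0
expected_h = (4096.0, -3584.0, 1792.0, 4096.0)  # "expected" H in the logs above
found_h = (4096.0, -4096.0, 1.0, 4096.0)        # "found" H in the first log above

print(apply_h(expected_h, x1, y1))  # (45312.0, 0.0)
print(apply_h(found_h, x1, y1))     # (32775.0, -4096.0)

# Inference, not stated in the thread: weighting the nonzero y by sqrt(rd2)
# appears to reproduce the reported y_0 of about -24052.67.
rd2 = 34.483139124293785
weighted_y = math.sqrt(rd2) * apply_h(found_h, x1, y1)[1]
print(weighted_y)
```

Only the expected H zeroes the second component and reproduces the expected rx1.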

@martin-frbg
Collaborator

martin-frbg commented Feb 28, 2018

I think I am getting confused, but from the use of D1, D2 this is still ROTMG in terms of BLAS function naming rather than ROTM ? The OpenBLAS implementation in interface/rotmg.c does not attempt to modify y1 anywhere, and neither does its FORTRAN counterpart in netlib (which has DY1 as "param[in]" unlike the DX1 that is "param[in,out]" in the explanation of the arguments at the top of the function). But as I said I am getting confused...

@vladimir-ch
Contributor

The issue is in ROTMG, we use ROTM to check that the returned values actually zero out y1. I think the failure is due to a bug that was outlined by @wxcstc in #1452 (comment)

@kortschak
Contributor

kortschak commented Feb 28, 2018 via email

@martin-frbg
Collaborator

Thanks for the clarification. So your conclusion seems to be that netlib ROTMG is similarly broken, as the two appear to produce identical results in my tests with the inputs you posted previously ? (Not that this would be impossible, just inconvenient)
To save myself a likely headache, could you post your expected results from ROTMG, i.e. those that would "actually zero out y1" in a subsequent ROTM ?

@vladimir-ch
Contributor

So your conclusion seems to be that netlib ROTMG is similarly broken ...

Yes, that's exactly our conclusion.

Our expected results from ROTMG for the inputs above:

flag = -1 (rescaled)
H = {4096, -3584, 1792, 4096}
d1 = 68.96627824858757
d2 = 34.483139124293785
x1 = 45312

@martin-frbg
Collaborator

So if I understand correctly (from gonum/gonum@58e39c9) the salient difference is that you move the setting of dh11,dh22 to 1,1 (or dh21,dh12 to -1,1) from the "rescaling" part of the code (where they did look out of place, in particular given the original confusion over whether the flag should change in the while loop) to the preceding "calculating" block (?)

@vladimir-ch
Contributor

Yes, correct. The setting of elements of H in the "rescaling" part could destroy the rotation and scaling computed previously.
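As a rough illustration of that ordering, here is a Python sketch in which the unit elements are written into H in the calculating block, so that the rescaling part only multiplies or divides. This is not the gonum or OpenBLAS code: the function name is hypothetical, only the case d1 > 0, d2 > 0, y1 != 0 with finite, moderate inputs is covered, and the degenerate branches of the reference routine are left out.

```python
GAM = 4096.0
GAMSQ = GAM * GAM
RGAMSQ = 1.0 / GAMSQ


def drotmg_sketch(d1, d2, x1, y1):
    """Return (flag, (h11, h21, h12, h22), d1, d2, x1) for the covered case."""
    if not (d1 > 0.0 and d2 > 0.0 and y1 != 0.0):
        raise ValueError("case not covered by this sketch")
    p1, p2 = d1 * x1, d2 * y1
    q1, q2 = p1 * x1, p2 * y1
    if abs(q1) > abs(q2):
        flag = 0.0
        h11, h22 = 1.0, 1.0            # implied units, made explicit up front
        h21, h12 = -y1 / x1, p2 / p1
        u = 1.0 - h12 * h21
        d1, d2, x1 = d1 / u, d2 / u, x1 * u
    else:
        flag = 1.0
        h21, h12 = -1.0, 1.0           # implied units, made explicit up front
        h11, h22 = p1 / p2, x1 / y1
        u = 1.0 + h11 * h22
        d1, d2, x1 = d2 / u, d1 / u, y1 * u
    # Rescaling only multiplies or divides; it never overwrites H elements.
    while d1 != 0.0 and (d1 <= RGAMSQ or d1 >= GAMSQ):
        flag = -1.0
        if d1 <= RGAMSQ:
            d1, x1, h11, h12 = d1 * GAMSQ, x1 / GAM, h11 / GAM, h12 / GAM
        else:
            d1, x1, h11, h12 = d1 / GAMSQ, x1 * GAM, h11 * GAM, h12 * GAM
    while d2 != 0.0 and (d2 <= RGAMSQ or d2 >= GAMSQ):
        flag = -1.0
        if d2 <= RGAMSQ:
            d2, h21, h22 = d2 * GAMSQ, h21 / GAM, h22 / GAM
        else:
            d2, h21, h22 = d2 / GAMSQ, h21 * GAM, h22 * GAM
    return flag, (h11, h21, h12, h22), d1, d2, x1


flag, h, d1, d2, x1 = drotmg_sketch(1600000000.0, 800000000.0, 8.0, 7.0)
print(flag, h, x1)  # -1.0 (4096.0, -3584.0, 1792.0, 4096.0) 45312.0
```

For the input from the first message (d1=5.9e-8, d2=5.960464e-8, x1=1.0, y1=150.0) the same sketch ends with h12 = 1/4096 rather than 1, because nothing resets h12 during the d2 rescaling.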

Eventually we would also like to clarify whether gam could be increased and to what value. In my understanding from what I read in the papers, the value of 4096 was chosen very conservatively when BLAS Level 1 was created, that is before IEEE floating point. A larger value of gam would reduce the necessity for rescaling (flag = -1) which is good because having units in H saves operations in DROTM which is the whole point of all this. On the other hand, it would change the values returned by DROTMG which apparently (almost?) nobody uses anyway, so it's not clear whether it would be worth it.
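The saving mentioned above shows up when H is applied. A minimal Python sketch of the per-element work in a ROTM-style update, following the standard flag convention (-2 identity, -1 full H, 0 with h11 = h22 = 1, 1 with h12 = 1 and h21 = -1), could look like this. The function name is made up, and the unscaled values -0.875 and 0.4375 are simply the golden H entries -3584 and 1792 divided by 4096, not numbers quoted in the thread.

```python
def drotm_pair(flag, h, x, y):
    """Apply a modified Givens transformation to one (x, y) pair."""
    h11, h21, h12, h22 = h
    if flag == -2.0:      # H is the identity
        return x, y
    if flag == -1.0:      # full H: four multiplications
        return h11 * x + h12 * y, h21 * x + h22 * y
    if flag == 0.0:       # h11 = h22 = 1 implied: two multiplications
        return x + h12 * y, h21 * x + y
    if flag == 1.0:       # h12 = 1, h21 = -1 implied: two multiplications
        return h11 * x + y, -x + h22 * y
    raise ValueError("flag must be -2, -1, 0 or 1")


# Unscaled form (flag = 0): the stored h11 and h22 are ignored.
print(drotm_pair(0.0, (0.0, -0.875, 0.4375, 0.0), 8.0, 7.0))          # (11.0625, 0.0)
# Rescaled form (flag = -1): all four elements are used.
print(drotm_pair(-1.0, (4096.0, -3584.0, 1792.0, 4096.0), 8.0, 7.0))  # (45312.0, 0.0)
```

Both forms zero the second component; the rescaled one costs twice as many multiplications per pair, which is why a larger gam (fewer rescalings) would be attractive.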

@martin-frbg
Collaborator

OpenBLAS could provide a DROTMG2 or DROTMGG or whatever BLAS extension that is DROTMG with a non-standard gam I guess. (Not sure how much of an audience this could reach compared to netlib et al. though)

@kortschak
Contributor

The NETLIB reference tests pass without modification, so when we file an issue to correct the problem with their ROTMG, we can raise the issue of changing the gam constants. We were planning on filing the issue with NETLIB after the code is corrected here.

@martin-frbg
Collaborator

I was about to prepare a PR, but found that my transcription of your changes makes it fail the test from the very first message again. I expect this still works for you, and I just made a mistake somewhere.

@kortschak
Contributor

I did need to change H_{1,2} from 1 in the OP to 1/4096. I am not particularly fazed by this, as the H returned by cublas will not give the correct ROTM returns.

My golden values are here. I guess I have been somewhat flexible with the truth in the comment in that section, but those are the times we live in.

@martin-frbg
Collaborator

martin-frbg commented Mar 2, 2018

Hmm. I still get errors even with that change, unless I also force dflag from ONE to -ONE when setting dh21=-ONE and dh12=ONE (which sort of makes sense, but makes the "X1=X2 D1=D2" test case return with flag set to rescaled and the H all messed up: 1,-1,1,1). (And the netlib blas tests go wrong as well.)
I seem to be missing some other detail from your change, but lack the time to track it down right now.
So if anybody can beat me to it, please do.

@kortschak
Contributor

Have you directly transcribed my change or just applied aspects of it? Is there a branch I can look at?

@martin-frbg
Collaborator

Sorted now I believe (guess I only needed some sleep).

@wxcstc
Author

wxcstc commented Mar 18, 2018

So, in my understanding, compared to the original openblas code, the new merge adds handling for a new situation where x1 is 0. The flag handling in the original openblas was right, and netlib and cublas are wrong, is that it? @kortschak @martin-frbg

@martin-frbg
Collaborator

There has not been a reaction from the netlib team as far as I know, but the issue indeed seems to be that the original netlib implementation (that both OpenBLAS and cublas followed faithfully) does not work for some inputs. So the original openblas code was wrong in its handling of the flag, but all the netlib-based implementations are also wrong in their handling of the specific testcase from gonum.

@wxcstc
Author

wxcstc commented Mar 19, 2018

ok,I get it,thx~~
