-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
some question about rotmg #1452
Comments
Ahem, did you notice the reply to your identical message in the nvidia cuda forum where the moderator asked you for a standalone test case to replicate the issue ? |
On the other hand, a testcase with your parameters linked against the netlib reference BLAS does give the same result as you got with cublas, so you may be onto something. |
The changed result comes from 025fc91 in May 2014 which must have been #365. There, the problem was repeated execution of a code segment based on the flag value set by its first invocation, now the added conditional to prevent this also seems to block a valid action in the first pass. Perhaps what is needed is a second flag, or perhaps I fail to understand the logic - this function was rewritten from old fortran-style code that jumped back and forth, neither the old nor the new version are easy to read. |
I further notice that ATLAS - which was given as reference in #365 - seems to avoid setting any flags within while() loops, and indeed the scaling there is done in dedicated inner loops as needed, rather than looping over the entire logic... |
Sorry, but are you using SROTMG or DROTMG? With former the inputs look marginal towards precision and may overflow something. There is trivial geometry behind this function. can you try netlib BLAS Like one built by your linux distro for ultimate truth? |
@brada4 drotmg, I already compared with netlib above (which agrees with cublas but not openblas). The supposed "fix" from 365 that broke it looks more like a bad workaround for a fundamental problem: IMHO rotmg should have "if" where it has "while" and do the while loop for scaling inside that "if" after reacting to (and resetting) dflag. What it does now is potentially flip dflag around with every iteration of the rescaling, which simply does not make sense. |
Yes, it looks like manual f2c from reference function... |
@martin-frbg @brada4 I don't think it's the reason of dflag in while loop. In my understanding, if have multiple cycles,in the first cycle,dflag and other variables will be changed. And in next cycle,dflag has been -1,it will not go into the branch which to set dflag and other variables. So,using if or while to set dflag and other variables dosen't affect the final result. How do you think? |
The code for |
ok,I have already understood how this result occurred. Now I have another confusion,why should judge dflag two times?Can I extract the two judgments and put them in front of |
I don't think this is completely fixed. In attempting to fix our implementation I find that the whole situation is SNAFU; to get my head around it, I have done a literal translation of the FORTRAN77 code here rendered as this Go code. This fails to agree with my translation of the current NETLIB implementation here rendered as this Go code under our test suite:
This makes the whole situation fairly confusing, given that neither of these actually satisfy the documentation's claims AFAICS (though the documentation is unclear and seem to be incorrect - vis the reference to The situation is worse with the OpenBLAS implementation used as a CGO back-end for Go tests (some of these appear to be simple scaling differences, but others not):
Note that in all cases the mismatched H is in the DFLAG=-1 state, so all elements must satisfy the check. |
Reopening, but for me personally a C testcase would be far preferable to any transliteration into Go. Actually I was fairly confident that my simple fix was consistent with what ATLAS uses, but I may certainly have missed something. |
What troubles me is that the two Fortran implementations disagree, and that their documentation is incorrect in at least two places, making it difficult to define a proper set of tests. If/when I get it sorted out in Go, I'm happy to send a C rendition. |
Sorry, I'm on travel at the moment, but ATLAS had a different open bug (in my opinion) in this function. I'll find the report and check if it's still present when I get a chance |
The solution is just to replace the if-statement in line 139 with the original while-statement from before. |
As another data point, the OpenBLAS/netlib-lapack/BLAS/TESTING fails for DROTMG at 0391c07:
|
Seems to me the actual problem with my PR was that I copypasted the wrong conditional for rescaling dd1,dd2 against GAMSQ, so that while loop would never be invoked. With the new fix, both the original test case (to be added to the utests once I get those to build on all platforms again) and the netlib test passes. |
Thank you for pushing that fix. Our implementation now disagrees in only one of our test cases by a value that means it should be straightforward to track down. |
I only pushed that as it was certain to be better than what was there before. It also passes the rotmg test from ATLAS, so if your test case is small enough it would perhaps make sense to keep it available for regression testing once this is solved for good. |
Yes, I will ping this when I understand what is going on with that case. Thanks. |
WRT the possible bug in ATLAS that btracey mentioned, the only entry I can find is |
OK, I have where the difference is between our implementations. In the scaling checks in NETLIB 3.8.0 (and the Gonum code) the form is
as seen here
The OpenBLAS code does this instead:
The difference stems from the fact that the NETLIB code resets H elements to unit values on each iteration of the outer loop, while this does not happen in the OpenBLAS code. Which is correct? Since the splitting of flag/H updates and scaling is a recent addition in OpenBLAS, I suspect the NETLIB way is the right approach (but have no other basis for this claim - and the NETLIB operations seem intuitively odd). The test case we have that shows this difference is (with our expectations):
|
Having now put in a
|
So,in rescale situation when flag is -1,dh21 set -1,dh12 set 1,I just think it‘s unreasonable.And after recent commit,i think it still has difference between openblas and netlib.If d1 is small enough,need two times scale,result will be different.I'm not sure about this,I don't construct the data for this situation,there are some difficulties. |
We have fixed out Go implementation by porting the code in the appendix of this paper. We're testing against the documented behaviour rather than only a set of golden values, so we a re confident that it is correct. Using OpenBLAS as a backend gives a number of failures against our golden values, but more significantly fails to provide values that zero out
(the drotm test here checks that the rotation in H returned by ROTMG zeros out the second element for [x1;y1]). |
I think I am getting confused, but from the use of D1, D2 this is still ROTMG in terms of BLAS function naming rather than ROTM ? The OpenBLAS implementation in interface/rotmg.c does not attempt to |
The issue is in ROTMG, we use ROTM to check that the returned values actually zero out |
Sorry, I did not explain myself adequately, but @vladimir-ch has fixed that deficit.
|
Thanks for the clarification. So your conclusion seems to be that netlib ROTMG is similarly broken, as the two appear to produce identical results in my tests with the inputs you posted previously ? (Not that this would be impossible, just inconvenient) |
Yes, that's exactly our conclusion. Our expected results from ROTMG for the inputs above:
|
So if I understand correctly (from gonum/gonum@58e39c9) the salient difference is that you move the setting of dh11,dh22 to 1,1 (or dh12,dh22 to -1,1) from the "rescaling" part of the code (where they did look out of place, in particular given the original confusion over whether the flag should change in the while loop) to the preceding "calculating" block (?) |
Yes, correct. The setting of elements of H in the "rescaling" part could destroy the rotation and scaling computed previously. Eventually we would also like to clarify whether |
OpenBLAS could provide a DROTMG2 or DROTMGG or whatever BLAS extension that is DROTMG with a non-standard gam I guess. (Not sure how much of an audience this could reach compared to netlib et al. though) |
The NETLIB reference tests pass without modification, so when we file an issue to correct the problem with their ROTMG, we can raise the issue of changing the gam constants. We were planning on filing the issue with NETLIB after the code is corrected here. |
I was about to prepare a PR, but found that my transcription of your changes makes it fail the test from the very first message again. I expect this still works for you, and I just made a mistake somewhere. |
I did need to change to H_{1,2} from 1 in the OP to 1/4096. I am not particularly phased by this as the H returned by cublas will not give the correct ROTM returns. My golden values are here. I guess I have been somewhat flexible with the truth in the comment in that section, but those are the times we live in. |
Hmm. I still get errors even with that change, unless I also force dflag from ONE to -ONE when setting dh21=-ONE and dh12=ONE (which sort of makes sense, but makes the "X1=X2 D1=D2" test case return with flag set to rescaled and the H all messed up (1,-1,1,1). (And netlib blas tests go wrong as well). |
Have you directly transcribed my change or just applied aspects of it? Is there a branch I can look at? |
Sorted now I believe (guess I only needed some sleep). |
So,in my understanding,compared to original code of openblas,new merge add a new situation that x1 is 0.The operation about flag in original openblas is right.And netlib and cublas is wrong,is it? @kortschak @martin-frbg |
There has not been a reaction from the netlib team as far as I know, but the issue indeed seems to be that tne original netlib implementation (that both OpenBLAS and cublas followed faithfully) does not work for some inputs. So the original openblas code was wrong in its handling of the flag, but all the netlib-based implementations are also wrong in their handling of the specific testcase from gonum. |
ok,I get it,thx~~ |
Well,when i used rotmg function,the same input had different result between cublas and openblas.
input is as follows:
d1=5.9e-8
d2=5.960464e-8
x1=1.0
y1=150.0
param[5]={0.0, 0.0, 0.0, 0.0, 0.0}
And the output of openblas:
d1=0.999956
d2=0.989812
x1=0.036623
y1=150.0
param[5]={-1.0, 0.000002, -0.000244, 0.000244, 0.000002}
And the output of cublas:
d1=0.999956
d2=0.989812
x1=0.036623
y1=150.0
param[5]={-1.0, 0.000002, -0.000244, 1.0, 0.000002}
I referred some papers in order to know the implementation of the function.And I found that in some situation, this function will have a scaling operation.It seems that cublas does not have this operation? It is a bug of cublas?I think it's a bug of cublas, but i'm not sure.
The text was updated successfully, but these errors were encountered: