New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix combiner independence #720
Fix combiner independence #720
Conversation
hw/xbox/nv2a/psh.c
Outdated
} else { | ||
mstring_unref(ab_dest); | ||
mstring_ref(ab_mapping); | ||
ab_dest = ab_mapping; | ||
} | ||
|
||
if (mstring_get_length(cd_dest)) { | ||
mstring_append_fmt(ps->code, "%s.%s = clamp(%s(%s), -1.0, 1.0);\n", | ||
mstring_get_str(cd_dest), write_mask, caster, mstring_get_str(cd_mapping)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this changed to cd.%s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because he introduced two intermediate storage variables, specifically to fix this issue with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because he introduced two intermediate storage variables, specifically to fix this issue with
Yeah, the fix is indeed to use intermediate vars so an output that sets a variable dependent on another variable in the output is not then dependent on the ordering they were specified as outputs (matching HW behavior).
E.g., if the output of the last stage had R0 set to (1.0, 0.0, 1.0) and the output of this stage is defined as R1 = R0 * T0
and R0 = T1 * C0
the final output should be: R1 = (1.0, 0.0, 1.0) * T0
, R0 = T1 * C0
whereas the current xemu code makes the value of R1 depend on whether it was calculated in AB (it'll be correct) or CD (it'll be wrong, because R0 gets updated before R1 is calculated and it erroneously uses the new value).
hw/xbox/nv2a/psh.c
Outdated
} | ||
if (!is_alpha && output.flags & PS_COMBINEROUTPUT_CD_BLUE_TO_ALPHA) { | ||
mstring_append_fmt(ps->code, "%s.a = %s.b;\n", | ||
mstring_get_str(cd_dest), mstring_get_str(cd_dest)); | ||
mstring_append_fmt(ps->code, "cd.a = cd.b;\n"); | ||
} | ||
|
||
MString *sum; | ||
if (output.muxsum_op == PS_COMBINEROUTPUT_AB_CD_SUM) { | ||
sum = mstring_from_fmt("(%s + %s)", mstring_get_str(ab), mstring_get_str(cd)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't all 3 mux/sum paths read from the new ab
and cd
variables now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'ab' and 'cd' in the C code actually contain the raw calculations (see ~line 419). They are only conditionally assigned to the glsl variables based on whether the results are referenced as part of the combiner output, so it would be unsafe to blindly use them here without adding code to blindly do the calculation and assignment to intermediates above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, this currently results in duplicate expressions, which puts additional burden on the shader compiler, which could be avoided and reduced into a single assignment, when the ab and cd variables are used in these mux/sum paths. Indeed, that does require assignment to ab and/or cd, but that could also be evaluated upfront similar to the new assign_* booleans.
Such "require_*" booleans could also help fixing the potential "read without assignment" issue mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely agree with you, I think Matt's preference in general has been to put in the basic fix, then optimize, so I'm inclined to do cleanup in a second pass since I don't think this PR changes the duplication in the case where the AB/CD results are assigned along with the muxsum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Xemu isn't big on describing why code is written like it is, but this change is a good example of something that a future reader might see and think of "let's optimise this into a single assignment statement", overlooking the fact this specific construct is here to avoid sequential variable access (while hardware turns out to calculate in parallel). So IMHO even a single comment line on that would help tremendously here
The pull request description should also mention that it includes handling of the |
It doesn't intentionally. This PR is just dependent on several other of my PRs that need to be merged first. As far as I'm aware the xemu convention is one-fix-one-commit so any PR with multiple commits is generally squashed pre-review. |
I don't disagree, but @mborgerson has asked me to remove comments a number of times so I'm defaulting to letting him ask for explanation where necessary. |
hw/xbox/nv2a/psh.c
Outdated
} else { | ||
mstring_unref(cd_dest); | ||
mstring_ref(cd_mapping); | ||
cd_dest = cd_mapping; | ||
} | ||
|
||
if (!is_alpha && output.flags & PS_COMBINEROUTPUT_AB_BLUE_TO_ALPHA) { | ||
mstring_append_fmt(ps->code, "%s.a = %s.b;\n", | ||
mstring_get_str(ab_dest), mstring_get_str(ab_dest)); | ||
mstring_append(ps->code, "ab.a = ab.b;\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admittedly an already existing issue might be that this flag is set, and thus ab is read, while ab isn't assigned to.
The same remark for cd below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice I'm not sure this matters; if the flag is set but the combiner is discarding the calculated AB value, it doesn't matter if this produces incorrect data. That said gating behind whether the values are to be used or not makes complete sense and is straightforward. This also caused me to add an additional test for the b2a behavior that pointed out the fact that I was introducing a bug where the alpha was never actually used in practice (fixed now) - thanks!
hw/xbox/nv2a/psh.c
Outdated
} | ||
if (!is_alpha && output.flags & PS_COMBINEROUTPUT_CD_BLUE_TO_ALPHA) { | ||
mstring_append_fmt(ps->code, "%s.a = %s.b;\n", | ||
mstring_get_str(cd_dest), mstring_get_str(cd_dest)); | ||
mstring_append_fmt(ps->code, "cd.a = cd.b;\n"); | ||
} | ||
|
||
MString *sum; | ||
if (output.muxsum_op == PS_COMBINEROUTPUT_AB_CD_SUM) { | ||
sum = mstring_from_fmt("(%s + %s)", mstring_get_str(ab), mstring_get_str(cd)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, this currently results in duplicate expressions, which puts additional burden on the shader compiler, which could be avoided and reduced into a single assignment, when the ab and cd variables are used in these mux/sum paths. Indeed, that does require assignment to ab and/or cd, but that could also be evaluated upfront similar to the new assign_* booleans.
Such "require_*" booleans could also help fixing the potential "read without assignment" issue mentioned above.
e8c006b
to
d22593e
Compare
d22593e
to
52122bc
Compare
800fec1
to
f719947
Compare
f719947
to
c0a378a
Compare
Awesome, thank you! |
Makes combiner stage output independent when assigning multiple results. (e.g., R1 can be set with the previous value of R0 in the same stage as R0 is assigned with the previous value of R1 without the one affecting the other).
Test
Results
Fixes #51
(Apparently, based on reports in xemu#general) fixes #53