Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix combiner independence #720

Merged
merged 1 commit into from Apr 24, 2022

Conversation

abaire
Copy link
Contributor

@abaire abaire commented Feb 5, 2022

Makes combiner stage output independent when assigning multiple results. (e.g., R1 can be set with the previous value of R0 in the same stage as R0 is assigned with the previous value of R1 without the one affecting the other).

Test
Results

Fixes #51
(Apparently, based on reports in xemu#general) fixes #53

Screenshot_20220204_192824

Screenshot_20220204_192835

} else {
mstring_unref(ab_dest);
mstring_ref(ab_mapping);
ab_dest = ab_mapping;
}

if (mstring_get_length(cd_dest)) {
mstring_append_fmt(ps->code, "%s.%s = clamp(%s(%s), -1.0, 1.0);\n",
mstring_get_str(cd_dest), write_mask, caster, mstring_get_str(cd_mapping));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed to cd.%s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because he introduced two intermediate storage variables, specifically to fix this issue with

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because he introduced two intermediate storage variables, specifically to fix this issue with

Yeah, the fix is indeed to use intermediate vars so an output that sets a variable dependent on another variable in the output is not then dependent on the ordering they were specified as outputs (matching HW behavior).

E.g., if the output of the last stage had R0 set to (1.0, 0.0, 1.0) and the output of this stage is defined as R1 = R0 * T0 and R0 = T1 * C0 the final output should be: R1 = (1.0, 0.0, 1.0) * T0, R0 = T1 * C0 whereas the current xemu code makes the value of R1 depend on whether it was calculated in AB (it'll be correct) or CD (it'll be wrong, because R0 gets updated before R1 is calculated and it erroneously uses the new value).

}
if (!is_alpha && output.flags & PS_COMBINEROUTPUT_CD_BLUE_TO_ALPHA) {
mstring_append_fmt(ps->code, "%s.a = %s.b;\n",
mstring_get_str(cd_dest), mstring_get_str(cd_dest));
mstring_append_fmt(ps->code, "cd.a = cd.b;\n");
}

MString *sum;
if (output.muxsum_op == PS_COMBINEROUTPUT_AB_CD_SUM) {
sum = mstring_from_fmt("(%s + %s)", mstring_get_str(ab), mstring_get_str(cd));
Copy link

@PatrickvL PatrickvL Feb 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't all 3 mux/sum paths read from the new ab and cd variables now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'ab' and 'cd' in the C code actually contain the raw calculations (see ~line 419). They are only conditionally assigned to the glsl variables based on whether the results are referenced as part of the combiner output, so it would be unsafe to blindly use them here without adding code to blindly do the calculation and assignment to intermediates above.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, this currently results in duplicate expressions, which puts additional burden on the shader compiler, which could be avoided and reduced into a single assignment, when the ab and cd variables are used in these mux/sum paths. Indeed, that does require assignment to ab and/or cd, but that could also be evaluated upfront similar to the new assign_* booleans.
Such "require_*" booleans could also help fixing the potential "read without assignment" issue mentioned above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree with you, I think Matt's preference in general has been to put in the basic fix, then optimize, so I'm inclined to do cleanup in a second pass since I don't think this PR changes the duplication in the case where the AB/CD results are assigned along with the muxsum.

Copy link

@PatrickvL PatrickvL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xemu isn't big on describing why code is written like it is, but this change is a good example of something that a future reader might see and think of "let's optimise this into a single assignment statement", overlooking the fact this specific construct is here to avoid sequential variable access (while hardware turns out to calculate in parallel). So IMHO even a single comment line on that would help tremendously here

@PatrickvL
Copy link

The pull request description should also mention that it includes handling of the PS_COMBINERCOUNT_MUX_MSB flag

@abaire
Copy link
Contributor Author

abaire commented Feb 5, 2022

The pull request description should also mention that it includes handling of the PS_COMBINERCOUNT_MUX_MSB flag

It doesn't intentionally. This PR is just dependent on several other of my PRs that need to be merged first. As far as I'm aware the xemu convention is one-fix-one-commit so any PR with multiple commits is generally squashed pre-review.

@abaire
Copy link
Contributor Author

abaire commented Feb 5, 2022

Xemu isn't big on describing why code is written like it is, but this change is a good example of something that a future reader might see and think of "let's optimise this into a single assignment statement", overlooking the fact this specific construct is here to avoid sequential variable access (while hardware turns out to calculate in parallel). So IMHO even a single comment line on that would help tremendously here

I don't disagree, but @mborgerson has asked me to remove comments a number of times so I'm defaulting to letting him ask for explanation where necessary.

} else {
mstring_unref(cd_dest);
mstring_ref(cd_mapping);
cd_dest = cd_mapping;
}

if (!is_alpha && output.flags & PS_COMBINEROUTPUT_AB_BLUE_TO_ALPHA) {
mstring_append_fmt(ps->code, "%s.a = %s.b;\n",
mstring_get_str(ab_dest), mstring_get_str(ab_dest));
mstring_append(ps->code, "ab.a = ab.b;\n");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly an already existing issue might be that this flag is set, and thus ab is read, while ab isn't assigned to.

The same remark for cd below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice I'm not sure this matters; if the flag is set but the combiner is discarding the calculated AB value, it doesn't matter if this produces incorrect data. That said gating behind whether the values are to be used or not makes complete sense and is straightforward. This also caused me to add an additional test for the b2a behavior that pointed out the fact that I was introducing a bug where the alpha was never actually used in practice (fixed now) - thanks!

}
if (!is_alpha && output.flags & PS_COMBINEROUTPUT_CD_BLUE_TO_ALPHA) {
mstring_append_fmt(ps->code, "%s.a = %s.b;\n",
mstring_get_str(cd_dest), mstring_get_str(cd_dest));
mstring_append_fmt(ps->code, "cd.a = cd.b;\n");
}

MString *sum;
if (output.muxsum_op == PS_COMBINEROUTPUT_AB_CD_SUM) {
sum = mstring_from_fmt("(%s + %s)", mstring_get_str(ab), mstring_get_str(cd));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, this currently results in duplicate expressions, which puts additional burden on the shader compiler, which could be avoided and reduced into a single assignment, when the ab and cd variables are used in these mux/sum paths. Indeed, that does require assignment to ab and/or cd, but that could also be evaluated upfront similar to the new assign_* booleans.
Such "require_*" booleans could also help fixing the potential "read without assignment" issue mentioned above.

@ko81e24wy
Copy link

this PR fix lighting problem of NBA LIVE 2005
master
2022-02-06 20-33-38 的屏幕截图
PR
2022-02-08 01-42-13 的屏幕截图

@mborgerson
Copy link
Member

Awesome, thank you!

@mborgerson mborgerson merged commit 4c83427 into xemu-project:master Apr 24, 2022
@abaire abaire deleted the fix_combiner_independence branch April 24, 2022 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Halo:CE laser effect missing parts Some Halo:CE fog effects broken
5 participants