Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nv2a: Fix combiner single stage color+alpha interdependence #999

Conversation

abaire
Copy link
Contributor

@abaire abaire commented May 30, 2022

Fixes #890

Potentially could fix other games since it'd affect any pixel shader that assigns alpha using the blue component of a register that is also modified in the same combiner stage.

Test
HW Results

Before fix:
before_fix

After fix:
with_fix

@PatrickvL
Copy link

Could you add a before and after pixel shader example, as it's hard to tell what effect this change has - the new code appears to produce the exact same output as the previous code, so what am I missing here?

@abaire
Copy link
Contributor Author

abaire commented May 30, 2022

Could you add a before and after pixel shader example, as it's hard to tell what effect this change has - the new code appears to produce the exact same output as the previous code, so what am I missing here?

The attached screenshots certainly look very different to me, so I'm not sure what further evidence can be provided :)

You can also run the linked test on master and verify that it does not produce the same green quad that it does on HW (and with this PR).

@PatrickvL
Copy link

So you're not adding example pixel shader code to compare here? Okay. Pity. Would've been nice for reference.

@abaire
Copy link
Contributor Author

abaire commented May 30, 2022

Sorry, I misunderstood and thought you were asking for screenshots. Pity not required.

The PR very simply defers the assignment of the temp vars to the end of the stage so the color combiner cannot affect the alpha combiner.

Before:

// Stage 0
ab.rgb = clamp(vec3(dot(max(t1.rgb, 0.0), max(c0_0.rgb, 0.0))), -1.0, 1.0);
r1.rgb = ab.rgb;
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * max(vec4(0.0).a, 0.0))), -1.0, 1.0);
r1.a = ab.a;
// Stage 1
mux_sum.rgb = clamp(vec3(((max(r1.rgb, 0.0) * max(c0_1.rgb, 0.0)) + ((1.0 - clamp(c0_1.rgb, 0.0, 1.0)) * max(t1.rgb, 0.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
ab.a = clamp(((max(t1.a, 0.0) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Stage 2
ab.rgb = clamp(vec3((max(t2.rgb, 0.0) * max(r0.rgb, 0.0))), -1.0, 1.0);
r1.rgb = ab.rgb;
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Stage 3
mux_sum.rgb = clamp(vec3(((max(r1.rgb, 0.0) * max(c0_3.rgb, 0.0)) + ((1.0 - clamp(c0_3.rgb, 0.0, 1.0)) * max(r0.rgb, 0.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Stage 4
mux_sum.rgb = clamp(vec3(((max(t1.rgb, 0.0) * max(t3.rgb, 0.0)) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * max(r0.rgb, 0.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Stage 5
mux_sum.rgb = clamp(vec3(((max(r0.rgb, 0.0) * (1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0))) + (max(c0_5.rgb, 0.0) * max(t0.rgb, 0.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Stage 6
ab.rgb = clamp(vec3(((max(r0.rgb, 0.0) * max(c0_6.rgb, 0.0)) * 4.0)), -1.0, 1.0);
r1.rgb = ab.rgb;
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Stage 7
mux_sum.rgb = clamp(vec3(((max(r0.rgb, 0.0) * (1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0))) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * max(r1.rgb, 0.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Final Combiner
fragColor.rgb = max(vec4(0.0).rgb, 0.0) + mix(vec3(max(r0.rgb, 0.0)), vec3(max(c0_8.rgb, 0.0)), vec3(max(pFog.aaa, 0.0)));
fragColor.a = max(r0.a, 0.0);

After:

// Stage 0
ab.rgb = clamp(vec3(dot(max(t1.rgb, 0.0), max(c0_0.rgb, 0.0))), -1.0, 1.0);
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * max(vec4(0.0).a, 0.0))), -1.0, 1.0);
r1.rgb = ab.rgb;
r1.a = ab.a;
// Stage 1
mux_sum.rgb = clamp(vec3(((max(r1.rgb, 0.0) * max(c0_1.rgb, 0.0)) + ((1.0 - clamp(c0_1.rgb, 0.0, 1.0)) * max(t1.rgb, 0.0)))), -1.0, 1.0);
ab.a = clamp(((max(t1.a, 0.0) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = ab.a;
// Stage 2
ab.rgb = clamp(vec3((max(t2.rgb, 0.0) * max(r0.rgb, 0.0))), -1.0, 1.0);
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r1.rgb = ab.rgb;
r0.a = ab.a;
// Stage 3
mux_sum.rgb = clamp(vec3(((max(r1.rgb, 0.0) * max(c0_3.rgb, 0.0)) + ((1.0 - clamp(c0_3.rgb, 0.0, 1.0)) * max(r0.rgb, 0.0)))), -1.0, 1.0);
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = ab.a;
// Stage 4
mux_sum.rgb = clamp(vec3(((max(t1.rgb, 0.0) * max(t3.rgb, 0.0)) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * max(r0.rgb, 0.0)))), -1.0, 1.0);
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = ab.a;
// Stage 5
mux_sum.rgb = clamp(vec3(((max(r0.rgb, 0.0) * (1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0))) + (max(c0_5.rgb, 0.0) * max(t0.rgb, 0.0)))), -1.0, 1.0);
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = ab.a;
// Stage 6
ab.rgb = clamp(vec3(((max(r0.rgb, 0.0) * max(c0_6.rgb, 0.0)) * 4.0)), -1.0, 1.0);
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r1.rgb = ab.rgb;
r0.a = ab.a;
// Stage 7
mux_sum.rgb = clamp(vec3(((max(r0.rgb, 0.0) * (1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0))) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * max(r1.rgb, 0.0)))), -1.0, 1.0);
ab.a = clamp((((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = ab.a;
// Final Combiner
fragColor.rgb = max(vec4(0.0).rgb, 0.0) + mix(vec3(max(r0.rgb, 0.0)), vec3(max(c0_8.rgb, 0.0)), vec3(max(pFog.aaa, 0.0)));
fragColor.a = max(r0.a, 0.0);

@mborgerson mborgerson merged commit 7d6da22 into xemu-project:master May 30, 2022
@abaire abaire deleted the fix/890/remove_color_alpha_combiner_single_stage_interdependence branch May 30, 2022 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need for Speed Most Wanted: Missing orange/yellow color filter
3 participants