Skip to content

Comparison patterns for PPC #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

dbalatoni13
Copy link
Contributor

@dbalatoni13 dbalatoni13 commented Mar 22, 2025

I added a test case for how MWCC 1.3.2/2.6 handles a != 0 and a != b and code to handle it. In this PR I plan to fix all possible combinations from the previous and new test.

Feel free to give feedback to the commits in the meantime! As I wrote on Discord, I'm still learning when to use early_unwrap_ints over early_unwrap. I have a TODO comment and I'm also not sure why the input of replace_clz_shift is first passed to fold_divmod, should we do that here too?

@dbalatoni13
Copy link
Contributor Author

I'll add a comment about the version info and flags.

@simonlindholm
Copy link
Collaborator

I'm also not sure why the input of replace_clz_shift is first passed to fold_divmod, should we do that here too?

That's just a case of "try all the patterns one at a time", first fold_divmod, then replace_clz_shift, then (newly added) replace_or_shift. The way you've added it looks good to me.

Copy link
Collaborator

@simonlindholm simonlindholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Happy to merge this now and do the other comparisons in a follow-up if you feel like it, but also ok with awaiting those

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented Mar 22, 2025

That's just a case of "try all the patterns one at a time", first fold_divmod, then replace_clz_shift, then (newly added) replace_or_shift. The way you've added it looks good to me.

Oh, right. Should I leave it like this then or can I chain it one more time? I think the other new patterns will go to the same place, so I'm not sure.

@dbalatoni13
Copy link
Contributor Author

Looks good. Happy to merge this now and do the other comparisons in a follow-up if you feel like it, but also ok with awaiting those

I won't take long, so I'd like to do it here to avoid cluttering PRs

Co-authored-by: Simon Lindholm <simon.lindholm10@gmail.com>
@simonlindholm
Copy link
Collaborator

Oh, right. Should I leave it like this then or can I chain it one more time?

Either way is fine. Probably best to be consistent, i.e. either

            lower_bits = replace_or_shift(replace_clz_shift(fold_divmod(lower_bits)))

or

            lower_bits = fold_divmod(lower_bits)
            lower_bits = replace_clz_shift(lower_bits)
            lower_bits = replace_or_shift(lower_bits)

Maybe the latter is a bit more readable?

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented Mar 22, 2025

Maybe the latter is a bit more readable?

Yes, especially if considering that I'll have to add more.

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented Mar 23, 2025

a < b and b > a generate the exact same code, I wonder which one I should go for.

@dbalatoni13
Copy link
Contributor Author

Hmm, if you look at a <= b in the comparison2 test, it's decompiled perfectly. But a >= b has casts, even though it's the exact same thing in the asm. This is how the expressions look like:
((arg1 >> 0x1F) + (arg0 >> 0x1FU) + M2C_CARRY)
(((s32) arg0 >> 0x1F) + ((u32) arg1 >> 0x1FU) + M2C_CARRY)

Do you think we can do something about that safely?

m2c/evaluate.py Outdated
left_expr = early_unwrap_ints(expr.left)
if not (isinstance(left_expr, BinaryOp) and left_expr.op == "+"):
return expr
# We call this function only from carry_add_to so it's not necessary to check the right side
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, the expression might be changed by fold_divmod by this point. I think I should either change the order or just add an extra check here.

@simonlindholm
Copy link
Collaborator

Btw, it would be worth adding a test for unsigned comparisons as well.

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented Mar 23, 2025

Oh wow, that generates code that uses an instruction that's not even implemented yet: orc. I'll take a look at the patterns from the old comparison test and leave the orc and the pattern using it for other PRs.

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented Mar 23, 2025

Actually, let's look at the patterns from the old comparison test some other time. Looking forward to your feedback to merge this :)

@simonlindholm
Copy link
Collaborator

I'll take a look tomorrow!

@dbalatoni13
Copy link
Contributor Author

Thanks! I just had this idea: if we know that a certain pattern is only generated if the args are u32 for example, can't we use this information to guess the types better? 🤔

@simonlindholm
Copy link
Collaborator

Yes, that would be ideal! Comparisons have different semantics between signed and unsigned so it's natural that they would give different codegen, and it's certainly useful to expose the types to the user.

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented Mar 25, 2025

May I ask how to do that? :D Though the problem is that in the a < b case we don't know which one from the two is u32. It generates this if either of them is u32.

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented Mar 27, 2025

I feel like arg0 >> 31 and (arg0 >> 31) ^ 1 might be too short to not create false positives. On the other hand: should I implement these three?:

temp_r0 = CLZ(arg0);
global = ((1 << (temp_r0 & 0x1F)) & 1) | ((1 >> (0x20 - (temp_r0 & 0x1F))) & 1);
global = (u32) (-(s32) arg0 & ~arg0) >> 0x1FU;
global = (u32) ((arg1 | ~arg0) - ((u32) (arg1 - arg0) >> 1U)) >> 0x1FU;

@dbalatoni13
Copy link
Contributor Author

I ran this through the whole Mario Party 4 decomp and these new patterns caused 30 files to be changed. Only one of them is a false positive, but that function is a mess anyways. I noticed that the order in a < b actually matters when matching, because of the order the values are loaded in, we've sadly lost this info by the point we handle the expression as a whole.

@simonlindholm
Copy link
Collaborator

May I ask how to do that? :D Though the problem is that in the a < b case we don't know which one from the two is u32. It generates this if either of them is u32.

I would suggest using as_uintish() on both operands, which will try to coerce them both to unsigned, falling back to a cast if it fails. It might be wrong (as you say only one has to be u32 for C to auto-convert to u32) but it still feels like a better guess than nothing at all?

On the other hand: should I implement these three?:

Well, sure, sounds good to me. Do you want to do it in the same PR or another? (I'm already taking too long to get around to reviewing this one I feel like...)

@simonlindholm
Copy link
Collaborator

Actually, for comparisons specifically you can use BinaryOp.ucmp() for uint compares, which is shorthand for doing as_uintish() on each argument.


def replace_shift_add_carry(expr: BinaryOp) -> BinaryOp:
"""
Simplify the expressions matching `((b >> 31) + (a >> 31U)) + M2C_CARRY`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Hm, I wish M2C_CARRY expressions kept the addition/subtraction that resulted in the carry, it's tricky to understand why the pattern makes sense right now)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants