Skip to content

Conversation

@mezpusz
Copy link
Contributor

@mezpusz mezpusz commented Dec 6, 2025

This amends #783 from the SMTLIB side with the case of Boolean let binders.

I tested with:

  1. TPTP Discount/Otter where 2 problems previously almost timing out with Otter flipped to timeout,
  2. SMTLIB2 with Discount on ALIA, AUFDTLIA, AUFDTLIRA, AUFDTNIRA, AUFLIA, AUFLIRA, AUFNIA and AUFNIRA, where the 20k benchmarks crashing mentioned in Polymorphic arrays #786 all disappeared, and 18951 flipped to being solved and we lose 6 benchmarks for whatever reason.

Not sure if I should test on more SMTLIB benchmarks or look into the differences, but I want to eventually run longer term regressions over SMTLIB too to see if we somehow decline.

@MichaelRawson
Copy link
Contributor

TPTP Discount/Otter where 2 problems previously almost timing out with Otter flipped to timeout,

we lose 6 benchmarks for whatever reason.

I'm not too bothered by these, we'll doubtless get them some other way and being correct is usually slower than being incorrect.

Will now review - but thanks for looking into SMT-LIB, this is the kind of large bug we should catch.

Term* lhs;
if (exprSort == AtomicSort::boolSort()) {
// This solution is ugly, but either := has to be special or we have
// to wrap this as a formula to preserve the term-formula boundary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we can re-consider - I think you're of the opinion that more interpreted symbols and fewer special terms are a good thing, which I agree with. But for now this is a good fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would be happy if the special term wrapper disappeared at some point, but let's see if we can get rid of it.

@mezpusz
Copy link
Contributor Author

mezpusz commented Dec 7, 2025

I ran a regression from f0fb5f9 (before the let-bound PR) to this branch and I found another bug introduced in the let-bound PR. It was because of my seemingly permanent misunderstanding of the functions isBoolean and isLiteral.

Now it still seems a bit suspicious, because we can somehow solve ~500 more benchmarks than before the let-bound PR. ~50 are lost. I will look into it for some time.

@mezpusz
Copy link
Contributor Author

mezpusz commented Dec 7, 2025

Looking at a couple of benchmarks in the difference, it seems that up to clausification the difference is trivial (for example, using $difference(...,1) instead of $sum(...,$uminus(1))) or having different order of definitions/namings). 🤷

@MichaelRawson
Copy link
Contributor

Do I read that as "we're not unsound because this only affects Vampire up to clausification"?

@mezpusz
Copy link
Contributor Author

mezpusz commented Dec 8, 2025

The let-bound changes mostly affected everything up to clausification, otherwise I don't know what other changes in between could have caused this result. Any ideas?

@MichaelRawson
Copy link
Contributor

It was because of my seemingly permanent misunderstanding of the functions isBoolean and isLiteral.

I think I also suffered from this. :-/ Any suggestions for a rename? I could go for isBoolean -> isTrueOrFalse.

The let-bound changes mostly affected everything up to clausification, otherwise I don't know what other changes in between could have caused this result. Any ideas?

I wouldn't be too surprised if this were the case. I'm not sure what it would tell us here, but sometimes I write the clausification output to a file and then run Vampire on the CNF. If you don't see the same effect, it's probably something like term IDs or symbol precedence?

@mezpusz
Copy link
Contributor Author

mezpusz commented Dec 8, 2025

I have no idea what it should be. Fortunately it is not used in many places (and maybe some of them are incorrect too!) and it looks like it is always used in the context of preprocessing special Boolean terms?

But anyways, this PR is nevertheless an improvement over master. I also ran TPTP regression, now it came back without any difference compared to master.

@mezpusz
Copy link
Contributor Author

mezpusz commented Dec 8, 2025

Btw, the function could be probably simplified by taking getSpecialData()->getSort() instead of the loop+switch, otherwise I'm not even sure if the current one is correct because it yields false for special terms with a variable body.

@MichaelRawson MichaelRawson merged commit 2cf06d2 into master Dec 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants