Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(expt 2 8) induction? #448

Closed
kazarmy opened this issue May 18, 2023 · 7 comments
Closed

(expt 2 8) induction? #448

kazarmy opened this issue May 18, 2023 · 7 comments

Comments

@kazarmy
Copy link

kazarmy commented May 18, 2023

I have the following:

(set-info :smt-lib-version 2.6)
(set-option :produce-proofs true)
(set-logic UFDTLIRA)

(define-fun zip ((i Int)) Bool
  (= i 0))
(define-fun-rec expt ((r Int) (i Int)) Int
  (ite (zip i) 1
  (ite (= r 0) 0
  (ite (> i 0) (* r (expt r (+ i (- 1))))
  (- 1)))))

(assert
 (not
  (= (expt 2 7) 128)))

(check-sat)
(get-proof)
(exit)

(yes it's a partial reimplementation of ACL2's expt)

Vampire 4.7 (807e37dd9) proves the above (with some difficulty) using:

vampire --mode portfolio -sched struct_induction -thi all

but when I change the assert to:

(assert
 (not
  (= (expt 2 8) 256)))

Vampire doesn't find the proof within the time limit anymore. Now I know that CVC5 1.0.5 proves the above assertion instantly, so are there are any other options that I need to set to encourage Vampire to go down the constant term instantiation path? -- I hope I'm using terms correctly here.

@MichaelRawson
Copy link
Contributor

@joe-hauns - when you get a minute, should the current theory machinery handle this "evaluation" use case somehow in your view? Otherwise I'll close as I don't think we can do much better here at present.

@selig
Copy link
Contributor

selig commented May 18, 2023

thi all is going to fill up the search space a lot, it's unlikely it's the best option compared to other theory instantiation options; it also works well when combined with a uwa option.

What kind of induction are you hoping to do here (I haven't looked closely at the actual problem)? You have integers but you've picked the structural induction schedule, which is over datatypes and recursive functions (which you also have). I think just picking the induction schedule will try and detect which kinds of induction might apply. Although I don't think any of the induction schedules make broad use of all of the arithmetic reasoning options.

@kazarmy
Copy link
Author

kazarmy commented May 18, 2023

Thanks for the hints! All I can say right now about this is that the struct_induction schedule is significantly faster compared to the integer_induction and induction schedules (~33s vs ~54s on my laptop), and that there are some interesting lines that appear to be options lines e.g.

% lrs+10_1_drc=off:ind=struct:sos=theory:sstl=1:to=lpo_89 on expt.smt2 for (89ds/0Mi)

that might suggest that uwa should be off. And without thi all, Vampire doesn't succeed in proving in 60s "(= (expt 2 3) 8)" much less "(= (expt 2 7) 128)". All of this is very interesting tbh.

It's just that I'm not sure whether I can or should be poring over papers, logs and source code to see what every option does (and how small changes in the problem affect the proving process). Still, I might come back to this when I have the chance.

@joe-hauns
Copy link
Contributor

Well i don't really know anything about how we handle define-fun-rec and how induction is being performed for such functions. I think @mezpusz knows best about that.
Recursive function mechanisms are definitely separate from the usual theory evaluation i've been working on.

@selig
Copy link
Contributor

selig commented May 18, 2023

@joe-hauns I think handling define-fun-rec is orthogonal. If this needs the all then it's needing to do arithmetic reasoning involving constraint solving that seems relevant to your work.... however, these are integers.

@kazarmy ... I feel your pain. The current schedules are generated based on sample problems and it sounds like your use case doesn't fall nicely into any of those sets. The issue you're facing is that some combinations of options probably weren't explored when generating those schedules so we don't see them.

If I had your problem I'd define an experiment that searched the option space to find a nice set of complementary options for a representative set of problems but that's a lot of compute and requires extra knowledge in which strategies you should be trying. Over the last 10 years we keep discussing whether this is a service we could provide to users but it's a big effort that nobody's working on.

@MichaelRawson
Copy link
Contributor

Sadly I think this is something we're not expecting to work well (yet). I looked vaguely at proof search and I'd expect this to rewrite "backwards" from the goal, but it doesn't because the symbol precedence forbids it, I think.Perhaps future work with arithmetic or recursive functions will improve matters. @mezpusz I think you're the expert here, if you've got anything to add please do!

@mezpusz
Copy link
Contributor

mezpusz commented Jul 18, 2023

Sorry for the late reply! As far as I see, induction won't help here since we cannot generalize (= (expt 2 8) 256) to any universal statement, and the arithemetic axioms + the above function definitions should be enough to solve this. I guess the proof search simply diverges due to superposition trying to expand the expt definition which results in larger and larger terms and more and more time is spent on useless superpositions between the axioms..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants