Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance #12

Merged
merged 2 commits into from
Aug 13, 2016
Merged

Improve performance #12

merged 2 commits into from
Aug 13, 2016

Conversation

gregr
Copy link
Collaborator

@gregr gregr commented Aug 13, 2016

Fix improper walks by list-split-ground

We were only getting the benefit of set-var-val! vars by not properly
passing the state's substitution to walk. This fix reduces latency by
between 1/3 and 2/3 on the harder benchmarks (the harder, the more
savings, as usual).

We were only getting the benefit of set-var-val! vars by not properly
passing the state's substitution to walk.  This fix reduces latency by
between 1/3 and 2/3 on the harder benchmarks (the harder, the more
savings, as usual).
@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

So the former 3s benchmark (which was 30s before today) now takes about 2s, and the new former 19s benchmark now takes about 7s.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

FYI, I'm also experimenting with optionally limiting search depth in faster-mk. I have a depth-limiting version which about doubles performance again on top of this pull request (2s -> 1s, 7s -> 4s).

Unfortunately I've still yet to see the newest/hardest queries finish without running out of memory, even then.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

I finally found a way to synthesize virtually all of append (in under 10 seconds, even):

(define append
  (lambda (l s)
    (if ,A ,B ,C))

But the change needed is disappointing to have to make. I changed the definition of if-primo to expect the condition to always return a boolean. This means that instead of (=/= #f t) for the true case we have (== #t t), which is not the usual Scheme semantics.

The next hardest query (having to guess that if is needed in the first place) still doesn't finish.

@webyrd webyrd merged commit 2358146 into webyrd:master Aug 13, 2016
@webyrd
Copy link
Owner

webyrd commented Aug 13, 2016

How are you keeping the synthesis from cheating in these examples? With
careful tests?

The next hardest query (having to guess that if is needed in the first
place) still doesn't finish.

I think it would be reasonable, when running on a machine with many cores,
to have at least one core guess that the definition is recursive, and
therefore has an if as its body. Of course, speeding up synthesis so
that this isn't needed would be nice, but I think for more complex examples
we'll need these sorts of heuristics/speculative evaluations.

But the change needed is disappointing to have to make. I changed the
definition of if-primo to expect the condition to always return a
boolean. This means that instead of (=/= #f t) for the true case we have (== #t t), which is not the usual Scheme semantics.

Once again, given enough parallelism at the hardware level, it seems
reasonable to have at least one thread guess that the test of an if
returns a Boolean. Also, we should probably have at least one thread guess
that the entire definition is well-typed (for some type system closer to
that of ML or whatever), and try to perform type inference. If this
happens to be the case, synthesis could be far faster.

On Fri, Aug 12, 2016 at 9:22 PM, Greg Rosenblatt notifications@github.com
wrote:

I finally found a way to synthesize virtually all of append (in under 10
seconds, even):

(define append
(lambda (l s)
(if ,A ,B ,C))

But the change needed is disappointing to have to make. I changed the
definition of if-primo to expect the condition to always return a
boolean. This means that instead of (=/= #f t) for the true case we have (==
#t t), which is not the usual Scheme semantics.

The next hardest query (having to guess that if is needed in the first
place) still doesn't finish.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#12 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABHThJR-4OjRnyA_Q_oEvzxQkEmC8QF7ks5qfTiSgaJpZM4Jjkzt
.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

Great! Running multiple versions in parallel should also allow us to make use depth limiting for faster queries in many cases without jeopardizing completeness.

To get this new synthesis result, I haven't had to change the nature of test cases at all, believe it or not. I'm using the same test cases as seen in test-old-interp.scm for the other benchmarks. I just changed the interpreter and it worked everything out by itself!

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

I just managed to synthesize all of append from test cases alone, with Barliman's starting skeleton! That means it guesses the name append, the parameter structure of the lambda, and that an if statement will work.

(define ,A
  (lambda ,B ,C))

Guess:

(define append
  (lambda (_.0 _.1)
    (if (null? _.0)
      _.1
      (cons (car _.0) (append (cdr _.0) _.1)))))

Side conditions: (sym _.0 _.1)

Still using the same test cases as before, with the boolean-only if semantics, but now I've moved if-primo out of prim-expo and placed it just ahead of variadic lambda application. Synthesis takes about 39 seconds.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

I should note that the 39 second timing is with a depth limit. Without the limit, I haven't seen a result even after a few minutes of waiting.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

By the way, it turns out you don't even need Barliman's starting skeleton for the full synthesis result. Without the skeleton, you get the same result in the same amount of time.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

By providing just a little more than the starting skeleton ...

(define append
  (lambda (l s) ,A))

... performance improves dramatically. This one completes correctly in less than 5 seconds.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

Oh, if you're confused as to how this is possible without additional tests to prevent cheating, I forgot to remind you that the depth limit itself also helps to protect against cheating.

To avoid confusion, note that this effect is not caused by cheats being pruned by the depth limit directly: the cheats we've seen are found at shallower depths than real answers, so that cannot explain this effect.

The real explanation seems to be that interesting lines of search retain more search resources as their siblings die off from hitting the limit. Interesting lines of search are disproportionately rewarded by this sibling pruning, compared with cheating lines of search.

This is the same reason that non-gensym test cases with a depth limit are enough to solve some of the problems that require gensym test cases without a limit. We figured this out while trying to understand why run-da-dls can solve hard 9 without gensyms in its test cases a while back: https://github.com/gregr/racket-misc/blob/8ee15173e25521b57bf14507c6bf61422fd6a57e/interp-tests.rkt#L740-L754

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

I've pushed an experimental branch (don't merge it!) for depth-limited search and tweaks to if semantics for you to play with: https://github.com/webyrd/Barliman/commits/do-not-merge-experimental-dls

You can manually adjust max-search-depth in mk.scm to adjust the depth limit. Let me know if you have any questions.

@webyrd
Copy link
Owner

webyrd commented Aug 13, 2016

If we had ~100 hardware threads, could we just increase the
max-search-depth exponentially on several threads? Would that well, do you
think?

On Sat, Aug 13, 2016 at 12:16 PM, Greg Rosenblatt notifications@github.com
wrote:

I've pushed an experimental branch (don't merge it!) for depth-limited
search and tweaks to if semantics for you to play with:
https://github.com/webyrd/Barliman/commits/do-not-merge-experimental-dls

You can manually adjust max-search-depth in mk.scm to adjust the depth
limit. Let me know if you have any questions.


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#12 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABHThBQ9Fr39lEmd1MYEPmmLV0KAwt4Qks5qfgnsgaJpZM4Jjkzt
.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

Absolutely, that's the exact arrangement I was thinking we should try!

@webyrd
Copy link
Owner

webyrd commented Aug 13, 2016

Okay! It's probably time to think about getting access to a beefy machine,
or renting an Amazon X1 with spot pricing. :)

On Sat, Aug 13, 2016 at 12:18 PM, Greg Rosenblatt notifications@github.com
wrote:

Absolutely, that's the exact arrangement I was thinking we should try!


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#12 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABHThAjoQJHQxnnFUdKZSkffJhcAvjORks5qfgp3gaJpZM4Jjkzt
.

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

Here's a screenshot of the complete synthesis:

synthesize-append-all

@gregr
Copy link
Collaborator Author

gregr commented Aug 13, 2016

For the simple examples we've been trying, a small depth is always going to work and parallel processes aren't going to help much.

Maybe we should first find more involved examples that can benefit from searching across multiple depth limits simultaneously.

@gregr gregr deleted the improve-performance branch August 13, 2016 18:47
@webyrd
Copy link
Owner

webyrd commented Aug 16, 2016

I was able to use your branch to synthesize all of 'zip' in ~30 seconds,
using the three obvious tests with gensyms. I had no luck synthesizing
map, perhaps partly because I wasn't sure which tests might make sense.

On Sat, Aug 13, 2016 at 12:24 PM, Greg Rosenblatt notifications@github.com
wrote:

For the simple examples we've been trying, a small depth is always going
to work and parallel processes aren't going to help much.

Maybe we should first find more involved examples that can benefit from
searching across multiple depth limits simultaneously.


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#12 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABHThCrkJ5PLqwG4dwyrkyrvUuRXWTaAks5qfgv7gaJpZM4Jjkzt
.

@gregr
Copy link
Collaborator Author

gregr commented Aug 16, 2016

It seems to have a lot of trouble figuring out to pass the procedure as an argument. This doesn't seem to come back:

(define map
  (lambda (f xs)
    (if (null? xs)
     '() (cons (f (car xs)) (map ,A (cdr xs))))))

But it doesn't take too long for it to guess everything else once you help it with passing the procedure:

(define map
  (lambda (f xs)
    (,A ,B
     ,C (,D ,E (,F f ,G)))))

I wonder why this is. Maybe backwards information flow in this case is more difficult?

@webyrd
Copy link
Owner

webyrd commented Aug 16, 2016

Seems like unfolding might be very helpful here. We are no where close to
using the full information of mostly-ground code. I'm not sure which
approach would work best in general, though.

Do you think Mercury would be able to do something clever with

`
(define map
(lambda (f xs)
(if (null? xs)
'() (cons (f (car xs)) (map ,A (cdr xs))))))

`

On Tue, Aug 16, 2016 at 9:32 AM, Greg Rosenblatt notifications@github.com
wrote:

It seems to have a lot of trouble figuring out to pass the procedure as an
argument. This doesn't seem to come back:

(define map
(lambda (f xs)
(if (null? xs)
'() (cons (f (car xs)) (map ,A (cdr xs))))))

But it doesn't take too long for it to guess everything else once you help
it with passing the procedure:

(define map
(lambda (f xs)
(,A ,B
,C (,D ,E (,F f ,G)))))

I wonder why this is. Maybe backwards information flow in this case is
more difficult?


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#12 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABHThJrLWUyzmzVYkdhRbHTPi_dRI_wJks5qgdgNgaJpZM4Jjkzt
.

@gregr
Copy link
Collaborator Author

gregr commented Aug 16, 2016

Not really thinking in terms of Mercury anymore, but I have an idea for using mostly-instantiated code to get to the point faster. I'm going to try adding a conde1 to faster-mk, representing conde with cases that are all mutually exclusive w.r.t. some goal prefix (index), and defer interleaving for as long as we have enough information to make a deterministic choice (pruning all but one branch based on the prefixes).

@webyrd
Copy link
Owner

webyrd commented Aug 16, 2016

That might work!

There has to be a way to take advantage of the determinism, when we have
it. I also wonder if this might be a reasonable case for abstract
interpretation or partial evaluation.

On Tue, Aug 16, 2016 at 11:08 AM, Greg Rosenblatt notifications@github.com
wrote:

Not really thinking in terms of Mercury anymore, but I have an idea for
using mostly-instantiated code to get to the point faster. I'm going to try
adding a conde1 to faster-mk, representing conde with cases that are all
mutually exclusive w.r.t. some goal prefix (index), and defer interleaving
for as long as we have enough information to make a deterministic choice
(pruning all but one branch based on the prefixes).


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#12 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABHThHLJf1OrFBOLsk65_OO70Fc6AL4rks5qge6GgaJpZM4Jjkzt
.

@gregr
Copy link
Collaborator Author

gregr commented Aug 16, 2016

Unfortunately my first attempt with conde1 hasn't improved performance, and may have actually reduced it by activating the garbage collector more often. I'll have to figure this out later.

But I did find a set of tests that makes it easier to find a definition for map by passing primitives as the procedure, probably because they are simpler to evaluate than lambdas.

This query completes in about 20 seconds:

(define map
  (lambda (f xs)
    ,A))

Tests:
(map ',g1 '())
'()

(map car '((,g2 . ,g3)))
`(,g2)

(map cdr '((,g4 . ,g5) (,g6 . ,g7)))
`(,g5 ,g7)

Best Guess:
(define map (lambda (f xs) (if (null? xs) xs (cons (f (car xs)) (map f (cdr xs))))))

@gregr
Copy link
Collaborator Author

gregr commented Aug 16, 2016

The full query takes just under 3 minutes:

,A

Best Guess:
(define map (lambda (_.0 _.1) (if (null? _.1) _.1 (cons (_.0 (car _.1)) (map _.0 (cdr _.1))))))

Side conditions:
(sym _.0 _.1)

@webyrd
Copy link
Owner

webyrd commented Aug 18, 2016

Nice! I particularly like this test:

(map ',g1 '())
'()

Parallelization/templated starting guesses should help a lot, I'd think.

On Tue, Aug 16, 2016 at 2:15 PM, Greg Rosenblatt notifications@github.com
wrote:

The full query takes just under 3 minutes:

,A

Best Guess:
(define map (lambda (_.0 _.1) (if (null? _.1) .1 (cons (.0 (car _.1)) (map _.0 (cdr _.1))))))

Side conditions:
(sym _.0 _.1)


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#12 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABHThMTVIEd3NTOH-eV12tRjrVXEXceAks5qghpEgaJpZM4Jjkzt
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants