PERS-13 Optimization #15

Merged
merged 36 commits into from
Mar 8, 2021

Conversation

kelvinqian00
Collaborator

Optimized Persephone

  • Statement Templates are no longer compiled on each Statement read.
  • Compiled JSONPaths are now stored in a stateful cache for quick access.
  • Optimized validation function creation:
    • Presence colls are now turned into sets during compilation.
    • Removed use of spec and explain-data.
  • Optimized FSM creation:
    • Removed pattern matching from pattern-validation/pattern->fsm.
    • Removed fsm/move-nfa as a standalone function.
    • fsm/epsilon-closure now uses transients internally.
  • Optimized Pathetic dep (see api-refactor pull request).
  • Update persephone API function names and optional args to be in line with each other.
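As a rough illustration of two of the items above (the stateful cache and the transient-based `epsilon-closure`), here is a minimal sketch. It is not the code from this pull request: `compile-path`, `path-cache`, `cached-compile`, and the shape of the `eps` map are all hypothetical names and representations chosen for the example, and the real JSONPath compilation lives in the Pathetic dependency.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for an expensive JSONPath compiler.
;; It only splits "$.a.b" into [:a :b]; the real compiler does far more.
(defn compile-path
  [path-str]
  (->> (str/split path-str #"\.")
       (remove #{"$"})
       (mapv keyword)))

;; Stateful cache (assumed shape): path string -> compiled path.
(defonce path-cache (atom {}))

(defn cached-compile
  "Return the compiled form of path-str, compiling it only on a cache miss."
  [path-str]
  (or (get @path-cache path-str)
      (let [compiled (compile-path path-str)]
        (swap! path-cache assoc path-str compiled)
        compiled)))

;; Epsilon closure over an NFA, accumulating visited states in a transient
;; set. `eps` is an assumed representation: state -> set of states reachable
;; by a single epsilon move. States are assumed to be non-nil and non-false.
(defn epsilon-closure
  [eps start]
  (loop [visited (transient #{})
         stack   [start]]
    (if-let [s (peek stack)]
      (if (visited s) ; transient sets can be invoked as lookup functions
        (recur visited (pop stack))
        (recur (conj! visited s)
               (into (pop stack) (get eps s #{}))))
      (persistent! visited))))
```

The transient never escapes `epsilon-closure`: it is created, mutated, and made persistent inside a single call, which is what keeps this use of mutation safe.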

@kelvinqian00
Collaborator Author

Criterium Benchmarks:

(persephone-gen-test/run-validate-stmt-vs-profile 10)

| Commit | Mean (ms) | SD (ms) | Speedup |
|---|---|---|---|
| 6593ce3 | 10839.39 | 253.56 | 1 |
| e127223 | 224.37 | 23.27 | 47 |
| 0b06637 | 169.89 | 11.17 | 64 |
| aed921d | 146.95 | 1.60 | 74 |
| 9492703 | 127.62 | 1.32 | 85 |
| eecad65 | 112.93 | 2.03 | 96 |
| 1fe46ea | 118.01 | 8.59 | 92 |
| 70a724e | 105.11 | 7.26 | 103 |

(persephone-gen-test/run-match-stmt-vs-profile 10)

| Commit | Mean (ms) | SD (ms) | Speedup |
|---|---|---|---|
| 6593ce3 | 1057.96 | 71.28 | 1 |
| e127223 | 27.41 | 0.23 | 39 |
| 0b06637 | 26.78 | 0.31 | 40 |
| aed921d | 28.56 | 1.79 | 37 |
| 9492703 | 25.16 | 1.47 | 42 |

(persephone/profile->validator tc3-profile)

| Commit | Mean (ms) | SD (ms) | Speedup |
|---|---|---|---|
| 9743089 | 1069.25 | 72.67 | 1 |
| 0175b0e | 22.43 | 1.53 | 48 |

Note: Tested with REPL reset on second benchmark.

(persephone/profile->fsms tc3-profile)

| Commit | Mean (s) | SD (ms) | Speedup |
|---|---|---|---|
| 9743089 | 19.98 | 888.04 | 1 |
| 0175b0e | 4.09 | 26.80 | 4.89 |
| ea74850 | 2.11 | 15.67 | 9.47 |

@kelvinqian00 kelvinqian00 marked this pull request as ready for review March 5, 2021 15:55
@FeLungs
Contributor

FeLungs commented Mar 5, 2021

looks good to me, nice optimization!

Member

@milt milt left a comment

To the extent I understand the domain this looks really good and contains some substantial refactoring and optimization.
I didn't fully grok what the macro I commented on there is doing, but it seems similar to what clojure.spec.alpha does when you use a predicate function as a spec, where it returns the var name of the function on error.

Comment on lines +3 to +20
## 0.5.0 - 2021-05-25
- Update persephone API function names and optional args to be in line with each other.
- Statement Templates are no longer compiled on each Statement read.
- Compiled JSONPaths are now stored in a stateful cache for quick access.
- Optimized validation function creation:
  - Presence colls are now turned into sets during compilation.
  - Removed use of spec and `explain-data`.
- Optimized FSM creation:
  - Removed pattern matching from `pattern-validation/pattern->fsm`.
  - Removed `fsm/move-nfa` as a standalone function.
  - `fsm/epsilon-closure` now uses transients internally.
- Optimized Pathetic dep (see [api-refactor](https://github.com/yetanalytics/pathetic/pull/3)).

## 0.4.0 - 2021-02-25
- Added FSM specs and generative tests in the `gen` namespace.
- Added DATASIM tests in the `gen` namespace to test API functions on statement streams.
- Generative tests have their own aliases in `deps.edn`.
- Modified `match-next-statement` to handle multiple Patterns and Pattern outputs.
Member

This is an insanely great changelog entry. We don't normally have the discipline to maintain these (as I'm sure you've noticed), except in limited cases where it's part of our contract, but I'd like to see them become more common.

Comment on lines +19 to +26
(defmacro wrap-pred
  "Wrap a predicate function f such that if f returns true,
   return nil, and if f returns false, return the keywordized
   name of f."
  [f]
  (assert (symbol? f))
  (let [pred-name# (keyword f)]
    `(fn [x#] (if (~f x#) nil ~pred-name#)))))
Member

So this allows you to take a collection of predicates, use this on each and then run them all on something and know (by function name) which specific ones are false?

Collaborator Author

Basically yes. And as you can see from how it's used in the create-pred functions, the macro lets predicates be wrapped this way and added via conditional threading.

It is indeed the same as what spec does (it returns the function name on failure), but without the extra overhead that was hurting performance.
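To make the exchange above concrete, here is a small sketch of how wrapped predicates might be collected with `cond->` and then run against a value. The `create-pred` function below, its option keys, and the choice of `int?` and `pos?` are hypothetical and chosen for illustration; they are not the create-pred functions from this pull request. The macro is restated so that the block is self-contained.

```clojure
(defmacro wrap-pred
  "Wrap predicate symbol f: the wrapped fn returns nil when f passes
   and the keywordized name of f when it fails."
  [f]
  (assert (symbol? f))
  (let [pred-name (keyword f)]
    `(fn [x#] (if (~f x#) nil ~pred-name))))

;; Hypothetical validator builder: predicates are added only when the
;; matching option is set, using conditional threading (cond->).
(defn create-pred
  [{:keys [must-be-int? must-be-pos?]}]
  (let [preds (cond-> []
                must-be-int? (conj (wrap-pred int?))
                must-be-pos? (conj (wrap-pred pos?)))]
    ;; Returns the name of the first failing predicate, or nil if all pass.
    (fn [x]
      (some (fn [p] (p x)) preds))))

(def validate (create-pred {:must-be-int? true :must-be-pos? true}))

(validate 5)   ; nil, every predicate passed
(validate -3)  ; :pos?
(validate "a") ; :int?, and pos? is never called because some stops early
```

Because `some` stops at the first non-nil result, later predicates never see a value that an earlier one rejected, so `pos?` is not applied to a string here.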

@kelvinqian00 kelvinqian00 merged commit c31e147 into master Mar 8, 2021
@kelvinqian00 kelvinqian00 deleted the optimization branch March 8, 2021 16:35
3 participants