prior-string optimization #58
weavejester
Feb 24, 2016
Owner
That's an interesting result, and gives weight to my own theory that `prior-string` is the most inefficient part of cljfmt.
Could you use a `loop` instead? I don't like the idea of exposing an additional argument in the function if we don't have to, and we'd actually save a line of code.
Another potential thing you could try is to memoize `prior-string` (in another PR). I suspect that will have significant performance increases. Perhaps some sort of soft-reference memoize would work...
weavejester
Feb 24, 2016
Owner
Actually, now that I look at your code more closely, I'm concerned that this could make `prior-string` less efficient once it's been memoized. I suggest we try the memoize approach first, and then if that has significant performance gains, revisit this patch to see if this has a positive or negative effect upon a memoized `prior-string`.
Last I looked, core.memoize didn't have a soft reference memoize strategy, which seems like the best approach in this case, so we might need to build our own.
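For readers unfamiliar with the idea, a soft-reference memoize could look roughly like the sketch below. This is a JVM-only illustration, not code from cljfmt or core.memoize; `soft-memoize` is a hypothetical name. It holds keys strongly and never evicts cleared entries, so it is only meant to show the shape of the approach.

```clojure
(import 'java.lang.ref.SoftReference)

(defn soft-memoize
  "Hypothetical helper: memoizes `f`, holding each result through a
  SoftReference so the JVM may reclaim cached values under memory pressure.
  Caveats: nil results are recomputed on every call, and cleared entries
  are never removed from the map."
  [f]
  (let [cache (atom {})]
    (fn [& args]
      (let [^SoftReference sref (get @cache args)
            cached              (when sref (.get sref))]
        (if (some? cached)
          cached
          (let [v (apply f args)]
            ;; Store a fresh soft reference; the value itself is returned
            ;; directly so the caller keeps it strongly reachable.
            (swap! cache assoc args (SoftReference. v))
            v))))))
```

A quick check is to wrap a function that counts its own invocations and confirm that a repeated call with the same arguments does not increment the counter.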
arrdem
Feb 24, 2016
Contributor
Tweaked to use `loop` as requested, simplified the commit message some.
With regards to memoization, it's not clear to me that there is a common recursive subexpression here which can be memoized out. One thing that struck me before I fell asleep last night is that `prior-string` is only called by `margin`, which only looks at the last line of the generated string. So rather than walking arbitrarily far back and trying to memoize that entire computation, I think we'd get farther by simply breaking out of the loop after the first term which contains a newline.
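The idea can be illustrated with a minimal sketch. It assumes a flat vector of already-printed node strings as a stand-in for the zipper, so `nodes`, `idx`, and the bodies of `prior-string`, `prior-line-string` and `margin` below are illustrative and are not cljfmt's actual implementation.

```clojure
(require '[clojure.string :as str])

(defn prior-string
  "Stand-in for the full walk: everything left of position `idx`."
  [nodes idx]
  (apply str (take idx nodes)))

(defn prior-line-string
  "Walks left from `idx` and stops after the first term containing a newline."
  [nodes idx]
  (loop [i idx, worklist ()]
    (if (zero? i)
      (apply str worklist)
      (let [term     (nth nodes (dec i))
            worklist (cons term worklist)]
        (if (str/includes? term "\n")
          (apply str worklist)
          (recur (dec i) worklist))))))

(defn margin
  "Width of the last line of `s` (characters after the final newline)."
  [s]
  (- (count s) (inc (or (str/last-index-of s "\n") -1))))

(def nodes ["(defn" " " "f" " " "[x]" "\n  " "(+" " " "x" " "])

;; Both walks yield the same margin, but the second one stops early.
(margin (prior-string nodes 10))      ; => 7
(margin (prior-line-string nodes 10)) ; => 7
```

Because only the text after the last newline contributes to the margin, anything to the left of the first newline-bearing term found while walking backwards can be skipped.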
arrdem
Feb 24, 2016
Contributor
Okay so this last patch is definitely up for discussion but its results are amazing. Restricting `prior-string` to `prior-line-string` (meaning only accumulate enough of a str to capture the first newline) reduces the runtime on clojure/core.clj from 2 minutes to 3.5 SECONDS.
I tried a couple memoization approaches with no particular performance changes.
weavejester
Feb 24, 2016
Owner
> With regards to memoization, it's not clear to me that there is a common recursive subexpression here which can be memoized out.

It's a similar idea to memoizing a recursive Fibonacci sequence. As you work through the document calling `z/next`, the `prior-string` of the zipper location essentially builds on the `prior-string` of the previous zloc. In theory, we go from O(N^2) to O(N).
However, I did some tests and found that in practice, memoization seemed to slow things down. I'm not sure why this is, but at a guess maybe zippers aren't a good data structure to memoize. I also realised that the ClojureScript implementation of cljfmt is going to have a harder time with memoization, due to a lack of weak or soft references.
So in practice, maybe memoization ain't such a good idea.
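The Fibonacci analogy can be made concrete with a small sketch. It uses a flat vector of strings in place of the zipper and plain `clojure.core/memoize` (unbounded, no soft references), so every name here is hypothetical and it says nothing about why memoizing real zippers was slow in practice.

```clojure
(def nodes ["(defn" " " "f" " " "[x]" "\n  " "x)"])

;; Counts how many times the underlying (non-cached) function body runs.
(def node-visits (atom 0))

(declare memo-prior-string)

(defn- prior-string*
  "The prior string at `idx` is the prior string at `idx - 1` plus one node."
  [idx]
  (swap! node-visits inc)
  (if (zero? idx)
    ""
    (str (memo-prior-string (dec idx)) (nth nodes (dec idx)))))

(def memo-prior-string (memoize prior-string*))

;; Visit every position left to right, as a formatter pass would.
(doseq [i (range (inc (count nodes)))]
  (memo-prior-string i))

@node-visits ; => 8, one body evaluation per position
```

Without the cache, the same pass would evaluate the body 1 + 2 + ... + 8 = 36 times. Note that the linear bound counts node visits only: each `str` call still copies the accumulated prefix.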
weavejester
Feb 24, 2016
Owner
> Okay so this last patch is definitely up for discussion but its results are amazing. Restricting `prior-string` to `prior-line-string` (meaning only accumulate enough of a str to capture the first newline) reduces the runtime on clojure/core.clj from 2 minutes to 3.5 SECONDS.

Excellent! When I wrote `prior-string` I knew it was a very inefficient way of calculating the margin, but I didn't expect it to have that much of an effect.
arrdem
Feb 24, 2016
Contributor
Hum... update... while this patch tests cleanly, it's doing something wrong on clojure/core. I'm concerned that just backtracking to the first newline isn't enough; we have to backtrack to the first top level form. Gonna keep working on this.
Thanks for the update and the testing you're doing on this.
arrdem
Feb 24, 2016
Contributor
Sample wrong output:
(def
^{:arglists '([coll x] [coll x & xs])
:doc "conj[oin]. Returns a new collection with the xs
'added'. (conj nil item) returns (item). The 'addition' may
happen at different 'places' depending on the concrete type."
:added "1.0"
:static true}
conj (fn ^:static conj
([] [])
([coll] coll)
([coll x] (clojure.lang.RT/conj coll x))
([coll x & xs]
(if xs
(recur (clojure.lang.RT/conj coll x) (first xs) (next xs))
(clojure.lang.RT/conj coll x)))))
arrdem
Feb 24, 2016
Contributor
Okay, so the specific failure case here is pretty clear: given two forms `a` `b` on the same line, if `b` is multiline, `b`'s body forms are indented as if `b` had been broken to the next line, but `b` remains on the original line. Otherwise it works fine.
Not quite sure how to work around this. It seems like if cljfmt simply broke multi-line right hand side forms onto a new line (sane default IMO) then this would even be correct behavior.
Does the failure case work with the old code?
arrdem
Feb 24, 2016
Contributor
Yes. Prior to this "only one line" change, the above fn would indent correctly.
Do you know why that is?
Not the foggiest.
weavejester
Feb 24, 2016
Owner
I'll take a look in a little while and try to work out what's different.
arrdem
Feb 24, 2016
Contributor
Never mind, I found it. In the early termination case I forgot to cons a term on.
arrdem
Feb 24, 2016
Contributor
Commit rewritten to make use of a reader conditional as mentioned, and to fix a bug wherein the `str`'d zipper node containing the detected newline would not be included in the output str.
This PR no longer exhibits the above demonstrated incorrect indentation behavior.
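To see why dropping that one term matters, here is a small sketch of the bug over a flat vector of strings. It is not the PR's code: `prior-line-string`, its `keep-newline-term?` flag, `margin`, and `nodes` are all hypothetical, and the flag exists only to show the two behaviours side by side.

```clojure
(require '[clojure.string :as str])

(defn prior-line-string
  "Walks left from `idx` and stops at the first term containing a newline.
  With `keep-newline-term?` false, the early exit forgets to cons that
  term onto the worklist, which reproduces the bug described above."
  [nodes idx keep-newline-term?]
  (loop [i idx, worklist ()]
    (if (zero? i)
      (apply str worklist)
      (let [term (nth nodes (dec i))]
        (cond
          (not (str/includes? term "\n")) (recur (dec i) (cons term worklist))
          keep-newline-term?              (apply str (cons term worklist))
          :else                           (apply str worklist))))))

(defn margin
  "Width of the last line of `s`."
  [s]
  (- (count s) (inc (or (str/last-index-of s "\n") -1))))

;; A multi-line form followed by more forms on its closing line.
(def nodes ["{:a 1\n :b 2}" " " "conj" " "])

(margin (prior-line-string nodes 4 false)) ; => 6, tail of the map is lost
(margin (prior-line-string nodes 4 true))  ; => 12, includes " :b 2}"
```

The text after the newline inside the multi-line term still sits on the current line, so leaving it out underestimates the margin for everything that follows on that line.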
weavejester
Feb 24, 2016
Owner
Could you also add a small test that would have triggered that incorrect indentation behaviour? Since the tests passed despite the error, it shows that there's a gap in the tests that should be filled.
Rebased on master and spacing changes to the ns squashed out.
weavejester
Feb 24, 2016
Owner
Thanks. One more little change and then I'll merge. With commit 1d5987c the summary is a little meaningless out of context. Perhaps instead:
Optimize prior-string by stopping at newline

Because `margin` only needs the prior string up to the previous newline
character, we can change `prior-string` into `prior-line-string` to
improve performance.
arrdem added some commits Feb 24, 2016
arrdem
Feb 24, 2016
Contributor
Fixed the test (I wrote the wrong "correct" output) and updated the commit message as requested.
added a commit that referenced this pull request Feb 25, 2016
weavejester merged commit a49ae47 into weavejester:master Feb 25, 2016
1 check passed
arrdem
Feb 25, 2016
Contributor
Awesome, thanks for taking these. I look forward to being able to drop my releases for yours.
Released as version 0.4.1.
snoe
Feb 26, 2016
Contributor
Thanks @arrdem, I was tracking this down for clojurescript before I saw this commit. With node, core.clj went from more than 10 minutes to less than 15 seconds!
arrdem commented Feb 24, 2016

`prior-string` as written is left recursive and operates by concatenating a series of generated strings. `clojure.core/str` internally uses a `java.lang.StringBuilder`, which is a fast right-concatenative structure. The structural left recursion of `prior-string` meant that for `n` recursive calls, `n-1` intermediate `StringBuilder`s would be created and finalized to `String`s, then the resulting immutable `String` would be copied in full and discarded in the parent.

This patch uses the insight that this structural left recursion can be rewritten into a flat `recur` loop by accumulating zipper nodes previously concatenated on the right in a worklist which grows right to left (immutable cons/push front), and then in the base case making a single pass over this worklist with a single `StringBuilder` to generate the result string without intermediary structures.

This change alone brings the format time on clojure/core.clj down from 7 minutes to 2.
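The rewrite described above can be sketched as follows. This is a JVM-only illustration over a flat vector of strings rather than a rewrite-clj zipper, so `nodes`, `idx`, and both function names are assumptions made for the example, not the patch itself.

```clojure
(defn prior-string-recursive
  "Left-recursive shape: every level builds and copies a fresh String
  holding the whole prefix seen so far."
  [nodes idx]
  (if (zero? idx)
    ""
    (str (prior-string-recursive nodes (dec idx))
         (nth nodes (dec idx)))))

(defn prior-string-loop
  "Flat shape: walk leftwards, consing each term onto the front of a
  worklist, then append everything to one StringBuilder in a single pass."
  [nodes idx]
  (loop [i idx, worklist ()]
    (if (zero? i)
      (let [sb (StringBuilder.)]
        (doseq [s worklist]
          (.append sb ^String s))
        (.toString sb))
      (recur (dec i) (cons (nth nodes (dec i)) worklist)))))
```

Both functions return the same string for the same input. The loop version allocates one builder in total, while the recursive version creates an intermediate string per level.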