-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add namespace sorting #251
Conversation
Commit messages of this repository should follow the seven rules of a great Git commit message, as mentioned in the project's contributing guidelines. It looks like there's a few issues with the commit messages in this pull request:
|
Commit messages of this repository should follow the seven rules of a great Git commit message, as mentioned in the project's contributing guidelines. It looks like there's a few issues with the commit messages in this pull request:
|
Thanks for the PR. Can you first remove all extraneous commits and changes that are not relevant to the feature being implemented, as per the contributing guidelines. It makes code review much more difficult. |
Done. I'll split the others into a separate PR, if that's ok. |
This looks rather complex for its stated purpose. My first thought is that we'd want functions more like: (defn- ns-reference? [zloc]
...)
(defn- partition-children [zloc p?]
...)
(defn- sort-children [zloc p? comparator]
...) If I have time later I'll see if I can put something together that's a little more detailed. |
ea74c6f
to
372b88f
Compare
I'm not sure what predicates you have in mind for the partitioning and sorting, but I understand the idea of a more general Furthermore, I also added support for reader macros, they now always come first inside the reference form, and multiple ones (which should be rare) are ordered by their string representation. |
I've investigated the problem a little. I think the design you currently have is not something I'd want to merge and maintain. Rather, I was considering a design more like: (def ^:private ns-reference-symbols
#{:import :require :require-macros :use})
(defn- ns-reference? [zloc]
(and (z/list? zloc)
(some-> zloc z/up ns-form?)
(-> zloc z/sexpr first ns-reference-symbols)))
(defn- update-children [zloc f]
(z/replace zloc (n/replace-children (z/node zloc) (f (z/down zloc)))))
(defn- partition-form [zloc f]
(->> (iterate z/right* zloc)
(take-while some?)
(partition-by f)
(map (partial map z/node))))
(defn- semantic-boundary? [zloc]
;; some logic to work out how to group comments with elements
)
(defn- sort-arguments-by [zloc f]
(let [[head & args] (partition-form zloc semantic-boundary?)]
(->> args (sort-by f) (cons head) (apply concat))))
(defn- sort-references [zloc]
(update-children zloc #(sort-arguments-by % n/string)))
(defn sort-ns-references [form]
(transform form edit-all ns-reference? sort-references)) This is currently incomplete and only a rough sketch. I need to consider further on how to intelligently sort elements while maintaining spacing but bringing along comments. Possibly some preprocessing is required to add dummy nodes after comments, so that comments can be rearranged without affecting newlines. |
Cool, the refactoring into multiple functions certainly helps. I've changed the current draft to be close to that outline and moved it all into core again. Also renamed it to sort-ns-references. It's mostly the the partition function that is different now (and the sort key). I like your approach there, but I don't see a good way to do the split with |
I had another go at it, now using a (defn- new-libspec-with-padding? [zloc]
(or (first-line-break-after-libspec? (z/left* zloc))
(libspec-without-line-breaks-since-last-libspec? zloc))) Postponing Some of the naming still feels a bit verbose, and |
It's better, but I think the design still needs some consideration. My thought is that we want to determine the significant groups in a form that can be reordered, while keeping every other part of the form the same. So for example: (:require
; comment for c
c
a ; comment for a
b) The groups would be: (:require
X
Y
Z) Where: X="; comment for c
c"
Y="a ; comment for a"
Z="b" So we'd need to determine significant vs insignificant; given a particular zloc, is it part of the sort or not? I'll see if I can find time to implement it this week. |
I see your point, but I think that complicates matters a lot for what seems to be little gain. Just to make sure I understand it correctly, I'm thinking of a case like this: (:require
c
a
b) which in my version would become (:require
a
b
c) But with your approach of interleaving it with existing indentation, it should become (:require
a
b
c) That makes sense to me and I explored something like that as well, but I ran into problems, especially with comments, because usually their own indentation remains untouched, and it's not obvious how the indentation between comment and libspec should be treated/changed. I think you hinted at that with your example of I'm all for it, if you can improve it, but let me outline my thoughts and some examples I already pondered, perhaps it'll affect your view on the matter. The biggest "normal" case: (:require
b
; comment for a
a
c) I'd expect this to be reformatted to: (:require
; comment for a
a
b
c) but I suspect with your approach it would become something like (:require
; comment for a
a
b
c) A similar problem comes up when the libspec indentation steps out of line: (:require
b
; comment for a
a
c) in my mind should become (:require
; comment for a
a
b
c) but with interleaving would probably have to reset I can also imagine someone using indentation to group some requires, perhaps something like: (:require
b
[a]
[a.foo.a]
[a.foo.b]
[a.foo.c]) I think that should become (:require
[a]
[a.foo.a]
[a.foo.b]
[a.foo.c]
b) but with interleaving would strangely become (:require
[a]
[a.foo.a]
[a.foo.b]
[a.foo.c]
b) That's why I designed based on the assumption that it more often is correct to treat the indentation of a libspec line (and its surrounding comments) as part of that libspec. All that being said: I think we can agree that these are all edge cases, unless you have some specific common situations in mind that I haven't considered. I expect these lines to generally have the same indentation (with comments being special sometimes). Therefore it seems to me the interleaving adds quite a bit of complexity for a minority of edge cases where it would improve the situation, while there are (I'm guessing) more common situations where it would likely result in unexpected/undesired results. |
3639bdb
to
06e8283
Compare
Did you find some time to look at this? |
I spent a couple of hours on it, but I wasn't able to come up with a solution I was satisfied with. I'll take another crack at it this week or next. |
Any joy? |
I've thought about it a little, but I haven't invested a lot of time in this yet. This feels like a regular expression problem (in the general sense), as we want to group the following tokens:
The question is whether we use something like Spec or another data regex library, or whether we write something out manually. Alternatively, we assign each token type a unique character, concatenate them into a string, and take advantage of the inbuilt text regex support to concisely work out the group indices. |
So perhaps we start with something like: (defn re-seq-matcher [re charmap coll]
{:pre (every? charmap coll)}
(let [s (apply str (map charmap coll))
v (vec coll)
m (re-matcher re s)]
(letfn [(next-match []
(when (.find m)
{:subvec (subvec v (.start m) (.end m))
:start (.start m)
:end (.end m)}))]
(take-while some? (repeatedly next-match))))) Then use that to find the slices of comments/elements we want to treat as a continuous block. We sort according to the string value of the non-whitespace non-comment elements, then replace the existing slices with their sorted equivalent. We probably also need to add newlines after every comment before the sort, then remove them afterwards. This would ensure newlines are preserved once the comments are moved around. I'll see if I can come up with a more concrete implementation. |
Cool, I can see how that'd work, but I personally don't see the benefit yet. It seems like it'll be more complicated, without solving a concrete problem. I also don't think the string value of the non-comment element will work well for sorting. That might be a vector starting with spaces, for instance. I think the sorting in the PR makes sense: put reader macros first, then strings, then symbols either on their own or as first element in vectors, e.g. "[a], b, [c.foo]" rather than grouping them by vector and symbol, maybe then strings for whatever else flew by those nets. |
The idea is that it will be less complex, more concise, more maintainable, and easier to adapt to similar problems, such as sorting maps or sets. I've created a draft implementation, but I've yet to thoroughly test it. In this implementation, elements of a form are extracted with comments on the same and any preceding lines, sorted alphanumerically (ignoring whitespace and comments), then spliced back into the original form. |
I've opened a branch for this: https://github.com/weavejester/cljfmt/tree/ns-sorting I'm currently undecided on the sorting order. I think you might be right about needing more sophisticated sorting, though there may be a way to achieve the same effect more concisely. |
Nice. The regex needs to match more cases: there might be only spaces between libspecs. Or a libspec followed by spaces and a newline. Also need a |
The regex should handle those instances.
Since uneval nodes aren't going to be written in natural language, I don't think it makes sense to treat them as comments in the traditional sense.
An excellent point. Thanks for reminding me. |
What's left to do here to make it pass the tests and get it shipshape? Anything I can help with? |
The two things left to do are to get it working with ClojureScript, and to tweak the sorting order. I have some ideas on the latter, but I've yet to find time to work on this. |
06e8283
to
d89925b
Compare
Commit messages of this repository should follow the seven rules of a great Git commit message, as mentioned in the project's contributing guidelines. It looks like there's a few issues with the commit messages in this pull request:
|
I've pulled your new design in here, merging it with the docs update. I've also merged in my previous tests, showing a few that fail now. On top of that I put several proposal commits intended for fixup:
I've also added a draft of a cljs version of |
d89925b
to
4077214
Compare
Thanks for the code. Hiding the implentation is a good idea, but I don't think we need a protocol for it, as there's no open polymorphism. Instead, what about something like: (defn- re-indexes [re s]
(let [matcher #?(:clj (re-matcher re s)
:cljs (js/RegExp. (.-source re) "g"))
next-match #?(:clj #(when (.find matcher)
[(.start matcher) (.end matcher)])
:cljs #(when-let [result (.exec matcher s)]
[(.-index result) (.-lastIndex matcher)]))]
(take-while some? (repeatedly next-match))))
(defn- re-seq-matcher [re charmap coll]
{:pre (every? charmap coll)}
(let [s (apply str (map charmap coll))
v (vec coll)]
(for [[start end] (re-indexes re s)]
{:value (subvec v start end)
:start start
:end end}))) Running the namespace sort first rather than last also makes a lot of sense. |
4077214
to
3bbe167
Compare
Great, that's much better. |
Did you have a chance to look at the sorting? |
Apologies for the delay. My current thought is to sort by:
In other words: ignore brackets, metadata, and leading spaces, then sort alphanumerically. This will keep conditionals together, strings together, and also sort namespaces in an intuitive order (e.g. It's a little crude, so I'd appreciate any edge cases you can think of that might cause issues. |
I think that's hard to grok and won't be sufficient to deal with metadata that's map-shaped, I've added a test for that: (is (reformats-to?
["(ns foo.bar"
" (:require"
" ^{:foo :bar"
" :bar :foo}"
" [c]"
" [a.b :as b]"
" b))"]
["(ns foo.bar"
" (:require"
" [a.b :as b]"
" b"
" ^{:foo :bar"
" :bar :foo}"
" [c]))"]
{:sort-ns-references? true})) The result is (ns foo.bar
(:require
^{:foo :bar
:bar :foo}
[c]
[a.b :as b]
b)) Because I want to pitch again to have a sorting function that makes the order explicit. It could still use the removal of any irrelevant nodes and use |
The two remaining tests that fail are due to a mix of newlines and libspecs on the same line, potentially without spaces, which then result in strange indentation and reassembly due to splicing. Please have a look whether you agree with the tests, otherwise the behaviour might be acceptable to you. Is there really a big advantage in the splicing? I think this regex approach could work well with a pattern that includes all whitespace (and comments) to the left and a newline, if it exists, to the right, so those move with the libspec. There's a potential change of newline after the last libspec, but that seems like an edge case that can be ignored, since other options take care of it (at least when a newline should be removed). And the more important behaviour is that all indentation and newlines being kept as they are, if the order doesn't change, which is the case. |
01a4ba0
to
d28eada
Compare
One more update: I had to change the regex to allow for more special cases with uneval blocks above the libspec: (is (reformats-to?
["(ns foo.bar"
" (:require"
" [foo]"
" ;; comment above"
" #_[comment above] #_another"
" ;; comment above"
" [b]))"]
["(ns foo.bar"
" (:require"
" ;; comment above"
" #_[comment above] #_another"
" ;; comment above"
" [b]"
" [foo]))"]
{:sort-ns-references? true})) |
Yes, that's true. I was hoping to get away with a regular expression, but I probably want to map over the nodes and remove metadata that way.
I'd rather have something that's easy to remember. "Sorted alphanumerically, ignoring metadata and brackets" is simpler than trying to remember an arbitrary order. I think its important that functions behave in a way that's predictable and can be concisely explained and recalled. We want to boil down the behaviour into a minimal set of rules.
I think that's fine. This sort of edge case is almost always going to be in error, and could potentially be solved by an
That potentially results in odd behaviour for the first and last elements, and I feel its generally unintuitive behaviour. Ideally we want to start from the simplest possible behaviour, i.e. "it sorts non-whitespace elements", and then add the minimum amount of complexity needed, i.e. "it sorts non-whitespace elements, keeping the relative position of comments above and to the right of each element".
I think my preference is not to treat uneval blocks the same as comments in this case, as they fulfil different roles. The purpose of a comment is to describe code, while an uneval block is code we don't want to evaluate. |
Ok, then let's treat uneval as non-comment. I can see a situation where a libspec is unevaled, so it'd make sense to sort that as well. It would have to be unpacked like the metadata.
I understand that and can live with it, but it's still hiding a lot of implicit stuff, I think. For instance, the reader-macros only end up on top because there's a keyword in there that starts with :, if I understand it correctly. Similar with the " of strings. The code doesn't really reveal that.
I now think it might be simpler if cljfmt dictates how the form should look (setting linebreaks and indentation), rather than working so hard to preserve indentation and special cases. But maybe this approach is more re-usable. Those two concerns aside, I've pushed a new version that removes the uneval changes, adds a test for an unevaled libspec that is expected to be sorted as well, changed/removed all the tests that had super edgy splicing problem cases, e.g. (defn- unpack-metadata-and-uneval [nodes]
(mapcat (fn [node]
(case (n/tag node)
:meta (rest (n/children node))
:uneval (n/children node)
[node])) nodes))
(defn- node-sort-string [nodes]
(-> (remove (some-fn n/comment? n/whitespace?) (unpack-metadata-and-uneval nodes))
(nodes-string)
(str/replace #"[\[\]\(\)\{\}]" "")
(str/trim))) Feel free to change it as needed. All tests pass now, and I think it's starting to look like a decent solution. |
Maybe. I'm in two minds about this. The downside is that the description of the sorting behaviour becomes less concise: Elements are sorted alphanumerically, ignoring brackets, metadata, and uneval prefixes. Vs. Elements are sorted alphanumerically, ignoring brackets and metadata. Putting in a special case for brackets and metadata makes sense, but I'm unsure if uneval is important enough to warrant making the behaviour more complex. I have a slight inclination to default to the simpler option.
Almost: it's because they start with
Implicit is fine, as long as its predictable. If I sort alphanumerically, items that start with the same character are grouped together. That might be implicit, but it shouldn't be surprising.
So my intent here is simplicity, in the Rich Hickey sense of the word. It might harder to sort while leaving the spacing alone, but it's simpler because there a single purpose to this system. It just sorts. It doesn't make spacing or indentation changes; it only reorders the significant elements in a form.
I can see this being useful for sorting literal maps and sets as well, which may potentially be useful. |
It's worth considering that using However, that could be revisited when it is actually needed for map literals, where unevals probably are more likely. And then there's a whole new issue with stacked It's not a blocker for me either way, feel free to remove it. |
An attempt to add namespace sorting in
ns
reference blocks. This is one of the features requested in #13.I saw it was previously implemented in #95 and #113, but both PRs appear stale and neither looks like they handle the cases I'm interested in. Most importantly handling comments and all libspec variants (e.g.
a.b.c
,[a.b.c ...]
,"npm-dependency"
) and all flavours of reference blocks (i.e.:require
,:require-macros
,:use
,:import
).This implementation tries to address those issues.
It tries to preserve newlines and spaces and will honour these styles:
It'll also move comments before a line and at the end of a line with the libspec element, e.g.
A messy mix of newlines and same line libspecs and comments and non-indentation whitespaces in the source file can still result in unexpected results, especially if no other options for indentation and whitespace removal are turned on, but I think those should be rare, and at that point it's really hard to guess what might be the desired formating.
I introduced a new namespace, which might not be what you want, but it seemed like those functions would enlarge
cljfmt.core
quite a bit, and since there were other ideas for ns-related improvements in #13... perhaps that's a place where they could live.Finally, all but the last commit are preparation and (I hope) small improvements. They are orthogonal to the feature, so either can be cherry-picked or dropped.
Note: This code also assumes
ns-form?
works correctly to identify the top-levelns
form, as #250 is already working on that I use the function as is.