Skip to content
This repository has been archived by the owner. It is now read-only.

New ns form parser for more reliable disambiguation #53

merged 59 commits into from Nov 9, 2013


Copy link

@guns guns commented Nov 4, 2013

Hi @technomancy,

This is a large changeset, so I am submitting this as a pull request.

Slamhound is one of my favorite Clojure tools, and one I would like to
see more widely used. Unfortunately, aside from the very large problem
of issue #19, there have been numerous issues related to disambiguation
that I think discourages users.
#24, #41, and #50 are some closed issues of this type. #26 and #27 are

still open.

In each of the addressed changes, the old ns form is used to divine user
preferences when disambiguating between candidate namespaces. However,
because asplode/asplode simply disassembled the old ns without parsing
it, matching old :require clauses was done with structural pattern

Namespace reference clauses like :require and :use are very flexible
and support prefix lists, multiple keyval pairs per clause, multiple
entries per namespace, and trailing flags. Pattern matching against
these clauses is therefore difficult and undependable.

I added regrow/expand-libs in f1f10ad
as an attempt toward homogenizing libspecs for better pattern matching,
but private attempts at extending this function convinced me that it is
a dead end.

This patch series introduces a proper ns parser. The general algorithm
is the same as before, but disambiguation is now very dependable.

Here are the three main steps in light of the new parser:

1. asplode

asplode/asplode parses ns references into the following map:

;; #'slam.hound.asplode/empty-ns-references
{:import     #{}   ; #{class-sym}
 :require    #{}   ; #{ns-sym}
 :alias      {}    ; {ns-sym ns-sym}
 :refer      {}    ; {ns-sym #{var-sym}}
 :xrefer     #{}   ; #{ns-sym} - exclusively referred namespaces
 :refer-all  #{}   ; #{ns-sym}
 :exclude    {}    ; {ns-sym #{var-sym}}
 :rename     {}    ; {ns-sym {var-sym var-sym}}
 :reload     false ; true/false
 :reload-all false ; true/false
 :verbose    false ; true/false
 :load       nil   ; a seq of file paths
 :gen-class  nil   ; a seq of option pairs, possibly empty

Parsing follows the logic of clojure.core/import and
clojure.core/load-libs, so all documented ways of writing ns
references are supported.

2. regrow

Candidate generation has not changed, except now regrow/candidates
returns a set of class or ns symbols that correspond to the given
:failure-type, instead of a seq of literal libspec forms
('([my.ns :as my] [my.other.ns :as my])).

Since we now have the old ns as real data, regrow/disambiguate can do
its work more accurately, with some new enhancements:

  • Imports from the old ns form are preferred (solving issue #26)

  • If a missing symbol is capitalized and the old ns has a referred var
    with the same name, the search order is set to [:refer :import] to
    give it a chance to find the capitalized var (solving issue #27)

  • Candidates that match vars that were excluded directly either via

    (:use [my.ns :exclude […]]),

    or implicitly excluded from clojure.core via

    (:refer-clojure :only […]),

    are now disjoined from the set of candidates.

The predicate functions for sorting candidates have been changed to
ranking functions that return numbers in order to express non-binary

This is a nice way to reimplement pull request #24, which ranked vars
that were explicitly referred higher than vars that were implicitly
referred via :use or :refer :all.

I initially planned to support addition and exclusion of candidates
based on the :rename option of :use, :refer-clojure, and the
undocumented :refer clause, but I dropped it since I have never actually
seen it in use. It is easy to add in the future if a user requests it.

3. stitch

If the regrow step successfully augments the ns-map with a new :import,
:alias, :refer, or :refer-all, the stitch-up function now has the
responsibility of emitting a real ns-form from data.

This allows the easy generation of compact libspecs such as:

(ns my.ns
  {:comment "Now supports re-emitting ns metadata!"}
  (:require [ :as bar :refer :all]
            [ :as foo :refer [a b c]]
            :reload-all :verbose))

Notice that multiple options per spec, the flags on :require, and the
metadata map after the docstring can all be emitted.


Here is a high level list of all enhancements and changes for reference:

  • New comprehensive ns parser

  • Metadata maps in ns forms are parsed and re-emitted

  • The :require flags :reload, :reload-all, and :verbose are preserved
    and re-emitted

  • Multiple :key value options are emitted per libspec; i.e. no separate
    :as and :refer vectors.

  • Refer candidates are now subject to :exclude rules from the old ns

  • Excluding vars in clojure.core also excludes vars from cljs.core.
    See 5e7f292 for more details

  • The relatively expensive dynamic class search function
    regrow/ns-import-candidates is only invoked when the faster static
    class name search fails

  • Disambiguation of candidates based on presence in the old ns-map now
    works perfectly. This includes :imports, as well as capitalized vars
    that shadow class names. Addresses #26 and #27.

  • Any mass refers from the old-ns that match a top candidate during
    regrow are carried over to the new ns. The only change from #50 is
    that now bare (:use my.ns) statements also trigger this behavior.

    This can be restricted again to only [… :refer :all] if desired.

  • cond-> and as-> from Clojure 1.5 are backported to
    slam.hound.future for convenience. They should be removed when
    upgrading the Clojure dependency

  • Unit tests for all new functions. Overall test coverage has increased,
    with no known regressions.

Final notes

The Master once said,

Programmers know the benefits of everything and the tradeoffs of

Indeed. As this is a large patchset, I am obligated to state the

Here is what we gain:

  • Candidate disambiguation works with all valid ns forms (and many
    implicitly valid forms)

  • The ns data used for disambiguation are maps, sets, and booleans, not
    sequences of irregular Clojure forms.

    The disambiguation functions in regrow are correspondingly easier to
    write and understand.

  • Better separation of concerns: ns parsing is done entirely in
    asplode, and form construction entirely within stitch.

    Currently, ns forms are parsed somewhat in all three namespaces, and
    form construction is handled by both regrow and stitch

And here are the costs:

  • +187 lines of code and comments in slam.hound.asplode

    The old asplode is an easy to grok two-function namespace that
    disassembles ns forms.

    The new asplode is 12 functions + 2 constants that handle all
    variations of namespace reference clauses.

  • +51 lines of code and comments in slam.hound.stitch

    ~20 lines of form collapsing and sorting code is replaced by ~70 lines
    of code generation from namespace data.

    The new stitch functions have complete control over the format of
    the emitted code.

Unsurprisingly, I think increasing the reliability and capability of the
disambiguation process is worth the moderate complexity of the ns parser
(which will likely go untouched until the ns macro is amended).

Slamhound is great when it DWIM, but it is infuriating when it refuses
to DWIM, even if I write exactly what I want in the ns form. These
changes are primarily directed at avoiding the latter scenario.

As always, I am eager to discuss these changes and make adjustments, so
please comment at your convenience.

guns added 30 commits Oct 28, 2013
This is the first commit towards a proper ns parser for more accurately
divining the user's preferences from the old ns-map.

regrow/expand-libs and associated functions will be removed once the
parser is complete.
This is the core data structure to which ns forms will be compiled.
parse-refers for handling :refer-clojure and :use forms.

cond-> is backported from Clojure 1.5 for convenience; it should be
removed when the Clojure dep is updated.
For parsing :require statements.
For parsing :use forms, which are a combination of both 'require
and 'refer, and thus implemented in terms of 'parse-requires and
Generic ns key/value parsing function built with the other parse-* fns.
Handles all reference forms, including :load and :gen-class.
Since ns forms are handled imperatively, repeated declarations like

        (ns foo
          (:require [clojure.test :refer [deftest]]
                    [clojure.test :refer [is]]))

result in both 'deftest and 'is being referred, instead of just 'is as
might be expected.

Thus we should recursively call vmerge for deep merging via itself.
This breaks the rest of the library for now.
Capitalized symbols for Vars are uncommon, but not illegal, and are in
use in some common libraries (prismatic/schema is one).
First of a series of *-from-map functions that generate ns subforms from
the data structures described in asplode/default-namespace-references.
Intended for generating :gen-class and :load forms
The meat of ns-to-map. Generating from data finally allows for compound
requires like:

        (:require [clojure.string :as string :refer [trim]])
We will be emitting this separately as :refer-clojure
Special emission of :refer-clojure for 'clojure.core
Given regular data as input, ns-from-map can now produce much better ns
forms than before.
Deleted code dealt mostly with parsing, which is now done in the asplode
More specifically, it should only appear when it appears in the old ns
Sorting takes place at three different levels:

- The top-level reference forms are hand sorted:

        (:require …)
        (:import …)
        (:refer-clojure …)
        (:gen-class …)
        (:load …)

- The libspecs in :require and :import can be string sorted:

        (:require …)            ; sort-by str
        (:import …)             ; sort-by str

- Finally, the option pairs of each reference form are placed in
  predetermined order (:as … :refer … :exclude … :rename …). Vector
  option values are sorted with sort, and map values (for :rename) are
  converted to (sorted-set).

  :gen-class and :load are _not_ sorted. The order of :load is
  significant, and we wish to respect the user's ordering with

        (:require …)
        (:import …)
        (:refer-clojure …)
        (:gen-class …)
        (:load …)
clojure.core/ns allows metadata to be attached to the ns:

        (ns my.ns
          "Optional docstring"
          {:note "This is metadata"})
Should be sorted, and always come after the docstring.
default-namespace-references -> default-ns-references
namespace-reference-keys -> ns-clauses

New names are shorter and more conventional
guns added 18 commits Nov 2, 2013
Not yet complete, but now contains filter-excludes, which is a new
addition compared to the old version
I am not a fan of private functions, but since slamhound does hide some
functions, we will expose disambiguate and hide its component functions.
Now that we have data instead of ns clauses, we can accurately determine
if a candidate existed in the old ns map, regardless if it was an
import, alias, or refer.

This should address technomancy#26: "Use old ns form to disambiguate potential
classes for import".
New code and tests now supersede old forms at all points.
Issue technomancy#22 and commit 835a671 describe
a preference of directly referred vars over implicitly referred vars
through :use or :refer :all.

In order to implement this we change the disambiguating predicates to
return numbers instead of bools. This allows expression of non-binary
preferences, which may also be useful in the future.
Addresses technomancy#27 "Always assumes any capitalized symbol is a class"

Since it is Clojure convention to reserve capitalized symbols for Class
names, we will continue to default to the possible-types order of
[:import :refer].

However, if the old ns-map contains a :refer of the capitalized var, we
switch the search order to [:refer :import] to give it an opportunity of
being found.
Running slamhound under 1.5.0 generates shadow warnings.
In ClojureScript, cljs.core defines macros for the clojure.core
namespace, and is mass-referred into namespaces by default.

Therefore, (:refer-clojure :exclude […]) should also :exclude vars
within 'cljs.core.

        (ns my.ns.macros
          (:refer-clojure :exclude [defn]))

        (defmacro defn [] …)

        (ns my.ns
          (:refer-clojure :exclude [defn]))

        (defn foo [])

Reconstructing my.ns would likely add '{:refer {cljs.core #{defn}}} as a
candidate since cljs.core is shorter than my.ns.macros.
Copy link

@technomancy technomancy commented Nov 6, 2013

Sorry for the delay here. I'm getting over a fever and am still catching up on things. I appreciate your detailed explanation and hope to find some time to review this in the next couple days.

Copy link
Collaborator Author

@guns guns commented Nov 6, 2013

Thanks @technomancy, hope you get well soon.

Copy link

@technomancy technomancy commented Nov 9, 2013

Wow, this looks like a really solid improvement. It's nice when the codebase can remain simpler, but I think you're right that in this case the original implementation was very naieve; this is a lot more comprehensive and thorough.

It looks like you have :reload/:reload-all in slam.hound.asplode/empty-ns-references as something that applies across the whole ns form, but typically these are nested inside just a single clause. I'm guessing :verbose is the same way? I could see issues with ns forms that have multiple :require clauses, one for :reloads and one without. However, I don't think the old code handled this properly anyway, and IMO it's of questionable worth--arguably :reload isn't something you'd leave in your source but something you'd insert while working on something and remove once you got it figured out.

Appreciate all the work you've put into this lib; it's great to see it mature.

technomancy added a commit that referenced this issue Nov 9, 2013
New ns form parser for more reliable disambiguation
@technomancy technomancy merged commit efd3650 into technomancy:master Nov 9, 2013
Copy link
Collaborator Author

@guns guns commented Nov 10, 2013

I'm happy you agree this is an improvement. Sending very large patches upstream is usually a losing battle, but I thought this was worth the effort. Thank you for reviewing it.

It looks like you have :reload/:reload-all in slam.hound.asplode/empty-ns-references as something that applies across the whole ns form, but typically these are nested inside just a single clause. I'm guessing :verbose is the same way? I could see issues with ns forms that have multiple :require clauses, one for :reloads and one without

Ah, I didn't think of multiple :requires. It's kind of gross that clojure.core/load-libs imperatively modifies namespaces in a loop, but I guess it does provide more flexibility.

Interestingly, due to the implementation of load-libs, require flags can be specified per libspec:

(:require [my.ns :as m :reload-all true])

Of course, this is undocumented, so it's not an option for slamhound.¹

If you're happy to emit a separate :require for different sets of require flags, I've implemented that here:

The first two commits on the branch are worth cherry-picking on their own even if you don't want the last two commits for emitting multiple :require clauses.

¹ On a related note, I'd like to mention that :exclude lists are not re-emitted (aside from :refer-clojure) because they are not officially supported in the (:require) clause. It does work however, so we can re-emit var exclusions if someone desires it:

(:require [my.ns :refer :all :exclude [a b c]])
; equivalent to (:use [my.ns :exclude [a b c]])

Copy link

@technomancy technomancy commented Nov 11, 2013

Sending very large patches upstream is usually a losing battle

Don't think of it as upstream, think of it as same-stream. =)

If you're happy to emit a separate :require for different sets of require flags, I've implemented that here

I was going to say it's not a big deal, but if you've already gone to the trouble of supporting it that'd be great. I like the look of the patch.

At this point feel free to use your judgement when merging and even making releases. I'm happy to be a second pair of eyes for review, but I trust you to make the right decisions going forward.

Copy link
Collaborator Author

@guns guns commented Nov 12, 2013

Thanks for the vote of confidence, it is much appreciated. I promise to be careful when implementing features.

WRT releases, I think now is a great time to cut 1.5.0 since quite a few issues have been resolved since 1.4.0. Or perhaps you're still working on the new #'slam-ns with :nrepl/op...

BTW, do you have a policy on revving the org.clojure/clojure dependency? If we update to 1.5.1 we can remove slam.hound.future before release.

Finally, I have a small todo list if you'd like to see what I am hoping to address in the future:

@guns guns deleted the new-ns-parser branch Nov 12, 2013
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
None yet
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants