New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New ns form parser for more reliable disambiguation #53

Merged
merged 59 commits into from Nov 9, 2013

Conversation

Projects
None yet
2 participants
@guns
Copy link
Collaborator

guns commented Nov 4, 2013

Hi @technomancy,

This is a large changeset, so I am submitting this as a pull request.

Slamhound is one of my favorite Clojure tools, and one I would like to
see more widely used. Unfortunately, aside from the very large problem
of issue #19, there have been numerous issues related to disambiguation
that I think discourages users.
#24, #41, and #50 are some closed issues of this type. #26 and #27 are

still open.

In each of the addressed changes, the old ns form is used to divine user
preferences when disambiguating between candidate namespaces. However,
because asplode/asplode simply disassembled the old ns without parsing
it, matching old :require clauses was done with structural pattern
matching.

Namespace reference clauses like :require and :use are very flexible
and support prefix lists, multiple keyval pairs per clause, multiple
entries per namespace, and trailing flags. Pattern matching against
these clauses is therefore difficult and undependable.

I added regrow/expand-libs in f1f10ad
as an attempt toward homogenizing libspecs for better pattern matching,
but private attempts at extending this function convinced me that it is
a dead end.

This patch series introduces a proper ns parser. The general algorithm
is the same as before, but disambiguation is now very dependable.

Here are the three main steps in light of the new parser:

1. asplode

asplode/asplode parses ns references into the following map:

;; #'slam.hound.asplode/empty-ns-references
{:import     #{}   ; #{class-sym}
 :require    #{}   ; #{ns-sym}
 :alias      {}    ; {ns-sym ns-sym}
 :refer      {}    ; {ns-sym #{var-sym}}
 :xrefer     #{}   ; #{ns-sym} - exclusively referred namespaces
 :refer-all  #{}   ; #{ns-sym}
 :exclude    {}    ; {ns-sym #{var-sym}}
 :rename     {}    ; {ns-sym {var-sym var-sym}}
 :reload     false ; true/false
 :reload-all false ; true/false
 :verbose    false ; true/false
 :load       nil   ; a seq of file paths
 :gen-class  nil   ; a seq of option pairs, possibly empty
 }

Parsing follows the logic of clojure.core/import and
clojure.core/load-libs, so all documented ways of writing ns
references are supported.

2. regrow

Candidate generation has not changed, except now regrow/candidates
returns a set of class or ns symbols that correspond to the given
:failure-type, instead of a seq of literal libspec forms
('([my.ns :as my] [my.other.ns :as my])).

Since we now have the old ns as real data, regrow/disambiguate can do
its work more accurately, with some new enhancements:

  • Imports from the old ns form are preferred (solving issue #26)

  • If a missing symbol is capitalized and the old ns has a referred var
    with the same name, the search order is set to [:refer :import] to
    give it a chance to find the capitalized var (solving issue #27)

  • Candidates that match vars that were excluded directly either via

    (:use [my.ns :exclude […]]),

    or implicitly excluded from clojure.core via

    (:refer-clojure :only […]),

    are now disjoined from the set of candidates.

The predicate functions for sorting candidates have been changed to
ranking functions that return numbers in order to express non-binary
preferences.

This is a nice way to reimplement pull request #24, which ranked vars
that were explicitly referred higher than vars that were implicitly
referred via :use or :refer :all.

I initially planned to support addition and exclusion of candidates
based on the :rename option of :use, :refer-clojure, and the
undocumented :refer clause, but I dropped it since I have never actually
seen it in use. It is easy to add in the future if a user requests it.

3. stitch

If the regrow step successfully augments the ns-map with a new :import,
:alias, :refer, or :refer-all, the stitch-up function now has the
responsibility of emitting a real ns-form from data.

This allows the easy generation of compact libspecs such as:

(ns my.ns
  "Docstring"
  {:comment "Now supports re-emitting ns metadata!"}
  (:require [my.bar :as bar :refer :all]
            [my.foo :as foo :refer [a b c]]
            :reload-all :verbose))

Notice that multiple options per spec, the flags on :require, and the
metadata map after the docstring can all be emitted.

Changelist

Here is a high level list of all enhancements and changes for reference:

  • New comprehensive ns parser

  • Metadata maps in ns forms are parsed and re-emitted

  • The :require flags :reload, :reload-all, and :verbose are preserved
    and re-emitted

  • Multiple :key value options are emitted per libspec; i.e. no separate
    :as and :refer vectors.

  • Refer candidates are now subject to :exclude rules from the old ns

  • Excluding vars in clojure.core also excludes vars from cljs.core.
    See 5e7f292 for more details

  • The relatively expensive dynamic class search function
    regrow/ns-import-candidates is only invoked when the faster static
    class name search fails

  • Disambiguation of candidates based on presence in the old ns-map now
    works perfectly. This includes :imports, as well as capitalized vars
    that shadow class names. Addresses #26 and #27.

  • Any mass refers from the old-ns that match a top candidate during
    regrow are carried over to the new ns. The only change from #50 is
    that now bare (:use my.ns) statements also trigger this behavior.

    This can be restricted again to only [… :refer :all] if desired.

  • cond-> and as-> from Clojure 1.5 are backported to
    slam.hound.future for convenience. They should be removed when
    upgrading the Clojure dependency

  • Unit tests for all new functions. Overall test coverage has increased,
    with no known regressions.

Final notes

The Master once said,

Programmers know the benefits of everything and the tradeoffs of
nothing.

Indeed. As this is a large patchset, I am obligated to state the
tradeoffs.

Here is what we gain:

  • Candidate disambiguation works with all valid ns forms (and many
    implicitly valid forms)

  • The ns data used for disambiguation are maps, sets, and booleans, not
    sequences of irregular Clojure forms.

    The disambiguation functions in regrow are correspondingly easier to
    write and understand.

  • Better separation of concerns: ns parsing is done entirely in
    asplode, and form construction entirely within stitch.

    Currently, ns forms are parsed somewhat in all three namespaces, and
    form construction is handled by both regrow and stitch

And here are the costs:

  • +187 lines of code and comments in slam.hound.asplode

    The old asplode is an easy to grok two-function namespace that
    disassembles ns forms.

    The new asplode is 12 functions + 2 constants that handle all
    variations of namespace reference clauses.

  • +51 lines of code and comments in slam.hound.stitch

    ~20 lines of form collapsing and sorting code is replaced by ~70 lines
    of code generation from namespace data.

    The new stitch functions have complete control over the format of
    the emitted code.

Unsurprisingly, I think increasing the reliability and capability of the
disambiguation process is worth the moderate complexity of the ns parser
(which will likely go untouched until the ns macro is amended).

Slamhound is great when it DWIM, but it is infuriating when it refuses
to DWIM, even if I write exactly what I want in the ns form. These
changes are primarily directed at avoiding the latter scenario.

As always, I am eager to discuss these changes and make adjustments, so
please comment at your convenience.

guns

guns added some commits Oct 28, 2013

Add asplode/expand-imports and asplode/expand-libspecs
This is the first commit towards a proper ns parser for more accurately
divining the user's preferences from the old ns-map.

regrow/expand-libs and associated functions will be removed once the
parser is complete.
Add asplode/default-namespace-references
This is the core data structure to which ns forms will be compiled.
Add asplode/parse-refers
parse-refers for handling :refer-clojure and :use forms.

cond-> is backported from Clojure 1.5 for convenience; it should be
removed when the Clojure dep is updated.
Add asplode/parse-requires
For parsing :require statements.
Add asplode/parse-uses
For parsing :use forms, which are a combination of both 'require
and 'refer, and thus implemented in terms of 'parse-requires and
'parse-refers.
Add asplode/parse-libs
Generic ns key/value parsing function built with the other parse-* fns.
Handles all reference forms, including :load and :gen-class.
Change vmerge strategy
Since ns forms are handled imperatively, repeated declarations like

        (ns foo
          (:require [clojure.test :refer [deftest]]
                    [clojure.test :refer [is]]))

result in both 'deftest and 'is being referred, instead of just 'is as
might be expected.

Thus we should recursively call vmerge for deep merging via itself.
Rewrite asplode/asplode with parse-libs
This breaks the rest of the library for now.
Rename class-name? -> capitalized?
Capitalized symbols for Vars are uncommon, but not illegal, and are in
use in some common libraries (prismatic/schema is one).
Add stitch/imports-from-map
First of a series of *-from-map functions that generate ns subforms from
the data structures described in asplode/default-namespace-references.
Add generic stitch/keyword-list-from-map
Intended for generating :gen-class and :load forms
Add stitch/requires-from-map
The meat of ns-to-map. Generating from data finally allows for compound
requires like:

        (:require [clojure.string :as string :refer [trim]])
Add special handling for {:refer {clojure.core :all}}
We will be emitting this separately as :refer-clojure
Add stitch/refer-clojure-from-map
Special emission of :refer-clojure for 'clojure.core
Rewrite stitch/ns-from-map
Given regular data as input, ns-from-map can now produce much better ns
forms than before.
Remove newly unused code in slam.hound.stitch
Deleted code dealt mostly with parsing, which is now done in the asplode
step.
Restrict insertion of (:refer-clojure :only [])
More specifically, it should only appear when it appears in the old ns
Ensure :import lists are sorted
Sorting takes place at three different levels:

- The top-level reference forms are hand sorted:

        (:require …)
        (:import …)
        (:refer-clojure …)
        (:gen-class …)
        (:load …)

- The libspecs in :require and :import can be string sorted:

        (:require …)            ; sort-by str
        (:import …)             ; sort-by str

- Finally, the option pairs of each reference form are placed in
  predetermined order (:as … :refer … :exclude … :rename …). Vector
  option values are sorted with sort, and map values (for :rename) are
  converted to (sorted-set).

  :gen-class and :load are _not_ sorted. The order of :load is
  significant, and we wish to respect the user's ordering with
  :gen-class.

        (:require …)
        (:import …)
        (:refer-clojure …)
        (:gen-class …)
        (:load …)
Parse leading maps in namespaces as metadata
clojure.core/ns allows metadata to be attached to the ns:

        (ns my.ns
          "Optional docstring"
          {:note "This is metadata"})
Emit metadata in stitch-up
Should be sorted, and always come after the docstring.
Rename some constants
default-namespace-references -> default-ns-references
namespace-reference-keys -> ns-clauses

New names are shorter and more conventional

guns added some commits Nov 2, 2013

Add new regrow/disambiguate
Not yet complete, but now contains filter-excludes, which is a new
addition compared to the old version
Make regrow/filter-excludes private
I am not a fan of private functions, but since slamhound does hide some
functions, we will expose disambiguate and hide its component functions.
Rewrite regrow/in-originals-pred for new ns-map
Now that we have data instead of ns clauses, we can accurately determine
if a candidate existed in the old ns map, regardless if it was an
import, alias, or refer.

This should address #26: "Use old ns form to disambiguate potential
classes for import".
Remove newly unused code in slam.hound.regrow
New code and tests now supersede old forms at all points.
Change disambugating predicates to return numbers
Issue #22 and commit 835a671 describe
a preference of directly referred vars over implicitly referred vars
through :use or :refer :all.

In order to implement this we change the disambiguating predicates to
return numbers instead of bools. This allows expression of non-binary
preferences, which may also be useful in the future.
Check :refer before :import if capitalized var is in old :refer
Addresses #27 "Always assumes any capitalized symbol is a class"

Since it is Clojure convention to reserve capitalized symbols for Class
names, we will continue to default to the possible-types order of
[:import :refer].

However, if the old ns-map contains a :refer of the capitalized var, we
switch the search order to [:refer :import] to give it an opportunity of
being found.
Rename future/cond-> and as-> to future/cond->* and as->*
Running slamhound under 1.5.0 generates shadow warnings.
Duplicate (:refer-clojure) parse map with 'cljs.core
In ClojureScript, cljs.core defines macros for the clojure.core
namespace, and is mass-referred into namespaces by default.

Therefore, (:refer-clojure :exclude […]) should also :exclude vars
within 'cljs.core.

e.g.
        (ns my.ns.macros
          (:refer-clojure :exclude [defn]))

        (defmacro defn [] …)

        (ns my.ns
          (:refer-clojure :exclude [defn]))

        (defn foo [])

Reconstructing my.ns would likely add '{:refer {cljs.core #{defn}}} as a
candidate since cljs.core is shorter than my.ns.macros.
@technomancy

This comment has been minimized.

Copy link
Owner

technomancy commented Nov 6, 2013

Sorry for the delay here. I'm getting over a fever and am still catching up on things. I appreciate your detailed explanation and hope to find some time to review this in the next couple days.

@guns

This comment has been minimized.

Copy link
Collaborator Author

guns commented Nov 6, 2013

Thanks @technomancy, hope you get well soon.

@technomancy

This comment has been minimized.

Copy link
Owner

technomancy commented Nov 9, 2013

Wow, this looks like a really solid improvement. It's nice when the codebase can remain simpler, but I think you're right that in this case the original implementation was very naieve; this is a lot more comprehensive and thorough.

It looks like you have :reload/:reload-all in slam.hound.asplode/empty-ns-references as something that applies across the whole ns form, but typically these are nested inside just a single clause. I'm guessing :verbose is the same way? I could see issues with ns forms that have multiple :require clauses, one for :reloads and one without. However, I don't think the old code handled this properly anyway, and IMO it's of questionable worth--arguably :reload isn't something you'd leave in your source but something you'd insert while working on something and remove once you got it figured out.

Appreciate all the work you've put into this lib; it's great to see it mature.

technomancy added a commit that referenced this pull request Nov 9, 2013

Merge pull request #53 from guns/new-ns-parser
New ns form parser for more reliable disambiguation

@technomancy technomancy merged commit efd3650 into technomancy:master Nov 9, 2013

@guns

This comment has been minimized.

Copy link
Collaborator Author

guns commented Nov 10, 2013

I'm happy you agree this is an improvement. Sending very large patches upstream is usually a losing battle, but I thought this was worth the effort. Thank you for reviewing it.

It looks like you have :reload/:reload-all in slam.hound.asplode/empty-ns-references as something that applies across the whole ns form, but typically these are nested inside just a single clause. I'm guessing :verbose is the same way? I could see issues with ns forms that have multiple :require clauses, one for :reloads and one without

Ah, I didn't think of multiple :requires. It's kind of gross that clojure.core/load-libs imperatively modifies namespaces in a loop, but I guess it does provide more flexibility.

Interestingly, due to the implementation of load-libs, require flags can be specified per libspec:

(:require [my.ns :as m :reload-all true])

Of course, this is undocumented, so it's not an option for slamhound.¹

If you're happy to emit a separate :require for different sets of require flags, I've implemented that here:

https://github.com/guns/slamhound/compare/multiple-require-clauses

The first two commits on the branch are worth cherry-picking on their own even if you don't want the last two commits for emitting multiple :require clauses.

¹ On a related note, I'd like to mention that :exclude lists are not re-emitted (aside from :refer-clojure) because they are not officially supported in the (:require) clause. It does work however, so we can re-emit var exclusions if someone desires it:

(:require [my.ns :refer :all :exclude [a b c]])
; equivalent to (:use [my.ns :exclude [a b c]])
@technomancy

This comment has been minimized.

Copy link
Owner

technomancy commented Nov 11, 2013

Sending very large patches upstream is usually a losing battle

Don't think of it as upstream, think of it as same-stream. =)

If you're happy to emit a separate :require for different sets of require flags, I've implemented that here

I was going to say it's not a big deal, but if you've already gone to the trouble of supporting it that'd be great. I like the look of the patch.

At this point feel free to use your judgement when merging and even making releases. I'm happy to be a second pair of eyes for review, but I trust you to make the right decisions going forward.

@guns

This comment has been minimized.

Copy link
Collaborator Author

guns commented Nov 12, 2013

Thanks for the vote of confidence, it is much appreciated. I promise to be careful when implementing features.

WRT releases, I think now is a great time to cut 1.5.0 since quite a few issues have been resolved since 1.4.0. Or perhaps you're still working on the new #'slam-ns with :nrepl/op...

BTW, do you have a policy on revving the org.clojure/clojure dependency? If we update to 1.5.1 we can remove slam.hound.future before release.

Finally, I have a small todo list if you'd like to see what I am hoping to address in the future:

https://gist.github.com/guns/7425023

@guns guns deleted the guns:new-ns-parser branch Nov 12, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment