This repository has been archived by the owner on Oct 2, 2020. It is now read-only.

Rank alias candidates by "alias" distance, then prefer initialisms
Where "alias distance" is defined as the number of insertions between
the first and last character of an alias necessary to create a
front-anchored subsequence of the candidate string.

This is meant to favor matching single-letter or straight subsequence
aliases, even when other candidates are shorter. Two common examples
include:

    s/replace should prefer clojure.string, not clojure.core
    st/join   should prefer clojure.string, not clojure.set
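As a rough illustration of the definition above, the following standalone sketch restates the distance computation and applies it to the two examples. The names `distance`, `last-seg`, and `rank` are hypothetical and chosen for this example only; the sketch is a simplified restatement, not the code added by this commit.

```clojure
(require '[clojure.string :as string])

;; Illustrative restatement of the definition; `distance` is a hypothetical
;; name, not the function added by this commit.
(defn distance [^String alias ^String cand]
  (if (and (seq alias) (seq cand) (= (first alias) (first cand)))
    (let [alen (.length alias)
          clen (.length cand)]
      (loop [d 0   ; insertions counted so far
             i 1   ; index into alias
             j 1]  ; index into candidate
        (cond
          (= i alen) d                        ; whole alias consumed
          (= j clen) Long/MAX_VALUE           ; candidate exhausted first
          (= (.charAt alias i) (.charAt cand j)) (recur d (inc i) (inc j))
          :else (recur (inc d) i (inc j)))))  ; skip one candidate character
    Long/MAX_VALUE))

;; Compare only the last dot-separated segment of each namespace name.
(defn last-seg [s]
  (peek (string/split s #"\.")))

;; Hypothetical helper: order candidates by distance alone.
(defn rank [alias candidates]
  (sort-by #(distance alias (last-seg %)) candidates))

(rank "s"  ["clojure.core" "clojure.string"])
;; => ("clojure.string" "clojure.core")
(rank "st" ["clojure.set" "clojure.string"])
;; => ("clojure.string" "clojure.set")
```

The sketch ranks on distance alone; in the commit the distance is only one of several sort keys, so earlier keys can still override it.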

Currently, dashes are not recognized as word separators, although they
arguably should be:

    (alias-distance "fbz" "foo-bar-zab") ; 6
    (alias-distance "fbz" "frobnobz")    ; 5

It's clear that the intent is to alias foo-bar-zab; we should be able to
address this in the future.

A somewhat common convention for aliases is the initialism:

    juc -> java.util.concurrent
    cji -> clojure.java.io

We try to detect these as a last resort before falling back to the default
case, which prefers the shortest candidate name.
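The initialism check can be sketched as follows. This is a minimal example assuming dot-separated namespace names; `initials` and `initials-score` are hypothetical names used only for illustration, and the 0/1 scores follow the convention of the sort keys in this commit, where a lower value sorts first.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: first character of each dot-separated segment.
(defn initials [ns-name]
  (->> (string/split ns-name #"\.")
       (map first)
       (string/join)))

;; Hypothetical scoring function: 0 when the alias is the initialism, else 1.
(defn initials-score [alias ns-name]
  (if (= alias (initials ns-name)) 0 1))

(initials "java.util.concurrent")        ;; => "juc"
(initials "clojure.java.io")             ;; => "cji"
(initials-score "cji" "clojure.java.io") ;; => 0
(initials-score "cji" "clojure.string")  ;; => 1
```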
guns committed Nov 26, 2013
1 parent 31b9485 commit 4befa3f
Showing 4 changed files with 80 additions and 2 deletions.
5 changes: 5 additions & 0 deletions CHANGES
@@ -1,6 +1,11 @@
Changes from 1.4.0 to 1.5.0, 24 November 2013
=============================================

- Prefer aliases with shorter "alias" distance (number of insertions to create
anchored subsequence).

- Prefer aliases that are namespace initialisms.

- Prevent multiple aliases to a single namespace.

- Disjoin current ns from candidate namespaces.
54 changes: 53 additions & 1 deletion src/slam/hound/regrow.clj
@@ -127,6 +127,9 @@
cs)
candidates))

(defn- last-segment [s]
  (peek (string/split s #"\.")))

(def ^:private disambiguator-blacklist
(if-let [v (resolve 'user/slamhound-disambiguator-blacklist)]
@v
@@ -160,7 +163,7 @@
(let [alias (name missing)]
(fn [candidate]
(if (and (= type :alias)
- (= alias (peek (string/split (name candidate) #"\."))))
+ (= alias (last-segment (name candidate))))
0
1))))

@@ -173,6 +176,53 @@
0
1)))

(defn- alias-distance [^String alias ^String cand]
  (if (= (first alias) (first cand))
    (let [alen (.length alias)
          clen (.length cand)]
      (loop [d 0 ; alias-distance
             i 1 ; alias index
             j 1 ; candidate index
             matched? true ; current alias match state
             ]
        (if (or (>= i alen) (>= j clen))
          (if (and matched? (= i alen))
            d
            Long/MAX_VALUE)
          (if (= (.charAt alias i) (.charAt cand j))
            (recur d (inc i) (inc j) true)
            (recur (inc d) i (inc j) false)))))
    Long/MAX_VALUE))

(defn- alias-distance-fn
  "If the candidate shares the same first character with the missing alias,
   how many characters must be added between the first and last characters of
   the alias to form a subsequence of the last segment of the candidate?
   e.g. 0: st -> clojure.string
        1: st -> clojure.set
        2: st -> my.switchboard
        MAX_VALUE: str -> clojure.set
        MAX_VALUE: ring -> clojure.string"
  [type missing]
  (let [alias (name missing)]
    (fn [candidate]
      (if (= type :alias)
        (alias-distance alias (last-segment (name candidate)))
        Long/MAX_VALUE))))

(defn- initials-match-alias-fn
  "Do the initials of the candidate match the missing alias?"
  [type missing]
  (let [alias (name missing)]
    (fn [candidate]
      (if (and (= type :alias)
               (= alias (->> (string/split (name candidate) #"\.")
                             (map first)
                             (string/join))))
        0
        1))))

(defn disambiguate
"Select the most likely class or ns symbol in the given set of candidates,
returning [type candidate-sym]"
@@ -192,6 +242,8 @@
(sort-by (juxt (in-originals-fn type missing old-ns-map)
(last-segment-matches-fn type missing)
(is-project-namespace-fn type)
(alias-distance-fn type missing)
(initials-match-alias-fn type missing)
(comp count str))))]
(when-let [c (first cs)]
;; Honor any old [c :refer :all] specs - issue #50
20 changes: 20 additions & 0 deletions test/slam/hound/regrow_test.clj
@@ -42,6 +42,18 @@
(is (= (candidates :refer 'join '((join #{:a} #{:b})))
'#{clojure.set clojure.string korma.core}))))

(deftest test-alias-distance
  (let [d #'slam.hound.regrow/alias-distance
        max? (partial = Long/MAX_VALUE)]
    (is (max? (d "zbc" "abcdef")))
    (is (max? (d "azc" "abcdef")))
    (is (max? (d "abz" "abcdef")))
    (is (max? (d "abcd" "abc")))
    (is (= (d "a" "abcdef") 0))
    (is (= (d "abc" "abcdef") 0))
    (is (= (d "ace" "abcdef") 2))
    (is (= (d "fbb" "foo-bar-baz") 6))))

(deftest ^:unit test-disambiguate
(testing "removes namespace matching :name in old-ns-map"
(is (= (disambiguate '#{foo bar} :alias 'foo '{:old-ns-map {:name foo}})
@@ -92,6 +104,14 @@
(is (= (disambiguate
'#{clojure.string slam.hound.regrow-test} :refer 'trim {})
'[:refer slam.hound.regrow-test])))
  (testing "prefers candidates whose initials match the alias"
    (is (= (disambiguate '#{xray.yankee.zulu abc} :alias 'xyz {})
           '[:alias xray.yankee.zulu])))
  (testing "prefers candidates with the shortest alias-distance"
    (is (= (disambiguate '#{clojure.string clojure.core} :alias 's {})
           '[:alias clojure.string]))
    (is (= (disambiguate '#{clojure.string clojure.set} :alias 'st {})
           '[:alias clojure.string])))
(testing "prefers shortest candidates when no other predicates match"
(is (= (disambiguate '#{clojure.java.io clojure.set clojure.string}
:alias 'a {})
3 changes: 2 additions & 1 deletion todo.org
@@ -23,7 +23,8 @@

- [ ] Improve performance of slam.hound.reload/pre-load-namespaces if possible
* Minor enhancements
+ - [X] Sort alias candidates by L̶e̶v̶e̶n̶s̶h̶t̶e̶i̶n̶ "alias" distance, with emphasis on initial letters
+ - [ ] Count dashes as word separators when calculating "alias" distance
  - [ ] Satisfy all reflection warnings (there are only a handful)
  - [ ] Support Unicode characters in regrow/missing-sym-name
- - [ ] Sort alias candidates by Levenshtein distance, with emphasis on initial letters
- [ ] Sort refer candidates by matching arity. e.g. (join []) should prefer clojure.string/join
