Skip to content
This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

Add efficient rseq support #1

Closed
wants to merge 1 commit into from
Closed

Conversation

glenjamin
Copy link

I'm fairly new to simple-check and the protocols and data-types side of clojure, so please be as picky as you like on the review :)

Since we have a sorted map, i figure it should be easy enough to do a reverse sequence.

I note that the existing Seq implementation is not lazy, so i've just implemented a basic reverse iterator around this rather than walking the tree right-first - I was actually fairly surprised core clojure doesn't provide Reversible for ArrayLists already.

Below is a benchmark showing this provides very little overhead vs a forward sequence.

Would you be interested in a properly lazy implementation of Seq? I'm only about halfway through the paper that describes Patricia trees, so I'm not 100% clear if this is possible (I think yes?)

user=> (require '[immutable-int-map-test :as it]
  #_=>          '[immutable-int-map :as i]
  #_=>          '[criterium.core :as c])
nil
user=>

user=> (let [m (into (i/int-map) it/entries)]
  #_=>   (println "seq")
  #_=>   (c/quick-bench (seq m))
  #_=>   (println "rseq")
  #_=>   (c/quick-bench (rseq m)))
seq
WARNING: Final GC required 250.84424916979708 % of runtime
Evaluation count : 24 in 6 samples of 4 calls.
             Execution time mean : 25.983582 ms
    Execution time std-deviation : 933.981058 µs
   Execution time lower quantile : 25.099248 ms ( 2.5%)
   Execution time upper quantile : 26.978498 ms (97.5%)
                   Overhead used : 1.811797 ns
rseq
WARNING: Final GC required 186.5399445093766 % of runtime
Evaluation count : 24 in 6 samples of 4 calls.
             Execution time mean : 26.184582 ms
    Execution time std-deviation : 927.418740 µs
   Execution time lower quantile : 24.723498 ms ( 2.5%)
   Execution time upper quantile : 26.955748 ms (97.5%)
                   Overhead used : 1.811797 ns
nil

@ztellman
Copy link
Owner

ztellman commented May 8, 2014

This is actually getting turned into clojure.data.int-map, and adding in more code will require you to sign a CA and post to the Clojure-Dev mailing list, so I'm going to hold off on merging this. Sorry for not having mentioned this earlier.

Laziness adds a fair amount of overhead, and I'm not sure whether it's particularly useful. For what it's worth, though, since your reversed seq is lazy, saying that "it adds very little overhead" isn't quite the whole story. I'd add a dorun around each, to get a more accurate understanding of the overhead.

@glenjamin
Copy link
Author

For what it's worth, though, since your reversed seq is lazy, saying that "it adds very little overhead" isn't quite the whole story. I'd add a dorun around each, to get a more accurate understanding of the overhead.

d'oh - thats rather obvious in retrospect

This is actually getting turned into clojure.data.int-map, and adding in more code will require you to sign a CA and post to the Clojure-Dev mailing list, so I'm going to hold off on merging this. Sorry for not having mentioned this earlier.

I'm happy to do this if needed, although I haven't really added anything useful yet :)

Laziness adds a fair amount of overhead, and I'm not sure whether it's particularly useful.

My current use-case is for a large league table calculation: Roughly 1 million people predict 7 outcomes each, and are ranked according to how close they are. Each time the state of the outcomes changes the table is re-ranked. I'm experimenting with Clojure to see how quickly I can perform this re-rank by keeping all the data in memory. The currently live naive "batch" approach takes about 6 minutes, the current Clojure prototype takes ~20 seconds.

For kicks and my own personal interest, i'm trying out a few libraries to see if i can get this to go even faster, which is where the int-map comes in.

I'm currently using clojure.data.avl to hold the computed table itself, I was looking at trying to plug int-map into this but the current implementation of seq means that my top-15 function reads all 1millon key/vals into an arraylist - hence the desire for laziness in the seq/rseq case. From a quick skim of the code, avl appears to define a type specifically to expose its map as a lazy seq: https://github.com/clojure/data.avl/blob/master/src/main/clojure/clojure/data/avl.clj#L1111

The current scoring system equates higher to better, which is why I wanted an rseq - but there's nothing stopping me making the internal representation max-score - score to reverse the sort.

I hope that all makes sense!

@glenjamin
Copy link
Author

After adding the dorun I noticed a big slowdown, which ended up being due to accidentally hitting reflection. This is now as efficient as initially intended (ie. slight overhead vs seq).

@ztellman ztellman closed this Sep 15, 2014
@glenjamin
Copy link
Author

Is it worth me opening this as an issue on jira now?

@ztellman
Copy link
Owner

If you have a CA signed, that would be great. Sorry for the extra work.

On Mon, Sep 15, 2014 at 1:13 PM, Glen Mailer notifications@github.com
wrote:

Is it worth me opening this as an issue on jira now?


Reply to this email directly or view it on GitHub
#1 (comment)
.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants