A question on fluokitten performance #27
Can you share an example project with the specific dependencies and exactly the code that you tried? For example, in the code you've posted,
Just for reference, I've just tried this code with the current Neanderthal snapshot on the same machine I used for writing that post (i7 4790k) and got the following:
(with-progress-reporting (quick-bench (foldmap p+ 0.0 p* nx ny)))
;; Execution time mean : 189.349159 µs
It's a little bit faster than reported in the blog post (198 µs) but close. Please try to run that benchmark on your machine a couple of times and see whether there are some changes. Maybe the combination of the version and the settings of your JVM/OS has something to do with that, but it is difficult to say without data.
I should have specified all parameters in the first place. All configurations are:
The computer I use is slow, yet that doesn't explain the huge performance differences. I also tried it on another Xeon E5-2686 machine this morning -- same results. Further checking shows:
(fold (fmap * nx ny))
;; => ClassCastException clojure.core$_STAR_ cannot be cast to clojure.lang.IFn$DDD uncomplicate.neanderthal.impl.buffer-block/vector-fmap* (buffer_block.clj:349)
Yet in my tests, this code runs. I suspect you have a different
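The exception above says that clojure.core/* cannot be cast to clojure.lang.IFn$DDD, the interface Clojure generates for functions that take two primitive doubles and return a primitive double. The following is a minimal sketch in plain Clojure (no Neanderthal needed) of how a type-hinted function differs from clojure.core/* in this respect. The definitions of p+ and p* are assumptions: the thread uses these names but does not show how they were defined.

```clojure
;; Assumed definitions; the thread only shows the names p+ and p*.
;; The ^double hints on the arguments and the return value make the
;; compiled function implement clojure.lang.IFn$DDD.
(defn p+ ^double [^double x ^double y] (+ x y))
(defn p* ^double [^double x ^double y] (* x y))

;; The hinted function implements the primitive interface ...
(instance? clojure.lang.IFn$DDD p*)   ;; => true

;; ... while clojure.core/* does not, which is why a cast to IFn$DDD fails.
(instance? clojure.lang.IFn$DDD *)    ;; => false

;; Calling through the primitive interface avoids boxing the doubles.
(.invokePrim ^clojure.lang.IFn$DDD p* 2.0 3.0)   ;; => 6.0
```

Checking a function with instance? in this way is a quick means of telling whether it can be passed where a primitive IFn$DDD is expected.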
I think that I know what might be the source of the problem: the old Clojure compiler was somewhat inconsistent in applying protocol implementations, so it dispatches to the non-primitive function implementation in your case (why? I don't know). In my tests, Clojure 1.10 fixed this non-determinism. Please upgrade the project to 1.10 and report the timings. BTW
Just tested on Clojure 1.10, but the performance is still sluggish. Also
Here I create a project so you can check it out: fluokitten_test. Below are my run results:
When I tried your project as-is on my computer (but starting the benchmark from the REPL instead of main), I got 4 ms. Then I added the direct linking option to
:jvm-opts ^:replace ["-Dclojure.compiler.direct-linking=true"
                     "-XX:MaxDirectMemorySize=16g" "-XX:+UseLargePages"]
I restarted your project a few times with different versions of Neanderthal (SNAPSHOT and 0.20.4) and Clojure (1.8.0 and 1.10.0), and I always got the same result. However, when I started the REPL from the benchmarks example project (https://github.com/uncomplicate/neanderthal/blob/master/examples/benchmarks/src/benchmarks/map_reduce.clj), I always got around 200 microseconds, as reported in the blog post. So, it is definitely related to JVM/Clojure compiler settings, and possibly to the order in which Clojure loads namespaces. I don't have time now to compare your project further and see if there is another setting that you've missed. Can you try the code from the benchmarks project and report your numbers? (Seeing that our CPUs got 20 ms vs 4 ms for the initial version, I would expect that you'll get around 1 ms with the benchmarks project.)
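For readers unsure where these options go, here is a sketch of a Leiningen project.clj carrying them. The project name and the criterium version are assumptions made for illustration. The Clojure and Neanderthal versions are ones mentioned in this thread, and fluokitten is assumed to arrive as a dependency of Neanderthal.

```clojure
;; project.clj (sketch; the project name and criterium version are assumptions)
(defproject fluokitten-bench "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [uncomplicate/neanderthal "0.20.4"]
                 ;; criterium provides quick-bench and with-progress-reporting
                 [criterium "0.4.4"]]
  ;; ^:replace makes this vector replace Leiningen's default :jvm-opts
  ;; instead of being merged with them.
  :jvm-opts ^:replace ["-Dclojure.compiler.direct-linking=true"
                       "-XX:MaxDirectMemorySize=16g"
                       "-XX:+UseLargePages"])
```

JVM options are read when the JVM starts, so the REPL has to be restarted after editing them.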
Thanks for the prompt reply. I will check with the options provided and report back later.
I have now found the cause of the issue. In addition to the direct-linking option you pointed out, another factor is the
The fluokitten_test project has been updated with the change. I now get around 240 µs as the end result.
I followed the instructions in Fast Map and Reduce for Primitive Vectors and Matrices to test the performance of Neanderthal and fluokitten. What I found:
For instance,
Also, the same tests were performed using the current dev version of fluokitten, and they also ended up with much slower execution times than those reported in the post. I wonder what might be the causes of such large discrepancies?
It would be great if someone can also conduct similar tests.