Performance regression #209

Closed
Zlika opened this issue Mar 21, 2013 · 14 comments
Comments


Zlika commented Mar 21, 2013

Hi all,

I have noticed a gradual performance regression across recent versions of JNA.
I wrote two tests, each of which calls a native function. The prototypes of the native functions are:

  • for test 1: int foo(int, int, int, int[])
  • for test 2: int foo(int, struct*)

(Note: both test functions are real native functions that do actual work, so the measured durations include that work as well as the overhead of the native calls.)
For each test, I measured the duration of a loop of 100000 function calls with different versions of JNA (using Direct Mapping each time).

  • For test 1: 4650ms with JNA 3.2.7, 5039ms with JNA 3.4, 5439ms with JNA 3.5.1
  • For test 2: 4450ms with JNA 3.2.7, 5172ms with JNA 3.4, 5437ms with JNA 3.5.1
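
A minimal sketch of how a measurement like the one above could be taken and compared. This is not the reporter's actual test code: the class `CallLoopTimer` and its methods are hypothetical names, and the `IntSupplier` stands in for whatever direct-mapped native method is being measured.

```java
import java.util.function.IntSupplier;

final class CallLoopTimer {
    // Written after the loop so the JIT cannot treat the calls as dead code.
    static volatile int sink;

    /** Invokes `call` `iterations` times and returns the elapsed wall-clock time in ms. */
    static long timeMillis(IntSupplier call, int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += call.getAsInt();
        }
        long elapsedNanos = System.nanoTime() - start;
        sink = acc;
        return elapsedNanos / 1_000_000;
    }

    /** Relative slowdown of `newMs` against `baselineMs`, in percent. */
    static double slowdownPercent(long baselineMs, long newMs) {
        return 100.0 * (newMs - baselineMs) / baselineMs;
    }
}
```

Applied to the figures reported above, `slowdownPercent(4650, 5439)` gives roughly 17% for test 1 and `slowdownPercent(4450, 5437)` roughly 22% for test 2, going from JNA 3.2.7 to 3.5.1.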

Are you aware of this performance drift?

Regards,
Thomas

Contributor

twall commented Mar 21, 2013

Thanks for the investigation.

What platform are you running on?

There are some performance tests (not automatically run) in JNA's unit test suite; I believe there are some baseline numbers recorded for comparison between JNI, interface, and direct mapping (test/com/sun/jna/PerformanceTest.java).

Offhand it might be some changes to libffi; I don't recall many changes to the direct mapping stuff, although I'd have to run diffs to be sure.


Author

Zlika commented Mar 21, 2013

The tests were executed on Windows XP 32bit.

Contributor

twall commented Mar 22, 2013

Please run com.sun.jna.PerformanceTest using all three versions and post the results.

@twall twall closed this as completed Mar 24, 2013
Contributor

twall commented Mar 24, 2013

I see a drop in performance from 3.2.7 to 3.4.0 on OSX (64-bit).

@twall twall reopened this Mar 24, 2013
Contributor

twall commented Mar 25, 2013

Comparison of three versions on OSX (64-bit). Checking performance of different access methods (100000 iterations, in ms).

                                                 3.2.7   3.4.0  3.5.2
                                                 -----   -----  -----
cos (JNA interface):                             263     266    268
cos (JNA function):                              151     145    140
cos (JNA direct):                                25      54     57
cos (JNI ffi):                                   26      26     26
cos (JNI):                                       7       8      8
cos (pure java):                                 5       5      4
memset (JNA interface):                          267     249    252
memset (JNA function):                           142     173    139
memset (JNA direct Pointer/size_t):              46      107    83
memset (JNA direct Pointer/primitive):           28      68     59
memset (JNA direct primitives):                  28      81     59
memset (JNI ffi):                                43      50     43
memset (JNI):                                    1       2      2
strlen (JNA interface):                          512     724    514
strlen (JNA function):                           320     392    242
strlen (JNA direct - String):                    105     225    156
strlen (JNA direct - Pointer):                   135     217    171
strlen (JNA direct - byte[]):                    83      206    143
strlen (JNA direct - Buffer):                    65      145    141
strlen (JNI ffi):                                29      42     30
direct Buffer write:                             1       2      1
direct Buffer write (bulk):                      40      50     39
Memory write:                                    8       14     9
Memory write (bulk):                             31      33     28
callback (JNA interface):                        231     283    232
callback (JNA direct):                           77      213    193
callback w/NativeMapped (JNA interface):         317     356    307
callback w/NativeMapped (JNA direct):            235     264    249

Contributor

twall commented Mar 25, 2013

Run with -Djna.preserve_last_error=false and see what you get.

Contributor

twall commented Mar 25, 2013

Default for preserve last error must have been "off" prior to 3.4.0. With this explicitly turned off, the numbers for 3.5.2 fall back in line with 3.2.7.

There still seem to be performance issues in direct mode with byte[]/Buffer arguments and callbacks, though.
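
For applications that cannot easily change their JVM command line, the same property could in principle be set from code. The sketch below is only that: it assumes JNA reads `jna.preserve_last_error` when its native support is first initialized, so the call would have to run before any JNA class is loaded. That timing is an assumption, not something confirmed in this thread, and `PreserveLastErrorConfig` is a hypothetical helper name.

```java
final class PreserveLastErrorConfig {
    static final String PROPERTY = "jna.preserve_last_error";

    /**
     * Sets the property to "false" unless it was already given explicitly
     * (for example with -Djna.preserve_last_error=... on the command line).
     * Returns the value that is in effect afterwards.
     */
    static boolean disableUnlessConfigured() {
        if (System.getProperty(PROPERTY) == null) {
            System.setProperty(PROPERTY, "false");
        }
        return Boolean.parseBoolean(System.getProperty(PROPERTY));
    }
}
```

Leaving an explicit command-line setting untouched keeps the `-D` flag as the final authority, which is convenient when comparing runs with and without last-error preservation.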

Author

Zlika commented Mar 25, 2013

I ran the performance test on Linux (Ubuntu 12.04 64-bit) because it is simpler to compile JNA on Linux than on Windows.
Here are the results. There is indeed an improvement with -Djna.preserve_last_error=false (except for callbacks).

                                           3.2.7    3.4.0    3.5.1    3.5.1 (jna.preserve_last_error=false)
cos (JNA interface):                       229ms    225ms    232ms    192ms
cos (JNA function):                        122ms    121ms    135ms    115ms
cos (JNA direct):                           22ms     42ms     43ms     24ms
cos (JNI ffi):                              27ms     29ms     29ms     27ms
cos (JNI):                                   6ms      6ms      6ms      6ms
cos (pure java):                             4ms      4ms      4ms      5ms
memset (JNA interface):                    228ms    209ms    221ms    189ms
memset (JNA function):                     126ms    123ms    122ms     98ms
memset (JNA direct Pointer/size_t):         38ms     60ms     60ms     46ms
memset (JNA direct Pointer/primitive):      27ms     47ms     49ms     28ms
memset (JNA direct primitives):             27ms     48ms     50ms     27ms
memset (JNI ffi):                           30ms     30ms     31ms     32ms
memset (JNI):                                N/A      5ms      5ms      5ms
strlen (JNA interface):                    349ms    368ms    386ms    277ms
strlen (JNA function):                     212ms    201ms    202ms    243ms
strlen (JNA direct - String):               60ms    124ms    138ms     66ms
strlen (JNA direct - Pointer):             117ms    112ms    119ms    130ms
strlen (JNA direct - byte[]):               52ms    125ms    148ms     58ms
strlen (JNA direct - Buffer):               97ms     99ms    104ms     79ms
strlen (JNI ffi):                           44ms     33ms     34ms     33ms
direct Buffer write:                         6ms      5ms      5ms      5ms
direct Buffer write (bulk):                 32ms     33ms     32ms     32ms
Memory write:                                8ms      5ms      8ms      8ms
Memory write (bulk):                        41ms     33ms     33ms     33ms
callback (JNA interface):                  182ms    197ms    203ms    204ms
callback (JNA direct):                      59ms    165ms    166ms    204ms
callback w/NativeMapped (JNA interface):   230ms    231ms    242ms    245ms
callback w/NativeMapped (JNA direct):      176ms    179ms    198ms    183ms

Author

Zlika commented Mar 26, 2013

I just ran a test on one of my applications involving high-speed DMA transfers: using -Djna.preserve_last_error=false led to a performance increase (in terms of data transfer rate) of up to 3x!
As we can see, the default value for jna.preserve_last_error comes with a significant performance penalty. Wouldn't it make sense to revert its default value to false?

Contributor

twall commented Mar 26, 2013

Well, if you're calling a function and expect to be able to call errno/GetLastError, you're not expecting to have to jump through extra hoops to get operational correctness.

On the other hand, if raw performance is important to you, you're usually willing to make a few tweaks to get it.

Currently the overhead is in making a call back to the VM for thread-local storage; if that were implemented natively it'd avoid the extra performance hit (at the time it was easier to call back to Java than to implement per-platform thread-local storage).
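
To illustrate the storage model described above, here is a rough Java-side sketch of a per-thread "last error" slot. It is not JNA's implementation: `LastErrorStore` and its methods are hypothetical names, and the real cost discussed in this thread comes from native code calling back into the VM to reach such a slot after every native call.

```java
final class LastErrorStore {
    // One independent slot per thread; 0 means "no error recorded yet".
    private static final ThreadLocal<Integer> LAST_ERROR =
            ThreadLocal.withInitial(() -> 0);

    /** Records the error code captured right after a native call. */
    static void record(int code) {
        LAST_ERROR.set(code);
    }

    /** Returns the code most recently recorded by the calling thread. */
    static int get() {
        return LAST_ERROR.get();
    }

    /** Shows that a newly started thread does not see other threads' codes. */
    static int valueSeenByFreshThread() {
        int[] seen = new int[1];
        Thread t = new Thread(() -> seen[0] = get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

The per-thread slot is what makes an errno/GetLastError value trustworthy when several threads make native calls concurrently; a single shared field would let one thread's error overwrite another's.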


Contributor

twall commented Mar 26, 2013

Try out the improved_last_error branch. Should see better performance even without setting jna.preserve_last_error false.

Contributor

twall commented Mar 26, 2013

The direct callback regression is an artifact of the test configuration (the direct and non-direct versions of the callback overwrite one another in cache, which isn't likely to happen in real life).

Contributor

twall commented Apr 3, 2013

Resolved and fixed, to be released as 3.6.0.

@twall twall closed this as completed Apr 3, 2013
Author

Zlika commented Apr 3, 2013

I'm sorry I didn't have time to test your fixes because of the birth of my first child :-)
I will test the 3.6 release.
Thanks for your work.
