Test PR to measure accuracy and performance of Event size computation #17758

andsel · 2025-06-30T09:56:33Z

Summary

This PR verifies how the different Event memory-estimation approaches requested in #17736 compare in accuracy and performance.
All of them should be compared against a byte-perfect measure, ideally one that accounts for object headers and all the memory-layout and alignment details covered by the JOL library. However, when JOL computes the retained size of an Event it also follows references down into the JRuby runtime, just as a heap dump (hprof) does when analyzed with tools like Eclipse Memory Analyzer. So determining the true byte-perfect size of an Event is not straightforward.
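As a worked illustration of why a byte-perfect figure differs from the payload length, the sketch below does HotSpot shallow-size arithmetic by hand. It assumes a 64-bit JVM with compressed oops and compressed class pointers (12-byte object header, 16-byte array header, 8-byte alignment) and the compact Latin-1 `java.lang.String` layout of recent JDKs. JRuby strings (`RubyString` over a `ByteList`) are laid out differently, and `ObjectSizeSketch` and its methods are hypothetical, so treat the numbers as illustrative rather than as what JOL would report for an Event.

```java
/**
 * Hypothetical helper: shallow-size arithmetic under assumed HotSpot settings
 * (64-bit, compressed oops and compressed class pointers). This is not the JOL API.
 */
final class ObjectSizeSketch {
    static final int OBJECT_HEADER = 12; // mark word (8) + compressed class pointer (4)
    static final int ARRAY_HEADER = 16;  // object header (12) + int length field (4)
    static final int ALIGNMENT = 8;      // default object alignment

    /** Rounds a size up to the next multiple of the object alignment. */
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Size of a byte[] holding n bytes, including header and padding. */
    static long byteArraySize(int n) {
        return align(ARRAY_HEADER + (long) n);
    }

    /**
     * Retained size of a compact (Latin-1) String of n characters: the String object
     * (header + byte[] reference 4 + int hash 4 + byte coder 1 + boolean hashIsZero 1)
     * plus its backing byte[].
     */
    static long latin1StringSize(int n) {
        long stringObject = align(OBJECT_HEADER + 4 + 4 + 1 + 1);
        return stringObject + byteArraySize(n);
    }
}
```

Under these assumptions an 11-byte value costs 56 bytes: 24 for the `String` object plus 32 for its padded `byte[]`.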

How is the test conducted?

Test fixtures

The test considers various forms of events, varying the nesting depth of the fields and the size of the values assigned to each field.
I tested 3 value sizes: 11 bytes, 512 bytes and 2KB. Each event has 6 layers of nested maps with 10 elements in each node.
Another test used a 2KB payload and a fairly flat event (only 2 layers) with 10 keys each, to understand how the measures change when the nesting is reduced. I think 6 layers of nested values is an unusual case for a Logstash event.
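The fixture shape described above can be sketched as a recursive builder. The layout is an assumption, since the text does not say how the 10 elements of each node are split: here every node except the deepest holds `width - 1` string leaves plus one nested map, and the deepest node holds `width` leaves. `NestedEventFixture` and its methods are hypothetical and not part of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical fixture builder: nested maps with fixed-size string leaves. */
final class NestedEventFixture {

    /** Builds `depth` levels of nesting; each node has `width` entries. */
    static Map<String, Object> build(int depth, int width, int valueSize) {
        String value = "x".repeat(valueSize);
        Map<String, Object> node = new LinkedHashMap<>();
        // Assumption: every node but the deepest reserves one slot for its child map.
        int leaves = depth > 1 ? width - 1 : width;
        for (int i = 0; i < leaves; i++) {
            node.put("field_" + i, value);
        }
        if (depth > 1) {
            node.put("nested", build(depth - 1, width, valueSize));
        }
        return node;
    }

    /** Counts the string leaves, to check the shape of a generated fixture. */
    static int countLeaves(Map<?, ?> map) {
        int count = 0;
        for (Object v : map.values()) {
            count += (v instanceof Map) ? countLeaves((Map<?, ?>) v) : 1;
        }
        return count;
    }
}
```

With this layout `build(6, 10, 11)` produces 55 leaves of 11 bytes each, and `build(2, 10, 2048)` gives the flatter 2-layer variant with 19 leaves.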

Test structure

The test is composed of two halves:

measure the size of events with three methods (the heap dump is used as a reference)
benchmark the three methods to understand how their performance varies with event size and structure
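The "map navigation" method can be pictured as a recursive walk over the event's nested maps and lists that sums the size of the values only. The sketch below is a hypothetical stand-in for the ConvertedMap navigation, not its actual code: it works on plain `java.util` collections and Java `String`s instead of ConvertedMap and RubyString, counts UTF-8 bytes for strings, and assumes fixed widths for numbers and booleans. In line with the analysis of the results, keys are deliberately skipped because they are interned.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical "map navigation" estimate: walks nested maps and lists and sums only
 * the payload of the values. Keys are skipped, assuming they are interned and shared.
 */
final class MapNavigationSketch {
    static long estimate(Object value) {
        if (value instanceof Map) {
            long total = 0;
            for (Object v : ((Map<?, ?>) value).values()) {
                total += estimate(v); // keys are intentionally ignored
            }
            return total;
        }
        if (value instanceof List) {
            long total = 0;
            for (Object v : (List<?>) value) {
                total += estimate(v);
            }
            return total;
        }
        if (value instanceof String) {
            return ((String) value).getBytes(StandardCharsets.UTF_8).length;
        }
        if (value instanceof Long || value instanceof Double) {
            return 8; // assumed fixed width for 64-bit numbers
        }
        if (value instanceof Integer) {
            return 4;
        }
        if (value instanceof Boolean) {
            return 1;
        }
        return 0; // null and unsupported types contribute nothing in this sketch
    }
}
```

For `{"message": "hello", "nested": {"k": "abc"}}` the estimate is 8 bytes, while the 14 bytes of key text are not counted at all.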

Each run generates a heap dump, which was opened with Eclipse Memory Analyzer to calculate the retained size of the single org.logstash.Event present.
JOL also computed the retained size, which therefore includes the full JRuby runtime, because the event contains JRuby strings that reference the underlying JRuby classes.

Size measurement results

Values are in bytes; the variation of map navigation and CBOR is calculated relative to the raw size.

test name	raw	map navigation	CBOR	JOL (retained)	hprof (retained)
apache 1KB	983	600 (-38.96%)	1384 (+40.79%)	12048416	3504
apache 2KB	2339	1865 (-20.27%)	2776 (+18.68%)	12107000	5128
apache 4KB	3057	2521 (-17.53%)	3534 (+15.60%)	12109536	6216
apache 16KB	16383	16144 (-1.46%)	16754 (+2.26%)	12152984	20048
apache 32KB	32767	32528 (-0.73%)	33154 (+1.18%)	12217176	38096
apache 128KB	131071	130832 (-0.18%)	131534 (+0.35%)	12505896	146224
cloudTrail 1KB	1602	493 (-69.23%)	2167 (+35.27%)	12116952	5368
cloudTrail 2KB	2465	730 (-70.39%)	3152 (+27.87%)	12120408	7648
cloudTrail 4KB	3078	989 (-67.87%)	3822 (+24.17%)	12122640	9200
cloudTrail 16KB	16384	15561 (-5.02%)	17036 (+3.98%)	12389616	21412
cloudTrail 32KB	32768	31945 (-2.51%)	33432 (+2.03%)	12407640	39440
cloudTrail 128KB	131072	130249 (-0.63%)	131811 (+0.56%)	12749432	147576
snmp 1KB	856	394 (-53.97%)	1730 (+102.10%)	12116264	4944
snmp 2KB	1739	925 (-46.81%)	3242 (+86.43%)	12119832	8656
snmp 4KB	3017	1723 (-42.89%)	5389 (+78.62%)	12126184	13776
snmp 16KB	20535	11167 (-45.62%)	28314 (+37.88%)	12678112	73160
snmp 32KB	41125	22385 (-45.57%)	56430 (+37.22%)	12727432	145296
snmp 128KB	165100	89930 (-45.53%)	225720 (+36.72%)	13265664	579640
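The percentages in the table are the relative difference between each estimate and the raw size. A minimal helper (the name is hypothetical) makes the formula explicit:

```java
/** Hypothetical helper: relative difference of a size estimate against the raw size, in percent. */
final class SizeVariation {
    static double percent(long estimate, long raw) {
        return (estimate - raw) * 100.0 / raw;
    }
}
```

For apache 1KB, `percent(600, 983)` is about -38.96 and `percent(1384, 983)` is about +40.79, matching the first row.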

Calculation benchmarks

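Values are throughput (higher is better): map navigation and CBOR results are in ops/ms (operations per millisecond), JOL results are in ops/s. The factor in parentheses is how many times faster map navigation is than CBOR.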

Small set of benchmarks, each run for 30 seconds:

Benchmark	map navigation (ops/ms)	CBOR (ops/ms)
apache 1KB	3416.043 ± 116.241 (x6.9)	496.853 ± 6.772
apache 2KB	2869.710 ± 35.520 (x8.1)	352.564 ± 4.181
apache 4KB	2553.733 ± 20.230 (x8.6)	295.903 ± 2.774
apache 16KB	1562.214 ± 15.322 (x16.5)	94.704 ± 0.648
apache 32KB	532.964 ± 10.288 (x10.0)	53.366 ± 0.575
apache 128KB	232.794 ± 6.071 (x15.8)	14.688 ± 0.194
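Because map navigation and CBOR are reported in ops/ms while JOL is reported in ops/s, comparing them requires a unit conversion. A small hypothetical helper shows the arithmetic behind the multipliers and behind the "orders of magnitude" gap:

```java
/** Hypothetical helpers for comparing throughput figures reported in different units. */
final class ThroughputUnits {
    /** Converts operations per millisecond to operations per second. */
    static double opsPerSecond(double opsPerMs) {
        return opsPerMs * 1_000.0;
    }

    /** How many times faster the first measurement is than the second (same unit). */
    static double speedup(double faster, double slower) {
        return faster / slower;
    }
}
```

With the apache 1KB figures, `speedup(3416.043, 496.853)` is about 6.9. In the full run, map navigation at 3411.148 ops/ms (about 3.4 million ops/s) is roughly 1.5 million times faster than JOL at 2.341 ops/s.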

Full set of benchmarks, each run for 3 seconds:

Benchmark	map navigation (ops/ms)	CBOR (ops/ms)	JOL (ops/s)
apache 1KB	3411.148 ± 269.988 (x7.0)	486.767 ± 34.517	2.341 ± 0.159
apache 2KB	2824.454 ± 191.709 (x8.1)	349.975 ± 25.439	2.230 ± 0.300
apache 4KB	2399.100 ± 166.685 (x8.3)	289.526 ± 19.217	2.312 ± 0.129
apache 16KB	1618.269 ± 66.494 (x17.0)	95.368 ± 7.417	2.328 ± 0.145
apache 32KB	547.731 ± 33.207 (x10.7)	51.898 ± 2.962	1.935 ± 0.103
apache 128KB	233.352 ± 10.044 (x16.6)	14.653 ± 0.877	2.345 ± 0.136
cloudTrail 1KB	995.575 ± 28.435 (x4.0)	245.268 ± 9.794	2.379 ± 0.138
cloudTrail 2KB	654.018 ± 32.642 (x3.3)	197.738 ± 16.743	2.347 ± 0.129
cloudTrail 4KB	604.989 ± 26.025 (x3.7)	161.719 ± 11.014	1.997 ± 0.096
cloudTrail 16KB	612.762 ± 25.644 (x6.9)	88.038 ± 6.166	2.074 ± 0.133
cloudTrail 32KB	551.232 ± 30.878 (x11.2)	49.984 ± 2.780	2.152 ± 0.143
cloudTrail 128KB	258.711 ± 12.238 (x18.4)	14.476 ± 1.290	2.081 ± 0.102
snmp 1KB	1128.517 ± 33.982 (x3.6)	312.146 ± 20.351	2.118 ± 0.139
snmp 2KB	715.210 ± 34.349 (x4.2)	168.884 ± 9.136	2.315 ± 0.107
snmp 4KB	294.513 ± 84.895 (x4.7)	61.864 ± 22.254	1.373 ± 0.453
snmp 16KB	115.842 ± 9.596 (x4.8)	23.650 ± 1.413	2.456 ± 0.151
snmp 32KB	42.389 ± 5.794 (x3.5)	11.942 ± 0.707	1.708 ± 0.137
snmp 128KB	14.783 ± 1.564 (x4.8)	2.936 ± 0.202	2.391 ± 0.136
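For a quick sanity check outside a dedicated benchmark harness, throughput in the same ops/ms unit can be approximated with a warm-up phase followed by a timed window. This is a rough, hypothetical sketch and much less rigorous than a real benchmark harness: it has no forking, no iteration statistics and no error bars, and the static sink only keeps results observable so the JIT cannot discard the calls.

```java
import java.util.function.ToLongFunction;

/** Hypothetical, JIT-naive throughput probe for a size estimator. */
final class ThroughputSketch {
    private static long sink; // accumulates results so the estimator calls stay observable

    /** Approximate ops/ms of `estimator` applied to `input` over a timed window. */
    static <T> double opsPerMs(ToLongFunction<T> estimator, T input,
                               long warmupMillis, long windowMillis) {
        runFor(estimator, input, warmupMillis); // warm-up, count discarded
        long start = System.nanoTime();
        long ops = runFor(estimator, input, windowMillis);
        double elapsedMs = (System.nanoTime() - start) / 1_000_000.0;
        return ops / elapsedMs;
    }

    private static <T> long runFor(ToLongFunction<T> estimator, T input, long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long ops = 0;
        while (System.nanoTime() < deadline) {
            sink += estimator.applyAsLong(input);
            ops++;
        }
        return ops;
    }
}
```

Any function from an event representation to a size can be plugged in as `estimator`; the absolute numbers will be noisier than the tables above, so it is only useful for spotting large differences.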

Analysis of the results

JOL and hprof provide the retained size of the object graph. Hprof is not a viable solution for runtime measurement and is used only as a reference. JOL navigates the graph more deeply and (I think) pulls in a big chunk of the JRuby runtime classes.
The ConvertedMap custom navigation is consistently below the raw size, and CBOR is consistently above it. The size of the deviation depends on the event structure: for apache and cloudTrail both deltas shrink as events grow, while for snmp map navigation stays between -43% and -54% and CBOR remains above +36% even at 128KB.
The ConvertedMap calculation doesn't count the keys because they are interned, which would explain why its delta against raw is consistently negative.
From a performance perspective, the ConvertedMap custom navigation performs better than both CBOR serialization and JOL. JOL is orders of magnitude slower than the others: about 2 ops/s, while the other two range from about 3 to over 3,400 ops/ms.
Map navigation and CBOR run at thousands to millions of operations per second (from about 2,900 ops/s for CBOR on snmp 128KB up to about 3.4 million ops/s for map navigation on apache 1KB), so they don't add a significant performance penalty to Logstash.
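CBOR encodes every key as well as every value, and each item carries a type-and-length header. As a hedged illustration, the length a CBOR encoder would produce for a JSON-like value can be computed without allocating a buffer by following the RFC 8949 header rules. This is a different technique from the PR's, which serializes the event; the sketch assumes definite-length encoding and 8-byte floats for doubles (real encoders may choose shorter float encodings), and its names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: predicts the CBOR (RFC 8949) encoded length of a JSON-like value. */
final class CborLengthSketch {

    /** Bytes for the initial byte plus its argument (length, count or integer), treated as unsigned. */
    static int headerLength(long argument) {
        if (Long.compareUnsigned(argument, 24) < 0) return 1;
        if (Long.compareUnsigned(argument, 0xFFL) <= 0) return 2;
        if (Long.compareUnsigned(argument, 0xFFFFL) <= 0) return 3;
        if (Long.compareUnsigned(argument, 0xFFFF_FFFFL) <= 0) return 5;
        return 9;
    }

    static long encodedLength(Object value) {
        if (value == null || value instanceof Boolean) {
            return 1; // simple values: false, true, null
        }
        if (value instanceof String) {
            int bytes = ((String) value).getBytes(StandardCharsets.UTF_8).length;
            return headerLength(bytes) + bytes; // text string: header + UTF-8 payload
        }
        if (value instanceof Long || value instanceof Integer) {
            long n = ((Number) value).longValue();
            return headerLength(n >= 0 ? n : -1 - n); // major type 0 (unsigned) or 1 (negative)
        }
        if (value instanceof Double) {
            return 9; // assumed float64 encoding
        }
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            long total = headerLength(map.size()); // header carries the number of pairs
            for (Map.Entry<?, ?> e : map.entrySet()) {
                total += encodedLength(e.getKey()) + encodedLength(e.getValue());
            }
            return total;
        }
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            long total = headerLength(list.size());
            for (Object item : list) {
                total += encodedLength(item);
            }
            return total;
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }
}
```

For example, `{"a": "bc"}` comes out at 6 bytes (`A1 61 61 62 62 63`): one map header, then a 1-byte header plus payload for the key and for the value.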


github-actions · 2025-06-30T09:56:44Z

🤖 GitHub comments


Just comment with:

run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

mergify · 2025-06-30T09:57:15Z

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
If no backport is necessary, please add the backport-skip label


elasticmachine · 2025-07-08T07:09:33Z

💔 Build Failed

Buildkite Build
Commit: 9ac11d1

Failed CI Steps

History

💔 Build #3086 failed be7d5fe
💔 Build #3084 failed d47de1a
💔 Build #3081 failed ce8a19e
💔 Build #3073 failed 9067f70

andsel added 2 commits June 30, 2025 09:41

Add test to compare Event retained size computed with various techniques (heap dump, custom hashmaps navigator and CBOR serialization)

88c4f87

Added benchmark to compare ConvertedMap navigation with CBOR estimations

9067f70

Added computation using JOL

ce8a19e

andsel mentioned this pull request Jun 30, 2025

Track size of emitted batches in a monitoring metric #7417

Open

andsel added 5 commits July 2, 2025 11:48

Changed size test and benchmark to use events more adherent to real world use cases

d47de1a

Extended test of size computation for 16KB, 32KB, 128KB

3639c47

Updated benchmark to cover 16, 32 and 128 KB cases

18e3f9b

Added benchmark to measure the impact of JSON deserialization over Event size estimation

be7d5fe

Switched benchmark setup Level to execute just one time for trial instead of each invokation

9ac11d1
