Introduce BatchCodec to substitute BatchIterator #152
Conversation
That's amazing, but those two graphs have very specific structure, and they're small. I'd try a couple of billion-node web graphs such as eu or gsh and a small SWH dataset before reaching any conclusion.
Ok, running the benchmarks on eu-2015. Another `BatchCodec` I wanted to experiment with is to just write the src and dst as diffs, as little-endian 64-bit integers in a Zstd stream.
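Something like this minimal sketch, assuming the `zstd` crate (the function name and framing are illustrative, not an API from this PR):

```rust
use std::io::{self, Write};

/// Sketch: encode arcs sorted by (src, dst) as per-coordinate diffs,
/// written as little-endian u64s into a Zstd stream.
fn encode_batch_zstd(arcs: &[(u64, u64)], out: impl Write) -> io::Result<()> {
    let mut enc = zstd::Encoder::new(out, 3)?; // compression level 3
    let (mut last_src, mut last_dst) = (0u64, 0u64);
    for &(src, dst) in arcs {
        // src is nondecreasing, so this diff is a small nonnegative gap;
        // dst can go backwards between sources, so wrap (it round-trips).
        enc.write_all(&(src - last_src).to_le_bytes())?;
        enc.write_all(&dst.wrapping_sub(last_dst).to_le_bytes())?;
        (last_src, last_dst) = (src, dst);
    }
    enc.finish()?;
    Ok(())
}
```

The fixed-width layout gives up instantaneous-code cleverness and instead lets Zstd's entropy stage find the redundancy in the small deltas.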
Good idea. Also, at least in theory, it would be worth trying with wildly different numbers of partitions/processors, as that affects the distribution, too...
```rust
impl<L> RadixKey for Triple<L> {
    const LEVELS: usize = 16;

    fn get_level(&self, level: usize) -> u8 {
        // ...
```
This can be arithmetized if pairs are `[usize; 2]` instead of `(usize, usize)`, and this is how things worked in the previous code. Is there any particular reason to change it?
The main reason is to make `Triple` `#[repr(transparent)]` so the transmute is safe. Before, we were transmuting between `(usize, usize, L)` and

```rust
pub struct Triple<L> {
    pair: [usize; 2],
    label: L,
}
```

We are not guaranteed that they have the same memory layout if the labels have alignment bigger than 16 bytes. Example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=14c83315faf1a8830cb53aec0af4c803
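A self-contained version of the same point (the `BigAlign` label type here is hypothetical):

```rust
// With the default `repr(Rust)`, field order is unspecified, so nothing
// guarantees the tuple and the struct lay out `pair`/`label` the same way
// once the label's alignment exceeds that of the pair of words.
#[repr(align(32))]
struct BigAlign(u8); // hypothetical label with 32-byte alignment

struct Triple<L> {
    pair: [usize; 2],
    label: L,
}

fn main() {
    println!(
        "tuple:  size = {}, align = {}",
        std::mem::size_of::<(usize, usize, BigAlign)>(),
        std::mem::align_of::<(usize, usize, BigAlign)>(),
    );
    println!(
        "struct: size = {}, align = {}",
        std::mem::size_of::<Triple<BigAlign>>(),
        std::mem::align_of::<Triple<BigAlign>>(),
    );
}
```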
And this version avoids the bounds check: https://godbolt.org/z/vKjh1aqj5
But by adding an `unreachable_unchecked`, I can remove the bounds check, and now the arithmetized version is slightly better: https://godbolt.org/z/Tfrsb69dj
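Roughly like this (a sketch mirroring the Godbolt link, assuming rdst's `RadixKey` with level 0 as the least significant byte; which word holds src is my assumption here):

```rust
use rdst::RadixKey;
use std::hint::unreachable_unchecked;

pub struct Triple<L> {
    pair: [usize; 2],
    label: L,
}

impl<L> RadixKey for Triple<L> {
    const LEVELS: usize = 16;

    /// Arithmetized byte extraction: no lookup, no bounds check.
    fn get_level(&self, level: usize) -> u8 {
        // SAFETY: the sort only ever calls this with level < LEVELS;
        // telling the compiler so removes the slice bounds check.
        if level >= Self::LEVELS {
            unsafe { unreachable_unchecked() }
        }
        // Levels 0..8 index the low word (pair[1], the dst) and
        // levels 8..16 the high word (pair[0], the src), one byte each.
        let word = self.pair[1 - level / 8];
        (word >> ((level % 8) * 8)) as u8
    }
}
```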
Frankly, it is difficult to read any hard evidence from this data, except that grouped gaps perform better with very specifically selected codes. But we don't have that luxury; I mean, at that point we're basically trying to compress the graph again.
These two lines are inexplicable to me: Golomb is really slow. Although the dispatch system might have remapped it to Rice(1), since Rice(k) is just Golomb with modulus 2^k, so Golomb(2) is exactly Rice(1).
Or we can place a bet on the graph being web-like and use π₂ for the successors. That has a significant impact on space, and it's fast. That's a bit like the ζ₃ default assumption we make for compression (even if technically the distribution is closer to ζ₄). Maybe we should try this on a small SWH dataset to see what happens, at least for some combination of ɣ, δ and π₂ for successors. It would be really weird if we performed worse on our one and only source of financing. 😂
The docs are amazing BTW, although here and there the AI went a bit too far 😂.
Yeah, indeed. And I don't think the change of code makes it that much slower; I think it's an outlier, and I need to re-run the experiment. Perhaps there was an issue with my system.
While transposing the …

Another detail is that all these experiments are conducted on LLP-ed graphs, which definitely affects the distribution of gaps and thus the optimal codes. As you suggested, I think …
Ok, now I'm confused. Are these the only combinations you tried, or the best ones?
The ones I tried, on the first three batches. I didn't compute the best ones yet.
The best codes for the grouped gaps on the first few batches of the latest SWH graphs are:

(Per-batch table for Batch 1 / Batch 2 / Batch 3; the values are not recoverable here.)
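For context, computing the best code per batch is essentially summing each candidate's codeword lengths over the observed gaps and keeping the cheapest. A minimal sketch for γ and δ only (assuming the usual convention of coding x + 1 so that 0 is representable):

```rust
/// Length in bits of the γ codeword for x >= 0 (coding x + 1).
fn gamma_len(x: u64) -> u64 {
    let k = 63 - (x + 1).leading_zeros() as u64; // ⌊log2(x + 1)⌋
    2 * k + 1
}

/// Length in bits of the δ codeword for x >= 0 (coding x + 1).
fn delta_len(x: u64) -> u64 {
    let k = 63 - (x + 1).leading_zeros() as u64;
    gamma_len(k) + k // γ code of the exponent, then k mantissa bits
}

/// Pick the cheaper of γ and δ for one batch of gaps.
fn best_code(gaps: &[u64]) -> &'static str {
    let gamma: u64 = gaps.iter().map(|&g| gamma_len(g)).sum();
    let delta: u64 = gaps.iter().map(|&g| delta_len(g)).sum();
    if gamma <= delta { "Gamma" } else { "Delta" }
}
```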
We had a `BatchIterator` that would both store the batch on disk and load it. In this PR, I replaced it with a generic trait `BatchCodec` that explicitly defines the encoding and decoding of batches, allowing for different implementations. The compression format of `BatchIterator` is now implemented by `GapsCodec`.
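The trait is along these lines (a sketch; the method names and signatures here are illustrative, not necessarily the PR's exact API):

```rust
use std::io;
use std::path::Path;

/// Sketch of the idea: one codec owns both directions, so sorting and
/// transposition code never touches bit (de)serializers or memory mapping.
trait BatchCodec {
    /// Iterator over the decoded arcs, in sorted order.
    type Iter: Iterator<Item = (usize, usize)>;

    /// Encode a sorted batch of arcs to `path`.
    fn encode(&self, path: &Path, batch: &[(usize, usize)]) -> io::Result<()>;

    /// Decode a previously encoded batch from `path`.
    fn decode(&self, path: &Path) -> io::Result<Self::Iter>;
}
```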
This had the nice consequence that now `SortPair`, `ParSortPair`, `transpose`, and `simplify` don't have to be aware of `BitSerializer`, `BitDeserializer`, and memory mapping. Moreover, we can now select at compile time the codes to use (it used to be always Gamma), and there's a new implementation called `GroupedGapsCodec` that, instead of encoding each arc as `<src-gap><dst-gap>`, encodes groups of arcs with the same source as `<src-gap><outdegree><dst-gap1><dst-gap2>...` (see the sketch below).
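As an illustrative sketch (not the actual implementation), the grouped layout produces this stream of values, which are then written with the selected codes:

```rust
/// Grouped-gaps layout: arcs sorted by (src, dst) become
/// <src-gap><outdegree><dst-gap1><dst-gap2>... The returned values would
/// then be written with instantaneous codes (e.g., ExpGolomb(3), Gamma, Delta).
fn grouped_gaps(sorted_arcs: &[(usize, usize)]) -> Vec<u64> {
    let mut out = Vec::new();
    let (mut i, mut last_src) = (0, 0);
    while i < sorted_arcs.len() {
        let src = sorted_arcs[i].0;
        // The group is the maximal run of arcs sharing this source.
        let end = i + sorted_arcs[i..]
            .iter()
            .take_while(|&&(s, _)| s == src)
            .count();
        out.push((src - last_src) as u64); // <src-gap>
        out.push((end - i) as u64);        // <outdegree>
        let mut last_dst = 0;
        for &(_, dst) in &sorted_arcs[i..end] {
            out.push((dst - last_dst) as u64); // <dst-gap>, dsts are sorted
            last_dst = dst;
        }
        last_src = src;
        i = end;
    }
    out
}
```

Amortizing the source gap over the whole group is where the space win comes from on graphs with large outdegrees.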
Here's the effect of these changes on a couple of graphs, running `RUSTFLAGS="-Ctarget-cpu=native" cargo run --release -- transform transpose $GRAPH ${GRAPH}-t`. The codec notation is `GapsCodec<SRC_CODE, DST_CODE>` and `GroupedGaps<OUTDEGREE_CODE, SRC_CODE, DST_CODE>`. The following configurations were benchmarked on enwiki-2024, twitter-2010, and eu-2015 (the time/space measurements were in a table that is not recoverable here):

- `GapsCodec<Gamma, Gamma>`
- `GapsCodec<Gamma, Delta>`
- `GapsCodec<Unary, Pi(2)>`
- `GapsCodec<Gamma, ExpGolomb(1)>`
- `GroupedGaps<Gamma, Gamma, Gamma>`
- `GroupedGaps<Delta, Gamma, Delta>`
- `GroupedGaps<ExpGolomb(3), Gamma, Delta>`
- `GroupedGaps<ExpGolomb(3), Golomb(2), Pi(2)>`
- `GroupedGaps<ExpGolomb(2), ExpGolomb(1), ExpGolomb(1)>`

The time measurements are the average of two runs, except for eu-2015, where it's only one run. In the first two cases (enwiki-2024 and twitter-2010), all arcs fit in a single batch, which might explain why `Unary` is the best code for src; however, on larger graphs, I doubt it will remain optimal. eu-2015, instead, is split into 30 batches of 3'136'103'168 arcs each.

Given the results above, I believe that `GroupedGaps<ExpGolomb(3), Gamma, Delta>` could be a suitable default that uses universal codes.