Introduce BatchCodec to substitute BatchIterator #152
Conversation
That's amazing, but those two graphs have very specific structure, and they're small. I'd try a couple of billion-node web graphs such as eu or gsh and a small SWH dataset before reaching any conclusion.
Ok, running the benchmarks on eu-2015. Another `BatchCodec` I wanted to experiment with is to just write the src and dst as diffs, as little-endian 64-bit integers in a Zstd stream.
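Something like this minimal sketch, assuming the `zstd` crate (the function name and framing are illustrative, not an API from this PR):

```rust
use std::io::{self, Write};

/// Sketch: encode arcs sorted by (src, dst) as per-coordinate diffs,
/// written as little-endian u64s into a Zstd stream.
fn encode_batch_zstd(arcs: &[(u64, u64)], out: impl Write) -> io::Result<()> {
    let mut enc = zstd::Encoder::new(out, 3)?; // compression level 3
    let (mut last_src, mut last_dst) = (0u64, 0u64);
    for &(src, dst) in arcs {
        // src is nondecreasing, so this diff is a small nonnegative gap;
        // dst can go backwards between sources, so wrap (it round-trips).
        enc.write_all(&(src - last_src).to_le_bytes())?;
        enc.write_all(&dst.wrapping_sub(last_dst).to_le_bytes())?;
        (last_src, last_dst) = (src, dst);
    }
    enc.finish()?;
    Ok(())
}
```

The fixed-width layout gives up instantaneous-code cleverness and instead lets Zstd's entropy stage find the redundancy in the small deltas.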
Good idea. Also, at least in theory, it would be worth trying with wildly different numbers of partitions/processors, as that affects the distribution, too...
```rust
impl<L> RadixKey for Triple<L> {
    const LEVELS: usize = 16;

    fn get_level(&self, level: usize) -> u8 {
        // ...
```
This can be arithmetized if pairs are `[usize; 2]` instead of `(usize, usize)`, and this is how things worked in the previous code. Is there any particular reason to change it?
The main reason is to make `Triple` `#[repr(transparent)]` so the transmute is safe. Before, we were transmuting between `(usize, usize, L)` and

```rust
pub struct Triple<L> {
    pair: [usize; 2],
    label: L,
}
```

We are not guaranteed that they have the same memory layout if the labels have alignment bigger than 16 bytes. Example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=14c83315faf1a8830cb53aec0af4c803
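A self-contained version of the same point (the `BigAlign` label type here is hypothetical):

```rust
// With the default `repr(Rust)`, field order is unspecified, so nothing
// guarantees the tuple and the struct lay out `pair`/`label` the same way
// once the label's alignment exceeds that of the pair of words.
#[repr(align(32))]
struct BigAlign(u8); // hypothetical label with 32-byte alignment

struct Triple<L> {
    pair: [usize; 2],
    label: L,
}

fn main() {
    println!(
        "tuple:  size = {}, align = {}",
        std::mem::size_of::<(usize, usize, BigAlign)>(),
        std::mem::align_of::<(usize, usize, BigAlign)>(),
    );
    println!(
        "struct: size = {}, align = {}",
        std::mem::size_of::<Triple<BigAlign>>(),
        std::mem::align_of::<Triple<BigAlign>>(),
    );
}
```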
And this version avoids the bounds check: https://godbolt.org/z/vKjh1aqj5
But by adding an `unreachable_unchecked`, I can remove the bounds check, and now the arithmetized version is slightly better: https://godbolt.org/z/Tfrsb69dj
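Roughly like this (a sketch mirroring the Godbolt link, assuming rdst's `RadixKey` with level 0 as the least significant byte; which word holds src is my assumption here):

```rust
use rdst::RadixKey;
use std::hint::unreachable_unchecked;

pub struct Triple<L> {
    pair: [usize; 2],
    label: L,
}

impl<L> RadixKey for Triple<L> {
    const LEVELS: usize = 16;

    /// Arithmetized byte extraction: no lookup, no bounds check.
    fn get_level(&self, level: usize) -> u8 {
        // SAFETY: the sort only ever calls this with level < LEVELS;
        // telling the compiler so removes the slice bounds check.
        if level >= Self::LEVELS {
            unsafe { unreachable_unchecked() }
        }
        // Levels 0..8 index the low word (pair[1], the dst) and
        // levels 8..16 the high word (pair[0], the src), one byte each.
        let word = self.pair[1 - level / 8];
        (word >> ((level % 8) * 8)) as u8
    }
}
```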
Frankly, it is difficult to read any hard evidence from this data, except that grouped gaps perform better with very specifically selected codes. But we don't have that luxury; I mean, at that point we're basically trying to compress the graph again.
These two lines are inexplicable to me: Golomb is really slow. Although the dispatch system might have remapped it to Rice(1), since Rice(k) is just Golomb with modulus 2^k, so Golomb(2) is exactly Rice(1).
Or we can place a bet on the graph being web-like and use π₂ for the successors. That has a significant impact on space, and it's fast. That's a bit like the ζ₃ default assumption we make for compression (even if technically the distribution is closer to ζ₄). Maybe we should try this on a small SWH dataset to see what happens, at least for some combination of ɣ, δ and π₂ for successors. It would be really weird if we performed worse on our one and only source of financing. 😂
The docs are amazing BTW, although here and there the AI went a bit too far 😂.
Yeah, indeed. And I don't think the change of code makes it that much slower; I think it's an outlier, and I need to re-run the experiment. Perhaps there was an issue with my system.
While transposing the …

Another detail is that all these experiments are conducted on LLP-ed graphs, which definitely affects the distribution of gaps and thus the optimal codes. As you suggested, I think …
Ok, now I'm confused. Are these the only combinations you tried, or the best ones?
The ones I tried, on the first three batches. I didn't compute the best ones yet.
The best codes for the grouped gaps on the first few batches of the latest SWH graphs are:

(Per-batch table for Batch 1 / Batch 2 / Batch 3; the values are not recoverable here.)
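For context, computing the best code per batch is essentially summing each candidate's codeword lengths over the observed gaps and keeping the cheapest. A minimal sketch for γ and δ only (assuming the usual convention of coding x + 1 so that 0 is representable):

```rust
/// Length in bits of the γ codeword for x >= 0 (coding x + 1).
fn gamma_len(x: u64) -> u64 {
    let k = 63 - (x + 1).leading_zeros() as u64; // ⌊log2(x + 1)⌋
    2 * k + 1
}

/// Length in bits of the δ codeword for x >= 0 (coding x + 1).
fn delta_len(x: u64) -> u64 {
    let k = 63 - (x + 1).leading_zeros() as u64;
    gamma_len(k) + k // γ code of the exponent, then k mantissa bits
}

/// Pick the cheaper of γ and δ for one batch of gaps.
fn best_code(gaps: &[u64]) -> &'static str {
    let gamma: u64 = gaps.iter().map(|&g| gamma_len(g)).sum();
    let delta: u64 = gaps.iter().map(|&g| delta_len(g)).sum();
    if gamma <= delta { "Gamma" } else { "Delta" }
}
```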
We had a `BatchIterator` that would both store the batch on disk and load it. In this PR, I replaced it with a generic trait `BatchCodec` that explicitly defines the encoding and decoding of batches, allowing for different implementations. The compression format of `BatchIterator` is now implemented by `GapsCodec`.
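The trait is along these lines (a sketch; the method names and signatures here are illustrative, not necessarily the PR's exact API):

```rust
use std::io;
use std::path::Path;

/// Sketch of the idea: one codec owns both directions, so sorting and
/// transposition code never touches bit (de)serializers or memory mapping.
trait BatchCodec {
    /// Iterator over the decoded arcs, in sorted order.
    type Iter: Iterator<Item = (usize, usize)>;

    /// Encode a sorted batch of arcs to `path`.
    fn encode(&self, path: &Path, batch: &[(usize, usize)]) -> io::Result<()>;

    /// Decode a previously encoded batch from `path`.
    fn decode(&self, path: &Path) -> io::Result<Self::Iter>;
}
```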
This had the nice consequence that now `SortPair`, `ParSortPair`, `transpose`, and `simplify` don't have to be aware of `BitSerializer`, `BitDeserializer`, and memory mapping. Moreover, we can now select at compile time the codes to use (it used to be always Gamma), and there's a new implementation called `GroupedGapsCodec` that, instead of encoding each arc as `<src-gap><dst-gap>`, encodes groups of arcs with the same source as `<src-gap><outdegree><dst-gap1><dst-gap2>...` (see the sketch below).
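As an illustrative sketch (not the actual implementation), the grouped layout produces this stream of values, which are then written with the selected codes:

```rust
/// Grouped-gaps layout: arcs sorted by (src, dst) become
/// <src-gap><outdegree><dst-gap1><dst-gap2>... The returned values would
/// then be written with instantaneous codes (e.g., ExpGolomb(3), Gamma, Delta).
fn grouped_gaps(sorted_arcs: &[(usize, usize)]) -> Vec<u64> {
    let mut out = Vec::new();
    let (mut i, mut last_src) = (0, 0);
    while i < sorted_arcs.len() {
        let src = sorted_arcs[i].0;
        // The group is the maximal run of arcs sharing this source.
        let end = i + sorted_arcs[i..]
            .iter()
            .take_while(|&&(s, _)| s == src)
            .count();
        out.push((src - last_src) as u64); // <src-gap>
        out.push((end - i) as u64);        // <outdegree>
        let mut last_dst = 0;
        for &(_, dst) in &sorted_arcs[i..end] {
            out.push((dst - last_dst) as u64); // <dst-gap>, dsts are sorted
            last_dst = dst;
        }
        last_src = src;
        i = end;
    }
    out
}
```

Amortizing the source gap over the whole group is where the space win comes from on graphs with large outdegrees.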
Here's the effect of these changes on a couple of graphs, running `RUSTFLAGS="-Ctarget-cpu=native" cargo run --release -- transform transpose $GRAPH ${GRAPH}-t`. The codec notation is `GapsCodec<SRC_CODE, DST_CODE>` and `GroupedGaps<OUTDEGREE_CODE, SRC_CODE, DST_CODE>`. The following configurations were benchmarked on enwiki-2024, twitter-2010, and eu-2015 (the time/space measurements were in a table that is not recoverable here):

- `GapsCodec<Gamma, Gamma>`
- `GapsCodec<Gamma, Delta>`
- `GapsCodec<Unary, Pi(2)>`
- `GapsCodec<Gamma, ExpGolomb(1)>`
- `GroupedGaps<Gamma, Gamma, Gamma>`
- `GroupedGaps<Delta, Gamma, Delta>`
- `GroupedGaps<ExpGolomb(3), Gamma, Delta>`
- `GroupedGaps<ExpGolomb(3), Golomb(2), Pi(2)>`
- `GroupedGaps<ExpGolomb(2), ExpGolomb(1), ExpGolomb(1)>`

The time measurements are the average of two runs, except for eu-2015, where it's only one run. In the first two cases (enwiki-2024 and twitter-2010), all arcs fit in a single batch, which might explain why `Unary` is the best code for src; however, on larger graphs, I doubt it will remain optimal. eu-2015, instead, is split into 30 batches of 3'136'103'168 arcs each.

Given the results above, I believe that `GroupedGaps<ExpGolomb(3), Gamma, Delta>` could be a suitable default that uses universal codes.