Idea: Reduce GC pressure by representing domains more memory efficiently #123

pvdz · 2016-07-22T12:20:38Z

Right now, roughly 40 to 50% of the run time is spent in GC. We can't control the collector directly, so we have to relieve the pressure on it instead. I think by far the biggest source of pressure is the large number of objects we generate, especially the arrays for the domains. So let's change that.

One part has already been done: domains that only contain values lower than 31 are now represented as bitfields. This already brought some huge savings.
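A minimal sketch of the bitfield idea, assuming a small domain is stored as one integer in which bit n is set when value n is in the domain. The function names are hypothetical and not taken from the library.

```javascript
// Hypothetical helpers; not the library's actual API.
// Values 0..30 each map to one bit, so the whole domain fits in one
// non-negative 32-bit integer and needs no array allocation.
const SMALL_MAX = 30;

function rangesToBits(ranges) {
  // `ranges` is a flat array of inclusive [lo, hi] pairs, e.g. [0, 2, 5, 5].
  let bits = 0;
  for (let i = 0; i < ranges.length; i += 2) {
    for (let v = ranges[i]; v <= ranges[i + 1]; v++) {
      if (v < 0 || v > SMALL_MAX) {
        throw new RangeError('value does not fit in a bitfield: ' + v);
      }
      bits |= 1 << v;
    }
  }
  return bits;
}

function bitsContain(bits, value) {
  return value >= 0 && value <= SMALL_MAX && (bits & (1 << value)) !== 0;
}

// Intersecting two small domains is a single AND on two numbers.
function bitsIntersect(a, b) {
  return a & b;
}
```

Because the result is a plain number, operations like the intersection above do not allocate any objects for the collector to clean up.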

The other part is the larger domains, which we can't efficiently express in terms of bitfields. Let's try to turn them into strings.

Currently SUP is 100000000, which is well above the 16-bit range but well below the 32-bit range. This means a single (range) value of a domain can't always be encoded in one 16-bit char, but two chars are always enough. Or maybe even better, use the UTF way of reserving a high bit to signal that the value continues in the next unit.

I am worried that this will create a lot of string concatenation overhead of its own. Perhaps it turns out to be super fast, though, so we'll just have to give it a spin. It's not a trivial change to push through the system, but it has high potential.


jonnor · 2016-07-22T12:26:19Z

Is there a fixed number of domain values (not variable) per var? If so, maybe you could keep a Uint32Array addressed by the index of the trie that corresponds to the var?

pvdz · 2016-07-22T12:30:47Z

Not entirely sure what you're asking. A var can have a domain which can contain any number of values between SUB and SUP, inclusive. Those constants are 0 and 100000000 respectively (they are arbitrary and have been since FD.js). As an optimization, the array only stores range pairs, so instead of an array with 100000000 elements you get two elements [0, 100000000]. If the range is not contiguous you get multiple such ranges.
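To make the range-pair layout concrete, here is a small sketch that collapses a sorted list of distinct values into a flat array of inclusive [lo, hi] pairs. The function name is made up for illustration.

```javascript
// Hypothetical helper; collapses sorted, distinct integers into a flat
// array of inclusive range pairs: [lo1, hi1, lo2, hi2, ...].
function valuesToRanges(values) {
  const ranges = [];
  for (const v of values) {
    const last = ranges.length - 1;
    if (last >= 0 && ranges[last] + 1 === v) {
      ranges[last] = v; // extend the current range
    } else {
      ranges.push(v, v); // start a new range
    }
  }
  return ranges;
}

// valuesToRanges([1, 2, 3, 7, 9, 10]) gives [1, 3, 7, 7, 9, 10]
```

The array length depends on the number of gaps in the domain, not on the number of values it contains.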

I can think of some implementations with Tries, but they all feel useless (or worse) to me.

pvdz · 2016-07-22T12:32:56Z

To illustrate, this ticket proposes to encode those domains as strings. A domain [99, 103] would encode as the string "\0c\0g". Obviously most domains don't encode as a legible string, but that's completely irrelevant.
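A sketch of that encoding, assuming each bound is written as two 16-bit code units with the high word first (which is what makes [99, 103] come out as "\0c\0g"). The helper names are hypothetical.

```javascript
// Hypothetical helpers; each bound takes two 16-bit code units (high word,
// then low word), so one range takes four characters.
function encodeFixed(ranges) {
  let s = '';
  for (const n of ranges) {
    s += String.fromCharCode((n >>> 16) & 0xffff, n & 0xffff);
  }
  return s;
}

function decodeFixed(s) {
  const ranges = [];
  for (let i = 0; i < s.length; i += 2) {
    ranges.push(s.charCodeAt(i) * 0x10000 + s.charCodeAt(i + 1));
  }
  return ranges;
}

// encodeFixed([99, 103]) === '\0c\0g'
// decodeFixed(encodeFixed([0, 100000000])) gives [0, 100000000]
```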

pvdz · 2016-07-22T12:47:59Z

The UTF-style encoding is interesting. So there are two ways of encoding the string:

1. Encode them as lo-hi pairs of 16-bit values (one pair per number, so two pairs per range in the domain).

This has the advantage of being able to search efficiently in the string, since you know how many chars each range will take up. You can get to the middle of the domain (in terms of ranges, at least) without having to walk through the domain.

The downside is, of course, that it takes more space than is required. Many values will fit in 16 bits, which means our encoded strings would have a lot of wasted space.

2. Encode them in a UTF fashion, which uses the high bit to determine whether the next unit is also part of the current value. In our case that means a number can be encoded as one or two 16-bit "words".

The advantages and disadvantages are the opposite of the fixed encoding. String length is unpredictable since every number may be encoded as either one or two "words", but there is far less wasted space.

Note that in our case a number never needs a third "word", since our highest number is still well below the 32-bit range.
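A sketch of the variable-width option. The exact bit layout is an assumption made for illustration: bit 15 of a word is a continuation flag and the remaining 15 bits are payload, so two words carry 30 bits, which is enough for SUP = 100000000. The helper names are hypothetical.

```javascript
// Hypothetical variable-width scheme: bit 15 of a word is a continuation
// flag, the other 15 bits are payload. Values below 0x8000 take one word,
// anything up to 2^30 - 1 takes two.
function encodeVar(ranges) {
  let s = '';
  for (const n of ranges) {
    if (n < 0x8000) {
      s += String.fromCharCode(n);
    } else {
      // First word: flag plus the high 15 bits. Second word: low 15 bits.
      s += String.fromCharCode(0x8000 | (n >>> 15), n & 0x7fff);
    }
  }
  return s;
}

function decodeVar(s) {
  const ranges = [];
  for (let i = 0; i < s.length; i++) {
    const word = s.charCodeAt(i);
    if (word & 0x8000) {
      // Flag set: the next word holds the low 15 bits.
      ranges.push((word & 0x7fff) * 0x8000 + s.charCodeAt(++i));
    } else {
      ranges.push(word);
    }
  }
  return ranges;
}

// encodeVar([99, 103]).length === 2        (one word per bound)
// encodeVar([0, 100000000]).length === 3   (one word plus two words)
```

Decoding has to walk the string from the start, because the position of the n-th range depends on the width of every number before it.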

Off the top of my head, we currently don't rely on that predictability to get to the middle of a domain. There are strategies that would use it, though; to the best of my knowledge we just don't use them at the moment.
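For reference, this is the kind of lookup a fixed-width layout makes possible: a binary search over ranges that jumps straight to the middle range by index. It is a sketch that assumes four code units per range with the high word of each bound first; the names are hypothetical.

```javascript
// Hypothetical helpers for a fixed-width string domain: each range is four
// 16-bit code units (low bound as two words, then high bound as two words).
const CHARS_PER_RANGE = 4;

function readBound(s, offset) {
  return s.charCodeAt(offset) * 0x10000 + s.charCodeAt(offset + 1);
}

function fixedContains(s, value) {
  let left = 0;
  let right = s.length / CHARS_PER_RANGE - 1;
  while (left <= right) {
    const mid = (left + right) >> 1;
    const offset = mid * CHARS_PER_RANGE; // direct jump, no scanning
    if (value < readBound(s, offset)) {
      right = mid - 1;
    } else if (value > readBound(s, offset + 2)) {
      left = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}
```

With a variable-width encoding the `offset` computation above is not available, so the same lookup would need a linear scan.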

I think it's something that we could switch relatively easily, as long as all domains are constructed and deconstructed through central methods.

pvdz · 2016-07-22T14:18:33Z

I'm going to pursue the fixed-size encoding because I don't think its overhead is very relevant at the moment, while I think it does make a lot of the related code easier to work with.

pvdz · 2016-08-01T12:42:52Z

Implemented the string domains but they didn't make much of a difference.

pvdz mentioned this issue Jul 22, 2016

[FD] performance improvement ideas #101

Open

22 tasks

pvdz closed this as completed Aug 1, 2016
