Unknown length encoding #5

christian-marie · 2014-02-21T00:22:36Z

In the case of encoding protobufs, we don't really know how long the encoded buffer will be beforehand.

Have you had any thoughts on how to implement dynamic size encoding?

I was thinking either -- do something crazy and realloc the buffer 2x every time you attempt to write past the end, or, use something sane like ByteString's builder (blaze).

Do you have an opinion?

christian-marie · 2014-02-21T03:36:40Z

So I just had a think about this and went to see how the first implementation would look. The trouble with the 2x or ^2 allocation is that you have to re-run the whole mondaic chain, which will obviously either waste lots of memory or be slow :(

vincenthz · 2014-02-21T09:14:24Z

Yes, this is something I thought about.

One thing is the builder approach is terribly slow, so bytestring's builder (or blaze) is not a good option.

My favorite option is a consumer callback approach, where you have an IO callback when the buffer is "full" and that you need to process it. Usually you would tie this to a flush-to-handle or something similar, but I think the same approach could give a list of bytestring output too, and remain externally pure (no IO exposed). In any case, reallocating memory is not a good option.

For the decoding, I want to take the same approach except that you'ld have a producer callback.

Every non singular byte type (word16, word32, ..) would have to have a slow version that would use byte access to get things too.

istathar · 2014-05-01T05:41:29Z

Consumer. Producer. Sounds like a case for pipes :) Hey, @Gabriel439 , don't suppose you have any thoughts on the topic of managing allocation in case of not knowing size ahead of time?

AfC

vincenthz · 2014-05-02T09:28:35Z

my approach to this is just having a:

type Popper = Strict.ByteString -> IO ()

The packer still work the same, it allocate fixed size buffer of N size (configurable), and the only thing that change is when an operation would result in going after the buffer (and would raise an exception), instead the current buffer is popped (through the Popper typed function), a new buffer is allocated (or the current one is reset) and the operation is resumed.

Also, as an optimisation it would convenient to have a way to not copy bytestring that are over a certain size (configurable too), and directly call the popper instead.

This give the ability to support a very close to the machine and efficient lowlevel construct, and also the same interface can be made to create lazy bytestring, or to support higher level conduit/pipe constructs too.

If anyone got better idea, don't hesitate.

christian-marie · 2014-05-02T10:25:10Z

This wouldn't work with holes though, would it?

vincenthz · 2014-05-02T11:41:22Z

Not directly yes, but either you deny holes to happens with this scheme (completely at calling time or they need to be fill before popping), or you could still pop them to a special popper that would keep them in a queue with the number of holes to fill. For example:

[(1, <buf>), (0,<buf>), (2, <buf>)]

if you fill the first buf's hole, the first buf is full, then you can really pop (with the real popper) the 2 first bufs and then the queue is:

[(2, <buf>)]

christian-marie · 2014-05-02T11:55:45Z

That sounds reasonable. Tricky to implement, though.

Gabriella439 · 2014-05-02T16:18:53Z

The way I do this is to double the buffer size whenever going past the end. You can see an example of this in my foldl library for how I fold a stream into a vector.

There's nothing crazy at all about it. This is pretty standard in the C world. When you grow the buffer you don't have to recompute previous values.

However, I'm missing context for a lot of this and it's not clear what this buffer would be used for. If this is for streaming over the wire then there are other possible solutions which I can describe.

vincenthz · 2014-05-04T05:23:27Z

@Gabriel439 AFAIK, all the Vector grow operations are always implemented with a new allocation + copy contrary to C realloc which try to grow the buffer first and then allocate+copy as last resort. I think that would be pretty bad for large buffer.

christian-marie · 2016-09-05T07:13:52Z

This sort of fizzled out, now it's dead.

christian-marie mentioned this issue Feb 24, 2014

Suspicious amount of time and allocation on decode path alphaHeavy/protobuf#4

Closed

christian-marie closed this as completed Sep 5, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unknown length encoding #5

Unknown length encoding #5

christian-marie commented Feb 21, 2014

christian-marie commented Feb 21, 2014

vincenthz commented Feb 21, 2014

istathar commented May 1, 2014

vincenthz commented May 2, 2014

christian-marie commented May 2, 2014

vincenthz commented May 2, 2014

christian-marie commented May 2, 2014

Gabriella439 commented May 2, 2014

vincenthz commented May 4, 2014

christian-marie commented Sep 5, 2016

Unknown length encoding #5

Unknown length encoding #5

Comments

christian-marie commented Feb 21, 2014

christian-marie commented Feb 21, 2014

vincenthz commented Feb 21, 2014

istathar commented May 1, 2014

vincenthz commented May 2, 2014

christian-marie commented May 2, 2014

vincenthz commented May 2, 2014

christian-marie commented May 2, 2014

Gabriella439 commented May 2, 2014

vincenthz commented May 4, 2014

christian-marie commented Sep 5, 2016