Skip to content
This repository has been archived by the owner on Sep 20, 2023. It is now read-only.

Unknown length encoding #5

Closed
christian-marie opened this issue Feb 21, 2014 · 10 comments
Closed

Unknown length encoding #5

christian-marie opened this issue Feb 21, 2014 · 10 comments

Comments

@christian-marie
Copy link
Contributor

In the case of encoding protobufs, we don't really know how long the encoded buffer will be beforehand.

Have you had any thoughts on how to implement dynamic size encoding?

I was thinking either -- do something crazy and realloc the buffer 2x every time you attempt to write past the end, or, use something sane like ByteString's builder (blaze).

Do you have an opinion?

@christian-marie
Copy link
Contributor Author

So I just had a think about this and went to see how the first implementation would look. The trouble with the 2x or ^2 allocation is that you have to re-run the whole mondaic chain, which will obviously either waste lots of memory or be slow :(

@vincenthz
Copy link
Owner

Yes, this is something I thought about.

One thing is the builder approach is terribly slow, so bytestring's builder (or blaze) is not a good option.

My favorite option is a consumer callback approach, where you have an IO callback when the buffer is "full" and that you need to process it. Usually you would tie this to a flush-to-handle or something similar, but I think the same approach could give a list of bytestring output too, and remain externally pure (no IO exposed). In any case, reallocating memory is not a good option.

For the decoding, I want to take the same approach except that you'ld have a producer callback.

Every non singular byte type (word16, word32, ..) would have to have a slow version that would use byte access to get things too.

@istathar
Copy link

istathar commented May 1, 2014

Consumer. Producer. Sounds like a case for pipes :) Hey, @Gabriel439 , don't suppose you have any thoughts on the topic of managing allocation in case of not knowing size ahead of time?

AfC

@vincenthz
Copy link
Owner

my approach to this is just having a:

type Popper = Strict.ByteString -> IO ()

The packer still work the same, it allocate fixed size buffer of N size (configurable), and the only thing that change is when an operation would result in going after the buffer (and would raise an exception), instead the current buffer is popped (through the Popper typed function), a new buffer is allocated (or the current one is reset) and the operation is resumed.

Also, as an optimisation it would convenient to have a way to not copy bytestring that are over a certain size (configurable too), and directly call the popper instead.

This give the ability to support a very close to the machine and efficient lowlevel construct, and also the same interface can be made to create lazy bytestring, or to support higher level conduit/pipe constructs too.

If anyone got better idea, don't hesitate.

@christian-marie
Copy link
Contributor Author

This wouldn't work with holes though, would it?

@vincenthz
Copy link
Owner

Not directly yes, but either you deny holes to happens with this scheme (completely at calling time or they need to be fill before popping), or you could still pop them to a special popper that would keep them in a queue with the number of holes to fill. For example:

[(1, <buf>), (0,<buf>), (2, <buf>)]

if you fill the first buf's hole, the first buf is full, then you can really pop (with the real popper) the 2 first bufs and then the queue is:

[(2, <buf>)]

@christian-marie
Copy link
Contributor Author

That sounds reasonable. Tricky to implement, though.

@Gabriella439
Copy link

The way I do this is to double the buffer size whenever going past the end. You can see an example of this in my foldl library for how I fold a stream into a vector.

There's nothing crazy at all about it. This is pretty standard in the C world. When you grow the buffer you don't have to recompute previous values.

However, I'm missing context for a lot of this and it's not clear what this buffer would be used for. If this is for streaming over the wire then there are other possible solutions which I can describe.

@vincenthz
Copy link
Owner

@Gabriel439 AFAIK, all the Vector grow operations are always implemented with a new allocation + copy contrary to C realloc which try to grow the buffer first and then allocate+copy as last resort. I think that would be pretty bad for large buffer.

@christian-marie
Copy link
Contributor Author

This sort of fizzled out, now it's dead.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants