Unknown length encoding #5
Comments
So I just had a think about this and went to see how the first implementation would look. The trouble with the 2x or ^2 allocation is that you have to re-run the whole monadic chain, which will obviously either waste lots of memory or be slow :(
Yes, this is something I thought about. One thing is that the builder approach is terribly slow, so bytestring's builder (or blaze) is not a good option. My favorite option is a consumer callback approach, where an IO callback is invoked when the buffer is "full" and needs to be processed. Usually you would tie this to a flush-to-handle or something similar, but I think the same approach could give a list of bytestrings as output too, and remain externally pure (no IO exposed). In any case, reallocating memory is not a good option. For decoding, I want to take the same approach, except that you'd have a producer callback. Every multi-byte type (word16, word32, ...) would also need a slow version that uses byte-by-byte access to get its value.
Consumer. Producer. Sounds like a case for pipes :) Hey @Gabriel439, don't suppose you have any thoughts on the topic of managing allocation when the size isn't known ahead of time? AfC
my approach to this is just having a:
The packer still works the same way: it allocates a fixed-size buffer of N bytes (configurable). The only thing that changes is what happens when an operation would run past the end of the buffer (and would otherwise raise an exception): instead, the current buffer is popped (through the Popper-typed function), a new buffer is allocated (or the current one is reset), and the operation is resumed. Also, as an optimisation, it would be convenient to have a way to avoid copying bytestrings over a certain size (configurable too) and call the popper on them directly instead. This gives the ability to support a very close-to-the-machine and efficient low-level construct, and the same interface can also be used to create lazy bytestrings or to support higher-level conduit/pipes constructs. If anyone has a better idea, don't hesitate.
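For readers following along, here is a minimal sketch of the popper scheme described above. It is not the library's code: the names `Popper`, `Packer`, `newPacker`, `flush`, `putWord8`, `putBytes` and `packToChunks` are all hypothetical, and the sketch always resets and reuses one buffer rather than allocating a fresh one.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Control.Monad (when)
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (pokeByteOff)

-- Hypothetical: receives every chunk that is ready to leave the packer.
type Popper = B.ByteString -> IO ()

data Packer = Packer
  { pkBuf    :: ForeignPtr Word8  -- fixed-size working buffer
  , pkSize   :: Int               -- N, the buffer size
  , pkDirect :: Int               -- inputs this large skip the buffer
  , pkUsed   :: IORef Int         -- bytes currently written
  , pkPop    :: Popper
  }

newPacker :: Int -> Int -> Popper -> IO Packer
newPacker size direct pop = do
  let n = max 1 size              -- guard against a zero-sized buffer
  buf  <- mallocForeignPtrBytes n
  used <- newIORef 0
  return (Packer buf n direct used pop)

-- Copy the filled prefix out, hand it to the popper, reset the buffer.
flush :: Packer -> IO ()
flush p = do
  n <- readIORef (pkUsed p)
  when (n > 0) $ do
    chunk <- withForeignPtr (pkBuf p) $ \ptr ->
      B.packCStringLen (castPtr ptr, n)
    pkPop p chunk
    writeIORef (pkUsed p) 0

putWord8 :: Packer -> Word8 -> IO ()
putWord8 p w = do
  full <- (>= pkSize p) <$> readIORef (pkUsed p)
  when full (flush p)             -- pop instead of raising an exception
  n <- readIORef (pkUsed p)
  withForeignPtr (pkBuf p) $ \ptr -> pokeByteOff ptr n w
  writeIORef (pkUsed p) (n + 1)

putBytes :: Packer -> B.ByteString -> IO ()
putBytes p bs
  | B.null bs                 = return ()
  | B.length bs >= pkDirect p = flush p >> pkPop p bs  -- large input: no copy
  | otherwise = do
      n <- readIORef (pkUsed p)
      let (now, later) = B.splitAt (pkSize p - n) bs
      BU.unsafeUseAsCStringLen now $ \(src, len) ->
        withForeignPtr (pkBuf p) $ \dst ->
          copyBytes (dst `plusPtr` n) (castPtr src :: Ptr Word8) len
      writeIORef (pkUsed p) (n + B.length now)
      -- Buffer is full and input remains: pop, then resume the operation.
      when (not (B.null later)) $ flush p >> putBytes p later

-- One possible popper: collect the chunks in order into a list.
packToChunks :: Int -> Int -> (Packer -> IO ()) -> IO [B.ByteString]
packToChunks size direct act = do
  out <- newIORef []
  p   <- newPacker size direct (\c -> modifyIORef' out (c :))
  act p
  flush p
  reverse <$> readIORef out
```

Swapping the popper for something like `B.hPut handle` would give the flush-to-handle variant; a lazy bytestring could presumably be built from the chunk list with `Data.ByteString.Lazy.fromChunks`.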
This wouldn't work with holes though, would it?
Not directly, no. But either you forbid holes with this scheme (completely at call time, or they need to be filled before popping), or you could still pop the buffers to a special popper that keeps them in a queue along with the number of holes left to fill. For example:
If you fill the first buf's hole, the first buf is complete, so you can really pop (with the real popper) the first 2 bufs, and then the queue is:
That sounds reasonable. Tricky to implement, though.
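The hole queue discussed in the comments above could be modelled roughly as follows. This is a pure sketch, not code from the thread: `Pending`, `drain` and `fillHole` are hypothetical names, the queue is a plain list, and a hole is assumed to be a byte range inside a chunk that has already been popped.

```haskell
import qualified Data.ByteString as B

-- A chunk handed to the special popper, with its count of unfilled holes.
data Pending = Pending
  { pChunk :: B.ByteString
  , pHoles :: Int
  }

-- Split the queue into the leading chunks that can go to the real popper
-- right now (no holes left) and the chunks that still have to wait.
drain :: [Pending] -> ([B.ByteString], [Pending])
drain q = (map pChunk ready, rest)
  where
    (ready, rest) = span ((== 0) . pHoles) q

-- Fill one hole in the i-th queued chunk by overwriting the bytes at
-- offset 'off' with 'patch'. Assumes the patch fits inside the chunk.
fillHole :: Int -> Int -> B.ByteString -> [Pending] -> [Pending]
fillHole i off patch q =
  case splitAt i q of
    (before, Pending c h : after)
      | h > 0 ->
          let c' = B.concat
                     [ B.take off c
                     , patch
                     , B.drop (off + B.length patch) c
                     ]
          in before ++ Pending c' (h - 1) : after
    _ -> q  -- no such chunk, or it has no hole left to fill
```

With a queue of three chunks holding 1, 0 and 1 holes, `drain` releases nothing at first; after `fillHole 0 ...` on the first chunk it releases the first two and leaves the third queued.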
The way I do this is to double the buffer size whenever going past the end. You can see an example of this in my There's nothing crazy at all about it. This is pretty standard in the C world. When you grow the buffer you don't have to recompute previous values. However, I'm missing context for a lot of this and it's not clear what this buffer would be used for. If this is for streaming over the wire then there are other possible solutions which I can describe.
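To make the doubling idea concrete, here is a small sketch of a growable buffer. It is an illustration only, not the code the comment refers to: `Growable`, `newGrowable`, `grow`, `appendWord8` and `freeze` are hypothetical names, and `gCopied` exists purely to count how many bytes growth moves. Growing copies the bytes already written into a buffer twice the size; nothing is re-encoded.

```haskell
import qualified Data.ByteString as B
import Control.Monad (when)
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (castPtr)
import Foreign.Storable (pokeByteOff)

data Growable = Growable
  { gBuf    :: IORef (ForeignPtr Word8)
  , gCap    :: IORef Int  -- current capacity in bytes
  , gUsed   :: IORef Int  -- bytes written so far
  , gCopied :: IORef Int  -- total bytes moved by 'grow' (illustration only)
  }

newGrowable :: Int -> IO Growable
newGrowable cap0 = do
  let cap = max 1 cap0
  buf <- mallocForeignPtrBytes cap
  Growable <$> newIORef buf <*> newIORef cap <*> newIORef 0 <*> newIORef 0

-- Allocate a buffer twice as large and copy the written prefix across.
grow :: Growable -> IO ()
grow g = do
  old  <- readIORef (gBuf g)
  cap  <- readIORef (gCap g)
  used <- readIORef (gUsed g)
  new  <- mallocForeignPtrBytes (cap * 2)
  withForeignPtr old $ \src ->
    withForeignPtr new $ \dst -> copyBytes dst src used
  writeIORef (gBuf g) new
  writeIORef (gCap g) (cap * 2)
  modifyIORef' (gCopied g) (+ used)

appendWord8 :: Growable -> Word8 -> IO ()
appendWord8 g w = do
  used <- readIORef (gUsed g)
  cap  <- readIORef (gCap g)
  when (used >= cap) (grow g)
  buf <- readIORef (gBuf g)
  withForeignPtr buf $ \p -> pokeByteOff p used w
  writeIORef (gUsed g) (used + 1)

-- Copy the written prefix out as a strict ByteString.
freeze :: Growable -> IO B.ByteString
freeze g = do
  buf  <- readIORef (gBuf g)
  used <- readIORef (gUsed g)
  withForeignPtr buf $ \p -> B.packCStringLen (castPtr p, used)
```

Because the capacity doubles each time, the copies add up to less than twice the final length (appending 1000 bytes from an initial capacity of 1 moves 1 + 2 + ... + 512 = 1023 bytes). The trade-off is that every growth step here is a fresh allocation plus a copy, never an in-place extension.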
@Gabriel439 AFAIK, all the Vector grow operations are always implemented with a new allocation + copy, unlike C's realloc, which tries to grow the buffer in place first and only falls back to allocate + copy as a last resort. I think that would be pretty bad for large buffers.
This sort of fizzled out, now it's dead.
In the case of encoding protobufs, we don't really know how long the encoded buffer will be beforehand.
Have you had any thoughts on how to implement dynamic size encoding?
I was thinking either do something crazy and realloc the buffer to 2x its size every time you attempt to write past the end, or use something sane like ByteString's builder (blaze).
Do you have an opinion?