Optimize binary operations #94

savonarola · 2016-12-10T16:36:20Z

Hello!

We use eredis widely in many Erlang and Elixir projects. It's an awesome library, but recently we had an issue with an edge case.

We noticed that one of our services experienced strange short peaks of load, during which all VM scheduler threads were consumed.

After some research we found that the peaks were caused by calling Redis GET command over keys holding very large values (~150MB). Having looked into eredis sources we found that eredis parser does not behave very well against this particular case: it accumulates binary data of response recreating binary each time:

parse_bulk({IntSize, Acc0}, Data) ->
    Acc = <<Acc0/binary, Data/binary>>,

etc. When the size of a bulk is large, reallocating happens many thousand times (since the response comes in many thousands of pieces) and possibly causes quadratic memory consumption for a short term of time.

So we implemented an iolist-based variant of parsing (it even had beed suggested in the comments).

There are also some benchmarks measuring CPU consumption of both variants on large and relatively small keys: https://github.com/savonarola/eredis_bench

Thanks!

Made binaries not be concatenated to avoid reallocation. Now binaries are accumulated into lists and concatenated only when there is enough data for a whole response protocol item value.

Removed direct matches against buffer structs.

Removed the suggested iolist parser improvement from the list of suggested improvements since it has been implemented.

av-ast · 2016-12-10T16:46:37Z

👍

knutin · 2016-12-10T16:48:23Z

This is great. Thanks for the patch! I'm in favour of merging and tagging a new release.

I know of some other users who ran into a similar issue recently. I hacked together a similar solution for the parser, but the performance gain wasn't that great in their case so I did not push it. The solution that gave me the best performance was to increase the size of the Erlang inet_tcp buffer to match the kernel recbuf.

7> {ok, S} = gen_tcp:listen(8080, []).
{ok,#Port<0.523>}
8> inet:getopts(S, [recbuf]).
{ok,[{recbuf,131072}]}
9> inet:getopts(S, [buffer]).
{ok,[{buffer,1460}]}
10> inet:setopts(S, [{buffer, 131072}]).
ok

Maybe the default behaviour of eredis should be to set buffer to match recbuf and sendbuf on the particular platform where it's running. If you're interested, I suggest you try it out and see if there's performance to be gained also in your specific case.

savonarola · 2016-12-10T17:41:26Z

Thanks for the advice, we'll definitely try to also tune socket params.

But as far as I understand the situation, tuning buffer size to be set to lager values means that the response will arrive in greater chunks. This obviously reduces the load, since we have fewer (re)allocations.

I have tried to test resource consumption of parsing 150mb responses coming in 128kb chunks (updated https://github.com/savonarola/eredis_bench README), and got that large chunks reduce CPU load for the optimized iolist variant even better than for the original concatenating implementation, so the relative overhead of concatenating binaries even increased :)

So, I think it still would be wonderful to have some optimized parser implementation to be included into future releases.

knutin · 2017-01-13T16:50:26Z

Just wanted to check in on this PR. Have you used this branch in production? :)

netDalek · 2017-01-31T06:21:08Z

Yes, we are. Has already deployed

knutin · 2017-02-01T13:43:03Z

Great. I'll go ahead and merge this. I'll tag new release soon.

av added 3 commits December 10, 2016 01:58

Optimized eredis_parser

1808193

Made binaries not be concatenated to avoid reallocation. Now binaries are accumulated into lists and concatenated only when there is enough data for a whole response protocol item value.

Made buffer in eredis_parser opaque

de84251

Removed direct matches against buffer structs.

Removed the suggestion about iolist parser improvement

4c807ab

Removed the suggested iolist parser improvement from the list of suggested improvements since it has been implemented.

knutin merged commit f11794f into wooga:master Feb 1, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize binary operations #94

Optimize binary operations #94

savonarola commented Dec 10, 2016

av-ast commented Dec 10, 2016

knutin commented Dec 10, 2016

savonarola commented Dec 10, 2016 •

edited

knutin commented Jan 13, 2017

netDalek commented Jan 31, 2017

knutin commented Feb 1, 2017

Optimize binary operations #94

Optimize binary operations #94

Conversation

savonarola commented Dec 10, 2016

av-ast commented Dec 10, 2016

knutin commented Dec 10, 2016

savonarola commented Dec 10, 2016 • edited

knutin commented Jan 13, 2017

netDalek commented Jan 31, 2017

knutin commented Feb 1, 2017

savonarola commented Dec 10, 2016 •

edited