
Rewriting decoder buffering #7

Merged — 1 commit, merged Jul 23, 2015
Conversation

mtlynch
Contributor

@mtlynch mtlynch commented Jul 20, 2015

The decoder had several bugs in its buffering logic that caused it to report
invalid data on Redis data that was actually valid.

In particular, the "second attempt to read" check incorrectly returns an
error when the data is larger than the buffer size and the buffer is
initially empty.

This also fixes the (separate, but related) error reported in
gosexy/redis:
gosexy/redis#42

This change rewrites the buffering logic to use the standard bufio APIs
instead of reimplementing buffering in this library.

@xiam
Owner

xiam commented Jul 20, 2015

Thanks a lot @mtlynch! Please allow me some time to go through it.

@xiam xiam self-assigned this Jul 20, 2015
@xiam
Owner

xiam commented Jul 20, 2015

@mtlynch

I still need to go through the code, but I am a bit concerned about the speed penalty of using the bytes.Buffer package directly.

This is what I got after running the benchmarking program on your proposal:

// Marshalling benchmarks were omitted
BenchmarkJSONUnmarshalString         1000000      1410 ns/op
BenchmarkRESPUnmarshalString          200000     10560 ns/op
BenchmarkJSONUnmarshalBytes          1000000      2344 ns/op
BenchmarkRESPUnmarshalBytes           200000     10452 ns/op
BenchmarkJSONUnmarshalInteger        1000000      1168 ns/op
BenchmarkRESPUnmarshalInteger         200000     10428 ns/op
BenchmarkJSONUnmarshalArrayString     500000      3326 ns/op
BenchmarkRESPUnmarshalArrayString     100000     16302 ns/op
BenchmarkJSONUnmarshalArrayBytes      500000      3271 ns/op
BenchmarkRESPUnmarshalArrayBytes      100000     12927 ns/op
BenchmarkJSONUnmarshalArrayInteger    500000      3787 ns/op
BenchmarkRESPUnmarshalArrayInteger    100000     19161 ns/op
BenchmarkJSONUnmarshalArrayArray      200000     10422 ns/op
BenchmarkRESPUnmarshalArrayArray       50000     32340 ns/op

Against the current implementation (which is compared to the JSON library):

// Marshalling benchmarks were omitted
BenchmarkJSONUnmarshalString         1000000      1251 ns/op
BenchmarkRESPUnmarshalString         1000000      1210 ns/op
BenchmarkJSONUnmarshalBytes          1000000      2399 ns/op
BenchmarkRESPUnmarshalBytes          1000000      1160 ns/op
BenchmarkJSONUnmarshalInteger        1000000      1201 ns/op
BenchmarkRESPUnmarshalInteger        1000000      1108 ns/op
BenchmarkJSONUnmarshalArrayString     500000      3148 ns/op
BenchmarkRESPUnmarshalArrayString     300000      5139 ns/op
BenchmarkJSONUnmarshalArrayBytes      500000      3319 ns/op
BenchmarkRESPUnmarshalArrayBytes      300000      3879 ns/op
BenchmarkJSONUnmarshalArrayInteger    500000      3670 ns/op
BenchmarkRESPUnmarshalArrayInteger    200000      7460 ns/op
BenchmarkJSONUnmarshalArrayArray      200000     13126 ns/op
BenchmarkRESPUnmarshalArrayArray      100000     17545 ns/op

But that does not prove anything; my current implementation is probably flawed, and that might be affecting the benchmarks... so in order to actually compare both solutions in terms of speed, I guess I'd have to fix this bug in my code and try again.
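For reference, benchmarks like the `BenchmarkJSONUnmarshalString` numbers above are typically produced with Go's standard testing package. The sketch below is an assumption about how such a benchmark is structured (it uses only `encoding/json`, since this repository's RESP decoder API is not shown here), and it drives the benchmark programmatically via `testing.Benchmark` so it runs as a plain program:

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// benchJSONUnmarshalString mirrors the shape of the unmarshal
// benchmarks quoted above: decode the same small input b.N times.
func benchJSONUnmarshalString(b *testing.B) {
	data := []byte(`"hello"`)
	for i := 0; i < b.N; i++ {
		var s string
		if err := json.Unmarshal(data, &s); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark runs the function with increasing b.N until
	// the timing is stable, just like `go test -bench` would.
	r := testing.Benchmark(benchJSONUnmarshalString)
	fmt.Printf("%d iterations, %d ns/op\n", r.N, r.NsPerOp())
}
```

Under `go test -bench .`, the same function (renamed `BenchmarkJSONUnmarshalString`) would produce output in the `name / iterations / ns/op` format shown in the tables above.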

@mtlynch
Contributor Author

mtlynch commented Jul 22, 2015

Yes, I am seeing similar results. I think the implementation in master is "cheating" a bit, because I noticed that it throws out some data, but I don't think that explains the drastic performance difference. I profiled it a bit (results below) but wasn't able to determine the cause. I suspect the proposed code is doing some unnecessary copying of an array where it could just pass a reference, but I can't pinpoint anywhere this is happening.

Between the two, I'd prefer the proposed version because I'd rather have slower code that yields correct results than have faster code that yields "usually" correct results. : )

master: https://gist.github.com/mtlynch/dfaac62b22d66a5015a9
rewrite-reader: https://gist.github.com/mtlynch/f52bdefc632395c4ff3e

  • The new code results in much more time spent in scanblock
    • This post says this indicates more frequent heap allocations
  • Most of the allocations seem to come from bufio.NewReaderSize (which is indirectly called by NewReader)
    • I'm not sure this is actually where the issue is, because I can't see the allocation amounts; not sure what's going on with pprof

@xiam xiam merged commit 0bfd682 into xiam:master Jul 23, 2015
@xiam
Owner

xiam commented Jul 23, 2015

Hello @mtlynch,

I just added a test case for this error and tested with and without your solution.

Between the two, I'd prefer the proposed version because I'd rather have slower code that yields correct results than have faster code that yields "usually" correct results. : )

I have to agree with that. Besides, the bug you referred to has been open for some days and I haven't been able to take a stab at it, so I guess it's better fixed than perfect.

I'm still concerned about speed, and I'll eventually try (more carefully) with []byte again and measure the results, but this is definitely better than the previous code and lets users continue working more safely than before.

Thanks,
