
Rewriting decoder buffering #7

Merged — 1 commit, merged Jul 23, 2015
Conversation

mtlynch
Contributor

@mtlynch mtlynch commented Jul 20, 2015

The decoder had several bugs in its buffering logic that caused it to report
invalid data on Redis data that was actually valid.

In particular, the "second attempt to read" check incorrectly returns an
error when the data is larger than the buffer size and the buffer is
initially empty.

This also fixes the (separate, but related) error reported in
gosexy/redis:
gosexy/redis#42

This change rewrites the buffering logic to use the standard bufio APIs
instead of reimplementing buffering in this library.

@xiam
Owner

xiam commented Jul 20, 2015

Thanks a lot @mtlynch! Please allow me some time to go through it.

@xiam xiam self-assigned this Jul 20, 2015
@xiam
Owner

xiam commented Jul 20, 2015

@mtlynch

I still need to go through the code, but I am a bit concerned about the speed penalty of using the bytes.Buffer package directly.

This is what I got after running the benchmarking program on your proposal:

// Marshalling benchmarks were omitted
BenchmarkJSONUnmarshalString         1000000      1410 ns/op
BenchmarkRESPUnmarshalString          200000     10560 ns/op
BenchmarkJSONUnmarshalBytes          1000000      2344 ns/op
BenchmarkRESPUnmarshalBytes           200000     10452 ns/op
BenchmarkJSONUnmarshalInteger        1000000      1168 ns/op
BenchmarkRESPUnmarshalInteger         200000     10428 ns/op
BenchmarkJSONUnmarshalArrayString     500000      3326 ns/op
BenchmarkRESPUnmarshalArrayString     100000     16302 ns/op
BenchmarkJSONUnmarshalArrayBytes      500000      3271 ns/op
BenchmarkRESPUnmarshalArrayBytes      100000     12927 ns/op
BenchmarkJSONUnmarshalArrayInteger    500000      3787 ns/op
BenchmarkRESPUnmarshalArrayInteger    100000     19161 ns/op
BenchmarkJSONUnmarshalArrayArray      200000     10422 ns/op
BenchmarkRESPUnmarshalArrayArray       50000     32340 ns/op

Against the current implementation (which is compared to the JSON library):

// Marshalling benchmarks were omitted
BenchmarkJSONUnmarshalString         1000000      1251 ns/op
BenchmarkRESPUnmarshalString         1000000      1210 ns/op
BenchmarkJSONUnmarshalBytes          1000000      2399 ns/op
BenchmarkRESPUnmarshalBytes          1000000      1160 ns/op
BenchmarkJSONUnmarshalInteger        1000000      1201 ns/op
BenchmarkRESPUnmarshalInteger        1000000      1108 ns/op
BenchmarkJSONUnmarshalArrayString     500000      3148 ns/op
BenchmarkRESPUnmarshalArrayString     300000      5139 ns/op
BenchmarkJSONUnmarshalArrayBytes      500000      3319 ns/op
BenchmarkRESPUnmarshalArrayBytes      300000      3879 ns/op
BenchmarkJSONUnmarshalArrayInteger    500000      3670 ns/op
BenchmarkRESPUnmarshalArrayInteger    200000      7460 ns/op
BenchmarkJSONUnmarshalArrayArray      200000     13126 ns/op
BenchmarkRESPUnmarshalArrayArray      100000     17545 ns/op

But that does not prove anything; my current implementation is probably flawed, and that might be affecting the benchmarks... so in order to actually compare both solutions in terms of speed, I guess I'd have to fix this bug in my code and try again.
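For reference, benchmarks like the `BenchmarkJSONUnmarshalString` numbers above are typically produced with Go's standard testing package. The sketch below is an assumption about how such a benchmark is structured (it uses only `encoding/json`, since this repository's RESP decoder API is not shown here), and it drives the benchmark programmatically via `testing.Benchmark` so it runs as a plain program:

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// benchJSONUnmarshalString mirrors the shape of the unmarshal
// benchmarks quoted above: decode the same small input b.N times.
func benchJSONUnmarshalString(b *testing.B) {
	data := []byte(`"hello"`)
	for i := 0; i < b.N; i++ {
		var s string
		if err := json.Unmarshal(data, &s); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark runs the function with increasing b.N until
	// the timing is stable, just like `go test -bench` would.
	r := testing.Benchmark(benchJSONUnmarshalString)
	fmt.Printf("%d iterations, %d ns/op\n", r.N, r.NsPerOp())
}
```

Under `go test -bench .`, the same function (renamed `BenchmarkJSONUnmarshalString`) would produce output in the `name / iterations / ns/op` format shown in the tables above.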

@mtlynch
Contributor Author

mtlynch commented Jul 22, 2015

Yes, I am seeing similar results. I think the implementation in master is "cheating" a bit, because I noticed that it throws out some data, but I don't think that explains the drastic performance difference. I profiled it a bit (results below) but wasn't able to determine the cause. I suspect the proposed code is doing some unnecessary copying of an array where it could just pass a reference, but I can't pinpoint anywhere this is happening.

Between the two, I'd prefer the proposed version because I'd rather have slower code that yields correct results than have faster code that yields "usually" correct results. : )

master: https://gist.github.com/mtlynch/dfaac62b22d66a5015a9
rewrite-reader: https://gist.github.com/mtlynch/f52bdefc632395c4ff3e

  • The new code results in much more time spent in scanblock
    • This post says this indicates more frequent heap allocations
  • Most of the allocations seem to come from bufio.NewReaderSize (which is indirectly called by NewReader)
    • I'm not sure this is actually where the issue is, because I can't see the allocation amounts; not sure what's going on with pprof

@xiam xiam merged commit 0bfd682 into xiam:master Jul 23, 2015
@xiam
Owner

xiam commented Jul 23, 2015

Hello @mtlynch,

I just added a test case for this error and tested with and without your solution.

Between the two, I'd prefer the proposed version because I'd rather have slower code that yields correct results than have faster code that yields "usually" correct results. : )

I have to agree with that. Besides, the bug you referred to has been open for some days and I haven't been able to take a stab at it, so I guess it's better fixed than perfect.

I'm still concerned about speed, and I'll eventually try (more carefully) with []byte again and measure the results, but this is definitely better than the previous code and lets users continue working more safely than before.

Thanks,
