codec: major update with MAJOR PERFORMANCE improvements
Along with these changes, we support go 1.4+ across the board, and test across
the last 3 go releases (go 1.7+) and tip before each github push.

The changes articulated below are contained in this push:

expand support for careful contained use of unsafe for performance

    This mostly involved creating unsafe and safe versions of a few functions, and
    using them across the codebase.

    - []byte-->string or string-->[]byte when we know there is no write done.
      e.g. map lookup, looking up a struct field given a field name, etc
    - rt2id(...): getting an id for a type
    - rv2i(...):  getting an interface from a reflect.Value in our narrow context for encoding/decoding
    - ptrToRvMap: caching reflect.Value got for the same pointers
    - using atomic pointers to reduce sync.RWMutex contention (lock-free model)
    - fast conversion from reflect.Value to a pointer when we know exactly what the type is.

    Note that we only support unsafe for the 3 most recent go releases.
    This is because use of unsafe requires intimate knowledge of implementation details.
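A minimal sketch of the first item in the list above ([]byte-->string without a copy), assuming the caller never writes to the bytes while the string is in use. The body of `stringView` and the `lookupField` helper are illustrative assumptions, not the package's actual code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringView returns a string that shares memory with b, so no copy is made.
// Assumption for this sketch: b is not modified while the string is alive.
func stringView(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	// A slice header starts with (pointer, length), which is the layout of a
	// string header. This relies on implementation details of the runtime.
	return *(*string)(unsafe.Pointer(&b))
}

// lookupField is a hypothetical helper: it finds a struct field index by name
// without allocating a new string for the lookup key.
func lookupField(fields map[string]int, name []byte) (int, bool) {
	idx, ok := fields[stringView(name)]
	return idx, ok
}

func main() {
	fields := map[string]int{"Name": 0, "Age": 1}
	idx, ok := lookupField(fields, []byte("Age"))
	fmt.Println(idx, ok)
}
```

The view is only used for the duration of the map lookup, which is the kind of narrow, read-only context the list describes.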

Simplify codebase during decode, making things consistent and improving performance

    - Use single conditional to remove duplicate code for decoding into maps, arrays, slices and chans
    - Make changes across: reflection, fast-path and codecgen modes

    Litany of changes
    - prepare safe and unsafe variants for *decFnInfo methods
    - clean up struct for decoding into structs and collections (maps, slices, arrays, chans)
      so they are consistent in the reflection path.
      Previously, I had separate paths for length-prefix vs separator-based collections.
    - Update fast-path and codecgen template generation scripts so that all code paths mirror each other
    - Apply some optimizations along the way (specifically for decode)
    - removed some unnecessary reflection calls in kMap
    - add some CanSet() checks and reuse reflect.Value (if immutablekind) during decode

use safe and appengine tags to judiciously use "unsafe" by default for performance

    The tags are set up this way so that a user must explicitly pass
    one of these tags (safe or appengine) to use the standard "slower" APIs.

json: use lookup table for safe characters to improve performance

    The previous check for characters inside of a JSON string that needed
    to be escaped performed seven different boolean comparisons before
    determining that an ASCII character did not need to be escaped. Most
    characters do not need to be escaped, so this check can be done in a
    more performant way *typically* using a single/simple memory table lookup.

    We use [128]bool to represent safe and unsafe characters, and look up
    there instead of doing a computation.
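The table lookup can be sketched as follows. This is an illustration under assumptions: the set of bytes marked unsafe here (control characters, the double quote and the backslash) is the minimum JSON requires, and the real table may mark more. The names `jsonSafe` and `firstUnsafe` are hypothetical.

```go
package main

import "fmt"

// jsonSafe[c] is true when ASCII byte c can be written inside a JSON string
// as-is. Assumption: only control characters, '"' and '\\' need escaping.
var jsonSafe [128]bool

func init() {
	for c := 0x20; c < 128; c++ {
		jsonSafe[c] = true
	}
	jsonSafe['"'] = false
	jsonSafe['\\'] = false
}

// firstUnsafe returns the index of the first byte that is not a safe ASCII
// character, or len(s) if every byte can be copied through unchanged.
func firstUnsafe(s string) int {
	for i := 0; i < len(s); i++ {
		c := s[i]
		// One bounds comparison plus one table load replaces a chain of
		// per-character comparisons.
		if c >= 128 || !jsonSafe[c] {
			return i
		}
	}
	return len(s)
}

func main() {
	fmt.Println(firstUnsafe("plain text"))
	fmt.Println(firstUnsafe(`hello "world"`))
}
```

Bytes at or above 128 are reported as "not safe" here only so that the caller can handle multi-byte UTF-8 separately.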

use sync/atomic to lookup typeinfo without synchronization

    Also, move from using a map to doing a binary search when looking
    up typeInfo. This should be faster (arguably).
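One way to combine the two ideas is sketched below: readers load an immutable sorted slice through `atomic.Value` and binary-search it, while writers copy the slice and publish the new version. The types `registry` and `typeEntry` and their methods are hypothetical, and the writer-side mutex is an assumption of this sketch, not something the commit describes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// typeEntry is a hypothetical stand-in for cached per-type information.
type typeEntry struct {
	id   uintptr
	name string
}

type registry struct {
	mu      sync.Mutex   // serializes writers only; readers never take it
	entries atomic.Value // holds a []typeEntry sorted by id, never mutated in place
}

func (r *registry) load() []typeEntry {
	s, _ := r.entries.Load().([]typeEntry)
	return s
}

// get does a lock-free binary search over the current snapshot.
func (r *registry) get(id uintptr) (string, bool) {
	s := r.load()
	i := sort.Search(len(s), func(i int) bool { return s[i].id >= id })
	if i < len(s) && s[i].id == id {
		return s[i].name, true
	}
	return "", false
}

// put copies the snapshot, inserts in sorted position, and publishes the copy.
func (r *registry) put(id uintptr, name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	old := r.load()
	i := sort.Search(len(old), func(i int) bool { return old[i].id >= id })
	if i < len(old) && old[i].id == id {
		return // already registered
	}
	s := make([]typeEntry, 0, len(old)+1)
	s = append(s, old[:i]...)
	s = append(s, typeEntry{id: id, name: name})
	s = append(s, old[i:]...)
	r.entries.Store(s)
}

func main() {
	r := &registry{}
	r.put(30, "c")
	r.put(10, "a")
	r.put(20, "b")
	fmt.Println(r.get(20))
	fmt.Println(r.get(15))
}
```

Copy-on-write suits this workload because the set of types stabilizes quickly, so almost all traffic is reads.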

optimize for map with string keys to avoid allocation if possible

re-introduce decDriver.DecodeStringAsBytes

    This reduces allocation significantly and is clearer.

    DecodeString and DecodeStringAsBytes are 2 versions of the same code.
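To illustrate why a bytes-returning variant helps, here is a toy sketch. `toyDecoder` and its one-byte length-prefixed format are assumptions made up for the example; only the two method names come from the text above. The map lookup uses the `m[string(b)]` form, which the standard Go compiler is known to handle without allocating a temporary string.

```go
package main

import "fmt"

// toyDecoder reads strings stored as a one-byte length followed by the bytes.
// This format is invented for the sketch.
type toyDecoder struct {
	buf []byte
	pos int
}

// DecodeStringAsBytes returns the next string as a slice into buf (no copy).
// The result is only valid until buf is reused.
func (d *toyDecoder) DecodeStringAsBytes() []byte {
	n := int(d.buf[d.pos])
	d.pos++
	b := d.buf[d.pos : d.pos+n]
	d.pos += n
	return b
}

// DecodeString is the same code path plus one copy into a new string.
func (d *toyDecoder) DecodeString() string {
	return string(d.DecodeStringAsBytes())
}

func main() {
	d := &toyDecoder{buf: []byte{3, 'A', 'g', 'e', 4, 'N', 'a', 'm', 'e'}}
	fields := map[string]int{"Name": 0, "Age": 1}

	// Key lookup straight from the bytes view: no string is kept around.
	idx := fields[string(d.DecodeStringAsBytes())]
	fmt.Println(idx)

	// A value that must outlive the buffer goes through DecodeString.
	fmt.Println(d.DecodeString())
}
```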

add native helper functions for performant json parsing
improved json interaction with codec package for better performance.

    One way to improve json performance is to bypass the multiple readn1
    calls in a tight loop, and just have a single interface call to do the
    combo functionality.

    json does some things a lot during decoding:
    - skip whitespace characters
    - constructs numbers
    - read strings

    Doing these one character at a time incurs interface indirection and indirect function
    call overhead, as it all goes through the decDriver interface to an implementation like *bytesDecDriver.

    We can make some major gains by exposing "combo-like" methods that expose this functionality
    as a single call (as opposed to calls that do one byte at a time).
        // skip will skip any byte that matches, and return the first non-matching byte
        skip(accept *[256]bool) (token byte)
        // readTo will read any byte that matches, stopping once no-longer matching.
        readTo(in []byte, accept *[256]bool) (out []byte)
        // readUntil will read, only stopping once it matches the 'stop' byte.
        readUntil(in []byte, stop byte) (out []byte)

    Currently, only json handle has use of these methods, giving it much better perf.
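The three signatures above could be implemented over an in-memory buffer roughly as follows. This is a sketch under assumptions: `bytesReader` is a hypothetical type, `skip` consumes the byte it returns and yields 0 at end of input, `readUntil` includes the stop byte in its result, and the `in` scratch buffer is ignored because a bytes-backed reader can return a sub-slice directly.

```go
package main

import "fmt"

// Lookup tables passed as the accept parameter.
var whitespace, digit [256]bool

func init() {
	for _, c := range []byte(" \t\r\n") {
		whitespace[c] = true
	}
	for c := byte('0'); c <= '9'; c++ {
		digit[c] = true
	}
}

// bytesReader is a hypothetical in-memory reader with a cursor.
type bytesReader struct {
	b []byte
	c int
}

// skip consumes every accepted byte and returns the first byte that is not
// accepted (also consumed). Assumption: returns 0 at end of input.
func (r *bytesReader) skip(accept *[256]bool) (token byte) {
	for r.c < len(r.b) {
		token = r.b[r.c]
		r.c++
		if !accept[token] {
			return token
		}
	}
	return 0
}

// readTo returns the run of accepted bytes starting at the cursor.
func (r *bytesReader) readTo(_ []byte, accept *[256]bool) (out []byte) {
	start := r.c
	for r.c < len(r.b) && accept[r.b[r.c]] {
		r.c++
	}
	return r.b[start:r.c]
}

// readUntil returns everything up to and including the stop byte.
func (r *bytesReader) readUntil(_ []byte, stop byte) (out []byte) {
	start := r.c
	for r.c < len(r.b) {
		r.c++
		if r.b[r.c-1] == stop {
			break
		}
	}
	return r.b[start:r.c]
}

func main() {
	r := &bytesReader{b: []byte("  \"abc\" 42")}
	tok := r.skip(&whitespace)    // skips the spaces, returns the opening quote
	str := r.readUntil(nil, '"')  // abc plus the closing quote
	first := r.skip(&whitespace)  // first digit of the number
	rest := r.readTo(nil, &digit) // remaining digits
	fmt.Printf("%c %s %c%s\n", tok, str, first, rest)
}
```

Each call crosses the interface boundary once per token rather than once per byte, which is the saving described above.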

    In addition, since we can just request a block of text and work on it directly,
    we might as well remove the logic that does strconv.ParseFloat at iterator time.
    Instead, we just grab the block of text representing a number, and pass that to the
    strconv.ParseXXX functions.

    This resulted in the removal of all the string->number logic. Which is A-ok.
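A sketch of that approach: once the whole number token is available as one block, a single strconv call does the conversion. The function `parseNumber` and its rule for choosing between integer and float (presence of '.', 'e' or 'E') are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseNumber converts a complete block of number text in one call, instead
// of accumulating digits one byte at a time.
func parseNumber(text []byte) (i int64, f float64, isFloat bool, err error) {
	s := string(text)
	if bytes.IndexAny(text, ".eE") >= 0 {
		f, err = strconv.ParseFloat(s, 64)
		return 0, f, true, err
	}
	i, err = strconv.ParseInt(s, 10, 64)
	return i, 0, false, err
}

func main() {
	for _, in := range []string{"42", "-7", "1.5e2"} {
		i, f, isFloat, err := parseNumber([]byte(in))
		fmt.Println(in, i, f, isFloat, err)
	}
}
```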

    Passing through interface calls to an implementation costs a function call and an indirection.

    We saw performance gains when using [256]bool arrays in place of multiple comparisons.
    We continue to leverage that pattern to get some performance improvements without the
    need to pass callback functions for matching bytes. This is good because the call-backs
    were costing us over 10% in performance loss.

    Performance gains are very good now over the last few code iterations.

re-packaged all language-specific code in appropriately named files

json: add PreferFloat option to default to decoding numbers as float

test suite and organization

    We now use test suites to define and run our tests, making
    most of what tests.sh and bench.sh did unnecessary.

    We now drive scripts from run.sh, and those just call
    go test ... for running tests and benchmarks.

    This required some re-factoring in the tests to make them all work.

    organize tests so they depend on helper functions, allowing for many suites.

    This affords XSuite, CodecSuite and AllSuite.

    It also allows us to separate the tests from benchmarks, and further separate benchmarks into
    codec+stdlib only, and external ones.

codecgen: remove unsafe - delegating to codec's safe/unsafe instead.

    codecgen will now use whatever tag (safe or unsafe) the codec package is built with.

    To do this, expose stringView to codecgen (as genHelperDecoder.StringView(...))
    and use that within the generated code.

conditional build: to support go1.4+

    support go1.4+ and use goversion_* to conditionally compile things based on go version

codecgen: gen.go is only used by codecgen execution, behind build tag "codecgen.exec"

    Do this by
    - adding the build tag "codecgen.exec" to gen.go
    - moving the definition of GenVersion to gen-helper.go.tmpl (which is used at runtime)
      which requires adding a "Version" field to genInternal struct
    - passing the "codecgen.exec" tag as one of the values to "go run ..."

    Also, codecgen depends on fastpath support.
    Document that, and update run.sh appropriately.

    Also, values_codecgen_generated_test.go MUST not have the 'x' build tag set.
    That 'x' build tag is only for codebases which are not codec based.

    Code coverage is currently at 71%, which is just above our goals.

----------

Testing
    - expand list of float values to test
    - add a true large string for testing
    - add testUseIOEnc support to json-iterator benchmark
    - add a test suite for running all codec tests and grabbing code coverage metrics
    - add test for json indent
    - exercise all paths in fast-path using full-featured generated type

Misc:
    - add json-iterator and easyjson to the benchmarks
ugorji committed Sep 18, 2017
1 parent 8c0409f commit 54210f4
Showing 43 changed files with 9,285 additions and 11,159 deletions.
34 changes: 15 additions & 19 deletions codec/0doc.go
@@ -2,30 +2,26 @@
// Use of this source code is governed by a MIT license found in the LICENSE file.

/*
High Performance, Feature-Rich Idiomatic Go codec/encoding library for
binc, msgpack, cbor, json.
High Performance, Feature-Rich Idiomatic Go 1.4+ codec/encoding library for
binc, msgpack, cbor, json
Supported Serialization formats are:
- msgpack: https://github.com/msgpack/msgpack
- binc: http://github.com/ugorji/binc
- cbor: http://cbor.io http://tools.ietf.org/html/rfc7049
- json: http://json.org http://tools.ietf.org/html/rfc7159
- simple:
- simple:
To install:
go get github.com/ugorji/go/codec
This package understands the 'unsafe' tag, to allow using unsafe semantics:
- When decoding into a struct, you need to read the field name as a string
so you can find the struct field it is mapped to.
Using `unsafe` will bypass the allocation and copying overhead of []byte->string conversion.
To install using unsafe, pass the 'unsafe' tag:
go get -tags=unsafe github.com/ugorji/go/codec
This package will carefully use 'unsafe' for performance reasons in specific places.
You can build without unsafe use by passing the safe or appengine tag
i.e. 'go install -tags=safe ...'. Note that unsafe is only supported for the last 3
go sdk versions e.g. current go release is go 1.9, so we support unsafe use only from
go 1.7+ . This is because supporting unsafe requires knowledge of implementation details.
For detailed usage information, read the primer at http://ugorji.net/blog/go-codec-primer .
@@ -38,9 +34,9 @@ Rich Feature Set includes:
- Very High Performance.
Our extensive benchmarks show us outperforming Gob, Json, Bson, etc by 2-4X.
- Multiple conversions:
Package coerces types where appropriate
Package coerces types where appropriate
e.g. decode an int in the stream into a float, etc.
- Corner Cases:
- Corner Cases:
Overflows, nil maps/slices, nil values in streams are handled correctly
- Standard field renaming via tags
- Support for omitting empty fields during an encoding
@@ -56,7 +52,7 @@ Rich Feature Set includes:
- Fast (no-reflection) encoding/decoding of common maps and slices
- Code-generation for faster performance.
- Support binary (e.g. messagepack, cbor) and text (e.g. json) formats
- Support indefinite-length formats to enable true streaming
- Support indefinite-length formats to enable true streaming
(for formats which support it e.g. json, cbor)
- Support canonical encoding, where a value is ALWAYS encoded as same sequence of bytes.
This mostly applies to maps, where iteration order is non-deterministic.
@@ -68,12 +64,12 @@ Rich Feature Set includes:
- Encode/Decode from/to chan types (for iterative streaming support)
- Drop-in replacement for encoding/json. `json:` key in struct tag supported.
- Provides a RPC Server and Client Codec for net/rpc communication protocol.
- Handle unique idiosyncrasies of codecs e.g.
- For messagepack, configure how ambiguities in handling raw bytes are resolved
- For messagepack, provide rpc server/client codec to support
- Handle unique idiosyncrasies of codecs e.g.
- For messagepack, configure how ambiguities in handling raw bytes are resolved
- For messagepack, provide rpc server/client codec to support
msgpack-rpc protocol defined at:
https://github.com/msgpack-rpc/msgpack-rpc/blob/master/spec.md
Extension Support
Users can register a function to handle the encoding or decoding of
40 changes: 29 additions & 11 deletions codec/README.md
@@ -15,17 +15,11 @@ To install:

go get github.com/ugorji/go/codec

This package understands the `unsafe` tag, to allow using unsafe semantics:

- When decoding into a struct, you need to read the field name as a string
so you can find the struct field it is mapped to.
Using `unsafe` will bypass the allocation and copying overhead of `[]byte->string` conversion.

To use it, you must pass the `unsafe` tag during install:

```
go install -tags=unsafe github.com/ugorji/go/codec
```
This package will carefully use 'unsafe' for performance reasons in specific places.
You can build without unsafe use by passing the safe or appengine tag
i.e. 'go install -tags=safe ...'. Note that unsafe is only supported for the last 3
go sdk versions e.g. current go release is go 1.9, so we support unsafe use only from
go 1.7+ . This is because supporting unsafe requires knowledge of implementation details.

Online documentation: http://godoc.org/github.com/ugorji/go/codec
Detailed Usage/How-to Primer: http://ugorji.net/blog/go-codec-primer
@@ -36,8 +30,13 @@ the standard library (ie json, xml, gob, etc).
Rich Feature Set includes:

- Simple but extremely powerful and feature-rich API
- Support for go1.4 and above, while selectively using newer APIs for later releases
- Good code coverage ( > 70% )
- Very High Performance.
Our extensive benchmarks show us outperforming Gob, Json, Bson, etc by 2-4X.
- Careful selected use of 'unsafe' for targeted performance gains.
100% mode exists where 'unsafe' is not used at all.
- Lock-free (sans mutex) concurrency for scaling to 100's of cores
- Multiple conversions:
Package coerces types where appropriate
e.g. decode an int in the stream into a float, etc.
@@ -146,3 +145,22 @@ Typical usage model:
//OR rpcCodec := codec.MsgpackSpecRpc.ClientCodec(conn, h)
client := rpc.NewClientWithCodec(rpcCodec)

## Running Tests

To run tests, use the following:

go test

To run the full suite of tests, use the following:

go test -tags alltests -run Suite

You can run the tag 'safe' to run tests or build in safe mode. e.g.

go test -tags safe -run Json
go test -tags "alltests safe" -run Suite

## Running Benchmarks

Please see http://github.com/ugorji/go-codec-bench .

15 changes: 8 additions & 7 deletions codec/binc.go
@@ -728,11 +728,12 @@ func (d *bincDecDriver) DecodeString() (s string) {
return
}

func (d *bincDecDriver) DecodeBytes(bs []byte, isstring, zerocopy bool) (bsOut []byte) {
if isstring {
bsOut, _ = d.decStringAndBytes(bs, false, zerocopy)
return
}
func (d *bincDecDriver) DecodeStringAsBytes() (s []byte) {
s, _ = d.decStringAndBytes(d.b[:], false, true)
return
}

func (d *bincDecDriver) DecodeBytes(bs []byte, zerocopy bool) (bsOut []byte) {
if !d.bdRead {
d.readNextBd()
}
@@ -789,7 +790,7 @@ func (d *bincDecDriver) decodeExtV(verifyTag bool, tag byte) (xtag byte, xbs []b
}
xbs = d.r.readx(l)
} else if d.vd == bincVdByteArray {
xbs = d.DecodeBytes(nil, false, true)
xbs = d.DecodeBytes(nil, true)
} else {
d.d.errorf("Invalid d.vd for extensions (Expecting extensions or byte array). Got: 0x%x", d.vd)
return
@@ -858,7 +859,7 @@ func (d *bincDecDriver) DecodeNaked() {
n.s = d.DecodeString()
case bincVdByteArray:
n.v = valueTypeBytes
n.l = d.DecodeBytes(nil, false, false)
n.l = d.DecodeBytes(nil, false)
case bincVdTimestamp:
n.v = valueTypeTimestamp
tt, err := decodeTime(d.r.readx(int(d.vs)))
12 changes: 8 additions & 4 deletions codec/cbor.go
@@ -407,7 +407,7 @@ func (d *cborDecDriver) decAppendIndefiniteBytes(bs []byte) []byte {
return bs
}

func (d *cborDecDriver) DecodeBytes(bs []byte, isstring, zerocopy bool) (bsOut []byte) {
func (d *cborDecDriver) DecodeBytes(bs []byte, zerocopy bool) (bsOut []byte) {
if !d.bdRead {
d.readNextBd()
}
@@ -434,7 +434,11 @@ func (d *cborDecDriver) DecodeBytes(bs []byte, isstring, zerocopy bool) (bsOut [
}

func (d *cborDecDriver) DecodeString() (s string) {
return string(d.DecodeBytes(d.b[:], true, true))
return string(d.DecodeBytes(d.b[:], true))
}

func (d *cborDecDriver) DecodeStringAsBytes() (s []byte) {
return d.DecodeBytes(d.b[:], true)
}

func (d *cborDecDriver) DecodeExt(rv interface{}, xtag uint64, ext Ext) (realxtag uint64) {
@@ -485,7 +489,7 @@ func (d *cborDecDriver) DecodeNaked() {
n.f = d.DecodeFloat(false)
case cborBdIndefiniteBytes:
n.v = valueTypeBytes
n.l = d.DecodeBytes(nil, false, false)
n.l = d.DecodeBytes(nil, false)
case cborBdIndefiniteString:
n.v = valueTypeString
n.s = d.DecodeString()
@@ -510,7 +514,7 @@ func (d *cborDecDriver) DecodeNaked() {
n.i = d.DecodeInt(64)
case d.bd >= cborBaseBytes && d.bd < cborBaseString:
n.v = valueTypeBytes
n.l = d.DecodeBytes(nil, false, false)
n.l = d.DecodeBytes(nil, false)
case d.bd >= cborBaseString && d.bd < cborBaseArray:
n.v = valueTypeString
n.s = d.DecodeString()

4 comments on commit 54210f4

@vdemeester

This changes some signatures like DecodeBytes and thus breaks k8s.io/client-go on generated types it seems 😅.

@abronan

@vdemeester Seeing similar errors on coreos/etcd/client

@ugorji ugorji commented on 54210f4 Sep 19, 2017

This is a consequence of using codecgen. A new pull of go-codec will require the re-generation of the sources, i.e. you have to re-run codecgen after a new update.

@ajwerner

causes #208
