math/bits: faster way to compute OnesCount64

### What version of Go are you using (`go version`)?

<pre>
$ go version
go version go1.16.3 darwin/amd64

</pre>

### Does this issue reproduce with the latest release?
Yes


### What operating system and processor architecture are you using (`go env`)?

<details><summary><code>go env</code> Output</summary><br><pre>
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/ttsugrii/Library/Caches/go-build"
GOENV="/Users/ttsugrii/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/ttsugrii/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/ttsugrii/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.16.3/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.16.3/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/ttsugrii/projects/go/testing/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/kx/wp6rcc054zv_xyf4lpmlztr00000gn/T/go-build1168578044=/tmp/go-build -gno-record-gcc-switches -fno-common"

</pre></details>

### What did you do?

 I've ported Wilkes-Wheeler-Gill algorithm from ["Faster Population Counts Using AVX2 Instructions" paper](https://arxiv.org/pdf/1611.07612.pdf):
```go
const c1 = 0x5555555555555555
const c2 = 0x3333333333333333
const c4 = 0x0F0F0F0F0F0F0F0F

//go:noinline
func wilkes_wheeler_gill(x uint64) int {
	x -= (x >> 1) & c1
	x = ((x >> 2) & c2) + (x & c2)
	x = (x + (x >> 4)) & c4
	x *= 0x0101010101010101
	return int(x >> 56)
}
```

and benchmarked it against the version used by `math.bits.OnesCount64` in case `POPCNT` is not supported:
```go
const m0 = 0x5555555555555555 // 01010101 ...
const m1 = 0x3333333333333333 // 00110011 ...
const m2 = 0x0f0f0f0f0f0f0f0f // 00001111 ...

//go:noinline
func parallel_summing(x uint64) int {
	const m = 1<<64 - 1
	x = x>>1&(m0&m) + x&(m0&m)
	x = x>>2&(m1&m) + x&(m1&m)
	x = (x>>4 + x) & (m2 & m)
	x += x >> 8
	x += x >> 16
	x += x >> 32
	return int(x) & (1<<7 - 1)
}
```

The benchmarks:
```go
var x uint64 = <the input value>
var result int

func BenchmarkPopCountParallelSumming(b *testing.B) {
	var cnt int
	for i := 0; i < b.N; i++ {
		cnt += parallel_summing(x)
	}
	result += cnt
}

func BenchmarkPopCountWilkesWheelerGill(b *testing.B) {
	var cnt int
	for i := 0; i < b.N; i++ {
		cnt += wilkes_wheeler_gill(x)
	}
	result += cnt
}
```

For `var x uint64 = 0xffffffffffffffff`:
<pre>
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkPopCountParallelSumming-16      	360195423	         3.094 ns/op
BenchmarkPopCountWilkesWheelerGill-16    	418102376	         2.798 ns/op
</pre>
and for `var x uint64 = 0`
<pre>
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkPopCountParallelSumming-16      	379682172	         3.126 ns/op
BenchmarkPopCountWilkesWheelerGill-16    	431421498	         2.870 ns/op
</pre>


### What did you expect to see?
`math.bits.OnesCount64`'s version to be faster


### What did you see instead?
Wilkes-Wheeler-Gill algorithm was faster by ~10% and, given its implementation is also more concise, it makes sense to replace with it existing implementation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

math/bits: faster way to compute OnesCount64 #46188

What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (`go env`)?

What did you do?

What did you expect to see?

What did you see instead?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

math/bits: faster way to compute OnesCount64 #46188

Description

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?