
Improvements #1

Closed
wants to merge 3 commits into from

Conversation

funny-falcon

  • Use a non-blocking select from a single channel when possible.
    Such a select needs neither a heap allocation nor complex logic.
  • Stop the timer in the client's CallTimeout.
  • Detect an empty requestsChan and responsesChan instead of using a timer to flush writers.
    With the default PendingRequests == PendingResponses == 32768 this is good for both throughput and latency
    (but I'd recommend decreasing the default write buffers a bit).

- Try a non-blocking channel send, which is faster.
- Stop the timer, so the runtime doesn't need to fire a goroutine when it expires.
React to an empty requestsChan instead of a flush timeout.
This will probably use a bit more CPU, but it decreases request latency for
a service without heavy load.
React to an empty responsesChan instead of a flush timeout.
This will probably use a bit more CPU, but it decreases request latency for
a service without heavy load.
@valyala
Owner

valyala commented Mar 4, 2015

Thanks for the participation, but unfortunately I cannot merge these changes due to problems described above.

@valyala valyala closed this Mar 4, 2015
@funny-falcon
Author

It's a pity that you never even measured the proposed changes and their impact. I'm sharing not just my "thoughts" but my experience, and you talk about it as if it were "premature optimization". Yes, it is "premature optimization" if you never plan to reach 100000 rps or more. But at that rate it is profile-guided optimization, because I did profile at that rate.

valyala pushed a commit that referenced this pull request Mar 5, 2015
…th. Thanks for this hack to funny-falcon. See #1 for details
@valyala
Owner

valyala commented Mar 5, 2015

I just benchmarked blocking select with non-blocking select and was impressed by the numbers:

// makeChans returns a never-ready channel (bCh) and an always-ready
// closed channel (nbCh); receives from a closed channel always succeed.
func makeChans() (bCh chan struct{}, nbCh chan struct{}) {
        nbCh = make(chan struct{})
        close(nbCh)
        bCh = make(chan struct{})
        return
}

func BenchmarkBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case <-nbCh:  // query queue emulation
                case <-bCh:   // timer emulation
                }
        }
}

func BenchmarkNonBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case <-nbCh:
                default:
                        b.Fatalf("unexpected code path")
                        select {
                        case <-nbCh:
                        case <-bCh:
                        }
                }
        }
}

Results:

BenchmarkBlockingSelect  5000000           256 ns/op
BenchmarkNonBlockingSelect  30000000            43.2 ns/op

This proves that your change with select increases its performance by about 6 times! The change has been adopted in the code - see ec89681 .

@valyala
Owner

valyala commented Mar 5, 2015

FYI, I slightly updated the benchmark code to be closer to reality:

// makeChans returns a never-ready channel (bCh) and a buffered channel
// (nbCh) with room for n non-blocking sends.
func makeChans(n int) (bCh chan struct{}, nbCh chan struct{}) {
        nbCh = make(chan struct{}, n)
        bCh = make(chan struct{})
        return
}

func BenchmarkBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans(b.N)

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case nbCh <- struct{}{}: // query queue emulation
                case <-bCh: // timer emulation
                }
        }
}

func BenchmarkNonBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans(b.N)

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case nbCh <- struct{}{}:
                default:
                        b.Fatalf("Unexpected code path")
                        select {
                        case nbCh <- struct{}{}:
                        case <-bCh:
                        }
                }
        }
}

But this didn't change the benchmark results.

@valyala
Owner

valyala commented Mar 5, 2015

As to the change from time.After() to time.NewTimer(), benchmark results prove this was a worthless optimization:

func BenchmarkTimerStop(b *testing.B) {
        for i := 0; i < b.N; i++ {
                t := time.NewTimer(time.Millisecond)
                select {
                case <-t.C:
                default:
                        t.Stop()
                }
        }
}

func BenchmarkTimerNoStop(b *testing.B) {
        for i := 0; i < b.N; i++ {
                tc := time.After(time.Millisecond)
                select {
                case <-tc:
                default:
                }
        }
}

Results:

BenchmarkTimerStop-4     1000000          1483 ns/op         192 B/op          3 allocs/op
BenchmarkTimerNoStop-4   2000000           854 ns/op         192 B/op          3 allocs/op

@funny-falcon
Author

Yeah, you are right about timers.
I benchmarked this back when Go was 1.1, and back then each timer expiration created a goroutine. When I asked for a way to call a function within the timer's goroutine, they refused.
And now they call the timer's function in the timer's goroutine... just as I asked, and as they refused...

valyala pushed a commit that referenced this pull request Mar 5, 2015
…s to FlushDelay. Thanks to funny-falcon to the idea - see #1 for details
@valyala
Owner

valyala commented Mar 5, 2015

FYI, I added the ability to disable message buffering on both the client and the server for those who need minimal rpc latency. But benchmark results say that gorpc has better throughput with message buffering enabled:

10K workers, default 5ms delay for message buffering:

BenchmarkEchoInt10000Workers-4    100000         10849 ns/op
BenchmarkEchoIntNocompress10000Workers-4      200000         10088 ns/op
BenchmarkEchoString10000Workers-4     100000         11528 ns/op
BenchmarkEchoStringNocompress10000Workers-4   100000         11218 ns/op
BenchmarkEchoStruct10000Workers-4     100000         14669 ns/op
BenchmarkEchoStructNocompress10000Workers-4   100000         13533 ns/op

10K workers, disabled message buffering:

BenchmarkEchoInt10000Workers-4     50000         27190 ns/op
BenchmarkEchoIntNocompress10000Workers-4      100000         14499 ns/op
BenchmarkEchoString10000Workers-4      50000         29251 ns/op
BenchmarkEchoStringNocompress10000Workers-4   100000         16329 ns/op
BenchmarkEchoStruct10000Workers-4      50000         32244 ns/op
BenchmarkEchoStructNocompress10000Workers-4   100000         19387 ns/op

@funny-falcon
Author

Your code is not equivalent to my proposal. I'll try to fix it and post a benchmark soon.
My proposal didn't disable buffering; it just makes buffering low-latency.

@funny-falcon
Author

Made a separate pull request #2 with benchmarks

iwasaki-kenta referenced this pull request in perlin-network/noise May 9, 2019