
Improvements #1

Closed
wants to merge 3 commits into from

Conversation

funny-falcon

  • Use a non-blocking select from a single channel when possible.
    Such a select needs neither a heap allocation nor complex logic.
  • Stop the timer in the client's CallTimeout.
  • Detect an empty requestsChan and responsesChan instead of using a timer to flush writers.
    With the default PendingRequests == PendingResponses == 32768 this is good for both throughput and latency
    (but I'd recommend decreasing the default write buffers a bit).

- Try a non-blocking channel send, which is faster.
- Stop the timer, so the runtime doesn't need to fire a goroutine when it expires.
React to an empty requestsChan instead of a flush timeout.
This will probably use a bit more CPU, but it decreases request latency for
a service without heavy load.
React to an empty responsesChan instead of a flush timeout.
This will probably use a bit more CPU, but it decreases request latency for
a service without heavy load.
@valyala
Owner

valyala commented Mar 4, 2015

Thanks for the participation, but unfortunately I cannot merge these changes due to problems described above.

@valyala valyala closed this Mar 4, 2015
@funny-falcon
Author

It's a pity that you never even measured the proposed changes and their impact. I'm sharing not just my "thoughts" but my experience, and you talk about it as if it were "premature optimization". Yes, it is "premature optimization" if you never plan to reach 100000 rps or more. But at that rate it is profile-guided optimization, because I did profile at that rate.

valyala pushed a commit that referenced this pull request Mar 5, 2015
…th. Thanks for this hack to funny-falcon. See #1 for details
@valyala
Owner

valyala commented Mar 5, 2015

I just benchmarked blocking select with non-blocking select and was impressed by the numbers:

// makeChans returns a never-ready channel (bCh) and an always-ready
// closed channel (nbCh); receives from a closed channel always succeed.
func makeChans() (bCh chan struct{}, nbCh chan struct{}) {
        nbCh = make(chan struct{})
        close(nbCh)
        bCh = make(chan struct{})
        return
}

func BenchmarkBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case <-nbCh:  // query queue emulation
                case <-bCh:   // timer emulation
                }
        }
}

func BenchmarkNonBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case <-nbCh:
                default:
                        b.Fatalf("unexpected code path")
                        select {
                        case <-nbCh:
                        case <-bCh:
                        }
                }
        }
}

Results:

BenchmarkBlockingSelect  5000000           256 ns/op
BenchmarkNonBlockingSelect  30000000            43.2 ns/op

This proves that your change with select increases its performance by about 6 times! The change has been adopted in the code - see ec89681 .

@valyala
Owner

valyala commented Mar 5, 2015

FYI, I slightly updated the benchmark code to be closer to reality:

// makeChans returns a never-ready channel (bCh) and a buffered channel
// (nbCh) with room for n non-blocking sends.
func makeChans(n int) (bCh chan struct{}, nbCh chan struct{}) {
        nbCh = make(chan struct{}, n)
        bCh = make(chan struct{})
        return
}

func BenchmarkBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans(b.N)

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case nbCh <- struct{}{}: // query queue emulation
                case <-bCh: // timer emulation
                }
        }
}

func BenchmarkNonBlockingSelect(b *testing.B) {
        bCh, nbCh := makeChans(b.N)

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                select {
                case nbCh <- struct{}{}:
                default:
                        b.Fatalf("Unexpected code path")
                        select {
                        case nbCh <- struct{}{}:
                        case <-bCh:
                        }
                }
        }
}

But this didn't change the benchmark results.

@valyala
Owner

valyala commented Mar 5, 2015

As to the change from time.After() to time.NewTimer(), benchmark results prove this was a worthless optimization:

func BenchmarkTimerStop(b *testing.B) {
        for i := 0; i < b.N; i++ {
                t := time.NewTimer(time.Millisecond)
                select {
                case <-t.C:
                default:
                        t.Stop()
                }
        }
}

func BenchmarkTimerNoStop(b *testing.B) {
        for i := 0; i < b.N; i++ {
                tc := time.After(time.Millisecond)
                select {
                case <-tc:
                default:
                }
        }
}

Results:

BenchmarkTimerStop-4     1000000          1483 ns/op         192 B/op          3 allocs/op
BenchmarkTimerNoStop-4   2000000           854 ns/op         192 B/op          3 allocs/op

@funny-falcon
Author

Yeah, you are right about timers.
I benchmarked this back when Go was 1.1, and back then each timer expiration created a goroutine. When I asked for a way to call a function within the timer's goroutine, they refused.
And now they call the timer's function in the timer's goroutine... just as I asked, and as they refused...

valyala pushed a commit that referenced this pull request Mar 5, 2015
…s to FlushDelay. Thanks to funny-falcon to the idea - see #1 for details
@valyala
Owner

valyala commented Mar 5, 2015

FYI, I added the ability to disable message buffering on both the client and the server for those who need minimal rpc latency. But benchmark results say that gorpc has better throughput with message buffering enabled:

10K workers, default 5ms delay for message buffering:

BenchmarkEchoInt10000Workers-4    100000         10849 ns/op
BenchmarkEchoIntNocompress10000Workers-4      200000         10088 ns/op
BenchmarkEchoString10000Workers-4     100000         11528 ns/op
BenchmarkEchoStringNocompress10000Workers-4   100000         11218 ns/op
BenchmarkEchoStruct10000Workers-4     100000         14669 ns/op
BenchmarkEchoStructNocompress10000Workers-4   100000         13533 ns/op

10K workers, disabled message buffering:

BenchmarkEchoInt10000Workers-4     50000         27190 ns/op
BenchmarkEchoIntNocompress10000Workers-4      100000         14499 ns/op
BenchmarkEchoString10000Workers-4      50000         29251 ns/op
BenchmarkEchoStringNocompress10000Workers-4   100000         16329 ns/op
BenchmarkEchoStruct10000Workers-4      50000         32244 ns/op
BenchmarkEchoStructNocompress10000Workers-4   100000         19387 ns/op

@funny-falcon
Author

Your code is not equivalent to my proposal. I'll try to fix it and post a benchmark soon.
My proposal didn't disable buffering; it just makes buffering low-latency.

@funny-falcon
Author

Made a separate pull request #2 with benchmarks

iwasaki-kenta referenced this pull request in perlin-network/noise May 9, 2019