perf: Lift the thread limit to enable full concurrency #2145

MOZGIII · 2020-03-25T14:05:00Z

After switching to tokio-compat, the thread limit should not be required.

Closes #391.
Closes #1696.

We're not merging this until we do #1696.

MOZGIII · 2020-03-26T14:34:08Z

/test -t tcp_to_tcp_performance -c big-vms

MOZGIII · 2020-03-26T14:57:22Z

/test -t tcp_to_tcp_performance -c big-vms

github-actions · 2020-03-26T15:46:01Z

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

binarylogic · 2020-03-28T02:45:20Z

/test -t tcp_to_tcp_performance -c big-vms

MOZGIII · 2020-03-28T02:48:38Z

@binarylogic it's unfortunately broken again. ☹️ See vectordotdev/vector-test-harness#45
I think we desperately need a CI flow for the test harness itself.

github-actions · 2020-03-28T03:38:13Z

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

MOZGIII · 2020-04-01T21:51:48Z

/test -t tcp_to_tcp_performance -c big_vms

binarylogic · 2020-04-01T22:06:04Z

🥁

github-actions · 2020-04-01T22:40:47Z

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

github-actions · 2020-04-01T23:36:34Z

Test harness invocation requested by #2145 (comment) is complete!


                                   __   __  __
                                   \ \ / / / /
                                    \ V / / /
                                     \_/  \/

                                   V E C T O R

--------------------------------------------------------------------------------
Test Comparison
Test name: tcp_to_tcp_performance
Test configuration: big_vms
Subject: vector
Versions: 
--------------------------------------------------------------------------------
Metric         
---------------
Test count     
Duration (avg) 
Duration (max) 
CPU sys (max)  
CPU usr (max)  
Load 1m (avg)  
Mem used (max) 
Disk read (avg)
Disk read (sum)
Disk writ (sum)
Net recv (avg) 
Net recv (sum) 
Net send (sum) 
TCP estab (avg)
TCP syn (avg)  
TCP close (avg)
--------------------------------------------------------------------------------
W = winner

You can check the execution log to learn more!

binarylogic · 2020-04-02T00:28:07Z

It worked! I’ll fix the output so the results are displayed.

MOZGIII · 2020-04-02T00:29:32Z

Great! This may be related: vectordotdev/vector-test-harness#47

binarylogic · 2020-04-02T13:48:05Z

/test -t tcp_to_tcp_performance -c 16-cores

github-actions · 2020-04-02T14:45:09Z

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

binarylogic · 2020-04-02T15:27:00Z

/test -t tcp_to_tcp_performance -c 16_cores

binarylogic · 2020-04-02T17:29:30Z

Here are the new results:

➜ aws-vault exec vector -- ./bin/compare -s vector -t tcp_to_tcp_performance -c 16_cores -v 0.6.0 -v 0.7.0 -v 0.8.2 -v nightly/2020-03-30 -v dev-thread-limit-lift-1-6305861 -r

                                   __   __  __
                                   \ \ / / / /
                                    \ V / / /
                                     \_/  \/

                                   V E C T O R

--------------------------------------------------------------------------------
Test Comparison
Test name: tcp_to_tcp_performance
Test configuration: 16_cores
Subject: vector
Versions: 0.6.0 0.7.0 0.8.2 nightly/2020-03-30 dev-thread-limit-lift-1-6305861
--------------------------------------------------------------------------------
Metric          | 0.6.0              | 0.7.0              | 0.8.2             | dev-thread-limit-lift-1-630... | nightly/2020-03-30
----------------|--------------------|--------------------|-------------------|--------------------------------|-------------------
Test count      | 1                  | 1                  | 1                 | 1                              | 1                 
Duration (avg)  | 63s                | 63s                | 64s               | 61s                            | 63s               
Duration (max)  | 63s                | 63s                | 64s               | 61s                            | 63s               
CPU sys (max)   | 0.8 W              | 0.9 (+17%)         | 2.5 (+227%)       | 1.1 (+49%)                     | 0.9 (+16%)        
CPU usr (max)   | 24.7 (+13%)        | 24.7 (+13%)        | 24.6 (+13%)       | 23.3 (+7%)                     | 21.7 W            
Load 1m (avg)   | 1.9 (+5%)          | 1.9 (+5%)          | 1.9 (+5%)         | 1.8 W                          | 1.9 (+2%)         
Mem used (max)  | 358.8 MiB (+1%)    | 353.7 MiB W        | 355.8 MiB (+0%)   | 381.5 MiB (+7%)                | 364.3 MiB (+2%)   
Disk read (avg) | 562.5 kib/s (-21%) | 558.5 kib/s (-21%) | 657.3 kib/s (-7%) | 453.4 kib/s (-36%)             | 712.8 kib/s W     
Disk read (sum) | 34.6 MiB (+28%)    | 34.4 MiB (+27%)    | 41.1 MiB (+52%)   | 27 MiB W                       | 43.9 MiB (+62%)   
Disk writ (sum) | 4.5 MiB (+43%)     | 3.1 MiB W          | 17.2 MiB (+454%)  | 27.4 MiB (+782%)               | 20.5 MiB (+562%)  
Net recv (avg)  | 97 MiB/s W         | 94.7 MiB/s (-2%)   | 76.5 MiB/s (-21%) | 61 MiB/s (-37%)                | 59 MiB/s (-39%)   
Net recv (sum)  | 6 gib W            | 5.8 gib (-2%)      | 4.8 gib (-19%)    | 3.6 gib (-39%)                 | 3.6 gib (-39%)    
Net send (sum)  | 6 gib              | 5.8 gib            | 4.8 gib           | 3.6 gib                        | 3.6 gib           
TCP estab (avg) | 429                | 427                | 428               | 424                            | 451               
TCP syn (avg)   | 0                  | 0                  | 0                 | 0                              | 0                 
TCP close (avg) | 0                  | 0                  | 0                 | 0                              | 0                 
--------------------------------------------------------------------------------
W = winner

I have a suspicion that this is not actually raising the thread limit, since nothing really changed. It would be nice to investigate CPU core usage while this test is actually running.

binarylogic · 2020-04-02T18:31:01Z

I updated the above results with more versions. In 0.2.0 (yes, a pretty old version), removing the thread cap resulted in a sharp decline in throughput. I would expect to see the same if tokio-compat didn't fix it. The only way to know is to see if we can reproduce the decline on the 0.8.X branch before we introduced tokio-compat.

binarylogic · 2020-04-02T19:41:22Z

Looks like this change fixed it 😄

➜ aws-vault exec vector -- ./bin/compare -s vector -t tcp_to_tcp_performance -c 16_cores -v nightly/2020-03-30 -v dev-thread-limit-lift-1-6305861 -v dev-v0-8-thread-limit-test-1-7963032 

                                   __   __  __
                                   \ \ / / / /
                                    \ V / / /
                                     \_/  \/

                                   V E C T O R

--------------------------------------------------------------------------------
Test Comparison
Test name: tcp_to_tcp_performance
Test configuration: 16_cores
Subject: vector
Versions: nightly/2020-03-30 dev-thread-limit-lift-1-6305861 dev-v0-8-thread-limit-test-1-7963032
--------------------------------------------------------------------------------
Metric          | dev-thread-limit-lift-1-630... | dev-v0-8-thread-limit-test-... | nightly/2020-03-30
----------------|--------------------------------|--------------------------------|-------------------
Test count      | 1                              | 1                              | 1                 
Duration (avg)  | 61s                            | 61s                            | 63s               
Duration (max)  | 61s                            | 61s                            | 63s               
CPU sys (max)   | 1.1 (+28%)                     | 11.6 (+1221%)                  | 0.9 W             
CPU usr (max)   | 23.3 (+7%)                     | 50.2 (+131%)                   | 21.7 W            
Load 1m (avg)   | 1.8 W                          | 4.7 (+159%)                    | 1.9 (+2%)         
Mem used (max)  | 381.5 MiB (+4%)                | 375 MiB (+2%)                  | 364.3 MiB W       
Disk read (avg) | 453.4 kib/s (-36%)             | 423 kib/s (-40%)               | 712.8 kib/s W     
Disk read (sum) | 27 MiB (+7%)                   | 25.2 MiB W                     | 43.9 MiB (+74%)   
Disk writ (sum) | 27.4 MiB (+33%)                | 29.3 MiB (+42%)                | 20.5 MiB W        
Net recv (avg)  | 61 MiB/s W                     | 8.5 MiB/s (-85%)               | 59 MiB/s (-3%)    
Net recv (sum)  | 3.6 gib W                      | 521.2 MiB (-85%)               | 3.6 gib (0%)      
Net send (sum)  | 3.6 gib                        | 519 MiB                        | 3.6 gib           
TCP estab (avg) | 424                            | 423                            | 451               
TCP syn (avg)   | 0                              | 0                              | 0                 
TCP close (avg) | 0                              | 0                              | 0                 
--------------------------------------------------------------------------------
W = winner

You can see my comparison branch here. Unless I'm missing something, this appears to be resolved. I'd still like to verify that this branch is actually using all CPU cores. Or I may be misunderstanding how Vector utilizes cores with the new runtime.

MOZGIII · 2020-04-02T19:48:48Z

🎉 Finally! Getting results from the test harness is so time-consuming, but we're getting there.

LucioFranco · 2020-04-02T19:49:15Z

@binarylogic I doubt it will use all cores, because we can only partition the work across cores with each connection on a single task/core. The issue we were most likely seeing before was that the work-stealing scheduler caused a lot of contention trying to steal tasks because of the number of idle workers. In other words, if we have more workers than tasks being executed, those idle workers put a lot of contention on the global task queue, which can drastically slow it down. This aligns well with what we chatted with @jonhoo about a while back :) and why he originally wrote tokio-io-pool. So overall this looks like what we expected. I think we have also done more work to spawn more tasks to handle the load better. So I wouldn't be surprised if we didn't use all the CPU but were in fact just saturating our current workload with better work that isn't just creating contention on a single item.

MOZGIII · 2020-04-02T19:50:20Z

Note that 0.8 doesn't have the patch that lifts thread limit, so we effectively run at 4 threads.

MOZGIII · 2020-04-02T19:52:13Z

Note that 0.8 doesn't have the patch that lifts thread limit, so we effectively run at 4 threads.

That said, the comparison nonetheless shows that this PR can be merged and that the high thread count is not a problem anymore.
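
For readers following along, the difference between the capped and uncapped setups can be sketched roughly as below. This is only an illustration under stated assumptions, not Vector's actual code: the helper `worker_threads` and its parameters are hypothetical, the cap of 4 comes from the comment above, and the 16-core figure comes from the `16_cores` test configuration name.

```rust
use std::cmp::{max, min};
use std::thread;

/// Hypothetical helper (not taken from Vector): picks how many runtime
/// worker threads to start.
/// `cap` models a hard thread limit such as `Some(4)`;
/// `None` models the limit being lifted.
fn worker_threads(available_cores: usize, cap: Option<usize>) -> usize {
    // Always run at least one worker, even if the core count is reported as 0.
    let cores = max(1, available_cores);
    match cap {
        // With a cap, never exceed it, and never exceed the core count either.
        Some(limit) => min(cores, max(1, limit)),
        // Without a cap, use one worker per available core.
        None => cores,
    }
}

fn main() {
    // Fall back to 1 if the platform cannot report its parallelism.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    println!("with a cap of 4: {} workers", worker_threads(cores, Some(4)));
    println!("limit lifted:    {} workers", worker_threads(cores, None));
}
```

Under these assumptions, a 16-core machine would run 4 workers with the cap and 16 without it, which is the gap the comparison above is probing.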

binarylogic · 2020-04-02T19:59:50Z

Let's merge it then 🚀

Hoverbear added the domain: networking and type: performance labels Mar 25, 2020

Hoverbear requested a review from LucioFranco March 25, 2020 17:27

Hoverbear assigned MOZGIII Mar 25, 2020

LucioFranco approved these changes Mar 25, 2020

Lift the threads limit

6305861

After switching to tokio-compat, the thread limit should not be required. Signed-off-by: MOZGIII <mike-n@narod.ru>

MOZGIII force-pushed the thread-limit-lift branch from 08ab248 to 6305861 Compare March 26, 2020 13:58

binarylogic mentioned this pull request Mar 28, 2020

Add CI for test harness vectordotdev/vector-test-harness#46

Closed

binarylogic mentioned this pull request Apr 2, 2020

chore(operations): Compare test harness output with nightly/latest #2204

Merged

MOZGIII marked this pull request as ready for review April 2, 2020 19:48

MOZGIII requested a review from lukesteensen as a code owner April 2, 2020 19:48

lukesteensen approved these changes Apr 2, 2020

Hoverbear approved these changes Apr 2, 2020

binarylogic changed the title ~~perf: Lift the thread limit~~ perf: Lift the thread limit to enable full concurrency Apr 2, 2020

binarylogic merged commit cb0da84 into master Apr 2, 2020

binarylogic deleted the thread-limit-lift branch April 2, 2020 21:04

binarylogic pushed a commit that referenced this pull request Apr 5, 2020

perf: Lift the thread limit to enable full concurrency (#2145)

c38aeb2

After switching to tokio-compat, the thread limit should not be required. Signed-off-by: MOZGIII <mike-n@narod.ru>

binarylogic added the type: enhancement and domain: performance labels and removed the type: performance label Aug 6, 2020
