Decide number of clients basing on average request size of client #15369

radek-kondziolka · 2022-12-12T11:18:56Z

Description

Consider the class of queries that run on skewed data, where at some downstream stage one node (or a few nodes) gathers data from an upstream stage in which most nodes are empty (serving no data) or nearly empty (serving little data). We observed that Trino executes such queries very poorly.
This PR addresses that issue.

The query below is representative of that class:

SELECT
    count(*),
    cc.cc_name,
    cc.cc_class,
    cc.cc_manager,
    cc.cc_mkt_desc,
    cc.cc_market_manager,
    cc.cc_division_name
FROM catalog_sales cs
    RIGHT JOIN call_center cc ON cc.cc_call_center_sk =  cs.cs_call_center_sk
    RIGHT JOIN store s ON cc.cc_closed_date_sk = s.s_closed_date_sk
GROUP BY 2, 3, 4, 5, 6, 7
ORDER BY 1, 2, 3, 4, 5, 6, 7
LIMIT 100;

Before the change, this query takes 11.5 minutes to finish on a cluster of 32 r6g.4xlarge nodes. After the change, it takes 4 minutes on the same cluster.
No regression in serial and throughput benchmarks.

Release notes

( *) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

sopel39 · 2022-12-12T21:42:34Z

core/trino-main/src/main/java/io/trino/operator/DirectExchangeClient.java

+                .filter(client -> !queuedClients.contains(client) && !completedClients.contains(client))
+                .mapToLong(HttpPageBufferClient::getAverageRequestSizeInBytes)
+                .sum();
+        long bytesToBeRequested = 0;


projectedBytesToBeRequested

ok, let it be

sopel39 · 2022-12-12T21:45:39Z

core/trino-main/src/main/java/io/trino/operator/DirectExchangeClient.java

@@ -106,7 +108,9 @@ public DirectExchangeClient(
            ScheduledExecutorService scheduledExecutor,
            LocalMemoryContext memoryContext,
            Executor pageBufferClientCallbackExecutor,
-            TaskFailureListener taskFailureListener)
+            TaskFailureListener taskFailureListener,
+            ConcurrentHashMap<URI, HttpPageBufferClient> allClients,


it's better to expose package private getter for testing

I don't think so in this case. Doing it that way would require making the queuedClients collection concurrent or adding another synchronization mechanism. I saw that the codebase already uses @VisibleForTesting constructors.

visible for testing (package private)
getAllClients()
getQueuedClients()

core/trino-main/src/main/java/io/trino/operator/DirectExchangeClient.java

sopel39 · 2022-12-12T21:52:38Z

core/trino-main/src/main/java/io/trino/operator/HttpPageBufferClient.java

+    synchronized void updateAverageRequestSize(int successfulRequests, long responseSize)
+    {
+        if (successfulRequests > 0) {
+            averageRequestSizeInBytes = (averageRequestSizeInBytes * (successfulRequests - 1) + responseSize) / successfulRequests;


in the current code this uses doubles for the computation:

// AVG_n = AVG_(n-1) * (n-1)/n + VALUE_n / n
averageBytesPerRequest = (long) (1.0 * averageBytesPerRequest * (successfulRequests - 1) / successfulRequests + responseSize / successfulRequests);

I think it's more accurate
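The recurrence discussed in this thread can be tried in isolation. Below is a minimal sketch, not the Trino implementation: the class name `RunningAverageSketch` and the method `nextAverage` are hypothetical, and both terms are evaluated in double here before the single cast back to long (the cast truncates toward zero).

```java
// Hypothetical helper for illustration only; not part of Trino.
final class RunningAverageSketch
{
    private RunningAverageSketch() {}

    // AVG_n = AVG_(n-1) * (n-1)/n + VALUE_n / n
    // n is the number of successful requests including the current one.
    static long nextAverage(long previousAverage, int n, long value)
    {
        if (n <= 0) {
            // Nothing successful yet: keep the previous estimate.
            return previousAverage;
        }
        // Multiplying by 1.0 forces floating-point arithmetic for both terms.
        return (long) (1.0 * previousAverage * (n - 1) / n + 1.0 * value / n);
    }
}
```

Feeding response sizes 10, 20, 30 in order (n = 1, 2, 3) yields averages 10, 15, 20.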

sopel39 · 2022-12-12T21:53:02Z

core/trino-main/src/main/java/io/trino/operator/HttpPageBufferClient.java

+                long responseSize = pages.stream().mapToLong(Slice::length).sum();
+                synchronized (HttpPageBufferClient.this) {
+                    requestsCompleted.incrementAndGet();
+                    updateAverageRequestSize(Math.max(0, requestsCompleted.get() - requestsFailed.get()), responseSize);


can requestsCompleted.get() - requestsFailed.get() become negative?

It should not. It is kind of defensive programming.

core/trino-main/src/main/java/io/trino/operator/HttpPageBufferClient.java

sopel39 · 2022-12-12T21:55:39Z

core/trino-main/src/main/java/io/trino/operator/HttpPageBufferClient.java

@@ -511,8 +532,8 @@ public void onSuccess(@Nullable StatusResponse result)
                        future = null;
                    }
                    lastUpdate = DateTime.now();
+                    requestsCompleted.incrementAndGet();


I need that. I am doing arithmetic on requestsFailed and requestsCompleted. Without synchronization my average would be invalid.
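The point about synchronization can be illustrated with a small sketch in which both counters and the average are only ever touched under the same monitor, so the difference completed - failed is always read consistently. This is an assumption-laden illustration: the class `RequestStatsSketch` and its methods are hypothetical, it uses plain long fields rather than the atomic counters in the diff, and it assumes a failed request is counted as completed as well.

```java
// Hypothetical illustration; field and method names are not from Trino.
final class RequestStatsSketch
{
    private long requestsCompleted;
    private long requestsFailed;
    private long averageRequestSizeInBytes;

    // Counter update and average update happen atomically with respect to other callers.
    synchronized void recordSuccess(long responseSize)
    {
        requestsCompleted++;
        // Defensive clamp, mirroring the Math.max(0, ...) in the reviewed diff.
        long successful = Math.max(0, requestsCompleted - requestsFailed);
        if (successful > 0) {
            averageRequestSizeInBytes =
                    (averageRequestSizeInBytes * (successful - 1) + responseSize) / successful;
        }
    }

    // Assumption: a failure increments both counters and leaves the average untouched.
    synchronized void recordFailure()
    {
        requestsCompleted++;
        requestsFailed++;
    }

    synchronized long getAverageRequestSizeInBytes()
    {
        return averageRequestSizeInBytes;
    }
}
```

If the counters were incremented outside the lock, a concurrent failure could change completed - failed between the increment and the average update, and the divisor would no longer match the number of samples folded into the average.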

core/trino-main/src/main/java/io/trino/operator/HttpPageBufferClient.java

core/trino-main/src/main/java/io/trino/operator/DirectExchangeClient.java

sopel39

lgtm % comments

sopel39 · 2022-12-13T11:01:15Z

core/trino-main/src/main/java/io/trino/operator/DirectExchangeClient.java

@@ -106,7 +108,9 @@ public DirectExchangeClient(
            ScheduledExecutorService scheduledExecutor,
            LocalMemoryContext memoryContext,
            Executor pageBufferClientCallbackExecutor,
-            TaskFailureListener taskFailureListener)
+            TaskFailureListener taskFailureListener,
+            ConcurrentHashMap<URI, HttpPageBufferClient> allClients,


visible for testing (package private)
getAllClients()
getQueuedClients()

core/trino-main/src/main/java/io/trino/operator/DirectExchangeClient.java

core/trino-main/src/main/java/io/trino/operator/HttpPageBufferClient.java

core/trino-main/src/test/java/io/trino/operator/TestDirectExchangeClient.java

radek-kondziolka · 2022-12-13T12:23:36Z

It gives some gain in our TPCDS / TPCH benchmarks:

32 nodes r6g.4xlarge, orc, unpart, sf1000:
TPCDS walltime: -4.69%
TPCH walltime: -6.87%

radek-kondziolka · 2022-12-13T13:51:42Z

@sopel39 all comments addressed

sopel39

small comment

core/trino-main/src/test/java/io/trino/operator/TestDirectExchangeClient.java

Change how DirectExchangeClient.scheduleRequestIfNecessary calculates the number of clients to request during the exchange phase: use the average request size of each specific client instead of the aggregated average over all clients.

Tests for the new approach to calculating the number of clients to request were added. In addition, a test for calculating the average request size was added.
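The idea in this commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in DirectExchangeClient: the class `ClientCountSketch`, the method `clientsToRequest`, and its parameters are hypothetical, and the real scheduling logic tracks more state (for example, clients with requests already in flight).

```java
// Hypothetical sketch of per-client projection; not the Trino implementation.
final class ClientCountSketch
{
    private ClientCountSketch() {}

    // queuedClientAverages: average request size in bytes of each queued client, in queue order.
    // bytesAlreadyProjected: bytes currently buffered or expected from in-flight requests.
    // maxBufferBytes: buffer capacity the exchange should not overshoot.
    static int clientsToRequest(long[] queuedClientAverages, long bytesAlreadyProjected, long maxBufferBytes)
    {
        long projectedBytes = bytesAlreadyProjected;
        int clientCount = 0;
        for (long average : queuedClientAverages) {
            if (projectedBytes >= maxBufferBytes) {
                break;
            }
            // An empty or nearly empty client adds little, so more clients fit into one round.
            projectedBytes += average;
            clientCount++;
        }
        return clientCount;
    }
}
```

With a 150-byte budget, three empty clients followed by one that averages 100 bytes are all requested at once (`clientsToRequest(new long[] {0, 0, 0, 100}, 0, 150)` returns 4), whereas three clients that each average 100 bytes yield only 2 requests.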

radek-kondziolka · 2022-12-15T07:45:49Z

small comment

commit structure repaired

cla-bot bot added the cla-signed label Dec 12, 2022

radek-kondziolka force-pushed the rk/avarage_per_client branch 2 times, most recently from 5d5fe25 to b520ecf Compare December 12, 2022 12:36

radek-kondziolka requested review from sopel39, lukasz-stec and raunaqmorarka December 12, 2022 12:36

radek-kondziolka marked this pull request as ready for review December 12, 2022 12:37

radek-kondziolka force-pushed the rk/avarage_per_client branch from b520ecf to c8e680e Compare December 12, 2022 12:40

radek-kondziolka changed the title ~~Rk/avarage per client~~ Decide number of clients basing on average request size of client Dec 12, 2022

radek-kondziolka requested a review from Dith3r December 12, 2022 14:38

radek-kondziolka force-pushed the rk/avarage_per_client branch from c8e680e to 77388b6 Compare December 12, 2022 14:40

sopel39 reviewed Dec 12, 2022


radek-kondziolka force-pushed the rk/avarage_per_client branch from 77388b6 to 94f00d6 Compare December 13, 2022 08:56

radek-kondziolka requested a review from sopel39 December 13, 2022 09:00

sopel39 approved these changes Dec 13, 2022


radek-kondziolka force-pushed the rk/avarage_per_client branch from 94f00d6 to 189a510 Compare December 13, 2022 13:49

Dith3r approved these changes Dec 14, 2022


sopel39 approved these changes Dec 14, 2022


core/trino-main/src/test/java/io/trino/operator/TestDirectExchangeClient.java

radek-kondziolka added 2 commits December 15, 2022 08:43

Add tests to DirectExchangeClient and HttpPageBufferClient

51a7d65

Tests for the new approach to calculating the number of clients to request were added. In addition, a test for calculating the average request size was added.

radek-kondziolka force-pushed the rk/avarage_per_client branch from 189a510 to 51a7d65 Compare December 15, 2022 07:44

sopel39 merged commit ac661b9 into trinodb:master Dec 15, 2022

sopel39 mentioned this pull request Dec 15, 2022

Release notes for 405 #15058

Closed

github-actions bot added this to the 404 milestone Dec 15, 2022

colebow mentioned this pull request Dec 21, 2022

Add Trino 405 release notes #15139

Merged
