
Add configs to support the TTL of Alluxio SDK cache #268

Open
wants to merge 7,003 commits into master

Conversation


@beinan commented Jun 16, 2023

Add support for the TTL of Alluxio SDK cache

== RELEASE NOTES ==

Hive Changes
* Add support for the TTL of Alluxio SDK cache
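
For context, a hypothetical sketch of what a TTL config for the Alluxio SDK cache could look like in a Presto connector config class, in the airlift `@Config` style Presto uses; the property name and default are illustrative, not taken from this PR:

```java
import io.airlift.configuration.Config;
import io.airlift.units.Duration;

import static java.util.concurrent.TimeUnit.HOURS;

public class AlluxioCacheConfig
{
    // Illustrative default; the PR defines the actual property and default.
    private Duration cacheTtl = new Duration(24, HOURS);

    public Duration getCacheTtl()
    {
        return cacheTtl;
    }

    @Config("cache.alluxio.ttl") // hypothetical property name
    public AlluxioCacheConfig setCacheTtl(Duration cacheTtl)
    {
        this.cacheTtl = cacheTtl;
        return this;
    }
}
```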

kewang1024 and others added 30 commits April 27, 2023 16:45
The PartitionedOutput plan node may change the order of input columns. In these
cases, when translating PartitionedOutput to PartitionAndSerialize, we need to
add a ProjectNode that reorders the columns.
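
A simplified, hypothetical sketch of the idea: an identity projection whose assignment order matches the desired layout restores the column order. The names below are illustrative, not the actual Presto planner API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReorderingProjection
{
    // Build an identity projection (output -> input) whose iteration order
    // is the desired column order; a ProjectNode built from it reorders columns.
    static Map<String, String> forOrder(List<String> desiredOrder)
    {
        Map<String, String> assignments = new LinkedHashMap<>();
        for (String column : desiredOrder) {
            assignments.put(column, column);
        }
        return assignments;
    }
}
```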
SqlVarbinary needs to be converted to Java's byte[] as the test type in
order to match the results in the test suite.
Also, do not poll intermediate tasks for results. These tasks write
results to shuffle and therefore cannot return anything.
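
A minimal sketch of that conversion, assuming the SqlVarbinary type from presto-common:

```java
import com.facebook.presto.common.type.SqlVarbinary;

final class VarbinaryTestValues
{
    // Unwrap the SqlVarbinary test value to the raw byte[] the suite compares against.
    static byte[] toExpectedBytes(SqlVarbinary value)
    {
        return value.getBytes();
    }
}
```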
Reviewed By: bigfootjon

Differential Revision: D44454761

fbshipit-source-id: 054af6bed6f50fa4cffeee00db0c9374ec466f08
Currently the DetachedNativeExecutionProcess class is defined in the test folder,
as it was only being used for test execution. But now that we want to
support Presto-on-Spark in local mode, we want to be able to talk
to an already running CPP process during e2e query execution. Move it
out into src to enable this.
Simplify the query plan when the input is empty.
This rule is replaced by the SimplifyPlanWithEmptyInput rule.
Summary:
Pull Request resolved: prestodb#19504

The scope of this PR is to address a couple of related enhancements for Decimal Types, listed below:
1) Introduced `TypeKind::HUGEINT` (`HugeintType`) with `int128_t` as its CPP type. The support is limited to the needs of the LongDecimalType.
2) Removed the `TypeKind::SHORT_DECIMAL` and `TypeKind::LONG_DECIMAL` enum values and replaced them with the Velox types ShortDecimalType (based on BigintType) and LongDecimalType (based on HugeintType). This change replaces the decimal CPP types `UnscaledShortDecimal` with `int64_t` and `UnscaledLongDecimal` with `int128_t`.
3) Removed `SHORT_DECIMAL`, `LONG_DECIMAL` APIs and replaced them with `DECIMAL`.

The above changes mean that decimal data is stored in memory using 64-bit/128-bit integers, e.g. `FlatVector<int64_t>`, `FlatVector<int128_t>`. Each individual value is an unscaled decimal value. A variant similarly holds an unscaled value using 64-bit/128-bit integers. The Decimal Type must be present to interpret the decimal semantics.

Earlier, the decimal limit check was enabled in UnscaledXXXDecimal only in debug mode.
That won't catch overflows in production (release builds). The new approach is to explicitly call `DecimalUtil::valueInRange(int128_t)` wherever a decimal value is computed.
See usage in SumAggregate.h and DecimalArithmetic.cpp.
This is what the Presto Java implementation does as well.
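
A sketch of such a range check on the Java side, where unscaled decimals must fit in 38 decimal digits; the class and constant names are illustrative:

```java
import java.math.BigInteger;

final class DecimalRange
{
    // Presto decimals have a maximum precision of 38 digits, so the unscaled
    // value must stay within +/- (10^38 - 1).
    private static final BigInteger MAX_UNSCALED =
            BigInteger.TEN.pow(38).subtract(BigInteger.ONE);

    static boolean valueInRange(BigInteger unscaled)
    {
        return unscaled.abs().compareTo(MAX_UNSCALED) <= 0;
    }
}
```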

Vector functions must use `DECIMAL` in the signature.

Simple functions require a unique type for ShortDecimal and LongDecimal to bind the appropriate
implementation. Therefore, simple functions are not supported for Decimal Types.

See facebookincubator/velox#4069

X-link: facebookincubator/velox#4434

Reviewed By: mbasmanova

Differential Revision: D45443908

Pulled By: Yuhta

fbshipit-source-id: 4bb1d1d870a666aa0c8811840131fe236e328043
Summary:
X-link: facebookincubator/velox#4797

3 more fixes to UnsafeRow serialization to make it compatible with
Spark Java:
- The first integer, which describes the row size, needs to be 32 bits, not 64 (see the sketch after this list).
- This integer needs to be serialized in big-endian order. Curiously, the
  remaining integers within the UnsafeRow itself are little-endian.
- The input buffer allocated needs to be initialized to zero, since not all
  portions of it are written by the UnsafeRow serialization code.
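
From the Java reader's point of view, the resulting wire layout looks like this; a minimal decoding sketch:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnsafeRowFrames
{
    // Each frame is a 32-bit big-endian row size followed by the UnsafeRow
    // bytes; integers inside the UnsafeRow itself are little-endian.
    static byte[] readRow(ByteBuffer buffer)
    {
        buffer.order(ByteOrder.BIG_ENDIAN);
        int rowSize = buffer.getInt();
        byte[] row = new byte[rowSize];
        buffer.get(row);
        return row;
    }
}
```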

Reviewed By: mbasmanova

Differential Revision: D45446862

fbshipit-source-id: 0961b9a27f367803bb1da149729128c0a6dbc15f
The PartitionAndSerialize operator used to include the row size in the serialized row:
row size | <UnsafeRow>.

This caused the row size to be serialized twice, since ShuffleWrite::collect added
the row size again.

This change removes the row size from the serialized row produced by the
PartitionAndSerialize operator.
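
An illustrative sketch of the intended framing, with the size prepended exactly once by the shuffle layer; names are illustrative, not the Velox code:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class ShuffleFraming
{
    // Prepend the 4-byte size exactly once; the serialized row itself no
    // longer carries it.
    static byte[] frame(byte[] serializedRow) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(serializedRow.length);
        data.write(serializedRow);
        data.flush();
        return out.toByteArray();
    }
}
```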
After adding the empty-table optimization, TPC-DS query plans that have empty
table inputs will change. Update the expected plans here to reflect that change.
Previously the JSON-based function definition file used a static path tied
to the current repo structure, which raised a file-not-found error when the
function registration method was imported/reused in other modules.
This PR changes the file path to a relative path resolved against the current
class loader's resource path.
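
A minimal sketch of a class-loader-relative lookup; the resource name below is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;

final class FunctionDefinitions
{
    // Resolve the definition file against the class loader's resource path,
    // which works no matter which module invokes the registration code.
    static InputStream open() throws IOException
    {
        String resource = "function-definitions.json"; // hypothetical name
        InputStream in = Thread.currentThread()
                .getContextClassLoader()
                .getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found: " + resource);
        }
        return in;
    }
}
```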
shrinidhijoshi and others added 27 commits June 12, 2023 21:43
This helps provide a hook for PrestoSparkNativeTaskExecutoryFactory
to shut down the native process.
Split E2E tests and run them in parallel. Also create a separate
run for Spark tests.

There are 5 jobs that are run in parallel with this change.
Please look at each job for test counts and failures.

Co-authored-by: Michael Shang <mikesh@fb.com>
Add broadcast read support for file-based broadcast by adding a new type of exchange source: BroadcastExchangeSource. BroadcastExchangeSource reads data from the files specified in the split location. The format of split location that this exchange source can handle is:

batch://<taskid>?broadcastInfo={fileInfos:[<fileInfo>]}
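
A hypothetical sketch of pulling the pieces out of such a location; the PR's actual parsing presumably lives in BroadcastExchangeSource:

```java
final class BroadcastLocations
{
    // Split "batch://<taskid>?broadcastInfo=..." into the task id and the
    // broadcastInfo payload. Purely illustrative string handling.
    static String[] parse(String location)
    {
        String rest = location.substring("batch://".length());
        int query = rest.indexOf('?');
        String taskId = rest.substring(0, query);
        String broadcastInfo = rest.substring(query + "?broadcastInfo=".length());
        return new String[] {taskId, broadcastInfo};
    }
}
```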
Add new CI image
Add -DFOLLY_HAVE_INT128_T=ON to centos setup script
Add known warnings to Linux
Advance Velox
Add support for aggregations over sorted inputs

Co-authored-by: Masha Basmanova <mbasmanova@fb.com>
Co-authored-by: Deepak Majeti <deepak.majeti@ibm.com>
Remove identity projection below a project node.
Add the hive.allow-drop-table permission to the test Java runner to
avoid access-denied errors when dropping test tables in the
TestPrestoSparkNativeGeneralQueries#testDecimalRangeFilters test suite.
Re-enabled the TestPrestoSparkNativeGeneralQueries#testDecimalRangeFilters
suite as well.
Add ccache to speed up build.

Co-authored-by: Michael Shang <mikesh@fb.com>
Summary:
In the HTTPClient, callbacks are scheduled on an eventBase. HTTPClient is kept alive using a shared_ptr, but it contains a raw pointer to a MemoryPool. This MemoryPool may be freed if the Task is aborted early while a callback executes much later.
We see crashes related to this when the batch cluster is under heavy load.

So the fix here is to keep a shared_ptr to the MemoryPool instead of a raw pointer.

```
== NO RELEASE NOTE ==
```
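
The fix itself is C++ (an owning shared_ptr instead of a raw pointer), but the lifetime rule translates; an illustrative Java analogue, with hypothetical names:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class HttpResponseHandler
{
    interface MemoryPool
    {
        void allocate(int bytes);
    }

    // Owning reference: the pool cannot go away while a callback that uses it
    // is still pending. In the C++ fix this owning reference is a
    // std::shared_ptr<MemoryPool> held by HTTPClient instead of a raw pointer.
    private final MemoryPool pool;

    HttpResponseHandler(MemoryPool pool)
    {
        this.pool = pool;
    }

    CompletableFuture<Void> onBody(byte[] body, Executor executor)
    {
        return CompletableFuture.runAsync(() -> pool.allocate(body.length), executor);
    }
}
```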

Pull Request resolved: prestodb#19865

Reviewed By: xiaoxmeng

Differential Revision: D46674355

Pulled By: pranjalssh

fbshipit-source-id: 9b53deb6357ff87b8e1a992f3205d0ce9d79c05c
Join output has a restriction that output from the left input must come before
output from the right input. Fix the randomize-null-join-key optimizer here to
keep this order.
As titled, left-side input must come before right-side input in the join output.
Map $internal$json_string_to_array/map/row_cast to cast(json_parse(x) as array/map/row).

Also, remove invalid mappings: row_constructor -> in, isnull -> in.
Task creation involves translating the query plan into a set of operator pipelines.
ShuffleWrite operator creation used to include creating the ShuffleWriter, which
may take a long time (60s or longer) and cause the create-or-update-task RPC to
time out. Move ShuffleWriter creation into ShuffleWrite::addInput to avoid
timing out on the create-or-update-task request.
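
A hypothetical sketch of the lazy-creation pattern described above; names are illustrative, not the Velox operator API:

```java
import java.util.function.Supplier;

final class ShuffleWrite
{
    interface ShuffleWriter
    {
        void collect(byte[] row);
    }

    private final Supplier<ShuffleWriter> writerFactory;
    private ShuffleWriter writer; // created lazily, on first input

    ShuffleWrite(Supplier<ShuffleWriter> writerFactory)
    {
        this.writerFactory = writerFactory;
    }

    void addInput(byte[] row)
    {
        if (writer == null) {
            // May take 60s or longer; no longer blocks the create-or-update-task RPC.
            writer = writerFactory.get();
        }
        writer.collect(row);
    }
}
```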
This commit updates Drift version to 1.36 and Netty version
to 4.1.92.Final to add support for TLS 1.3.
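
For context, a minimal sketch of enabling TLS 1.3 with Netty 4.1.x; the protocol list is illustrative and not taken from this commit:

```java
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;

import javax.net.ssl.SSLException;

final class TlsContexts
{
    // Netty's SslContextBuilder accepts an explicit protocol list,
    // enabling TLS 1.3 alongside TLS 1.2.
    static SslContext forClient() throws SSLException
    {
        return SslContextBuilder.forClient()
                .protocols("TLSv1.3", "TLSv1.2")
                .build();
    }
}
```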
@beinan changed the base branch from twitter-master to master on June 16, 2023 at 23:37