
Add configs to support the TTL of Alluxio SDK cache #268

Open
wants to merge 7,003 commits into master

Conversation


@beinan commented Jun 16, 2023

Add support for the TTL of Alluxio SDK cache

== RELEASE NOTES ==

Hive Changes
* Add support for the TTL of Alluxio SDK cache
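
For context, a hypothetical sketch of what a TTL config for the Alluxio SDK cache could look like in a Presto connector config class, in the airlift `@Config` style Presto uses; the property name and default are illustrative, not taken from this PR:

```java
import io.airlift.configuration.Config;
import io.airlift.units.Duration;

import static java.util.concurrent.TimeUnit.HOURS;

public class AlluxioCacheConfig
{
    // Illustrative default; the PR defines the actual property and default.
    private Duration cacheTtl = new Duration(24, HOURS);

    public Duration getCacheTtl()
    {
        return cacheTtl;
    }

    @Config("cache.alluxio.ttl") // hypothetical property name
    public AlluxioCacheConfig setCacheTtl(Duration cacheTtl)
    {
        this.cacheTtl = cacheTtl;
        return this;
    }
}
```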

kewang1024 and others added 30 commits April 27, 2023 16:45
The PartitionedOutput plan node may change the order of input columns. In these
cases, when translating PartitionedOutput to PartitionAndSerialize, we need to
add a ProjectNode that reorders the columns.
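
A simplified, hypothetical sketch of the idea: an identity projection whose assignment order matches the desired layout restores the column order. The names below are illustrative, not the actual Presto planner API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReorderingProjection
{
    // Build an identity projection (output -> input) whose iteration order
    // is the desired column order; a ProjectNode built from it reorders columns.
    static Map<String, String> forOrder(List<String> desiredOrder)
    {
        Map<String, String> assignments = new LinkedHashMap<>();
        for (String column : desiredOrder) {
            assignments.put(column, column);
        }
        return assignments;
    }
}
```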
SqlVarbinary needs to be converted to Java's byte[] as the test type in
order to match the results in the test suite.
Also, do not poll intermediate tasks for results. These tasks write
results to shuffle and therefore cannot return anything.
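
A minimal sketch of that conversion, assuming the SqlVarbinary type from presto-common:

```java
import com.facebook.presto.common.type.SqlVarbinary;

final class VarbinaryTestValues
{
    // Unwrap the SqlVarbinary test value to the raw byte[] the suite compares against.
    static byte[] toExpectedBytes(SqlVarbinary value)
    {
        return value.getBytes();
    }
}
```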
Reviewed By: bigfootjon

Differential Revision: D44454761

fbshipit-source-id: 054af6bed6f50fa4cffeee00db0c9374ec466f08
Currently the DetachedNativeExecutionProcess class is defined in the test folder,
as it was only being used for test execution. But now that we want to
support Presto-on-Spark in local mode, we want to be able to talk
to an already running CPP process during e2e query execution. Move it
out into src to enable this.
Simplify the query plan when the input is empty.
This rule is replaced by the SimplifyPlanWithEmptyInput rule.
Summary:
Pull Request resolved: prestodb#19504

The scope of this PR is to address a couple of related enhancements for Decimal Types, listed below:
1) Introduced `TypeKind::HUGEINT` (`HugeintType`) with `int128_t` as its CPP type. The support is limited to the needs of the LongDecimalType.
2) Removed the `TypeKind::SHORT_DECIMAL` and `TypeKind::LONG_DECIMAL` enum values and replaced them with the Velox types ShortDecimalType (based on BigintType) and LongDecimalType (based on HugeintType). This change replaces the decimal CPP types `UnscaledShortDecimal` with `int64_t` and `UnscaledLongDecimal` with `int128_t`.
3) Removed `SHORT_DECIMAL`, `LONG_DECIMAL` APIs and replaced them with `DECIMAL`.

The above changes mean that decimal data is stored in memory using 64-bit/128-bit integers, e.g. `FlatVector<int64_t>`, `FlatVector<int128_t>`. Each individual value is an unscaled decimal value. A variant similarly holds an unscaled value using 64-bit/128-bit integers. The Decimal Type must be present to interpret the decimal semantics.

Earlier, the decimal limit check was enabled in UnscaledXXXDecimal only in debug mode.
That won't catch overflows in production (release builds). The new approach is to explicitly call `DecimalUtil::valueInRange(int128_t)` wherever a decimal value is computed.
See usage in SumAggregate.h and DecimalArithmetic.cpp.
This is what the Presto Java implementation does as well.
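
A sketch of such a range check on the Java side, where unscaled decimals must fit in 38 decimal digits; the class and constant names are illustrative:

```java
import java.math.BigInteger;

final class DecimalRange
{
    // Presto decimals have a maximum precision of 38 digits, so the unscaled
    // value must stay within +/- (10^38 - 1).
    private static final BigInteger MAX_UNSCALED =
            BigInteger.TEN.pow(38).subtract(BigInteger.ONE);

    static boolean valueInRange(BigInteger unscaled)
    {
        return unscaled.abs().compareTo(MAX_UNSCALED) <= 0;
    }
}
```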

Vector functions must use `DECIMAL` in the signature.

Simple functions require a unique type for ShortDecimal and LongDecimal to bind the appropriate
implementation. Therefore, simple functions are not supported for Decimal Types.

See facebookincubator/velox#4069

X-link: facebookincubator/velox#4434

Reviewed By: mbasmanova

Differential Revision: D45443908

Pulled By: Yuhta

fbshipit-source-id: 4bb1d1d870a666aa0c8811840131fe236e328043
Summary:
X-link: facebookincubator/velox#4797

3 more fixes to UnsafeRow serialization to make it compatible with
Spark Java:
- The first integer, which describes the row size, needs to be 32 bits, not 64 (see the sketch after this list).
- This integer needs to be serialized in big-endian order. Curiously, the
  remaining integers within the UnsafeRow itself are little-endian.
- The input buffer allocated needs to be initialized to zero, since not all
  portions of it are written by the UnsafeRow serialization code.
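
From the Java reader's point of view, the resulting wire layout looks like this; a minimal decoding sketch:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnsafeRowFrames
{
    // Each frame is a 32-bit big-endian row size followed by the UnsafeRow
    // bytes; integers inside the UnsafeRow itself are little-endian.
    static byte[] readRow(ByteBuffer buffer)
    {
        buffer.order(ByteOrder.BIG_ENDIAN);
        int rowSize = buffer.getInt();
        byte[] row = new byte[rowSize];
        buffer.get(row);
        return row;
    }
}
```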

Reviewed By: mbasmanova

Differential Revision: D45446862

fbshipit-source-id: 0961b9a27f367803bb1da149729128c0a6dbc15f
The PartitionAndSerialize operator used to include the row size in the serialized row:
row size | <UnsafeRow>.

This caused the row size to be serialized twice, since ShuffleWrite::collect added
the row size again.

This change removes the row size from the serialized row produced by the
PartitionAndSerialize operator.
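
An illustrative sketch of the intended framing, with the size prepended exactly once by the shuffle layer; names are illustrative, not the Velox code:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class ShuffleFraming
{
    // Prepend the 4-byte size exactly once; the serialized row itself no
    // longer carries it.
    static byte[] frame(byte[] serializedRow) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(serializedRow.length);
        data.write(serializedRow);
        data.flush();
        return out.toByteArray();
    }
}
```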
After adding the empty-table optimization, TPC-DS query plans that have empty
table inputs will change. Update the expected plans here to reflect that change.
Previously the JSON-based function definition file used a static path tied
to the current repo structure, which raised a file-not-found error when the
function registration method was imported/reused in other modules.
This PR changes the file path to a relative path resolved against the current
class loader's resource path.
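
A minimal sketch of a class-loader-relative lookup; the resource name below is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;

final class FunctionDefinitions
{
    // Resolve the definition file against the class loader's resource path,
    // which works no matter which module invokes the registration code.
    static InputStream open() throws IOException
    {
        String resource = "function-definitions.json"; // hypothetical name
        InputStream in = Thread.currentThread()
                .getContextClassLoader()
                .getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found: " + resource);
        }
        return in;
    }
}
```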
shrinidhijoshi and others added 27 commits June 12, 2023 21:43
This helps provide a hook for PrestoSparkNativeTaskExecutoryFactory
to shut down the native process.
Split E2E tests and run them in parallel. Also create a separate
run for Spark tests.

There are 5 jobs that are run in parallel with this change.
Please look at each job for test counts and failures.

Co-authored-by: Michael Shang <mikesh@fb.com>
Add broadcast read support for file-based broadcast by adding a new type of exchange source: BroadcastExchangeSource. BroadcastExchangeSource reads data from the files specified in the split location. The format of split location that this exchange source can handle is:

batch://<taskid>?broadcastInfo={fileInfos:[<fileInfo>]}
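
A hypothetical sketch of pulling the pieces out of such a location; the PR's actual parsing presumably lives in BroadcastExchangeSource:

```java
final class BroadcastLocations
{
    // Split "batch://<taskid>?broadcastInfo=..." into the task id and the
    // broadcastInfo payload. Purely illustrative string handling.
    static String[] parse(String location)
    {
        String rest = location.substring("batch://".length());
        int query = rest.indexOf('?');
        String taskId = rest.substring(0, query);
        String broadcastInfo = rest.substring(query + "?broadcastInfo=".length());
        return new String[] {taskId, broadcastInfo};
    }
}
```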
Add new CI image
Add -DFOLLY_HAVE_INT128_T=ON to centos setup script
Add known warnings to Linux
Advance Velox
Add support for aggregations over sorted inputs

Co-authored-by: Masha Basmanova <mbasmanova@fb.com>
Co-authored-by: Deepak Majeti <deepak.majeti@ibm.com>
Remove identity projection below a project node.
Add the hive.allow-drop-table permission to the test Java runner to
avoid access-denied errors when dropping test tables in the
TestPrestoSparkNativeGeneralQueries#testDecimalRangeFilters test suite.
Re-enabled the TestPrestoSparkNativeGeneralQueries#testDecimalRangeFilters
suite as well.
Add ccache to speed up build.

Co-authored-by: Michael Shang <mikesh@fb.com>
Summary:
In the HTTPClient, callbacks are scheduled on an eventBase. HTTPClient is kept alive using a shared_ptr, but it contains a raw pointer to a MemoryPool. This MemoryPool may be freed if the Task is aborted early while a callback executes much later.
We see crashes related to this when the batch cluster is under heavy load.

So the fix here is to keep a shared_ptr to the MemoryPool instead of a raw pointer.

```
== NO RELEASE NOTE ==
```
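
The fix itself is C++ (an owning shared_ptr instead of a raw pointer), but the lifetime rule translates; an illustrative Java analogue, with hypothetical names:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class HttpResponseHandler
{
    interface MemoryPool
    {
        void allocate(int bytes);
    }

    // Owning reference: the pool cannot go away while a callback that uses it
    // is still pending. In the C++ fix this owning reference is a
    // std::shared_ptr<MemoryPool> held by HTTPClient instead of a raw pointer.
    private final MemoryPool pool;

    HttpResponseHandler(MemoryPool pool)
    {
        this.pool = pool;
    }

    CompletableFuture<Void> onBody(byte[] body, Executor executor)
    {
        return CompletableFuture.runAsync(() -> pool.allocate(body.length), executor);
    }
}
```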

Pull Request resolved: prestodb#19865

Reviewed By: xiaoxmeng

Differential Revision: D46674355

Pulled By: pranjalssh

fbshipit-source-id: 9b53deb6357ff87b8e1a992f3205d0ce9d79c05c
Join output has a restriction that output from the left input must come before
output from the right input. Fix the randomize-null-join-key optimizer here to
keep this order.
As titled, left-side input must come before right-side input in the join output.
Map $internal$json_string_to_array/map/row_cast to cast(json_parse(x) as array/map/row).

Also, remove invalid mappings: row_constructor -> in, isnull -> in.
Task creation involves translating the query plan into a set of operator pipelines.
ShuffleWrite operator creation used to include creating the ShuffleWriter, which
may take a long time (60s or longer) and cause the create-or-update-task RPC to
time out. Move ShuffleWriter creation into ShuffleWrite::addInput to avoid
timing out on the create-or-update-task request.
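
A hypothetical sketch of the lazy-creation pattern described above; names are illustrative, not the Velox operator API:

```java
import java.util.function.Supplier;

final class ShuffleWrite
{
    interface ShuffleWriter
    {
        void collect(byte[] row);
    }

    private final Supplier<ShuffleWriter> writerFactory;
    private ShuffleWriter writer; // created lazily, on first input

    ShuffleWrite(Supplier<ShuffleWriter> writerFactory)
    {
        this.writerFactory = writerFactory;
    }

    void addInput(byte[] row)
    {
        if (writer == null) {
            // May take 60s or longer; no longer blocks the create-or-update-task RPC.
            writer = writerFactory.get();
        }
        writer.collect(row);
    }
}
```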
This commit updates Drift version to 1.36 and Netty version
to 4.1.92.Final to add support for TLS 1.3.
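
For context, a minimal sketch of enabling TLS 1.3 with Netty 4.1.x; the protocol list is illustrative and not taken from this commit:

```java
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;

import javax.net.ssl.SSLException;

final class TlsContexts
{
    // Netty's SslContextBuilder accepts an explicit protocol list,
    // enabling TLS 1.3 alongside TLS 1.2.
    static SslContext forClient() throws SSLException
    {
        return SslContextBuilder.forClient()
                .protocols("TLSv1.3", "TLSv1.2")
                .build();
    }
}
```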
@beinan changed the base branch from twitter-master to master on June 16, 2023 at 23:37