From 721f82c51911fb36dceef2bf1bdeb70f9200092c Mon Sep 17 00:00:00 2001
From: Gera Shegalov
Date: Thu, 25 Aug 2022 14:34:24 -0700
Subject: [PATCH] Add libcudf and spark-rapids to Implementations

Add links to libcudf and spark-rapids implementations of t-digest

1. libcudf provides a generic CUDA kernel
2. spark-rapids uses 1 to accelerate Spark SQL expressions
   approx_percentile and percentile_approx
   https://spark.apache.org/docs/latest/api/sql/#approx_percentile
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1d5906e..c2ca4f7 100644
--- a/README.md
+++ b/README.md
@@ -239,6 +239,7 @@ The t-digest algorithm has been ported to other languages:
  - C++: [CPP TDigest](https://github.com/gpichot/cpp-tdigest), [FB's Folly Implementation (high performance)](https://github.com/facebook/folly/blob/master/folly/stats/TDigest.h)
  - C++: [TDigest](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/tdigest.h ) as part of [Apache Arrow](https://arrow.apache.org/)
+ - CUDA C++: [tdigest.cu](https://github.com/rapidsai/cudf/blob/branch-22.10/cpp/src/quantiles/tdigest/tdigest.cu) as part of `libcudf` in [RAPIDS](https://rapids.ai/) powering the [`approx_percentile` and `percentile_approx`](https://github.com/NVIDIA/spark-rapids/blob/b35311f7c6950fd5d8f7f6ed66aeffa87c480850/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuApproximatePercentile.scala#L123-L130) expressions in Spark SQL with [RAPIDS Accelerator for Apache Spark](https://nvidia.github.io/spark-rapids/)
  - Rust: [t-digest](https://github.com/MnO2/t-digest) and its modified version in [Apache Arrow Datafusion](https://github.com/apache/arrow-datafusion/blob/ca952bd33402816dbb1550debb9b8cac3b13e8f2/datafusion-physical-expr/src/tdigest/mod.rs#L19-L28)
  - Scala: [TDigest.scala](https://github.com/stripe-archive/brushfire/blob/master/brushfire-training/src/main/scala/com/stripe/brushfire/TDigest.scala)
  - C: [tdigestc (w/ bindings to Go, Java, Python, JS via
wasm)](https://github.com/ajwerner/tdigestc)