
Add cumulative sum tensor operation #1722

Open · wants to merge 11 commits into base: main

Conversation

@allenqm commented May 3, 2024

Pull Request Template

Starting a draft PR to align on a few things with maintainers as I dive into this.

Context: Per this convo, I wanted to add a cumulative product operation to burn.

My plan is to start with a cumulative sum operation. Then cumulative product can be developed using cumulative sum, log, and exp.
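That plan rests on the identity cumprod(x) = exp(cumsum(ln(x))). A minimal sketch in plain Rust over slices (not burn's tensor API; the helper names `cumsum` and `cumprod_via_cumsum` are hypothetical, and the identity only holds for strictly positive inputs, since ln of zero or a negative value is not finite):

```rust
// Running prefix sum over a 1-D slice: out[i] = x[0] + ... + x[i].
fn cumsum(xs: &[f64]) -> Vec<f64> {
    let mut acc = 0.0;
    xs.iter()
        .map(|&x| {
            acc += x;
            acc
        })
        .collect()
}

// Cumulative product expressed through cumsum, ln, and exp.
// NOTE: illustrative only, and valid only for strictly positive inputs.
fn cumprod_via_cumsum(xs: &[f64]) -> Vec<f64> {
    let logs: Vec<f64> = xs.iter().map(|&x| x.ln()).collect();
    cumsum(&logs).into_iter().map(f64::exp).collect()
}
```

For example, `cumprod_via_cumsum(&[1.0, 2.0, 3.0, 4.0])` yields values numerically close to `[1.0, 2.0, 6.0, 24.0]`, up to floating-point round-off from the log/exp round trip.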

@nathanielsimard, Items to align on upfront:

  • The name of the function should be cumsum_dim: cumsum aligns with the PyTorch API, and in burn, operations that take an explicit dim argument seem to have a _dim suffix. Alternatively, we could remove the suffix.
  • The function should be implemented for the Float and Int tensor kinds, but not Bool.
  • Backends: While the implementations for tch, candle, and ndarray seem straightforward, I have questions about jit. For jit, I cannot find an existing WGSL implementation for cumulative sum. Is the right approach in this situation to create a WGSL compute shader for it? Cumulative sum is probably hard to do well on GPUs given the dependencies between elements, but I'm open to trying. I'm new to WGSL.

Checklist

  • Confirmed that run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

Provide links to relevant issues and dependent PRs.

Changes

Summarize the problem being addressed and your solution.

Testing

Describe how these changes have been tested.

@louisfd (Member) commented May 6, 2024

Hi @allenqm
Although you tagged Nathaniel, I think I can answer in his stead:

  • It's great that you made a default implementation for cumprod using log and exp 👍

  • For the naming, I think we can leave the _dim out because there cannot be a version for all dims at once. That's what we did for instance with sort, which acts on a dimension but for which no "global sort" exists.

  • Indeed, it must be implemented for int and float only, I saw you put it in numeric, which is the way to go 👍

  • For JIT, it is going to be a pain at the moment. There is no handwritten WGSL code anymore; the WGSL is always auto-generated from the intermediate JIT representation (kernels using the gpu! macro). It's honestly a pain to work with, and it's not designed for new contributors to learn. I'm working on a language to rewrite them in an accessible way, see CubeCL: Compute Language Extension in Rust for Multi-target GPU kernels #1665, but it's not ready.

  • For the GPU algorithm, you're right that the dependency between elements will make it difficult to have an efficient kernel. The straightforward way would be to spawn one thread for the whole dim to sum, and this thread fills all the output spots while accumulating the inputs in a local sum variable. But for a large dim to sum it can be slow. Not sure if there are better solutions, I haven't done any research.

I'm willing to write that kernel in the JIT intermediate representation if you want, so the operation becomes available soon; then we can optimize it later and with the upcoming language.
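The straightforward approach described above can be sketched on the CPU in plain Rust, with one loop iteration standing in for each GPU thread (a GPU kernel would assign one thread per row); the function name and flat row-major layout here are illustrative, not burn's actual kernel:

```rust
// Naive per-row cumulative sum over a row-major (rows x cols) buffer.
// Each row is scanned sequentially by a single "worker", mirroring the
// one-thread-per-row GPU strategy: accumulate a running sum and write
// every prefix into the output.
fn cumsum_rows(input: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(input.len(), rows * cols);
    let mut out = vec![0.0f32; rows * cols];
    for r in 0..rows {
        let mut acc = 0.0f32;
        for c in 0..cols {
            acc += input[r * cols + c];
            out[r * cols + c] = acc;
        }
    }
    out
}
```

This makes the performance concern concrete: the inner loop is inherently sequential in `cols`, so a long summed dimension serializes each thread's work; parallel scan algorithms (e.g. Blelloch-style prefix sums) would be the later optimization.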

@allenqm (Author) commented May 6, 2024

@louisfd Thanks so much for the guidance.

I will remove the _dim suffix.

Thanks for offering to step in and write the kernel in the JIT intermediate representation. I'll take you up on that.

I'm going to try and get the tch, candle, ndarray, and autodiff implementations done by EoD tomorrow.

Just to be clear: I haven't written anything specific for cumprod yet. I was proposing that if we implement cumsum, then cumprod will be more straightforward as it could be described without new backend implementations (with the exception of autodiff), using the existing implementations of cumsum, exp, and log. Let me know if my assessment here seems off.

tensor: NdArrayTensor<E, D>,
dim: usize,
) -> NdArrayTensor<E, D> {
let mut array = tensor.array.clone().into_owned();
allenqm (Author):

I believe the underlying array struct of tensor needs to be cloned, since NdArray's method for accumulating elements along an axis modifies an array's data in place. Referring to this method

Reply (Member):

Well float_cumsum takes ownership of the tensor, so I don't think the clone is required here.
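The ownership point can be illustrated with a plain-Rust stand-in (a Vec in place of NdArrayTensor; the function name is hypothetical): a function that takes its input by value owns the buffer, so it can accumulate in place with no clone.

```rust
// Consumes its input (takes `Vec` by value), so the running sum can
// mutate the buffer directly -- no clone needed, mirroring the argument
// that float_cumsum owns the tensor it receives.
fn cumsum_inplace(mut data: Vec<f64>) -> Vec<f64> {
    for i in 1..data.len() {
        data[i] += data[i - 1];
    }
    data
}
```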

@allenqm (Author) commented May 7, 2024

tch, candle, ndarray, autodiff + tests, and tensor tests have been added. Going to work on the onnx section of the contributor book next.

no action needed, just fyi @louisfd

github-actions bot (Contributor) commented Jun 8, 2024

This PR has been marked as stale because it has not been updated for over a month

@github-actions github-actions bot added the stale The issue or pr has been open for too long label Jun 8, 2024
@allenqm (Author) commented Jun 25, 2024

Sorry for not flipping this to "Ready for Review" @louisfd . I think I've got the required onnx files in place. Can you take a look?

@allenqm allenqm marked this pull request as ready for review June 25, 2024 21:45
@github-actions github-actions bot removed the stale The issue or pr has been open for too long label Jun 26, 2024
@laggui (Member) left a comment

Just some minor comments, but overall great job 👏

Comment on lines +17 to +18
FloatCumsum,
IntCumsum,
laggui (Member):

CumSum is a single operator, the same for int and float.

So we should also only have one node type, BinaryType::Cumsum. See for example BinaryType::Sub, which does split the int and float tests for the generated onnx files but it's still a single operation/node.

@@ -770,6 +770,18 @@ pub trait IntTensorOps<B: Backend> {
/// The sum of all elements in the tensor along the dimension.
fn int_sum_dim<const D: usize>(tensor: IntTensor<B, D>, dim: usize) -> IntTensor<B, D>;

/// Cumulative Sum of all elements in a tensor along a dimension.
laggui (Member):

Let's keep the capitalization at "Cumulative sum"

@@ -842,6 +842,19 @@ pub trait FloatTensorOps<B: Backend> {
/// A tensor with the sum of all elements in `tensor` along `dim`.
fn float_sum_dim<const D: usize>(tensor: FloatTensor<B, D>, dim: usize) -> FloatTensor<B, D>;

/// Cumulative Sum of all elements in a tensor along a dimension.
laggui (Member):

Same thing regarding capitalization

Comment on lines +806 to +821
/// Checks running dimension such as cumulative sum
pub(crate) fn running_dim<const D: usize>(ops: &str, dim: usize) -> Self {
    let mut check = Self::Ok;

    // Valid axes for a D-dimensional tensor are 0..D, so dim == D is
    // also out of bounds (the original `dim > D` check missed it).
    if dim >= D {
        check = check.register(
            ops,
            TensorError::new(format!(
                "Can't perform a running calculation on a tensor with ({D}) dimensions on axis ({dim})"
            )),
        );
    }

    check
}

laggui (Member):

You could use the existing TensorCheck::dim_ops instead

Comment on lines +787 to +795
    match &lhs {
        Type::Tensor(x) => match x.kind {
            TensorKind::Int => BinaryNode::int_cumsum(lhs, rhs, output),
            TensorKind::Float => BinaryNode::float_cumsum(lhs, rhs, output),
            _ => panic!("cumsum function requires LHS to be int or float type"),
        },
        _ => panic!("cumsum function only supports LHS tensor type"),
    }
}
laggui (Member):

By making cumsum a single node it should simplify this block

tensor: NdArrayTensor<E, D>,
dim: usize,
) -> NdArrayTensor<E, D> {
let mut array = tensor.array.clone().into_owned();
Reply (Member):

Well float_cumsum takes ownership of the tensor, so I don't think the clone is required here.

tensor: NdArrayTensor<i64, D>,
dim: usize,
) -> NdArrayTensor<i64, D> {
let mut array = tensor.array.clone().into_owned();
Reply (Member):

See comment for float_cumsum

@antimora antimora requested review from laggui and removed request for ashdtu June 27, 2024 22:45