
Feat/autotune int ops #1136

Merged (50 commits) on Feb 26, 2024
Conversation

@agelas (Contributor) commented Jan 12, 2024

Pull Request Template

Checklist

WIP

  • Confirmed that run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

#944

Changes

Added int_random() to int tensor ops, plus new autotuning calls for sum_dim and mean_dim.

Testing

Unit tests for int_random(), sum_dim(), and mean_dim() on WgpuTensors with IntElements.

@agelas (Contributor, Author) commented Jan 12, 2024

@louisfd Finally got the chance to start on #944. I'm going down the long route and trying to add support at large for random() on int tensors. When you say "add a call to autotune for sum_dim and mean_dim (int versions)", I'm assuming that means creating an impl<E: IntElement, const D: usize> SumDimAutotuneOperationSet<E, D> and associated AutotuneOperationSet impl (and similar story for mean_dim)?

codecov bot commented Jan 12, 2024

Codecov Report

Attention: Patch coverage is 61.05991%, with 169 lines in your changes missing coverage. Please review.

Project coverage is 85.53%. Comparing base (cbf7550) to head (63455c4).

Files Patch % Lines
crates/burn-fusion/src/ops/int.rs 0.00% 36 Missing ⚠️
crates/burn-candle/src/ops/int_tensor.rs 0.00% 32 Missing ⚠️
crates/burn-wgpu/src/kernel/reduce/tune/sum_dim.rs 59.72% 29 Missing ⚠️
crates/burn-tch/src/ops/int_tensor.rs 0.00% 22 Missing ⚠️
crates/burn-ndarray/src/ops/int_tensor.rs 0.00% 19 Missing ⚠️
crates/burn-autodiff/src/ops/int_tensor.rs 0.00% 7 Missing ⚠️
...-wgpu/src/kernel/reduce/reduction_shared_memory.rs 50.00% 7 Missing ⚠️
...rates/burn-wgpu/src/kernel/reduce/tune/mean_dim.rs 91.66% 6 Missing ⚠️
crates/burn-fusion/src/stream/context.rs 0.00% 5 Missing ⚠️
crates/burn-fusion/src/stream/operation.rs 0.00% 3 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1136      +/-   ##
==========================================
- Coverage   85.69%   85.53%   -0.17%     
==========================================
  Files         586      586              
  Lines       65339    65769     +430     
==========================================
+ Hits        55993    56254     +261     
- Misses       9346     9515     +169     


@louisfd (Member) commented Jan 12, 2024

> @louisfd Finally got the chance to start on #944. I'm going down the long route and trying to add support at large for random() on int tensors. When you say "add a call to autotune for sum_dim and mean_dim (int versions)", I'm assuming that means creating an impl<E: IntElement, const D: usize> SumDimAutotuneOperationSet<E, D> and associated AutotuneOperationSet impl (and similar story for mean_dim)?

Hi @agelas, thanks for tackling this issue!
The only problematic place was these lines in burn-wgpu/src/kernel/reduce/tune/sum_dim.rs:

fn autotunables(&self) -> Vec<Box<dyn AutotuneOperation>> {
    let random_bounds: (E, E) = ((-10.0).elem::<E>(), (10.0).elem::<E>());
    let input = random_like_uniform(&self.input, random_bounds.0, random_bounds.1);

because random_like_uniform only worked with floats, even if E was an int. I'm not sure whether you can simply rewrite those lines to call your new int ops instead of the lower-level random_like_uniform, or whether you'll have trouble determining if you need the float or the int version. If that's the case, then yes, you can duplicate the AutotuneOperationSet, but it should keep the same key.
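
For illustration, the first option might look roughly like this; random_like_uniform_int is a hypothetical helper name here, standing in for whatever your new int_random-based op ends up being called:

fn autotunables(&self) -> Vec<Box<dyn AutotuneOperation>> {
    // Sketch only: bounds expressed in the integer element type, and the
    // (made-up) random_like_uniform_int helper instead of random_like_uniform.
    let random_bounds: (E, E) = ((-10).elem::<E>(), (10).elem::<E>());
    let input = random_like_uniform_int(&self.input, random_bounds.0, random_bounds.1);
    // ... rest of the method unchanged
}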

In the end you will only have to change sum_dim / mean_dim implementations in burn-wgpu/src/ops/int_ops.rs to something very similar to the float ones:

    fn sum_dim<const D: usize>(tensor: FloatTensor<Self, D>, dim: usize) -> FloatTensor<Self, D> {
        #[cfg(feature = "autotune")]
        {
            reduce::sum_dim_autotune(tensor, dim)
        }

        #[cfg(not(feature = "autotune"))]
        {
            let output = init_reduce_output(&tensor, dim);
            reduce::sum_dim(tensor, output, dim)
        }
    }
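
The int version might then look roughly like this (a sketch, assuming the trait method is named int_sum_dim and that the reduce helpers are generic over the element type):

    fn int_sum_dim<const D: usize>(tensor: IntTensor<Self, D>, dim: usize) -> IntTensor<Self, D> {
        // Same structure as the float version above, just with the int tensor type.
        #[cfg(feature = "autotune")]
        {
            reduce::sum_dim_autotune(tensor, dim)
        }

        #[cfg(not(feature = "autotune"))]
        {
            let output = init_reduce_output(&tensor, dim);
            reduce::sum_dim(tensor, output, dim)
        }
    }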

@louisfd (Member) left a comment

There are some small problems remaining, but overall it looks good!

Two resolved (outdated) review threads on burn-wgpu/src/kernel/reduce/tune/mean_dim.rs.

Review thread on the following excerpt:
let random_in_range = (random_u32 % range) + low;

output[write_index] = random_in_range;
}
Review comment (Member):

CI fails during cast_float. Can we skip this function as we don't want a float?

Reply (agelas, Contributor Author):

@louisfd is this shader being invoked for the test that's failing? The problem on CI was a fusion test, fusion::base::tests::aggregation::tests::test_should_mean_last_dim_int, which used to work at commit ae60d25 but then stopped working at dca5b26 for some reason. It works locally for me though.

Now it seems the CI is complaining about something else though.

Reply (agelas, Contributor Author):

@louisfd Ok, I'm back to failing on test_should_mean_last_dim_int() in the aggregation tests for burn-tensor. For both that and test_should_sum_last_dim_int(), shouldn't we be calling int_mean_dim() and int_sum_dim()? I'm still not sure why cast_float() is causing issues, especially since I never use it; and if these tests use the base tensor functions, neither should even be calling anything from this PR.

#[test]
fn test_should_mean_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);

    let data_actual = tensor.mean_dim(1).to_data();

    assert_eq!(data_actual, Data::from([[1], [4]]));
}

#[test]
fn test_should_sum_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);

    let data_actual = tensor.sum_dim(1).to_data();

    assert_eq!(data_actual, Data::from([[3], [12]]));
}

antimora added the enhancement (Enhance existing features) label on Jan 31, 2024
@nathanielsimard (Member) commented:

The casting problem is quite obvious in cast_float. This function is used in the prng shader:

fn cast_float(number: u32) -> {{ elem }} {
   return 2.3283064365387e-10 * {{ elem }}(number);
}

I think it should be:

fn cast_elem(number: u32) -> {{ elem }} {
   let tmp = 2.3283064365387e-10 * f32(number);
   return {{ elem }}(tmp);
}

It was assumed that {{ elem }} was a float, but with this PR it can also be an int (i32), so the f32 constant can no longer be multiplied directly by {{ elem }}(number). @agelas @louisfd

agelas changed the title from Feat/autotune int bool ops to Feat/autotune int ops on Feb 2, 2024
@agelas (Contributor, Author) commented Feb 3, 2024

@nathanielsimard In burn-wgpu/src/kernel/prng/bernoulli.rs, it looks like there's already a cast_elem function, so maybe convert_elem or something similar?

@louisfd Also, for the test that's failing, is there a way to switch to int_mean_dim(1) instead of mean_dim(1)? I think that and the next test in the aggregation.rs file (test_should_sum_last_dim_int()) now have int versions that should probably be used.

#[test]
fn test_should_mean_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);

    let data_actual = tensor.mean_dim(1).to_data(); // <- use int_mean_dim(1) instead?

    assert_eq!(data_actual, Data::from([[1], [4]]));
}

@louisfd (Member) commented Feb 7, 2024

> @nathanielsimard In burn-wgpu/src/kernel/prng/bernoulli.rs, it looks like there's already a cast_elem function, so maybe convert_elem or something similar?
>
> @louisfd Also, for the test that's failing, is there a way to switch to int_mean_dim(1) instead of mean_dim(1)? I think that and the next test in the aggregation.rs file (test_should_sum_last_dim_int()) now have int versions that should probably be used.
>
> #[test]
> fn test_should_mean_last_dim_int() {
>     let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);
>
>     let data_actual = tensor.mean_dim(1).to_data(); // <- use int_mean_dim(1) instead?
>
>     assert_eq!(data_actual, Data::from([[1], [4]]));
> }

Hi @agelas, sorry for the late reply.

Regarding the cast functions, since it's getting a bit confusing, I think we should be more explicit with the names: cast_X_to_Y, with X and Y replaced by bool, float, or elem as appropriate (e.g. cast_u32_to_float).

For your second question, maybe you're confusing tensor methods and backend operations? Because it's an int tensor, the mean_dim method will call int_mean_dim underneath:

impl<B: Backend> Numeric<B> for Int {
    // ...

    fn mean_dim<const D: usize>(tensor: Self::Primitive<D>, dim: usize) -> Self::Primitive<D> {
        B::int_mean_dim(tensor, dim)
    }

    // ...
}

For the CI error, it seems to be the modulo. Are you positive that both random_u32 and range are the same type?

  48 │     let random_in_range = (random_u32 % range) + low;
     │                            ^^^^^^^^^^^^^^^^^^ naga::Expression [93]

antimora added the stale (The issue or PR has been open for too long) label on Feb 24, 2024
@agelas (Contributor, Author) commented Feb 25, 2024

Hi @antimora @louisfd , sorry about the delay, life got in the way.

@louisfd With regards to the naming issue, I changed cast_float to cast_u32_to_float as you suggested. For the CI error, it seemed to be with the KernelSettings, but that's resolved now.

The CI is failing on codecov/patch now at 63455c4 (the latest commit). I'm not sure what to do here, since a lot of the code gets used via a somewhat complicated mixture of macros and regular Rust.

@antimora (Collaborator) commented:

@agelas, thank you for the update. Do not worry, life is a priority.

Regarding the coverage: it's not a hard requirement to hit 80%. We can override it manually.

antimora removed the stale (The issue or PR has been open for too long) label on Feb 25, 2024
@louisfd (Member) left a comment

LGTM
Many thanks 😄

louisfd merged commit bb5e6fa into tracel-ai:main on Feb 26, 2024 (13 of 14 checks passed).
Labels: enhancement (Enhance existing features)

4 participants