
Feat/autotune int ops #1136

Merged (50 commits) on Feb 26, 2024
Conversation

@agelas (Contributor) commented Jan 12, 2024

Pull Request Template

Checklist

WIP

  • Confirmed that run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

#944

Changes

Added int_random() to int tensor ops, plus new autotuning calls for sum_dim and mean_dim.

Testing

Unit tests for int_random(), sum_dim(), and mean_dim() on WgpuTensors with IntElements.

@agelas (Contributor, Author) commented Jan 12, 2024

@louisfd Finally got the chance to start on #944. I'm going down the long route and trying to add support at large for random() on int tensors. When you say "add a call to autotune for sum_dim and mean_dim (int versions)", I'm assuming that means creating an impl<E: IntElement, const D: usize> SumDimAutotuneOperationSet<E, D> and associated AutotuneOperationSet impl (and similar story for mean_dim)?

codecov bot commented Jan 12, 2024

Codecov Report

Attention: Patch coverage is 61.05991%, with 169 lines in your changes missing coverage. Please review.

Project coverage is 85.53%. Comparing base (cbf7550) to head (63455c4).

Files Patch % Lines
crates/burn-fusion/src/ops/int.rs 0.00% 36 Missing ⚠️
crates/burn-candle/src/ops/int_tensor.rs 0.00% 32 Missing ⚠️
crates/burn-wgpu/src/kernel/reduce/tune/sum_dim.rs 59.72% 29 Missing ⚠️
crates/burn-tch/src/ops/int_tensor.rs 0.00% 22 Missing ⚠️
crates/burn-ndarray/src/ops/int_tensor.rs 0.00% 19 Missing ⚠️
crates/burn-autodiff/src/ops/int_tensor.rs 0.00% 7 Missing ⚠️
...-wgpu/src/kernel/reduce/reduction_shared_memory.rs 50.00% 7 Missing ⚠️
...rates/burn-wgpu/src/kernel/reduce/tune/mean_dim.rs 91.66% 6 Missing ⚠️
crates/burn-fusion/src/stream/context.rs 0.00% 5 Missing ⚠️
crates/burn-fusion/src/stream/operation.rs 0.00% 3 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1136      +/-   ##
==========================================
- Coverage   85.69%   85.53%   -0.17%     
==========================================
  Files         586      586              
  Lines       65339    65769     +430     
==========================================
+ Hits        55993    56254     +261     
- Misses       9346     9515     +169     


@louisfd (Member) commented Jan 12, 2024

> @louisfd Finally got the chance to start on #944. I'm going down the long route and trying to add support at large for random() on int tensors. When you say "add a call to autotune for sum_dim and mean_dim (int versions)", I'm assuming that means creating an impl<E: IntElement, const D: usize> SumDimAutotuneOperationSet<E, D> and associated AutotuneOperationSet impl (and similar story for mean_dim)?

Hi @agelas, thanks for tackling this issue!
The only problematic place was these lines in burn-wgpu/src/kernel/reduce/tune/sum_dim.rs:

fn autotunables(&self) -> Vec<Box<dyn AutotuneOperation>> {
    let random_bounds: (E, E) = ((-10.0).elem::<E>(), (10.0).elem::<E>());
    let input = random_like_uniform(&self.input, random_bounds.0, random_bounds.1);

because random_like_uniform only worked with floats, even if E was an int. I'm not sure whether you can simply rewrite those lines to call your new int ops instead of the lower-level random_like_uniform, or whether you'll have trouble determining if you need the float or the int version. If that's the case, then yes, you can duplicate the AutotuneOperationSet, but it should keep the same key.
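
For illustration, the first option might look roughly like this; random_like_uniform_int is a hypothetical helper name here, standing in for whatever your new int_random-based op ends up being called:

fn autotunables(&self) -> Vec<Box<dyn AutotuneOperation>> {
    // Sketch only: bounds expressed in the integer element type, and the
    // (made-up) random_like_uniform_int helper instead of random_like_uniform.
    let random_bounds: (E, E) = ((-10).elem::<E>(), (10).elem::<E>());
    let input = random_like_uniform_int(&self.input, random_bounds.0, random_bounds.1);
    // ... rest of the method unchanged
}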

In the end you will only have to change sum_dim / mean_dim implementations in burn-wgpu/src/ops/int_ops.rs to something very similar to the float ones:

    fn sum_dim<const D: usize>(tensor: FloatTensor<Self, D>, dim: usize) -> FloatTensor<Self, D> {
        #[cfg(feature = "autotune")]
        {
            reduce::sum_dim_autotune(tensor, dim)
        }

        #[cfg(not(feature = "autotune"))]
        {
            let output = init_reduce_output(&tensor, dim);
            reduce::sum_dim(tensor, output, dim)
        }
    }
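
The int version might then look roughly like this (a sketch, assuming the trait method is named int_sum_dim and that the reduce helpers are generic over the element type):

    fn int_sum_dim<const D: usize>(tensor: IntTensor<Self, D>, dim: usize) -> IntTensor<Self, D> {
        // Same structure as the float version above, just with the int tensor type.
        #[cfg(feature = "autotune")]
        {
            reduce::sum_dim_autotune(tensor, dim)
        }

        #[cfg(not(feature = "autotune"))]
        {
            let output = init_reduce_output(&tensor, dim);
            reduce::sum_dim(tensor, output, dim)
        }
    }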

@louisfd (Member) left a comment

There are some small problems remaining, but overall it looks good!

Two resolved (outdated) review threads on burn-wgpu/src/kernel/reduce/tune/mean_dim.rs.

Review thread on the following excerpt:
let random_in_range = (random_u32 % range) + low;

output[write_index] = random_in_range;
}
Review comment (Member):

CI fails during cast_float. Can we skip this function as we don't want a float?

Reply (agelas, Contributor Author):

@louisfd is this shader being invoked for the test that's failing? The problem on CI was a fusion test, fusion::base::tests::aggregation::tests::test_should_mean_last_dim_int, which used to work at commit ae60d25 but then stopped working at dca5b26 for some reason. It works locally for me though.

Now it seems the CI is complaining about something else though.

Reply (agelas, Contributor Author):

@louisfd Ok, I'm back to failing on test_should_mean_last_dim_int() in the aggregation tests for burn-tensor. For both that and test_should_sum_last_dim_int(), shouldn't we be calling int_mean_dim() and int_sum_dim()? I'm still not sure why cast_float() is causing issues, especially since I never use it; and if these tests use the base tensor functions, neither should even be calling anything from this PR.

#[test]
fn test_should_mean_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);

    let data_actual = tensor.mean_dim(1).to_data();

    assert_eq!(data_actual, Data::from([[1], [4]]));
}

#[test]
fn test_should_sum_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);

    let data_actual = tensor.sum_dim(1).to_data();

    assert_eq!(data_actual, Data::from([[3], [12]]));
}

antimora added the enhancement (Enhance existing features) label on Jan 31, 2024
@nathanielsimard (Member) commented:

The casting problem is quite obvious in cast_float. This function is used in the prng shader:

fn cast_float(number: u32) -> {{ elem }} {
   return 2.3283064365387e-10 * {{ elem }}(number);
}

I think it should be:

fn cast_elem(number: u32) -> {{ elem }} {
   let tmp = 2.3283064365387e-10 * f32(number);
   return {{ elem }}(tmp);
}

It was assumed that {{ elem }} was a float, but with this PR it can also be an int (i32), so the f32 constant can no longer be multiplied directly by {{ elem }}(number). @agelas @louisfd

agelas changed the title from Feat/autotune int bool ops to Feat/autotune int ops on Feb 2, 2024
@agelas (Contributor, Author) commented Feb 3, 2024

@nathanielsimard In burn-wgpu/src/kernel/prng/bernoulli.rs, it looks like there's already a cast_elem function, so maybe convert_elem or something similar?

@louisfd Also, for the test that's failing, is there a way to switch to int_mean_dim(1) instead of mean_dim(1)? I think that and the next test in the aggregation.rs file (test_should_sum_last_dim_int()) now have int versions that should probably be used.

#[test]
fn test_should_mean_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);

    let data_actual = tensor.mean_dim(1).to_data(); // <- use int_mean_dim(1) instead?

    assert_eq!(data_actual, Data::from([[1], [4]]));
}

@louisfd (Member) commented Feb 7, 2024

> @nathanielsimard In burn-wgpu/src/kernel/prng/bernoulli.rs, it looks like there's already a cast_elem function, so maybe convert_elem or something similar?
>
> @louisfd Also, for the test that's failing, is there a way to switch to int_mean_dim(1) instead of mean_dim(1)? I think that and the next test in the aggregation.rs file (test_should_sum_last_dim_int()) now have int versions that should probably be used.
>
> #[test]
> fn test_should_mean_last_dim_int() {
>     let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);
>
>     let data_actual = tensor.mean_dim(1).to_data(); // <- use int_mean_dim(1) instead?
>
>     assert_eq!(data_actual, Data::from([[1], [4]]));
> }

Hi @agelas, sorry for the late reply.

Regarding the cast functions, since it's getting a bit confusing, I think we should be more explicit with the names: cast_X_to_Y, with X and Y replaced by bool, float, or elem as appropriate (e.g. cast_u32_to_float).

For your second question, maybe you're confusing tensor methods and backend operations? Because it's an int tensor, the mean_dim method will call int_mean_dim underneath:

impl<B: Backend> Numeric<B> for Int {
    // ...

    fn mean_dim<const D: usize>(tensor: Self::Primitive<D>, dim: usize) -> Self::Primitive<D> {
        B::int_mean_dim(tensor, dim)
    }

    // ...
}

For the CI error, it seems to be the modulo. Are you positive that both random_u32 and range are the same type?

  48 │     let random_in_range = (random_u32 % range) + low;
     │                            ^^^^^^^^^^^^^^^^^^ naga::Expression [93]

antimora added the stale (The issue or PR has been open for too long) label on Feb 24, 2024
@agelas (Contributor, Author) commented Feb 25, 2024

Hi @antimora @louisfd , sorry about the delay, life got in the way.

@louisfd With regards to the naming issue, I changed cast_float to cast_u32_to_float as you suggested. For the CI error, it seemed to be with the KernelSettings, but that's resolved now.

The CI is failing on codecov/patch now at 63455c4 (the latest commit). I'm not sure what to do here, since a lot of the code gets used via a somewhat complicated mixture of macros and regular Rust.

@antimora (Collaborator) commented:

@agelas, thank you for the update. Do not worry, life is a priority.

Regarding the coverage: it's not a hard requirement to hit 80%. We can override it manually.

antimora removed the stale (The issue or PR has been open for too long) label on Feb 25, 2024
@louisfd (Member) left a comment

LGTM
Many thanks 😄

louisfd merged commit bb5e6fa into tracel-ai:main on Feb 26, 2024 (13 of 14 checks passed).
Labels: enhancement (Enhance existing features)

4 participants