
Pretty Print Tensors #257

Merged
merged 26 commits into from
Apr 7, 2023

Conversation

@agelas (Contributor) commented Mar 30, 2023

@nathanielsimard I'm new to Rust, so I figured I'd give #138 a shot. Let me know if this is somewhere in the ballpark of what you were thinking. I only implemented a pretty print for 2D Tensor structs composed of Ints (or at least that's what I think I did 🙃), and I can generalize to other types later.

@agelas agelas marked this pull request as draft March 30, 2023 07:59
Review comments on burn-tensor/src/tensor/api/base.rs (outdated, resolved)
Comment on lines 175 to 183
fn to_nested_vec(&self) -> Vec<Vec<B::IntElem>> {
    let data = self.to_data();
    let mut result = vec![vec![B::IntElem::default(); self.dims()[1]]; self.dims()[0]];
    for (i, val) in data.value.iter().enumerate() {
        let row = i / self.dims()[1];
        let col = i % self.dims()[1];
        result[row][col] = *val;
    }
    result
}
Collaborator

This could be made a separate standalone function that takes the multi-dimensional tensor and returns a formatted string. That way you don't need to worry about nested vectors.

The challenging part is converting the flat data into a nested array, but I am sure it's doable. Here is an example of how NumPy formats its output:

>>> np.ones([15, 15,15, 15])
array([[[[1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         ...,
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.]],

        [[1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         ...,
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.]],

        [[1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         ...,
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.]],

        ...,

        [[1., 1., 1., ..., 1., 1., 1.],
         [1., 1., 1., ..., 1., 1., 1.],

Contributor Author

My current strategy is to try and recursively format the tensor as a 2D vector for each dimension, indenting based on the dimensionality level. It's still a work in progress because I need to figure out how to properly slice up tensors.
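For illustration, the flat-to-nested recursion being discussed might be sketched roughly like this. This is a hedged standalone sketch, not the PR's actual code; `format_recursive` is an invented name, and real tensors would need element-type genericity and `...` truncation for large dimensions:

```rust
// Hypothetical sketch: recursively format a flat buffer into a nested,
// NumPy-like string, given the tensor's shape.
fn format_recursive(data: &[i64], shape: &[usize], depth: usize) -> String {
    if shape.len() <= 1 {
        // Base case: a single row of elements.
        let row: Vec<String> = data.iter().map(|v| v.to_string()).collect();
        return format!("[{}]", row.join(", "));
    }
    // Split the flat slice into `shape[0]` equal chunks and recurse on each.
    let chunk = data.len() / shape[0];
    let inner: Vec<String> = data
        .chunks(chunk)
        .map(|c| format_recursive(c, &shape[1..], depth + 1))
        .collect();
    // Indent nested rows by their depth, mirroring NumPy's layout.
    let sep = format!(",\n{}", " ".repeat(depth + 1));
    format!("[{}]", inner.join(&sep))
}

fn main() {
    let data: Vec<i64> = (0..8).collect();
    println!("{}", format_recursive(&data, &[2, 2, 2], 0));
}
```

The depth-based indentation gives the staircase nesting shown in the NumPy example above; eliding long rows with `...` would be an additional step on top of this.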

Member

I think a recursive strategy is the only way to generalize it to any dimension, or at least the only strategy that I can think of 😅.

Collaborator

@agelas @nathanielsimard

I came across a project, serde-ndim, that looks interesting. Maybe there is something you can learn from it: https://github.com/RReverser/serde-ndim/blob/main/src/ser.rs . What this project accomplishes is basically converting a one-dimensional data array into a multi-dimensional array, which is pretty much what you're trying to accomplish.

Contributor Author

@antimora Looks interesting, I'll give it a look. But your description seems almost the opposite of what this PR is for, i.e. we're going from a multi-dimensional array -> "1-dimensional" string representation and not the other way around.

Collaborator

> @antimora Looks interesting, I'll give it a look. But your description seems almost the opposite of what this PR is for, i.e. we're going from a multi-dimensional array -> "1-dimensional" string representation and not the other way around.

The tensor data is stored as a one-dimensional vector, and the shape contains the dimensions, so you know how to decompose it into rows and columns. The link I shared is basically trying to take the serialized (1D) data and deserialize it into an n-dimensional array by recursion. I came across this code by coincidence and thought there might be something to learn from it. I haven't dived deep enough to see if it directly helps you, but in principle, by its description, it should. I hope it does not confuse you =)

@agelas agelas marked this pull request as ready for review April 3, 2023 05:59
@nathanielsimard (Member) left a comment

This is starting to look good! I think we should make the implementation generic over the kind, and with the other minor requested changes we could merge the PR pretty soon.

Comment on lines 43 to 45
pub fn size(&self) -> usize {
self.dims().iter().fold(1, |acc, &dim| acc * dim)
}
Member

There is a similar method, num_elements, in the Shape struct; I would use it here instead. I do like the fold implementation, though, so we may update the num_elements implementation to use fold instead of loops. size is a bit unclear as a function name, since size often returns the shape in PyTorch, so we may rename it to num_elements.
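The fold-based count under discussion, sketched as a free function for illustration (the real num_elements lives on burn's Shape struct, so this is a stand-in):

```rust
// Sketch: fold-based element count, equivalent to the PR's `size` method.
fn num_elements(dims: &[usize]) -> usize {
    dims.iter().fold(1, |acc, &dim| acc * dim)
    // Equivalently: dims.iter().product()
}

fn main() {
    println!("{}", num_elements(&[2, 3, 4])); // 2 * 3 * 4 = 24
    println!("{}", num_elements(&[]));        // empty shape: scalar, 1 element
}
```

Starting the fold at 1 also handles the empty-shape (scalar) case correctly, which a hand-rolled loop can get wrong.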

Contributor Author

Actually, I didn't mean to leave this in; I needed something like this for an earlier implementation I was trying out. We can remove it, rename it, or do whatever honestly 🤣

Review comments on burn-tensor/src/tensor/api/base.rs and burn-tensor/src/tests/stats/basic.rs (outdated, resolved)
@@ -212,6 +307,19 @@ pub trait BasicOps<B: Backend>: TensorKind<B> {
rhs: Self::Primitive<D>,
) -> Tensor<B, D, Bool>;
fn equal_elem<const D: usize>(lhs: Self::Primitive<D>, rhs: Self::Elem) -> Tensor<B, D, Bool>;
fn elem_type() -> &'static str {
if TypeId::of::<Self::Elem>() == TypeId::of::<f32>() {
Contributor Author

@nathanielsimard Also here I assume I'm missing a bunch of other possibilities like f64 for float, or i8, i16, i128, etc. for ints. What else should I add?

@nathanielsimard (Member) commented Apr 4, 2023

You can directly use core::any::type_name::<Self::Elem>() and all types will be supported :). I would also rename the function to elem_type_name to be clearer.

@agelas (Contributor Author) commented Apr 5, 2023

Do you want the more specific type that comes from core::any::type_name::<Self::Elem>(), or should I group the possible returns and return int/float?

Member

The more specific type is needed, but we could also add the tensor kind:

  data: [...],
  kind: Float, // We could add a function `name` in the trait TensorKind.
  elem: f32

Contributor Author

Sounds good, just threw kind in as well.

@agelas (Contributor Author) commented Apr 6, 2023

@nathanielsimard I think this is pretty much done unless you can think of anything else. I just merged main so the branch is all up to date as well.

@nathanielsimard (Member)

@agelas I think the only missing parts are fixing the CI (mostly no_std support by importing from alloc) and removing the size function from the tensor API.

@agelas (Contributor Author) commented Apr 6, 2023

@nathanielsimard Ok should be good now, although locally I can't seem to build because the compiler is complaining about the "use of unstable library feature 'unzip_option': recently added" in burn-core/src/optim/simple/adaptor.rs:86:63. Not sure if that will trip up the CI as well.

@nathanielsimard (Member)

> @nathanielsimard Ok should be good now, although locally I can't seem to build because the compiler is complaining about the "use of unstable library feature 'unzip_option': recently added" in burn-core/src/optim/simple/adaptor.rs:86:63. Not sure if that will trip up the CI as well.

@agelas I'm curious, what is your local Rust version? Maybe I used a newly stabilized function, and I should update the minimum required Rust version in the readme. The CI is always using the latest stable Rust version.

@agelas (Contributor Author) commented Apr 6, 2023

@nathanielsimard Oh I'm on 1.65.0, that might explain it.

@nathanielsimard (Member)

@agelas You can run clippy and cargo fmt with the latest Rust to fix most problems:

cargo clippy --fix
cargo fmt --all

@agelas (Contributor Author) commented Apr 6, 2023

@nathanielsimard clippy and fmt didn't seem to change anything. I added a few more imports from alloc, though; hopefully that fixes it.

@agelas (Contributor Author) commented Apr 6, 2023

@nathanielsimard Ok we're getting a little bit closer 😆. The test-burn-ndarray one is strange. I've noticed that sometimes the backend comes back as ndarray, sometimes as tch, do you know what might be up with that?

@nathanielsimard (Member)

> @nathanielsimard Ok we're getting a little bit closer 😆. The test-burn-ndarray one is strange. I've noticed that sometimes the backend comes back as ndarray, sometimes as tch, do you know what might be up with that?

I think the tests in ndarray will have ndarray as the backend and the ones in tch will have tch. But backends can be used as dev dependencies just to run the tests, so it might change. I would suggest updating the test to use the TestBackend name directly.

@agelas (Contributor Author) commented Apr 6, 2023

> I think the tests in ndarray will have ndarray as the backend and the ones in tch will have tch. But backends can be used as dev dependencies just to run the tests, so it might change. I would suggest updating the test to use the TestBackend name directly.

Sorry, not quite sure what you mean by using the TestBackend name directly. Do you mean explicitly doing something like let TestBackend = burn_ndarray::NdArrayBackend<f32>?

@nathanielsimard (Member) commented Apr 6, 2023

> Sorry, not quite sure what you mean by using the TestBackend name directly. Do you mean explicitly doing something like let TestBackend = burn_ndarray::NdArrayBackend<f32>?

No, I mean in the test: instead of hardcoding the name of the backend in the expected string, maybe use format!("... backend: {:?} ...", TestBackend::name())
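A minimal self-contained sketch of that idea; TestBackend and name() here are stand-ins mimicking burn's test backend alias and Backend::name(), not the real types:

```rust
// Hypothetical stand-in for burn's TestBackend alias.
struct TestBackend;

impl TestBackend {
    // Mimics `Backend::name()`; the real one reports "ndarray", "tch", etc.
    fn name() -> &'static str {
        "ndarray"
    }
}

fn main() {
    // Build the expected string from the backend's reported name instead of
    // hardcoding "ndarray" or "tch" in the test literal, so the test passes
    // regardless of which dev-dependency backend runs it.
    let expected = format!("Tensor {{ backend: {:?}, ... }}", TestBackend::name());
    println!("{expected}");
}
```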

@agelas (Contributor Author) commented Apr 6, 2023

@nathanielsimard I'm just gonna remove the doctest lol.

@agelas (Contributor Author) commented Apr 6, 2023

@nathanielsimard alright this time I'm 99% sure it should pass

@nathanielsimard nathanielsimard merged commit d8f64ce into tracel-ai:main Apr 7, 2023