Skip to content

Conversation

@hexbabe
Copy link
Member

@hexbabe hexbabe commented Feb 15, 2024

From https://viam.atlassian.net/browse/RSDK-6530 :

Sean Yu wrote a deserialize_depth_map function in order to consume depth maps in cpp-land for the rgb-d-overlay module. It is best to make this a static Camera method in C++ SDK

@hexbabe hexbabe requested a review from a team as a code owner February 15, 2024 17:58
@hexbabe hexbabe requested review from njooma, purplenicole730 and stuqdog and removed request for a team February 15, 2024 17:58
Copy link
Member

@stuqdog stuqdog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, glad to see this here! Generally looks good though I can't speak much to the actual camera logic. I do have a couple small suggestions/questions.

Comment on lines 86 to 87
/// @brief Decode image data of custom MIME type FORMAT_RAW_DEPTH into a standard
/// representation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add docstring comments here to explain the arguments, what is returned, and when this might throw an exception?

std::tuple<uint64_t, uint64_t, std::vector<uint16_t>> Camera::deserialize_depth_map(
const std::vector<unsigned char>& data) {
if (data.size() < 24) {
throw std::runtime_error("Data too short to contain valid depth information");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be throwing our internal Exception class rather than runtime_error.


/// @brief Decode image data of custom MIME type FORMAT_RAW_DEPTH into a standard
/// representation.
static std::tuple<uint64_t, uint64_t, std::vector<uint16_t>> deserialize_depth_map(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(q) I'm not familiar with depth map logic so perhaps this is totally standard, but would it make sense to create a struct type to return instead of a tuple so we can make it clear which value is height, which width, and which the arr?


/// @brief Decode image data of custom MIME type FORMAT_RAW_DEPTH into a standard
/// representation.
static std::tuple<uint64_t, uint64_t, std::vector<uint16_t>> deserialize_depth_map(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible/sensible to add tests for this method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should. It will require writing an equivalent encode_depth_map function, but I think that's a good idea I will do it

}

if (data.size() < 24 + width * height * sizeof(uint16_t)) {
throw std::runtime_error("Data size does not match width and height");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above re Exception.

@hexbabe hexbabe changed the title RSDK-6530: Add deserialize_depth_map to C++ SDK RSDK-6530: Add decode_depth_map to C++ SDK Feb 16, 2024
@hexbabe
Copy link
Member Author

hexbabe commented Feb 16, 2024

Wait, does the run-tests action run unit tests?

@hexbabe hexbabe changed the title RSDK-6530: Add decode_depth_map to C++ SDK RSDK-6530: Add depth map encode decode methods to C++ SDK Feb 16, 2024
@hexbabe hexbabe force-pushed the RSDK-6530 branch 5 times, most recently from 93e6c57 to 7f70d7c Compare February 16, 2024 18:06
@stuqdog
Copy link
Member

stuqdog commented Feb 16, 2024

Wait, does the run-tests action run unit tests?

yep! it's a little funky though, if one test fails in a file then often it says they all failed. Lemme know if you want any help with debugging.

@hexbabe hexbabe force-pushed the RSDK-6530 branch 3 times, most recently from 6dddc8f to 3469494 Compare February 16, 2024 19:20
bhaney
bhaney previously requested changes Feb 22, 2024
Copy link
Member

@bhaney bhaney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! Just some small changes for clarity

size_t offset = 0;
const uint64_t magic_number = read_uint64_big_endian(data, offset);
if (magic_number != MAGIC_NUMBER) {
throw Exception("Invalid magic number. The data may not be a depth map.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain more here? This would be an extremely frustrating error to get (but also a common one, if they tried to feed in an image that wasn't a viam depth map). Maybe say something like "Invalid header for a vnd.viam.dep encoded depth image. The data may be corrupted, or not a viam-encoded depth map."

for (size_t i = 0; i < width * height; ++i) {
const size_t data_index = HEADER_SIZE + i * sizeof(uint16_t);
const uint16_t depth_value = static_cast<uint16_t>(
data[data_index] << 8 | data[data_index + 1]); // Assemble from big endian into uint16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

};

/// @brief UTF-8 encoding of 'DEPTHMAP' used in the header of FORMAT_RAW_DEPTH bytes payload.
static const uint64_t MAGIC_NUMBER = 0x44455054484D4150ULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what the ULL at the end is? Seems like "extra" stuff - 0x44455054484D4150 is the complete number

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants belong in the .cpp file. Also you don't need to use SHOUTY constants in C++. Often in the C++SDK we prefix constants with k_ to indicate that they are constant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the gcc manual:

ISO C99 supports data types for integers that are at least 64 bits wide ( . . . ) . To make an integer constant of type long long int, add the suffix LL to the integer. To make an integer constant of type unsigned long long int, add the suffix ULL to the integer.

It should ensure that the literal has the correct type and size @bhaney

struct depth_map {
uint64_t width; ///< Width of the depth map in pixels.
uint64_t height; ///< Height of the depth map in pixels.
std::vector<uint16_t> depth_values; ///< A flat vector of depth values.
Copy link
Member

@bhaney bhaney Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The depth info will almost always be converted into some 2D array eventually. Like you said, a lot of frameworks have methods of reading in a flat vector and turning into into a 2D array. The python SDK reshapes the data into a 2D list (though a numpy array would have been better IMO --- but they don't want to add numpy as a dependency).

Could totally just add a public get(x,y) method to this struct for a bare-bones convenience -- then again -- depending on whether a user is using it as an "image" or a "matrix" will make the accessor different. get(i, j) which treats the data as a matrix indexes on a row first. get(x, y) which treats the data as an image accesses on a column first. I would say that "image-like" access makes more sense, since this is a depth image.

Comment on lines 105 to 107
const uint64_t width = 2;
const uint64_t height = 2;
const std::vector<uint16_t> depth_values = {100, 200, 300, 400};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the test case non-symmetric, like a 2x3 image at least? if you accidentally swapped height and width, you wouldn't know from this test case.

@acmorrow
Copy link
Member

@hexbabe -

I think I had sort of misunderstood the point of this review. I expected it was creating a new type like raw_image that would actually be used in the Camera api (e.g. raw_image Camera::get_image(...)). But that isn't what is going on here: the depth_map type isn't returned or consumed by Camera, and the two new functions exist only to transform a byte buffer into a depth_map and vice versa.

So, what is the flow of control here? How does the user of the SDK end up with a byte buffer which happens to contain an encoded depth map, and know that it should be passed to Camera::decode_depth_map? Or is this just a "part 1" review and subsequent changes will add methods that traffic in depth_map to Camera?

Copy link
Member

@acmorrow acmorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the code looks good. I have some small comments and suggestions.

I do wonder whether 1) the depth_map stuff belongs on Camera at all given that the rest of the Camera API doesn't make use of it in any way (yet?), and 2) how exactly it is expected for clients or module authors to use these APIs.

// Appends a uint64_t value in big-endian format to a byte vector and updates the offset.
void append_uint64_big_endian(std::vector<unsigned char>& data, size_t& offset, uint64_t value) {
const uint64_t value_be = boost::endian::native_to_big(value);
if (data.size() < offset + 8) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of a lot of these explicit 8's by using sizeof

};

/// @brief UTF-8 encoding of 'DEPTHMAP' used in the header of FORMAT_RAW_DEPTH bytes payload.
static const uint64_t MAGIC_NUMBER = 0x44455054484D4150ULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants belong in the .cpp file. Also you don't need to use SHOUTY constants in C++. Often in the C++SDK we prefix constants with k_ to indicate that they are constant.

/// per depth value.
/// @throws Exception: if the depth data values do not correspond to height and width.
///
static std::vector<unsigned char> encode_depth_map(const Camera::depth_map& m);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take back my comment about it being part of the public API. My comment was based on my assumption that Camera methods were actually going to be trading in depth_maps, in which case I wouldn't have expected this part of the transformation pair to need to be visible to users. However, per my newer understanding about what is being done here, I agree that it needs to be public. I still have some questions though about how users are expected to know what is and isn't a buffer containing a depth map that they should be passing to Camera::decode_depth_map, or what they are expected to do with a buffer containing an encoded depth map after they call Camera::encode_depth_map.

/// data.
/// @throws Exception: if the data is misformatted e.g. doesn't contain valid depth information,
/// or if the data size does not match the expected size based on the width and height.
static Camera::depth_map decode_depth_map(const std::vector<unsigned char>& data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

}

std::vector<unsigned char> Camera::encode_depth_map(const Camera::depth_map& m) {
if (m.depth_values.size() != m.width * m.height) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the depth_map constructor enforce this invariant?

size_t offset = 0;

// Network data is stored in big-endian, while most host systems are little endian.
append_uint64_big_endian(data, offset, MAGIC_NUMBER);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better if offset was passed by pointer here to make it clearer to a reader that it was going to be modified.

depth_map.width = width;
depth_map.height = height;
depth_map.depth_values = std::move(arr);
return depth_map;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll bet you can write this entire thing as just return {width, height, std::move(arr)};

/// depth_map holds the width and height of a depth map, along with a vector
/// of actual depth values. Each depth value is a 16-bit unsigned integer representing
/// the distance from the camera to a point in the scene.
struct depth_map {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are invariants for a valid depth_map, notably that depth_values.size() == width * height. That'd argue for making this a class so that the invariant can be enforced on construction and then preserved.

@hexbabe
Copy link
Member Author

hexbabe commented Feb 23, 2024

@hexbabe -

I think I had sort of misunderstood the point of this review. I expected it was creating a new type like raw_image that would actually be used in the Camera api (e.g. raw_image Camera::get_image(...)). But that isn't what is going on here: the depth_map type isn't returned or consumed by Camera, and the two new functions exist only to transform a byte buffer into a depth_map and vice versa.

So, what is the flow of control here? How does the user of the SDK end up with a byte buffer which happens to contain an encoded depth map, and know that it should be passed to Camera::decode_depth_map? Or is this just a "part 1" review and subsequent changes will add methods that traffic in depth_map to Camera?

The scope of this PR is to put decode and encode methods into the SDK as sort of util methods for depth map handling in modules and clients. For decoding, I wanted to mirror how they are handled in the Python SDK. In the Python SDK, we leave it to the client to call raw_img.bytes_to_depth_array() to decode the depth map. The goal here is to provide similar decode util for C++ SDK clients. For encoding, we would expect depth camera module creators to use the encode function to convert their raw bytes into the Viam depth mime type format.

Sorry for all the confusion. There is definitely a misalignment in what the methods in this PR should be doing/where they should be used. My mistake was thinking that this ticket would be as simple as copying over the function I wrote for the RGB-D overlay module into Camera to be used as util functions. If I were to redo this PR, I would've communicated the requirements and thought harder about how to integrate the methods into the exiting C++ APIs.

To amend this mistake, I think we should chat briefly to discuss where it makes sense to place these functions in a less haphazard way that is appropriate for each use case.

We should definitely add these methods though because for most of the modules I've worked on so far, I've had to write my own custom transcoding logic for Viam depth map mime return types.

@acmorrow
Copy link
Member

acmorrow commented Mar 6, 2024

@hexbabe and I met to discuss this. The plan is to move forward with this review. The depth_map type will be re-worked in terms of xtensor (either it will be an xtensor type alias, or it will wrap an xtensor). There will then be a subsequent review where the Camera interface is adjusted in order to integrate depth map information. This might look like having Camera::get_image return a variant over raw_image and depth_map, or adding a get_depth_map function that returns an Optional<depth_map>. Similar adjustments will need to be made to get_images to incorporate depth_maps.

@hexbabe hexbabe force-pushed the RSDK-6530 branch 4 times, most recently from 1366e6d to 16fba03 Compare March 8, 2024 15:26
… implementation to use xtensor aliased depth map type; Fixed import orders; Updated tests accordingly
Copy link
Member

@acmorrow acmorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, though I think there is a memory error that needs to be addressed.

Comment on lines 84 to 88
for (size_t j = 0; j < width; ++j) {
const uint16_t value = m(i, j);
const uint16_t value_be = boost::endian::native_to_big(value);
std::memcpy(&data[offset], &value_be, sizeof(uint16_t));
offset += sizeof(uint16_t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make append_uint64_big_endian a template on the value type, you could re-use it here:

template<typename T>
append_big_endian(std::vector<unsigned char>& data, size_t* offset, T value);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that's so elegant it brings a tear to my eye

depth_values.push_back(depth_value);
}

return xt::adapt(depth_values, std::array<size_t, 2>{height, width});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to cause problems: I think xt::adapt is only going to view the memory in depth_values, and it will go out of scope when this function returns.

I think you will need another approach to construct an xtensor that owns the memory here. Let me know if you need me to look into how to do that best.

@hexbabe hexbabe force-pushed the RSDK-6530 branch 3 times, most recently from a77c318 to b84b545 Compare March 8, 2024 19:56
}

xt::xarray<uint16_t> m = xt::xarray<uint16_t>::from_shape({height, width});
std::copy(depth_values.begin(), depth_values.end(), m.begin());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked up a method to copy the memory in. Let me know if there's a better way to do it @acmorrow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I was actually just looking at that. So, what you have here is fine, though it does cost you a copy. If you wanted to avoid the copy though, I think that's totally doable.

The trick would be to not create a std::vector, but just to directly create the xt::xarray you want to return, of the appropriate size and shape, before the loop, and then populate it in the loop.

xt::xarray<uint16_t> m = xt::xarray<uint16_t>::from_shape({height, width});
for (size_t i = 0; i < width * height; ++i) {
    m(<ii>, <jj>) = read_big_endian<uint16_t>(data, &offset);

Now, that'd require a little hassle to compute the correct ii and jj w.r.t. height and width and i, which is sort of a hassle. However, xtensor lets you reshape and view. So I think you could do it something like this:

xt::xarray<uint16_t> m = xt::xarray<uint16_t>::from_shape({height, width});
auto m_linear_view = xt::flatten(m);
for (size_t i = 0; i < width * height; ++i) {
    m_linear_view[i] = read_big_endian<uint16_t>(data, &offset));
return m;

It may not be exactly flatten, you might need reshape_view? I'd need to spend a little time with the xtensor docs to be sure. But something from https://xtensor.readthedocs.io/en/latest/view.html.

I'd say spend no more than 15 minutes on it. If you can make it work, great. If not, put in a TODO and leave it as a copy.

Copy link
Member

@acmorrow acmorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if you want to keep the copy. If you elect to use reshape/view to build directly into the output tensor, please post an update with that so I can give it a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants