-
Notifications
You must be signed in to change notification settings - Fork 26
RSDK-6530: Add depth map encode decode methods to C++ SDK #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
stuqdog
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, glad to see this here! Generally looks good though I can't speak much to the actual camera logic. I do have a couple small suggestions/questions.
| /// @brief Decode image data of custom MIME type FORMAT_RAW_DEPTH into a standard | ||
| /// representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add docstring comments here to explain the arguments, what is returned, and when this might throw an exception?
| std::tuple<uint64_t, uint64_t, std::vector<uint16_t>> Camera::deserialize_depth_map( | ||
| const std::vector<unsigned char>& data) { | ||
| if (data.size() < 24) { | ||
| throw std::runtime_error("Data too short to contain valid depth information"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be throwing our internal Exception class rather than runtime_error.
|
|
||
| /// @brief Decode image data of custom MIME type FORMAT_RAW_DEPTH into a standard | ||
| /// representation. | ||
| static std::tuple<uint64_t, uint64_t, std::vector<uint16_t>> deserialize_depth_map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(q) I'm not familiar with depth map logic so perhaps this is totally standard, but would it make sense to create a struct type to return instead of a tuple so we can make it clear which value is height, which width, and which the arr?
|
|
||
| /// @brief Decode image data of custom MIME type FORMAT_RAW_DEPTH into a standard | ||
| /// representation. | ||
| static std::tuple<uint64_t, uint64_t, std::vector<uint16_t>> deserialize_depth_map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible/sensible to add tests for this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should. It will require writing an equivalent encode_depth_map function, but I think that's a good idea I will do it
| } | ||
|
|
||
| if (data.size() < 24 + width * height * sizeof(uint16_t)) { | ||
| throw std::runtime_error("Data size does not match width and height"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above re Exception.
deserialize_depth_map to C++ SDKdecode_depth_map to C++ SDK
…p to remote to run tests
|
Wait, does the |
decode_depth_map to C++ SDK93e6c57 to
7f70d7c
Compare
yep! it's a little funky though, if one test fails in a file then often it says they all failed. Lemme know if you want any help with debugging. |
6dddc8f to
3469494
Compare
bhaney
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this! Just some small changes for clarity
| size_t offset = 0; | ||
| const uint64_t magic_number = read_uint64_big_endian(data, offset); | ||
| if (magic_number != MAGIC_NUMBER) { | ||
| throw Exception("Invalid magic number. The data may not be a depth map."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain more here? This would be an extremely frustrating error to get (but also a common one, if they tried to feed in an image that wasn't a viam depth map). Maybe say something like "Invalid header for a vnd.viam.dep encoded depth image. The data may be corrupted, or not a viam-encoded depth map."
| for (size_t i = 0; i < width * height; ++i) { | ||
| const size_t data_index = HEADER_SIZE + i * sizeof(uint16_t); | ||
| const uint16_t depth_value = static_cast<uint16_t>( | ||
| data[data_index] << 8 | data[data_index + 1]); // Assemble from big endian into uint16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
| }; | ||
|
|
||
| /// @brief UTF-8 encoding of 'DEPTHMAP' used in the header of FORMAT_RAW_DEPTH bytes payload. | ||
| static const uint64_t MAGIC_NUMBER = 0x44455054484D4150ULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what the ULL at the end is? Seems like "extra" stuff - 0x44455054484D4150 is the complete number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These constants belong in the .cpp file. Also you don't need to use SHOUTY constants in C++. Often in the C++SDK we prefix constants with k_ to indicate that they are constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the gcc manual:
ISO C99 supports data types for integers that are at least 64 bits wide ( . . . ) . To make an integer constant of type long long int, add the suffix LL to the integer. To make an integer constant of type unsigned long long int, add the suffix ULL to the integer.
It should ensure that the literal has the correct type and size @bhaney
| struct depth_map { | ||
| uint64_t width; ///< Width of the depth map in pixels. | ||
| uint64_t height; ///< Height of the depth map in pixels. | ||
| std::vector<uint16_t> depth_values; ///< A flat vector of depth values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The depth info will almost always be converted into some 2D array eventually. Like you said, a lot of frameworks have methods of reading in a flat vector and turning into into a 2D array. The python SDK reshapes the data into a 2D list (though a numpy array would have been better IMO --- but they don't want to add numpy as a dependency).
Could totally just add a public get(x,y) method to this struct for a bare-bones convenience -- then again -- depending on whether a user is using it as an "image" or a "matrix" will make the accessor different. get(i, j) which treats the data as a matrix indexes on a row first. get(x, y) which treats the data as an image accesses on a column first. I would say that "image-like" access makes more sense, since this is a depth image.
src/viam/sdk/tests/test_camera.cpp
Outdated
| const uint64_t width = 2; | ||
| const uint64_t height = 2; | ||
| const std::vector<uint16_t> depth_values = {100, 200, 300, 400}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the test case non-symmetric, like a 2x3 image at least? if you accidentally swapped height and width, you wouldn't know from this test case.
|
@hexbabe - I think I had sort of misunderstood the point of this review. I expected it was creating a new type like So, what is the flow of control here? How does the user of the SDK end up with a byte buffer which happens to contain an encoded depth map, and know that it should be passed to |
acmorrow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the code looks good. I have some small comments and suggestions.
I do wonder whether 1) the depth_map stuff belongs on Camera at all given that the rest of the Camera API doesn't make use of it in any way (yet?), and 2) how exactly it is expected for clients or module authors to use these APIs.
| // Appends a uint64_t value in big-endian format to a byte vector and updates the offset. | ||
| void append_uint64_big_endian(std::vector<unsigned char>& data, size_t& offset, uint64_t value) { | ||
| const uint64_t value_be = boost::endian::native_to_big(value); | ||
| if (data.size() < offset + 8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of a lot of these explicit 8's by using sizeof
| }; | ||
|
|
||
| /// @brief UTF-8 encoding of 'DEPTHMAP' used in the header of FORMAT_RAW_DEPTH bytes payload. | ||
| static const uint64_t MAGIC_NUMBER = 0x44455054484D4150ULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These constants belong in the .cpp file. Also you don't need to use SHOUTY constants in C++. Often in the C++SDK we prefix constants with k_ to indicate that they are constant.
| /// per depth value. | ||
| /// @throws Exception: if the depth data values do not correspond to height and width. | ||
| /// | ||
| static std::vector<unsigned char> encode_depth_map(const Camera::depth_map& m); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take back my comment about it being part of the public API. My comment was based on my assumption that Camera methods were actually going to be trading in depth_maps, in which case I wouldn't have expected this part of the transformation pair to need to be visible to users. However, per my newer understanding about what is being done here, I agree that it needs to be public. I still have some questions though about how users are expected to know what is and isn't a buffer containing a depth map that they should be passing to Camera::decode_depth_map, or what they are expected to do with a buffer containing an encoded depth map after they call Camera::encode_depth_map.
| /// data. | ||
| /// @throws Exception: if the data is misformatted e.g. doesn't contain valid depth information, | ||
| /// or if the data size does not match the expected size based on the width and height. | ||
| static Camera::depth_map decode_depth_map(const std::vector<unsigned char>& data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
| } | ||
|
|
||
| std::vector<unsigned char> Camera::encode_depth_map(const Camera::depth_map& m) { | ||
| if (m.depth_values.size() != m.width * m.height) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't the depth_map constructor enforce this invariant?
| size_t offset = 0; | ||
|
|
||
| // Network data is stored in big-endian, while most host systems are little endian. | ||
| append_uint64_big_endian(data, offset, MAGIC_NUMBER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better if offset was passed by pointer here to make it clearer to a reader that it was going to be modified.
| depth_map.width = width; | ||
| depth_map.height = height; | ||
| depth_map.depth_values = std::move(arr); | ||
| return depth_map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll bet you can write this entire thing as just return {width, height, std::move(arr)};
| /// depth_map holds the width and height of a depth map, along with a vector | ||
| /// of actual depth values. Each depth value is a 16-bit unsigned integer representing | ||
| /// the distance from the camera to a point in the scene. | ||
| struct depth_map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are invariants for a valid depth_map, notably that depth_values.size() == width * height. That'd argue for making this a class so that the invariant can be enforced on construction and then preserved.
The scope of this PR is to put decode and encode methods into the SDK as sort of util methods for depth map handling in modules and clients. For decoding, I wanted to mirror how they are handled in the Python SDK. In the Python SDK, we leave it to the client to call Sorry for all the confusion. There is definitely a misalignment in what the methods in this PR should be doing/where they should be used. My mistake was thinking that this ticket would be as simple as copying over the function I wrote for the RGB-D overlay module into To amend this mistake, I think we should chat briefly to discuss where it makes sense to place these functions in a less haphazard way that is appropriate for each use case. We should definitely add these methods though because for most of the modules I've worked on so far, I've had to write my own custom transcoding logic for Viam depth map mime return types. |
|
@hexbabe and I met to discuss this. The plan is to move forward with this review. The |
1366e6d to
16fba03
Compare
… implementation to use xtensor aliased depth map type; Fixed import orders; Updated tests accordingly
acmorrow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good, though I think there is a memory error that needs to be addressed.
src/viam/sdk/components/camera.cpp
Outdated
| for (size_t j = 0; j < width; ++j) { | ||
| const uint16_t value = m(i, j); | ||
| const uint16_t value_be = boost::endian::native_to_big(value); | ||
| std::memcpy(&data[offset], &value_be, sizeof(uint16_t)); | ||
| offset += sizeof(uint16_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make append_uint64_big_endian a template on the value type, you could re-use it here:
template<typename T>
append_big_endian(std::vector<unsigned char>& data, size_t* offset, T value);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow that's so elegant it brings a tear to my eye
src/viam/sdk/components/camera.cpp
Outdated
| depth_values.push_back(depth_value); | ||
| } | ||
|
|
||
| return xt::adapt(depth_values, std::array<size_t, 2>{height, width}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to cause problems: I think xt::adapt is only going to view the memory in depth_values, and it will go out of scope when this function returns.
I think you will need another approach to construct an xtensor that owns the memory here. Let me know if you need me to look into how to do that best.
…-endian helpers to use template value types
a77c318 to
b84b545
Compare
| } | ||
|
|
||
| xt::xarray<uint16_t> m = xt::xarray<uint16_t>::from_shape({height, width}); | ||
| std::copy(depth_values.begin(), depth_values.end(), m.begin()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked up a method to copy the memory in. Let me know if there's a better way to do it @acmorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I was actually just looking at that. So, what you have here is fine, though it does cost you a copy. If you wanted to avoid the copy though, I think that's totally doable.
The trick would be to not create a std::vector, but just to directly create the xt::xarray you want to return, of the appropriate size and shape, before the loop, and then populate it in the loop.
xt::xarray<uint16_t> m = xt::xarray<uint16_t>::from_shape({height, width});
for (size_t i = 0; i < width * height; ++i) {
m(<ii>, <jj>) = read_big_endian<uint16_t>(data, &offset);
Now, that'd require a little hassle to compute the correct ii and jj w.r.t. height and width and i, which is sort of a hassle. However, xtensor lets you reshape and view. So I think you could do it something like this:
xt::xarray<uint16_t> m = xt::xarray<uint16_t>::from_shape({height, width});
auto m_linear_view = xt::flatten(m);
for (size_t i = 0; i < width * height; ++i) {
m_linear_view[i] = read_big_endian<uint16_t>(data, &offset));
return m;
It may not be exactly flatten, you might need reshape_view? I'd need to spend a little time with the xtensor docs to be sure. But something from https://xtensor.readthedocs.io/en/latest/view.html.
I'd say spend no more than 15 minutes on it. If you can make it work, great. If not, put in a TODO and leave it as a copy.
acmorrow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if you want to keep the copy. If you elect to use reshape/view to build directly into the output tensor, please post an update with that so I can give it a look.
From https://viam.atlassian.net/browse/RSDK-6530 :