Proposal for FrameGrabber, FrameData, DepthMap #77
Conversation
looks nice! lgtm
@@ -358,66 +316,149 @@
</section>
<section>
<h2>
<code>CanvasImageSource</code> typedef
<code>DepthData</code> interface
Would you consider using the DepthMap name? Depth map (http://en.wikipedia.org/wiki/Depth_map) is a quite common term in the 3D camera space; e.g. the Google Tango Project (https://developers.google.com/depthmap-metadata/) and the Intel RealSense SDK both use it in their developer manuals.
That's a fair point - linguistically I like the symmetry of ImageData/DepthData - but since there's an existing convention I'd be happy if we ended up with ImageData/DepthMap.
Thanks, @anssiko ! It looks nice. My comments are provided.
@huningxin @robman @ds-hwang Thanks for the review and comments. I've addressed them in an update to this PR. Some questions on the new proposed
Hi @anssiko For your second point, I don't believe we should drop near/far/format. The near and far variables are required to reconstruct depth if you receive a normalised depth value; basically, format defines which algorithm to use. For your last point, the measureType defines whether the length is measured along the optical axis (depth down the X3 axis in the pinhole camera model) or as a ray (along the effective hypotenuse). Again, this tells us what algorithm to use.
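To make the measureType distinction above concrete, here is a hedged sketch (function and parameter names are illustrative, not part of the proposed API) of converting a ray measurement to an axial depth under a pinhole camera model with focal length `f` (in pixels) and principal point `(cx, cy)`:

```javascript
// For a pixel (u, v), the viewing ray direction is (x, y, 1) in normalized
// camera coordinates. A "ray" measurement gives the length of the hypotenuse
// from the camera center to the point; an "axial" measurement gives its
// component along the optical axis. ray = axial * sqrt(1 + x^2 + y^2).
function rayToAxialDepth(ray, u, v, f, cx, cy) {
  const x = (u - cx) / f;
  const y = (v - cy) / f;
  return ray / Math.sqrt(1 + x * x + y * y);
}
```

At the principal point the two measurements coincide; toward the image edges the ray length grows relative to the axial depth, which is why an algorithm must know which one it is given.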
@robman Thanks, what are the key use cases for which it would be preferred to use the raw depth map data over the normalized data? And respectively, what are the key use cases that require normalized data? I'm interested in figuring out how much of the details we can defer to the implementation, and how much we must expose to address the requirements. We should not expose metadata that has no supporting use case, and should instead minimize the API surface, even if exposing metadata would be cheap. Do the known implementations support both measureTypes?
Hi @anssiko Thanks for the update.
It sounds good to me. The web platform can stick to one "native" unit and data format, and implementations figure out how to fit into it. There are two flavors:
Which do you prefer?
To encode the depth value, e.g. to implement https://developers.google.com/depthmap-metadata/encoding on the web, web developers need to know near and far.
Agree with @robman; it tells the web developer the depth measurement model, and some algorithms depend on it.
@ds-hwang pointed out that Uint16 maps better to the GPU when uploading this data to a WebGL texture. I'll let @ds-hwang expand. @huningxin Yes, using Uint16 with mm units would represent the range of 0..65.535 m as you say. If we'd stick with a single format (say range linear) and units (say mm), what use cases would we miss? At least that'd make the API simpler for the web developer. IOW, what are the key use cases that require the web developer to be able to convert the normalised depth values back?
Uint16 has enough accuracy. Let me explain. First of all, we need to support a RangeLinear mode and a RangeInverse mode like https://developers.google.com/depthmap-metadata/encoding. Users will calculate the real distance from the depth camera output. The RangeLinear formula is RealDistance = d * (far - near) + near. The RangeInverse formula is RealDistance = (far * near) / (far - d * (far - near)). As you see, the depth value is just a scale; it doesn't have a unit, while near and far do. RangeLinear is used for dance games and the like, and RangeInverse is used for face recognition. On the other hand, Chromium keeps the camera video in a texture, and in the same sense Chromium would keep depth values in a texture. 32-bit float textures are supported only on extremely modern GPUs; IMO a 32-bit float texture is overkill.
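The two decoding formulas above, taken from the Google depth map encoding document, can be sketched directly in JavaScript (function names are illustrative; `d` is the normalised depth in [0, 1], near/far are real distances):

```javascript
// RangeLinear: normalised depth is a linear interpolation between near and far.
function decodeRangeLinear(d, near, far) {
  return d * (far - near) + near;
}

// RangeInverse: normalised depth is linear in 1/distance, which concentrates
// precision close to the camera (useful for e.g. face recognition).
function decodeRangeInverse(d, near, far) {
  return (far * near) / (far - d * (far - near));
}
```

Both map d = 0 to near and d = 1 to far; they differ in how precision is distributed across the range.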
IMO, with Uint16 and mm units (or Float32 with m units), we won't need to support an encoding format such as RangeLinear or RangeInverse. The Uint16 depth value represents the real distance, e.g. 1 means 1 mm and 65535 means 65535 mm. It is straightforward for web developers: just use the value without any calculations. The only concern is whether the range and accuracy are too limited: will 65.535 m be too small, or mm units too coarse, for some use cases with new depth cameras in the future? I propose to keep near and far, as we may need to support web apps encoding the depth value into other formats. Just like https://developers.google.com/depthmap-metadata/encoding: if web developers want to encode the depth metadata into XMP properties, they need the near and far values, and they would write JavaScript code to implement either the RangeLinear or the RangeInverse encoding mechanism.
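The encoding direction described above (the inverse of the decoding formulas in the Google depth map encoding document) might look like this; function names are illustrative, and `z`, `near`, `far` are real distances in the same unit:

```javascript
// RangeLinear encoding: normalise a real distance z into [0, 1].
function encodeRangeLinear(z, near, far) {
  return (z - near) / (far - near);
}

// RangeInverse encoding: normalise linearly in 1/z instead of z.
function encodeRangeInverse(z, near, far) {
  return (far * (z - near)) / (z * (far - near));
}
```

This is the kind of JavaScript a web app would need near and far for, e.g. when writing depth metadata into XMP properties.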
I trust your judgement. However, I have to say there are drawbacks to values with units. Let's assume near is 0.1 m and far is 5 m. The value ranges (0, 100) and (5000, 65535) are then useless. That's why the Tango encoding format uses a scaled value instead of a real value. Some future depth sensors may be able to detect a range beyond 65 m, and some applications will want precision finer than 1 mm. A value with a baked-in unit looks inflexible. What is the real output of RealSense and Kinect?
I updated the PR. Please review and comment. The use case for I'd like to get resolution on the issue whether we should just hardcode If we keep
RealSense SDK:
Kinect SDK:
Project Tango:
This is why Float32 with meter units seems promising.
Point Cloud Library (PCL) uses float in RangeImage (depth map): with millimeter units:
@huningxin Thanks for sharing information on implementations. What is your guesstimate re the performance implications (also memory) of using If there are performance concerns, one approach worth considering might be to go with the lowest common denominator (e.g. Re units, mm sounds like the best choice regardless of the type.
@anssiko , thanks for the comments. I agree with you that the Because: Kinect SDK uses 16-bit unsigned integers with mm According to the Tango Depth Camera implementation in Chromium (https://code.google.com/p/chromium/codesearch#chromium/src/media/base/android/java/src/org/chromium/media/VideoCaptureTango.java&q=tango&sq=package:chromium&l=141),
It is also So For the units, I suggest we add
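Relatedly, when a platform lacks 16-bit single-channel textures, a common workaround (and roughly what the Tango capture path referenced above does) is to split each 16-bit depth sample across two 8-bit channels. A hedged sketch, with illustrative names:

```javascript
// Split a Uint16 depth sample into low and high bytes, suitable for e.g.
// the two channels of an 8-bit LUMINANCE_ALPHA texture.
function pack16(depth) {
  const lo = depth & 0xff;
  const hi = (depth >> 8) & 0xff;
  return [lo, hi];
}

// Reassemble the 16-bit value in a shader or on the CPU.
function unpack16(lo, hi) {
  return lo | (hi << 8);
}
```

The round trip is lossless, so Uint16 depth can ride on widely supported 8-bit texture formats.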
@@ -453,26 +557,63 @@
<dd>
-
</dd>
<dt>
DOMString format = null
I think we don't need format anymore.
Dropped.
* s/video track/video stream track/g
* Link to the above terminology across the spec.
@huningxin Thanks again for your suggestions. I updated the spec to address your comments. I also did further refactoring. I'd like to review this after your final review.
</dl>
<dl id="enum-basic" class="idl" title="enum DepthMapUnit">
<dt>
mm
Will people confuse 'mm' between millimeters and micrometers?
Millimeter (the American spelling of millimetre) has the SI unit symbol 'mm', so I think that's the best we can do. If we'd use the full name (millimeter), people would probably typo it since the American and International spellings differ subtly.
It sounds good to me. Thanks for the explanation!
hi @anssiko , thanks for your efforts and explanation. It makes the spec pretty good. LGTM with one open issue.
@huningxin @robman @ds-hwang I'll merge this PR now and craft a mail to the group to get wider feedback. Thanks for your contributions and review!
Proposal for FrameGrabber, FrameData, DepthMap
This PR contains an early proposal with contributions from @huningxin, @ds-hwang, @robman, and @anssiko. IDL is in place to allow people to review the API shape, but the prose around the new interfaces is still missing, to allow us to more easily iterate on the API design based on wider feedback.
HTML preview: https://rawgit.com/anssiko/mediacapture-depth/framegrabber/index.html
New interfaces added in this PR:
FrameGrabber
FrameData
DepthData
The following interfaces and definitions were obsoleted by the new ones, and were removed:
CanvasImageSource typedef
ImageData interface
(or to be precise, extensions to it)

Examples:
Editorial: