
Add Support for Querying HDR Decode and Render Capabilities #118

Closed
vi-dot-cpp opened this issue Jul 10, 2019 · 74 comments

@vi-dot-cpp
Contributor

vi-dot-cpp commented Jul 10, 2019

This is part 1, which covers decoding and rendering, of the HDR two-part series. Part 2 (#119) covers display.

Modern scenarios, based on data and partner asks we have analyzed, increasingly require HDR capability detection in v1. The following design considerations guided this proposal:

  1. Separate decoding & rendering capabilities (MediaCapabilities) and display capabilities (Screen). Relevant threads/comments: [1][2][3][4][5][6]
  2. Bucket vs. granularity for HDR Capabilities. See: $todo in explainer.md#HDR
  3. Distinguish graphics and video capabilities (#25)
  4. Limit fingerprinting

We propose the following changes to MediaCapabilities. These changes will be complemented by changes to Screen in the aforementioned linked issue.

  • Add a bucketized HdrCapability enum to VideoConfiguration in similar fashion to Android’s HdrCapabilities.
    • A bucketed approach abstracts over nuanced granular properties like EOTF, color depth, and color gamut [Need to add EOTF support for HDR #10][ #110-comment]
    • A bucketed approach also addresses the limitation that granular aspects like frame metadata are not standardized
    • There are environments that currently support playing HDR content on certain SDR hardware for a pseudo-HDR experience

1. Define HdrCapability Enum

Shared in Screen and MediaCapabilities:

enum HdrCapability {
    "HDR10",
    "HDR10Plus",
    "DolbyVision",
    "HLG"
};

2. Add HdrCapability Enum to VideoConfiguration

dictionary VideoConfiguration { 
    … 
    HdrCapability hdrCapability; 
}; 
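
For illustration, a site would query decode support roughly like the sketch below. This is not shipped behavior: hdrCapability is the field proposed above, while the other fields and the promise shape are the existing decodingInfo() API, and the HEVC Main 10 codec string is just an example.

// Sketch: querying HDR10 decode support with the proposed field.
navigator.mediaCapabilities.decodingInfo({
    type: 'media-source',
    video: {
        contentType: 'video/mp4; codecs="hev1.2.4.L153.B0"',
        width: 3840,
        height: 2160,
        bitrate: 25000000,
        framerate: 60,
        hdrCapability: 'HDR10'  // proposed addition
    }
}).then(result => {
    console.log(result.supported, result.smooth, result.powerEfficient);
});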

Team: @scottlow @GurpreetV @isuru-c-p @vi-dot-cpp from Microsoft

@chcunningham
Contributor

I dig it! I'm not a color expert, but I think this seems sane and does a great job of addressing the pitfalls of previous proposals. Kudos to you all for diligence.

Re: color expertise, we'll want some other folks to weigh in.

Give me a bit to collect additional Chrome feedback from folks who know the color stack.

@chcunningham
Contributor

Aside: If we do spec these values we may need to go the path of having a registry (similar to MSE byte-streams). Nudge @mounirlamouri, who's familiar with those reqs.

@mwatson2

Great proposal, thanks!

We have a small problem with the coupling of decoding and rendering. A video codec has no knowledge about the pixel data encoding, except the bit depth and spatial aspects. The four buckets correspond to:

  • PQ, BT.2020, 10bits, SMPTE ST 2086 static metadata
  • PQ, BT.2020, 10bits, SMPTE ST 2094 dynamic metadata (I forget which part)
  • PQ, BT.2020, 10bits, SMPTE ST 2094 dynamic metadata (a different part ;-)
  • HLG, BT.2020, 10bits

All of these could potentially work with AV1 as well as HEVC or any other 10-bit codec (e.g. VP9). But at least the term HDR10 implies HEVC.

Now, the codecs string for some codecs (including VP9 and AV1 [1]) can include information about Transfer Function and Color Space, as well as bit depth, chroma sub-sampling, video range flag and matrix coefficients, but does not include HDR dynamic metadata information. And for other codecs (e.g. HEVC) this information is not in the codec string.

I think the bucketing is fine, but the buckets should be constrained to the rendering capabilities and precisely defined in terms of Transfer Function, Color Space and metadata specification. The buckets should not carry an implication about bit depth, chroma sub-sampling, video range flag and matrix coefficients.

And then, finally, we should describe the error case where the HDR capability bucket is incompatible with the codec string.

[1] https://aomediacodec.github.io/av1-isobmff/#codecsparam
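
To make the codec-string point concrete, here is how a VP9 string can spell out color properties (an illustrative decomposition; field order and code points per the VP9 "codecs" parameter spec and ISO/IEC 23001-8):

vp09.02.10.10.01.09.16.09.01
    02 = profile 2
    10 = level 1.0
    10 = 10-bit depth
    01 = 4:2:0 chroma subsampling (colocated)
    09 = BT.2020 colour primaries
    16 = PQ (SMPTE ST 2084) transfer characteristics
    09 = BT.2020 non-constant luminance matrix coefficients
    01 = full-range video

Note there is no field for dynamic HDR metadata (ST 2094), which is the gap described above.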

@jernoble
Contributor

I don't like adding vendor-specific names to specifications, so I'm hesitant to enshrine "DolbyVision" into Media Capabilities. I proposed something similar in #110, but using transfer function, color space, and bit depth.

@jernoble
Contributor

I'm also concerned about conflating whether a decoder supports these HdrCapability settings and whether the display is capable of rendering the output of the decoder, hence a separate DisplayCapabilities API in #110.

@chcunningham
Contributor

chcunningham commented Jul 15, 2019

(mwatson2)
The buckets should not carry an implication about bit depth, chroma sub-sampling, video range flag and matrix coefficients.

Just to confirm, this leaves 3 parts: eotf, color gamut, and metadata? For the screen interface (#119) this would be just the first 2 (metadata handled in software)?

And then, finally, we should describe the error case where the HDR capability bucket is incompatible with the codec string.

2 routes we could go:

  • Today we could have a codec string provide "level" info that is technically incompatible with provided framerate, bitrate, resolution info. The spec ignores this. In Chrome we check that the codec string is valid, but we use the explicitly described fields when checking for performance/power efficiency. This is more precise (levels are sometimes large buckets). We don't cross validate (would be high effort for low return).
  • For EME, we explicitly validate parts of the input against other parts. See this section (ex: "If keySystemConfiguration.audioRobustness is present, audio MUST also be present."). Failure to validate triggers a TypeError.

(jernoble)
I'm also concerned about conflating whether a decoder supports these HdrCapability settings and whether the display is capable of rendering the output of the decoder, hence a separate DisplayCapabilities API in #110.

I see you've found #119 ;)

@vi-dot-cpp
Contributor Author

vi-dot-cpp commented Jul 16, 2019

(@mwatson2)
I think the bucketing is fine, but the buckets should be constrained to the rendering capabilities and precisely defined in terms of Transfer Function, Color Space and metadata specification. The buckets should not carry an implication about bit depth, chroma sub-sampling, video range flag and matrix coefficients.

Good point. Additionally, HDR profiles like Dolby Vision could theoretically support 12-bit & 10-bit color depth [1]. We are in favor of constraining HDR capabilities to transfer function, color gamut, and metadata. What if the HdrCapability buckets explicitly reflected these properties?

enum HdrCapability {
    "Pq_Rec2020_SmpteSt2086Static",      // HDR10
    "Pq_Rec2020_SmpteSt2094Dynamic-40",  // HDR10Plus
    "Pq_Rec2020_SmpteSt2094Dynamic-10",  // DolbyVision
    "Hlg_Rec2020"                        // HLG
};

(@jernoble )
I don't like adding vendor-specific names to specifications, so I'm hesitant to enshrine "DolbyVision" into Media Capabilities.

That makes sense. Would this edited enum also address the reservation against proprietary names?

(@chcunningham )
Just to confirm, this leaves 3 parts: eotf, color gamut, and metadata? For the screen interface (#119) this would be just the first 2 (metadata handled in software)?

The display side does not technically need metadata; what do you think, though, about MediaCapabilities and Screen sharing the same HdrCapability enum for consistency?

Today we could have a codec string provide "level" info that is technically incompatible with provided framerate, bitrate, resolution info. The spec ignores this. In Chrome we check that the codec string is valid, but we use the explicitly described fields when checking for performance/power efficiency. This is more precise (levels are sometimes large buckets). We don't cross validate (would be high effort for low return).

Thanks for suggesting this route. We would like to strive for consistency.

[1] https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-profiles-levels.pdf

@jyavenard
Member

I gather you meant Pq for DolbyVision transfer function.

There are other proposed HDR formats, in particular SL-HDR1 and SL-HDR2.

In any case, I think splitting the capabilities between what the user-agent can handle and what can be displayed properly is the way to go.

For example, a UA using an SDR display may handle HDR content well, doing proper tone mapping etc. Serving HDR content over SDR may still be preferable here, even if the display isn't HDR.

@vi-dot-cpp
Contributor Author

(@jyavenard)
There are other proposed HDR formats, in particular is SL-HDR1 and SL-HDR2.

These formats can be added to HdrCapability. Given the community's feedback, they would be added in the following format -- [TransferFunction_ColorGamut_MetaData]. What do you think of this approach?

In any case, I think splitting the capabilities between what the user-agent can handle and what can be displayed properly is the way to go.

Agreed -- #119 complements this discussion by covering the display aspect.

@scottlow

@vi-dot-cpp and I chatted a bit more offline. Another approach we could take here is one similar to @jernoble's recommendation in #110:

dictionary hdrCapability {
    required ColorGamut colorGamut;
    required TransferFunction transferFunction;
    MetadataDescriptor metadata;
};

Where ColorGamut is an enum defined as follows:

enum ColorGamut {
    "srgb",
    "p3",
    "rec2020"
};

TransferFunction is an enum defined as follows:

enum TransferFunction {
    "srgb",
    "pq",
    "hlg"
};

And MetadataDescriptor is an enum defined as follows:

enum MetadataDescriptor {
    "smpteSt2086",
    "smpteSt2094-10",
    "smpteSt2094-40"
};

The MediaCapabilities spec could then define which combinations of the above enum values are valid "buckets" and we could throw a NotSupportedError exception for the rest.
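
A rough sketch of how that validity check might work in a UA follows. This is a hypothetical helper, not spec text; the bucket table simply mirrors the four combinations @mwatson2 listed above.

// Hypothetical sketch: which {transferFunction, colorGamut, metadata}
// combinations form recognized HDR "buckets".
const VALID_HDR_BUCKETS = [
    { transferFunction: 'pq',  colorGamut: 'rec2020', metadata: 'smpteSt2086' },     // "HDR10"
    { transferFunction: 'pq',  colorGamut: 'rec2020', metadata: 'smpteSt2094-40' },  // "HDR10+"
    { transferFunction: 'pq',  colorGamut: 'rec2020', metadata: 'smpteSt2094-10' },  // "Dolby Vision"
    { transferFunction: 'hlg', colorGamut: 'rec2020', metadata: undefined }          // "HLG"
];

function checkHdrCapability(cap) {
    const ok = VALID_HDR_BUCKETS.some(b =>
        b.transferFunction === cap.transferFunction &&
        b.colorGamut === cap.colorGamut &&
        b.metadata === cap.metadata);
    if (!ok) {
        // Reject combinations outside the defined buckets, as proposed above.
        throw new DOMException('Unsupported HDR configuration', 'NotSupportedError');
    }
}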

@chcunningham
Contributor

chcunningham commented Jul 18, 2019

I'd vote for the de-bucketing (separate enums for gamut, transfer, and metadata). To me it's more elegant and forward-looking.

It may also solve the issue of what to do for screen (no need to include metadata). This may make a case for doing away with the wrapper HdrCapability enum, flattening these new fields into VideoConfiguration directly. Then you can pick a handful for the screen API without needing a new wrapper (or a wrapper with parts that don't apply).

On a related note, these are all optional inputs (HDR is new), so we'll want to choose some sane defaults for these fields. I think srgb works for ColorGamut and TransferFunction. We probably need a "none" for the MetadataDescriptor.

Nit: consider renaming MetadataDescriptor to HdrMetadata?
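
Under the flattened shape, a query might look like the sketch below. The fields colorGamut, transferFunction, and hdrMetadata are the proposal under discussion here, not shipped API; omitted fields would fall back to the SDR-ish defaults suggested above.

// Sketch: HDR-related fields sit directly on VideoConfiguration.
navigator.mediaCapabilities.decodingInfo({
    type: 'file',
    video: {
        contentType: 'video/webm; codecs="vp09.02.10.10"',
        width: 3840,
        height: 2160,
        bitrate: 20000000,
        framerate: 30,
        colorGamut: 'rec2020',      // proposed; would default to 'srgb'
        transferFunction: 'pq',     // proposed; would default to 'srgb'
        hdrMetadata: 'smpteSt2086'  // proposed rename; would default to 'none'
    }
}).then(info => console.log('supported:', info.supported));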

@vi-dot-cpp
Contributor Author

@mwatson2 @chcunningham @jernoble @jyavenard I made a PR (#124) that reflects points brought up in this thread. I would appreciate it if you all could review it -- many thanks.

@jernoble
Contributor

Is this an actually useful addition to VideoConfiguration? I.e., are there any decoders that can otherwise decode the underlying frames, but are unable to meaningfully read the HDR metadata? I was under the impression that the HDR information was a container-level concept, and not a codec one. Decoders are happy to decode encoded media data, and don't really care about the interpretation of the color values emitted by the decoder; that's left to the renderer and the display hardware.

@isuru-c-p

I was under the impression that the HDR information was a container-level concept, and not a codec one.

Dynamic HDR metadata is typically inserted into the compressed bitstream (e.g in HEVC, the metadata is inserted into the bitstream via SEI messages).

Decoders are happy to decode encoded media data, and don't really care about the interpretation of the color values emitted by the decoder; that's left to the renderer and the display hardware.

While this is correct, in the case of dynamic HDR metadata (and also static HDR metadata in many cases), the decoder needs to be able to parse the metadata from the compressed bitstream (in order to pass the metadata through to the renderer / display hardware).

@jernoble
Contributor

In that case, do we need to specify each subtype of metadata to query whether the decoder supports each individually, or would it be sufficient to add a "HDR" boolean to VideoConfiguration, and mandate that decoders which advertise support for "HDR" must be able to provide all the necessary metadata to the renderer and display hardware? In other words, could we make this an 'all-or-nothing' check?

@jyavenard
Member

Having thought further about it, I'm concerned that querying the display capabilities gives too much fingerprinting ability.
After all, all that really matters as far as content is concerned is that the decoding side of things is handled properly.
And there are still benefits to receiving HDR content even with an SDR screen.

Regardless of what we add to VideoConfiguration, it appears to me that we'll never cover all cases anyway. So I kind of like an HDR bool that is all or nothing and only in VideoConfiguration.

@jernoble av1 has all the information you typically find in the container, in the frame header: colorspace, range, primaries, coefficients, transfer characteristics etc (the way all codecs should have been :))

@jpiesing

I'm confused about this proposal to have an HDR boolean.
How does an app provider know which content to offer the user if they don't know whether the user can consume HLG10, vanilla PQ10, or PQ10 with one of the 3 variations of dynamic mapping metadata?
HLG10 is the only one of these that is backwards compatible with SDR.

I'm also confused about the idea of fingerprinting using this data. If all recent Apple products support one particular set of technologies, how much fingerprinting data does that provide? If all 2019 TV sets from Samsung support one particular set of technologies and all 2019 TV sets from LG support a different set, how much fingerprinting data does this provide?

@jernoble
Contributor

How does an app provider know which content to offer the user if they don't know whether the user can consume HLG10, vanilla PQ10, PQ10 with one of the 3 variations of dynamic mapping metadata?

My comment was only about VideoCapabilities, which is a proxy for the decoding system, which in turn doesn’t care about PQ vs. HLG.

I'm also confused about the idea of fingerprinting using this data. If all recent Apple products support one particular set of technologies, how much fingerprinting data does that provide?

The danger with HDR comes from being able to query the abilities of the display. Even for devices with built-in screens, they can be plugged into external monitors with different capabilities. Those combinations of capabilities can be extremely unique.

@GurpreetV

We want to alleviate fingerprinting concerns.

A proposal to just add a boolean for HdrCapability to both MediaCapabilities and Screen would satisfy the fingerprinting concerns without compromising the scenarios as they appear.

To give more info, adding the same boolean to Screen would be important to give website developers the flexibility to decide whether to serve HDR content based on whether the display supports it. Having the ability to make this decision is important because there can be power (for the user agent) and network (for the content provider) implications if the content provider serves HDR content to a Screen that does not support it. So as long as we give them the ability to make this decision consciously, it is good enough.

We would still want to keep the HdrMetadata for the reasons @isuru-c-p mentioned above:

While this is correct, in the case of dynamic HDR metadata (and also static HDR metadata in many cases), the decoder needs to be able to parse the metadata from the compressed bitstream (in order to pass the metadata through to the renderer / display hardware).

@jpiesing

The danger with HDR comes from being able to query the abilities of the display. Even for devices with built-in screens, they can be plugged into external monitors with different capabilities. Those combinations of capabilities can be extremely unique.

We want to alleviate fingerprinting concerns.

I'm sorry but I don't understand these. My employer is a large manufacturer of monitors and TVs. I asked a colleague about information carried in HDMI and what might be vulnerable. What he said was the following:

Regarding HDMI and HDR specifically, displays can expose the following:
• Display primaries (RGBW x,y coordinates)
• Supported colorimetry (e.g. BT.2020 RGB)
• Supported HDR transfer functions (e.g. ST2084, HLG)
• Display luminance (expressed as desired max, desired min, desired max frame-average for optimal rendering by the display).
• Supported dynamic HDR metadata (Dolby Vision, HDR10+, SL-HDR2).
• Detailed private data for Dolby Vision if supported.

The primaries used are often not the precise display panel primaries, but just those of BT.709 or DCI-P3.

TVs don’t expose luminance information to my knowledge.
Some monitors do expose the luminance information, as I think it is required for the VESA DisplayHDR logo program.

Using the closest typical/generic values (e.g. min 0.05, max 1000cd/m2) for luminance instead of precise device-specific values could hamper fingerprinting.

Over HDMI, the HDR-related information in the EDID that most facilitates fingerprinting would be the Dolby Vision data. As it has to be display-specific, it is very precise.

I don't believe the intent was ever to expose the detailed data specific to the technology he mentions.

What am I missing?

@jernoble
Contributor

Generally speaking, it only requires 33 bits of entropy in order to uniquely identify a user by fingerprinting (2^33 ≈ 8.6 billion, more than the world's population), and these bits of entropy are cumulative. So the concern is not that exposing detailed device information alone will be able to uniquely identify a user, but that in combination with all the other sources of entropy available, pages will be able to do so. "Does this display support HDR or not?" is one bit of entropy [1] (out of 33 total). "Does this display support ST2084 but not HLG?" is another two. "Does this display support Dolby Vision, but not HDR10+ and SL-HDR2?" is another three. "What is the display luminance?", if expressed as a floating point number, could be as many as 32 bits of entropy.

[1] This is a theoretical maximum amount of entropy. If everyone in the world gave the same answer to that question, it wouldn't really add fingerprinting risk. So it's not as useful to be able to determine that "this user is on an iPhone", which isn't very unique, as it is "this user is attached to a LG model 27UK650 external display and their brightness setting is 55".

@jernoble
Contributor

There's a lot more information in the Privacy WG's Fingerprinting Guidance document.

@GurpreetV

Given the concerns about fingerprinting, and given that we really don't need more than a boolean to represent HDR support for the major scenarios, I think we no longer need to discuss having more granular data representing the device and can keep it simple. I have updated the pull request with the new proposal. @jernoble, @chcunningham, @jpiesing, @jyavenard, @mwatson2, can you see if the latest pull request seems aligned with your thinking? We will add a similar boolean for HDR support to Screen.

@chcunningham
Contributor

chcunningham commented Jul 26, 2019

Help me clarify the meaning of the boolean. The latest PR update defines it as "hasHdrMetadata". I think we want to avoid having to say "we support all forms of HDR metadata". Can we reliably infer the type of metadata from the VideoConfiguration contentType? DV has its own contentType strings (e.g. codecs="dvhe.05.07"), so we could infer SMPTE ST 2094-10. But the other codecs are not tightly bound to SMPTE ST 2086, SMPTE ST 2094-40, or whatever future metadata spec arises. What does the boolean mean for non-DV codecs?

Giving it a clear meaning is good to avoid ambiguity about future metadata formats. But also, even for known formats, UAs are likely to support just a subset. I can't predict whether Chrome will ever support DV. I also expect support for ST2094-40 to be spotty for many UAs for some time.

Re: fingerprinting, the Chrome security team's position is nuanced. Please have a read here. In short, I'm happy to consider alternatives to the buckets above, but I'm not personally worried that these APIs are meaningful additions to the required 33 bits.

@jyavenard
Member

In short, I'm happy to consider alternatives to the buckets above, but I'm not personally worried that these APIs are meaningful additions to the required 33 bits.

I think you should be. Having said that, I don't think the decision on whether such a feature is acceptable from a fingerprinting standpoint should be left to a single person.

Maybe this is something we can put on the agenda for when the new Media WG meets at the next TPAC.

@jpiesing

Given the concerns about fingerprinting, and given that we really don't need more than a boolean to represent HDR support for the major scenarios,

Again I apologise if I'm missing something but please can you point me to where the major scenarios are documented and where this analysis is recorded? Thanks.

@chcunningham
Contributor

Maybe this is something we can put on the agenda for when the new Media WG meets at the next TPAC.

Definitely (see you there). Meanwhile, let's keep discussing how a boolean would work. See my questions above; it's not clear to me that it's viable yet.

@chcunningham
Contributor

chcunningham commented Aug 30, 2019

That said, I think it's worth noting these as potential mitigations in the spec itself, but as a non-normative note, rather than a normative section.

Agree with @mwatson2 and @jernoble - I prefer not to formally require a particular mitigation. New/improved mitigations will arise and each UA will do it differently. For example, the latest thinking in Chrome-land is to use a "privacy budget" that throttles/blocks calls to the API above a certain threshold (to distinguish fingerprinting from legitimate use).

  1. It needs to be query based in that it only returns the single bool per input set
  2. Normative spec prose regarding fingerprint impact

Do these remaining points imply a change to the spec/PR (vs just forming points of agreement)? IIUC, #1 is already true. We have a nod to #2 here - @jernoble do you think this should be amended (e.g. more complete description of the fingerprinting surface)?


Switching gears for a sec, I want to return to some discussion of the colorGamut property that came up near the end of our recent meeting. Quick summary:

  • colorGamut's values are borrowed from CSS media queries,
  • CSS is describing the attached screen, whereas we mean to describe what colors the UA's video stack understands (can "render")
  • VP9 and AV1 codec strings describe colors using color primaries, color matrix, and eotf. They do so using the values from ISO/IEC 23001-8:2016 (free).

Picking back up with new info/questions

  • @mwatson2 was part of the vp9 codec string discussion (recommended reading) and supported using those "code points"
  • on the call there was some discussion about whether support for a matrix could be inferred by support for primaries. I'm not savvy enough to say what the gotchas might be. I note @mwatson2 advocated to include the matrix in the vp9 string. Please discuss :)
  • big picture: I'm getting the sense that ISO_IEC_23001-8_2016 is a preferred language for describing video color. Would folks be supportive of a reference to that spec for eotf, color primary, and perhaps matrix coefficient enums?

@gregwhitworth

gregwhitworth commented Aug 31, 2019

Do these remaining points imply a change to the spec/PR (vs just forming points of agreement)? IIUC, #1 is already true. We have a nod to #2 here - @jernoble do you think this should be amended (e.g. more complete description of the fingerprinting surface)?

Yep, was just trying to get a clear resolution on it all so we can put a wrap on this issue. Let's add number 2 to the PR. Regarding colorGamut, I think it would be best to keep this discussion solely to the HDR fingerprinting issue, and since it seems like we're gaining consensus on adding the spec prose/top-level browsing context, let's resolve and close this issue. I've opened a separate issue to flesh out the issues/questions you've outlined for colorGamut in #130

@vi-dot-cpp can you add the following to the PR:

  1. spec text regarding the fingerprinting surface of this. And based on the feedback from @chcunningham @jernoble and @mwatson2 avoid outlining any mitigations.
  2. Restrict it to the top level browsing context

Thanks for the quick responses and feedback.

@chcunningham
Contributor

chcunningham commented Sep 3, 2019

  1. Restrict it to the top level browsing context

This is also a mitigation. Please don't add this to the PR.

I think it would be best to keep this discussion solely to the HDR fingerprinting issue, and since it seems like we're gaining consensus on adding the spec prose/top-level browsing context,

@gregwhitworth can we keep it here? This issue is as much about the interface (including enum values) as it is about fingerprinting concerns. As-is, the PR would add a colorGamut property to MediaCapabilities that does not yet exist. A handful of folks were concerned this is not quite right, so we should get consensus on that before landing a PR to add it.

@chcunningham
Contributor

chcunningham commented Sep 4, 2019

Greg closed the separate issue (thanks). @mwatson2 @jernoble @jpiesing interested to continue the discussion re: colorGamut vs ISO_IEC_23001-8_2016. See my earlier comment.

@gregwhitworth

gregwhitworth commented Sep 4, 2019

can we keep it here?

@chcunningham that's fine, this thread already spans numerous issues, so let's keep it here.

With regards to your feedback on colorGamut, let's tackle the CSS one first, the CSS spec states:

The color-gamut media feature describes the approximate range of colors that are supported by the UA and output device. That is, if the UA receives content with colors in the specified space it can cause the output device to render the appropriate color, or something appropriately close enough.

This implies that they're overloading color-gamut for both the rendering capabilities and the display capabilities. That said, a bit further down when defining the color spaces it says:

The output device can support approximately the sRGB gamut or more.

So this seems to contradict the first item as you stated and is only about the display, not the rendering capabilities and the display. I can file an issue and follow up with the CSSWG on a call following TPAC to see which direction they intended. We can either amend our spec to build on top of theirs, or see if they'll adjust the color space definitions to align with the earlier paragraph, as it doesn't make sense to go down a code path for a color space that the display can support but the UA can't adequately render. I personally think that we want to adjust the spec to the following (for all of the color space definitions):

The output device and the UA can support approximately the sRGB gamut or more.

Would that be sufficient?
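
For reference, the CSS feature in question is already observable from script today, which is why it matters whether it describes the display, the UA, or both. A standard matchMedia query, shown purely for illustration:

// Query the existing CSS color-gamut media feature from JavaScript.
const p3 = window.matchMedia('(color-gamut: p3)').matches;
const rec2020 = window.matchMedia('(color-gamut: rec2020)').matches;
console.log({ p3, rec2020 });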

@gregwhitworth

@chcunningham @mwatson2 @jernoble @jpiesing I presume I should move forward with opening an issue on the CSSWG to fix the contradictions between their propdef of color-gamut and that of the color space definitions; correct?

@mwatson2

mwatson2 commented Sep 6, 2019

Regarding whether we need to separately specify matrix coefficients: to completely make sense of decoded pixel data you need to know the full range flag, eotf, matrix coefficients and color primaries:

  • the full range flag and eotf specify the mapping between integer values and linear light
  • the matrix coefficients specify the mapping between YCbCr and RGB (see the sketch after this list)
  • the color primaries tell you exactly what colors R, G and B are
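
To make the matrix-coefficients bullet concrete, here is a sketch of the 8-bit limited-range BT.709 conversion step (illustrative only; BT.2020 uses Kr = 0.2627, Kb = 0.0593 instead, which is exactly why the decoder must know which matrix applies):

// Sketch: 8-bit limited-range BT.709 Y'CbCr -> non-linear R'G'B'.
// The eotf (first bullet) then maps R'G'B' to linear light.
function ycbcrToRgb709(Y, Cb, Cr) {
    const Kr = 0.2126, Kb = 0.0722, Kg = 1 - Kr - Kb;
    const y  = (Y  - 16)  / 219;  // limited ("TV") range: Y in 16..235
    const cb = (Cb - 128) / 224;  // chroma in 16..240, centered at 128
    const cr = (Cr - 128) / 224;
    const r = y + 2 * (1 - Kr) * cr;
    const b = y + 2 * (1 - Kb) * cb;
    const g = (y - Kr * r - Kb * b) / Kg;
    return [r, g, b];
}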

When labelling a video stream, the values of all of these things are known and there is little reason not to declare them all in the codec string. This is just accurate labeling of a stream.

For capability discovery we can get away with a smaller set when it is known that all devices support all relevant values of one of these. Many of the values for color primaries and matrix coefficients in the codec-independent code points document are not relevant in a web context. Specifically, we only care about SDR (709) and BT.2020 for color primaries and there is only one matrix coefficients value used with 709.

I am actually not sure whether it is the case that only one value of the full range flag is used in practice or whether devices universally support both values, but I infer from the lack of problems related to this flag that one of these is true ;-) Same for the two values of matrix coefficients associated with BT.2020, though I do know here that the 'constant luminance' one is not widely supported if at all.

So, for capability discovery we are probably fine with TF and color primaries. Matrix coefficients could be added later on if someone has support for BT.2020 constant luminance and wants that to be discoverable. But this is not so likely to happen as I doubt people will want to double up their streams for the small benefit this option provides.

@jernoble
Contributor

jernoble commented Sep 6, 2019

@mwatson2 said:

When labelling a video stream, the values of all of these things are known and there is little reason not to declare them all in the codec string.

We’ve been down this road before with EME. Existing codec strings don’t carry this information, and bodies that standardize them are very resistant to putting stream characteristics into the codec string. So not only will this not work for existing codecs and containers, it’s unlikely to work universally for future codecs and containers as well. I don’t think we’re going to be able to get away with putting all this information into the content type.

@vi-dot-cpp
Contributor Author

vi-dot-cpp commented Sep 6, 2019

Thanks everyone for the feedback. Based on our discussion, I have updated #124 to include the following:

  • transferFunction and colorGamut (as defined by 'color-gamut' in CSS Media Queries) in VideoConfiguration.
  • updated fingerprinting information in the nonnormative fingerprinting section.

The update is based on:

  • @jernoble's fingerprinting analysis, and our agreement that best practices have been met.
  • The discussion regarding colorGamut. Namely that the CSS Media Queries definition covers display and UA (but may need updates for the individual color space definitions), and color primaries are sufficient for web capabilities detection.

@gregwfreedman

I think we're conflating the color-gamut media-query and the ColorGamut enum.

The color-gamut media-query takes a ColorGamut enum as input and tests support by the UA and the output device. The ColorGamut enum values only represent a color space, nothing more. It is the color-gamut media-query which is returning device information for a given color space.

The proposal here is to add the ColorGamut enum to represent a color space, without the color-gamut semantics.

@gregwhitworth

@gregwfreedman valid point that it's an enum and not necessarily what's doing the evaluation of support. That said, I went ahead and filed an issue with the CSSWG spec and they'll be fixing it to reflect rendering & display. w3c/csswg-drafts#4281

@vi-dot-cpp you should be able to either change your PR for this to be a note or remove the description altogether because the CSS spec will be the definition you're expecting.

@chcunningham
Contributor

chcunningham commented Sep 8, 2019

FYI, I'll be largely out of office next week as I head to Japan and squeeze in some tourism before TPAC. Looking forward to a f2f chat!

The proposal here is to add the ColorGamut enum to represent a color space, without the color-gamut semantics.

This is how I understood the proposal. Just want to make sure it has everything we need. Interested to hear @mwatson2 come back on @jernoble's last comment.

@vi-dot-cpp - the PR presently says "The ColorGamut represents the color gamut supported by the UA and output device." I follow that this is the CSS wording, but we should somehow call out that calls to decodingInfo() actually aren't checking the output device. IIUC the plan has been to leave output device queries to the Screen API, meaning color gamut for decodingInfo() is purely a question of what the UA supports.

@vi-dot-cpp
Contributor Author

vi-dot-cpp commented Sep 9, 2019

(@chcunningham)
"...but we should somehow call out that calls to decodingInfo() actually aren't checking the output device. IIUC the plan has been to leave output device queries to the Screen API, meaning color gamut for decodingInfo() is purely a question of what the UA supports."

Correct me if I misunderstand -- will there not be UAs for which decodingInfo() checks the attached screen, e.g., Cast?

Looking forward to a f2f chat!

Some of us will regrettably miss this opportunity; is calling in an option?

@mwatson2

mwatson2 commented Sep 9, 2019

@jernoble wrote:

We’ve been down this road before with EME. Existing codec strings don’t carry this information, and bodies that standardize them are very resistant to putting stream characteristics into the codec string. So not only will this not work for existing codecs and containers, it’s unlikely to work universally for future codecs and containers as well. I don’t think we’re going to be able to get away with putting all this information into the content type.

The VP9 and AV1 codec strings carry this information, but I understand others don't. Let me clarify my point though: I was not proposing we use codec strings for capability discovery past the identification of the codec that is common. I was pointing out the difference between describing stream properties and discovering capabilities, since someone had mentioned that I had argued for matrix coefficients as an item in the VP9 codec string, but in this discussion I think we don't need it.

If you are describing stream properties, then these are just descriptive values and you might as well include everything to be fully descriptive. When discovering capabilities the task may be simplified by known facts of the form "all implementations that support X also support Y" or "no implementation exists that supports P with Q". We don't need to separately specify matrix coefficients for discovery since there is only one relevant value for each color gamut.

Also, in future, if necessary, new capability discovery fields can be added when new capabilities are added to an implementation but it would be much harder to add a field to the codec string since that has no forwards compatibility mechanism and is embedded in many implementations.

@poolec

poolec commented Sep 10, 2019

Just reviewing the PR and trying to understand what is now being proposed, this text seems ambiguous:

The hasHdrCapabilities member represents all HDR-relevant color gamuts (sRGB, p3, rec2020) and transfer function (sRGB, pq, hlg).

Does hasHdrCapabilities mean all of sRGB, p3 and rec2020 need to be supported and all of sRGB, pq and hlg need to be supported as the current text implies? Or is it intended to be a query covering all capabilities that is considered supported if at least one HDR-relevant color gamut and transfer function is supported (in which case, why list sRGB)?

If we're aiming to have just one boolean then I can see pros and cons with either interpretation and which is best rather depends on how likely it is that a device will support some but not all of the capabilities listed.

At the very least, the wording needs tightening to be clear what is being described.

@chcunningham
Contributor

chcunningham commented Sep 18, 2019

Correct me if I misunderstand -- will there not be UAs for whom decodingInfo() checks the attached screen, e.g., Cast?

This is true, but I think we have to be careful about when we explicitly mention the screen to avoid confusing the reader. The current language makes it sound as if we will only return support for rendering a specific color gamut if the attached screen also supports outputting this gamut. We want to avoid that coupling (having screen output capabilities addressed by the Screen API).

When I mentioned the Cast example earlier this was to motivate the inclusion of eotf. In these cases, the line between the display and UA is blurred. There will also be cases where the UA software runs entirely within the display (Smart TVs). But we don't need to bring attention to this fact in the spec because it isn't important for sites to know and it implies the coupling I mention above. IMO the way to draw the line is to continue to separate Screen vs Decoding+Rendering such that we only put things on Screen that were traditionally Screen properties (before screens started building in computers) - things like dimensions, color gamut, hdr support. SmartTVs that act as a UA + Display can continue to answer the non-Screen decodingInfo() questions in the same way we would for a traditional desktop + display.

@vi-dot-cpp
Contributor Author

It was nice to speak with everyone at the TPAC face-to-face and get agreement on this issue. #124 has been updated to reflect suggestions surfaced here and at TPAC.

@rdoherty0

I don't like adding vendor-specific names to specifications, so I'm hesitant to enshrine "DolbyVision" into Media Capabilities. I proposed something similar in #110, but using transfer function, color space, and bit depth.

I realize this is a comment from some time ago, but it may be important to note that Dolby Vision is a superset of SMPTE 2094-10, particularly when it comes to OTT video distribution. See https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-profiles-levels_v1.3.2.pdf

I believe this is why the vendor strings were chosen for Android: https://developer.android.com/reference/android/view/Display.HdrCapabilities.html

@jernoble
Contributor

@rdoherty0, could you clarify: I don’t see any reference to SMPTE 2094-10 in that document, only SMPTE 2086.

When you say “superset”, do you mean that the bitstream carries multiple metadata formats at the same time? Or that the bitstream is capable of carrying one out of a defined set of metadata formats? The “BL signal cross-compatibility ID” section seems to indicate the latter.

@rdoherty0

@rdoherty0, could you clarify: I don’t see any reference to SMPTE 2094-10 in that document, only SMPTE 2086.

When you say “superset”, do you mean that the bitstream carries multiple metadata formats at the same time? Or that the bitstream is capable of carrying one out of a defined set of metadata formats? The “BL signal cross-compatibility ID” section seems to indicate the latter.

There is a lot to unpack here, unfortunately. Your second statement is closer to the truth: there is one complete metadata set per stream. There is more documentation from Dolby here which documents the inclusion of Dolby Vision streams into various formats (DASH, for example): https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-for-creative-professionals.html#5

The 2094-10 metadata is used in several standards-based efforts, including ATSC and DVB, and is specified in the DASH-IF IOP spec. But most Dolby Vision profiles extend this metadata, including the composing metadata specified in the ETSI specification (https://www.etsi.org/deliver/etsi_gs/CCM/001_099/001/01.01.01_60/gs_CCM001v010101p.pdf), which does reference SMPTE 2094-10.

Most online distribution is using Dolby Vision profiles 5 or 8.1.

I would suggest none of this complexity needs to be exposed at this API layer; the simple existence bit as proposed is OK. But it would not be accurate to label the Dolby Vision "family" of HDR metadata as SMPTE 2094-10.

@chcunningham
Contributor

chcunningham commented Oct 18, 2019

Celebrate!!! PR #124 is merged! This includes all the bits we agreed to in this discussion and at TPAC. It does not include the Screen API changes that are still under discussion.

I'm going to close this out and file a separate issue to see if we should make any revision for the points raised by @rdoherty0.

Thanks everyone!
