Contributing source(s) missing "voice" activity flag #263

Closed
robin-raymond opened this Issue Nov 2, 2015 · 4 comments


@robin-raymond
Contributor

There is an audio level and a csrc/ssrc, but no flag indicating voice activity, even though that flag is contained within the packet.

NOTE: This flag is only contained in the "client to mixer" extension, so maybe it should not be added. Peer to peer, however, it would be available, so there might be some value in exposing it to the programmer. It could be "unset" for values obtained via "mixer to client" and "set" for values obtained via "client to mixer" (i.e. when the value arrives peer to peer).

@aboba
Contributor
aboba commented Dec 26, 2015

I think there are two issues here:

  1. How does an RtpSender turn on the "V" bit? Currently RTCRtpHeaderExtensionParameters has no support for extension mechanisms such as the "vad" extension (which can be "on" or "off" according to RFC 6464 Section 4). In a situation where the SFU discards packets without the "V" bit set (since they don't include voice there is no need to forward them), it is necessary for the browser to be able to set the "vad" extension to "on".

Do we need to add parameters to RTCRtpHeaderExtensionParameters to allow header extension parameters such as the "vad" parameter to be set? For example:

partial dictionary RTCRtpHeaderExtensionParameters {
Dictionary parameters;
};

Similarly, do we need to add capabilities to RTCRtpHeaderExtension to indicate what header extension parameters are supported?

partial dictionary RTCRtpHeaderExtension {
Dictionary parameters;
};

  2. Is there value in adding support for the "V" bit to the RTCRtpContributingSource dictionary?

Since the "V" bit does not exist in RFC 6465 (mixer to client extension), this question only arises for the peer-to-peer case, where the client-mixer extension (RFC 6464) is used.

If the use case is to set a level indicator, I don't think the "V" bit is valuable - regardless of whether the bit is on/off, the browser would just use the audioLevel value to indicate the level of energy coming from that peer.
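As a rough illustration of the level-indicator use case, the sketch below maps an audioLevel value to a 0..1 meter reading without consulting the "V" bit. It assumes audioLevel follows the RFC 6464/6465 convention (an integer 0..127 expressing -dBov, with 127 treated as silence); the function name audioLevelToMeter is hypothetical and not part of any API discussed here.

```typescript
// Hypothetical helper: convert an RFC 6464-style audio level (-dBov, 0..127)
// into a linear amplitude in the range 0..1 for a level meter.
// Assumption: 0 is the loudest value and 127 represents silence.
function audioLevelToMeter(audioLevel: number): number {
  if (!Number.isInteger(audioLevel) || audioLevel < 0 || audioLevel > 127) {
    throw new RangeError("audioLevel must be an integer in 0..127");
  }
  if (audioLevel === 127) {
    return 0; // treat the minimum level as silence
  }
  // -dBov to linear amplitude: 10^(dB / 20), where dB = -audioLevel
  return Math.pow(10, -audioLevel / 20);
}
```

A caller would feed the result straight into a meter widget; the voice activity flag plays no part in this calculation.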

Robin Raymond filed Issue 263:
#263

From RFC 6464 Section 3:

In addition, a flag bit (labeled "V") optionally indicates whether
the encoder believes the audio packet contains voice activity. If
the V bit is in use, the value 1 indicates that the encoder believes
the audio packet contains voice activity, and the value 0 indicates
that the encoder believes it does not. (The voice activity detection
algorithm is unspecified and left implementation-specific.) If the V
bit is not in use, its value is unspecified and MUST be ignored by
receivers. The use of the V bit is signaled using the extension
attribute "vad", discussed in Section 4.
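To make the quoted layout concrete, here is a minimal sketch of decoding the single data byte of the client-to-mixer audio level extension. It assumes the "V" flag occupies the most significant bit and the level the remaining seven bits, as laid out in RFC 6464 Section 3; the names parseAudioLevelByte and AudioLevelExtension are hypothetical.

```typescript
// Hypothetical result type for the decoded extension byte.
interface AudioLevelExtension {
  // undefined when the V bit is not in use (vad=off), since it MUST be ignored
  voiceActivity: boolean | undefined;
  // audio level in -dBov, 0..127
  level: number;
}

// Decode the one data byte of the ssrc-audio-level header extension.
// Assumption: bit 7 is the V flag, bits 0..6 are the level.
function parseAudioLevelByte(data: number, vadInUse: boolean): AudioLevelExtension {
  if (!Number.isInteger(data) || data < 0 || data > 0xff) {
    throw new RangeError("data must be a single byte (0..255)");
  }
  return {
    voiceActivity: vadInUse ? (data & 0x80) !== 0 : undefined,
    level: data & 0x7f,
  };
}
```

Passing vadInUse as false models the "MUST be ignored by receivers" rule: the level is still returned, but the flag is reported as undefined rather than guessed.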

From RFC 6464 Section 4:

The URI for declaring this header extension in an extmap attribute is
"urn:ietf:params:rtp-hdrext:ssrc-audio-level".

It has a single extension attribute, named "vad". It takes the form
"vad=on" or "vad=off". If the header extension element is signaled
with "vad=on", the V bit described in Section 3 is in use, and MUST
be set by senders. If the header extension element is signaled with
"vad=off", the V bit is not in use, and its value MUST be ignored by
receivers. If the vad extension attribute is not specified, the
default is "vad=on".

An example attribute line in the Session Description Protocol (SDP)
for a conference might hence be:
a=extmap:6 urn:ietf:params:rtp-hdrext:ssrc-audio-level vad=on

The vad extension attribute only controls the semantics of this
header extension attribute, and does not make any statement about
whether the sender is using any other voice activity detection
features, such as discontinuous transmission, comfort noise, or
silence suppression.
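The signaling rule quoted above (explicit "vad=on" or "vad=off", defaulting to "on" when absent) can be sketched as a small parser over an extmap line. This is an illustration only, not a full SDP parser; the function name vadInUse is hypothetical, and it assumes the attribute tokens are separated by whitespace as in the example line.

```typescript
const AUDIO_LEVEL_URI = "urn:ietf:params:rtp-hdrext:ssrc-audio-level";

// Hypothetical helper: report whether the V bit is in use for an extmap line.
// Returns undefined if the line does not declare the audio level extension.
function vadInUse(extmapLine: string): boolean | undefined {
  const [prefix, uri, ...attrs] = extmapLine.trim().split(/\s+/);
  if (prefix === undefined || uri === undefined) {
    return undefined;
  }
  if (prefix.indexOf("a=extmap:") !== 0 || uri !== AUDIO_LEVEL_URI) {
    return undefined;
  }
  for (const attr of attrs) {
    if (attr === "vad=on") return true;
    if (attr === "vad=off") return false;
  }
  // Per the quoted text, the default when "vad" is not specified is "vad=on".
  return true;
}
```

For the example line in the quoted text, vadInUse("a=extmap:6 urn:ietf:params:rtp-hdrext:ssrc-audio-level vad=on") evaluates to true.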

@aboba aboba added a commit that referenced this issue Dec 27, 2015
@aboba aboba Add support for header extension attributes
Fix for Issue #263
919ec9e
@robin-raymond
Contributor

But should it also have this peer to peer?

dictionary RTCRtpContributingSource {
             DOMHighResTimeStamp timestamp;
             unsigned long       source;
             byte                audioLevel;
             boolean             vad;  // <-- add this?
};

This would allow a P2P app to know active speaker so the proper video gets put into primary focus (e.g. useful for telepresence).
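A minimal sketch of the active-speaker idea, assuming the vad field proposed above exists and that audioLevel follows the RFC 6464 convention (lower value means louder). The ContributingSource interface mirrors the dictionary in this comment, and pickActiveSpeaker is a hypothetical application-level helper, not part of any API.

```typescript
// Mirrors the dictionary proposed above; vad is the field under discussion.
interface ContributingSource {
  timestamp: number;
  source: number;
  audioLevel: number; // assumed -dBov, 0 (loudest) .. 127 (silence)
  vad?: boolean;
}

// Hypothetical helper: among sources whose voice flag is set, return the
// identifier of the loudest one, or undefined if nobody is speaking.
function pickActiveSpeaker(sources: ContributingSource[]): number | undefined {
  let best: ContributingSource | undefined;
  for (const s of sources) {
    if (s.vad !== true) {
      continue; // flag unset or absent: not considered a speaker
    }
    if (best === undefined || s.audioLevel < best.audioLevel) {
      best = s;
    }
  }
  return best === undefined ? undefined : best.source;
}
```

An application could call this periodically and move the video associated with the returned source into primary focus.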

@robin-raymond
Contributor

Also, about byte? audioLevel: I don't think it needs to be nullable; it can simply be absent when not set.

@aboba aboba added a commit that referenced this issue Jan 5, 2016
@aboba aboba Contributing source(s) missing voice activity flag
Fix for Issue #263
4d00930
@aboba aboba added a commit that referenced this issue Jan 13, 2016
@aboba aboba Clarify Sender/Receiver behavior of the "vad" attribute
Clarification relating to Issue #263
78592d5
@aboba
Contributor
aboba commented Jan 15, 2016

January 14, 2016: The issue was discussed at the WEBRTC WG virtual interim. Adding support for the "V" bit was accepted, but the v attribute was renamed to voiceActivityFlag.

WebRTC 1.0 API PR: w3c/webrtc-pc#454

@aboba aboba added a commit that referenced this issue Jan 15, 2016
@aboba aboba Change v attribute to voiceActivityFlag
To sync with WebRTC 1.0 PR: w3c/webrtc-pc#454

Fix for Issue #263
dff24f4
@aboba aboba closed this Jan 19, 2016