Incoming media prior to Remote Fingerprint Verification #200

aboba opened this Issue May 16, 2015 · 17 comments


None yet

6 participants

aboba commented May 16, 2015

At the ORTC CG meeting on May 13, Justin pointed out that passing incoming media to an RtpReceiver prior to verification of the remote fingerprint could permit a DTLS man-in-the-middle attack. So while we recognize the potential for incoming media in the DTLS "connecting" state, there probably should be a prohibition on passing decrypted media to an attacked RtpReceiver until the "connected" state is reached.

@aboba aboba added the 1.1 label May 16, 2015

Does RTCRtpReceiver need to have a trusted vs untrusted state (because packets should go all the way through decoder so decoder is in the correct state)?

aboba commented May 26, 2015
aboba commented Jun 3, 2015

Should untrusted media should be run through the decoder or instead should be buffered in the RtpReceiver until the fingerprint is verified? If the media is originated by an attacker, it could be an attempt to exploit a decoder vulnerability (e.g. buffer overflow). In that case, decoding the media might allow a decoder vulnerability to be exploited.

rshpount commented Jun 3, 2015

Normally when media is received, the ICE binding request from that sourse
should already be received and validated, so we can verify that media is
coming from a source that did receive our session description. has also
passed the successful DTLS handshake. Only its certificate is not validated
yet. I am not sure why sources that passed all of those hurdles are more
likely to exploit the decoder vs the ones for which certificate is

On the other hand, depending on the amount of buffered media, processing
buffered media would cause a CPU usage spike when DTLS certificate is
validated and decoder will try to catch up. This can have negative effect
on other streams.

Finally, buffering media or passing it through and not playing can be an
implementation detail. As far as protocol users are concerned, it should
look the same with an exception of statistics. As far as statistics is
concerned, I would think that it should be gathered on the un-authenticated
media regardless of either it is not processed and not played or buffered.

Roman Shpount

On Wed, Jun 3, 2015 at 1:29 PM, aboba wrote:

Should untrusted media should be run through the decoder or instead should
be buffered in the RtpReceiver until the fingerprint is verified? If the
media is originated by an attacker, it could be an attempt to exploit a
decoder vulnerability (e.g. buffer overflow). In that case, decoding the
media might allow a decoder vulnerability to be exploited.

Reply to this email directly or view it on GitHub
#200 (comment).


@rshpount because after you have validated the certificate you know for certain the media is coming from the correct party you are attempting to communicate to. Without this a proxy that is doing a man-in-the-middle attack could allow the ICE username fragment/password to flow through but substitute its own DTLS/media in that flow. Therefore an attacker could end up flowing evil packets into decoders vs only allowing packets into decoders that are coming from a more trusted source without ever having seen the session description.


Straw man for discussion:
After the DTLS handshake exchange completes (but before the remote fingerprint is verified) incoming media packets may be received. A modest buffer must be provided to avoid loss of media prior to remote fingerprint validation (which can begin after start() is called).

rshpount commented Jun 4, 2015

@robin-raymond If proxy is running man-in-the-middle attack it can just as well substitute the fingerprint. I understand that identity check can catch this, but no one is currently doing identity checks. Bottom line -- decoder must be secure on its own and cannot rely on DTLS to verify that the media is safe. If it can be crashed by malicious payload, there are plenty of scenarios when it will be crashed by media which would pass all the ICE, DTLS and identity checks.

My suggestion that we should specify that media should not be played before DTLS remote certificate is validated. Media should be buffered or preprocessed to make sure that media playback can be started as soon as the remote fingerprint is received and the media source is authenticated.


@rshpount the difference is that in one case the attacker must intercept the signalling path (which is typically protected via TLS and a trusted root) vs another case where they can act as a proxy and attack traffic without ever seeing or having knowledge of the signalling path. Having said that I do a agree the decoder does need to be solid.

It's also a question of difficulty. By allowing the packets to flow into a decoder and pre-priming all the decoder states, this requires there be information sent along the pipeline that the information is to be decoded (jitter, timing, decoding, rendering surface adapted) but the final render should be muted. So I would hesitate to demand this be a requirement rather than just requiring the packets be buffered. So language like you suggested around "either/or" is fine as long as the packets do not render to an output.

@robin-raymond robin-raymond pushed a commit that referenced this issue Jun 22, 2015
Robin Raymond Addressed Philipp Hancke's review comments, as noted in: Issue #198
Added the "failed" state to RTCIceTransportState, as noted in: Issue #199
Added text relating to handling of incoming media packets prior to remote fingerprint verification, as noted in: Issue #200
Added a complete attribute to the RTCIceCandidateComplete dictionary, as noted in: Issue #207
Updated the description of RTCIceGatherer.close() and the "closed" state, as noted in: Issue #208
Updated Statistics API error handling to reflect proposed changes to the WebRTC 1.0 API, as noted in: Issue #214
Updated Section 10 (RTCDtmfSender) to reflect changes in the WebRTC 1.0 API, as noted in: Issue #215
Clarified state transitions due to consent failure, as noted in: Issue #216
Added a reference to [FEC], as noted in: Issue #217
juberti commented Jun 24, 2015

I agree with Roman's suggestion. We should never play out media for which we are unsure about its security properties. However, we should decode when appropriate to make sure we don't lose critical information.


I agree with Justin that we need to be able to decode media received before the verification is complete. Otherwise, we could be dropping an iframe, which could delay the first rendered frame significantly, among other things.

@aboba aboba added the 1.0 label Sep 21, 2015

Related to:

If packets are allowed to flow through to the RtpListener, this could cause unhandled events for unverified media.


Hi robin-raymond,

downloaded your code, but It is not working for me , throwing errors, please help me.



@PradeepJuloori This is old shim code. A proper shim is now part of the webrtc adapter.js project. I'll have to remove this from the repo (but this is a separate issue).


ORTC Lib does not allow packets to flow until the validated state is complete. ORTC Lib does do some modest buffering to prevent key video frames from being lost, but it was considered too much a security risk to allow any pre-rendering.


TODO - advice for receiver's track: mention that media must pass DTLS validated before media can be rendered to the receiver's output track.

@aboba aboba self-assigned this Jul 13, 2016
@aboba aboba added a commit that referenced this issue Jul 14, 2016
@aboba aboba Incoming media prior to Remote Fingerprint Verification
Fix for Issue #200
aboba commented Jul 21, 2016

Peter: say that implementations must not drop the I-frame (to a degree).


Fixed by merger.

@aboba aboba was unassigned by dontcallmedom Aug 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment