In calls with DTX enabled, periodic "noise bursts" sometimes occur during the DTX period. The issue appears when the background noise is slightly non-stationary. In my experience the problem is common and can be triggered by, for example, ventilation noise.
To illustrate the problem I have encoded a 10-second file containing speech and slowly increasing noise.
I encoded the file with opus_demo from master as of May 8th 2018 (commit 1b58446). I used DTX and the highest complexity mode.
./opus_demo voip 48000 1 32000 -complexity 10 -dtx original.raw encoded.raw
libopus 1.3-beta-33-g1b584467
Encoding 48000 Hz input at 32.000 kb/s in auto bandwidth with 960-sample frames.
average bitrate: 15.400 kb/s
maximum bitrate: 64.400 kb/s
active bitrate: 25.930 kb/s
bitrate standard deviation: 17.448 kb/s
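For orientation, the settings in this log translate into frame-level numbers with simple arithmetic. The sketch below uses only values quoted in this report (48000 Hz, 960-sample frames, a 32 kb/s target, and one DTX update every 420 ms); the helper names are made up for illustration and are not part of opus_demo.

```python
def frame_duration_ms(frame_samples: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz


def bytes_per_frame(bitrate_bps: int, frame_samples: int, sample_rate_hz: int) -> float:
    """Average payload size of one frame when encoding at exactly bitrate_bps."""
    return bitrate_bps * frame_samples / (8 * sample_rate_hz)


frame_ms = frame_duration_ms(960, 48000)        # length of one frame
payload = bytes_per_frame(32000, 960, 48000)    # payload at the target bitrate
frames_per_dtx_update = 420 / frame_ms          # frames covered by one DTX interval

print(frame_ms, payload, frames_per_dtx_update)
```

So each frame covers 20 ms, and one DTX interval of 420 ms spans 21 frame periods, of which only one is actually transmitted.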
The issue can easily be seen in the following spectrograms. The first spectrogram shows the original file with the encoded file below.
The reason for these clicks is a mismatch between two voice activity detectors: the Opus VAD and the Silk VAD.
The Opus VAD decides when to go into DTX. During DTX, a packet containing an update of the background noise is transmitted every 420 ms. If the VAD in the Silk layer of the codec considers the signal to be active (frame type TYPE_UNVOICED or TYPE_VOICED instead of TYPE_NO_VOICE_ACTIVITY), the decoder conceals the DTX region using packet loss concealment (PLC) instead of pure comfort noise generation (CNG). This causes a noise burst every time a packet is decoded.
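The mismatch can be summarised as a small truth table. The following is a simplified model of the behavior described above, not code from libopus: the function names are hypothetical, and the numeric values given to the frame-type constants are placeholders.

```python
# Frame types named in this report; the numeric values are placeholders.
TYPE_NO_VOICE_ACTIVITY = 0
TYPE_UNVOICED = 1
TYPE_VOICED = 2


def gap_fill_mode(silk_frame_type: int) -> str:
    """Simplified model of how the decoder fills the gap after a DTX update."""
    if silk_frame_type == TYPE_NO_VOICE_ACTIVITY:
        return "CNG"  # pure comfort noise
    return "PLC"      # concealment of a frame marked active, heard as a burst


def causes_noise_burst(opus_vad_active: bool, silk_frame_type: int) -> bool:
    """True when the Opus VAD enters DTX but Silk marked the frame as active."""
    in_dtx = not opus_vad_active
    return in_dtx and gap_fill_mode(silk_frame_type) == "PLC"


print(causes_noise_burst(False, TYPE_UNVOICED))           # the two VADs disagree
print(causes_noise_burst(False, TYPE_NO_VOICE_ACTIVITY))  # the two VADs agree
```

Only the first case, where the Opus VAD reports inactivity while Silk marks the frame as active, produces the burst.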
I have created two competing pull requests: #84 and #87. The two PRs solve the issue slightly differently.
PR #84 avoids DTX when the Opus and Silk VADs do not agree.
Pros of #84:
Behaves similarly to lower complexity modes (where only the Silk VAD is used).
Cons of #84:
Results in higher bit-rates (in this example DTX is hardly used at all).
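As a rough sketch of the idea behind #84 (not the actual patch), the DTX decision can be modelled as requiring both detectors to report inactivity. The function names and boolean inputs below are assumptions made for illustration.

```python
def dtx_allowed_master(opus_vad_active: bool, silk_vad_active: bool) -> bool:
    """Model of the behavior described for master: only the Opus VAD decides."""
    return not opus_vad_active


def dtx_allowed_pr84(opus_vad_active: bool, silk_vad_active: bool) -> bool:
    """Model of #84: enter DTX only when both VADs agree there is no activity."""
    return not opus_vad_active and not silk_vad_active


# The problematic case: Opus VAD says inactive, Silk VAD says active.
print(dtx_allowed_master(False, True))  # DTX is entered, Silk frame is active
print(dtx_allowed_pr84(False, True))    # DTX is skipped, frame is sent normally
```

With slowly increasing noise the Silk VAD reports activity often, which is consistent with DTX being used so rarely in this example.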
Encoding with opus_demo from pull request 84:
./opus_demo_pr84 voip 48000 1 32000 -complexity 10 -dtx original.raw encoded_pr84.raw
libopus 1.3-beta-34-g2e635837
Encoding 48000 Hz input at 32.000 kb/s in auto bandwidth with 960-sample frames.
average bitrate: 31.296 kb/s
maximum bitrate: 64.400 kb/s
active bitrate: 33.355 kb/s
bitrate standard deviation: 6.355 kb/s
Spectrogram showing the file encoded with PR84
PR #87 passes the result of the Opus VAD to Silk. If the Opus VAD reports no activity, the maximum value of the Silk VAD is clamped to just below the activity threshold. The Silk encoder then produces a frame with type TYPE_NO_VOICE_ACTIVITY.
Pros of #87:
Does not alter the decision of when to enter DTX (same bit-rate as master)
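The clamping in #87 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the function names are hypothetical, the activity value is modelled as a plain integer, and the threshold of 13 is a placeholder chosen only for the example.

```python
ACTIVITY_THRESHOLD = 13  # placeholder value, for illustration only


def clamp_silk_activity(silk_activity: int, opus_vad_active: bool,
                        threshold: int = ACTIVITY_THRESHOLD) -> int:
    """Cap the Silk activity value just below the threshold when the
    Opus VAD reports no activity; otherwise leave it unchanged."""
    if not opus_vad_active and silk_activity >= threshold:
        return threshold - 1
    return silk_activity


def silk_frame_is_active(silk_activity: int,
                         threshold: int = ACTIVITY_THRESHOLD) -> bool:
    """In this model a frame is active when its value reaches the threshold."""
    return silk_activity >= threshold


clamped = clamp_silk_activity(200, opus_vad_active=False)
print(clamped, silk_frame_is_active(clamped))
```

Because the clamp only applies when the Opus VAD reports no activity, the decision of when to enter DTX stays with the Opus VAD, and frames during real speech are untouched.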
Encoding with opus_demo from pull request 87:
./opus_demo_pr87 voip 48000 1 32000 -complexity 10 -dtx original.raw encoded_pr87.raw
libopus 1.3-beta-34-gdbc27362
Encoding 48000 Hz input at 32.000 kb/s in auto bandwidth with 960-sample frames.
average bitrate: 15.386 kb/s
maximum bitrate: 64.400 kb/s
active bitrate: 25.941 kb/s
bitrate standard deviation: 17.431 kb/s
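To put the three logs side by side, the average bitrates can be compared relative to master. The numbers below are copied from the opus_demo output in this report; the dictionary keys and the helper name are just labels chosen for this sketch.

```python
# Average bitrates in kb/s, copied from the opus_demo logs in this report.
average_kbps = {"master": 15.400, "pr84": 31.296, "pr87": 15.386}


def relative_change(value: float, baseline: float) -> float:
    """Change of value relative to baseline, as a fraction of the baseline."""
    return (value - baseline) / baseline


for name in ("pr84", "pr87"):
    change = relative_change(average_kbps[name], average_kbps["master"])
    print(f"{name}: {change:+.2%} average bitrate relative to master")
```

For this file, #84 roughly doubles the average bitrate, while #87 differs from master by less than 0.1%.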
Spectrograms of the original file (first image) and of the file encoded with master (second image):
![original](https://user-images.githubusercontent.com/5707617/39763965-c8badb08-52de-11e8-9574-c8b0f8b4d73f.png)
![encoded](https://user-images.githubusercontent.com/5707617/39763978-d0654320-52de-11e8-8096-290b6d9bcc15.png)
Spectrogram showing the file encoded with PR84
![encoded_pr84](https://user-images.githubusercontent.com/5707617/39764012-e5b31f54-52de-11e8-9e43-cbdac649514a.png)
Cons of #87:
Slightly more code than #84.
Spectrogram showing the file encoded with opus_demo from pull request 87:
![encoded_pr87](https://user-images.githubusercontent.com/5707617/39764030-f2f8798e-52de-11e8-8ec8-70247e892c5c.png)
I made all audio files available here (raw and wav formats):
https://drive.google.com/drive/folders/1wY-_yz5I44QTccmV0lFohTWHVIGH2Vqx
Conclusion:
Both pull requests solve the issue, but the DTX behavior of #87 is closer to the current DTX behavior. I suggest that #87 be merged to master and #84 be withdrawn.