Unexpected continuous noise when combining NoLACE with DTX #351

Open
j-schultz opened this issue Jun 13, 2024 · 9 comments

Comments

@j-schultz

Are NoLACE and DTX supposed to be usable together? I have observed that when turning on NoLACE, as soon as the stream switches to DTX mode, the decoded 0/1-byte packets start generating some noise which sounds like it could be the tail of the last word that was said.

Here's an example of this noise (normalized to make it obvious): opus dtx+nolace.zip

I suppose we could simply mute the output of the decoder if we know we're in DTX mode, but this smells like a bug to me.

@janpbuethe

Thanks for reporting this, @j-schultz. This does indeed look like a bug, though it is more likely related to neural PLC (DTX is handled by the PLC module, and NoLACE is not active in this case). I tried a few files myself but could not reproduce the issue. Could you share an input file that triggers it? It would also be interesting to know whether the problem is present with dec_complexity = 5 (i.e. neural PLC active and enhancement inactive).

@j-schultz
Author

It does happen with both decoding complexity 5 and 7. I'll see if I can get a minimal example put together - as we are streaming live audio with raw opus frames between clients, I'm not sure how comparable this is to using the file-based opus demo.

I also checked whether different encoding parameters could influence the result...

  • Both application type OPUS_APPLICATION_VOIP and OPUS_APPLICATION_AUDIO expose the issue
  • Encoding complexity: Tried two different values (5 and 8), no difference
  • Does not matter if inband FEC is enabled or not

Apart from that, we force a frame duration of 20 ms, and obviously DTX is enabled.

@janpbuethe

Thanks @j-schultz. In that case it's indeed rather neural PLC that's causing the issue (looping @jmvalin in). There could be many reasons for this to happen (DTX triggered during active speech, feature prediction going wrong in neural PLC, missing buffer update etc.) so it's crucial to find a file that triggers it.

Apart from this, we should probably revise DTX handling at the decoder in general. Handling it with neural PLC means that we run a relatively expensive neural vocoder to generate silence, which is quite wasteful. I will kick off this discussion in https://www.irccloud.com/irc/libera.chat/channel/opus

What you could try as a temporary fix is to set dec_complexity to 0 during DTX and back to 7 once the first active frame is received. That should solve the noise problem and would also save you some complexity.

@j-schultz
Author

Thanks for the suggestion, I applied the temporary workaround and that does seem to do the trick for now.

@j-schultz
Author

Actually, I might have spoken too soon: while the (incorrect) work of the PLC can no longer be heard with this change, I still get a faint clicking sound every 400 ms even though the source signal is 100% digital silence. So I think I'll wait for a proper fix before turning on NoLACE.

@jmvalin
Member

jmvalin commented Jun 17, 2024

When in DTX mode, the encoder will send a "refresh" (or keepalive) packet every 400 ms to update the decoder noise estimate. Maybe that's what's causing the issue. Are you also setting dec_complexity to 0 on that one?
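
As a quick sanity check on the timing: with the 20 ms frame duration mentioned earlier in the thread, a 400 ms refresh interval works out to one refresh packet per 20 frame slots. The helper below is a hypothetical illustration of that arithmetic only; `frames_per_refresh` is not part of libopus.

```c
/* Hypothetical helper (not a libopus function): number of frame slots
 * between two DTX refresh packets.
 * Returns -1 if either input is not positive, or if the refresh
 * interval is not a whole multiple of the frame duration. */
int frames_per_refresh(int refresh_ms, int frame_ms)
{
    if (refresh_ms <= 0 || frame_ms <= 0 || refresh_ms % frame_ms != 0)
        return -1;
    return refresh_ms / frame_ms;
}
```

For example, `frames_per_refresh(400, 20)` yields 20, which matches a click that recurs every 400 ms when the refresh packet is the only one decoded at full complexity.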

@j-schultz
Author

j-schultz commented Jun 17, 2024

For testing I set the complexity to 7 for every successfully received packet and to 0 for any missing packet. So the first packet of the DTX interval still has a complexity of 7. I will change this so that if the packet indicates the start of a DTX phase, it will already reduce the complexity to 0.

Edit: That did the trick.
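
The handling described in this thread can be sketched as a small state tracker. This is only an illustration under assumptions: `packet_kind`, `dtx_tracker`, and `dtx_tracker_update` are hypothetical names, and how the application decides that a packet signals DTX (including the 400 ms refresh packets) is left to the caller. It follows the suggestion of using complexity 0 during DTX and 7 once the first active frame arrives, so a packet lost during active speech still gets complexity 7.

```c
#include <stdbool.h>

/* Hypothetical classification of one frame slot; deriving it from the
 * received data is the application's job and is not shown here. */
typedef enum {
    PACKET_ACTIVE,   /* regular encoded audio frame */
    PACKET_DTX,      /* packet that starts or refreshes a DTX phase */
    PACKET_MISSING   /* nothing arrived for this frame slot */
} packet_kind;

typedef struct {
    bool in_dtx;     /* true while the stream is believed to be in DTX */
} dtx_tracker;

enum { COMPLEXITY_DTX = 0, COMPLEXITY_ACTIVE = 7 };

/* Updates the DTX state for the current frame slot and returns the
 * decoder complexity to use for it. A DTX packet lowers the complexity
 * immediately, an active packet restores it, and a missing packet
 * keeps whatever state was in effect before. */
int dtx_tracker_update(dtx_tracker *t, packet_kind kind)
{
    if (kind == PACKET_DTX)
        t->in_dtx = true;
    else if (kind == PACKET_ACTIVE)
        t->in_dtx = false;
    return t->in_dtx ? COMPLEXITY_DTX : COMPLEXITY_ACTIVE;
}
```

The returned value would be applied to the decoder before decoding the slot. With libopus that is presumably a decoder ctl for the complexity setting, which is an assumption here, since the thread only shows the `-dec_complexity` option of opus_demo.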

@jmvalin
Member

jmvalin commented Jun 24, 2024

Is there a file and exact command line I can use to reproduce the problem?

@j-schultz
Author

Here's a RAW sample file, together with the decoded result that I receive: sample.zip

Encoding command line: opus_demo.exe -e voip 48000 1 25000 -complexity 8 -dtx -framesize 20 withsilence.raw withsilence.opus
Decoding command line: opus_demo.exe -d 48000 1 -dec_complexity 7 withsilence.opus withsilence.decoded.raw

Opus has been built with the following CMake configuration: cmake -DOPUS_BUILD_PROGRAMS=ON -DOPUS_DEEP_PLC=ON -DOPUS_DRED=ON -DOPUS_OSCE=ON -DOPUS_DNN=ON -DBUILD_SHARED_LIBS=OFF
