opus-ng Deep PLC seems to have worse audio quality than LPCNet PLC #306

wumaster · 2023-12-14T12:53:38Z

Hi, I have tested neural PLC using different NN models. The opus-ng Deep PLC seems to have worse audio quality than the Opus LPCNet PLC. How can I improve the PLC quality?

jmvalin · 2023-12-17T23:16:56Z

Can you explain a bit more here?

wumaster · 2023-12-18T04:07:20Z

Hi, I'm very happy to get your reply. I am very interested in your work and would like to use your PLC and FEC methods, so recently I have been studying the opus-ng code.
I have tested three PLC methods: SILK PLC, LPCNet PLC, and FARGAN PLC. FARGAN PLC seems to generate more artifacts than SILK and LPCNet PLC. I also compared them with subjective tests and PESQ scores: LPCNet PLC had better quality than the others, and FARGAN PLC sometimes got a worse result than SILK PLC (more audible artifacts). My guess is that FARGAN, as a vocoder, has worse audio quality than LPCNet.

jmvalin · 2023-12-18T05:36:16Z

Actually, fargan as a vocoder gives better quality than LPCNet. Can you provide the two commits you're comparing, what command line you're using, along with the input and output files so that we can reproduce what you're getting?

wumaster · 2023-12-18T06:32:44Z

Thank you very much for the reply. Please give me a moment to prepare these materials.

wumaster · 2023-12-18T08:36:55Z

The LPCNet test branch is https://github.com/xiph/opus/tree/neural_plc, base commit 4e46ccd. I modified opus_demo.c so that it accepts a loss file as input. The command line is: ./opus_demo voip 16000 1 64000 -use_lost_file -complexity 5 arctic_a0023_16k.pcm out_plc.pcm arctic_a0023_16k_is_lost.txt
The FARGAN test branch is https://github.com/xiph/opus/tree/opus-ng, built with ./configure --enable-deep-plc, base commit 591c8ba. The command line is: ./opus_demo voip 16000 1 64000 -lossfile arctic_a0023_16k_is_lost.txt -dec_complexity 10 -complexity 5 arctic_a0023_16k.pcm out_plc.pcm. I use encoder complexity 5 to make sure that only the SILK encoder/decoder is used.
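To line up the listening positions in this comment with the loss pattern, it can help to turn per-packet loss flags into time intervals. The following is a minimal sketch, not part of opus_demo: it assumes the loss file has already been read into an int array with one entry per packet (nonzero meaning lost), and that every packet carries the same duration (20 ms is the opus_demo default frame size; pass another value if you used -framesize). The function name find_lost_runs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: collapse per-packet loss flags into runs of lost
 * packets and report each run as a [start, end) interval in seconds.
 * Assumes loss[i] != 0 means packet i was lost and that every packet
 * lasts frame_ms milliseconds. Stores at most max_runs intervals but
 * returns the total number of runs found. */
size_t find_lost_runs(const int *loss, size_t n, double frame_ms,
                      double *start_s, double *end_s, size_t max_runs)
{
    assert(frame_ms > 0.0);
    size_t runs = 0;
    size_t i = 0;
    while (i < n) {
        if (!loss[i]) {
            i++;                      /* received packet: skip */
            continue;
        }
        size_t first = i;             /* first packet of this lost run */
        while (i < n && loss[i])
            i++;                      /* i ends one past the run */
        if (runs < max_runs) {
            start_s[runs] = (double)first * frame_ms / 1000.0;
            end_s[runs] = (double)i * frame_ms / 1000.0;
        }
        runs++;
    }
    return runs;
}
```

For example, with 20 ms packets the flags 0,1,1,0,0,1 give two lost intervals, [0.02, 0.06) s and [0.10, 0.12) s, which are the spans to focus on when comparing out_plc.pcm files.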
The results show that FARGAN PLC generates more audio content than SILK or LPCNet PLC, but it can also produce more artifacts than the other PLC methods.
test_and_res_pcm.zip
At 2.0 s, 3.7 s, 5.5 s, ..., FARGAN PLC generates more pitched content, even though the original signal is not pitched there. I think we could solve this by using SILK PLC instead of FARGAN PLC when concealing lost signal of type TYPE_UNVOICED or TYPE_NO_VOICE_ACTIVITY.
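The fallback proposed above can be sketched as a small decision function. This illustrates the proposal in this comment and is not code taken from opus-ng: the enum plc_method and the function choose_plc_method are hypothetical, and the TYPE_* values are assumed to match SILK's silk/define.h (a real integration would include SILK's own definitions rather than redefine them). Because the type of the lost frame itself is unknown, the sketch also assumes the decoder can report the signal type of the last correctly received frame.

```c
#include <assert.h>

/* Assumed to mirror SILK's signal types in silk/define.h. */
#define TYPE_NO_VOICE_ACTIVITY 0
#define TYPE_UNVOICED          1
#define TYPE_VOICED            2

/* Hypothetical choice of concealment method for one lost frame. */
typedef enum { PLC_SILK, PLC_NEURAL } plc_method;

/* Sketch of the proposal: only run the neural (FARGAN) PLC after voiced
 * speech, and fall back to the classic SILK PLC after unvoiced or
 * inactive frames, so no pitched content is synthesized there. */
plc_method choose_plc_method(int last_signal_type)
{
    assert(last_signal_type >= TYPE_NO_VOICE_ACTIVITY &&
           last_signal_type <= TYPE_VOICED);
    if (last_signal_type == TYPE_VOICED)
        return PLC_NEURAL;
    return PLC_SILK;
}
```

Keying on the last good frame is the simplest possible rule; it will mis-handle losses that span a voicing transition, so it is a starting point for experiments rather than a complete fix.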
At 3.0 s, FARGAN PLC generates some artifacts. The other methods also produce artifacts there, but FARGAN's are easier to hear, which is why FARGAN PLC sometimes gets worse subjective test scores.
We tested many files, and the problems above seem to occur in other files as well.

wumaster · 2023-12-18T08:51:16Z

I also tested with the 2022 PLC Challenge test set, using the clean signals and loss files. The results show that LPCNet PLC gets a higher PLCMOS score.

jmvalin · 2023-12-18T19:19:53Z

There are hundreds of changes between the two points you're comparing (not just the switch from LPCNet to FARGAN). Are you able to narrow it down further?

wumaster · 2023-12-19T03:13:23Z

Sorry, I have only been studying your code for a short time, and for now I can't work out the detailed differences between the two PLC algorithms. I just tested both, and the results showed that FARGAN PLC sometimes gets worse results in both PLCMOS and our subjective tests. If I may ask, what were your research team's test results comparing the two PLC algorithms?
Here are the clean speech, loss file, and PLC results for one file from the PLC Challenge test data (54.wav); in our subjective tests, FARGAN PLC got worse results. The command lines are the same as above.
plc-challenge-54.zip
I'm currently trying to figure out what causes the differences.

jmvalin · 2023-12-19T08:25:57Z

I was just saying that if you have some time, it may be useful to look at intermediate versions between the two you tested. There have been many other changes between the two, including a different pitch estimator, a smaller feature predictor, etc. In terms of objective results, we don't use PLCMOS because we've found it to be unreliable in the past. I'll still see if I can find anything.

wumaster · 2023-12-19T09:29:16Z

OK, thanks a lot. I need to spend more time looking into the details of the two. In my tests, FARGAN PLC sometimes generates more artifacts (more harmonic noise) than SILK or LPCNet PLC. I think decoder information such as the signal type could help FARGAN generate fewer artifacts.

jmvalin · 2023-12-19T21:46:58Z

If you want to see just the effect of FARGAN, you could test commit d1c5b32, which is just before FARGAN got added.

wumaster · 2023-12-20T03:53:35Z

Thanks a lot!

mklingb · 2024-01-11T00:08:51Z

I did some investigation and found some commits where I think there is a regression. I only did subjective listening on the arctic_a0023_16k.pcm example. On the opus-ng branch, the original LPCNet PLC is at 4414db0.

First potential regression is seen at 2d98ced. I notice that some of the PLC includes a bit more pitched content mixed in. I think it actually sounds fine but it is a change. I didn't run PESQ or PLCMOS on this.

Next potential regression is f0ec990. Here there are some strange choices of pitch, and again the pitched (voiced) segments are louder.

All of these predate the changeover to FARGAN. There is an additional possible regression somewhere between f0ec990 and 591c8ba, but I haven't tracked it down yet.

There were changes to the PLC predictor and pitch models prior to the switch to FARGAN, so we're going to be looking at these as well as other possible root causes.

wumaster · 2024-01-11T11:00:02Z

thanks!

jmvalin · 2024-01-17T07:20:35Z

Still looking into this, but can you give the exp_plc_fix1 branch (commit c1b80a7) a try and let me know?

wumaster · 2024-01-22T08:49:46Z

OK, I'm a little busy these days; I'll test it soon.

jmvalin · 2024-01-22T08:58:47Z

Well, you can now compare against the latest commit on opus-ng, which includes the changes from exp_plc_fix1 and more.

wumaster · 2024-01-22T09:42:39Z

I just tested the new commit. The pitch-like content seems to have decreased, but the problem is still there.
test_and_res_pcm-1_22.zip

It seems that the network judges the wrong signal type, whereas LPCNet and SILK PLC get the signal type right.

zhangshengoo · 2024-01-25T11:20:40Z

May I ask whether there are any papers that introduce FARGAN?

jmvalin · 2024-01-25T19:53:53Z

There's no paper on FARGAN -- yet.

jmvalin · 2024-01-25T19:57:37Z

So one of the things that is known to be a bit worse in the new PLC is that, for complexity reasons, the context is no longer updated when there's no loss; only the most recent history is kept. You could still try increasing the size of that history buffer to make it behave more like the old version. It's easy to do by editing the dnn/lpcnet_private.h file and changing this line:
#define PLC_BUF_SIZE ((CONT_VECTORS+5)*FRAME_SIZE)
You can change the "+5" into "+100" and see what happens.
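To get a feel for what the "+5" controls, the sketch below models the history as a shift buffer that always holds the most recent PLC_BUF_SIZE samples of correctly decoded audio, and reports how much time that covers. It assumes CONT_VECTORS = 5 and FRAME_SIZE = 160 samples (10 ms at 16 kHz); check dnn/lpcnet_private.h and related headers in your tree for the actual values. PLC_EXTRA_FRAMES, plc_history, plc_history_push and plc_history_ms are hypothetical names, and the real opus-ng PLC state is likely laid out differently.

```c
#include <assert.h>
#include <string.h>

/* Assumed values; verify against your checkout. */
#define CONT_VECTORS     5
#define FRAME_SIZE       160          /* 10 ms at 16 kHz (assumption) */
#define SAMPLE_RATE      16000
#define PLC_EXTRA_FRAMES 10           /* the "+5" term; set to 10 here */
#define PLC_BUF_SIZE ((CONT_VECTORS + PLC_EXTRA_FRAMES) * FRAME_SIZE)

/* Hypothetical history buffer of decoded PCM, oldest sample first. */
typedef struct {
    float pcm[PLC_BUF_SIZE];
} plc_history;

/* On each correctly received frame: drop the oldest FRAME_SIZE samples
 * and append the new frame at the end. */
void plc_history_push(plc_history *h, const float *frame)
{
    assert(h != NULL && frame != NULL);
    memmove(h->pcm, h->pcm + FRAME_SIZE,
            (PLC_BUF_SIZE - FRAME_SIZE) * sizeof(h->pcm[0]));
    memcpy(h->pcm + PLC_BUF_SIZE - FRAME_SIZE, frame,
           FRAME_SIZE * sizeof(h->pcm[0]));
}

/* Duration of audio context the buffer holds, in milliseconds. */
double plc_history_ms(void)
{
    return 1000.0 * PLC_BUF_SIZE / SAMPLE_RATE;
}
```

Under these assumed values, +5 gives 1600 samples (100 ms) of history, +10 gives 2400 samples (150 ms), and +100 gives 16800 samples (1.05 s); a larger buffer gives the model more context when a loss starts, at the cost of more computation.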

jmvalin · 2024-04-01T08:07:06Z

Increasing it to +10 seems to fix other cases where I've seen problems. See if there's still any issue now.

zhangshengoo · 2024-04-26T09:38:49Z

Hi, I have a question about the details of the FARGAN inference code. It seems that the output waveform is not centered on the input features, which differs from the description in the LPCNet paper. Are the input features centered on the frame during training, and if so, does this mismatch affect inference performance?
