opus-ng deep plc seems to have a worse plc audio quality than lpcnet plc #306

Open
wumaster opened this issue Dec 14, 2023 · 23 comments

@wumaster

Hi, I have tested neural PLC with different NN models. The opus-ng deep PLC seems to produce worse audio quality than the Opus LPCNet PLC. How can I improve the PLC quality?

@jmvalin
Member

jmvalin commented Dec 17, 2023

Can you explain a bit more here?

@wumaster
Author

Hi, I'm very happy to get your reply. I'm very interested in your work and would like to use your PLC and FEC methods, so I have recently been studying the opus-ng code.
I have tested three PLC methods: SILK PLC, LPCNet PLC, and FARGAN PLC. The FARGAN PLC seems to generate more artifacts than the SILK and LPCNet PLCs. I also evaluated them with subjective tests and PESQ scores: the LPCNet-based PLC had better quality than the others, and the FARGAN-based PLC sometimes did worse than the SILK PLC (more audible artifacts). My guess is that FARGAN has worse audio quality than LPCNet as a vocoder.

@jmvalin
Member

jmvalin commented Dec 18, 2023

Actually, fargan as a vocoder gives better quality than LPCNet. Can you provide the two commits you're comparing, what command line you're using, along with the input and output files so that we can reproduce what you're getting?

@wumaster
Author

Thanks a lot for the reply. Please give me a moment to prepare these materials.

@wumaster
Author

The LPCNet test branch: https://github.com/xiph/opus/tree/neural_plc. I modified opus_demo.c to support a loss file as input. The base commit is 4e46ccd. The command line is: ./opus_demo voip 16000 1 64000 -use_lost_file -complexity 5 arctic_a0023_16k.pcm out_plc.pcm arctic_a0023_16k_is_lost.txt
The FARGAN test branch: https://github.com/xiph/opus/tree/opus-ng. The build option is ./configure --enable-deep-plc. The base commit is 591c8ba. The command line is: ./opus_demo voip 16000 1 64000 -lossfile arctic_a0023_16k_is_lost.txt -dec_complexity 10 -complexity 5 arctic_a0023_16k.pcm out_plc.pcm. I use encoder complexity 5 to make sure only the SILK encoder/decoder is used.
The results show that the FARGAN PLC generates more audio data than the SILK or LPCNet PLC, but it can also generate more artifacts than the other PLC methods.
test_and_res_pcm.zip
At 2.0 s, 3.7 s, 5.5 s, and so on, the FARGAN PLC generates pitched content even though the original signal is not pitched. I think this could be solved by using the SILK PLC instead of the FARGAN PLC when the lost signal is of type TYPE_UNVOICED or TYPE_NO_VOICE_ACTIVITY.
At 3.0 s, the FARGAN PLC generates some artifacts. The other methods generate artifacts there too, but the FARGAN ones are easier to hear, which sometimes gives this PLC method worse subjective test scores.
We tested many files, and the problems above seem to occur in other files as well.
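
The fallback proposed above (use the classic SILK concealment when the lost signal is unvoiced or has no voice activity) could be sketched roughly as follows. This is only an illustration: `choose_plc` and the `plc_method` enum are hypothetical names that do not exist in Opus, and the numeric values of the signal-type constants are assumed to match silk/define.h.

```c
/* Signal-type labels named in the comment above. The numeric values are
 * assumed to mirror silk/define.h; check the header before relying on them. */
#define TYPE_NO_VOICE_ACTIVITY 0
#define TYPE_UNVOICED          1
#define TYPE_VOICED            2

/* Hypothetical selector, not part of the Opus code base. */
typedef enum { PLC_CLASSIC = 0, PLC_NEURAL = 1 } plc_method;

/* Pick a concealment method from the signal type of the last good frame.
 * Only voiced frames go to the neural PLC; unvoiced and inactive frames
 * fall back to the classic PLC, so no pitched content is synthesized
 * where the original had none. */
static plc_method choose_plc(int last_good_signal_type)
{
    if (last_good_signal_type == TYPE_VOICED)
        return PLC_NEURAL;
    return PLC_CLASSIC;
}
```

Whether the last good frame's type is a reliable predictor for the lost frame is an open question; the sketch only shows where such a decision would sit.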

@wumaster
Author

I also tested on the 2022 PLC Challenge test database, using the clean signals and loss files. The results show that the LPCNet PLC gets a higher PLCMOS score.
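
When comparing PLC outputs across files, it can help to state the loss pattern alongside the scores. Below is a minimal sketch that summarizes a loss file held in memory. It assumes the file is a whitespace-separated list of integers, one per frame, where a nonzero value marks a lost frame; that format is an assumption about what the modified opus_demo reads, and `loss_stats` and `summarize_loss_flags` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical summary of a loss pattern. */
typedef struct {
    int frames;        /* total number of frame flags read */
    int lost;          /* number of frames flagged as lost */
    int longest_burst; /* longest run of consecutive lost frames */
} loss_stats;

/* Parse whitespace-separated integer flags (assumed format: one flag per
 * frame, nonzero = lost) and collect simple statistics. Parsing stops at
 * the first token that is not an integer or at the end of the text. */
static loss_stats summarize_loss_flags(const char *text)
{
    loss_stats s = {0, 0, 0};
    int run = 0;
    const char *p = text;

    for (;;) {
        char *end;
        long flag = strtol(p, &end, 10);
        if (end == p)
            break; /* no further integer could be read */
        s.frames++;
        if (flag != 0) {
            s.lost++;
            run++;
            if (run > s.longest_burst)
                s.longest_burst = run;
        } else {
            run = 0;
        }
        p = end;
    }
    return s;
}
```

Burst length matters here because a PLC that sounds fine on isolated losses can behave very differently on long bursts.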

@jmvalin
Member

jmvalin commented Dec 18, 2023

There are hundreds of changes between the two points you're comparing (not just switching from LPCNet to FARGAN). Are you able to narrow it down further?

@wumaster
Author

Sorry, I have only been studying your code for a short time, and for now I can't work out the detailed differences between the two PLC algorithms. I only tested the two PLC algorithms, and the results showed that the FARGAN PLC sometimes does worse in both PLCMOS and our subjective tests. If I may ask, could you share your research team's test results comparing the two PLC algorithms?
Here are the clean speech, loss file, and PLC results for one PLC Challenge test file (54.wav). In the subjective tests, the FARGAN PLC gets worse results. The command lines are the same as mentioned above.
plc-challenge-54.zip
I'm currently trying to figure out what causes the differences.

@jmvalin
Member

jmvalin commented Dec 19, 2023

I was just saying that if you have some time it may be useful to look at intermediate versions between the two you tested. There have been many more changes between the two, including a different pitch estimator, a smaller feature predictor, etc. In terms of objective results, we don't use PLCMOS as we've seen it to be unreliable in the past. I'll still see if I can find anything.

@wumaster
Author

OK, thanks a lot. I need more time to look into the detailed differences between the two. In my tests, the FARGAN PLC sometimes generates more artifacts (more harmonic noise) than the SILK or LPCNet PLC. I think decoder information such as the signal type could help FARGAN generate fewer artifacts.

@jmvalin
Member

jmvalin commented Dec 19, 2023

If you want to see just the effect of FARGAN, you could test commit d1c5b32, which is just before FARGAN got added.

@wumaster
Author

Thanks a lot!

@mklingb
Collaborator

mklingb commented Jan 11, 2024

I did some investigation and found some commits where I think there is a regression. I only did subjective listening on the arctic_a0023_16k.pcm example. On the opus-ng branch, the original LPCNet PLC is at 4414db0.

First potential regression is seen at 2d98ced. I notice that some of the PLC includes a bit more pitched content mixed in. I think it actually sounds fine but it is a change. I didn't run PESQ or PLCMOS on this.

Next potential regression is f0ec990. Here there are some strange choices of pitch, and again the pitched (voiced) segments are louder.

All of these predate the changeover to FARGAN. There is an additional possible regression that happens somewhere between f0ec990 and 591c8ba, but I haven't tracked that down yet.

There were changes to the PLC predictor and pitch models prior to the switch to FARGAN, so we're going to be looking at these as well as other possible root causes.

@wumaster
Author

thanks!

@jmvalin
Member

jmvalin commented Jan 17, 2024

Still looking into this, but can you give the exp_plc_fix1 branch (commit c1b80a7) a try and let me know?

@wumaster
Author

OK. I'm a little busy these days, but I'll test it soon.

@jmvalin
Member

jmvalin commented Jan 22, 2024

Well, you can now compare to the latest commit on opus-ng, which has the changes from exp_plc_fix1 and more.

@wumaster
Author

wumaster commented Jan 22, 2024

I just tested the new commit. The pitch-like content has decreased, but the problem is still there.
test_and_res_pcm-1_22.zip
[image]
It seems that the network misjudged the signal type, whereas the LPCNet and SILK PLCs get the correct signal type.
[image]

@zhangshengoo

zhangshengoo commented Jan 25, 2024

May I inquire if there are any papers available that provide an introduction to FarGan?

@jmvalin
Member

jmvalin commented Jan 25, 2024

There's no paper on FARGAN -- yet.

@jmvalin
Member

jmvalin commented Jan 25, 2024

So one thing in the new PLC that is known to be a bit worse is that, for complexity reasons, the context is no longer updated when there's no loss; only the most recent history is used. You could still try increasing the size of that history buffer to make it behave more like the old version. It's easy to do by editing the dnn/lpcnet_private.h file and changing this line:
#define PLC_BUF_SIZE ((CONT_VECTORS+5)*FRAME_SIZE)
You can change the "+5" into "+100" and see what happens.
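
To get a feel for what that change means in terms of history length, here is a small sketch of the arithmetic. The values of `FRAME_SIZE` and `CONT_VECTORS` below are assumptions (160 samples, i.e. 10 ms at 16 kHz, and 5 vectors); check dnn/lpcnet_private.h and the headers it includes for the real definitions. `plc_buf_samples` and `plc_buf_ms` are hypothetical helpers, not Opus functions.

```c
/* Assumed values, for illustration only; the real definitions live in the
 * Opus dnn/ headers and may differ. */
#define FRAME_SIZE     160   /* assumed: 10 ms frames at 16 kHz */
#define CONT_VECTORS   5     /* assumed: number of continuation vectors */
#define SAMPLE_RATE_HZ 16000 /* sample rate used in the tests above */

/* History buffer length in samples for a given "+N" in PLC_BUF_SIZE. */
static int plc_buf_samples(int extra_frames)
{
    return (CONT_VECTORS + extra_frames) * FRAME_SIZE;
}

/* The same length expressed in milliseconds of audio. */
static int plc_buf_ms(int extra_frames)
{
    return plc_buf_samples(extra_frames) * 1000 / SAMPLE_RATE_HZ;
}
```

Under these assumed values, "+5" corresponds to 1600 samples (100 ms of history) and "+100" to 16800 samples (1050 ms), so the suggested experiment lengthens the history by roughly a factor of ten.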

@jmvalin
Member

jmvalin commented Apr 1, 2024

Increased to +10 seems to fix other cases where I've seen problems. See if there's any issue now.

@zhangshengoo

Hi, I have a question about the details of the FARGAN inference code. It seems that the output waveform is not centered on the input features, which differs from the description in the LPCNet paper. I am wondering whether the input features are centered on the frame during training and, if so, whether the mismatch affects inference performance.

[image]
