opus-ng deep plc seems to have a worse plc audio quality than lpcnet plc #306
Comments
Can you explain a bit more here? |
Hi, I'm very happy to get your reply. I am very interested in your work and want to use your PLC and FEC methods, and I have recently been studying the opus-ng code. |
Actually, fargan as a vocoder gives better quality than LPCNet. Can you provide the two commits you're comparing, what command line you're using, along with the input and output files so that we can reproduce what you're getting? |
Thank you a lot for the reply. Please wait a moment and let me prepare these materials. |
The LPCNet test branch: https://github.com/xiph/opus/tree/neural_plc. I have modified opus_demo.c to make it support loss-file input. Base commit is 4e46ccd. The command line is: |
I also tested the 2022 PLC challenge test database using the clean signals and loss files. The results show that the LPCNet PLC gets a higher PLCMOS score. |
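For anyone trying to reproduce this kind of loss-file test, here is a minimal sketch in C of how a per-frame loss trace could be parsed and summarized. The trace format (ASCII text, one `0` or `1` per frame, `1` meaning lost) and the function names `parse_loss_trace`, `loss_rate`, and `longest_burst` are assumptions made for illustration; this is not the actual modification made to opus_demo.c in this thread.

```c
#include <stddef.h>

/* Hypothetical loss-trace format: ASCII text with one flag per frame,
   '1' = frame lost, '0' = frame received. Any other character
   (newlines, spaces) is ignored. */

/* Parse a trace held in memory. Writes at most max_frames flags into
   lost[] and returns the number of flags written. */
size_t parse_loss_trace(const char *text, unsigned char *lost, size_t max_frames)
{
    size_t n = 0;
    for (const char *p = text; *p != '\0' && n < max_frames; p++) {
        if (*p == '0' || *p == '1')
            lost[n++] = (unsigned char)(*p - '0');
    }
    return n;
}

/* Fraction of frames marked as lost (0.0 for an empty trace). */
double loss_rate(const unsigned char *lost, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += lost[i];
    return n ? (double)count / (double)n : 0.0;
}

/* Length, in frames, of the longest run of consecutive losses. */
size_t longest_burst(const unsigned char *lost, size_t n)
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        if (lost[i]) {
            run++;
            if (run > best)
                best = run;
        } else {
            run = 0;
        }
    }
    return best;
}
```

In a decode loop, a frame flagged as lost would typically be concealed by calling `opus_decode()` with a NULL data pointer, which is how the Opus API signals packet loss. Reporting the loss rate and longest burst alongside the listening results can make it easier to compare runs across commits.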
There are hundreds of changes between the two points you're comparing (not just switching from LPCNet to FARGAN). Are you able to narrow it down further? |
Sorry, I have only been studying your code for a short time, and for now I can't work out the detailed differences between the two PLC algorithms. I simply tested both, and the results showed that the FARGAN PLC sometimes gets worse results, both in PLCMOS and in our subjective tests. If I may ask, what results did your research team get when comparing the two PLC algorithms? |
I was just saying that if you have some time it may be useful to look at intermediate versions between the two you tested. There have been many more changes between the two, including a different pitch estimator, a smaller feature predictor, etc. In terms of objective results, we don't use PLCMOS as we've seen it to be unreliable in the past. I'll still see if I can find anything. |
OK, thanks a lot. I need to take more time to look into the differences between the two. In my tests, the FARGAN PLC sometimes generates more artifacts (more harmonic noise) than the SILK or LPCNet PLC. I think decoder information such as the signal type could help FARGAN generate fewer artifacts. |
If you want to see just the effect of FARGAN, you could test commit d1c5b32, which is just before FARGAN got added. |
Thanks a lot! |
I did some investigation and found some commits where I think there is a regression. I only did subjective listening. The first potential regression is seen at 2d98ced. I notice that some of the PLC output includes a bit more pitched content mixed in. I think it actually sounds fine, but it is a change. I didn't run PESQ or PLCMOS on this. The next potential regression is f0ec990. Here there are some strange choices of pitch, and again the pitched (voiced) segments are louder. All of these predate the changeover to FARGAN. There is an additional possible regression that happens somewhere between f0ec990 and 591c8ba, but I haven't tracked that down yet. There were changes to the PLC predictor and pitch models prior to the switch to FARGAN, so we're going to be looking at these as well as other possible root causes. |
thanks! |
Still looking into this, but can you give the exp_plc_fix1 branch (commit c1b80a7) a try and let me know? |
OK, I'm a little busy these days, I'll test it soon |
Well, you can now compare to the latest commit on opus-ng, which has the changes from exp_plc_fix1 and more |
I just tested the new commit. The pitch-like content seems to have decreased, but the problem is still there. |
May I ask whether there are any papers available that introduce FARGAN? |
There's no paper on FARGAN -- yet. |
So one thing in the new PLC that is known to be a bit worse is that, for complexity reasons, the context is no longer updated when there's no loss; only the most recent history is kept. You could still try increasing the size of that history buffer to make it more similar to the old behaviour. It's easy to do by editing the dnn/lpcnet_private.h file and changing this line: |
Increasing it to +10 seems to fix other cases where I've seen problems. See if there's any issue now. |
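To illustrate the trade-off behind the history buffer suggestion, here is a minimal sketch in C of a fixed-size feature history that keeps only the most recent frames. All names and sizes (`SKETCH_NB_FEATURES`, `SKETCH_HISTORY_FRAMES`, `FeatureHistory`, `history_init`, `history_push`) are hypothetical and chosen for illustration only; they are not the definitions in dnn/lpcnet_private.h, and the actual line to edit is the one referred to in the comment above.

```c
#include <string.h>

#define SKETCH_NB_FEATURES    20  /* hypothetical feature vector size */
#define SKETCH_HISTORY_FRAMES 4   /* hypothetical history length, in frames */

/* Sliding window over the most recent feature frames.
   frames[0] is the oldest entry, frames[SKETCH_HISTORY_FRAMES-1] the newest. */
typedef struct {
    float frames[SKETCH_HISTORY_FRAMES][SKETCH_NB_FEATURES];
    int count;  /* number of valid frames, saturates at SKETCH_HISTORY_FRAMES */
} FeatureHistory;

void history_init(FeatureHistory *h)
{
    memset(h, 0, sizeof(*h));
}

/* Append one frame of features, dropping the oldest frame. */
void history_push(FeatureHistory *h, const float *features)
{
    /* Shift every stored frame one slot toward index 0. */
    memmove(h->frames[0], h->frames[1],
            (SKETCH_HISTORY_FRAMES - 1) * sizeof(h->frames[0]));
    /* Store the new frame in the last slot. */
    memcpy(h->frames[SKETCH_HISTORY_FRAMES - 1], features, sizeof(h->frames[0]));
    if (h->count < SKETCH_HISTORY_FRAMES)
        h->count++;
}
```

In a design like this, each received frame only costs a small copy, and whatever state the concealment needs is rebuilt from the window when a loss occurs. A larger window gives more context to rebuild from, at the cost of more memory and more work at the moment of loss, which would be consistent with the complexity trade-off described above.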
Hi, I have a question about the details of the FARGAN inference code. It seems that the output waveform is not centered on the input features, which differs from the description in the LPCNet paper. Are the input features centered on the frame during training, and if so, will the mismatch affect inference performance? |
Hi, I have tested the neural PLC using different NN models. The opus-ng deep PLC seems to have worse audio quality than the Opus LPCNet PLC. How can I improve the PLC quality?