My test results are not very good? #6

Open
zhanghongyong123456 opened this issue May 28, 2024 · 8 comments

Comments

@zhanghongyong123456

zhanghongyong123456 commented May 28, 2024

My driving video:
Preprocessing:
python scripts/extract_kps_sequence_and_audio.py \
    --video_path "./test_samples/short_case/10/gt.mp4" \
    --kps_sequence_save_path "./test_samples/short_case/10/kps.pth" \
    --audio_save_path "./test_samples/short_case/10/aud.mp3"

001.mp4

My reference image (screenshot, 512x512):
003

My result:
Command I ran:
python inference.py \
    --reference_image_path "./test_samples/short_case/tys/ref.jpg" \
    --audio_path "./test_samples/short_case/tys/aud.mp3" \
    --kps_path "./test_samples/short_case/tys/kps.pth" \
    --output_path "./output/short_case/talk_tys_fix_face.mp4" \
    --retarget_strategy "fix_face"

talk_emotion.mp4

I'm not sure where I went wrong. I'd appreciate some guidance.

@FurkanGozukara

I planned to make a Gradio app for this, but this result looks very bad.

@tiankuan93
Collaborator

tiankuan93 commented May 28, 2024

  1. Our model was trained on English audio, and our audio feature extractor was also trained on English, so for now the model performs more consistently on English audio. Other languages may yield reasonable results, but that will require some experimentation with the parameters.
  2. For the "fix_face" mode, we provide parameters to adjust the effect of the audio. We have also committed the default parameters in a new commit.
  • With reference_attention_weight=1.0 and audio_attention_weight=1.0, we get the same results.
  • With reference_attention_weight=0.95 and audio_attention_weight=3.0, we get the results below.
test1_aud_result_0.95_3.0.mp4
  • If we crop the reference image more appropriately and use English audio, we get the following results.
test1_crop_aud1_result_0.95_3.0.mp4
test1_crop_aud_result_0.95_3.0.mp4
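
For anyone who wants to compare several weight combinations side by side, here is a minimal sketch that only builds the command lines for such a sweep. It is an illustration under assumptions, not the project's own tooling: it assumes inference.py accepts --reference_attention_weight and --audio_attention_weight as command-line flags (this thread names the parameters but does not show how they are passed), and the build_commands helper and the output file naming are hypothetical. The paths are copied from the command earlier in this thread.

```python
import itertools
import shlex

# Arguments copied from the inference command earlier in this thread.
BASE_ARGS = [
    "python", "inference.py",
    "--reference_image_path", "./test_samples/short_case/tys/ref.jpg",
    "--audio_path", "./test_samples/short_case/tys/aud.mp3",
    "--kps_path", "./test_samples/short_case/tys/kps.pth",
    "--retarget_strategy", "fix_face",
]


def build_commands(ref_weights, audio_weights, output_dir="./output/short_case"):
    """Return one argument list per (reference, audio) weight pair.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    commands = []
    for ref_w, aud_w in itertools.product(ref_weights, audio_weights):
        # Hypothetical naming scheme so that runs do not overwrite each other.
        output_path = f"{output_dir}/talk_tys_fix_face_{ref_w}_{aud_w}.mp4"
        commands.append(BASE_ARGS + [
            "--output_path", output_path,
            # ASSUMPTION: these flag names mirror the parameter names
            # mentioned above; check inference.py before relying on them.
            "--reference_attention_weight", str(ref_w),
            "--audio_attention_weight", str(aud_w),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in build_commands([1.0, 0.95], [1.0, 3.0]):
        print(shlex.join(cmd))
```

Printing the commands rather than running them makes it easy to check the flags against the current inference.py first. Once they look right, each argument list can be passed to subprocess.run.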

@FurkanGozukara

@tiankuan93 Thanks, that's very helpful info.
Do you plan to make a Gradio demo app, or do we have to make one ourselves?

@tiankuan93
Collaborator

tiankuan93 commented May 28, 2024

@FurkanGozukara We have no plans to implement a Gradio demo app for now. Thank you for your interest.

@FurkanGozukara

Thanks. Then hopefully I will build it myself and publish it.

I hope you don't change much with the consistency LoRA, so I can implement that too.

@tiankuan93
Collaborator

The consistency LoRA only reduces the number of inference steps and doesn't change much else.

@cantonalex

cantonalex commented May 29, 2024

@zhanghongyong123456 What retarget strategy did you use for your first video? How did you get the output video uncropped to the reference face?

EDIT: Oh, that's your source video. Bummer.

@boboji21

Awesome!
