Training time #38

Open
cs20162004 opened this issue Aug 17, 2021 · 17 comments

Comments

@cs20162004

Hello and thank you for your great work!

You trained ESRNET for 1,000K iterations and ESRGAN for 400K iterations. How long did training take in your case with 4 V100 GPUs?
I am training with 2 RTX 3090 GPUs, and training ESRNET alone shows an estimate of 10 days 😕. My training dataset also includes FFHQ (i.e. DIV2K+Flickr2K+FFHQ). Maybe training on FFHQ will improve the results on human faces.
Thank you.

@tg-bomze

@cs20162004 Don't forget to resize all images to 400x400 manually.

@n00mkrad

Train with this instead; it has tons of advanced options, auto-cropping, etc.:

https://github.com/victorca25/traiNNer

@cs20162004
Author

@tg-bomze Why do I need to resize the images to 400x400 manually?

@xinntao
Owner

xinntao commented Aug 18, 2021

@cs20162004

  1. Training takes about 6-7 days for RealESRNet and 4-5 days for RealESRGAN.

  2. You can fine-tune directly from the pretrained RealESRGAN with fewer iterations (I think 100k~200k; you can see the difference). There is no need to train from scratch.

  3. Including more face images will improve its ability to restore faces. For now, I recommend using it together with GFPGAN; here is the script: https://github.com/TencentARC/GFPGAN/blob/master/inference_gfpgan.py I will also integrate GFPGAN into Real-ESRGAN.

  4. @tg-bomze I think there is no need to resize all images to 400x400.

  5. @n00mkrad Thanks for the information. I will also improve Real-ESRGAN for easier use.

If you still have training issues, please let me know.
I will later improve the repo to make training and fine-tuning easier. 😄
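
For a rough sense of what "fewer iterations" could mean in wall-clock time, the numbers from this thread can be plugged into a back-of-the-envelope estimate. This is only a sketch: it takes the 10-day estimate for 1,000K ESRNET iterations on 2 RTX 3090 GPUs reported above and assumes that a fine-tuning iteration costs the same as an ESRNET iteration. That assumption is probably optimistic, since GAN training also updates a discriminator. The function names are illustrative and not part of any library.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_iteration(total_iters: int, eta_days: float) -> float:
    """Average time per iteration implied by a total-duration estimate."""
    return eta_days * SECONDS_PER_DAY / total_iters


def days_for(iters: int, sec_per_iter: float) -> float:
    """Wall-clock days needed for `iters` iterations at a fixed rate."""
    return iters * sec_per_iter / SECONDS_PER_DAY


# Rate implied by "1,000K iterations shows 10 days" (2x RTX 3090, from the thread).
rate = seconds_per_iteration(1_000_000, 10)

# ASSUMPTION: fine-tuning runs at the same per-iteration speed.
for iters in (100_000, 200_000):
    print(f"{iters:>9,} iterations: about {days_for(iters, rate):.1f} days")
```

Under that assumption, fine-tuning for 100k~200k iterations would take a small fraction of the time needed to train from scratch on the same hardware.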

@cs20162004
Author

@xinntao
Thank you for your detailed answer!
Could you please explain what exactly you mean by integrating with GFPGAN? Do you mean running the input image through both networks (first Real_ESRGAN and then GFPGAN, or vice versa)?

@cs20162004
Author

By default, the GFPGAN network uses Real-ESRGAN on regions that don't contain a human face (maybe using a detection algorithm). But for some images that contain faces, the generated face looks unnatural, like the following:
0813_02
0813_00

@xinntao do you have any other ideas for improving Real_ESRGAN on human faces?
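
The division of labour described in this comment (a general upsampler for the background, a face restorer for detected face regions, and the restored faces pasted back) can be sketched with toy data. Everything below is a stand-in: images are plain nested lists, `upscale_nearest` replaces the real background upsampler, `restore_face` is a caller-supplied placeholder for the face model, and the face boxes are given rather than detected. It shows the control flow only, not the actual GFPGAN or Real-ESRGAN code.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width) in output pixels


def upscale_nearest(img: Image, scale: int) -> Image:
    """Stand-in for the background upsampler: nearest-neighbour enlargement."""
    return [
        [px for px in row for _ in range(scale)]
        for row in img
        for _ in range(scale)
    ]


def upscale_with_face_restoration(
    img: Image,
    scale: int,
    face_boxes: Sequence[Box],
    restore_face: Callable[[Image], Image],
) -> Image:
    """Upscale the whole image, then overwrite each face box with restored pixels."""
    out = upscale_nearest(img, scale)
    for top, left, height, width in face_boxes:
        crop = [row[left:left + width] for row in out[top:top + height]]
        restored = restore_face(crop)  # hypothetical face model, same size in and out
        for dy, restored_row in enumerate(restored):
            out[top + dy][left:left + width] = restored_row
    return out
```

In a pipeline shaped like this, an unnatural-looking face comes from the face-restoration step rather than from the background upsampler, so retraining only the upsampler would not be expected to change those regions.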

@xinntao
Owner

xinntao commented Aug 19, 2021

> @xinntao
> Thank you for your detailed answer!
> Could you please explain what do you exactly mean by integrating with GFPGAN? Do you mean: run the input image on both networks (first on Real_ESRGAN and then on GFPGAN or vice versa)?

Yes, your understanding is right~

@xinntao
Owner

xinntao commented Aug 19, 2021

> GFPGAN network by default uses Real-ESRGAN on regions that don't contain human face (using detection algorithm maybe). But for some images containing face, the generated face image looks unnatural. Like the following:
>
> @xinntao do you have any other idea to improve Real_ESRGAN for human face?

These failures are limitations of GFPGAN.
Training with human faces will improve Real-ESRGAN performance on faces.
Another way is to improve the GFPGAN performance.

We also want to improve Real-ESRGAN's performance on human faces by utilizing more face data.
I think you can also contribute to Real-ESRGAN if you want to, or if you obtain better results 😄

BTW, could you please share with me the original faces that GFPGAN failed on in your examples? (Email: xintao.wang@outlook.com)

@cs20162004
Author

Sure.
I guess the pretrained model you shared (RealESRGAN_x4plus.pth) contains only the Generator's weights. If I want to fine-tune from your pretrained model, I will also need the Discriminator's weights.
Could you please share them if you have them?

@xinntao
Owner

xinntao commented Aug 19, 2021

@cs20162004
Sure, I will release the Discriminator.

@Nidish96

Nidish96 commented Dec 1, 2021

@xinntao Thanks a lot for this repository!
I've been trying to train RealESRGAN on my own image dataset for a very specific application. Is there any way for me to check whether iterations are progressing? I ran it and it printed a lot of text; here are the last few lines:

2021-12-01 07:02:47,942 INFO: Loading UNetDiscriminatorSN model from experiments/pretrained_models/RealESRGAN_x4plus_netD.pth, with param key: [params].
2021-12-01 07:02:47,964 INFO: Loss [L1Loss] is created.
2021-12-01 07:02:49,612 INFO: Loss [PerceptualLoss] is created.
2021-12-01 07:02:49,648 INFO: Loss [GANLoss] is created.
2021-12-01 07:02:49,678 INFO: Model [RealESRGANModel] is created.
2021-12-01 07:02:50,093 INFO: Start training from epoch: 0, iter: 0

It has been stuck like this for the past 5-6 hours. Will the output progress further?
I checked the directory it created under "experiments"; it contains a log file with exactly the lines above (the same text that was written to stdout). The directory also has three sub-directories (models, training_states, visualization), all of which are completely empty.

I am using the "finetune_realesrgan_x4plus.yml" file, modified to point to my data directory. I'm running it on 4 GPUs (Tesla P100-SXM2).

Please let me know if there's anything I might be doing wrong.
Thank you.
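
One thing that may be worth ruling out when training appears stuck at iteration 0 is a dataset list that does not resolve to real files. The sketch below assumes the dataset is described by a meta-info text file with one image path per line, relative to the ground-truth data root, which is how the fine-tuning config appears to be set up; that layout is an assumption here and should be checked against the config actually in use. `missing_entries` is an illustrative helper, not part of Real-ESRGAN or BasicSR.

```python
from pathlib import Path
from typing import List


def missing_entries(meta_info: Path, data_root: Path) -> List[str]:
    """Return the meta-info entries that do not point to a file under data_root.

    ASSUMPTION: each non-empty line starts with an image path relative to
    data_root; anything after the first whitespace is ignored.
    """
    missing = []
    for line in meta_info.read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rel_path = fields[0]
        if not (data_root / rel_path).is_file():
            missing.append(rel_path)
    return missing
```

Calling it with the meta-info path and data root from the modified config and getting an empty list back would at least confirm that every listed image exists; a long list of missing paths would point to a path problem rather than a training problem.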

@Doris1887

@Nidish96 Hey, I ran into the same problem. Please let me know if you find a solution.

@Nidish96

Nidish96 commented Dec 7, 2021

@Doris1887 I haven't solved it, but I found something. In "train.py" in basicsr, the iteration loop starts at line 154, where the training data (the variable "train_data") is fetched through "prefetcher.next()". This always seems to be "None" and I don't understand why. I've checked the dataset path, etc.
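
To illustrate what a `None` from `prefetcher.next()` could mean, here is a simplified stand-in for a prefetcher. It assumes the common pattern in which `next()` returns `None` once the underlying loader is exhausted; it is not BasicSR's actual class. Under that pattern, a `None` on the very first call means the loader produced no batches at all.

```python
from typing import Any, Iterable, Optional


class SimplePrefetcher:
    """Toy prefetcher: wraps an iterable and returns None when it is exhausted."""

    def __init__(self, loader: Iterable[Any]) -> None:
        self._loader = loader
        self._iterator = iter(loader)

    def reset(self) -> None:
        """Start a fresh pass over the loader (one pass per epoch)."""
        self._iterator = iter(self._loader)

    def next(self) -> Optional[Any]:
        try:
            return next(self._iterator)
        except StopIteration:
            return None  # signals "no more batches in this epoch"


def count_batches(prefetcher: SimplePrefetcher) -> int:
    """Mimic a training loop's inner `while batch is not None` structure."""
    prefetcher.reset()
    count = 0
    batch = prefetcher.next()
    while batch is not None:
        count += 1
        batch = prefetcher.next()
    return count


# An empty loader never enters the loop body: zero iterations per epoch.
print(count_batches(SimplePrefetcher([])))
# A loader with three batches runs three iterations.
print(count_batches(SimplePrefetcher(["b0", "b1", "b2"])))
```

If the real prefetcher behaves like this, checking the length of the dataset and of the dataloader before the loop starts would show whether any batches are available.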

@zoezhou1999

Hi @xinntao, could I ask what the license of the ESRGAN_SRx4_DF2KOST_official-ff704c30.pth model is?
Thank you!

@cliffordkleinsr

Lower your GPU batch size, then restart your environment.

@Ncssmhcm

Ncssmhcm commented Aug 5, 2022 via email

@cliffordkleinsr

Meaning?


9 participants