Training time #38

cs20162004 · 2021-08-17T06:19:15Z

Hello and thank you for your great work!

You trained ESRNET for 1,000K and ESRGAN for 400K iterations. How long did training take in your case with 4 V100 GPUs?
I am training with 2 RTX 3090 GPUs, and training ESRNET alone shows 10 days 😕. My training dataset also includes FFHQ (i.e. DIV2K+Flickr2K+FFHQ). Maybe training on FFHQ will improve the results on human faces.
Thank you.

tg-bomze · 2021-08-17T17:15:56Z

@cs20162004 Don't forget to resize all images to 400x400 manually.

n00mkrad · 2021-08-17T20:52:54Z

Train with this instead, it has tons of advanced options, auto-cropping, etc:

https://github.com/victorca25/traiNNer

cs20162004 · 2021-08-18T01:38:34Z

@tg-bomze why do I need to resize images to 400x400 manually?

xinntao · 2021-08-18T01:44:24Z

@cs20162004

It takes about 6-7 days to train RealESRNet and 4-5 days to train RealESRGAN.
You can fine-tune directly from the pretrained RealESRGAN with fewer iterations (I think 100k~200k; you can see the difference). There is no need to train from scratch.
Including more face images will improve its ability to restore faces. For now, I recommend using it together with GFPGAN; here is the script: https://github.com/TencentARC/GFPGAN/blob/master/inference_gfpgan.py I will also integrate GFPGAN into Real-ESRGAN.
@tg-bomze I think there is no need to resize all images to 400x400.
@n00mkrad Thanks for the information. I will also make Real-ESRGAN easier to use.

If you still have training issues, please let me know.
I will later improve the repo to make training and fine-tuning handier. 😄
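
As a rough back-of-the-envelope check on the figures above, the quoted durations can be turned into an average time per iteration and then into an estimate for a shorter fine-tuning run. This is only a sketch: the helper names are made up for illustration, and it assumes that fine-tuning runs at about the same per-iteration speed as the RealESRGAN stage on comparable hardware, which the thread does not confirm.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_iteration(total_iters: int, days: float) -> float:
    """Average wall-clock seconds per iteration for a run that took `days`."""
    if total_iters <= 0:
        raise ValueError("total_iters must be positive")
    return days * SECONDS_PER_DAY / total_iters


def days_for(total_iters: int, sec_per_iter: float) -> float:
    """Estimated duration in days for `total_iters` at a given speed."""
    return total_iters * sec_per_iter / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures quoted in this thread for 4x V100.
    net_fast = seconds_per_iteration(1_000_000, 6)  # RealESRNet, 6 days
    net_slow = seconds_per_iteration(1_000_000, 7)  # RealESRNet, 7 days
    gan_fast = seconds_per_iteration(400_000, 4)    # RealESRGAN, 4 days
    gan_slow = seconds_per_iteration(400_000, 5)    # RealESRGAN, 5 days
    print(f"RealESRNet: {net_fast:.2f}-{net_slow:.2f} s/iter")
    print(f"RealESRGAN: {gan_fast:.2f}-{gan_slow:.2f} s/iter")

    # Assumption (not from the thread): fine-tuning runs at GAN-stage speed.
    for iters in (100_000, 200_000):
        lo = days_for(iters, gan_fast)
        hi = days_for(iters, gan_slow)
        print(f"{iters} fine-tuning iters: about {lo:.2f}-{hi:.2f} days")
```

Under that assumption, 100k to 200k fine-tuning iterations come to roughly 1 to 2.5 days, compared with 6-7 plus 4-5 days for training both stages from scratch.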

cs20162004 · 2021-08-18T04:04:27Z

@xinntao
Thank you for your detailed answer!
Could you please explain what exactly you mean by integrating with GFPGAN? Do you mean running the input image through both networks (first Real-ESRGAN and then GFPGAN, or vice versa)?

cs20162004 · 2021-08-18T11:47:14Z

By default, the GFPGAN network uses Real-ESRGAN on regions that don't contain a human face (maybe using a detection algorithm). But for some images containing faces, the generated face looks unnatural, like the following:

@xinntao Do you have any other ideas for improving Real-ESRGAN on human faces?

xinntao · 2021-08-19T02:10:46Z

> @xinntao
> Thank you for your detailed answer!
> Could you please explain what do you exactly mean by integrating with GFPGAN? Do you mean: run the input image on both networks (first on Real_ESRGAN and then on GFPGAN or vice versa)?

Yes, your understanding is right~

xinntao · 2021-08-19T02:19:58Z

> GFPGAN network by default uses Real-ESRGAN on regions that don't contain human face (using detection algorithm maybe). But for some images containing face, the generated face image looks unnatural. Like the following:
>
> @xinntao do you have any other idea to improve Real_ESRGAN for human face?

These failures are limitations of GFPGAN.
Training with human faces will improve Real-ESRGAN performance on faces.
Another way is to improve the GFPGAN performance.

We also want to improve Real-ESRGAN's performance on human faces by utilizing more face data.
I think you can also contribute to Real-ESRGAN if you want to, or if you obtain better results 😄

BTW, could you please share with me the original faces that GFPGAN failed on in your examples? (Email: xintao.wang@outlook.com)

cs20162004 · 2021-08-19T03:23:50Z

Sure.
The pretrained model you shared (RealESRGAN_x4plus.pth) contains only the Generator's weights, I guess. If I want to fine-tune from your pretrained model, I will also need the Discriminator's weights.
Could you please share them if you have them?

xinntao · 2021-08-19T03:31:35Z

@cs20162004
Sure, I will release the Discriminator.

Nidish96 · 2021-12-01T12:49:14Z

@xinntao Thanks a lot for this repository!
I've been trying to train RealESRGAN with my own image dataset for a very specific application. Is there any way for me to check whether iterations are progressing? I ran it and it printed a bunch of text; here are the last few lines:

2021-12-01 07:02:47,942 INFO: Loading UNetDiscriminatorSN model from experiments/pretrained_models/RealESRGAN_x4plus_netD.pth, with param key: [params].
2021-12-01 07:02:47,964 INFO: Loss [L1Loss] is created.
2021-12-01 07:02:49,612 INFO: Loss [PerceptualLoss] is created.
2021-12-01 07:02:49,648 INFO: Loss [GANLoss] is created.
2021-12-01 07:02:49,678 INFO: Model [RealESRGANModel] is created.
2021-12-01 07:02:50,093 INFO: Start training from epoch: 0, iter: 0

It has just been like this for the past 5-6 hours. Will the text on the screen progress further?
I checked the directory in "experiments" that it created and it has a log file that has exactly the above (which it returned to stdout). This directory also has three sub-directories (models, training_states, visualization), all of which are completely empty.

I am using the "finetune_realesrgan_x4plus.yml" file, modified to point to my data directory. I'm running it with 4 GPUs (Tesla P100-SXM2).

Please let me know if there's anything I might be doing wrong.
Thank you.

Doris1887 · 2021-12-04T10:46:25Z

@Nidish96 Hey, I ran into the same problem. Please let me know how it goes if you find a solution.

Nidish96 · 2021-12-07T19:48:09Z

@Doris1887 I haven't solved it, but I found something. In "train.py" in basicsr, the iterations start at line 154, where the training data (the variable "train_data") is fetched through "prefetcher.next()". This always seems to be "None" and I don't understand why. I've checked the dataset path, etc.
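
One thing that may be worth ruling out when the first fetched batch is "None" is a data loader that yields no batches at all. With PyTorch's `DataLoader`, `drop_last=True` discards an incomplete final batch, so a per-GPU share of the dataset that is smaller than the per-GPU batch size produces zero batches. The sketch below is not a confirmed diagnosis of this issue: it assumes the training loader drops the last batch and that samples are split evenly across GPUs (as `DistributedSampler` does), and the function name and the example numbers are hypothetical.

```python
import math


def batches_per_gpu(num_samples: int, batch_size_per_gpu: int,
                    num_gpus: int = 1, drop_last: bool = True) -> int:
    """Number of batches each GPU sees in one pass over the dataset."""
    if num_samples < 0 or batch_size_per_gpu <= 0 or num_gpus <= 0:
        raise ValueError("sizes must be positive")
    # Each GPU receives an equal share of the samples, rounded up.
    samples_per_gpu = math.ceil(num_samples / num_gpus)
    if drop_last:
        # Incomplete final batch is discarded.
        return samples_per_gpu // batch_size_per_gpu
    return math.ceil(samples_per_gpu / batch_size_per_gpu)


if __name__ == "__main__":
    # Hypothetical numbers: a small dataset spread over 4 GPUs.
    print(batches_per_gpu(40, 12, num_gpus=4))    # 10 samples per GPU < 12
    print(batches_per_gpu(4000, 12, num_gpus=4))  # plenty of data
```

If the first call reports zero batches for your own numbers, a smaller per-GPU batch size or a larger dataset would be the things to try first.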

zoezhou1999 · 2022-06-16T17:39:24Z

Hi @xinntao, could I ask what the license of the ESRGAN_SRx4_DF2KOST_official-ff704c30.pth model is?
Thank you!

cliffordkleinsr · 2022-08-04T09:54:49Z

lower your GPU batch size
then restart your environment

Ncssmhcm · 2022-08-05T12:01:54Z

unsubscribe


cliffordkleinsr · 2022-08-05T12:08:52Z

Meaning?
