training stuck at scale 9:[1999/2000] #19

Closed
phonygene opened this issue Nov 4, 2019 · 8 comments

Comments

@phonygene

phonygene commented Nov 4, 2019

Training consistently gets stuck at [1999/2000],
e.g. "scale 7:[1999/2000]" or "scale 9:[1999/2000]".

I can't interrupt it even with Ctrl+C; the process is completely dead.

I used a mountain picture, resized to the same size as one of your sample images.

I'm using:
Python 3.6.8
torch 1.3.0

GPU: RTX 2080 Ti
NVIDIA driver 419.35
CUDA 10.1

@sno6

sno6 commented Nov 4, 2019

Having the same issue running on Google Colab: seems to stall out at scale 8:[1999/2000]

@tamarott
Owner

tamarott commented Nov 5, 2019

This seems to be a memory problem. When the number of scales is large, there are more model parameters to store.
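
One way to check this (a minimal sketch, not code from this repository) is to print PyTorch's allocator statistics once per scale and watch them approach the card's capacity:

```python
# Minimal GPU-memory probe (not part of SinGAN); call it e.g. right after each
# scale finishes training to see how close you are to the memory limit.
import torch

def report_gpu_memory(tag: str, device: int = 0) -> None:
    if not torch.cuda.is_available():
        return
    mib = 1024 ** 2
    allocated = torch.cuda.memory_allocated(device) / mib      # tensors currently alive
    peak = torch.cuda.max_memory_allocated(device) / mib       # high-water mark so far
    total = torch.cuda.get_device_properties(device).total_memory / mib
    print(f"[{tag}] allocated {allocated:.0f} MiB, peak {peak:.0f} MiB, total {total:.0f} MiB")
```

If the peak climbs toward the card's capacity (11 GB on a 2080 Ti) as more scales are added, that would confirm the diagnosis.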

@phonygene phonygene reopened this Nov 6, 2019
@phonygene
Author

Sorry, I fat-fingered it (clicked the Close button accidentally).

I've checked GPU memory usage during training.
It was indeed nearly full when training got stuck.

@tamarott
Do you have any suggestions?
Is it possible to reduce the batch size or something similar to avoid this?
Or to restart training from the last checkpoint?

I read your paper. There's an example with The Starry Night
that seems to work well at scale 8.
But when I tried random samples at scale 8, it just generated 50 images that are all identical to each other.

@JonathanFly

JonathanFly commented Nov 6, 2019

With 16GB of GPU memory, the highest resolution output I have achieved is 667 x 413 from the main training script. Does that seem right? Would changing the aspect ratio let me squeeze more pixels into the model so I can also get more in the final random samples?
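
For what it's worth, here's the rough scale count I'd expect for different output sizes. This is a back-of-the-envelope sketch assuming the defaults I see in config.py are --min_size 25 and --scale_factor 0.75 (please double-check); the actual code adjusts the scale factor slightly, so treat it as an estimate only.

```python
# Rough estimate of how many scales a given output size implies, and therefore
# how many generator/discriminator pairs must be kept on the GPU.
# Assumes min_size=25 and scale_factor=0.75 (config.py defaults, I believe).
import math

def approx_num_scales(longest_side: int, min_size: int = 25, scale_factor: float = 0.75) -> int:
    # The image pyramid shrinks the longest side by scale_factor per level
    # until it reaches min_size, so the level count grows logarithmically.
    return math.ceil(math.log(min_size / longest_side, scale_factor)) + 1

for side in (250, 413, 667, 1024):
    print(f"longest side {side}px -> roughly {approx_num_scales(side)} scales")
```

Each extra scale means another generator/discriminator pair (plus its activations) resident in memory, so the reachable resolution is bounded by GPU memory rather than by anything in the code itself.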

@phonygene
Author

phonygene commented Nov 6, 2019

Oh, it turns out that scale 0 works just fine.
And as the scale increases, the differences between the output images drop sharply.
The images generated from scale 1 show only a slight shift effect, and the images generated from scale 3 are almost identical.
So, at this rate, there's no need to train over scale 3 at all.

This is pretty amazing.
Thanks for sharing your elegant work.
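
My guess at what's happening (I'm assuming the --gen_start_scale option from the README is what controls this): scales below the chosen start scale reuse the fixed reconstruction noise, so every sample shares the same coarse structure and only the finer scales can vary. A toy illustration, not SinGAN's actual code, with dummy "generators" that just add their noise input:

```python
# Toy sketch of generation from a chosen start scale: scales below
# gen_start_scale reuse a fixed noise map (same coarse structure for every
# sample); only scales >= gen_start_scale receive fresh random noise.
import torch

def toy_generate(generators, fixed_noise, gen_start_scale):
    x = torch.zeros_like(fixed_noise[0])
    for n, g in enumerate(generators):
        z = fixed_noise[n] if n < gen_start_scale else torch.randn_like(fixed_noise[n])
        x = g(x, z)
    return x

# Dummy stand-ins: each "generator" simply adds its noise map to the running image.
generators = [lambda x, z: x + z for _ in range(5)]
fixed_noise = [torch.randn(1, 3, 32, 32) for _ in range(5)]

a0, b0 = (toy_generate(generators, fixed_noise, 0) for _ in range(2))
a4, b4 = (toy_generate(generators, fixed_noise, 4) for _ in range(2))
print((a0 - b0).abs().mean())  # large: noise at every scale varies
print((a4 - b4).abs().mean())  # small: only the finest scale's noise varies
```

That would explain why samples started at a high scale look identical while scale 0 gives full diversity.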

@rickdotta

@phonygene What do you mean by "you don't need to train over scale 3"? Is it possible to generate arbitrarily sized images using just scale 3? How?

Thank you!

@phonygene
Author

phonygene commented Nov 8, 2019

@rickdotta As I said: in my case, anything above scale 3 only generated identical images, so I tried the scale 0 model and found that it worked fine. I don't understand why it behaves so differently from the paper, but at least it saves me a lot of time (troll face).

@xivh

xivh commented May 28, 2020

@phonygene How do you stop training at a smaller scale?
