Hypernetwork Style Training, a tiny guide #2670
72 comments · 381 replies
-
I find that my hypernetworks start to cook at 5e-6 somewhere after ~17k steps, so that might be a good stopping point.
-
Thanks for the guide! I find that hypernetworks work best when used after fine-tuning or merging a model. Trying to train things that are too far out of domain seems to go haywire. It makes sense: when you fine-tune a Stable Diffusion model, it will learn the concepts pretty well, but it becomes somewhat difficult to prompt-engineer what you've trained on. Hypernetworks seem to help alleviate this issue.
-
A few more examples of the NAI + Andreas Rocha hypernetwork, now that it is trained.
-
Thanks for this guide. I have been struggling to get an embedding of a particular artist's style sorted out, and this helped no end toward an acceptable result. I had 26 examples of the artist's work, which I manually resized/cropped to 512x512. Embeddings need a much bigger learning rate, and after some trial and error, I ended up with: Initialization text: * The nice thing about embeddings is that I can use a standard model and just add "painting by [artist-name]" to my prompts. I will try extending the final learning-rate stage out to a much bigger number of steps and see if more details appear.
-
Can you share the post-processed images that you used for the training, if possible? Just to have a better idea of what works.
-
Very good tutorial, although my VRAM currently isn't enough for me to use Train XD
-
I trained a Mob Psycho hypernetwork; here are the results at 26k steps. No Mob Psycho prompts were used to generate these images. Some extras:
-
I'm trying to train it on Mass Effect aliens. I know the SD 1.4/1.5 model has a vague idea of what they are, but training goes in circles. Is a hypernetwork the wrong tool for that?
-
Do you know where the default values for training are kept? I'd like to change the usual 0.005 to your recommended schedule. There's always one value I forget to set, and then I have to start all over.
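For what it's worth, the UI's default values are stored in ui-config.json in the webui root after the first launch, and edits there become the new defaults. The key names for the Train tab differ between versions, so here is a minimal sketch that searches for them rather than assuming exact names:

```python
import json

# Assumes ui-config.json sits in the webui root; adjust the path for your install.
with open("ui-config.json", encoding="utf-8") as f:
    ui_config = json.load(f)

# Key names differ between versions, so search for learning-rate-like fields
# instead of hard-coding them, then edit the matching "/value" entries by hand.
for key, value in ui_config.items():
    if "learn" in key.lower() and key.endswith("/value"):
        print(key, "=", value)
```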
-
For anyone wanting to test something, this is an annealing learning rate schedule I'm trying out: It would be better if we could put math expressions in the learning rate field instead.
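Since the learning rate field only takes a rate:step list, one workaround is to generate an annealed schedule offline and paste the resulting string in. A minimal sketch using a cosine-style decay; the values and the cosine_schedule helper are just an illustration, not the schedule from the comment above:

```python
import math

def cosine_schedule(lr_max, lr_min, total_steps, segments=10):
    """Approximate cosine annealing as a piecewise-constant 'rate:step' string."""
    parts = []
    for i in range(1, segments + 1):
        t = i / segments
        # cosine interpolation from lr_max down to lr_min
        lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
        parts.append(f"{lr:.2e}:{round(total_steps * t)}")
    return ", ".join(parts)

# Paste the printed string into the learning rate field.
print(cosine_schedule(5e-5, 5e-8, 20000))
```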
-
There is a PR for multilayer structure settings for hypernetworks, #3086. Does anyone have an idea of how this affects training?
-
What learning rate did you use?
On Thu, Oct 20, 2022 at 17:42, Pirate Kitty ***@***.*** wrote:
… I've been able to train faster with normalization, but increasing the neural network density only slowed training down without any perceivable gain, at least on a 100ish-picture dataset.
-
Yeah, my goal is good, not fast.
On Thu, Oct 20, 2022 at 19:43, Pirate Kitty ***@***.*** wrote:
… I'm currently using 5e-3:200, 5e-4:400, 5e-5:1000, 5e-6:2000, 5e-7:3000 for normalized, only training up to 3000 steps. But the results aren't good and they don't seem to get better with normalization. So it's only if you want something fast, I suppose.
-
Just a quick question on how to read "loss convergence". For example, can a loss going from a 0.30-0.10 range to a 0.25-0.15 range be interpreted as converging? Am I understanding this correctly?
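The raw per-step loss bounces around a lot, so one way to make the trend readable is to smooth it and watch whether the average drifts down. A minimal sketch, assuming you have a CSV of per-step losses from your run (the path and column name below are assumptions; adjust them to wherever your setup logs the loss):

```python
import csv

window = 200
losses = []

# Hypothetical log location and column name; point this at your own loss log.
with open("textual_inversion/hypernetwork_loss.csv", newline="") as f:
    for row in csv.DictReader(f):
        losses.append(float(row["loss"]))

# Moving average: if this keeps drifting downward over thousands of steps,
# the run is still converging; the raw min/max range alone says little.
smoothed = [
    sum(losses[max(0, i - window):i]) / (i - max(0, i - window))
    for i in range(1, len(losses) + 1)
]
print(f"first avg: {smoothed[0]:.4f}, last avg: {smoothed[-1]:.4f}")
```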
-
I can't get hypernetwork training "to work". Training a model on myself via Dreambooth creates great results, but when I try the same with a hypernetwork I look like a** 😂 Like a long-lost cousin or something. I tried a hypernetwork on a friend of mine, and no matter what I did he turned out as a good-looking Asian dude, and he is not Asian at all 😂 Lol... I wish I could use this training, since 10GB VRAM is too little to train Dreambooth locally.
-
So much has changed in the recent commits that I feel like most of the info in this thread is no longer relevant. Kinda feel like it's time to start a new one with revised research and findings.
-
Has anyone got good hypernetworks working on non-anime styles?
-
So, I was going to look into how weight decay affects training, but in the process I found a different issue with current training. The first thing I saw was the very smooth loss line, so I looked into the code. It turns out that when generating any images, we call this function: stable-diffusion-webui/modules/devices.py, lines 84 to 88 in 44c46f0. From what I can tell, this is also called in case of , which led to the second test.
Now, I'm still not sure which parts of HN training cause non-determinism, but I've set a fixed seed at the beginning of each step. So, yeah, the loss graphs are mostly identical. The thing that really sticks out is that the loss is very periodic: you see an identical pattern every 45 steps, but squashed a little bit (because it still converges). We can also add a loss graph of the same training, but without resetting the RNG every 45 steps, and yes, it diverges at 45 steps. Well, 44, but that's a different issue.
Now the question is: does this only affect the loss metrics, or does it affect training as well? Easy, just compare all of our hypernets against each other and the answer should be clear, right?
Three identical hypernets, 90 steps (10 epochs), batch size 1, fixed seed at the beginning of each step. Some results from here on may be NSFWish. Hypernets have a very loose idea of . There isn't even a question: 5-1 and 5-2 look like they're the same image on slightly different systems, while 5-3 is drastically different by comparison! So what does this all mean?
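A minimal sketch of the "fork the RNG for previews" idea, i.e. snapshotting and restoring the torch RNG state around preview generation so it cannot perturb the training stream; this is my own illustration, not the webui's actual code:

```python
import torch

def run_without_touching_rng(fn):
    """Call fn() and then restore the CPU/CUDA RNG state to what it was before."""
    cpu_state = torch.get_rng_state()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        return fn()
    finally:
        torch.set_rng_state(cpu_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)

# Usage (generate_preview is a hypothetical preview function):
# preview = run_without_touching_rng(lambda: generate_preview(prompt, seed=preview_seed))
```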
-
The training cannot be carried out due to the above error. Please help.
-
So, is there a universal guide on how to train art styles (character drawing style) using the monkeypatch method quickly, without blowing up the weights and destroying background details?
-
@Heathen, have you seen #4940? After reading here and getting some good results in ~5k steps, I achieved similar results in 500 steps with the recommendations there (layer structure (1, 0.1, 1), lr 1e-4, changing the optimizer, adding the if-loss check).
-
I don't know if this is still relevant, but I have found my biggest success with wide and deep bottlenecked networks. From my empirical testing:
Wide nets: (1, 3, 1) or (1, 4, 1)
Deep nets: (1, 1.5, 1.5, 1) or (1, 1.5, 1.5, 1.5, 1)
'Default' nets: (1, 2, 1)
Needless to say, all three architectures are far from ideal, since all of them have a very hard time generating images where the prompt is very far off from the trained data captions. So I went on and did some more testing:
Wide, deep and bottlenecked: (1, 3, 0.75, 0.75, 0.75, 3, 1) or (1, 4, 0.75, 0.75, 0.75, 4, 1)
The activation functions do not seem to play a huge part in the results. Sure, certain activation functions will allow the net to learn further or not explode, but I've been having decent results even with linear activation and normal initialization. It seems that the net architecture plays a much bigger role in making a good hypernet than the activation functions themselves.
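For anyone unsure what those tuples mean: as I read the webui's hypernetwork code, each entry is a multiplier on the module's input dimension, so the structure directly sets the widths of the linear layers. A rough sketch (the 320/640/768/1280 sizes are the usual cross-attention dims the hypernetwork hooks, but treat that list as an assumption about your version):

```python
def layer_widths(structure, dim):
    """Expand a layer_structure tuple into the resulting linear layer widths."""
    return [int(dim * mult) for mult in structure]

wide_deep_bottlenecked = (1, 3, 0.75, 0.75, 0.75, 3, 1)
for dim in (320, 640, 768, 1280):
    print(dim, layer_widths(wide_deep_bottlenecked, dim))
# e.g. 768 -> [768, 2304, 576, 576, 576, 2304, 768]
```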
-
I noticed that for plenty of artists, embeddings are not capable of recreating the art style (when it comes to recreating an art style rather than a subject). I did some testing with hypernetworks and they started "looking like something different" toward the art style; however, only with Dreambooth could I achieve such quality. But then the results were "mimicking" the inputs: even if I checked to train an art style, not a subject, Stable Diffusion always came up with something that looked like an "interpolation" of the original input instead of keeping the composition intact. The artists: Wayne Raynolds and Robbie Trevino.
-
And expanding a bit on #2670 (reply in thread): after I removed setting the seed at each step and forked the RNG for previews, I tried to produce this double descent I keep talking about, but it still didn't work out (admittedly the size of the network was kinda small). However, I made a network that produces "perfect copies" of the training image. Here is the exact prompt used during training, with a 4-batch generation: face, ears, hair, pupils, eyes, glasses, eyebrows, nose from front, closed mouth, art by artist name. Same image each time, regardless of seed. Now the same prompt, but I remove glasses. Now I add "from side". Now I say just "art by artist name". I added braid to the negative prompt; it didn't go away. Then I added black hair instead of hair; no black hair. And finally green hair, smug smile, pupils, eyes, eyebrows, from side.
I am tired, so I don't have any huge revelation about how tagging works, but it seems like the closer the prompt is to the tags you used, the more features will be overridden by the hypernetwork. Granted, this is a special case of a single image used for training, but I think this applies to all networks. So tagging even with obscure tags could be good; personally I thought it made sense just to tell the network which features it should learn from the image.
-
Another tutorial: https://civitai.com/models/4086/luisap-tutorial-hypernetwork-monkeypatch-method
-
I wrote down what I was working on for the past month or two: It would be nice if someone else gave it a shot and maybe got a better result with those settings, so I can finally stop trying to make it perfect.
-
How should I read the hypernetwork's log files? I used the extension to complete the training, ran the command tensorboard --logdir, and opened the link, but it shows that there is no data.
-
Hello,
-
The negative text preview during training appears to have been fixed a few patches ago; carry on.
tl;dr
Prep:
Training:
Learning rate: 5e-5:100, 5e-6:1500, 5e-7:10000, 5e-8:20000
Prompt template: a .txt with only [filewords] in it.
Longer explanation:
Select good images, quality over quantity
My best trained model was done using 21 images. Keep in mind that hypernetwork style transfer is highly dependent on content. If you pick an artist that only does cityscapes and then ask the AI to generate a character with his style, it might not give the results you expect. The hypernetwork intercepts the words used during training, so if there are no words describing characters, it doesn't know what to do. It might work, might not.
Train in 512x512, anything else can add distortion
I've tested other resolutions several times and haven't gotten good results out of them yet. So it's up to you.
Use BLIP and/or deepbooru to create labels AND examine every label, remove whatever is wrong, and add whatever is missing
It's tedious and might not be necessary; if you see BLIP and deepbooru are working well, you can leave it as is (a quick way to dump all the captions for review is sketched below). Either way, describing the images is important so the hypernetwork knows what it is trying to change to be more like the training image.
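If you want to review the labels in bulk, something like this prints every caption at once, assuming the usual preprocess layout where each image has a same-named .txt caption next to it (adjust the folder path for your setup):

```python
from pathlib import Path

# Hypothetical dataset folder; point this at your preprocessed training images.
dataset = Path("training/my-style-512")

for caption_file in sorted(dataset.glob("*.txt")):
    caption = caption_file.read_text(encoding="utf-8").strip()
    print(f"{caption_file.stem}: {caption}")
```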
Learning Rate:
5e-5:100, 5e-6:1500, 5e-7:10000, 5e-8:20000
They added a training scheduler a couple days ago. I've seen people recommending training fast and this and that. Well, this kind of does that. This schedule is quite safe to use. I haven't had a single model go bad yet at these rates and if you let it go to 20000 it captures the finer details of the art/style.
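For reference, the schedule string is a list of rate:step pairs, where each rate is used until its step count is reached. A rough sketch of how such a string can be read, written for illustration rather than taken from the webui's scheduler:

```python
def parse_schedule(text):
    """Parse 'rate:step, rate:step, ...' into (rate, last_step) pairs."""
    pairs = []
    for chunk in text.split(","):
        rate, step = chunk.strip().split(":")
        pairs.append((float(rate), int(step)))
    return pairs

def rate_at(step, pairs):
    """Return the learning rate in effect at a given global step."""
    for rate, last_step in pairs:
        if step <= last_step:
            return rate
    return pairs[-1][0]  # past the last milestone, keep the final rate

schedule = parse_schedule("5e-5:100, 5e-6:1500, 5e-7:10000, 5e-8:20000")
print(rate_at(50, schedule))    # 5e-05
print(rate_at(5000, schedule))  # 5e-07
```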
Prompt Template: a .txt with only [filewords] in it.
If your BLIP/booru labels are correct, this is all you need. You might want to use the regular hypernetwork .txt file if you want to remove photo/art/etc. bias from the model you're using. Up to you.
Steps: 20000 or less should be enough.
I'd say it's usable in the 5000-10000 range with my learning rate schedule up there. Buuut you will notice that in the 10000-20000 range, a lot of the finer details will show up. So, as The Rock would say, put in the work, put in the hours.
Final notes after The Rock intermission.
Examples:
Trained NAI for 6500 steps on Andreas Rocha style. I plan on letting it train to 20000 later. And done.
Vanilla NAI
RTX On, I mean, Style on
20000 steps
Vanilla NAI
Rocha ON
20000 Steps