
Can't train a model with the same reconstruction score as your model #15

Closed
manurubo opened this issue Jul 10, 2018 · 7 comments

@manurubo

Hello,

I've been trying to train a model following the instructions you provide in the Readme of molvae, but I have not been able to get the same reconstruction accuracy as you.
First I used your code as-is, and ran into problems with this part of vaetrain.py:

    if (it + 1) % 1500 == 0: #Fast annealing
        scheduler.step()
        print "learning rate: %.6f" % scheduler.get_lr()[0]
        torch.save(model.state_dict(), opts.save_path + "/model.iter-%d-%d" % (epoch, it + 1))
        beta = max(1.0, beta + anneal)

When the code first enters this if block, the reconstruction numbers in the logs go haywire, and the final model I obtained had a reconstruction score of 0. After reading and understanding more of your code, I suspected that the max in beta = max(1.0, beta + anneal) should be a min, so I changed that line to beta = min(1.0, beta + anneal).
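For reference, here is a minimal sketch of the annealing behaviour I believe was intended; only the names beta and anneal come from vaetrain.py, while the loop and the values are illustrative:

    # Illustrative sketch, not the repo's code: ramp the KL weight
    # beta up to 1.0 over training and then hold it there.
    beta = 0.005   # initial KL weight (e.g. --beta 0.005)
    anneal = 0.05  # increment per annealing step (made-up value)

    for it in range(30000):
        # ... forward pass, loss = recon_loss + beta * kl_loss, backward ...
        if (it + 1) % 1500 == 0:  # fast annealing, as in vaetrain.py
            # min() clamps beta at 1.0; the original max() jumps beta
            # straight to 1.0 at the first annealing step, which is
            # what makes the reconstruction logs blow up.
            beta = min(1.0, beta + anneal)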

With this change I trained the whole model and the logs looked fine, but when I tested the resulting model with reconstruct.py, it gives me a reconstruction score of about 0.52, far from the 0.77 reconstruction score of the model MPNVAE-h450-L56-d3-beta0.005.

I would like to know whether the change I made to beta = max(1.0, beta + anneal) is correct, and also why, with or without this change, I am not able to train a new model by following your instructions. Is there something I have to do that I am not doing to train a model? Did you train your model with the same instructions the Readme gives?

Thank you. I'm going crazy because this doesn't make sense at all; I would be very grateful for any idea of what's happening.

@wengong-jin
Owner

wengong-jin commented Jul 11, 2018

That's some sort of debugging code that I accidentally introduced. I've removed that line; please see the updated version. Many thanks.

For reconstruction performance, I set beta=0.001. This should give you the 77% performance.

@manurubo
Author

I've tried again with beta=0.001 and I'm still not getting 77% performance. Are the numbers of iterations defined by MAX_EPOCH in pretrain.py and vaetrain.py the same ones you used, or should I run more iterations?

Also, I tried to check whether the first part of the model, the one obtained from pretrain.py, has reconstruction performance similar to MPNVAE-h450-L56-d3-noKL/model.iter-2, but I got totally different performance; maybe the problem is there?
I saw that MPNVAE-h450-L56-d3-noKL/model.iter-2 has reconstruction performance similar to the model.iter-0 I obtained from pretrain.py, but the molvae Readme says I should use model.iter-2 for vaetrain.py, so that is the model I am using (note the --model flag):

    CUDA_VISIBLE_DEVICES=0 python vaetrain.py --train ../data/train.txt --vocab ../data/vocab.txt \
        --hidden 450 --depth 3 --latent 56 --batch 40 --lr 0.0007 --beta 0.005 \
        --model pre_model/model.2 --save_dir vae_model/

I changed the beta as you said when I actually ran the command; the above is just the example from the Readme.

Thank you.

@NamanChuriwala

The reconstruction accuracy obtained from model.iter-0 and model.iter-2 shouldn't be very different, since during training you can see the loss flatten out after a few thousand minibatches, within the first epoch itself. It is possible that the author of the repo trained their model on a larger set of molecules rather than only those in data/train.txt.
You could try increasing the training data size; the number of iterations shouldn't make a difference.

@maxime-langevin

Dear @wengong-jin ,

I'm currently trying out your repo (very nice work, by the way!) and I'm running into problems similar to the ones described in this issue.
I'd just like to check my understanding of your answer: if I want to use the model's encoding and reconstruction abilities, I should train it with a low beta value (0.001), and if I want to use purely its molecule-generation abilities, I should train it with the standard beta value (1)?
I sometimes use one ability and sometimes the other, and just wanted to know whether it is normal to train the model in a different fashion for each task (which is what I understand from your answer to this issue, and what I currently do). It would be a bit more convenient to train the model once and use it for both tasks, but the way I use it now already works well.

Thanks a lot!

@wengong-jin
Owner

Hi maxime1310,

To some degree what you said is right. Training with a small beta helps reconstruction, and a larger beta helps generation. Yet training with beta=1.0 won't necessarily give you the best molecule-generation ability, and this varies from dataset to dataset. In general you need to treat beta as a hyperparameter and try different values for the downstream task.
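To make the trade-off concrete: beta weights the KL term of the VAE objective. A minimal sketch with illustrative names, not this repo's code:

    # Illustrative only -- not the repo's code.
    # recon_loss: negative log-likelihood of reconstructing the input
    # kl_div: KL(q(z|x) || p(z)), regularizing the latent space
    def vae_loss(recon_loss, kl_div, beta):
        # Small beta (e.g. 0.001) -> near-autoencoder, best reconstruction.
        # beta = 1.0 -> the standard ELBO, with stronger prior matching,
        # which tends to help sampling/generation.
        return recon_loss + beta * kl_div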

@maxime-langevin

Thanks a lot for the quick answer!

On the datasets you worked on, did the hyperparameter search over beta sometimes find values that gave you both good reconstruction quality and good molecule generation? Or do you recommend first selecting the downstream task and then searching for a beta value at which the model performs well on that particular task?

@wengong-jin
Owner

I would recommend searching for a beta that performs well on the particular task.
