Is there a way to turn off variable reuse a number of scopes down? #537
Comments
Thanks for your report cinjon, I think it's a bug in seq2seq.model_with_buckets. We'll work on a fix, and I'll also test it with a bunch of other models. In the meantime, can you try replacing these two lines in model_with_buckets with something like this? I think that should help and be better than creating your own variables. Thanks for catching this problem! |
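The snippet Lukasz pointed at is no longer visible in this thread, but the shape of the fix (create variables only for the first bucket, then flip reuse on for every later bucket) can be sketched without TensorFlow. This is a toy model of the get_variable reuse contract with hypothetical names, not the actual seq2seq code:

```python
class VariableStore:
    """Toy model of get_variable's reuse contract (not real TF code)."""
    def __init__(self):
        self.vars = {}

    def get_variable(self, name, reuse):
        if reuse:
            if name not in self.vars:
                raise ValueError("Under-sharing: Variable %s does not exist" % name)
            return self.vars[name]
        if name in self.vars:
            raise ValueError("Over-sharing: Variable %s already exists" % name)
        self.vars[name] = object()
        return self.vars[name]

store = VariableStore()
bucket_vars = []
for bucket_id in range(3):
    # The fix: only the first bucket creates variables; every later
    # bucket sets reuse=True and shares the first bucket's variables.
    reuse = bucket_id > 0
    bucket_vars.append(store.get_variable("seq2seq/AttnW_0", reuse))

# All buckets end up sharing one variable object.
assert bucket_vars[0] is bucket_vars[1] is bucket_vars[2]
```

The point is that the reuse flag, not manual variable plumbing, decides whether a name is created or looked up.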
Awesome, thanks for looking into this Lukasz. I'll try to implement your fix in the meantime and will report back. |
Hey Lukasz, I just had some time to try this out and I'm still getting the Under-Sharing error. This makes sense because the scope change you're suggesting takes place in a superset scope of the attention module. So when we set reuse=True and the module introduces a new variable name (given in the […]), we hit Under-Sharing. If I instead turn off the reuse=True in model_with_buckets, then we get an Over-Sharing error in the embedding wrapper. This also makes sense because of the […]. It seems like there isn't a way to break the reuse contract in a sub scope. Is that right? If so, is there a way to satisfy this without passing the variables through to the right function / making a separate attention module? |
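The "reuse contract" being described can be sketched in plain Python. This is a hypothetical stand-in for variable_scope, not TF code; it shows why introducing a new variable name inside an already-reusing scope fails, since the reuse flag is inherited by every sub-scope:

```python
class Scope:
    """Toy sketch of variable_scope reuse inheritance (not TF code)."""
    def __init__(self, store, prefix="", reuse=False):
        self.store, self.prefix, self.reuse = store, prefix, reuse

    def child(self, name, reuse=None):
        # reuse=None inherits the parent's flag; once a parent sets
        # reuse=True, every sub-scope is stuck with it.
        inherited = self.reuse if reuse is None else (self.reuse or reuse)
        return Scope(self.store, self.prefix + name + "/", inherited)

    def get_variable(self, name):
        full = self.prefix + name
        if self.reuse:
            if full not in self.store:
                raise ValueError("Under-sharing: " + full)
        elif full in self.store:
            raise ValueError("Over-sharing: " + full)
        else:
            self.store[full] = object()
        return self.store[full]

store = {}
root = Scope(store)
# First pass: build the graph once, creating the attention variable.
root.child("seq2seq").child("attention").get_variable("AttnW_0")
# Second pass under reuse=True: introducing a *new* variable name in
# a sub-scope fails, because the reuse flag is inherited downward.
reusing = root.child("seq2seq", reuse=True).child("attention")
try:
    reusing.get_variable("AttnW_1")   # new name under reuse -> error
except ValueError as e:
    print(e)   # prints "Under-sharing: seq2seq/attention/AttnW_1"
```

There is no way in this toy model (and, per the thread, in the TF of the time) for a sub-scope to un-set an inherited reuse=True, which is exactly the contract being asked about.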
I tried a few things to solve this, including changing the […]. That's complicated by the fact that the attention_states aren't calculated until embedding_attention_seq2seq. I moved it into its own function like this:
And then instantiated it in embedding_attention_seq2seq:
But there is still this Under-Sharing problem. This time, it's on AttnW_0_b, and it happens because, to instantiate the modules with the overall model, I am doing this:
If I remove the reuse=True on that, then the error is again over-sharing, this time because […]. Is there something else I could do to instantiate the graph like I'm describing? Thanks. |
Hi cinjon. I think I corrected the reuse leaking from model_with_buckets in a recent commit, so now you're hitting a different problem, this one.
I think removing reuse=True in your case is the right thing to do (you don't want to reuse across modules, right?). But then, do you want to share the encoder embedding across modules? If you do, then you should create a variable just for the embedding and pass it to EmbeddingWrapper, I think. If you don't, then maybe just put each module in its own scope (e.g., with variable_scope("module" + str(num))) -- that will make all variables unique. Does that help? I'm not fully sure exactly how you want to share variables -- if every module is separate, then I think the best way is to have a separate scope for each of them. |
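Both suggestions can be sketched together in plain Python (a hypothetical stand-in for variable_scope/get_variable, not TF code): the embedding is created once and passed in by reference, while module-local variables live under unique "module" + str(num) prefixes so they never collide and no reuse flag is needed:

```python
store = {}

def get_variable(name):
    # Simplified create-only getter: each name may be created once.
    if name in store:
        raise ValueError("Over-sharing: " + name)
    store[name] = object()
    return store[name]

def build_module(num, shared_embedding):
    # Module-specific variables live under a unique "moduleN/" prefix,
    # so modules can't collide; the embedding is passed in so it is
    # created exactly once and shared by reference across modules.
    prefix = "module" + str(num) + "/"
    attn_w = get_variable(prefix + "AttnW_0")
    return {"embedding": shared_embedding, "attn_w": attn_w}

embedding = get_variable("embedding")   # created once, up front
modules = [build_module(n, embedding) for n in range(3)]

assert modules[0]["embedding"] is modules[2]["embedding"]   # shared
assert modules[0]["attn_w"] is not modules[1]["attn_w"]     # unique
```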
I haven't been able to code up a solution where the modules share the encoder embedding yet, but yes, that's the idea. The only thing that I want to be unique is the attention component - the encoder should be idempotent across the modules. I'll report back on doing the former. |
@lukaszkaiser , I got this to work by declaring all of the attention modules on every run so there is no under-sharing. The gradients are then the sum of the gradients of each module. At step time, I only feed the losses for the current module into the output. This is inefficient, but it seems to work. I'm going to dig into the graph that TensorFlow made just to make sure that it's right and then work on making it more efficient. It does seem though that it would be a lot easier to build this graph if there was a way to turn off reuse in a sub-scope. |
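The workaround described (build every module's graph on every run, but feed only the current module's losses at step time) amounts to masking: inactive modules contribute zero loss, so the summed gradient reduces to the active module's gradient for that step. A minimal numeric sketch with hypothetical loss values:

```python
def step_loss(per_module_losses, active_module):
    # All modules exist in the graph, but only the active module's
    # loss flows through; the others are zeroed, so the gradient of
    # the sum is just the active module's gradient.
    return sum(loss if i == active_module else 0.0
               for i, loss in enumerate(per_module_losses))

losses = [1.5, 2.0, 0.5]            # hypothetical per-module losses
assert step_loss(losses, 1) == 2.0  # only module 1 contributes
```

The inefficiency the comment mentions is visible here too: all three modules are built even though only one is ever used per step.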
Hi again. I needed to make this more efficient, but my attempts haven't been working. I also realized that the previous solution I had built failed when there is more than one bucket (I had made it uni-bucket for testing). TF would throw an error about the encoder and decoder placeholders needing values. If I used a bucket of (10, 12) and had set it up to also have a bucket of (20, 25), then the error would be for encoders 10 through 20 and decoders 12 through 25. This was confusing because it seemed like I was building the graph similarly.

What I have now is that I make the encoder_cell and the embedding in the Seq2Seq init:
I'm then passing them through into the embedding_attention_seq2seq:
This throws a confusing error even before I can start training it:
I also printed out the params (see below) to see if there was anything peculiar, and I noticed the first value is […]
I realize this might be out of scope of a Github issue now. Let me know if you'd rather take this offline. Thanks @lukaszkaiser |
Some more info on the above error:
This happens on the second module. The loop makes the first one without a hitch and then goes on to make the second one, where it finds that the softmax Reshapes are now int32s. Printout of true_b after […]:
And then on the second module, just before the error is thrown:
|
It looks like if I change _compute_sampled_logits in ops/nn.py to cast all_b into a float32, then I get past this error and can build the graph, but later run into another error at step time where the params are not 1-dimensional.
EDIT: I'm going to dig more into this tomorrow, but in the meantime I forgot to mention an important part: the above error comes after we use control_flow_ops.cond to choose a different attention weight/bias (via the attention_module def above) based on which module we're using. This happens in the attention_decoder, where instead of […] |
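A plain-Python stand-in for what a chain of control_flow_ops.cond branches does when picking a per-module attention weight/bias. The parameter names here are hypothetical, chosen to echo the AttnW_0_b naming from earlier in the thread:

```python
def select_attention_params(module_id, params):
    # Mimics nested cond(...) branches: check module 0, else fall
    # through to module 1, and so on; the last branch is the default.
    for i in range(len(params) - 1):
        if module_id == i:
            return params[i]
    return params[-1]

# Hypothetical per-module (weight, bias) pairs.
params = [("AttnW_0_a", "bias_a"), ("AttnW_0_b", "bias_b"), ("AttnW_0_c", "bias_c")]
assert select_attention_params(1, params) == ("AttnW_0_b", "bias_b")
```

Unlike this Python sketch, cond builds both branches into the graph, which is one reason the dtype mismatch above surfaces even for the branch that isn't taken.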
I couldn't leave it alone and tried one more thing, which was to use a control_flow_ops.cond on the models themselves. What this looks like is that I have a cond set up to branch to a different model_with_buckets output / loss based on the module. This didn't work because I run into a problem with branching in control_flow_ops's BuildCondBranch (~ L568): […] At this point, I'm unsure what else to try to get this to work. I feel like I am missing some key understanding that should make this easy. :( |
@lukaszkaiser Thoughts? |
I was offline for a while cinjon, and I'm not sure I understand now what you were trying to accomplish. If you still have this problem, let's take it offline and, if needed, open another issue (just so as not to prolong this one; it's becoming unreadable). Thanks! |
Hey Lukasz, thanks for the reply. |
As per my last comment -- it's not clear from this thread what the issue is any more. If you're having some issue, feel free to open a new one and please explain precisely. Also note that variable reuse is inherited on purpose -- the other design would break compositionality across modules. If you have a variable in the same module or function, why not reuse it by reference? |
I'm having trouble with an under-sharing error for scope reuse. A specific example that I can point to using TF repo code is building an architecture with parallel attention modules.
Say we did something in the attention_decoder in rnn/seq2seq.py like this:
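The original snippet is missing from this thread. As a purely hypothetical reconstruction of the shape of such a change (each parallel module creating its own attention weights under a module-derived name, consistent with the "b/AttnW_0" variable named in the error message below), it might have looked something like:

```python
def attention_setup(get_variable, module_name, num_heads=1):
    # Hypothetical sketch: each parallel module creates attention
    # weights under its own name, e.g. "b/AttnW_0" for module "b".
    return [get_variable("%s/AttnW_%d" % (module_name, head))
            for head in range(num_heads)]

# Minimal create-or-fetch getter standing in for tf.get_variable.
store = {}
def get_variable(name):
    store.setdefault(name, object())
    return store[name]

attention_setup(get_variable, "b")
assert "b/AttnW_0" in store
```

The function name and getter here are illustrative only; the real code lived inside attention_decoder in rnn/seq2seq.py.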
I then build the model by running:
This works just fine for one module, i.e. the first loop goes off without a hitch. When I get to the second module though, I get the following error:
ValueError: Under-sharing: Variable embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/b/AttnW_0 does not exist, disallowed. Did you mean to set reuse=None in VarScope?
I realize that I can build the variables up front and then push them through functions to where they're needed. However, that seems really bad, because it leaves floating variables, built at the beginning, far outside the program scope where they're actually used. Is there a better way?
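One way to avoid those floating top-level variables, sketched here as a hypothetical create-on-first-use getter (plain Python, not a TF API): the first function that needs a variable creates it, and later callers get the same object back by reference, so nothing has to be declared outside the functions that use it:

```python
def make_getter():
    cache = {}
    def get(name, create):
        # First caller creates the variable; later callers share it
        # by reference, regardless of which scope they call from.
        if name not in cache:
            cache[name] = create()
        return cache[name]
    return get

get = make_getter()
a = get("AttnW_0", lambda: [0.0] * 4)
b = get("AttnW_0", lambda: [1.0] * 4)   # creator ignored: already exists
assert a is b
```

This is essentially what get_variable does internally; the question in the thread is how to get that behavior while opting some sub-scopes out of reuse.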