Feature Request - Seq2Seq Inference Helper w/o Embeddings #12065
I'm trying to implement my suggestion, but I'm running into an error that I'm unsure how to debug. My code: …

I'm receiving the following error, concerning a formal parameter; my traceback: …
@ebrevdo @lukaszkaiser Can you comment on this? Thanks!
One mistake I'm making (I think) is that `finished = math_ops.equal(outputs, self._end_tensor)` operates element-wise, which means `finished` will be a tensor of shape [batch size, ...] and should instead be a tensor of shape [batch size]. I believe the correct statement is `finished = math_ops.reduce_all(math_ops.equal(outputs, self._end_tensor), axis=1)`, but I'm not sure if this generalizes to tensors of higher dimensions.
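A minimal sketch of the shape issue being described, with stand-in tensors (TF 1.x):

```python
import tensorflow as tf  # TF 1.x

batch_size, num_features = 32, 4
outputs = tf.zeros([batch_size, num_features])  # stand-in for the decoder outputs
end_tensor = tf.ones([num_features])            # hypothetical end-of-sequence vector

# Element-wise equality broadcasts to shape [batch_size, num_features],
# which is the wrong shape for `finished`.
elementwise = tf.equal(outputs, end_tensor)

# Reducing over the feature axis yields the [batch_size] boolean vector the
# decoder expects: an example is finished only when every component matches.
finished = tf.reduce_all(elementwise, axis=1)
```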
Adam, IIRC you wrote the training version, right? PTAL?
This is actually a more specific case of what I am building, which is a general sampling helper that takes a layer, like the ScheduledOutputTrainingHelper. The problem with using the training helper is that it doesn't output the actual samples, so it can't be used during inference. Rylan, your solution will almost work, but you are going to hit the issue that sample_ids is currently required to be a scalar integer, something which I am also solving with my change. If you can wait until next week, I can have something that will work for you. In the meantime, I'd recommend hacking up the CustomHelper to work for your use case. You'll also need to adjust the BasicDecoder output_size and output_dtype to match the shape of your sample tensor.
@adarob, thanks for responding! I need to meet a deadline this Sunday for my internship project, but I have another month after that to continue working on the project. Could I email you directly for help with hacking together a solution in the short term? (I'm happy to talk here, but I know TensorFlow developers like to keep conversations focused on the original issue.)
Sure @RylanSchaeffer
@adarob, thanks! Sent!
@adarob @RylanSchaeffer Hi guys, I've also been looking for an output helper for this purpose. Would it be possible to share your solution with me? Thanks in advance.
@ppyht2 Adam's solution for my problem might not work generally. In my case, my decoder's output is a (4,) tensor of binary values, so he suggested using a CustomHelper that maps the decoder's output to sample_ids by treating the output as an integer represented in binary, i.e. [0, 0, 0, 0]'s sample_id is 0, [0, 1, 0, 0]'s sample_id is 4, etc. Let me know if that makes sense.
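A sketch of that mapping; the bit order is an assumption chosen to match the example above:

```python
import tensorflow as tf  # TF 1.x

def binary_to_sample_ids(outputs):
    """Map a [batch, 4] tensor of 0/1 values to one of 16 integer sample_ids.

    Bit weights follow the convention in the example above, where
    [0, 1, 0, 0] maps to sample_id 4 and [0, 0, 0, 0] maps to 0.
    """
    bits = tf.cast(tf.round(outputs), tf.int32)          # [batch, 4]
    weights = tf.constant([8, 4, 2, 1], dtype=tf.int32)  # assumed bit order
    return tf.reduce_sum(bits * weights, axis=1)         # [batch]
```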
I added a new InferenceHelper to the Google codebase. I'm not sure how often that is synced with GitHub, but I'd expect it to happen in the next couple of days.
@adarob, would it be possible to also get a corresponding modified version of …
@RylanSchaeffer I'm currently working on a problem where my decoder's output has float values, in which case the solution will no longer work, is that correct? @adarob will the InferenceHelper resolve this issue? Thanks for your help guys :)
@ppyht2 That's correct. However, I think there's an easier solution if all you want is the decoder's output to be passed as input at the next time step and you don't care about sample_ids. If you look at CustomHelper, you'll see it just takes three functions. If all you want is the cell's outputs to be the decoder's inputs at the next time step, you can write a next_inputs_fn that returns those outputs directly. Hope that makes sense! Disclaimer: my code is untested beyond my own use case.
@RylanSchaeffer This solution sounds like it could work, I will give it a crack. Thanks for your help. Did you have any explanation as to why the TrainingHelper approach wasn't working?
@ppyht2 I suspect that I'm incorrectly using the TrainingHelper. I've been receiving help from people on another issue in the NMT tutorial repo (tensorflow/nmt#3), but I haven't been able to fix my issue yet.
The new InferenceHelper added in e9a8d75 resolves this issue. |
@RylanSchaeffer does @adarob's PR solve your issue? |
@ebrevdo Let's presume yes, and I'll reopen the issue with additional details if it doesn't. |
@adarob, thank you!
@RylanSchaeffer I am facing a similar issue, just not working on vectors but on regular good old floating-point numbers for time-series forecasting. Would you mind me contacting you via email with a few short questions regarding your InferenceHelper implementation?
@fritzfitzpatrick, I'd rather help you here in case anyone else runs into a similar issue. Here's the code I used. In my case, I had a (4,)-shaped tensor of zeros and ones as outputs, hence 16 possible outcomes, hence my sampling function. However, the sampling function was just for helping me debug, whereas you can get by with a function that does nothing.
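A minimal sketch of such a helper (not the verbatim snippet), assuming the TF 1.x tf.contrib.seq2seq.CustomHelper API; the GO vector, end vector, and shapes are illustrative:

```python
import tensorflow as tf  # TF 1.x, tf.contrib.seq2seq

batch_size, num_features = 32, 4
start_inputs = tf.zeros([batch_size, num_features])  # hypothetical GO vector
end_vector = tf.ones([num_features])                 # hypothetical end-of-sequence vector

def initialize_fn():
    # Nothing is finished at time 0; emit the GO vector as the first input.
    return tf.tile([False], [batch_size]), start_inputs

def sample_fn(time, outputs, state):
    # Debugging aid: collapse the (4,) bit vector into one of 16 integers,
    # matching the convention above ([0, 1, 0, 0] -> 4). A "do nothing"
    # function returning a dummy integer per example would also suffice.
    bits = tf.cast(tf.round(outputs), tf.int32)
    return tf.reduce_sum(bits * tf.constant([8, 4, 2, 1]), axis=1)

def next_inputs_fn(time, outputs, state, sample_ids):
    # Feed the raw decoder output straight back in as the next input.
    finished = tf.reduce_all(tf.equal(outputs, end_vector), axis=1)
    return finished, outputs, state

helper = tf.contrib.seq2seq.CustomHelper(
    initialize_fn=initialize_fn,
    sample_fn=sample_fn,
    next_inputs_fn=next_inputs_fn)
```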
@adarob @ebrevdo Just my personal opinion, but even though the issue is technically solved by giving the programmer the ability to implement their own helper through CustomHelper, it would still be valuable for the library to provide this functionality out of the box.
@RylanSchaeffer are you suggesting having a CategoricalInferenceHelper that provides categorical sampling, as is most often used? I think it's reasonable to add that. |
@adarob that would be helpful, but I was referring to something slightly different. I haven't looked at this in 5+ months, so maybe the module has changed, but my understanding is that the sample_fn is used to collapse the RNN cell's output tensor into a single number (either deterministically or stochastically), which is then reconstituted as an input tensor for the next step. This is useful if I want the input tensor to represent a discrete input, e.g. a word, but in the context that I was using the library, I wanted the output tensor to be passed to the next step unchanged. For concreteness, my understanding of the current implementation versus my desired behavior is sketched below.
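A rough sketch of the contrast, with argmax and one_hot standing in for concrete sample_fn and next_inputs_fn choices:

```python
import tensorflow as tf  # TF 1.x

cell_output = tf.random_normal([32, 4])  # stand-in for the RNN cell's output

def sample_fn(outputs):
    # Collapse [batch, d] -> [batch]: one discrete id per example
    # (here a greedy argmax over d classes).
    return tf.argmax(outputs, axis=1)

def next_inputs_fn(sample_ids, depth=4):
    # Reconstitute [batch] -> [batch, d] (here via a one-hot "embedding").
    return tf.one_hot(sample_ids, depth=depth)

# Current behavior as described above: collapse, then reconstitute.
next_input_current = next_inputs_fn(sample_fn(cell_output))

# Desired behavior: pass the output tensor through unchanged.
next_input_desired = tf.identity(cell_output)
```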
@RylanSchaeffer Thanks for sharing. My code is a bit more bare-bones, but I should be able to wire something together based on your snippet. @adarob Passing the decoder output at time step T as the decoder input at time step T+1 is exactly what I am after, and I have been struggling a bit with implementing this through other helpers (mostly because of my lack of understanding of sampling in TF and/or how to deal with function arguments like time, which I now see Rylan just deletes from the function scope). Overall I am really happy with TensorFlow, and it's a great entry point for ML beginners like me, so thanks a lot!
What you're requesting is possible by setting `sample_fn=tf.identity`.
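For reference, a minimal sketch of that with the contrib InferenceHelper; the shapes and the stopping rule are illustrative:

```python
import tensorflow as tf  # TF 1.x, tf.contrib.seq2seq

batch_size, num_features = 32, 4
start_inputs = tf.zeros([batch_size, num_features])  # hypothetical GO vector

helper = tf.contrib.seq2seq.InferenceHelper(
    sample_fn=tf.identity,        # the "sample" is the raw cell output
    sample_shape=[num_features],
    sample_dtype=tf.float32,
    start_inputs=start_inputs,
    # Illustrative stopping rule; with a fixed horizon you could instead
    # return all-False here and cap decoding via dynamic_decode's
    # maximum_iterations argument.
    end_fn=lambda sample_ids: tf.reduce_all(tf.equal(sample_ids, 0.0), axis=1))

# Leaving next_inputs_fn at its default (None) feeds the samples -- here the
# unchanged cell outputs -- back in as the next inputs.
```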
@adarob Thanks, I'll try tf.identity. I will pass the modified decoder output (through a dense layer) at time step T as the input to the decoder at time step T+1. I was unclear in my post above, sorry for that.
I'm facing a similar issue. I've described what I'm trying to do on Stack Overflow. Has anyone managed to use a CustomHelper for this?
@nishaskinner I think Rylan managed to get it to work, but I am getting stuck. @RylanSchaeffer can I pick your brain about this over email? I want to understand and share my knowledge, as apparently there are more people out there who need help implementing the CustomHelper outside the NMT domain. I am creating a generic example for regression using a sequence-to-sequence architecture that will work with any input sequence length and number of features, as well as any output sequence length and number of features. The work-in-progress notebook can be found here, and I want to turn this into a beginner-friendly blog post, as tutorials on non-language applications are few and far between. Thanks a lot in advance!
@adarob, while you're correct, the lack of documentation and examples (and a general explanation of what a Helper even does) makes writing a CustomHelper difficult for people who aren't familiar with the library, as these comments demonstrate. @nishaskinner Yes, my CustomHelper worked as I intended; I posted the code above. @fritzfitzpatrick, what exactly would you like to talk about? My email address is rylanschaeffer@gmail.com, but like I said above, I'd prefer to keep my conversations public in case others have similar questions.
Sorry to respond late. Is anyone here interested in adding a comprehensive docstring to the module explaining helpers and how to create your own? You could base it on existing unit tests.
@RylanSchaeffer I have tried using a GO token as well as the last value of the encoder input sequence as my start_inputs in the initialize_fn, and I have tried using the projection-layer outputs as well as the original signal as the next_inputs in the next_inputs_fn. I am training my model on a linear time series: it sees the sequence [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6] and should learn to predict [0.7, 0.8]. During training and validation the L2-regularized loss goes down to 0.010, and the validation prediction using the training helper looks something like [0.69, 0.79], so close enough. But when I save that model, restore the variables into an identical inference model, and run it, the output looks more like [-0.14, 0.19]. I have no idea why it is off like that. Attached is the model in its current form, but as I said above, I have been through a few configurations. I can't seem to find the reason why the inference prediction is so vastly off the mark.
@fritzfitzpatrick did you have any luck with this? I have been struggling with this for days; training is fine with the TrainingHelper, but I am still completely lost as to what should happen during inference.
@fritzfitzpatrick I'm also struggling with the same issue. Could you please share the solution or guide me, if you have managed to fix this?
I have indeed made some progress using a seq2seq architecture, but it is not very accurate and tends to degenerate towards a perfectly flat horizontal line after just a few iterations. I have, however, tested another architecture that works quite nicely. I am currently pretty tapped out with business travel, but I will try to upload a Python notebook for you to peruse. If anyone else has finally gotten multistep numerical time-signal predictions going with the inference or custom helper using a seq2seq architecture, please let us know. Can't be that hard, can it?
@VinoJose @fritzfitzpatrick From my limited understanding, this bypasses the problem of the training helper, does it not? By not running the train_op, I assume that the network weights remain constant. This gives quite good results, although I am not sure if I am 'cheating' in any way. Do you have any ideas on this?
Hope this helps anyone who reaches this thread.
I got it to work with no embeddings in a much simpler way, using a very rudimentary sample_fn. My inputs are floats with the shape [batch_size, num_features].
@Andreea-G could you please elaborate on this flag? I use the following lambda function: …
@Andreea-G Could you elaborate on that?
@ebrevdo I would love to do that, if it's still required? |
Sure
tf.contrib.seq2seq has two Helper classes to use during inference, SampleEmbeddingHelper and GreedyEmbeddingHelper. However, both make use of embeddings, which is unhelpful when building sequence-to-sequence models that operate on non-embedded target sequences (my target sequence already consists of meaningful vectors).

I'd like a new Helper class that pipes the output of the decoder RNN at one time step into the decoder RNN at the following time step. It should permit the start_tokens to be vectors (tensors?) and the end_token to be a vector (tensor?) as well. Right now, I'm attempting to use ScheduledOutputTrainingHelper with sampling_probability set equal to 1.0, but I'm struggling to get it to work. Something like a simple OutputInferenceHelper would be very nice :)

If there already exists an easy way to do what I'm suggesting, please let me know!