How did you solve the [unk] problem? #8
Model 4 and model 5 are built on the pointer-generator concept, which simply means that the model trains a neural network to know when to generate a new, novel word and when to copy a word from the given text. This is exactly what happens when the model is confronted with out-of-vocabulary (jargon, unk) words: it chooses to copy them from the original text, since it can't generate them, as it simply hasn't seen them before. So the pointer-generator model actually combines the two worlds of abstractive and extractive models in one model; you can read more here.
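To make the copy mechanism concrete, here is a minimal Python sketch (illustrative only, not this repo's actual implementation, and all names are assumptions) of the usual "extended vocabulary" bookkeeping: article OOV words are fed to the encoder as [unk], but they are given temporary ids past the end of the fixed vocabulary so the pointer can still select them, and the original strings are kept around to print back at decode time.

```python
def encode_article(article_words, vocab):
    """Map article words to ids; OOV words get temporary extended-vocab ids."""
    unk_id = vocab["[unk]"]
    ids, extended_ids, article_oovs = [], [], []
    for w in article_words:
        if w in vocab:
            ids.append(vocab[w])
            extended_ids.append(vocab[w])
        else:
            ids.append(unk_id)                      # encoder input sees [unk]
            if w not in article_oovs:
                article_oovs.append(w)              # remember the original string
            # temporary id lives past the end of the fixed vocabulary
            extended_ids.append(len(vocab) + article_oovs.index(w))
    return ids, extended_ids, article_oovs

vocab = {"[unk]": 0, "the": 1, "patient": 2, "received": 3}
ids, ext_ids, oovs = encode_article(["the", "patient", "received", "dexamethasone"], vocab)
# ids     -> [1, 2, 3, 0]        (encoder input, OOV collapsed to [unk])
# ext_ids -> [1, 2, 3, 4]        (copy targets, the OOV keeps its own slot)
# oovs    -> ["dexamethasone"]   (used to emit the word back at decode time)
```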
I truly hope this was helpful.
My question is much deeper than your surface-level answer. I know the basic idea of combining pointer and generator, and the claim that the pointer is there to take care of OOV. Looking through the code, though, the pointer seems to point to OOV words in the article for the training data, where the abstract (the target) serves as the guidance. In test mode there is no target, and the model is clueless about when to use the OOV words in the extended vocabulary, because those words never had a vector representation and do not participate in the context. When applying a trained model to a different domain, the OOV problem is especially acute. For some reason, your trained model is able to generate those OOV words in the summary. In any case, I will run your code and model to track how it's behaving to answer my questions.
For the pointer-generator model, in training I train a neural net whose main inputs are the decoder hidden state, the embedding of the previous output word, and the attention context vector. The model knows whether to copy a word or not from the output of that trained neural net. The equation of that neural net (the one used in model 4, from the paper) is

p_gen = σ( v_s · ( W_s^h h_t + W_s^e E[o_{t−1}] + W_s^c c_t + b_s ) )

where h_t is the decoder hidden state, E[o_{t−1}] is the embedding of the previous output word, and c_t is the attention context vector.

p_gen is the probability of generating the word from the vocabulary distribution (P_vocab) rather than from the attention distribution (the sum of attentions over the words), i.e. either generate a new word or copy a word from the sentence. There is also another implementation of the pointer-generator model from another paper; it shares the same concept and the same broad inputs (from the decoder and from the attention), but it is not what is implemented in the model 4 paper.

W_s^h, W_s^e, W_s^c, b_s and v_s are the learnable parameters. So simply by training this neural net, your model is able, for each word, to decide whether to copy or to generate, according to the final output p_gen, which truly acts as a switch between copying and generating.
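As a rough illustration of how that switch combines the two distributions, here is a minimal NumPy sketch (names and shapes are illustrative assumptions, not the repo's actual TensorFlow code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_gen(h_t, e_prev, c_t, W_s_h, W_s_e, W_s_c, b_s, v_s):
    """p_gen = sigmoid(v_s . (W_s^h h_t + W_s^e E[o_{t-1}] + W_s^c c_t + b_s))"""
    return sigmoid(v_s @ (W_s_h @ h_t + W_s_e @ e_prev + W_s_c @ c_t + b_s))

def final_distribution(pgen, p_vocab, attention, extended_ids, extended_size):
    """P(w) = pgen * P_vocab(w) + (1 - pgen) * sum of attention at positions of w."""
    p = np.zeros(extended_size)
    p[:len(p_vocab)] = pgen * p_vocab              # generate from the fixed vocab
    for i, w_id in enumerate(extended_ids):        # copy from the article
        p[w_id] += (1.0 - pgen) * attention[i]
    return p
    # Note: ids >= len(p_vocab) are article OOVs, so they can only receive
    # probability from the copy term -- which is exactly why p_gen is a switch.
```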
Hey, thanks for the thorough explanation. You are the most responsive author I have ever run into. I have gone through both the theory and the code in detail. I am afraid I am still ahead of the comments you just made. Yes, p_gen is responsible for knowing how much to rely on copying, and that decision is based on the contextual information drawn from the attention mechanism. This argument is correct when it comes to in-vocabulary words. You may have missed my point that OOV words never get a vector representation and do NOT participate in the context derivations, and therefore during test mode the model would be clueless about when it should point to them. Is my point clear?
OK, it appears the model knows to point to OOV words from the article without their vector representation. I imagine this is due to their position and the contextual support from neighboring words.
I hope this made things clearer.
Hi theamrzake, thanks for adding the comment. However, I don't think you got this one. t is the time step of the decoding process, so it points to the position in the output. i points to the position in the encoder sequence. It works out so that, for any OOV word that shows up in the inferenced summary, the attention mechanism has to give high weight to a particular input OOV word from the article (the right i). And that is exactly my question. Given that the input OOV is substituted with [unk], it does not offer the semantic meaning of the original OOV word, so the model must be leveraging other information to accentuate that OOV word for the purpose of selecting it to point to. And the "other" information, I am suggesting, would be the order of the OOV word relative to its neighboring, non-OOV, words.
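To make that point concrete, here is a toy sketch (purely illustrative, not code from this repo): even though every article OOV is fed to the encoder as the same [unk] embedding, the encoder state at each position also mixes in the neighboring words, so the attention scores can still single out one [unk] position over another.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy word embeddings; every OOV shares the single [unk] vector
emb = {w: rng.normal(size=8)
       for w in ["the", "patient", "received", "[unk]", "daily", "of"]}

def encoder_state(words, i):
    """Toy 'contextual' state: the word's embedding mixed with its neighbors."""
    left = emb[words[i - 1]] if i > 0 else np.zeros(8)
    right = emb[words[i + 1]] if i + 1 < len(words) else np.zeros(8)
    return emb[words[i]] + 0.5 * left + 0.5 * right

words = ["the", "patient", "received", "[unk]", "daily", "[unk]", "of"]
s3 = encoder_state(words, 3)   # [unk] following "received"
s5 = encoder_state(words, 5)   # [unk] following "daily"
# s3 and s5 differ, so a decoder query can attend to one and not the other,
# even though both positions carry the identical [unk] embedding.
```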
It's been a good discussion. Thanks again. |
I truly hope it has been useful, thank you.
I tried running some randomly selected text with lots of domain-specific jargon. On my trained model, all the jargon words got translated to [unk], which actually seems reasonable based on my understanding of the models (4 and 5). However, on your demo site, your model was able to spit the jargon words back out. Can you suggest what the difference might be that allowed your model to work effectively on the OOV words?