
How to use Elmo Embeddings (Word Vectors, Sentence Vectors) #149

Closed
andymancodes opened this Issue Sep 8, 2018 · 12 comments

andymancodes commented Sep 8, 2018

Hi,

Even after trying to work with ELMo and reading about it, I still don't understand how to use it. It looks like, for a given sentence, I have to pass the sentence through the ELMo model and only then do I get the ELMo embeddings? But the parameters of a neural net are fixed after training, so why not release ELMo as a set of pretrained vectors like GloVe? Why make it so hard to use?

Where can I find a clear description of how to use ELMo embeddings? Does ELMo have word embeddings? Does it only give sentence embeddings? How can I use ELMo to, say, get word embeddings and compare their performance against GloVe, or compare the performance of sentence embeddings? How can I build a matrix of word embeddings as with GloVe or word2vec?

I keep hearing about ELMo and want to use it, but so far all the nice metrics in the paper are lost on me because I can't actually test it.

Tensorflow hub is awesome btw.

Cheers!

vbardiovskyg (Member) commented Sep 11, 2018

Hi,

I will try to answer sequentially.

  1. "It looks like for a given sentence, I have to pass the sentence through the elmo model and then I can get the elmo embeddings?" - You can embed your sentences with the elmo module (or any other module) like this:

import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/1")
embeddings = elmo(
    ["the cat is on the mat", "dogs are in the fog"],
    signature="default",
    as_dict=True)["elmo"]

  2. "But the parameters of a neural net are fixed after training." - The parameters of the graph are not necessarily fixed. It is possible to fine-tune some parameters of the elmo module by setting trainable=True, as in:

elmo = hub.Module("https://tfhub.dev/google/elmo/1", trainable=True)

  3. "So, why not release elmo as a set of pretrained vectors like glove, why make it so hard to use?" - Even if the parameters were fixed, that would not mean they could simply be released as embedding vectors. For example, elmo is purely character-based, meaning it has no vocabulary. Sure, you could define your own vocabulary and precompute embeddings using the module, but that somewhat defeats the purpose of having character-level embeddings.

(Btw, if you wanted to do that as a speed/quality tradeoff, there is a mention of it here. You could then re-export the embeddings as a TF-Hub module.)

  4. "Where could I find a clear description of how to use elmo embeddings." - See the description of how to use the elmo module. Take a look at the exporting tool test for the shortest path from module to embeddings. If it is still not clear how to use the module, please let us know what seems to be the missing part.

  5. "Does elmo have word embeddings?" - No, elmo does not have word embeddings in the sense of a lookup table from word to embedding. But you can still embed words (see the sketch after this list).

  6. "Does elmo only give sentence embeddings?" - It gives an embedding for anything you put in - characters, words, sentences, paragraphs - but it was built with sentence embeddings in mind; more info here.

  7. "How can I use elmo to say get word embeddings and compare their performance against Glove or compare the performance of sentence embeddings. How can I build a matrix of word embeddings as in Glove or word2vec?" - What kind of metric do you want to use for the comparison? If you want to use the STS benchmark, take a look here, but there are many more metrics. A very simple setup idea: you could, for example, re-export your Glove embeddings as a TF-Hub module and plug both the elmo and glove modules into the linked STS benchmark.
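
To make point 5 concrete, here is a minimal sketch of pulling per-word contextual vectors out of the module and comparing two of them with cosine similarity. The "elmo" output key and its [batch_size, max_length, 1024] shape follow the module documentation on tfhub.dev; the sentences and the similarity arithmetic are illustrative assumptions.

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/1")
# "elmo" is the weighted sum of the 3 layers, one vector per token:
# shape [batch_size, max_length, 1024].
word_vecs = elmo(
    ["the cat is on the mat", "dogs are in the fog"],
    signature="default",
    as_dict=True)["elmo"]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vecs = sess.run(word_vecs)  # numpy array of shape [2, max_length, 1024]

# Contextual vector for "cat" (token 1 of sentence 0) vs "dogs" (token 0 of sentence 1).
cat, dogs = vecs[0, 1], vecs[1, 0]
print(np.dot(cat, dogs) / (np.linalg.norm(cat) * np.linalg.norm(dogs)))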

HimaVarsha94 commented Sep 27, 2018

Thanks for the elaborate answer above. I am trying to use the ELMo model above and train it on some custom data I have. I set the trainable flag to True, but my weights don't change. Is there documentation I can look up?

arnoegw (Collaborator) commented Sep 27, 2018

As https://tfhub.dev/google/elmo/2 says, the Elmo module only supports fine-tuning the weights that aggregate the embeddings from its layers. It does not support training from scratch (new vocabulary and all); see the sketch below for what that means in practice. For a similar discussion about the Universal Sentence Encoder, please see #155
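
A quick way to see what trainable=True actually exposes is to list the trainable variables after instantiating the module; a minimal sketch, assuming the standard TF1 collections API:

import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
    elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
    # Only the layer-aggregation weights show up here; the underlying
    # language model is not retrainable, which is why most of the
    # module's weights never change during fine-tuning.
    for v in tf.trainable_variables():
        print(v.name, v.shape)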

PrashantRanjan09 commented Oct 17, 2018

@HimaVarsha94 try having a look at https://github.com/PrashantRanjan09/Elmo-Tutorial and see if it helps.

HimaVarsha94 commented Oct 25, 2018

@PrashantRanjan09 Thanks. You have two tutorials, one for using pre-trained models and another for training from scratch; I have already done both. What I want is to start from a pre-trained model and train on top of it. The TensorFlow trainable flag is confusing me because I do not see the weights changing when it is set to True.

PrashantRanjan09 commented Oct 25, 2018

@HimaVarsha94: There is one for incremental training as well, which addresses the issue you are facing. I ran into the same issue: when I was training with the trainable flag set to True, I didn't see the weights changing either, although the TensorFlow tutorial mentions that the new model hosted on Hub allows it.

Nomiluks commented Nov 2, 2018

Hi,
How can we train on our own data to obtain ELMo embeddings?

shikhar26 commented Nov 28, 2018

@PrashantRanjan09 Hello, I was able to train my model from scratch and generate my weights file. My problem is to calculate the cosine similarity between sentence embeddings. Please suggest a way to consume my model so that it gives an aggregated result from all the layers as an embedding for a sentence.
We used https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#using-elmo-interactively, but the output embedding we receive is a tensor containing the output from each layer of the ELMo model (there are 3), with vectors for each word in the sentence rather than for the sentence as a whole.

MitaliSodhi commented Dec 18, 2018


@PrashantRanjan09 I am also trying to do exactly what @shikhar26 described above. Please let me know if you come up with a solution.

PrashantRanjan09 commented Dec 18, 2018

@MitaliSodhi: Since you have the per-word vectors, the sentence representation can be an average of those vectors along the token axis, or a weighted sum. You just have to figure out how to sum the vectors along that axis; see the sketch below.
I am not sure if you can get a sentence-level elmo embedding directly, or if that capability is already provided by AllenAI.
However, take a look at https://www.tensorflow.org/api_docs/python/tf/math/reduce_sum.
It might help.
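
For instance, here is a minimal sketch of that averaging idea, using the ElmoEmbedder from the AllenNLP tutorial linked above. Mean-pooling the top layer is an assumption, not something AllenNLP prescribes; averaging all 3 layers or learning a weighted sum are equally valid choices.

import numpy as np
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()

def sentence_vector(tokens):
    # embed_sentence returns an array of shape (3 layers, num_words, 1024).
    layers = elmo.embed_sentence(tokens)
    # Take the top LSTM layer and mean-pool over the word axis.
    return layers[2].mean(axis=0)

a = sentence_vector(["the", "cat", "is", "on", "the", "mat"])
b = sentence_vector(["dogs", "are", "in", "the", "fog"])
# Cosine similarity between the two pooled sentence vectors.
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))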

PrashantRanjan09 commented Dec 18, 2018

Also, this is what Mark Neumann (from Allen AI) has to say:

  • Elmo does have word embeddings, which are built up from character convolutions. However, when Elmo is used in downstream tasks, a contextual representation of each word is used, which relies on the other words in the sentence.
  • Elmo does not produce sentence embeddings, rather it produces embeddings per word "conditioned" on the context. This is why you have to run them through the model.
  • You cannot "build a matrix of word embeddings" for the reasons above.

Please see the tutorial for a comprehensive overview.
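A small sketch of the second point, using the same ElmoEmbedder as above (the sentences are illustrative): the same surface word gets a different vector in each context, which is exactly why a fixed lookup table cannot be released.

import numpy as np
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()

# "bank" in two different contexts; [2] selects the top LSTM layer,
# [5] selects the sixth token ("bank") in each sentence.
v1 = elmo.embed_sentence(["I", "sat", "by", "the", "river", "bank"])[2][5]
v2 = elmo.embed_sentence(["I", "deposited", "cash", "at", "the", "bank"])[2][5]
# Cosine similarity is well below 1.0 because the contexts differ.
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))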

jundongq commented Dec 26, 2018

The default output vector size with 'elmo' is 1024. Is it possible to downsize the output to, say, 300 or 100, like word2vec?
