# [ELMo tfhub](https://tfhub.dev/google/elmo/2)
> Embeddings from a language model trained on the 1 Billion Word Benchmark.

## [Overview](https://tfhub.dev/google/elmo/2)

1. Computes contextualized word representations using character-based word representations and bidirectional LSTMs, as described in the paper "[Deep contextualized word representations](https://arxiv.org/abs/1802.05365v2)".
2. This module accepts input either as raw text strings or as tokenized text strings.

3. The module outputs fixed embeddings at each LSTM layer, a learnable aggregation of the 3 layers, and a fixed mean-pooled vector representation of the input.

4. The complex architecture achieves state-of-the-art results on several benchmarks. Note that this module is far more computationally expensive than word embedding modules that only perform embedding lookups, so using an accelerator is recommended.

## Inputs
The module defines two signatures: default and tokens.

With the default signature, the module takes untokenized sentences as input. The input tensor is a string tensor with shape [batch_size]. The module tokenizes each string by splitting on spaces.

With the tokens signature, the module takes tokenized sentences as input. The inputs are a string tensor with shape [batch_size, max_length] and an int32 tensor with shape [batch_size] that holds the length of each sentence. The length input is needed to exclude padding when sentences vary in length.
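To make the relationship between the two signatures concrete, here is a minimal sketch in plain Python that turns raw sentences into the padded token matrix and length vector the tokens signature expects. The helper name `to_padded_tokens` is hypothetical, and padding with the empty string is an assumption taken from the tokenized example later in this notebook.

```python
def to_padded_tokens(sentences, pad=""):
    """Split sentences on spaces and pad them to a common length.

    Hypothetical helper, not part of the module's API.
    """
    tokenized = [s.split(" ") for s in sentences]
    lengths = [len(tokens) for tokens in tokenized]
    max_length = max(lengths, default=0)
    # Right-pad shorter sentences so every row has max_length entries.
    padded = [tokens + [pad] * (max_length - len(tokens)) for tokens in tokenized]
    return padded, lengths


tokens_input, tokens_length = to_padded_tokens(
    ["the cat is on the mat", "dogs are in the fog"]
)
print(tokens_input[1])  # ['dogs', 'are', 'in', 'the', 'fog', '']
print(tokens_length)    # [6, 5]
```

The two return values correspond to the "tokens" and "sequence_len" inputs used with the tokens signature.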

## Outputs
The output dictionary contains:

+ word_emb: the character-based word representations with shape [batch_size, max_length, 512].
+ lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].
+ lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].
+ elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
+ default: a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].
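The default output can be pictured as an average over the real tokens of each sentence, with padded positions left out. The sketch below shows that idea on toy 2-dimensional vectors in plain Python. It is an illustration under that assumption, not the module's actual implementation, and `masked_mean` is a hypothetical name.

```python
def masked_mean(vectors, length):
    """Average the first `length` vectors of one sentence, ignoring padding."""
    dim = len(vectors[0])
    total = [0.0] * dim
    for vec in vectors[:length]:
        for i, value in enumerate(vec):
            total[i] += value
    return [t / length for t in total]


# Toy sentence: 3 positions of 2-dimensional vectors; the last one is padding.
sentence = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = masked_mean(sentence, length=2)
print(pooled)  # [2.0, 3.0]
```

Without the length, the padding vector would be averaged in and distort the sentence representation, which is why the tokens signature asks for sentence lengths.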

In [1]:
import tensorflow as tf
import tensorflow_hub as hub

In [2]:
# Load the module from a local copy; the hosted version is at https://tfhub.dev/google/elmo/2
elmo = hub.Module("/home/b418/jupyter_workspace/B418_common/袁宵/tfhub_modules/elmo", trainable=True)

INFO:tensorflow:Using /tmp/tfhub_modules to cache modules.


We set the **trainable parameter** to **True** when creating the module so that the 4 scalar weights (the three layer weights and one scaling factor, as described in the paper) can be trained. In this setting, all other parameters of the module stay fixed.
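As a rough illustration of what those scalars do, the paper combines the layers as a scaling factor times a softmax-weighted sum of the layer vectors. The plain-Python sketch below applies that formula to one token with toy 2-dimensional layers. The function name `scalar_mix` and the example values are hypothetical, and the module's own initial weights are not shown here.

```python
import math


def scalar_mix(layers, weights, gamma):
    """Return gamma * sum_j softmax(weights)[j] * layers[j] for one token."""
    exps = [math.exp(w) for w in weights]
    norm = sum(exps)
    s = [e / norm for e in exps]  # softmax-normalized layer weights
    dim = len(layers[0])
    return [
        gamma * sum(s[j] * layers[j][i] for j in range(len(layers)))
        for i in range(dim)
    ]


# Three same-sized layer vectors for a single token (toy values).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Equal raw weights give each layer a share of 1/3; gamma rescales the result.
mixed = scalar_mix(layers, weights=[0.0, 0.0, 0.0], gamma=1.0)
print(mixed)
```

Training the three raw weights shifts emphasis between layers, while the scaling factor adjusts the overall magnitude of the combined vector.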

## Usage method 1
> signature="default" as_dict=False

In [3]:
embeddings = elmo(inputs=["the cat is on the mat", "dogs are in the fog"], as_dict=False, signature="default")
# shape=(batch_size, 1024), dtype=float32
print(embeddings)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
Tensor("module_apply_default/truediv:0", shape=(2, 1024), dtype=float32)
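The printed value is a symbolic tensor, not the embedding values themselves. A minimal sketch of evaluating it, assuming the TensorFlow 1.x graph mode that `hub.Module` is designed for, could look like the following. The helper name `evaluate` is hypothetical.

```python
def evaluate(fetches):
    """Evaluate graph tensors in a fresh TF1 session and return NumPy values."""
    # Imported here so the helper can be defined before TensorFlow is loaded.
    import tensorflow as tf

    with tf.Session() as sess:
        # The module creates variables and lookup tables that need initializing.
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        return sess.run(fetches)


# Usage, with `embeddings` being the tensor built in the cell above:
# vectors = evaluate(embeddings)
# print(vectors.shape)
```

Each call opens a new session and re-initializes the variables, which is fine for inspection but would discard any trained scalar weights; for training, keep a single session open instead.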


## Usage method 2
> signature="default" as_dict=True

In [4]:
module_features = elmo(inputs=["the cat is on the mat", "dogs are in the fog"], as_dict=True, signature="default")
elmo_embedding = module_features["elmo"]  # [batch_size, max_length, 1024], the weighted sum of the 3 layers, where the weights are trainable
word_emb = module_features["word_emb"]  # [batch_size, max_length, 512], the character-based word representations
lstm_outputs1 = module_features["lstm_outputs1"]  # [batch_size, max_length, 1024], the first LSTM hidden state
lstm_outputs2 = module_features["lstm_outputs2"]  # [batch_size, max_length, 1024], the second LSTM hidden state
default = module_features["default"]  # [batch_size, 1024], a fixed mean-pooling of all contextualized word representations

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


In [5]:
print(elmo_embedding)

Tensor("module_apply_default_1/aggregation/mul_3:0", shape=(2, 6, 1024), dtype=float32)


## Usage method 3
> signature="tokens" as_dict=False

In [6]:
tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                ["dogs", "are", "in", "the", "fog", ""]]
tokens_length = [6, 5]

In [7]:
embeddings = elmo(inputs={"tokens": tokens_input, "sequence_len": tokens_length}, as_dict=False, signature="tokens")
# shape=(batch_size, 1024), dtype=float32
print(embeddings)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
Tensor("module_apply_tokens/truediv:0", shape=(2, 1024), dtype=float32)


## Usage method 4
> signature="tokens" as_dict=True

In [8]:
tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                ["dogs", "are", "in", "the", "fog", ""]]
tokens_length = [6, 5]

In [9]:
module_features = elmo(inputs={"tokens": tokens_input, "sequence_len": tokens_length}, as_dict=True, signature="tokens")
elmo_embedding = module_features["elmo"]  # [batch_size, max_length, 1024], the weighted sum of the 3 layers, where the weights are trainable
word_emb = module_features["word_emb"]  # [batch_size, max_length, 512], the character-based word representations
lstm_outputs1 = module_features["lstm_outputs1"]  # [batch_size, max_length, 1024], the first LSTM hidden state
lstm_outputs2 = module_features["lstm_outputs2"]  # [batch_size, max_length, 1024], the second LSTM hidden state
default = module_features["default"]  # [batch_size, 1024], a fixed mean-pooling of all contextualized word representations

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


In [10]:
print(elmo_embedding)

Tensor("module_apply_tokens_1/aggregation/mul_3:0", shape=(2, 6, 1024), dtype=float32)
