Use pre-trained embedding instead of random one #14

Closed
lacls opened this issue Sep 20, 2019 · 1 comment

lacls commented Sep 20, 2019

Hi there, as I can see in your attn_bi_lstm.py (and maybe the other models as well), the word embeddings are initialized randomly. I tried to use a pre-trained embedding instead, but I don't know how to set it up, and I get an error saying "must have rank at least 3" (sorry, I am new to TensorFlow). Thank you, much appreciated.

Word embedding

    embeddings_var = tf.Variable(tf.random_uniform([self.vocab_size, self.embedding_size], -1.0, 1.0),
                                 trainable=True)
    batch_embedded = tf.nn.embedding_lookup(embeddings_var, self.x)

    rnn_outputs, _ = bi_rnn(BasicLSTMCell(self.hidden_size),
                            BasicLSTMCell(self.hidden_size),
                            inputs=batch_embedded, dtype=tf.float32)

My trial

def generate_embedding(word_index, model_embedding, EMBEDDING_DIM):
    count6 = 0     # words found in the pre-trained embedding
    countNot6 = 0  # out-of-vocabulary words
    # embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
    embedding_matrix = np.asarray([np.random.uniform(-0.01, 0.01, EMBEDDING_DIM)
                                   for _ in range(len(word_index) + 1)])
    list_oov = []
    for word, i in word_index.items():
        try:
            embedding_vector = model_embedding[word]
        except KeyError:
            # OOV word: keep its random initialization
            list_oov.append(word)
            countNot6 += 1
            continue
        if embedding_vector is not None:
            count6 += 1
            embedding_matrix[i] = embedding_vector
    return embedding_matrix

batch_embedded = generate_embedding(word_index, word_embedding, EMBEDDING_DIM)
rnn_outputs, _ = bi_rnn(BasicLSTMCell(self.hidden_size),
                        BasicLSTMCell(self.hidden_size),
                        inputs=batch_embedded, dtype=tf.float32)

Note that I got an error at inputs=batch_embedded.
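
For context, the "must have rank at least 3" error most likely comes from the shape of batch_embedded: generate_embedding returns a rank-2 numpy matrix of shape [vocab_size, EMBEDDING_DIM], while bi_rnn expects rank-3 inputs of shape [batch_size, max_time, embedding_dim]. The pre-trained matrix is a lookup table, not a batch of sequences, so each token id still has to be looked up in it. A minimal sketch, reusing self.x and the names from the snippets above:

embedding_matrix = generate_embedding(word_index, word_embedding, EMBEDDING_DIM)
# wrap the numpy lookup table in a variable and index it with the token ids
embeddings_var = tf.Variable(embedding_matrix.astype(np.float32), trainable=True)
batch_embedded = tf.nn.embedding_lookup(embeddings_var, self.x)  # now rank 3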

@TobiasLee
Owner

Hi, according to your description, I guess that you want to use pre-trained word embeddings to initialize the embedding matrix. A common way is to assign the pre-trained word embeddings to the embedding variable:

# create embedding matrix
embeddings_var = tf.Variable(tf.constant(0.0, shape=[vocab_size_after_process, embedding_dim]), trainable=False)
# define an embedding placeholder to pass the pre-trained word embedding in
embedding_placeholder = tf.placeholder(tf.float32, [vocab_size_after_process, embedding_dim])
embedding_init = embeddings_var.assign(embedding_placeholder)  # an assign operation

Then you can run the embedding_init op in your session like this:

with tf.Session() as sess:
    sess.run(embedding_init, feed_dict={embedding_placeholder: embedding})
    # ... other code
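
Putting the two parts together, a minimal end-to-end sketch might look like this (assuming embedding is the numpy matrix returned by generate_embedding above, self.x holds the [batch_size, max_time] token ids, and bi_rnn / BasicLSTMCell are the same aliases used in attn_bi_lstm.py; trainable=False keeps the pre-trained vectors fixed during training):

# build the pre-trained matrix on the numpy side
embedding = generate_embedding(word_index, word_embedding, EMBEDDING_DIM)
vocab_size_after_process, embedding_dim = embedding.shape

# graph side: variable + placeholder + assign op, as above
embeddings_var = tf.Variable(tf.constant(0.0, shape=[vocab_size_after_process, embedding_dim]), trainable=False)
embedding_placeholder = tf.placeholder(tf.float32, [vocab_size_after_process, embedding_dim])
embedding_init = embeddings_var.assign(embedding_placeholder)

# the lookup turns rank-2 token ids into the rank-3 inputs bi_rnn expects
batch_embedded = tf.nn.embedding_lookup(embeddings_var, self.x)
rnn_outputs, _ = bi_rnn(BasicLSTMCell(self.hidden_size),
                        BasicLSTMCell(self.hidden_size),
                        inputs=batch_embedded, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(embedding_init, feed_dict={embedding_placeholder: embedding})
    # ... training / evaluation code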
