
Use a pre-trained embedding with tf.feature_column.embedding_column via initializer #20663

Closed
ace-lii opened this issue Jul 10, 2018 · 5 comments
ace-lii commented Jul 10, 2018

I used a pre-trained embedding with tf.feature_column.embedding_column via the initializer parameter; my code is below.

# Load the pre-trained embedding matrix and copy it into a non-trainable variable.
weight, vocab_size, emb_size = _create_pretrained_emb_from_txt(FLAGS.vocab, FLAGS.pre_emb)
W = tf.get_variable(trainable=False, name='W', shape=[vocab_size, emb_size])
embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, emb_size])
embedding_init = W.assign(embedding_placeholder)

sess = tf.Session()
sess.run(embedding_init, feed_dict={embedding_placeholder: weight})

itemx_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
    key='itemx',
    vocabulary_file=FLAGS.vocabx)

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=emb_size,
                                               initializer=W,
                                               trainable=False)

It raises:

ValueError: initializer must be callable if specified. Embedding of column_name: itemx

Then I tried setting the initializer to lambda w: W:

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=emb_size,
                                               initializer=lambda w: W,
                                               trainable=False)

but that doesn't work either; it raises: TypeError: <lambda>() got an unexpected keyword argument 'dtype'

I'm confused: is this a bug?

Thanks for your suggestions, guys!
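
For context, the initializer argument must be a callable that TensorFlow invokes with shape and dtype keyword arguments, which is why both a raw variable and a one-argument lambda are rejected here. A minimal sketch of the conventional workaround, assuming weight is the pre-trained NumPy array loaded above (this is illustrative, not the original poster's code):

# tf.constant_initializer(weight) is a callable with the expected
# (shape, dtype, ...) signature and returns the fixed pre-trained matrix.
itemx_emb = tf.feature_column.embedding_column(
    itemx_vocab,
    dimension=emb_size,
    initializer=tf.constant_initializer(weight),
    trainable=False)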

tensorflowbutler added the stat:awaiting response (Status - Awaiting response from author) label Jul 10, 2018
tensorflowbutler (Member) commented:

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce


ace-lii commented Jul 11, 2018

Thanks for the quick response! Details below:
Have I written custom code: No
OS Platform and Distribution: Linux
TensorFlow installed from: pip
TensorFlow version: 1.5.1
GPU model and memory: NVIDIA-SMI 384.66 16GB
Test Environment: Python 2.7.5


ace-lii commented Jul 11, 2018

I finally figured it out, although I'm still not clear why the approach above doesn't work. @angerson, if you see my question, I'd appreciate any suggestions.

OK, here is my solution. It actually comes from this Stack Overflow answer:
stackoverflow: feature-columns-embedding-lookup

itemx_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
    key='itemx',
    vocabulary_file=FLAGS.vocabx)

# Remap the pre-trained embedding stored in a checkpoint onto the new vocabulary.
embedding_initializer_x = tf.contrib.framework.load_embedding_initializer(
    ckpt_path='model.ckpt',
    embedding_tensor_name='w_in',
    new_vocab_size=itemx_vocab.vocabulary_size,
    embedding_dim=emb_size,
    old_vocab_file=FLAGS.vocab_emb,
    new_vocab_file=FLAGS.vocabx)

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=128,  # must match emb_size above
                                               initializer=embedding_initializer_x,
                                               trainable=False)
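
For completeness, a sketch of how the resulting column might be consumed in TF 1.x; the feature values here are illustrative assumptions, not part of the original post:

# Hypothetical usage: look up the (frozen) embeddings for a batch of item ids.
features = {'itemx': tf.constant([['id_a'], ['id_b']])}  # placeholder ids
dense_tensor = tf.feature_column.input_layer(features, [itemx_emb])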

tensorflowbutler removed the stat:awaiting response (Status - Awaiting response from author) label Jul 11, 2018

angerson commented Aug 1, 2018

Looks like this issue has been resolved, so I'm closing it. Thanks for posting your updates!

If you have any more questions that are not bug reports or feature requests, please ask a question on Stack Overflow, which has a wider audience. Thanks!

angerson closed this as completed Aug 1, 2018

jagan-mathematics commented Jul 19, 2021

You are passing the weights directly to embedding_column, but it expects a callable object.

The best way is to create an initializer class and pass it to embedding_column, like this:

import tensorflow as tf

# embedding_column requires a callable, so subclass tf.keras.initializers.Initializer
# and return the pre-trained weights from __call__.
class embedding_initialize(tf.keras.initializers.Initializer):
  def __call__(self, shape, dtype=None, **kwargs):
    # Hard-coded example weights; their shape must match [num_buckets, dimension].
    return tf.compat.v1.Variable(initial_value=[[1, 2], [2, 3], [3, 4]], dtype=dtype)


one_hot_layer = tf.feature_column.sequence_categorical_column_with_identity('text', num_buckets=3)
text_embedding = tf.feature_column.embedding_column(one_hot_layer,
                                                    dimension=2,
                                                    initializer=embedding_initialize())
columns = [text_embedding]

features = {'text': tf.sparse.from_dense([[1, 2], [2, 1]])}

sequence_input_layer = tf.keras.experimental.SequenceFeatures(columns)
sequence_input, sequence_length = sequence_input_layer(features, training=True)
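
A quick sanity check on the snippet above (assuming eager TF 2.x): the dense output should have shape [batch, max_seq_len, dimension], with one sequence length per example:

# Expected: (2, 2, 2) = [batch, seq_len, embedding_dim]; lengths [2 2].
print(sequence_input.shape)
print(sequence_length.numpy())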
