
Use a pre-trained embedding with tf.feature_column.embedding_column via initializer #20663

Closed
ace-lii opened this issue Jul 10, 2018 · 5 comments
ace-lii commented Jul 10, 2018

I used a pre-trained embedding with tf.feature_column.embedding_column via the initializer parameter; my code is below.

# Load the pre-trained embedding matrix and copy it into a non-trainable variable.
weight, vocab_size, emb_size = _create_pretrained_emb_from_txt(FLAGS.vocab, FLAGS.pre_emb)
W = tf.get_variable(trainable=False, name='W', shape=[vocab_size, emb_size])
embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, emb_size])
embedding_init = W.assign(embedding_placeholder)

sess = tf.Session()
sess.run(embedding_init, feed_dict={embedding_placeholder: weight})

itemx_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
    key='itemx',
    vocabulary_file=FLAGS.vocabx)

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=emb_size,
                                               initializer=W,
                                               trainable=False)

It raises:

ValueError: initializer must be callable if specified. Embedding of column_name: itemx

Then I tried setting the initializer to lambda w: W:

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=emb_size,
                                               initializer=lambda w: W,
                                               trainable=False)

but that doesn't work either; it raises: TypeError: <lambda>() got an unexpected keyword argument 'dtype'

I'm confused: is this a bug?

Thanks for your suggestions, guys!
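
For context, the initializer argument must be a callable that TensorFlow invokes with shape and dtype keyword arguments, which is why both a raw variable and a one-argument lambda are rejected here. A minimal sketch of the conventional workaround, assuming weight is the pre-trained NumPy array loaded above (this is illustrative, not the original poster's code):

# tf.constant_initializer(weight) is a callable with the expected
# (shape, dtype, ...) signature and returns the fixed pre-trained matrix.
itemx_emb = tf.feature_column.embedding_column(
    itemx_vocab,
    dimension=emb_size,
    initializer=tf.constant_initializer(weight),
    trainable=False)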

tensorflowbutler added the stat:awaiting response (Status - Awaiting response from author) label Jul 10, 2018
tensorflowbutler (Member) commented:

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce


ace-lii commented Jul 11, 2018

Thanks for the quick response! Details below:
Have I written custom code: No
OS Platform and Distribution: Linux
TensorFlow installed from: pip
TensorFlow version: 1.5.1
GPU model and memory: NVIDIA-SMI 384.66 16GB
Test Environment: Python 2.7.5


ace-lii commented Jul 11, 2018

I finally figured it out, although I'm still not clear why the approach above doesn't work. @angerson, if you see my question, I'd appreciate any suggestions.

OK, here is my solution. It actually comes from this Stack Overflow answer:
stackoverflow: feature-columns-embedding-lookup

itemx_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
    key='itemx',
    vocabulary_file=FLAGS.vocabx)

# Remap the pre-trained embedding stored in a checkpoint onto the new vocabulary.
embedding_initializer_x = tf.contrib.framework.load_embedding_initializer(
    ckpt_path='model.ckpt',
    embedding_tensor_name='w_in',
    new_vocab_size=itemx_vocab.vocabulary_size,
    embedding_dim=emb_size,
    old_vocab_file=FLAGS.vocab_emb,
    new_vocab_file=FLAGS.vocabx)

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=128,  # must match emb_size above
                                               initializer=embedding_initializer_x,
                                               trainable=False)
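
For completeness, a sketch of how the resulting column might be consumed in TF 1.x; the feature values here are illustrative assumptions, not part of the original post:

# Hypothetical usage: look up the (frozen) embeddings for a batch of item ids.
features = {'itemx': tf.constant([['id_a'], ['id_b']])}  # placeholder ids
dense_tensor = tf.feature_column.input_layer(features, [itemx_emb])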

tensorflowbutler removed the stat:awaiting response (Status - Awaiting response from author) label Jul 11, 2018

angerson commented Aug 1, 2018

Looks like this issue has been resolved, so I'm closing it. Thanks for posting your updates!

If you have any more questions that are not bug reports or feature requests, please ask a question on Stack Overflow, which has a wider audience. Thanks!

angerson closed this as completed Aug 1, 2018

jagan-mathematics commented Jul 19, 2021

You are passing the weights directly to embedding_column, but it expects a callable object.

The best way is to create an initializer class and pass it to embedding_column, like this:

import tensorflow as tf

# embedding_column requires a callable, so subclass tf.keras.initializers.Initializer
# and return the pre-trained weights from __call__.
class embedding_initialize(tf.keras.initializers.Initializer):
  def __call__(self, shape, dtype=None, **kwargs):
    # Hard-coded example weights; their shape must match [num_buckets, dimension].
    return tf.compat.v1.Variable(initial_value=[[1, 2], [2, 3], [3, 4]], dtype=dtype)


one_hot_layer = tf.feature_column.sequence_categorical_column_with_identity('text', num_buckets=3)
text_embedding = tf.feature_column.embedding_column(one_hot_layer,
                                                    dimension=2,
                                                    initializer=embedding_initialize())
columns = [text_embedding]

features = {'text': tf.sparse.from_dense([[1, 2], [2, 1]])}

sequence_input_layer = tf.keras.experimental.SequenceFeatures(columns)
sequence_input, sequence_length = sequence_input_layer(features, training=True)
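
A quick sanity check on the snippet above (assuming eager TF 2.x): the dense output should have shape [batch, max_seq_len, dimension], with one sequence length per example:

# Expected: (2, 2, 2) = [batch, seq_len, embedding_dim]; lengths [2 2].
print(sequence_input.shape)
print(sequence_length.numpy())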
