Hi,
I'm using the Universal Sentence Encoder to compute the embedding representations of text sentences stored in a dataframe, using the code below (use_preprocessing).
Now, I would like to also extract the internal representation of the words that compose the sentence to encode.
I've seen issue #344 and, from what I understood, the model I'm using computes the final embedding as Σ_w Embed(w) / √(sentence length).
The result that I would like to achieve is the following:
Given a sentence "Example text", I would like to extract the final embedding for "Example text" and the internal embedding for each token (i.e., the embedding of "Example" and the embedding of "text").
I've also seen issue #403, but it's still not clear to me how to access the token-level embeddings.
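To make the formula above concrete, here is a toy sketch of the sum-then-scale-by-1/√n pooling, using made-up word vectors (these are illustrative only, not the actual USE parameters):

```python
import math

# Toy word vectors -- illustrative only, not the actual USE weights.
embed = {
    "Example": [1.0, 0.0, 2.0],
    "text":    [0.0, 2.0, 1.0],
}

def sentence_embedding(sentence):
    """Sum the word vectors, then divide by sqrt(number of words)."""
    words = sentence.split()
    vecs = [embed[w] for w in words]
    summed = [sum(v[i] for v in vecs) for i in range(len(vecs[0]))]
    return [s / math.sqrt(len(words)) for s in summed]

print(sentence_embedding("Example text"))  # [1/sqrt(2), 2/sqrt(2), 3/sqrt(2)]
```

Under this scheme the per-word vectors exist inside the model, but the published module only exposes the pooled sentence vector.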
Thank you,
Giulia
Relevant code
import tensorflow as tf
import tensorflow_hub as hub

def use_preprocessing(df, column):
    """Compute the embedding via the Universal Sentence Encoder algorithm
    for every sentence in the given column.

    Args:
        df: Dataframe
        column: column name to identify data to process
    """
    # Universal Sentence Encoder (TF1-style hub.Module, so disable eager mode)
    tf.compat.v1.disable_eager_execution()
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/3?tf-hub-format=compressed"
    embed = hub.Module(module_url)
    with tf.compat.v1.Session(config=config) as session:
        session.run([tf.compat.v1.global_variables_initializer(),
                     tf.compat.v1.tables_initializer()])
        # The original read `x[column]`; `df[column]` matches the signature.
        text_embedding = session.run(embed(list(df[column])))
    return text_embedding
Relevant log output
----
tensorflow_hub Version
0.13.0.dev (unstable development build)
TensorFlow Version
2.8 (latest stable release)
Other libraries
No response
Python Version
3.x
OS
Linux
universal-sentence-encoder-large/3 has a transformer encoder and uses average pooling over all token embeddings at the last transformer layer as the output embedding.
Unfortunately, depending on the model's architecture and the ops used, it may be hard or impossible to get the underlying gradients at the token level.
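For intuition, average pooling over the last-layer token embeddings means the sentence vector is just the arithmetic mean of the token vectors. A minimal numpy sketch with made-up token embeddings (not real USE activations):

```python
import numpy as np

# Hypothetical last-layer token embeddings for a 3-token sentence (dim 4).
tokens = np.array([[1.0, 0.0, 2.0, 0.0],
                   [0.0, 1.0, 0.0, 2.0],
                   [1.0, 1.0, 1.0, 1.0]])

# Average pooling: mean over the token axis yields one sentence vector.
sentence_vector = tokens.mean(axis=0)
print(sentence_vector)  # [0.666..., 0.666..., 1.0, 1.0]
```

Because the pooling happens inside the exported graph, the individual rows of `tokens` are not part of the module's public outputs.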
Universal Sentence Encoder provides sentence encodings and cannot provide word-level encodings. For word-level encodings you can use Word2Vec or GloVe, as shown here, or follow this tutorial to create a custom word embedding layer.
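As a sketch of what pretrained word-level vectors give you, here is a GloVe-style lookup table; the vectors below are dummies standing in for real GloVe/Word2Vec weights:

```python
import numpy as np

# Dummy lookup table standing in for real GloVe/Word2Vec vectors.
glove = {
    "example": np.array([0.1, 0.3, 0.5]),
    "text":    np.array([0.2, 0.0, 0.4]),
}
UNK = np.zeros(3)  # fallback for out-of-vocabulary words

def word_embeddings(sentence):
    """Return one vector per token -- the per-word view USE does not expose."""
    return [glove.get(w.lower(), UNK) for w in sentence.split()]

vectors = word_embeddings("Example text")
print(len(vectors))  # one vector per word -> 2
```

With per-word vectors in hand, any pooling (mean, sum/√n) can then be applied on top if a sentence-level vector is also needed.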