How to pad data in TensorFlow Transform so that sequences are of equal length?

I am building a data pipeline with TensorFlow Transform on Apache Beam. Inside the preprocessing function I generate a vocabulary, which converts my input sentences to lists of integers.
My data has 3 features:
Column 1 = Context (dtype = String) (sentences of varying length)
Column 2 = Utterance (dtype = String) (sentences of varying length)
Column 3 = Label (0/1)
I want to right-pad each list of integers, as soon as it is generated in the preprocessing function below, to a max length of 160 words.
Example:
"I love Pizza" --> [34, 67, 78], and the max length I want is 10.
Then I want [34, 67, 78, 0, 0, 0, 0, 0, 0, 0]. If the sentence is already longer than 10, I want to trim the extra portion so that its length is exactly 10.
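
To make the intended behaviour concrete, here is a plain-Python sketch of the pad-or-trim rule described above. The helper name `pad_or_truncate_list` is hypothetical and only illustrates the expected result; it is not something that can run on tensors inside the preprocessing function.

```python
def pad_or_truncate_list(ids, max_len, pad_value=0):
    """Right-pad ids with pad_value, or trim it, so the result has max_len items."""
    trimmed = list(ids[:max_len])  # drops anything beyond max_len
    return trimmed + [pad_value] * (max_len - len(trimmed))


# Short sentence: padded on the right with zeros.
print(pad_or_truncate_list([34, 67, 78], 10))
# Long sentence (12 ids): trimmed to the first 10.
print(pad_or_truncate_list(list(range(1, 13)), 10))
```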

tf.keras.preprocessing.sequence.pad_sequences expects a list of sequences as input, but as shown below, my mapped_context and mapped_utterance are tf.int64 tensors, so I cannot use the Keras padding functionality.
Can someone please help me achieve this?
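
One direction that might work with plain TensorFlow ops is sketched below. It assumes that the mapped ids are a `tf.SparseTensor` of shape [batch, longest sentence], as they would be in the sentiment example after splitting the text into tokens; if they are already dense, the first step is skipped. The function name `pad_or_truncate_ids`, the constant `MAX_LEN`, and the toy batch are hypothetical, and the demo at the bottom assumes TensorFlow 2.x eager execution.

```python
import tensorflow as tf

MAX_LEN = 10  # hypothetical value for the demo; the real pipeline would use 160


def pad_or_truncate_ids(ids, max_len, pad_value=0):
    """Return a dense [batch, max_len] tensor, right-padded with pad_value."""
    if isinstance(ids, tf.SparseTensor):
        # Densify: rows shorter than the longest row are filled with pad_value.
        ids = tf.sparse.to_dense(ids, default_value=pad_value)
    # Append max_len pad columns so every row has at least max_len entries.
    ids = tf.pad(ids, [[0, 0], [0, max_len]], constant_values=pad_value)
    # Keep only the first max_len columns; this also trims long rows.
    ids = ids[:, :max_len]
    # Record the now-fixed second dimension in the static shape.
    return tf.ensure_shape(ids, [None, max_len])


# Toy batch standing in for mapped_context: row 0 has 3 ids, row 1 has 12.
row0 = [34, 67, 78]
row1 = list(range(1, 13))
indices = [[0, j] for j in range(len(row0))] + [[1, j] for j in range(len(row1))]
toy_ids = tf.SparseTensor(
    indices=indices,
    values=tf.constant(row0 + row1, dtype=tf.int64),
    dense_shape=[2, 12],
)
result = pad_or_truncate_ids(toy_ids, MAX_LEN)
print(result.numpy().tolist())
```

Inside the preprocessing function this would be called on mapped_context and mapped_utterance before they are returned. One thing to check: vocabulary ids start at 0, so padding with 0 would coincide with a real token unless the ids are shifted first or a different pad value is used.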

The reference code I am following is the TensorFlow Transform sentiment analysis example:
https://github.com/tensorflow/transform/blob/599691c8b94bbd6ee7f67c11542e7fef1792a566/examples/sentiment_example.py

My code's preprocessing function is below:

![image](https://user-images.githubusercontent.com/27782859/79502652-e691f500-7ffd-11ea-8c72-c51705a77324.png)

