
[Feature Request] Create demo for using dense, sparse and embedded features. #31

Closed
ramakumar1729 opened this issue Feb 19, 2019 · 13 comments
Labels
enhancement New feature or request

Comments

@ramakumar1729
Member

No description provided.

@eggie5
Contributor

eggie5 commented Mar 7, 2019

Would it be possible to learn a ranker from pairwise data where the features are latent factors (w/o any hand-made features)? Like a matrix factorization model?

So the input to the pairwise loss is the respective embeddings for two documents you are ranking...

@ramakumar1729 ramakumar1729 self-assigned this Mar 7, 2019
@ramakumar1729
Member Author

Yes, it is possible. transform_fn would be an appropriate place to convert your features to latent vectors. Whether you jointly train such representations with the ranker depends on your problem.
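
For illustration, a minimal sketch of such a transform_fn (the doc_id feature name and the sizes are made up; in practice you would express this with feature columns):

import tensorflow as tf

def make_transform_fn(vocab_size=10000, dim=10):  # hypothetical sizes
  def _transform_fn(features, mode):
    # Map integer document ids to trainable latent vectors. The returned
    # dicts are (context_features, example_features).
    embeddings = tf.get_variable("doc_emb", [vocab_size, dim])
    dense = tf.nn.embedding_lookup(embeddings, features["doc_id"])
    return {}, {"doc_emb": dense}
  return _transform_fn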

@eggie5
Contributor

eggie5 commented Mar 7, 2019

Yes, I was thinking about jointly training a document embedding. I have pairwise labels (A > B, etc). For each labeled pair (A,B), I'll lookup their embeddings (A_emb, B_emb) and use that as the document features. This would replace classical LTR query-document features (I don't have any queries in my context anyways). Not sure what you mean w/ the transform_fn but I'll research the code a bit.

@eggie5
Contributor

eggie5 commented Mar 18, 2019

Here's my example (modified from the TF-Ranking example) of using an embedding to learn a latent factor model:

import tensorflow as tf
from tensorflow.feature_column import categorical_column_with_identity, embedding_column

_BUCKETS = 10000  # item-id vocabulary size
_K = 10           # embedding dimension
_HIDDEN_LAYER_DIMS = ["256", "128", "64"]

def make_score_fn():
  """Returns a scoring function to build `EstimatorSpec`."""

  def _score_fn(context_features, group_features, mode, params, config):
    """Defines the network to score a document."""
    del params
    del config

    # Map each integer item id to a trainable _K-dimensional latent vector.
    item_id = categorical_column_with_identity(
        key='item_id', num_buckets=_BUCKETS, default_value=0)
    item_emb = embedding_column(item_id, _K)

    input_layer = tf.feature_column.input_layer(group_features, [item_emb])

    cur_layer = input_layer
    for layer_width in (int(d) for d in _HIDDEN_LAYER_DIMS):
      cur_layer = tf.layers.dense(cur_layer, units=layer_width, activation="relu")

    logits = tf.layers.dense(cur_layer, units=1)  # one relevance score per document
    return logits

  return _score_fn

I modified my input function to only return an item_id, which is an integer that maps to an embedding. But it could be concatenated with any other arbitrary features into a fixed-length vector for the FC layers.
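
Roughly, the input function now just produces an item_id per document (a sketch with synthetic in-memory data):

import numpy as np
import tensorflow as tf

_LIST_SIZE = 10  # documents per ranking list

def input_fn():
  # Toy data: 32 lists, each with _LIST_SIZE item ids and binary relevance labels.
  item_ids = np.random.randint(0, _BUCKETS, size=(32, _LIST_SIZE), dtype=np.int64)
  labels = np.random.randint(0, 2, size=(32, _LIST_SIZE)).astype(np.float32)
  dataset = tf.data.Dataset.from_tensor_slices(({"item_id": item_ids}, labels))
  dataset = dataset.shuffle(32).repeat().batch(16)
  return dataset.make_one_shot_iterator().get_next()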

This gets good results on my ranking task for MRR:

baseline: .54
LF model: .76
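
For completeness, here's roughly how I wire the scoring function into an estimator (a sketch; the loss and metric choices are just what I happened to use, and make_score_fn is the function above):

import tensorflow as tf
import tensorflow_ranking as tfr

def _train_op_fn(loss):
  # optimize_loss is the pre-TF2 convenience wrapper used in the examples.
  return tf.contrib.layers.optimize_loss(
      loss=loss,
      global_step=tf.train.get_global_step(),
      learning_rate=0.05,
      optimizer="Adagrad")

ranking_head = tfr.head.create_ranking_head(
    loss_fn=tfr.losses.make_loss_fn("pairwise_logistic_loss"),
    eval_metric_fns={
        "metric/mrr": tfr.metrics.make_ranking_metric_fn(
            tfr.metrics.RankingMetricKey.MRR)},
    train_op_fn=_train_op_fn)

estimator = tf.estimator.Estimator(
    model_fn=tfr.model.make_groupwise_ranking_fn(
        group_score_fn=make_score_fn(),
        group_size=1,
        transform_fn=None,
        ranking_head=ranking_head))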

@ramakumar1729
Member Author

ramakumar1729 commented Mar 18, 2019

Thanks for sharing this example, Alex. This looks great. If you wish, you could define the feature columns outside, so that you can also use them to build the parsing spec for reading tf.Examples or tf.SequenceExamples.
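
For example, a sketch (using the same item_id column as in your snippet):

import tensorflow as tf

# Defined at module level so the scoring function and the input reader share them.
def example_feature_columns():
  item_id = tf.feature_column.categorical_column_with_identity(
      key='item_id', num_buckets=10000, default_value=0)
  return {'item_id': tf.feature_column.embedding_column(item_id, 10)}

# The same columns then yield the spec for parsing serialized protos.
parsing_spec = tf.feature_column.make_parse_example_spec(
    example_feature_columns().values())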

@eggie5
Contributor

eggie5 commented Apr 2, 2019

When you say "define your feature columns outside", do you mean like in the notebook example, where there is an example_feature_columns function which is called from the _score_fn function?

Also, I don't understand what the transform_fn function is for. Can you provide an example?

@bendersky bendersky changed the title Create demo for using dense, sparse and embedded features. [Feature Request] Create demo for using dense, sparse and embedded features. Apr 15, 2019
@bendersky bendersky added the enhancement New feature or request label Apr 15, 2019
@darlliu

darlliu commented Apr 17, 2019

Hello, I would just like to chime in that an example of using feature columns where group_size and the feature dimension are not 1 would be helpful. I can use a groupwise feature tensor directly with dimension [?, group_size (10), feature_dimension (180)], but when I put it in a numeric feature column with shape [group_size, feature_dimension] I get the following error on this line in _groupwise_dnn_v2:

      scores = _score_fn(
          large_batch_context_features, large_batch_group_features, reuse=False)

error is:

ValueError: Dimensions must be equal, but are 1800 and 180 for 'groupwise_dnn_v2/group_score/rescale/mul' (op: 'Mul') with input shapes: [?,1800], [180].

I think it has something to do with the shape of the feature column, but I'm unsure what the issue is here.
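
e.g. I wonder if the numeric column is only supposed to describe a single document, with TF-Ranking adding the group dimension itself (a guess, not verified):

import tensorflow as tf

# Shape describes one document's feature vector, not the whole group:
doc_features = tf.feature_column.numeric_column("doc_features", shape=[180])
# rather than: numeric_column("doc_features", shape=[10, 180])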

@sjhermanek

FWIW I ran into the same issue as @darlliu

@ramakumar1729
Member Author

Please check out the demo on using sparse features and embeddings in TF-Ranking. You can click on the colab link to start executing the content of the notebook.

Tensorboard is integrated into the notebook, and you can use it to visualize the eval and loss curves.

Feel free to post your feedback by responding on this issue.

@eggie5
Contributor

eggie5 commented Jul 12, 2019

Thanks for posting some concrete examples in this new notebook. Some questions:

  • I see a new data format EIE whereas the previous examples seemed to use SequenceExample. What are some of the motivations for moving to the new format?
  • Also, what is the intuition behind this?: _NUM_TRAIN_STEPS = 15 * 1000
  • "The transform function takes in the raw dense or sparse features from the input reader, applies suitable transformations to return dense representations for each feature"
    I never understood what the transform function was for until reading this line. I always just made my features in the scoring function.
  • I noticed you moved away from tf.contrib.layers.optimize_loss. It was a nice abstraction, however it was probably replaced b/c contrib is being deprecated in TF 2.0?

@eggie5
Contributor

eggie5 commented Jul 12, 2019

And also, I think it would be nice to have the hyperparams passed to the transform_fn just like the group_score_fn, so you can do something like this:

import six
import tensorflow as tf
import tensorflow_ranking as tfr
from tensorflow.feature_column import categorical_column_with_identity, embedding_column

item_buckets = 10000  # vocabulary size for the id feature

def context_feature_columns():
    return {}  # no query/context features in my setup

def example_feature_columns(params):
    rest_id = categorical_column_with_identity(key='rid', num_buckets=item_buckets)
    rest_emb = embedding_column(rest_id, params.K)
    return {"rid": rest_emb}


def make_transform_fn():
    def _transform_fn(features, mode, params):
        """Defines transform_fn."""
        example_name = next(six.iterkeys(example_feature_columns(params)))
        input_size = tf.shape(input=features[example_name])[1]
        context_features, example_features = tfr.feature.encode_listwise_features(
            features=features,
            input_size=input_size,
            context_feature_columns=context_feature_columns(),
            example_feature_columns=example_feature_columns(params),  # params flows through
            mode=mode,
            scope="transform_layer")

        return context_features, example_features
    return _transform_fn

See how I added params to the signature of _transform_fn, so it can now accept hyperparams just like the group_score_fn?

The common use case for this is when the embedding dimensions are a hyperparameter.
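
e.g. (a sketch; HParams here is tf.contrib.training.HParams, and model_fn would come from tfr.model.make_groupwise_ranking_fn):

import tensorflow as tf
from tensorflow.contrib.training import HParams

hparams = HParams(K=10)  # embedding dimension as a tunable hyperparameter
# model_fn: assumed built via tfr.model.make_groupwise_ranking_fn as elsewhere.
estimator = tf.estimator.Estimator(model_fn=model_fn, params=hparams)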

Would you accept a PR from me for this?

@ramakumar1729
Member Author

Hi Alex, great set of questions. Please find my replies inline.

Thanks for posting some concrete examples in this new notebook. Some questions:

  • I see a new data format EIE whereas the previous examples seemed to use SequenceExample. What are some of the motivations for moving to the new format?
    The primary motivation for EIE is that it keeps the per-query and per-document features (which we call context and example features) in self-contained tf.Examples. SequenceExample stores the data in a feature-major format, while EIE stores it in a document-major format; see the sketch at the end of this comment. Does this make sense?
  • Also, what is the intuition behind this?: _NUM_TRAIN_STEPS = 15 * 1000
    The number of training steps is kept low for the sake of demonstration. This should show the curves in Tensorboard in around 15 minutes. You can set this to any number you feel is appropriate.
  • "The transform function takes in the raw dense or sparse features from the input reader, applies suitable transformations to return dense representations for each feature"
    I never understood what the transform function was for until reading this line. I always just made my features in the scoring function.
    The transform function applies the transformation for all the documents only once, while the logic in the group score function is applied over each group. Think of the transform function as "pre-processing" before sending features to the scoring function.
  • I noticed you moved away from tf.contrib.layers.optimize_loss. It was a nice abstraction, however it was probably replaced b/c contrib is being deprecated in TF 2.0?
    Yes, that is exactly the reason. Our repository is TF 2.0 alpha compatible.
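
To make the EIE point concrete, here is a rough sketch of assembling one such proto (the inner feature names are made up; the outer keys "serialized_context" / "serialized_examples" match the EIE reader in tfr.data, to the best of my knowledge):

import tensorflow as tf

def _bytes_feature(values):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

def _int64_feature(values):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# One query's context and its per-document examples.
context = tf.train.Example(features=tf.train.Features(
    feature={"query_length": _int64_feature([3])}))
documents = [
    tf.train.Example(features=tf.train.Features(
        feature={"item_id": _int64_feature([item_id])}))
    for item_id in (42, 7)]

# The outer Example is self-contained and document-major.
eie = tf.train.Example(features=tf.train.Features(feature={
    "serialized_context": _bytes_feature([context.SerializeToString()]),
    "serialized_examples": _bytes_feature(
        [d.SerializeToString() for d in documents]),
}))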

@ramakumar1729
Member Author

I like this PR suggestion. Please go ahead with it. One thing to keep in mind is that you will need to change the model_fn builder, which currently expects only the (features, mode) arguments. See this line for more details.
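
Alternatively, until the builder is changed, you could capture the hyperparameters in a closure and keep the (features, mode) signature, e.g. (a sketch based on your snippet above, reusing its imports and column definitions):

def make_transform_fn(params):
    def _transform_fn(features, mode):
        # params comes from the enclosing scope, so the (features, mode)
        # signature expected by the model_fn builder is unchanged.
        example_name = next(six.iterkeys(example_feature_columns(params)))
        input_size = tf.shape(input=features[example_name])[1]
        return tfr.feature.encode_listwise_features(
            features=features,
            input_size=input_size,
            context_feature_columns=context_feature_columns(),
            example_feature_columns=example_feature_columns(params),
            mode=mode,
            scope="transform_layer")
    return _transform_fn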


eggie5 added commits to eggie5/ranking that referenced this issue Jul 13, 2019