
INVALID_ARGUMENT: No defined default loss for this combination of label type and task #100

Closed
AlirezaSadeghi opened this issue Apr 19, 2022 · 12 comments


AlirezaSadeghi commented Apr 19, 2022

I'm trying to use GradientBoostedTreesModel in a TFX pipeline, the code is roughly as follows:

model = tfdf.keras.GradientBoostedTreesModel(
    task=tfdf.keras.Task.CLASSIFICATION,
    num_trees=200,
    max_depth=6,
    verbose=True,
    hyperparameter_template="better_default",
    name="classifier",
)
model.compile(metrics=[tf.keras.metrics.AUC(), "accuracy"])
model.fit(_input_fn(fn_args.train_files, fn_args.schema_path))

This unfortunately gives me an INVALID_ARGUMENT: No defined default loss for this combination of label type and task exception and fails the model training.

Definition of _input_fn is as follows:

def _input_fn(files, schema_path):
    specs = ...  # feature specs parsed from the schema at schema_path
    return (
        tf.data.TFRecordDataset(
            tf.data.Dataset.list_files(files), compression_type="GZIP"
        )
        .batch(1024)
        .map(
            lambda batch: tf.io.parse_example(batch, specs),
            num_parallel_calls=tf.data.AUTOTUNE,
        )
        .map(lambda batch: (batch, batch.pop(FeatureManager.LABEL_KEY)))
        .cache()
        .prefetch(tf.data.AUTOTUNE)
    )

This parses the schema into feature specs, parses each batch of TF examples, and finally maps them to a tuple of (Dict[feature_name, Tensor], Tensor). The resulting dataset looks like this:

<PrefetchDataset 
 element_spec=(
   {'feature1': TensorSpec(shape=(None, 1), dtype=tf.float32, name=None), 'feature2': ...}, 
   TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)
  )
>

Labels can be 0 or 1 and the task is a binary classification task.
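A toy in-memory dataset with the same (features dict, label) structure can be sketched like this (a minimal sketch with made-up feature values, not the actual pipeline):

```python
import tensorflow as tf

# Toy stand-in for the real pipeline: a dict of (None, 1) float32
# features plus an int64 label of shape (None, 1), mirroring the
# element_spec shown above. Values are made up.
features = {"feature1": tf.constant([[0.5], [1.5]], dtype=tf.float32)}
labels = tf.constant([[0], [1]], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

spec_features, spec_label = ds.element_spec
print(spec_label)  # TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)
```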

Any idea what I might be doing wrong here?

macOS Monterey, tfdf 0.2.4, Python 3.8, TFX 1.7

@Cheril311 (Contributor)

@AlirezaSadeghi can you specify your label type?

AlirezaSadeghi (Author) commented Apr 19, 2022

@Cheril311 If I'm understanding you correctly, I already specified it above: it's the second entry in the PrefetchDataset tuple, namely TensorSpec(shape=(None, 1), dtype=tf.int64, name=None).

It's an integer with values of either 0 or 1, but since we read the data in batches, its shape is (None, 1).

So the dataset being passed to model.fit is a tuple of (Dict {feature name -> Tensor(None, 1)}, Label Tensor(None, 1)).

Did I answer your question? If not, please elaborate.

@Cheril184

@AlirezaSadeghi my bad

achoum (Collaborator) commented Apr 20, 2022

Hi AlirezaSadeghi,

If the loss argument of the gradient boosted trees model is not specified, it is selected automatically from the label type, label values, and task. The error you reported indicates that no loss matches your label.

Looking at your example, a likely situation is that your int64 label only contains zeros. Can you check it?
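One way to verify this directly on a tf.data pipeline (a sketch assuming the (features, label) tuple structure from the question; the all-zero values here are a hypothetical stand-in):

```python
import tensorflow as tf

# Hypothetical stand-in for the real dataset: every label is 0.
ds = tf.data.Dataset.from_tensor_slices(
    ({"feature1": [[0.5], [1.5]]}, [[0], [0]])
).batch(2)

# Collect all label values across batches and count the distinct ones.
all_labels = tf.concat([tf.reshape(y, [-1]) for _, y in ds], axis=0)
unique_labels, _ = tf.unique(all_labels)
print(unique_labels.numpy())  # a single distinct value means no default binary loss
```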

Alternatively, you can specify the loss to be "BINOMIAL_LOG_LIKELIHOOD", i.e. the binary classification loss.

On my side, I'll improve the error message for this particular situation.

AlirezaSadeghi (Author) commented Apr 20, 2022

Hi @achoum ,

Yup, your assumption is right: I'm just testing the pipeline by running the model on a part of the training set, which happens to contain only zeros. I didn't know that could become an issue.

I'll try with BINOMIAL_LOG_LIKELIHOOD and get back to you.

@AlirezaSadeghi (Author)

Okay, doing that, I get this:

INVALID_ARGUMENT: Binomial log likelihood loss is only compatible with a BINARY classification task

It's somehow assuming the task is not "binary classification"?

@AlirezaSadeghi (Author)

@achoum just an FYI, have you seen my last comment? I'm wondering if you have any further insights.

rstz (Collaborator) commented Apr 22, 2022

If your task is not a binary classification task, you can try setting the loss to MULTINOMIAL_LOG_LIKELIHOOD.

@AlirezaSadeghi (Author)

My task is binary classification, and the labels are all 0s; I don't know why it's assuming the task is not binary classification (as I mentioned before).

rstz (Collaborator) commented Apr 22, 2022

Oh, apologies, I overlooked that part of your first message.

@AlirezaSadeghi (Author)

@achoum No new updates/insights on this? 😔

achoum (Collaborator) commented Jun 22, 2022

If all your labels are 0, the framework detects that this is not a binary classification problem and fails.
If you want to test binary classification, can you create a synthetic dataset with both 0 and 1 labels?

While training on a dataset where all the labels have the same value could make sense for unit testing, this error helps catch mistakes in real datasets.
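For such a smoke test, a synthetic label column containing both classes could be generated, e.g. with NumPy (a sketch; the feature name is made up):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1000

# Made-up feature column plus a label column that is guaranteed, for
# this size and seed, to contain both classes.
features = {"feature1": rng.normal(size=(n, 1)).astype("float32")}
labels = rng.integers(0, 2, size=(n, 1)).astype("int64")

# Both 0 and 1 are present, so TF-DF can infer a binary
# classification loss instead of raising the "no default loss" error.
print(sorted(np.unique(labels)))  # [0, 1]
```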

@achoum achoum closed this as completed Jul 12, 2022