Class weight support #57
Comments
Usually, there are two ways to handle imbalanced data:
`xent = tf.mul(xent, loss_weight)` and then pass it via the
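The `tf.mul(xent, loss_weight)` idea above is per-example loss weighting. A minimal numpy sketch of the math (the `weighted_xent` helper and the sample numbers are made up for illustration, not skflow or TensorFlow API):

```python
import numpy as np

def weighted_xent(logits, labels, class_weights):
    """Per-example cross-entropy scaled by the weight of each example's class.

    Hypothetical helper: `class_weights[i]` is the weight of class i.
    """
    # Numerically stable softmax over logits
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Plain per-example cross-entropy: -log p(true class)
    xent = -np.log(probs[np.arange(len(labels)), labels])
    # Scale each example's loss by its class weight,
    # as in xent = tf.mul(xent, loss_weight)
    return xent * class_weights[labels]

logits = np.array([[2.0, 0.5], [0.2, 1.5]])
labels = np.array([0, 1])
weights = np.array([0.1, 0.9])  # minority class gets the larger weight
loss = weighted_xent(logits, labels, weights)
```

With these weights, errors on the minority class contribute much more to the total loss than equally confident errors on the majority class.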
I think this is already implemented here https://github.com/tensorflow/skflow/blob/master/skflow/models.py#L57 and here https://github.com/tensorflow/skflow/blob/master/skflow/ops/losses_ops.py#L52
Thanks @lopuhin, so what is the correct way to use it (in the case of an unbalanced dataset, with 90% of class A and 10% of class B)?
@vinhqdang sorry, I was wrong - I don't think the existing implementation is correct, because
And I am not sure if it is possible to implement it in terms of existing tensorflow loss functions? It seems one will need to define a loss function similar to
Ah, no, it should be possible - we just need to multiply
@vinhqdang It's possible, and as you mentioned https://github.com/tensorflow/skflow/blob/master/skflow/ops/losses_ops.py#L52 partially implements this (it does multiply xent for each class by the weight of the class). The only missing piece is passing it from the estimator (e.g. `TFLinearClassifier(..., class_weights={1: 0.9, 0: 0.1})`) down to the models and losses. I haven't thought of a good interface to do this yet (right now it would need to be an argument for every model function).
What's currently there can be used by creating an explicit TF constant and initializing it with your weights:

```python
def my_model(X, y):
    class_weight = tf.constant([0.9, 0.1])
    return skflow.models.logistic_regression(X, y, class_weight=class_weight)

estimator = skflow.TensorFlowEstimator(model_fn=my_model, n_classes=2, ...other args...)
```
That's what I thought @ilblackdragon, but for me it fails with
@lopuhin, You are right, the math should work as `-weight[class]*x[class] + log(sum(exp(weighted x)))`
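That formula is the standard softmax cross-entropy applied to logits that have been scaled by the class weights. A small numpy sketch to check the algebra (the function name and sample numbers are made up here, not skflow API):

```python
import numpy as np

def xent_on_weighted_logits(x, w, c):
    """-w[c]*x[c] + log(sum_j exp(w[j]*x[j])) for logits x, weights w, true class c."""
    wx = w * x
    return -wx[c] + np.log(np.exp(wx).sum())

x = np.array([1.0, -0.5])  # logits for one example
w = np.array([0.9, 0.1])   # per-class weights
loss = xent_on_weighted_logits(x, w, 0)

# The same number, written as -log softmax(w * x)[c]:
reference = -np.log(np.exp(w * x)[0] / np.exp(w * x).sum())
```

So scaling the logits by `w` before the softmax reproduces the formula above exactly.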
Ok, so instead I moved it up to multiply the logits.
@ilblackdragon for me a more natural solution would be something like this lopuhin@5c97849 - here I apply the weight to the xent of each example depending on what its label is. I think this is mathematically different from scaling the logits. But I am still learning, so take this with a grain of salt please :)
@lopuhin Changing importance in cross-entropy adjusts the relative importance of all the classes to each other (skewing the distribution), while your variant only adjusts the weight of one class. But your variant may work in practice. I'll double check with a few people what the best way is.
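The two variants discussed above do give different numbers in general. A minimal numpy sketch contrasting them on a single example (sample numbers are made up; this is an illustration, not skflow code):

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax
    e = np.exp(v - v.max())
    return e / e.sum()

x = np.array([1.0, -0.5])  # logits for one example
w = np.array([0.9, 0.1])   # per-class weights
c = 0                      # true class

# Variant 1: scale the logits before the softmax
# (skews the whole predicted distribution)
loss_scaled_logits = -np.log(softmax(w * x)[c])

# Variant 2: scale the per-example cross-entropy
# by the true class's weight only
loss_weighted_xent = w[c] * -np.log(softmax(x)[c])
```

Variant 1 changes what distribution the model predicts; variant 2 leaves the prediction untouched and only reweights how much each example's error counts in training.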
Hi,
I am using `skflow.ops.dnn` to classify a two-class dataset (True and False). The percentage of True examples is very small, so I have an imbalanced dataset. It seems to me that one way to resolve the issue is to use weighted classes. However, when I look at the implementation of `skflow.ops.dnn`, I do not see how I could do weighted classes with DNN. Is it possible to do that with skflow, or is there another technique to deal with the imbalanced dataset problem in skflow?
Thanks