Textclassifier - Multiple labels during training, prediction on only one #768

TDaudert · 2019-05-30T18:46:43Z

Hi all,

I'm currently looking into the text classifier and face a question regarding the use of multiple labels at training time with the goal of predicting only one.

As an example, I borrowed the following from #678:

wohlg@wohlg-XPS:~/itmo/misc/cooking_classification/preprocessed$ head cooking.train 
__label__sauce __label__cheese how much does potato starch affect a cheese sauce recipe ? 
__label__food-safety __label__acidity dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove how do i cover up the white spots on my cast iron stove ? 
__label__restaurant michelin three star restaurant; but if the chef is not there
__label__knife-skills __label__dicing without knife skills ,  how can i quickly and accurately dice vegetables ? 
__label__storage-method __label__equipment __label__bread what ' s the purpose of a bread box ?

{
    "TRAIN": {
        "dataset": "TRAIN",
        "total_number_of_documents": 12404,
        "number_of_documents_per_class": {
            "sauce": 332,
            "cheese": 235,
            "food-safety": 967,
            "acidity": 33,
            "cast-iron": 111,
....

Let's say I provide multiple labels in addition to the text, because sauce might be correlated with cheese and that correlation is valuable to a classifier.

However, my final goal is to label texts as sauce. I am not interested in labelling texts using the other four labels. Is there any setting/parameter I can use to tell the classifier to use all provided data (including all labels) but to optimize for a prediction of sauce?

Best,
Tobias


alanakbik · 2019-06-12T10:05:02Z

Hello @TDaudert, that's an interesting use case. We don't currently offer such an option. I wonder how something like this might best be implemented. One possibility would be to modify the loss function so that errors on the desired class weigh more heavily than errors on the other classes. See a related discussion here.
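To make the loss-weighting idea concrete, here is a minimal sketch in plain Python. It is not Flair's API: the function name `weighted_multilabel_bce`, the dictionary layout, and the weight of 5.0 for "sauce" are all assumptions chosen for illustration. It computes a per-label binary cross-entropy in which each label's term is scaled by a weight, so a mistake on the target label costs more than the same mistake on an auxiliary label.

```python
import math

def weighted_multilabel_bce(probs, targets, label_weights):
    """Sum of per-label binary cross-entropy terms, each scaled by a weight.

    probs: dict label -> predicted probability in (0, 1)
    targets: dict label -> 1.0 if the label applies, else 0.0 (missing = 0.0)
    label_weights: dict label -> weight (missing = 1.0)
    """
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for label, p in probs.items():
        y = targets.get(label, 0.0)
        w = label_weights.get(label, 1.0)
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total

# Hypothetical setting: errors on "sauce" count five times as much.
label_weights = {"sauce": 5.0}

# The model is equally wrong on both gold labels...
probs = {"sauce": 0.2, "cheese": 0.2}
targets = {"sauce": 1.0, "cheese": 1.0}

# ...but the "sauce" error dominates the weighted loss.
loss = weighted_multilabel_bce(probs, targets, label_weights)
unweighted = weighted_multilabel_bce(probs, targets, {})
print(loss, unweighted)
```

In a PyTorch training loop, a comparable effect can probably be obtained with the `weight` or `pos_weight` arguments of `torch.nn.BCEWithLogitsLoss`; how to wire that into the text classifier here is left open.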

We could also do data sampling so that data from the relevant class gets upsampled during training. I think this somewhat fits our ongoing work in looking at problems with class imbalance, so maybe if we find a good solution there it could also apply to this use case. We'll definitely keep your use case in mind - please also let us know if you find a good solution.
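As a rough illustration of the upsampling idea, the sketch below duplicates every training line that carries the target label before the file is handed to the trainer. It assumes the `__label__` prefix format shown in the question; the helper name `upsample_target` and the factor of 3 are hypothetical, and this is a preprocessing workaround rather than a built-in feature.

```python
import random

def upsample_target(lines, target_label, factor, seed=0):
    """Repeat each line tagged with target_label `factor` times, then shuffle."""
    tag = "__label__" + target_label
    out = []
    for line in lines:
        # Compare whole tokens so "__label__sauce" does not match "__label__sauces".
        labels = [tok for tok in line.split() if tok.startswith("__label__")]
        copies = factor if tag in labels else 1
        out.extend([line] * copies)
    # Shuffle with a fixed seed so duplicates are not adjacent and runs are repeatable.
    random.Random(seed).shuffle(out)
    return out

train_lines = [
    "__label__sauce __label__cheese how much does potato starch affect a cheese sauce recipe ?",
    "__label__food-safety __label__acidity dangerous pathogens capable of growing in acidic environments",
    "__label__cast-iron __label__stove how do i cover up the white spots on my cast iron stove ?",
]

upsampled = upsample_target(train_lines, "sauce", factor=3)
print(len(upsampled))
```

The auxiliary labels stay on every line, so the classifier still sees the sauce/cheese co-occurrence; only the frequency of sauce examples changes.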

stale · 2020-04-30T01:10:43Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

TDaudert added the "question" (Further information is requested) label on May 30, 2019

stale bot added the "wontfix" (This will not be worked on) label on Apr 30, 2020

stale bot closed this as completed on May 7, 2020
