Textclassifier - Multiple labels during training, prediction on only one #768

Closed
TDaudert opened this issue May 30, 2019 · 2 comments
Labels
question Further information is requested wontfix This will not be worked on

Comments

@TDaudert

Hi all,

I'm currently looking into the text classifier and have a question about using multiple labels at training time when the goal is to predict only one.

As an example, I borrowed the code of #678

wohlg@wohlg-XPS:~/itmo/misc/cooking_classification/preprocessed$ head cooking.train 
__label__sauce __label__cheese how much does potato starch affect a cheese sauce recipe ? 
__label__food-safety __label__acidity dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove how do i cover up the white spots on my cast iron stove ? 
__label__restaurant michelin three star restaurant; but if the chef is not there
__label__knife-skills __label__dicing without knife skills ,  how can i quickly and accurately dice vegetables ? 
__label__storage-method __label__equipment __label__bread what ' s the purpose of a bread box ?
{
    "TRAIN": {
        "dataset": "TRAIN",
        "total_number_of_documents": 12404,
        "number_of_documents_per_class": {
            "sauce": 332,
            "cheese": 235,
            "food-safety": 967,
            "acidity": 33,
            "cast-iron": 111,
....

Let's say I provide multiple labels in addition to the text, since sauce might be correlated with cheese and that correlation could be useful to a classifier.

However, my final goal is to label texts as sauce. I am not interested in labelling texts using the other four labels. Is there any setting/parameter I can use to tell the classifier to use all provided data (including all labels) but to optimize for a prediction of sauce?

Best,
Tobias

@TDaudert TDaudert added the question Further information is requested label May 30, 2019
@alanakbik
Collaborator

Hello @TDaudert, that's an interesting use case. We don't currently offer such an option. I wonder how something like this might best be implemented. I guess we could modify the loss function so that errors on the desired class weigh more heavily than errors on the other classes. See a related discussion here.
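As a rough illustration of the loss-weighting idea, here is a minimal pure-Python sketch of a per-label weighted binary cross-entropy. It is not Flair's implementation: the function name `weighted_bce`, the weight of 5.0 for `sauce`, and the example probabilities are all assumptions made for illustration.

```python
import math


def weighted_bce(probs, targets, weights):
    """Per-label weighted binary cross-entropy, averaged over labels.

    probs:   dict mapping label -> predicted probability in (0, 1)
    targets: set of gold labels for this document
    weights: dict mapping label -> loss weight (labels not listed use 1.0)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for label, p in probs.items():
        y = 1.0 if label in targets else 0.0
        w = weights.get(label, 1.0)
        total += -w * (y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))
    return total / len(probs)


# Hypothetical setting: mistakes on "sauce" cost five times as much.
weights = {"sauce": 5.0}
probs = {"sauce": 0.2, "cheese": 0.2}  # the model is unsure about both labels

loss_missed_sauce = weighted_bce(probs, {"sauce"}, weights)    # gold label is sauce
loss_missed_cheese = weighted_bce(probs, {"cheese"}, weights)  # gold label is cheese
```

With these weights, missing `sauce` yields a larger loss than missing `cheese`, so gradient updates would favor the target class while the other labels still contribute to training. In a PyTorch model, the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss` offers a related per-class weighting for positive examples.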

We could also do data sampling so that data from the relevant class gets upsampled during training. I think this somewhat fits our ongoing work in looking at problems with class imbalance, so maybe if we find a good solution there it could also apply to this use case. We'll definitely keep your use case in mind - please also let us know if you find a good solution.
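The upsampling idea could be tried outside the library by duplicating training lines before they are written to the train file. The sketch below assumes the `__label__` format shown in the question; `parse_line`, `upsample`, and the factor of 3 are hypothetical names and values, not part of Flair.

```python
def parse_line(line, prefix="__label__"):
    """Split a fastText-style line into its set of labels and its text."""
    tokens = line.split()
    labels = {t[len(prefix):] for t in tokens if t.startswith(prefix)}
    text = " ".join(t for t in tokens if not t.startswith(prefix))
    return labels, text


def upsample(lines, target, factor):
    """Repeat every line that carries the target label `factor` times."""
    out = []
    for line in lines:
        labels, _ = parse_line(line)
        copies = factor if target in labels else 1
        out.extend([line] * copies)
    return out


train = [
    "__label__sauce __label__cheese how much does potato starch affect a cheese sauce recipe ?",
    "__label__food-safety __label__acidity dangerous pathogens capable of growing in acidic environments",
    "__label__restaurant michelin three star restaurant; but if the chef is not there",
]

# Documents labelled "sauce" appear three times; all others appear once.
upsampled = upsample(train, target="sauce", factor=3)
```

The duplicated lines keep all of their labels, so the classifier still sees the co-occurring `cheese` label. Shuffling the result before training avoids presenting the copies back to back.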

@stale

stale bot commented Apr 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 30, 2020
@stale stale bot closed this as completed May 7, 2020