Test on unclassified data sets #15

Open
sgiraudot opened this issue Mar 21, 2019 · 3 comments

@sgiraudot

Hello,

Once a model has been trained, is there a way to apply it to unclassified data? As far as I can tell, the configuration file does not distinguish the validation set (which needs valid labels) from the test set (which, in real-life applications, could be unclassified). I tried including test sets with all labels set to 0 (unclassified), but in that case precomputing the validation batches does not work.

Did I miss something, or is it simply not possible with the current framework to classify data sets with unknown labels?

@tatarchm tatarchm self-assigned this Mar 21, 2019
@tatarchm
Owner

Hi,

In the current version of the framework there is no 'proper' way to test on unlabeled data, but I think the solution you describe should work. Could you please specify exactly what error you get when you set all labels to 0? I would also suggest setting them to 1 instead, because by default points labeled 0 correspond to the background class and may be ignored.
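As a rough illustration of that workaround, here is a minimal sketch that replaces background labels with a placeholder class before the data is passed to the framework. It assumes the labels are available as a plain Python sequence of integers; the function name `relabel_background` and its parameters are hypothetical and not part of the framework.

```python
def relabel_background(labels, background=0, placeholder=1):
    """Return a copy of `labels` where every background label is replaced.

    Hypothetical helper: the framework's own label storage is not shown
    in this thread, so a plain list of ints is assumed here.
    """
    return [placeholder if label == background else label for label in labels]


if __name__ == "__main__":
    unlabeled = [0, 0, 0, 0]
    print(relabel_background(unlabeled))
```

The placeholder labels carry no meaning, so any accuracy reported on such a set should be disregarded; only the predicted labels are of interest.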

@sgiraudot
Author

sgiraudot commented Mar 22, 2019

I don't really get any error when I set all labels to 0; the software just gets stuck for a very long time in the function precompute_validation_batches(). If I understand correctly, at some point there is a search for a random point, and that point is discarded if its label is 0. So with all labels set to 0, I imagine it either enters an infinite loop or runs a very long search that never finds anything.
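To make the suspected failure mode concrete, here is a minimal sketch of a rejection-sampling loop of that kind, with a guard that stops it from spinning forever. This is not the framework's actual code: the function `sample_labeled_index`, the `ignored_label` and `max_attempts` parameters, and the list-of-ints label representation are all assumptions made for illustration.

```python
import random


def sample_labeled_index(labels, ignored_label=0, max_attempts=10_000, rng=None):
    """Pick a random index whose label differs from `ignored_label`.

    Hypothetical sketch: returns None instead of looping forever when
    no acceptable point exists or none is found within `max_attempts`.
    """
    rng = rng or random.Random()

    # Up-front check: if every point is ignored (or there are no points),
    # rejection sampling can never succeed.
    if all(label == ignored_label for label in labels):
        return None

    for _ in range(max_attempts):
        i = rng.randrange(len(labels))
        if labels[i] != ignored_label:
            return i  # accepted: this point has a usable label
    return None  # gave up after max_attempts rejections


if __name__ == "__main__":
    print(sample_labeled_index([0, 0, 0, 0]))  # all points ignored
    print(sample_labeled_index([0, 0, 1, 0]))  # one usable point
```

Without the up-front check and the attempt limit, a loop like this would never exit on a data set where every label equals the ignored value, which would match the behavior I am seeing.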

If I set all labels to 1, am I correct that I should then not use these labels in the validation set? Otherwise, the training would wrongly treat these points as ground truth for label 1. Or does it work differently?

@tatarchm
Owner

I see. I will update the code to support proper testing.

Sure, you should not use those labels in the validation set. Using unlabeled data for validation would not make sense anyway: you need ground truth there.
