initialize_active_learner error #6
I have also used stratified
error is same but active learner was initialized and
Hi, regarding your first post:
(For the future: with such a high number of labels, it might not be practical to require so many initialization samples. I will think about what can be done about this.)
Regarding your second post:
You could try the class-balanced initialization in the meantime.
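To illustrate what class-balanced initialization means, here is a minimal sketch in plain NumPy. It is not small-text's own implementation; the function name `balanced_initial_indices` and its parameters are hypothetical and chosen only for this example. It assumes integer-encoded single-label targets.

```python
import numpy as np


def balanced_initial_indices(y, per_class=1, seed=0):
    """Pick up to `per_class` random indices for every label present in `y`.

    Hypothetical helper for illustration; not part of small-text.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    chosen = []
    for label in np.unique(y):
        candidates = np.flatnonzero(y == label)
        # A class with fewer than `per_class` samples contributes all of them.
        take = min(per_class, candidates.size)
        chosen.append(rng.choice(candidates, size=take, replace=False))
    return np.concatenate(chosen)


# Toy, imbalanced label array (made up for this example).
y_train = np.array([0, 0, 0, 0, 1, 1, 2])
indices = balanced_initial_indices(y_train, per_class=1)
```

With one sample per class, every label that occurs in `y_train` appears in the initial set, so the number of initial samples is at least the number of classes. With roughly 11,000 classes that is a large initial pool, which is the practicality concern mentioned above.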
@chschroeder Thanks for the reply. I have tried all the methods of initialization:
All are failing for sure.
Here is my code
@neel17 Okay, if they all don't seem to work, could it be that there are labels that only appear once? If so, we can have splits that don't include every label. Maybe the initialization functions should check for plausibility in this case. Can you have a look at your class distribution? You can use the following snippet to obtain an array which contains the number of samples for each class at the given index:
In your case:
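A minimal sketch of such a class count, assuming integer-encoded labels and using `numpy.bincount`. The variable names `y_train` and `num_classes` and the toy data are illustrative and not taken from the thread.

```python
import numpy as np

# Toy labels for illustration; replace with your own integer-encoded targets.
y_train = np.array([0, 2, 2, 3, 3, 3])
num_classes = 5  # assumed total number of classes, including unseen ones

# counts[i] is the number of samples labeled with class i.
# minlength makes sure classes that never occur still get a zero entry.
counts = np.bincount(y_train, minlength=num_classes)

# Classes with fewer than two samples cannot appear in both halves of a split.
rare_classes = np.flatnonzero(counts < 2)
```

If `rare_classes` is non-empty, some splits will necessarily miss those labels, which is the situation described above.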
@chschroeder Thanks for the reply. I checked for underrepresented labels as you suggested. I don't have any label with fewer than 30 samples.
I looked into this briefly, I can take a closer look this weekend. It seems we have two problems:
This is only a quick fix. I think this check should be optional but I want to think about this before taking action. I might remove this check or add an option to disable it. Thanks for your patience. Having this many classes is a new use case to be honest, but if you have some patience I think we can find a solution here (and improve small-text at the same time).
@neel17: Okay, I think the only actual problem here is the class check, which results in the error you reported. This was required when the number of classes was obtained implicitly in earlier versions of the code, but can now be removed or at least changed to a warning. If you need a quick fix, you can override the _fit_main method and remove the following lines:
small-text/small_text/integrations/transformers/classifiers/classification.py, lines 350 to 353 in bdd004c
I will remove this check in the next version.
I am trying to initialize an active learner for text classification using a transformer. I have 11014 classes which need to be trained by the classification model. My dataset is highly imbalanced. While doing the
initialize_active_learner(active_learner, y_train)
I have used
But I always get this error:
Please help here.
Thanks in advance