train ipet with zero training examples #66
Comments
The above problem is caused by the following method in `tasks.py`:

```python
def _shuffle_and_restrict(examples: List[InputExample], num_examples: int, seed: int = 42) -> List[InputExample]:
    """
    Shuffle a list of examples and restrict it to a given maximum size.

    :param examples: the examples to shuffle and restrict
    :param num_examples: the maximum number of examples
    :param seed: the random seed for shuffling
    :return: the first ``num_examples`` elements of the shuffled list
    """
    if 0 < num_examples < len(examples):
        random.Random(seed).shuffle(examples)
        examples = examples[:num_examples]
    return examples
```

When `num_examples` equals 0, this method returns all the training examples, which are then used to fine-tune the language model.
Hi @EneruMin, you are absolutely correct, this is an error in the code. The reason why things still worked in our experiments is that example selection additionally goes through the following check:

```python
if num_examples is not None:
    examples = _shuffle_and_restrict(examples, num_examples, seed)
elif num_examples_per_label is not None:
    limited_examples = LimitedExampleList(processor.get_labels(), num_examples_per_label)
    for example in examples:
        limited_examples.add(example)
    examples = limited_examples.to_list()
```

So to fix this issue, you can either (a) replace …
Hi @timoschick, thanks for your suggestion.
In the …
Interesting... I'll check why we didn't get a similar error as soon as I find the time. Regardless, the final accuracy should be much better than the one you've reported. There are a couple of differences between your command and the one that we have used, so I cannot tell what exactly causes the difference. Could you tell me the results after each iteration (the contents of the …)?

A couple of notes regarding possible differences: …

Finally, you may get different results due to random selection of examples and model initialization (but those should not account for more than 5% difference in performance). If you want to reproduce our exact results and none of the above helps, you can check out the …
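As a side note on the randomness point above: with a fixed seed, the example selection itself is fully deterministic, so differing seeds alone can explain differing training subsets (a toy illustration, not the repo's code):

```python
import random

def pick_examples(pool, k, seed):
    # Deterministic selection: the same seed always yields the same subset.
    rng = random.Random(seed)
    examples = list(pool)
    rng.shuffle(examples)
    return examples[:k]

a = pick_examples(range(1000), 5, seed=42)
b = pick_examples(range(1000), 5, seed=42)
c = pick_examples(range(1000), 5, seed=123)
print(a == b)  # True: same seed reproduces the same subset
print(len(c))  # 5
```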
The results of each iteration (g1, g2, final) are shown below.
According to Figure 4 in your paper, I think maybe I should use 4 or 5 iterations.
Hi,
I am training iPET with zero training examples; I ran the following command:

```shell
python3 cli.py --method ipet --pattern_ids 0 1 2 3 4 --data_dir /share/home/zqzeng/wmni/data/ag_news_csv/ag_news_csv --model_type roberta --model_name_or_path /share/home/zqzeng/transformers/roberta-large --task_name agnews --output_dir /share/home/zqzeng/wmni/data/output/unsupervised-ipet --do_train --do_eval --pet_repetitions 1 --ipet_n_most_likely 100 --reduction mean --train_examples 0
```
And I got the following result:

```
2021-11-09 20:22:31,904 - INFO - tasks - Creating features from dataset file at ag_news_csv/ (num_examples=0, set_type=train)
2021-11-09 20:22:34,978 - INFO - tasks - Returning 120000 train examples with label dist.: [('3', 30000), ('4', 30000), ('2', 30000), ('1', 30000)]
```
I followed the flow of the program and found that all 120000 train examples were used to train each individual model.
When I use `--train_examples 10`, it works as expected:

```
2021-11-09 20:19:13,402 - INFO - tasks - Creating features from dataset file at ag_news_csv/ (num_examples=10, set_type=train)
2021-11-09 20:19:16,127 - INFO - tasks - Returning 10 train examples with label dist.: [('1', 3), ('4', 4), ('2', 2), ('3', 1)]
```
Does training with zero examples not work?
I would be grateful for your prompt reply.