Training PET on a new task #3

Mahhos · 2020-10-06T01:39:05Z

Hi. I want to train PET on a new task, for which I prepared custom_task_processor.py and custom_task_pvp.py. How do we tell the program to read our customized files (instead of the main files) and run the newly registered task? Just running the commands under the PET Training and Evaluation section does not seem to do this.

timoschick · 2020-10-07T07:28:12Z

Hi @Mahhos, I'm on vacation this week, but I'll try to answer your question early next week.

chris-aeviator · 2020-10-11T11:00:51Z

As I understand the docs, after you have written your own pvp and task_processor, you call the CLI with --task_name my-task, where the name is defined via TASK_NAME = "my-task" in the task_processor. It seems to me you'll have to import your custom code, in the form of the two example files in /examples, into the files https://github.com/timoschick/pet/blob/e20bc195455f817119a86b53284a105f2c906fbd/pet/pvp.py and https://github.com/timoschick/pet/blob/e20bc195455f817119a86b53284a105f2c906fbd/pet/tasks.py (and more?).
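
The name-based lookup described in this comment can be sketched roughly as follows. This is only an illustration: the `PROCESSORS` and `PVPS` dictionaries are local stand-ins, assumed to resemble the registries in pet/tasks.py and pet/pvp.py, and the class bodies and `resolve_task` helper are hypothetical placeholders rather than PET's real code.

```python
# Stand-in registries (assumption: pet/tasks.py and pet/pvp.py keep similar
# name -> class dictionaries; these are local dicts, not PET's own objects).
PROCESSORS = {}
PVPS = {}


class MyTaskDataProcessor:
    """Hypothetical placeholder for a custom data processor."""
    TASK_NAME = "my-task"


class MyTaskPVP:
    """Hypothetical placeholder for a custom pattern-verbalizer pair."""
    TASK_NAME = "my-task"


# Registration runs at import time, so the module containing these lines
# must actually be imported before the task name is looked up.
PROCESSORS[MyTaskDataProcessor.TASK_NAME] = MyTaskDataProcessor
PVPS[MyTaskPVP.TASK_NAME] = MyTaskPVP


def resolve_task(task_name):
    """Return the (processor, pvp) classes registered under task_name."""
    if task_name not in PROCESSORS or task_name not in PVPS:
        raise KeyError(f"task {task_name!r} is not registered")
    return PROCESSORS[task_name], PVPS[task_name]


processor_cls, pvp_cls = resolve_task("my-task")
print(processor_cls.__name__, pvp_cls.__name__)
```

If the registration lines live in a file that is never imported, the lookup for --task_name my-task has nothing to find, which would explain why placing the classes in the main files makes the task visible.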

chris-aeviator · 2020-10-11T12:43:26Z

I ended up copying the classes to the respective files and can confirm it works.

Thanks to the use of the transformers library, I already had my language model (roberta) on my machine, and training started right away 🚀 👏

Mahhos · 2020-10-11T16:35:49Z

Thanks for your advice. I copied my classes, but the command still does not work. This is the command that I am using. Is there anything I should change?

python3 cli.py --method pet --pattern_ids 0 1 --data_dir ./data --model_type albert --model_name_or_path albert-base-v2 --task_name my-task --output_dir ./OUTPUT_DIR/ --do_train --do_eval --pet_per_gpu_eval_batch_size 8 --pet_per_gpu_train_batch_size 2 --pet_gradient_accumulation_steps 8 --pet_max_steps 250 --pet_max_seq_length 256 --pet_repetitions 3 --sc_per_gpu_train_batch_size 2 --sc_per_gpu_unlabeled_batch_size 2 --sc_gradient_accumulation_steps 8 --sc_max_steps 5000 --sc_max_seq_length 256 --sc_repetitions 1

chris-aeviator · 2020-10-11T16:36:55Z

@Mahhos what's your error?

Mahhos · 2020-10-11T16:39:50Z

No error. I ran the command from the terminal and it does nothing, without printing any error.

Mahhos · 2020-10-11T16:44:10Z

@chris-aeviator I found the issue. I ran the same command with python instead of python3 and it worked. However, I am now getting an error:
File "E:\My Projects\pet-master-updated\pet\tasks.py", line 848, in _create_examples
    text_a = row[MyTaskDataProcessor.TEXT_A_COLUMN]
IndexError: list index out of range

chris-aeviator · 2020-10-11T16:49:56Z

I think this happens if the rows in your data do not all have the same number of columns. Try a very minimal example first.
I started with, for example:

pvp.py

VERBALIZER = {
    "0": ["Bad"],
    "1": ["Good"],
}

training.csv

0,This is bad,a bad string
1,this is good, a good string

dev.csv

0,something bad, a bad string
1,that's good, good string

unlabeled.csv

,this is supposed to be bad,baddish
,this looks good,goodish

(Actually, my field B is a category, currently the same for all examples.)

With around 20 training examples (split between training and dev), I can get pretty good results with real-world data by running

python3 cli.py \
--overwrite_output_dir --method pet \
--pattern_ids 0 \
--data_dir data \
--model_type roberta \
--model_name roberta-base \
--task_name my-task \
--output_dir /mnt/[…]/DevRepo/xxxxxxx-pet-model \
--do_eval \
--do_train

Mahhos · 2020-10-11T17:03:10Z

@chris-aeviator I guess there is something wrong with my unlabeled.csv file, since it can successfully create features from my train.csv and dev.csv. When it tries to create features from unlabeled.csv, it raises this error. My unlabeled.csv has only one column, containing the text. I tell the program to treat column 0 as text_a. My unlabeled.csv does not have a column 1 with gold labels.

2020-10-11 11:56:33,461 - INFO - tasks - Creating features from dataset file at ./data (num_examples=-1, set_type=unlabeled)
File "E:\My Projects\pet-master-updated\pet\tasks.py", line 848, in _create_examples
    text_a = row[MyTaskDataProcessor.TEXT_A_COLUMN]
IndexError: list index out of range

chris-aeviator · 2020-10-11T17:05:26Z

@Mahhos have you made sure that you keep an empty column in unlabeled.csv at the same position where you have your label in train.csv?

train: label,text_a
unlabeled: [empty],text_a

So the leading , is important.
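
Why the leading comma matters can be checked with a small script before training. This is a sketch under assumptions: the column indices follow the "label,text_a" layout described above, and `short_rows` is a hypothetical helper written for this illustration, not part of PET.

```python
import csv
import io

LABEL_COLUMN = 0   # assumption: label comes first, as in "label,text_a"
TEXT_A_COLUMN = 1  # assumption: text follows directly after the label


def short_rows(lines, min_columns):
    """Return (line_number, row) for every row with fewer than min_columns fields."""
    return [(n, row) for n, row in enumerate(csv.reader(lines), start=1)
            if len(row) < min_columns]


without_comma = io.StringIO("this looks good\n")
with_comma = io.StringIO(",this looks good\n")

# Only one field per row: reading row[TEXT_A_COLUMN] would raise IndexError,
# so the row is reported as too short.
print(short_rows(without_comma, TEXT_A_COLUMN + 1))

# The leading comma yields an empty label field, so no row is reported.
print(short_rows(with_comma, TEXT_A_COLUMN + 1))

# With the comma, the label is an empty string and the text sits at index 1.
row = next(csv.reader(io.StringIO(",this looks good\n")))
print(repr(row[LABEL_COLUMN]), repr(row[TEXT_A_COLUMN]))
```

To check a real file, the same helper can be given an open file handle, for example `open("data/unlabeled.csv", newline="")`. Blank lines are reported as well, because csv.reader returns an empty list for them.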

Mahhos · 2020-10-12T02:47:28Z

@chris-aeviator thank you so much. That was a good point!

timoschick self-assigned this Oct 7, 2020

Mahhos closed this as completed Oct 11, 2020
