Training PET on a new task #3

Closed
Mahhos opened this issue Oct 6, 2020 · 11 comments
@Mahhos

Mahhos commented Oct 6, 2020

Hi. I want to train PET on a new task, for which I prepared custom_task_processor.py and custom_task_pvp.py. My question is: how do I tell the program to load my customized files (instead of the main files) and run the newly registered task? Just running the commands under the "PET Training and Evaluation" section does not seem to work.

@timoschick
Owner

Hi @Mahhos, I'm on vacation this week, but I'll try to answer your question early next week.

@timoschick timoschick self-assigned this Oct 7, 2020
@chris-aeviator

As I understand the docs, after you have written your own PVP and task processor, you call the CLI with --task_name my-task, where my-task is the name defined via TASK_NAME = "my-task" in the task processor. It seems to me that you also have to import your custom code (in the form of the two example files in /examples) into https://github.com/timoschick/pet/blob/e20bc195455f817119a86b53284a105f2c906fbd/pet/pvp.py and https://github.com/timoschick/pet/blob/e20bc195455f817119a86b53284a105f2c906fbd/pet/tasks.py (and maybe other files?).
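A self-contained sketch of why the import matters, assuming PET looks tasks up by name in module-level dictionaries. The names `PROCESSORS`, `PVPS`, `register_custom_task`, and `load_task` below are hypothetical stand-ins, not PET's actual code: a class defined in a file that nothing imports never gets added to the registry, so `--task_name my-task` has nothing to find.

```python
# Hypothetical stand-ins for name -> class registries (not PET's real code).
PROCESSORS = {}
PVPS = {}


class MyTaskDataProcessor:
    TASK_NAME = "my-task"


class MyTaskPVP:
    TASK_NAME = "my-task"


def register_custom_task():
    # In a real project these lines would run as a side effect of importing
    # the custom modules; if nothing imports them, the registries stay empty.
    PROCESSORS[MyTaskDataProcessor.TASK_NAME] = MyTaskDataProcessor
    PVPS[MyTaskPVP.TASK_NAME] = MyTaskPVP


def load_task(task_name):
    """Look up the processor and PVP classes for a --task_name value."""
    if task_name not in PROCESSORS or task_name not in PVPS:
        raise KeyError(f"task '{task_name}' is not registered; "
                       f"known tasks: {sorted(PROCESSORS)}")
    return PROCESSORS[task_name], PVPS[task_name]


try:
    load_task("my-task")  # fails: the registration code has not run yet
except KeyError as err:
    print("lookup failed:", err)

register_custom_task()
print(load_task("my-task"))
```

Copying the classes into pet/tasks.py and pet/pvp.py presumably works for the same reason: cli.py imports those modules, so any registration code in them runs at startup.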

@chris-aeviator

I ended up copying the classes into the respective files and can confirm that it works.

Because PET uses the transformers library, my language model (RoBERTa) was already on my machine, and training started right away 🚀 👏

@Mahhos
Author

Mahhos commented Oct 11, 2020

Thanks for your advice. I copied my classes, but the command still does not work. This is the command I am using:

python3 cli.py --method pet --pattern_ids 0 1 --data_dir ./data --model_type albert --model_name_or_path albert-base-v2 --task_name my-task --output_dir ./OUTPUT_DIR/ --do_train --do_eval --pet_per_gpu_eval_batch_size 8 --pet_per_gpu_train_batch_size 2 --pet_gradient_accumulation_steps 8 --pet_max_steps 250 --pet_max_seq_length 256 --pet_repetitions 3 --sc_per_gpu_train_batch_size 2 --sc_per_gpu_unlabeled_batch_size 2 --sc_gradient_accumulation_steps 8 --sc_max_steps 5000 --sc_max_seq_length 256 --sc_repetitions 1

Is there anything I should change?

@Mahhos Mahhos closed this as completed Oct 11, 2020
@chris-aeviator

@Mahhos what's your error?

@Mahhos
Author

Mahhos commented Oct 11, 2020

No error. When I run the command from the terminal, it does nothing and prints no error.

@Mahhos
Author

Mahhos commented Oct 11, 2020

@chris-aeviator I found the issue: I ran the same command with python instead of python3, and it worked. However, now I am getting an error:
  File "E:\My Projects\pet-master-updated\pet\tasks.py", line 848, in _create_examples
    text_a = row[MyTaskDataProcessor.TEXT_A_COLUMN]
IndexError: list index out of range

@chris-aeviator

chris-aeviator commented Oct 11, 2020

I think this happens if the rows in your data have an unequal number of columns. Try a very minimal example first.
I started with, e.g.:

pvp.py

VERBALIZER = {
    "0": ["Bad"],
    "1": ["Good"],
}

training.csv

0,This is bad,a bad string
1,this is good, a good string

dev.csv

0,something bad, a bad string
1,that's good, good string

unlabeled.csv

,this is supposed to be bad,baddish
,this looks good,goodish

(my field B is actually a category, currently the same for all examples)

With around 20 training examples (split between training and dev), I get pretty good results on real-world data by running:

python3 cli.py \
--overwrite_output_dir --method pet \
--pattern_ids 0 \
--data_dir data \
--model_type roberta \
--model_name roberta-base \
--task_name my-task \
--output_dir /mnt/[…]/DevRepo/xxxxxxx-pet-model \
--do_eval \
--do_train
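To make the column layout concrete, here is a self-contained sketch that parses the training and unlabeled rows above with Python's `csv` module, assuming the label is in column 0, text_a in column 1, and text_b in column 2. `Example`, `create_examples`, and `apply_pattern` are hypothetical helpers that only mimic what a processor and PVP do; the pattern string is an arbitrary illustration, not one of PET's patterns.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

# Assumed column positions, matching the example files above.
LABEL_COLUMN, TEXT_A_COLUMN, TEXT_B_COLUMN = 0, 1, 2
VERBALIZER = {"0": ["Bad"], "1": ["Good"]}

TRAIN_CSV = "0,This is bad,a bad string\n1,this is good, a good string\n"
UNLABELED_CSV = ",this is supposed to be bad,baddish\n,this looks good,goodish\n"


@dataclass
class Example:
    text_a: str
    text_b: Optional[str]
    label: Optional[str]


def create_examples(csv_text: str) -> List[Example]:
    """Turn each CSV row into an Example by column position."""
    examples = []
    for row in csv.reader(io.StringIO(csv_text)):
        # The leading comma in unlabeled rows yields an empty label field.
        label = row[LABEL_COLUMN] or None
        examples.append(Example(row[TEXT_A_COLUMN], row[TEXT_B_COLUMN], label))
    return examples


def apply_pattern(example: Example, mask: str = "<mask>") -> str:
    # Illustrative cloze pattern: a masked LM would fill the mask
    # with a verbalizer word such as "Good" or "Bad".
    return f"{example.text_a}. {example.text_b}. It was {mask}."


train = create_examples(TRAIN_CSV)
unlabeled = create_examples(UNLABELED_CSV)
for ex in train:
    print(apply_pattern(ex), "->", VERBALIZER[ex.label][0])
for ex in unlabeled:
    print(apply_pattern(ex), "-> label:", ex.label)
```

Because every row, including the unlabeled ones, has three fields, indexing by column position never goes out of range.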

@Mahhos
Author

Mahhos commented Oct 11, 2020

@chris-aeviator I guess something is wrong with my unlabeled.csv file, since the program successfully creates features from my train.csv and dev.csv and raises this error only when creating features from unlabeled.csv. My unlabeled.csv has only one column, containing the text, and I tell the program to use column 0 as text_a. Unlike the other files, unlabeled.csv has no column 1 with gold labels.

2020-10-11 11:56:33,461 - INFO - tasks - Creating features from dataset file at ./data (num_examples=-1, set_type=unlabeled)
  File "E:\My Projects\pet-master-updated\pet\tasks.py", line 848, in _create_examples
    text_a = row[MyTaskDataProcessor.TEXT_A_COLUMN]
IndexError: list index out of range
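One way to find the offending rows before running PET is to scan each file and report any row with fewer fields than the largest column index the processor reads. This is a hypothetical helper, not part of PET, and the column positions are assumed from the layout described above (text_a in column 0, label in column 1). Note that `csv.reader` also returns an empty list for a blank line, which raises the same `IndexError`.

```python
import csv
import io
from typing import List, Tuple


def find_short_rows(csv_text: str, required_columns: int) -> List[Tuple[int, List[str]]]:
    """Return (row_number, row) for every row with too few fields."""
    short = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) < required_columns:
            short.append((row_no, row))
    return short


# Assumed layout: text_a in column 0, label in column 1.
TEXT_A_COLUMN, LABEL_COLUMN = 0, 1
required = max(TEXT_A_COLUMN, LABEL_COLUMN) + 1

# One text per line, plus a trailing blank line: every row is too short.
unlabeled_one_column = "first unlabeled text\nsecond unlabeled text\n\n"
# Same texts with an empty label field appended: no short rows.
unlabeled_two_columns = "first unlabeled text,\nsecond unlabeled text,\n"

print(find_short_rows(unlabeled_one_column, required))
print(find_short_rows(unlabeled_two_columns, required))
```

With a real file, read its contents first (for example with `open(path, newline="")`) and pass the text in.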

@chris-aeviator

chris-aeviator commented Oct 11, 2020

@Mahhos have you made sure that unlabeled.csv keeps an empty column in the same position as the label column in train.csv?

train:     label,text_a
unlabeled: [empty],text_a

so the comma is important.
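If the unlabeled texts are stored one per line, a small script can add the empty label field in the right position. A sketch using Python's `csv` module; `pad_label_column` and the column positions are assumptions to adapt to your own processor. Writing through `csv.writer` also quotes texts that themselves contain commas, which a hand-added comma would not.

```python
import csv
import io
from typing import Iterable


def pad_label_column(texts: Iterable[str], label_column: int, text_column: int) -> str:
    """Return CSV text with one row per text and an empty label field."""
    if label_column == text_column:
        raise ValueError("label and text must be in different columns")
    width = max(label_column, text_column) + 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for text in texts:
        row = [""] * width       # every field starts empty, including the label
        row[text_column] = text  # only the text field is filled in
        writer.writerow(row)
    return out.getvalue()


# Label first, then text_a (the layout above): rows look like ",text".
print(pad_label_column(["this looks good", "bad, really bad"], 0, 1), end="")
# Text first, then label (column 0 = text_a): rows look like "text,".
print(pad_label_column(["this looks good"], 1, 0), end="")
```

To write the result to disk, open the output file with `newline=""`, as the `csv` module documentation recommends.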

@Mahhos
Author

Mahhos commented Oct 12, 2020

@chris-aeviator thank you so much. That was a good point!


3 participants