[QUESTION] Train my own Metric #115

Closed · sdlmw opened this issue Feb 23, 2023 · 4 comments
Labels: question (Further information is requested)

sdlmw commented Feb 23, 2023

Hi,

I downloaded the experiment file and tried to train the model myself, but I always get the error below. I could not find the cause. What is causing this problem?

Code

ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 5.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    dropout: 0.1
    batch_size: 4
    train_data: 
      - /MT-work/COMET/data/apequest/train.csv
    validation_data:
      - /MT-work/COMET/data/apequest/test.csv      
trainer: /MT-work/COMET/configs/trainer.yaml
early_stopping: /MT-work/COMET/configs/early_stopping.yaml
model_checkpoint: /MT-work/COMET/configs/model_checkpoint.yaml

comet-train: error: Parser key "ranking_metric": Problem with given class_path "comet.models.RankingMetric":
  - Parser key "train_data": Value "['/MT-work/COMET/data/apequest/train.csv']" does not validate against any of the types in typing.Union[str, NoneType]:
    - Expected a <class 'str'> but got "['/MT-work/COMET/data/apequest/train.csv']"
    - Expected a <class 'NoneType'> but got "['/MT-work/COMET/data/apequest/train.csv']"
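
For context on what the parser is complaining about: jsonargparse validates each init argument against its type annotation, and here train_data is annotated as typing.Union[str, NoneType], so a YAML list is rejected. A minimal sketch (hypothetical class, not COMET's actual source) that reproduces the same kind of failure:

from typing import Optional
from jsonargparse import ArgumentParser

# Hypothetical stand-in whose train_data is typed Optional[str], which is
# what "typing.Union[str, NoneType]" in the error above implies for
# unbabel-comet==1.1.3.
class RankingMetricLike:
    def __init__(self, train_data: Optional[str] = None):
        self.train_data = train_data

parser = ArgumentParser()
parser.add_class_arguments(RankingMetricLike, "ranking_metric")

# A single path validates fine against Optional[str]:
parser.parse_string("ranking_metric:\n  train_data: train.csv\n")

# A YAML list raises the same kind of error as above:
# Value "['train.csv']" does not validate against any of the types in
# typing.Union[str, NoneType]
parser.parse_string("ranking_metric:\n  train_data: [train.csv]\n")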

What have you tried?

What's your environment?

  • OS: Ubuntu 18.04
  • Packaging: pip and conda
  • Version: COMET 1.1.3, Python 3.8
ricardorei (Collaborator) commented

There is a mismatch between unbabel-comet==1.1.3 and the current master branch.

If you are using version 1.1.3 you can't pass a list of training files; the config is just:

ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 5.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    dropout: 0.1
    batch_size: 4
    train_data: /MT-work/COMET/data/apequest/train.csv
    validation_data:
      - /MT-work/COMET/data/apequest/test.csv      
trainer: /MT-work/COMET/configs/trainer.yaml
early_stopping: /MT-work/COMET/configs/early_stopping.yaml
model_checkpoint: /MT-work/COMET/configs/model_checkpoint.yaml
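
If you want to keep passing a list of files, the alternative is to install the current master from source rather than the 1.1.3 release; assuming the repository is pip-installable in the usual way (check the README for the canonical instructions), something like:

pip install git+https://github.com/Unbabel/COMET.git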

sdlmw (Author) commented Feb 27, 2023

Hi @ricardorei

Thanks for the explanation.

I just pulled the latest version.

git clone https://github.com/Unbabel/COMET

The error has not changed.
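
One thing worth ruling out: cloning the repository does not by itself change which package Python imports, so a previously installed unbabel-comet==1.1.3 in site-packages would still win. A minimal check (standard library only; assumes a pip-managed environment):

import comet
from importlib.metadata import version  # Python 3.8+

print(comet.__file__)            # filesystem path of the package actually imported
print(version("unbabel-comet"))  # version of the installed distribution

If the printed path points into site-packages rather than the clone, install the clone, for example with pip install -e . from inside it (if the project supports editable installs).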

ricardorei (Collaborator) commented

Hi @sdlmw, I just tested the code on master and everything is working fine.

Here are my configs:

ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    layer_transformation: sparsemax
    layer_norm: False
    dropout: 0.1
    batch_size: 4
    train_data: 
      - tests/data/ranking_data.csv
    validation_data:
      - tests/data/ranking_data.csv
      
trainer: ../trainer.yaml
early_stopping: ../early_stopping.yaml
model_checkpoint: ../model_checkpoint.yaml

and for the trainer.yaml:

class_path: pytorch_lightning.trainer.trainer.Trainer
init_args:
  accelerator: gpu
  devices: 1
  accumulate_grad_batches: 4
  amp_backend: native
  amp_level: null
  auto_lr_find: False
  auto_scale_batch_size: False
  auto_select_gpus: False
  benchmark: null
  check_val_every_n_epoch: 1
  default_root_dir: null
  deterministic: False
  fast_dev_run: False
  gradient_clip_val: 1.0
  gradient_clip_algorithm: norm
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  log_every_n_steps: 50
  profiler: null
  overfit_batches: 0
  plugins: null
  precision: 16
  max_epochs: 4
  min_epochs: 1
  max_steps: -1
  min_steps: null
  max_time: null
  num_nodes: 1
  num_sanity_val_steps: 10
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: True
  sync_batchnorm: False
  detect_anomaly: False
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0
  enable_model_summary: True
  move_metrics_to_cpu: True
  multiple_trainloader_mode: max_size_cycle
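
With both files in place, training would then be launched through the comet-train entry point that appears in the error message above; assuming the --cfg flag from the COMET README, and with ranking_metric.yaml standing in for whatever you named the model config:

comet-train --cfg ranking_metric.yaml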

ricardorei (Collaborator) commented

Note that the data I am using is in the tests folder. Make sure the data you are using for the ranking model has the same shape.
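
A quick way to compare the two is to print the column lists side by side (a minimal pandas sketch; the reference file is the one used in the config above, and the second path is taken from earlier in this thread):

import pandas as pd

# Columns of the reference ranking file shipped in the repo's tests folder.
print(pd.read_csv("tests/data/ranking_data.csv").columns.tolist())

# Columns of the user's own training file; names and order should match.
print(pd.read_csv("/MT-work/COMET/data/apequest/train.csv").columns.tolist())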
