SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
100%|██████████| 2/2 [00:00<00:00, 150.83it/s]
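The "Found cached dataset" lines above are what datasets.load_dataset prints on each rank when the Yelp data is already on disk; the 2-item progress bars correspond to the train and test splits being read from the Arrow cache. A minimal sketch of the call that would produce them, with cache_dir inferred from the paths in the log and everything else an assumption:

from datasets import load_dataset

# Reuses the Arrow cache under /home/users/uat/data instead of re-downloading.
dataset = load_dataset("yelp_review_full", cache_dir="/home/users/uat/data")
print(dataset)  # DatasetDict with "train" and "test" splits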
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
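The "cached processed dataset" and "cached shuffled indices" messages above come from Dataset.map and Dataset.shuffle finding earlier results in the on-disk Arrow cache. A sketch of preprocessing that would leave such cache files behind, continuing from the load_dataset sketch; the tokenizer settings, the seed, and the 1000-example subsets are assumptions (the subset size matches "Num examples = 1000" later in the log):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

def tokenize(batch):
    # Tokenize the "text" column; results are cached as cache-*.arrow files.
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
small_train = tokenized["train"].shuffle(seed=42).select(range(1000))  # cached shuffled indices
small_eval = tokenized["test"].shuffle(seed=42).select(range(1000))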
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.bias', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.decoder.weight', 'roberta.pooler.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'roberta.pooler.dense.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
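The eight identical warning blocks above (one per process) are the expected output when a masked-language-model checkpoint is loaded into a sequence-classification head: the lm_head and pooler weights are discarded and a fresh classifier is initialized. A sketch of the load; num_labels=5 is an assumption based on the five-star Yelp labels, not something the log states:

from transformers import AutoModelForSequenceClassification

# Drops lm_head.* / roberta.pooler.* and randomly initializes classifier.*,
# which is exactly what the warnings describe.
model = AutoModelForSequenceClassification.from_pretrained("distilroberta-base", num_labels=5)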
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
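The deprecation warnings above suggest replacing the WANDB_DISABLED environment variable with the report_to setting. In TrainingArguments that would look like the following sketch; output_dir is a placeholder, since the log does not show the real value:

from transformers import TrainingArguments

args = TrainingArguments(output_dir="test_trainer", report_to="none")  # disable wandb and other logging integrations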
The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
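The FutureWarning above refers to transformers' own AdamW implementation, which the Trainer in this version uses by default. As the warning suggests, the PyTorch optimizer can be selected instead via the optim argument; a sketch, not how this job was actually configured:

from transformers import TrainingArguments

args = TrainingArguments(output_dir="test_trainer", optim="adamw_torch")  # use torch.optim.AdamW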
***** Running training *****
Num examples = 1000
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 64
Gradient Accumulation steps = 1
Total optimization steps = 48
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
***** Running training *****
Num examples = 1000
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 64
Gradient Accumulation steps = 1
Total optimization steps = 48
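The step count printed by the Trainer follows from the other numbers it reports: 1000 examples, a total batch of 64 (8 per device across 8 processes), and 3 epochs. A quick check:

import math
steps_per_epoch = math.ceil(1000 / 64)  # 16 update steps per epoch
total_steps = steps_per_epoch * 3       # 48, matching "Total optimization steps = 48"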
  0%|          | 0/48 [00:00<?, ?it/s]
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
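The reducer warnings above indicate that DDP was constructed with find_unused_parameters=True even though every parameter receives a gradient. As the warning itself suggests, the extra autograd-graph traversal can be turned off; with the Trainer this is controlled by ddp_find_unused_parameters (a sketch, assuming the job is driven by TrainingArguments):

from transformers import TrainingArguments

args = TrainingArguments(output_dir="test_trainer", ddp_find_unused_parameters=False)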
100%|██████████| 48/48 [00:12<00:00, 5.17it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
100%|██████████| 48/48 [00:12<00:00, 5.15it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
Done!
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Done!
Done!
Done!
Done!
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Done!
{'train_runtime': 12.3165, 'train_samples_per_second': 243.576, 'train_steps_per_second': 3.897, 'train_loss': 1.2376000881195068, 'epoch': 3.0}
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
100%|██████████| 48/48 [00:12<00:00, 3.92it/s]
Done!
{'train_runtime': 12.3157, 'train_samples_per_second': 243.592, 'train_steps_per_second': 3.897, 'train_loss': 1.2245434919993083, 'epoch': 3.0}
100%|██████████| 48/48 [00:12<00:00, 3.92it/s]
Done!
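The summary dictionaries are internally consistent: 3 epochs over 1000 examples in roughly 12.3 seconds gives the reported throughput. A quick check against the second run's numbers:

samples_per_second = 1000 * 3 / 12.3157  # ~243.6, matches train_samples_per_second
steps_per_second = 48 / 12.3157          # ~3.90, matches the reported 3.897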