SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn4
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
SUTD-hpc-gn3
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
Found cached dataset yelp_review_full (/home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
100%|██████████| 2/2 [00:00<00:00, 150.83it/s]
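The "Found cached dataset" lines above are what datasets.load_dataset prints on each rank when the Yelp data is already on disk; the 2-item progress bars correspond to the train and test splits being read from the Arrow cache. A minimal sketch of the call that would produce them, with cache_dir inferred from the paths in the log and everything else an assumption:

from datasets import load_dataset

# Reuses the Arrow cache under /home/users/uat/data instead of re-downloading.
dataset = load_dataset("yelp_review_full", cache_dir="/home/users/uat/data")
print(dataset)  # DatasetDict with "train" and "test" splits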
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0220b6160bfc7b29.arrow
Loading cached processed dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-9088e0e25706f610.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-0cf83c07babf9a6a.arrow
Loading cached shuffled indices for dataset at /home/users/uat/data/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-576e269e1d6b7221.arrow
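The "cached processed dataset" and "cached shuffled indices" messages above come from Dataset.map and Dataset.shuffle finding earlier results in the on-disk Arrow cache. A sketch of preprocessing that would leave such cache files behind, continuing from the load_dataset sketch; the tokenizer settings, the seed, and the 1000-example subsets are assumptions (the subset size matches "Num examples = 1000" later in the log):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

def tokenize(batch):
    # Tokenize the "text" column; results are cached as cache-*.arrow files.
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
small_train = tokenized["train"].shuffle(seed=42).select(range(1000))  # cached shuffled indices
small_eval = tokenized["test"].shuffle(seed=42).select(range(1000))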
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.bias', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.decoder.weight', 'roberta.pooler.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'roberta.pooler.dense.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
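The eight identical warning blocks above (one per process) are the expected output when a masked-language-model checkpoint is loaded into a sequence-classification head: the lm_head and pooler weights are discarded and a fresh classifier is initialized. A sketch of the load; num_labels=5 is an assumption based on the five-star Yelp labels, not something the log states:

from transformers import AutoModelForSequenceClassification

# Drops lm_head.* / roberta.pooler.* and randomly initializes classifier.*,
# which is exactly what the warnings describe.
model = AutoModelForSequenceClassification.from_pretrained("distilroberta-base", num_labels=5)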
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
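The deprecation warnings above suggest replacing the WANDB_DISABLED environment variable with the report_to setting. In TrainingArguments that would look like the following sketch; output_dir is a placeholder, since the log does not show the real value:

from transformers import TrainingArguments

args = TrainingArguments(output_dir="test_trainer", report_to="none")  # disable wandb and other logging integrations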
The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
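The FutureWarning above refers to transformers' own AdamW implementation, which the Trainer in this version uses by default. As the warning suggests, the PyTorch optimizer can be selected instead via the optim argument; a sketch, not how this job was actually configured:

from transformers import TrainingArguments

args = TrainingArguments(output_dir="test_trainer", optim="adamw_torch")  # use torch.optim.AdamW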
***** Running training *****
Num examples = 1000
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 64
Gradient Accumulation steps = 1
Total optimization steps = 48
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
***** Running training *****
Num examples = 1000
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 64
Gradient Accumulation steps = 1
Total optimization steps = 48
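The step count printed by the Trainer follows from the other numbers it reports: 1000 examples, a total batch of 64 (8 per device across 8 processes), and 3 epochs. A quick check:

import math
steps_per_epoch = math.ceil(1000 / 64)  # 16 update steps per epoch
total_steps = steps_per_epoch * 3       # 48, matching "Total optimization steps = 48"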
  0%|          | 0/48 [00:00<?, ?it/s]
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
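The reducer warnings above indicate that DDP was constructed with find_unused_parameters=True even though every parameter receives a gradient. As the warning itself suggests, the extra autograd-graph traversal can be turned off; with the Trainer this is controlled by ddp_find_unused_parameters (a sketch, assuming the job is driven by TrainingArguments):

from transformers import TrainingArguments

args = TrainingArguments(output_dir="test_trainer", ddp_find_unused_parameters=False)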
100%|██████████| 48/48 [00:12<00:00, 5.17it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
100%|██████████| 48/48 [00:12<00:00, 5.15it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
Done!
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Done!
Done!
Done!
Done!
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Done!
{'train_runtime': 12.3165, 'train_samples_per_second': 243.576, 'train_steps_per_second': 3.897, 'train_loss': 1.2376000881195068, 'epoch': 3.0}
/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
100%|██████████| 48/48 [00:12<00:00, 3.92it/s]
Done!
{'train_runtime': 12.3157, 'train_samples_per_second': 243.592, 'train_steps_per_second': 3.897, 'train_loss': 1.2245434919993083, 'epoch': 3.0}
100%|██████████| 48/48 [00:12<00:00, 3.92it/s]
Done!
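The summary dictionaries are internally consistent: 3 epochs over 1000 examples in roughly 12.3 seconds gives the reported throughput. A quick check against the second run's numbers:

samples_per_second = 1000 * 3 / 12.3157  # ~243.6, matches train_samples_per_second
steps_per_second = 48 / 12.3157          # ~3.90, matches the reported 3.897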