[fix] add parameters arg into AdamWMini #10774
base: develop
Conversation
Thanks for your contribution!
Codecov Report
All modified and coverable lines are covered by tests ✅
❌ Your project check has failed because the head coverage (46.77%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

@@ Coverage Diff @@
##           develop   #10774   +/- ##
========================================
  Coverage    46.77%   46.77%
========================================
  Files          802      802
  Lines       133646   133651    +5
========================================
+ Hits         62508    62511    +3
- Misses       71138    71140    +2

☔ View full report in Codecov by Sentry.
I ran the llama test case on aistudio and it reports an error. Current analysis:
aistudio@jupyter-942478-8790893:~/PaddleNLP$ python -m pytest tests/llm/test_adamw_mini.py::FinetuneTest_0_llama::test_finetune
F [100%]
====================================================================== FAILURES =======================================================================
_________________________________________________________ FinetuneTest_0_llama.test_finetune __________________________________________________________
self = <tests.llm.test_adamw_mini.FinetuneTest_0_llama testMethod=test_finetune>
def test_finetune(self):
finetune_config = load_test_config(self.config_path, "finetune", self.model_dir)
finetune_config["dataset_name_or_path"] = self.data_dir
finetune_config["output_dir"] = self.output_dir
with argv_context_guard(finetune_config):
from run_finetune import main
> main()
tests/llm/test_adamw_mini.py:53:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
llm/run_finetune.py:478: in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
paddlenlp/trainer/trainer.py:991: in train
return self._inner_training_loop(
paddlenlp/trainer/trainer.py:1240: in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, step_control=step_control)
paddlenlp/trainer/trainer.py:2538: in training_step
self.scaler.scale(loss).backward()
../external-libraries/lib/python3.10/site-packages/decorator.py:235: in fun
return caller(func, *(extras + args), **kw)
../external-libraries/lib/python3.10/site-packages/paddle/base/wrapped_decorator.py:40: in __impl__
return wrapped_func(*args, **kwargs)
../external-libraries/lib/python3.10/site-packages/paddle/base/framework.py:722: in __impl__
return func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=False,
[85196.54687500]), grad_tensor = [], retain_graph = False
@framework.dygraph_only
def backward(
self: Tensor,
grad_tensor: Tensor | None = None,
retain_graph: bool = False,
) -> None:
"""
Run backward of current Graph which starts from current Tensor.
The new gradient will accumulate on previous gradient.
You can clear gradient by ``Tensor.clear_grad()`` .
Args:
grad_tensor(Tensor|None, optional): initial gradient values of the current Tensor. If `grad_tensor` is None,
the initial gradient values of the current Tensor would be Tensor filled with 1.0;
if `grad_tensor` is not None, it must have the same length as the current Tensor.
The default value is None.
retain_graph(bool, optional): If False, the graph used to compute grads will be freed. If you would
like to add more ops to the built graph after calling this method( :code:`backward` ), set the parameter
:code:`retain_graph` to True, then the grads will be retained. Thus, setting it to False is much more memory-efficient.
Defaults to False.
Returns:
None
Examples:
.. code-block:: python
>>> import paddle
>>> x = paddle.to_tensor(5., stop_gradient=False)
>>> for i in range(5):
... y = paddle.pow(x, 4.0)
... y.backward()
... print("{}: {}".format(i, x.grad))
0: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
500.)
1: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
1000.)
2: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
1500.)
3: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
2000.)
4: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
2500.)
>>> x.clear_grad()
>>> print("{}".format(x.grad))
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
0.)
>>> grad_tensor=paddle.to_tensor(2.)
>>> for i in range(5):
... y = paddle.pow(x, 4.0)
... y.backward(grad_tensor)
... print("{}: {}".format(i, x.grad))
0: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
1000.)
1: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
2000.)
2: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
3000.)
3: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
4000.)
4: Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=False,
5000.)
"""
if framework.in_dygraph_mode():
if in_profiler_mode():
record_event = profiler.RecordEvent(
"Gradient Backward", profiler.TracerEventType.Backward
)
record_event.begin()
if grad_tensor is not None:
assert isinstance(
grad_tensor, core.eager.Tensor
), "The type of grad_tensor must be paddle.Tensor"
assert (
grad_tensor.shape == self.shape
), f"Tensor shape not match, Tensor of grad_tensor [ {grad_tensor.name} ] with shape {grad_tensor.shape} mismatch Tensor [ {self.name} ] with shape {self.shape}"
if grad_tensor is None:
grad_tensor = []
else:
grad_tensor = [grad_tensor]
if _grad_scalar:
# When using amp with Fleet DistributedStrategy, we do loss scaling implicitly.
self = _grad_scalar.scale(self)
> core.eager.run_backward([self], grad_tensor, retain_graph)
E OSError: (External) OSError: (External) Exception: Not supported to retrieve a tensor saved by autograd multiple times that is no need to recompute.Please check your `keys_ignore_to_save`.
E
E At:
E /home/aistudio/PaddleNLP/paddlenlp/transformers/refined_recompute.py(369): inner_pack
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/functional/common.py(2310): linear
E /home/aistudio/PaddleNLP/paddlenlp/transformers/deepseek_v2/fp8_linear.py(75): fp8_linear
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/common.py(223): forward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/layers.py(1571): __call__
E /home/aistudio/PaddleNLP/paddlenlp/transformers/llama/modeling.py(687): forward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/layers.py(1571): __call__
E /home/aistudio/PaddleNLP/paddlenlp/transformers/llama/modeling.py(1278): forward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/layers.py(1571): __call__
E /home/aistudio/PaddleNLP/paddlenlp/transformers/llama/modeling.py(1639): custom_forward
E /home/aistudio/PaddleNLP/paddlenlp/transformers/refined_recompute.py(404): unpack
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/base/dygraph/tensor_patch_methods.py(371): backward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/base/framework.py(722): __impl__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/base/wrapped_decorator.py(40): __impl__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/decorator.py(235): fun
E /home/aistudio/PaddleNLP/paddlenlp/trainer/trainer.py(2538): training_step
E /home/aistudio/PaddleNLP/paddlenlp/trainer/trainer.py(1240): _inner_training_loop
E /home/aistudio/PaddleNLP/paddlenlp/trainer/trainer.py(991): train
E /home/aistudio/PaddleNLP/./llm/run_finetune.py(478): main
E /home/aistudio/PaddleNLP/tests/llm/test_adamw_mini.py(53): test_finetune
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/unittest/case.py(549): _callTestMethod
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/unittest/case.py(591): run
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/unittest/case.py(650): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/unittest.py(351): runtest
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(178): pytest_runtest_call
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(246): <lambda>
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(344): from_call
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(245): call_and_report
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(136): runtestprotocol
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(117): pytest_runtest_protocol
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(367): pytest_runtestloop
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(343): _main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(289): wrap_session
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(336): pytest_cmdline_main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/config/__init__.py(175): main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/config/__init__.py(201): console_main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pytest/__main__.py(9): <module>
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/runpy.py(86): _run_code
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/runpy.py(196): _run_module_as_main
E
E [Hint: ret should not be null.] (at ../paddle/fluid/pybind/eager_utils.cc:2625)
E [operator < linear > error]
E
E At:
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/functional/common.py(2310): linear
E /home/aistudio/PaddleNLP/paddlenlp/transformers/deepseek_v2/fp8_linear.py(75): fp8_linear
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/common.py(223): forward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/layers.py(1571): __call__
E /home/aistudio/PaddleNLP/paddlenlp/transformers/llama/modeling.py(687): forward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/layers.py(1571): __call__
E /home/aistudio/PaddleNLP/paddlenlp/transformers/llama/modeling.py(1278): forward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/nn/layer/layers.py(1571): __call__
E /home/aistudio/PaddleNLP/paddlenlp/transformers/llama/modeling.py(1639): custom_forward
E /home/aistudio/PaddleNLP/paddlenlp/transformers/refined_recompute.py(404): unpack
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/base/dygraph/tensor_patch_methods.py(371): backward
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/base/framework.py(722): __impl__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/paddle/base/wrapped_decorator.py(40): __impl__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/decorator.py(235): fun
E /home/aistudio/PaddleNLP/paddlenlp/trainer/trainer.py(2538): training_step
E /home/aistudio/PaddleNLP/paddlenlp/trainer/trainer.py(1240): _inner_training_loop
E /home/aistudio/PaddleNLP/paddlenlp/trainer/trainer.py(991): train
E /home/aistudio/PaddleNLP/./llm/run_finetune.py(478): main
E /home/aistudio/PaddleNLP/tests/llm/test_adamw_mini.py(53): test_finetune
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/unittest/case.py(549): _callTestMethod
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/unittest/case.py(591): run
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/unittest/case.py(650): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/unittest.py(351): runtest
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(178): pytest_runtest_call
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(246): <lambda>
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(344): from_call
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(245): call_and_report
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(136): runtestprotocol
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/runner.py(117): pytest_runtest_protocol
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(367): pytest_runtestloop
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(343): _main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(289): wrap_session
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/main.py(336): pytest_cmdline_main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_callers.py(121): _multicall
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_manager.py(120): _hookexec
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pluggy/_hooks.py(512): __call__
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/config/__init__.py(175): main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/_pytest/config/__init__.py(201): console_main
E /home/aistudio/external-libraries/lib/python3.10/site-packages/pytest/__main__.py(9): <module>
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/runpy.py(86): _run_code
E /opt/conda/envs/python35-paddle120-env/lib/python3.10/runpy.py(196): _run_module_as_main
E
E [Hint: ret should not be null.] (at ../paddle/fluid/pybind/eager_utils.cc:2672)
../external-libraries/lib/python3.10/site-packages/paddle/base/dygraph/tensor_patch_methods.py:371: OSError
---------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------
[2025-06-25 21:17:40,735] [ INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2025-06-25 21:17:40,735] [ DEBUG] - ============================================================
[2025-06-25 21:17:40,735] [ DEBUG] - Model Configuration Arguments
[2025-06-25 21:17:40,736] [ DEBUG] - paddle commit id : 129b5cec5427ca9d634f490f28263ad274aacdf8
[2025-06-25 21:17:40,736] [ DEBUG] - paddlenlp commit id : 06378bb4e591363e14f44e2e8b0b90f39ee5527e.dirty
[2025-06-25 21:17:40,736] [ DEBUG] - actscale_moving_rate : 0.01
[2025-06-25 21:17:40,736] [ DEBUG] - aistudio_repo_id : None
[2025-06-25 21:17:40,736] [ DEBUG] - aistudio_repo_license : Apache License 2.0
[2025-06-25 21:17:40,736] [ DEBUG] - aistudio_repo_private : True
[2025-06-25 21:17:40,736] [ DEBUG] - aistudio_token : None
[2025-06-25 21:17:40,736] [ DEBUG] - apply_hadamard : False
[2025-06-25 21:17:40,736] [ DEBUG] - apply_online_actscale_step : 200
[2025-06-25 21:17:40,736] [ DEBUG] - attention_probs_dropout_prob : 0.1
[2025-06-25 21:17:40,736] [ DEBUG] - continue_training : True
[2025-06-25 21:17:40,736] [ DEBUG] - flash_mask : False
[2025-06-25 21:17:40,736] [ DEBUG] - fp8_format_type : hybrid
[2025-06-25 21:17:40,736] [ DEBUG] - from_aistudio : False
[2025-06-25 21:17:40,736] [ DEBUG] - fuse_attention_ffn : None
[2025-06-25 21:17:40,736] [ DEBUG] - fuse_attention_qkv : None
[2025-06-25 21:17:40,736] [ DEBUG] - hadamard_block_size : 32
[2025-06-25 21:17:40,737] [ DEBUG] - hidden_dropout_prob : 0.1
[2025-06-25 21:17:40,737] [ DEBUG] - lokr : False
[2025-06-25 21:17:40,737] [ DEBUG] - lokr_dim : 8
[2025-06-25 21:17:40,737] [ DEBUG] - lokr_path : None
[2025-06-25 21:17:40,737] [ DEBUG] - lora : False
[2025-06-25 21:17:40,737] [ DEBUG] - lora_path : None
[2025-06-25 21:17:40,737] [ DEBUG] - lora_plus_scale : 1.0
[2025-06-25 21:17:40,737] [ DEBUG] - lora_rank : 8
[2025-06-25 21:17:40,737] [ DEBUG] - lora_use_mixer : False
[2025-06-25 21:17:40,737] [ DEBUG] - lorapro : False
[2025-06-25 21:17:40,737] [ DEBUG] - lorapro_scaling_factor : 2.0
[2025-06-25 21:17:40,737] [ DEBUG] - lorapro_x_mode : zero
[2025-06-25 21:17:40,737] [ DEBUG] - model_name_or_path : __internal_testing__/tiny-random-llama
[2025-06-25 21:17:40,737] [ DEBUG] - neftune : False
[2025-06-25 21:17:40,737] [ DEBUG] - neftune_noise_alpha : 5.0
[2025-06-25 21:17:40,737] [ DEBUG] - num_prefix_tokens : 128
[2025-06-25 21:17:40,737] [ DEBUG] - pissa : False
[2025-06-25 21:17:40,737] [ DEBUG] - prefix_path : None
[2025-06-25 21:17:40,737] [ DEBUG] - prefix_tuning : False
[2025-06-25 21:17:40,738] [ DEBUG] - qlora_weight_blocksize : 64
[2025-06-25 21:17:40,738] [ DEBUG] - qlora_weight_double_quant : False
[2025-06-25 21:17:40,738] [ DEBUG] - qlora_weight_double_quant_block_size: 256
[2025-06-25 21:17:40,738] [ DEBUG] - quant_input_grad : False
[2025-06-25 21:17:40,738] [ DEBUG] - quant_weight_grad : False
[2025-06-25 21:17:40,738] [ DEBUG] - reft : False
[2025-06-25 21:17:40,738] [ DEBUG] - rope_scaling_factor : 1.0
[2025-06-25 21:17:40,738] [ DEBUG] - rslora : False
[2025-06-25 21:17:40,738] [ DEBUG] - save_to_aistudio : False
[2025-06-25 21:17:40,738] [ DEBUG] - strategy_name : None
[2025-06-25 21:17:40,738] [ DEBUG] - strategy_type : None
[2025-06-25 21:17:40,738] [ DEBUG] - tokenizer_name_or_path : None
[2025-06-25 21:17:40,738] [ DEBUG] - use_fast_layer_norm : False
[2025-06-25 21:17:40,738] [ DEBUG] - use_long_sequence_strategies : False
[2025-06-25 21:17:40,738] [ DEBUG] - use_mora : False
[2025-06-25 21:17:40,738] [ DEBUG] - use_quick_lora : False
[2025-06-25 21:17:40,738] [ DEBUG] - vera : False
[2025-06-25 21:17:40,738] [ DEBUG] - vera_rank : 8
[2025-06-25 21:17:40,739] [ DEBUG] - weight_quantize_algo : None
[2025-06-25 21:17:40,739] [ DEBUG] -
[2025-06-25 21:17:40,739] [ DEBUG] - ============================================================
[2025-06-25 21:17:40,739] [ DEBUG] - Data Configuration Arguments
[2025-06-25 21:17:40,739] [ DEBUG] - paddle commit id : 129b5cec5427ca9d634f490f28263ad274aacdf8
[2025-06-25 21:17:40,739] [ DEBUG] - paddlenlp commit id : 06378bb4e591363e14f44e2e8b0b90f39ee5527e.dirty
[2025-06-25 21:17:40,739] [ DEBUG] - autoregressive : False
[2025-06-25 21:17:40,739] [ DEBUG] - chat_template : None
[2025-06-25 21:17:40,739] [ DEBUG] - dataset_name_or_path : ./tests/fixtures/llm/data/
[2025-06-25 21:17:40,739] [ DEBUG] - eval_with_do_generation : False
[2025-06-25 21:17:40,739] [ DEBUG] - greedy_zero_padding : False
[2025-06-25 21:17:40,739] [ DEBUG] - lazy : False
[2025-06-25 21:17:40,739] [ DEBUG] - max_length : 2048
[2025-06-25 21:17:40,739] [ DEBUG] - pad_to_max_length : False
[2025-06-25 21:17:40,739] [ DEBUG] - pad_to_multiple_of : None
[2025-06-25 21:17:40,739] [ DEBUG] - save_generation_output : False
[2025-06-25 21:17:40,740] [ DEBUG] - src_length : 1024
[2025-06-25 21:17:40,740] [ DEBUG] - task_name : None
[2025-06-25 21:17:40,740] [ DEBUG] - use_pose_convert : False
[2025-06-25 21:17:40,740] [ DEBUG] - zero_padding : False
[2025-06-25 21:17:40,740] [ DEBUG] -
[2025-06-25 21:17:40,740] [ DEBUG] - ============================================================
[2025-06-25 21:17:40,740] [ DEBUG] - Generation Configuration Arguments
[2025-06-25 21:17:40,740] [ DEBUG] - paddle commit id : 129b5cec5427ca9d634f490f28263ad274aacdf8
[2025-06-25 21:17:40,740] [ DEBUG] - paddlenlp commit id : 06378bb4e591363e14f44e2e8b0b90f39ee5527e.dirty
[2025-06-25 21:17:40,740] [ DEBUG] - top_k : 1
[2025-06-25 21:17:40,740] [ DEBUG] - top_p : 1.0
[2025-06-25 21:17:40,740] [ DEBUG] -
[2025-06-25 21:17:40,741] [ INFO] - The global seed is set to 42, local seed is set to 43 and random seed is set to 42.
[2025-06-25 21:17:40,741] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: True
[2025-06-25 21:17:40,743] [ INFO] - Final model config: LlamaConfig {
"alibi": false,
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"context_parallel_degree": -1,
"dpo_config": null,
"dtype": "float16",
"eos_token_id": 2,
"hidden_size": 768,
"immediate_clear_past_key_value": false,
"initializer_range": 0.02,
"intermediate_size": 11008,
"long_sequence_init_args": {},
"long_sequence_strategy_name": null,
"long_sequence_strategy_type": null,
"max_position_embeddings": 2048,
"model_type": "llama",
"num_attention_heads": 8,
"num_hidden_layers": 2,
"num_key_value_heads": 8,
"pad_token_id": 0,
"paddlenlp_version": "3.0.0b4.post20250625",
"pipeline_parallel_degree": -1,
"recompute": true,
"refined_recompute": {
"attention_column_ln": 0,
"attention_row_ln": 0,
"flash_attn": -1,
"mlp_column_ln": 0,
"mlp_row_ln": 0
},
"rms_norm_eps": 1e-06,
"rope_scaling_factor": 1.0,
"rope_scaling_type": null,
"rope_theta": 10000.0,
"sep_parallel_degree": -1,
"seq_length": 2048,
"tensor_parallel_degree": -1,
"tensor_parallel_output": false,
"tie_word_embeddings": false,
"use_fast_layer_norm": false,
"use_flash_attention": true,
"use_flash_attention_for_generation": false,
"use_last_token_for_generation": false,
"use_long_sequence_strategies": false,
"vocab_size": 32000
}
[2025-06-25 21:17:40,743] [ INFO] - Creating model
[2025-06-25 21:17:40,743] [ INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load '__internal_testing__/tiny-random-llama'.
[2025-06-25 21:17:40,744] [ INFO] - Loading weights file from cache at /home/aistudio/.paddlenlp/models/__internal_testing__/tiny-random-llama/model_state.pdparams
[2025-06-25 21:17:41,072] [ INFO] - Loaded weights file from disk, setting weights to model.
W0625 21:17:41.083987 83454 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
[2025-06-25 21:17:45,086] [ INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2025-06-25 21:17:45,087] [ INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at __internal_testing__/tiny-random-llama.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2025-06-25 21:17:45,125] [ INFO] - Generation config file not found, using a generation config created from the model config.
[2025-06-25 21:17:45,139] [ INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load '__internal_testing__/tiny-random-llama'.
[2025-06-25 21:17:45,154] [ INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/__internal_testing__/tiny-random-llama/tokenizer_config.json
[2025-06-25 21:17:45,155] [ INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/__internal_testing__/tiny-random-llama/special_tokens_map.json
[2025-06-25 21:17:45,155] [ INFO] - load train
[2025-06-25 21:17:45,201] [ INFO] - load eval
[2025-06-25 21:17:45,208] [ INFO] - load test
[2025-06-25 21:17:45,208] [ INFO] - Trans the dataset text into token ids, please wait for a moment.
[2025-06-25 21:17:45,209] [ INFO] - The global seed is set to 42, local seed is set to 43 and random seed is set to 42.
[2025-06-25 21:17:45,270] [ INFO] - Using half precision
[2025-06-25 21:17:45,293] [ DEBUG] - ============================================================
[2025-06-25 21:17:45,293] [ DEBUG] - Training Configuration Arguments
[2025-06-25 21:17:45,294] [ DEBUG] - paddle commit id : 129b5cec5427ca9d634f490f28263ad274aacdf8
[2025-06-25 21:17:45,294] [ DEBUG] - paddlenlp commit id : 06378bb4e591363e14f44e2e8b0b90f39ee5527e.dirty
[2025-06-25 21:17:45,294] [ DEBUG] - _no_sync_in_gradient_accumulation: True
[2025-06-25 21:17:45,294] [ DEBUG] - adam_beta1 : 0.9
[2025-06-25 21:17:45,294] [ DEBUG] - adam_beta2 : 0.999
[2025-06-25 21:17:45,294] [ DEBUG] - adam_epsilon : 1e-08
[2025-06-25 21:17:45,294] [ DEBUG] - amp_custom_black_list : None
[2025-06-25 21:17:45,294] [ DEBUG] - amp_custom_white_list : None
[2025-06-25 21:17:45,294] [ DEBUG] - amp_master_grad : False
[2025-06-25 21:17:45,294] [ DEBUG] - auto_parallel_resume_form_hybrid_parallel: False
[2025-06-25 21:17:45,294] [ DEBUG] - autotuner_benchmark : False
[2025-06-25 21:17:45,295] [ DEBUG] - benchmark : False
[2025-06-25 21:17:45,295] [ DEBUG] - bf16 : False
[2025-06-25 21:17:45,295] [ DEBUG] - bf16_full_eval : False
[2025-06-25 21:17:45,295] [ DEBUG] - ckpt_quant_stage : O0
[2025-06-25 21:17:45,295] [ DEBUG] - context_parallel_degree : -1
[2025-06-25 21:17:45,295] [ DEBUG] - count_trained_tokens : False
[2025-06-25 21:17:45,295] [ DEBUG] - current_device : gpu:0
[2025-06-25 21:17:45,295] [ DEBUG] - data_parallel_config :
[2025-06-25 21:17:45,295] [ DEBUG] - data_parallel_degree : 1
[2025-06-25 21:17:45,295] [ DEBUG] - data_parallel_rank : 0
[2025-06-25 21:17:45,295] [ DEBUG] - dataloader_drop_last : False
[2025-06-25 21:17:45,295] [ DEBUG] - dataloader_num_workers : 0
[2025-06-25 21:17:45,295] [ DEBUG] - dataloader_shuffle : True
[2025-06-25 21:17:45,295] [ DEBUG] - dataset_batch_size : 1000
[2025-06-25 21:17:45,295] [ DEBUG] - dataset_kwargs : {}
[2025-06-25 21:17:45,296] [ DEBUG] - dataset_num_proc : None
[2025-06-25 21:17:45,296] [ DEBUG] - dataset_rank : 0
[2025-06-25 21:17:45,296] [ DEBUG] - dataset_text_field : text
[2025-06-25 21:17:45,296] [ DEBUG] - dataset_world_size : 1
[2025-06-25 21:17:45,296] [ DEBUG] - ddp_find_unused_parameters : None
[2025-06-25 21:17:45,296] [ DEBUG] - decay_steps : 0
[2025-06-25 21:17:45,296] [ DEBUG] - device : gpu
[2025-06-25 21:17:45,296] [ DEBUG] - disable_tqdm : True
[2025-06-25 21:17:45,296] [ DEBUG] - distributed_dataloader : False
[2025-06-25 21:17:45,296] [ DEBUG] - do_eval : True
[2025-06-25 21:17:45,296] [ DEBUG] - do_export : False
[2025-06-25 21:17:45,296] [ DEBUG] - do_predict : False
[2025-06-25 21:17:45,296] [ DEBUG] - do_train : True
[2025-06-25 21:17:45,296] [ DEBUG] - enable_auto_parallel : False
[2025-06-25 21:17:45,297] [ DEBUG] - enable_zero_cost_checkpoint : False
[2025-06-25 21:17:45,297] [ DEBUG] - eval_accumulation_steps : 16
[2025-06-25 21:17:45,297] [ DEBUG] - eval_batch_size : 8
[2025-06-25 21:17:45,297] [ DEBUG] - eval_packing : None
[2025-06-25 21:17:45,297] [ DEBUG] - eval_steps : None
[2025-06-25 21:17:45,297] [ DEBUG] - evaluation_strategy : IntervalStrategy.EPOCH
[2025-06-25 21:17:45,297] [ DEBUG] - expert_max_capacity : 4294967296
[2025-06-25 21:17:45,297] [ DEBUG] - expert_min_capacity : 1
[2025-06-25 21:17:45,297] [ DEBUG] - expert_parallel_degree : -1
[2025-06-25 21:17:45,297] [ DEBUG] - expert_tensor_parallel_degree : -1
[2025-06-25 21:17:45,297] [ DEBUG] - flash_device_save_steps : 0
[2025-06-25 21:17:45,297] [ DEBUG] - flatten_param_grads : False
[2025-06-25 21:17:45,297] [ DEBUG] - force_reshard_pp : False
[2025-06-25 21:17:45,298] [ DEBUG] - fp16 : True
[2025-06-25 21:17:45,298] [ DEBUG] - fp16_full_eval : False
[2025-06-25 21:17:45,298] [ DEBUG] - fp16_opt_level : O2
[2025-06-25 21:17:45,298] [ DEBUG] - fuse_sequence_parallel_allreduce: False
[2025-06-25 21:17:45,298] [ DEBUG] - gradient_accumulation_steps : 4
[2025-06-25 21:17:45,298] [ DEBUG] - greater_is_better : True
[2025-06-25 21:17:45,298] [ DEBUG] - hybrid_parallel_topo_order : pp_first
[2025-06-25 21:17:45,298] [ DEBUG] - ignore_data_skip : False
[2025-06-25 21:17:45,298] [ DEBUG] - ignore_load_lr_and_optim : False
[2025-06-25 21:17:45,298] [ DEBUG] - ignore_save_lr_and_optim : True
[2025-06-25 21:17:45,298] [ DEBUG] - label_names : None
[2025-06-25 21:17:45,298] [ DEBUG] - lazy_data_processing : True
[2025-06-25 21:17:45,298] [ DEBUG] - learning_rate : 3e-05
[2025-06-25 21:17:45,298] [ DEBUG] - load_best_model_at_end : True
[2025-06-25 21:17:45,298] [ DEBUG] - load_sharded_model : False
[2025-06-25 21:17:45,298] [ DEBUG] - local_process_index : 0
[2025-06-25 21:17:45,298] [ DEBUG] - local_rank : -1
[2025-06-25 21:17:45,299] [ DEBUG] - log_level : -1
[2025-06-25 21:17:45,299] [ DEBUG] - log_level_replica : -1
[2025-06-25 21:17:45,299] [ DEBUG] - log_on_each_node : True
[2025-06-25 21:17:45,299] [ DEBUG] - logging_dir : /tmp/tmp2wz24yqo/runs/Jun25_21-17-40_jupyter-942478-8790893
[2025-06-25 21:17:45,299] [ DEBUG] - logging_first_step : False
[2025-06-25 21:17:45,299] [ DEBUG] - logging_steps : 1
[2025-06-25 21:17:45,299] [ DEBUG] - logging_strategy : IntervalStrategy.STEPS
[2025-06-25 21:17:45,299] [ DEBUG] - logical_process_index : 0
[2025-06-25 21:17:45,299] [ DEBUG] - lr_end : 1e-07
[2025-06-25 21:17:45,299] [ DEBUG] - lr_scheduler_type : SchedulerType.LINEAR
[2025-06-25 21:17:45,299] [ DEBUG] - max_evaluate_steps : -1
[2025-06-25 21:17:45,299] [ DEBUG] - max_grad_norm : 1.0
[2025-06-25 21:17:45,299] [ DEBUG] - max_seq_length : 2048
[2025-06-25 21:17:45,299] [ DEBUG] - max_steps : -1
[2025-06-25 21:17:45,299] [ DEBUG] - metric_for_best_model : accuracy
[2025-06-25 21:17:45,300] [ DEBUG] - metrics_output_path : None
[2025-06-25 21:17:45,300] [ DEBUG] - min_lr : 0.0
[2025-06-25 21:17:45,300] [ DEBUG] - minimum_eval_times : None
[2025-06-25 21:17:45,300] [ DEBUG] - model_init_kwargs : None
[2025-06-25 21:17:45,300] [ DEBUG] - no_cuda : False
[2025-06-25 21:17:45,300] [ DEBUG] - no_recompute_layers : None
[2025-06-25 21:17:45,300] [ DEBUG] - num_cycles : 0.5
[2025-06-25 21:17:45,300] [ DEBUG] - num_train_epochs : 3.0
[2025-06-25 21:17:45,300] [ DEBUG] - offload_optim : False
[2025-06-25 21:17:45,300] [ DEBUG] - offload_recompute_inputs : False
[2025-06-25 21:17:45,300] [ DEBUG] - optim : OptimizerNames.ADAMW_MINI
[2025-06-25 21:17:45,300] [ DEBUG] - optimizer_name_suffix : None
[2025-06-25 21:17:45,300] [ DEBUG] - ordered_save_group_size : 0
[2025-06-25 21:17:45,300] [ DEBUG] - output_dir : /tmp/tmp2wz24yqo
[2025-06-25 21:17:45,301] [ DEBUG] - output_signal_dir : /tmp/tmp2wz24yqo
[2025-06-25 21:17:45,301] [ DEBUG] - overwrite_output_dir : False
[2025-06-25 21:17:45,301] [ DEBUG] - pad_token_id : 0
[2025-06-25 21:17:45,301] [ DEBUG] - past_index : -1
[2025-06-25 21:17:45,301] [ DEBUG] - pdc_download_ckpt : False
[2025-06-25 21:17:45,301] [ DEBUG] - pdc_download_timeout : 300
[2025-06-25 21:17:45,301] [ DEBUG] - per_device_eval_batch_size : 8
[2025-06-25 21:17:45,301] [ DEBUG] - per_device_train_batch_size : 4
[2025-06-25 21:17:45,301] [ DEBUG] - pipeline_parallel_config :
[2025-06-25 21:17:45,301] [ DEBUG] - pipeline_parallel_degree : -1
[2025-06-25 21:17:45,301] [ DEBUG] - pipeline_parallel_rank : 0
[2025-06-25 21:17:45,301] [ DEBUG] - power : 1.0
[2025-06-25 21:17:45,301] [ DEBUG] - pp_recompute_interval : 1
[2025-06-25 21:17:45,302] [ DEBUG] - prediction_loss_only : False
[2025-06-25 21:17:45,302] [ DEBUG] - process_index : 0
[2025-06-25 21:17:45,302] [ DEBUG] - recompute : True
[2025-06-25 21:17:45,302] [ DEBUG] - recompute_granularity : full
[2025-06-25 21:17:45,302] [ DEBUG] - recompute_use_reentrant : False
[2025-06-25 21:17:45,302] [ DEBUG] - refined_recompute : {'mlp_row_ln': 0, 'attention_row_ln': 0, 'attention_column_ln': 0, 'mlp_column_ln': 0, 'flash_attn': -1}
[2025-06-25 21:17:45,302] [ DEBUG] - release_grads : False
[2025-06-25 21:17:45,302] [ DEBUG] - remove_unused_columns : True
[2025-06-25 21:17:45,302] [ DEBUG] - report_to : ['visualdl']
[2025-06-25 21:17:45,302] [ DEBUG] - resume_from_checkpoint : None
[2025-06-25 21:17:45,302] [ DEBUG] - run_name : /tmp/tmp2wz24yqo
[2025-06-25 21:17:45,302] [ DEBUG] - save_on_each_node : False
[2025-06-25 21:17:45,302] [ DEBUG] - save_rng_states : True
[2025-06-25 21:17:45,302] [ DEBUG] - save_sharded_model : False
[2025-06-25 21:17:45,302] [ DEBUG] - save_sharding_stage1_model_include_freeze_params: False
[2025-06-25 21:17:45,303] [ DEBUG] - save_steps : 500
[2025-06-25 21:17:45,303] [ DEBUG] - save_strategy : IntervalStrategy.EPOCH
[2025-06-25 21:17:45,303] [ DEBUG] - save_tokenizer : True
[2025-06-25 21:17:45,303] [ DEBUG] - save_total_limit : 1
[2025-06-25 21:17:45,303] [ DEBUG] - scale_loss : 32768
[2025-06-25 21:17:45,303] [ DEBUG] - seed : 42
[2025-06-25 21:17:45,303] [ DEBUG] - sep_parallel_degree : -1
[2025-06-25 21:17:45,303] [ DEBUG] - sequence_parallel : False
[2025-06-25 21:17:45,303] [ DEBUG] - sequence_parallel_config :
[2025-06-25 21:17:45,303] [ DEBUG] - sharding : []
[2025-06-25 21:17:45,304] [ DEBUG] - sharding_comm_buffer_size_MB : -1
[2025-06-25 21:17:45,304] [ DEBUG] - sharding_degree : -1
[2025-06-25 21:17:45,304] [ DEBUG] - sharding_parallel_config :
[2025-06-25 21:17:45,304] [ DEBUG] - sharding_parallel_degree : -1
[2025-06-25 21:17:45,304] [ DEBUG] - sharding_parallel_mesh_dimension: dp
[2025-06-25 21:17:45,304] [ DEBUG] - sharding_parallel_rank : 0
[2025-06-25 21:17:45,304] [ DEBUG] - should_load_dataset : True
[2025-06-25 21:17:45,304] [ DEBUG] - should_load_sharding_stage1_model: False
[2025-06-25 21:17:45,304] [ DEBUG] - should_log : True
[2025-06-25 21:17:45,304] [ DEBUG] - should_save : True
[2025-06-25 21:17:45,304] [ DEBUG] - should_save_model_state : True
[2025-06-25 21:17:45,305] [ DEBUG] - should_save_model_with_tensor_fusion: False
[2025-06-25 21:17:45,305] [ DEBUG] - should_save_sharding_stage1_model: False
[2025-06-25 21:17:45,305] [ DEBUG] - skip_data_intervals : None
[2025-06-25 21:17:45,305] [ DEBUG] - skip_memory_metrics : True
[2025-06-25 21:17:45,305] [ DEBUG] - skip_profile_timer : True
[2025-06-25 21:17:45,305] [ DEBUG] - split_inputs_sequence_dim : True
[2025-06-25 21:17:45,305] [ DEBUG] - split_norm_comm : False
[2025-06-25 21:17:45,305] [ DEBUG] - ssa_group_size_ratio : 0.25
[2025-06-25 21:17:45,305] [ DEBUG] - tensor_parallel_config :
[2025-06-25 21:17:45,305] [ DEBUG] - tensor_parallel_degree : -1
[2025-06-25 21:17:45,305] [ DEBUG] - tensor_parallel_output : False
[2025-06-25 21:17:45,306] [ DEBUG] - tensor_parallel_rank : 0
[2025-06-25 21:17:45,306] [ DEBUG] - tensorwise_offload_optimizer : False
[2025-06-25 21:17:45,306] [ DEBUG] - to_static : False
[2025-06-25 21:17:45,306] [ DEBUG] - train_batch_size : 4
[2025-06-25 21:17:45,306] [ DEBUG] - unified_checkpoint : False
[2025-06-25 21:17:45,306] [ DEBUG] - unified_checkpoint_config :
[2025-06-25 21:17:45,306] [ DEBUG] - use_async_save : False
[2025-06-25 21:17:45,306] [ DEBUG] - use_expert_parallel : False
[2025-06-25 21:17:45,306] [ DEBUG] - use_flash_attention : True
[2025-06-25 21:17:45,306] [ DEBUG] - use_fused_dropout_add : False
[2025-06-25 21:17:45,306] [ DEBUG] - use_fused_linear : False
[2025-06-25 21:17:45,307] [ DEBUG] - use_fused_linear_cross_entropy: False
[2025-06-25 21:17:45,307] [ DEBUG] - use_fused_rms_norm : False
[2025-06-25 21:17:45,307] [ DEBUG] - use_fused_rope : False
[2025-06-25 21:17:45,307] [ DEBUG] - use_hybrid_parallel : False
[2025-06-25 21:17:45,307] [ DEBUG] - use_lowprecision_moment : False
[2025-06-25 21:17:45,307] [ DEBUG] - use_ssa : False
[2025-06-25 21:17:45,307] [ DEBUG] - virtual_pp_degree : 1
[2025-06-25 21:17:45,307] [ DEBUG] - wandb_api_key : None
[2025-06-25 21:17:45,307] [ DEBUG] - wandb_http_proxy : None
[2025-06-25 21:17:45,307] [ DEBUG] - warmup_ratio : 0.0
[2025-06-25 21:17:45,307] [ DEBUG] - warmup_steps : 30
[2025-06-25 21:17:45,307] [ DEBUG] - weight_decay : 0.0
[2025-06-25 21:17:45,308] [ DEBUG] - weight_name_suffix : None
[2025-06-25 21:17:45,308] [ DEBUG] - world_size : 1
[2025-06-25 21:17:45,308] [ DEBUG] - zcc_ema_interval : 1
[2025-06-25 21:17:45,308] [ DEBUG] - zcc_pipeline_hooks_capacity_usage: 0.6
[2025-06-25 21:17:45,308] [ DEBUG] - zcc_save_ema_coef : None
[2025-06-25 21:17:45,308] [ DEBUG] - zcc_workers_num : 3
[2025-06-25 21:17:45,308] [ DEBUG] -
[2025-06-25 21:17:45,308] [ INFO] - Starting training from resume_from_checkpoint : None
[2025-06-25 21:17:45,309] [ WARNING] - Warning: `named_parameters` is None, AdamWMini will use `parameters` instead, which may be incorrect.
[2025-06-25 21:17:45,310] [ INFO] - [timelog] checkpoint loading time: 0.00s (2025-06-25 21:17:45)
[2025-06-25 21:17:45,310] [ INFO] - ***** Running training *****
[2025-06-25 21:17:45,310] [ INFO] - Num examples = 20
[2025-06-25 21:17:45,310] [ INFO] - Num Epochs = 3
[2025-06-25 21:17:45,310] [ INFO] - Instantaneous batch size per device = 4
[2025-06-25 21:17:45,310] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 16
[2025-06-25 21:17:45,310] [ INFO] - Gradient Accumulation steps = 4
[2025-06-25 21:17:45,310] [ INFO] - Total optimization steps = 3
[2025-06-25 21:17:45,310] [ INFO] - Total num train samples = 60
[2025-06-25 21:17:45,311] [ DEBUG] - Number of trainable parameters = 104,599,296 (per device)
W0625 21:17:46.049122 83454 multiply_fwd_func.cc:76] got different data type, run type promotion automatically, this may cause data type been changed.
W0625 21:17:46.073673 83454 backward.cc:462] While running Node (MatmulGradNode) raises an EnforceNotMet exception
=============================================================== short test summary info ===============================================================
FAILED tests/llm/test_adamw_mini.py::FinetuneTest_0_llama::test_finetune - OSError: (External) OSError: (External) Exception: Not supported to retrieve a tensor saved by autograd multiple times that is no need to recomput...
1 failed in 10.86s
Before submitting
Add test cases into the tests folder. If there are codecov issues, please add test cases first.
PR types
Bug fixes
PR changes
APIs
Description
Add a `parameters` argument to AdamWMini so that it is compatible with the optimizers in paddle. Local tests pass.
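For illustration only, a minimal sketch of the compatibility idea, consistent with the warning in the log above ("`named_parameters` is None, AdamWMini will use `parameters` instead, which may be incorrect"). The base class, the fallback logic, and the name-derivation step are assumptions for this sketch, not the actual PaddleNLP implementation:

    import warnings
    import paddle

    class AdamWMini(paddle.optimizer.AdamW):  # base class is an assumption for this sketch
        def __init__(self, named_parameters=None, parameters=None, **kwargs):
            if named_parameters is None:
                # Mirrors the trainer warning seen in the captured log above.
                warnings.warn(
                    "`named_parameters` is None, AdamWMini will use `parameters` "
                    "instead, which may be incorrect."
                )
                # Hypothetical fallback: derive names from the parameter tensors.
                named_parameters = {p.name: p for p in (parameters or [])}
            self.named_parameters = named_parameters
            # Delegate to the paddle-style `parameters` argument.
            super().__init__(parameters=list(named_parameters.values()), **kwargs)

    # Hypothetical usage, matching how paddle optimizers are normally constructed:
    # opt = AdamWMini(learning_rate=3e-5, parameters=model.parameters())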
Note: the test above requires modifying
paddlenlp/trainer/trainer_utils.py
to remove from ..transformers import get_gpt_pp_schedule, get_llama_pp_schedule
, otherwise an import error is raised. The import changes made in PR #10759
cause this import error, at least in my environment ... (see the snippet below)
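A hedged illustration of the temporary local edit described above (the surrounding code in trainer_utils.py is not reproduced here):

    # paddlenlp/trainer/trainer_utils.py -- temporary local workaround only:
    # comment out (or delete) this import until the issue from #10759 is resolved.
    # from ..transformers import get_gpt_pp_schedule, get_llama_pp_schedule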
Related to #10413
@DrownFish19 Please take a look, thanks!