Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\paddlenlp\datasets\dataset.py", line 202, in load_dataset
reader_cls = import_main_class(path_or_read_func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\paddlenlp\datasets\dataset.py", line 99, in import_main_class
module = importlib.import_module(module_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\wind\AppData\Local\Programs\Python\Python311\Lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'paddlenlp.datasets.json'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\projects\py-paddle-uie\PaddleNLP-3.0.0-beta4\llm\run_finetune.py", line 717, in <module>
main()
File "C:\projects\py-paddle-uie\PaddleNLP-3.0.0-beta4\llm\run_finetune.py", line 295, in main
train_ds, dev_ds, test_ds = create_dataset(data_args, training_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\py-paddle-uie\PaddleNLP-3.0.0-beta4\llm\run_finetune.py", line 660, in create_dataset
train_ds = load_dataset(
^^^^^^^^^^^^^
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\paddlenlp\datasets\dataset.py", line 204, in load_dataset
datasets = load_from_hf(
^^^^^^^^^^^^^
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\paddlenlp\datasets\dataset.py", line 123, in load_from_hf
hf_datasets = load_hf_dataset(path, name=name, split=splits, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\paddlenlp\datasets\dataset.py", line 59, in load_from_ppnlp
return origin_load_dataset(path, trust_remote_code=True, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\datasets\load.py", line 2151, in load_dataset
builder_instance.download_and_prepare(
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\datasets\builder.py", line 924, in download_and_prepare
self._download_and_prepare(
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\datasets\builder.py", line 1000, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\datasets\builder.py", line 1741, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\projects\py-paddle-uie\venv\Lib\site-packages\datasets\builder.py", line 1897, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
LAUNCH INFO 2025-05-06 17:22:58,771 Pod failed
LAUNCH ERROR 2025-05-06 17:22:58,772 Container failed !!!
Container rank 0 status failed cmd ['C:\\projects\\py-paddle-uie\\venv\\Scripts\\python.exe', '-u', 'run_finetune.py', './config/qwen/sft_argument_test.json'] code 1 log log\workerlog.0
LAUNCH INFO 2025-05-06 17:22:58,772 ------------------------- ERROR LOG DETAIL -------------------------
ages\paddlenlp\datasets\dataset.py", line 99, in import_main_class module = importlib.import_module(module_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
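Software environment
Duplicate issues
Error description
Documentation followed: https://paddlenlp.readthedocs.io/zh/latest/llm/application/information_extraction/README.html After annotating data with doccano and generating the training data with doccano.py, running the llm/run_finetune.py script fails with an error saying the module 'paddlenlp.datasets.json' cannot be found. In the finetune.py file of the earlier UIE version, load_dataset was passed a read function, which is defined in utils.py under the UIE directory. The current UIE documentation, however, recommends llm/run_finetune.py for fine-tuning large models, and there an error is raised at load time because the module 'paddlenlp.datasets.json' cannot be found. I am not sure how to get load_dataset to load the json module.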
Steps to reproduce & code
Reproduction
Only a few lines were annotated in doccano, which produced the files train.json, dev.json, sample_index.json, and test.json.
Training command
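Since the log shows "Generating train split: 0 examples" immediately before the DatasetGenerationError, one possible cause is that the generated files cannot be parsed in the expected layout. The following is a minimal sketch for checking that, assuming the files are JSON Lines (one JSON object per line). That layout is an assumption, not something confirmed in this issue, and `find_bad_lines` and its `required_keys` parameter are hypothetical names introduced only for illustration.

```python
import json
from pathlib import Path


def find_bad_lines(path, required_keys=()):
    """Return (line_number, reason) pairs for lines that are not usable records.

    Assumes a JSON Lines layout: one JSON object per line.
    `required_keys` is an optional, hypothetical list of field names to check.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((lineno, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            missing = [key for key in required_keys if key not in record]
            if missing:
                problems.append((lineno, f"missing keys: {missing}"))
    return problems


if __name__ == "__main__":
    # Check the generated files if they exist in the current directory.
    for name in ("train.json", "dev.json"):
        if Path(name).exists():
            for lineno, reason in find_bad_lines(name):
                print(f"{name}:{lineno}: {reason}")
```

An empty result only means every line parsed as a JSON object; it does not rule out other causes of the error.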
python -u -m paddle.distributed.launch --gpus "0" run_finetune.py ./config/qwen/sft_argument_test.json
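sft_argument_test.json configuration
Main error output
Troubleshooting
In run_finetune.py, the create_dataset function that builds the datasets calls load_dataset with "json" as its first argument if train.json or dev.json exists under the given path. The relevant code is as follows: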
By default, the json module is loaded as paddlenlp.datasets.json, but I could not find a corresponding implementation: the repository has no json.py file, and a global search did not find the string "json" in __all__.
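The traceback suggests that the failed import is not the end of the story: dataset.py line 202 attempts the import, and line 204 then calls load_from_hf, which is where the DatasetGenerationError is eventually raised. The sketch below illustrates that general try-import-then-fall-back pattern. It is not PaddleNLP's actual code, and `load_reader_or_fallback` and its parameters are hypothetical names.

```python
import importlib


def load_reader_or_fallback(name, package, fallback):
    """Try to import `package.name`; call `fallback(name)` if no such module exists.

    Illustrative only: mirrors the order of calls seen in the traceback,
    not the real implementation in paddlenlp/datasets/dataset.py.
    """
    module_path = f"{package}.{name}"
    try:
        return importlib.import_module(module_path)
    except ModuleNotFoundError:
        # No built-in reader module with this name, so hand over to the fallback loader.
        return fallback(name)


# A missing module triggers the fallback instead of aborting.
result = load_reader_or_fallback("json", "no_such_package_xyz", lambda n: ("fallback", n))
print(result)
```

If the real code follows this pattern, the ModuleNotFoundError for 'paddlenlp.datasets.json' would be an expected intermediate step, and the error worth investigating would be the later DatasetGenerationError.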