Inference failed with custom finetuned model

### 
- Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00,  1.58s/it]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:08<00:00,  1.66s/it]
Traceback (most recent call last):
  File "/workspace/CogVideo/inference/cli_demo.py", line 181, in <module>
    generate_video(
  File "/workspace/CogVideo/inference/cli_demo.py", line 85, in generate_video
    pipe.fuse_lora(lora_scale=1 / lora_rank)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 2888, in fuse_lora
    super().fuse_lora(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_base.py", line 445, in fuse_lora
    raise ValueError(f"{fuse_component} is not found in {self._lora_loadable_modules=}.")
ValueError: text_encoder is not found in self._lora_loadable_modules=['transformer'].
E1112 07:57:11.357000 4751 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 4816) of binary: /usr/bin/python3.10
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 919, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
inference/cli_demo.py FAILED
------------------------------------------------------------
- Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
- Root Cause (first observed failure):
[0]:
  time      : 2024-11-12_07:57:11
  host      : sg17
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4816)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================


I ran 'sh finetune_single_rank.sh' on my custom dataset, which contains videos.txt, videos/, and prompts.txt, with MODEL_PATH="THUDM/CogVideoX-2b".

Next, I ran torchrun --nnodes=1 --nproc_per_node=1 --master_port=29506 inference/cli_demo.py with the following settings:

lora_path: '/workspace/CogVideo/finetune/cogvideox-lora-single-node-1/checkpoint-X000/'
lora_rank: 128

These settings match the finetune config.

Inference then failed with this error: ValueError: text_encoder is not found in self._lora_loadable_modules=['transformer'].
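
From the traceback, the failure seems to come from `pipe.fuse_lora(lora_scale=1 / lora_rank)` asking to fuse `text_encoder`, while the pipeline only lists `transformer` as LoRA-loadable. Below is a minimal sketch of a possible workaround, not a confirmed fix. It assumes that `fuse_lora` in this diffusers version accepts a `components` list and defaults to requesting both `transformer` and `text_encoder`. The helper name `select_fusable_components` is hypothetical.

```python
from typing import Iterable, List


def select_fusable_components(requested: Iterable[str], loadable: Iterable[str]) -> List[str]:
    """Keep only the requested components that the pipeline reports as LoRA-loadable."""
    loadable_set = set(loadable)
    return [name for name in requested if name in loadable_set]


# Assumption: the components fuse_lora requests by default in this diffusers version.
requested = ["transformer", "text_encoder"]
# Taken from the error message: self._lora_loadable_modules=['transformer']
loadable = ["transformer"]

components = select_fusable_components(requested, loadable)
print(components)  # ['transformer']

# Hypothetical use inside generate_video() in inference/cli_demo.py (not run here):
# pipe.fuse_lora(components=components, lora_scale=1 / lora_rank)
```

If the `components` argument is available, passing `components=["transformer"]` directly should have the same effect. The helper only avoids hard-coding the list.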

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Reproduction

[accelerate_config_machine_single.yaml]
num_processes: 2



### Expected behavior

.
