Error occurs when running train.py #8

Open
Celina-521 opened this issue Oct 26, 2023 · 1 comment

Comments

Celina-521 commented Oct 26, 2023

When I run train.py, I get the following error:
UserWarning: DataLoader returned 0 length. Please make sure this was your intention.
rank_zero_warn(
/data1/fyy/anaconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/utilities/data.py:110: UserWarning: Total length of CombinedLoader across ranks is zero. Please make sure this was your intention.
rank_zero_warn( rank_zero_warn(
Training: 0it [00:00, ?it/s]Trainer.fit stopped: No training batches.
Training: 0it [00:00, ?it/s]

This happens because the following files are missing:
[Errno 2] No such file or directory: './data/interhuman_processed/ignore_list.txt'
[Errno 2] No such file or directory: './data/interhuman_processed/ignore_list.txt'
[Errno 2] No such file or directory: './data/interhuman_processed/ignore_list.txt'
[Errno 2] No such file or directory: './data/interhuman_processed/train.txt'
[Errno 2] No such file or directory: './data/interhuman_processed/train.txt'
[Errno 2] No such file or directory: './data/interhuman_processed/train.txt'

How can I get the interhuman_processed data?
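
A quick pre-flight check can confirm which files are missing before launching training. This is a minimal sketch, assuming the data root and file names that appear in the errors above (`./data/interhuman_processed`, `train.txt`, `ignore_list.txt`); the helper name `missing_dataset_files` is hypothetical and not part of InterGen, and the real dataset may need more files than these two.

```python
from pathlib import Path

# File names taken from the error messages above (assumption: the real
# dataset may require additional files that are not listed here).
REQUIRED_FILES = ("train.txt", "ignore_list.txt")


def missing_dataset_files(data_root, required=REQUIRED_FILES):
    """Return the paths of required files that do not exist under data_root."""
    root = Path(data_root)
    return [str(root / name) for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root before tools/train.py.
    missing = missing_dataset_files("./data/interhuman_processed")
    if missing:
        print("Missing dataset files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files are present.")
```

If the check reports missing files, the zero-length DataLoader warning is expected, since no training samples can be listed.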

@bring-nirachornkul

I get the same errors when running python tools/train.py.

Here is the error:

(intergen) blinkdrive@blinkdrive-System-Product-Name:~/Documents/Projects/Summer2024/InterGen$ python tools/train.py
/home/blinkdrive/Documents/Projects/Summer2024/InterGen
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

[Errno 2] No such file or directory: './data/interhuman_processed/ignore_list.txt'
[Errno 2] No such file or directory: './data/interhuman_processed/train.txt'
total dataset:  0
/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:613: UserWarning: Checkpoint directory ./checkpoints/IG-S-8/model exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Adjusting learning rate of group 0 to 1.0000e-05.

  | Name  | Type     | Params
-----------------------------------
0 | model | InterGen | 305 M 
-----------------------------------
182 M     Trainable params
123 M     Non-trainable params
305 M     Total params
1,221.769 Total estimated model params size (MB)
Traceback (most recent call last):
  File "tools/train.py", line 162, in <module>
    trainer.fit(model=litmodel, datamodule=datamodule)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 88, in launch
    return function(*args, **kwargs)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/loops/loop.py", line 194, in run
    self.on_run_start(*args, **kwargs)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/loops/fit_loop.py", line 206, in on_run_start
    self.trainer.reset_train_dataloader(self.trainer.lightning_module)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1529, in reset_train_dataloader
    self.train_dataloader = self._data_connector._request_dataloader(RunningStage.TRAINING)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/data_connector.py", line 446, in _request_dataloader
    dataloader = source.dataloader()
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/data_connector.py", line 524, in dataloader
    return method()
  File "/home/blinkdrive/Documents/Projects/Summer2024/InterGen/tools/../datasets/__init__.py", line 59, in train_dataloader
    return torch.utils.data.DataLoader(
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/lightning/fabric/utilities/data.py", line 323, in wrapper
    init(obj, *args, **kwargs)
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 344, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/home/blinkdrive/miniconda3/envs/intergen/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
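
The final ValueError is raised by RandomSampler, which rejects a dataset of length 0; here the dataset is empty because the split files could not be read (see `total dataset:  0` above). As a sketch only, a guard like the one below could fail earlier with a clearer message. `require_non_empty` is a hypothetical helper, not part of InterGen or PyTorch, and it does not fix the missing data.

```python
def require_non_empty(dataset, data_root):
    """Return dataset unchanged, or raise a descriptive error if it has no samples."""
    if len(dataset) == 0:
        raise RuntimeError(
            f"Dataset is empty; check that the processed data exists under {data_root!r}."
        )
    return dataset
```

It would be called on the dataset just before it is passed to torch.utils.data.DataLoader in train_dataloader.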

This issue has not been solved yet.
