callVariant couldn't finish because of a large alt splice deletion #600

Closed · lydiayliu opened this issue Oct 27, 2022 · 3 comments · Fixed by #601
Labels: priority: now (Issue to be fixed immediately)

Comments

@lydiayliu (Collaborator) commented Oct 27, 2022

Error executing process > 'call_caller_wrapper (1)'

Caused by:
  Process `call_caller_wrapper (1)` terminated with an error exit status (1)

Command executed:

  set -euo pipefail
  
  NXF_WORK=/scratch \
  nextflow run /hot/users/yiyangliu/project-MissingPeptides-Method/pipelines/pipeline-meta-call-NonCanonicalPeptide/modules/call_NonCanonicalPeptide/main.nf \
      --input_csv input_CPCG0100-CPCG0183-CPCG0196.csv \
      --exprs_table_csv exprs_table_CPCG0100-CPCG0183-CPCG0196.csv \
      --variant_fasta_csv variant_fasta_CPCG0100-CPCG0183-CPCG0196.csv \
      --output_dir pipeline-meta-call-NonCanonicalPeptide-0.0.1 \
      --entrypoint gvf \
      --call_variant_ncpus 22 \
      --call_variant_memory_GB '45 GB' \
      --config_json config.json

Command exit status:
  1

Command output:
      (wrapper_remote pid=3128) [ 2022-10-27 19:08:56 ] [ !!! moPepGen WARNING !!! ] Hypermutated region detected from graph: 'ENST00000649622.1'. The argument max_variants_per_node = 7 and additional_variants_per_misc = 2 was used to reduce complexity.
      (wrapper_remote pid=3140) [ 2022-10-27 20:10:05 ] [ !!! moPepGen WARNING !!! ] Hypermutated region detected from graph: 'ENST00000649622.1'. The argument max_variants_per_node = 7 and additional_variants_per_misc = 2 was used to reduce complexity.
    Command error:
      Unable to find image 'ghcr.io/uclahs-cds/mopepgen:dev' locally
      dev: Pulling from uclahs-cds/mopepgen
      fb0b3276a519: Pulling fs layer
      ccf3a013c265: Pulling fs layer
      ccf3a013c265: Verifying Checksum
      ccf3a013c265: Download complete
      fb0b3276a519: Download complete
      fb0b3276a519: Pull complete
      ccf3a013c265: Pull complete
      Digest: sha256:939a21111036dcf1e6c35043e7638c2e58bfd5958a7470c6c5f2a2b80ef143d0
      Status: Downloaded newer image for ghcr.io/uclahs-cds/mopepgen:dev
      2022-10-27 16:45:46,221   INFO worker.py:1518 -- Started a local Ray instance.
      2022-10-27 17:54:15,677   WARNING worker.py:1829 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: a39744f00e82ea64e7c91e692149859d7d09d23901000000 Worker ID: b0fa5222445c183bc4a4a9945fe9bf6fde1e9e6899ade324cb192a48 Node ID: bb4cd40237bd4359d3de38ef699a1a63a01bf4b1b2f8fa588182b4a7 Worker IP address: 172.17.0.3 Worker port: 36948 Worker PID: 3130 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
      2022-10-27 19:08:55,430   WARNING worker.py:1829 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: 289dc5b6bbe00a0913fbd119925528105a63e29c01000000 Worker ID: b318dd6582eb65bd6e9135046ec257179965aec758efae35f191dfd5 Node ID: bb4cd40237bd4359d3de38ef699a1a63a01bf4b1b2f8fa588182b4a7 Worker IP address: 172.17.0.3 Worker port: 35535 Worker PID: 3145 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
      2022-10-27 20:10:04,567   WARNING worker.py:1829 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: 1e64a5fed5434436f919bfd4bad5aae0c2a712c501000000 Worker ID: f6654cfb7a5ff0d12c2db194295324e030a9dbea4da4ab92b40bb0a9 Node ID: bb4cd40237bd4359d3de38ef699a1a63a01bf4b1b2f8fa588182b4a7 Worker IP address: 172.17.0.3 Worker port: 43606 Worker PID: 3128 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
      2022-10-27 21:31:19,639   WARNING worker.py:1829 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: 68bce816d97f927480f7d1d60e82feb886a33c2701000000 Worker ID: 7fd4d7ba9fcf74c6678c693654d42ab308942813233da7a39e19e3be Node ID: bb4cd40237bd4359d3de38ef699a1a63a01bf4b1b2f8fa588182b4a7 Worker IP address: 172.17.0.3 Worker port: 45694 Worker PID: 3140 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
      Traceback (most recent call last):
        File "/usr/local/bin/moPepGen", line 8, in <module>
          sys.exit(main())
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 89, in main
          args.func(args)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 351, in call_variant_peptide
          results = ray.get([wrapper_remote.remote(d) for d in dispatches])
        File "/usr/local/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
          return func(*args, **kwargs)
        File "/usr/local/lib/python3.8/site-packages/ray/_private/worker.py", line 2282, in get
          raise value
      ray.exceptions.WorkerCrashedError: The worker died unexpectedly while executing this task. Check python-core-worker-*.log files for more information.
    
    Work dir:
      work/49/ca33f61b5f05b327ad116c54dd8385
    
    Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Not sure what this is, I just resubmitted for now

lydiayliu added the priority: now label on Oct 27, 2022
@zhuchcn (Member) commented Oct 28, 2022

What's the working directory?

@lydiayliu (Collaborator, Author) commented

It happened again, so this might be a real problem. This time the work dir is:
Work dir: /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/CPCGENE/processed/noncanonical-database/call-nonCanonicalPeptide/work/e5/68f9b907df2fb630d7713cf0caa137

zhuchcn self-assigned this on Oct 31, 2022
@zhuchcn (Member) commented Nov 1, 2022

This is due to a large alt splice deletion. In 8d0ebef we changed the behavior so that alt splice deletion nodes are no longer treated as a subgraph, which means all variants inside this huge bubble have to be aligned together, causing extreme complexity. We will have to go back to treating it as a subgraph.
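
For intuition only, here is a back-of-the-envelope sketch of the blow-up. This is not moPepGen's graph code; the function names and the simplifying assumption that every variant subset inside the bubble is a separate path to align are hypothetical. Flattening the bubble grows the work roughly as 2^n in the number of variants it contains, whereas a cap such as the max_variants_per_node = 7 reported in the hypermutation warning above keeps the count bounded:

# Toy estimate only -- NOT moPepGen code. Hypothetical simplification:
# every subset of the variants inside the alt splice deletion bubble is
# treated as one candidate path that must be aligned.
from math import comb

def flat_bubble_paths(n_variants: int) -> int:
    """Bubble flattened into the main graph: all 2^n variant subsets."""
    return 2 ** n_variants

def capped_bubble_paths(n_variants: int, max_variants_per_node: int = 7) -> int:
    """With a cap like max_variants_per_node (value from the warning in
    the log above), only subsets of size <= cap are enumerated."""
    cap = min(n_variants, max_variants_per_node)
    return sum(comb(n_variants, k) for k in range(cap + 1))

if __name__ == "__main__":
    n = 40  # hypothetical variant count inside the large deletion bubble
    print(f"flat graph: {flat_bubble_paths(n):,} paths")    # ~1.1e12
    print(f"capped    : {capped_bubble_paths(n):,} paths")  # ~2.3e7

This is only meant to illustrate why restoring the subgraph treatment, as proposed above, avoids the explosion: the variants confined to the subgraph no longer have to be interleaved with everything else in the bubble.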

zhuchcn changed the title from "cpcg unexpected worker death" to "callVariant couldn't finish because of a large alt splice deletion" on Nov 1, 2022