Stuck at Recombination hit list... #3

Closed
GRGong opened this issue Apr 28, 2023 · 3 comments

Comments


GRGong commented Apr 28, 2023

Thanks for the magic tool! However, I ran into an issue when using it on some genome assemblies. The tool gets stuck at the "Recombination hit list..." stage. Could you please help me figure it out? Here is an example log:

Internal pipeline statistics summary:

Query model(s): 1 (933 nodes)
Target sequences: 10042 (1643500208 residues searched)
Residues passing SSV filter: 230121808 (0.14); expected (0.02)
Residues passing bias filter: 137319152 (0.0836); expected (0.02)
Residues passing Vit filter: 23211691 (0.0141); expected (0.003)
Residues passing Fwd filter: 1005657 (0.000612); expected (3e-05)
Total number of hits: 120 (5.89e-05)

CPU time: 284.11u 0.93s 00:04:45.04 Elapsed: 00:00:03.61

Mc/sec: 424531.06

//
[ok]
284.17user 1.04system 0:03.78elapsed 7527%CPU (0avgtext+0avgdata 1324568maxresident)k
0inputs+64outputs (0major+424205minor)pagefaults 0swaps
2023-04-28 17:20:24,357 Functions.py:INFO run_nhmmer use 3.792 seconds

NHMMER program is complete.

2023-04-28 17:20:24,357 nhmmer.py:INFO ###Program nhmmer.py finish###
2023-04-28 17:20:24,407 Functions.py:INFO ###FindOR program starts running###
2023-04-28 17:20:24,408 Functions.py:INFO function platform_info() is running
The system you use is Linux.
Python3 is used.
2023-04-28 17:20:24,411 Functions.py:INFO platform_info use 0.003 seconds
Process nhmmer output file...
2023-04-28 17:20:24,411 Functions.py:INFO function proc_nhmmer_out() is running
2023-04-28 17:20:24,411 Functions.py:INFO 2 truncated gene(s) was discovered
2023-04-28 17:20:24,412 Functions.py:INFO proc_nhmmer_out use 0.001 seconds
Extract cds from genomic file...
2023-04-28 17:20:24,412 Functions.py:INFO function extract_cds() is running
2023-04-28 17:20:26,673 Functions.py:INFO extract_cds use 2.262 seconds
Find ATG and STOP codons for each sequence...
2023-04-28 17:20:26,673 Functions.py:INFO function find_cds() is running
2023-04-28 17:20:26,779 Functions.py:INFO find_cds use 0.105 seconds
Write data to file...
2023-04-28 17:20:26,786 Functions.py:INFO Merge pseudogene fragement
2023-04-28 17:20:26,787 Functions.py:INFO ###The result as follows###
2023-04-28 17:20:26,787 Functions.py:INFO Clarias_batrachus processing completed
2023-04-28 17:20:26,787 Functions.py:INFO 118 OR fragments found by nhmmer.
2023-04-28 17:20:26,787 Functions.py:INFO 91 potential functional ORs were discover.
2023-04-28 17:20:26,787 Functions.py:INFO -48 pseudogenes fragment were merged
2023-04-28 17:20:26,787 Functions.py:INFO 66 potential pseudogene ORs were discover.
2023-04-28 17:20:26,787 Functions.py:INFO 2 pseudogenes cause by too short sequence length.
2023-04-28 17:20:26,787 Functions.py:INFO 1 pseudogenes cause by insert or delect base.
2023-04-28 17:20:26,787 Functions.py:INFO 63 pseudogenes cause by contains termination codons.
2023-04-28 17:20:26,787 Functions.py:INFO ###The program finish###
2023-04-28 17:20:26,787 FindOR.py:INFO ###Program FindOR.py finish###
2023-04-28 17:20:26,831 Functions.py:INFO ###IdentityFunc program starts running###
2023-04-28 17:20:26,831 Functions.py:INFO function platform_info() is running
The system you use is Linux.
Python3 is used.
2023-04-28 17:20:26,835 Functions.py:INFO platform_info use 0.003 seconds
Process hit sequence file...
2023-04-28 17:20:26,835 Functions.py:INFO function refact_hitfile() is running
2023-04-28 17:20:26,835 Functions.py:INFO refact_hitfile use 0.000 seconds
Recombination hit list...
2023-04-28 17:20:26,835 Functions.py:INFO function refact_list() is running

@ToHanwei
Owner

Thank you for your feedback.
Based on your log file, I have located the function that may be causing the problem.
https://github.com/ToHanwei/Genome2OR/blob/639aff37aa7c8a8cc04f5f6ce0ea4590d7346530/scripts/src/Functions.py#L761
This step merges the hit sequence with the template sequence and performs a multiple sequence alignment. To speed it up, I use the multiprocessing module for pseudo-parallel processing. In your case, a process may have been blocked for some unknown reason.
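
A pool-based step like this can appear to hang when results are collected without a time limit, because one stalled worker blocks the caller indefinitely. The following is a minimal sketch of one way to make such a stall visible; it is not the code in Functions.py and not necessarily the fix applied later. The names `align_hit` and `run_hits`, the `timeout` and `pool_factory` parameters, and the placeholder "alignment" are all hypothetical.

```python
import multiprocessing


def align_hit(hit, templates):
    """Hypothetical stand-in for the per-hit work.

    The real step merges a hit with template sequences and aligns them;
    this placeholder only groups the sequences and runs no aligner.
    """
    return [hit] + list(templates)


def run_hits(hits, templates, worker=align_hit, timeout=60.0,
             processes=2, pool_factory=multiprocessing.Pool):
    """Run worker(hit, templates) for every hit in a worker pool.

    Returns (results, failed). `results` maps the hit index to the
    worker's return value. `failed` lists the indices whose result did
    not arrive within `timeout` seconds of being waited on.
    """
    results, failed = {}, []
    with pool_factory(processes) as pool:
        # Submit everything first so the workers run concurrently.
        pending = [pool.apply_async(worker, (hit, templates)) for hit in hits]
        for index, job in enumerate(pending):
            try:
                # Bounded wait: a stalled worker raises instead of blocking.
                results[index] = job.get(timeout=timeout)
            except multiprocessing.TimeoutError:
                failed.append(index)
    # Leaving the `with` block terminates the pool, stalled workers included.
    return results, failed
```

With this shape, a call such as `run_hits(hits, templates, timeout=600)` returns the indices of hits that never finished, which can then be logged or retried instead of leaving the program silent at "Recombination hit list...".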

@ToHanwei
Owner

I have solved this problem in the new version.

Author

GRGong commented May 3, 2023

Thank you very much! It did work.

GRGong closed this as completed May 3, 2023