Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FreeHiC Bugs current #6

Open
Ruiqing-Wang opened this issue Jun 14, 2022 · 2 comments
Open

FreeHiC Bugs current #6

Ruiqing-Wang opened this issue Jun 14, 2022 · 2 comments

Comments

@Ruiqing-Wang
Copy link

I have encountered some bugs and failed to run them on several machines. The first problem is the dynamic library, but this is not the main problem. You can see the difficulties encountered in running demo during simulation.

projDir="/home/wrq/software/FreeHiC"
#fastqFile="ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1215nnn/GSM1215593/suppl/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2" #"${projDir}/data/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2"
fastqFile="${projDir}/demoData/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2"
ref="${projDir}/data/PlasmoDB-9.0_Pfalciparum3D7_Genome.fasta"
refrag="${projDir}/data/MboI_resfrag_plasmodium.bed"
simuName="results"
outDir="${projDir}/results"
summaryFile="${projDir}/results/summary/${simuName}_FreeHiC.summary"

bwa="bwa"
samtools="samtools"
bedtools="bedtools"

train=0
simulate=1
postProcess=1
coreN=30
mismatchN=3
gapN=1
mismatchP=""
gapP=""
chimericP=""
simuN=2400000
readLen=40
resolution=10000
lowerBound=$((resolution*2))
refragU=500
ligateSite="GATCGATC"
##################

*****************************************

II. Simulating interaction read sequences

*****************************************

II. Simulating!
RE fragment = /home/wrq/software/FreeHiC/data/MboI_resfrag_plasmodium.bed
Fragment interaction file = /home/wrq/software/FreeHiC/results/rawDataTraining/s2_validPairs/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2.validPairs.fragFreq
Output directory = /home/wrq/software/FreeHiC/results/results/simuSequence
Output file name = GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2
Total number of interactions to simulate = 2400000
Read length to simulate = 40
Maximum distance away from restriction site = 500
Path to bedtools = bedtools
Path to reference genome fasta file = /home/wrq/software/FreeHiC/data/PlasmoDB-9.0_Pfalciparum3D7_Genome.fasta
Path to summary file = /home/wrq/software/FreeHiC/results/summary/results_FreeHiC.summary
Verbose = True
Read in restriction fragment dictionary!
Read in interaction frequency!
Convert dictionary and list into C structure!
Sample fragment interaction!
Read in trained parameters!
10129112
777700
51566
28458
Simulate end 1!
0
500000
1000000
1500000
2000000
Get fasta from the reference genome!
Add mutation!
Traceback (most recent call last):
File "scripts/freeHiC_cython.pyx", line 450, in freeHiC_cython.simulation
rid = int(ridTmp)
ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943(-)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "scripts/freeHiC_cython.pyx", line 452, in freeHiC_cython.simulation
rid = int(ridTmp[0:-3])
ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943'
Exception ignored in: 'freeHiC_cython.simulation'
Traceback (most recent call last):
File "scripts/freeHiC_cython.pyx", line 452, in freeHiC_cython.simulation
rid = int(ridTmp[0:-3])
ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943'
II. Simulation Post-processing: 1 - Alignment!
(We assume the reads length are the same within the same fastq file.)......
Start alignment of read end 1
Step1.1 - BWA alignment
[main] Version: 0.7.17-r1188
[main] CMD: bwa samse /home/wrq/software/FreeHiC/data/PlasmoDB-9.0_Pfalciparum3D7_Genome.fasta /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_1.sai /home/wrq/software/FreeHiC/results/results/simuSequence/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_1.fastq
[main] Real time: 13.558 sec; CPU: 12.875 sec
Choose not to rescue chimeric reads or the chimeric reads have been trimmed before alignment!
Start alignment of read end 2
cat: /home/wrq/software/FreeHiC/results/results/simuSequence/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_2.fastq: No such file or directory
Step1.1 - BWA alignment
[bwa_seq_open] fail to open file '/home/wrq/software/FreeHiC/results/results/simuSequence/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_2.fastq' : No such file or directory
[fread] Unexpected end of file
Choose not to rescue chimeric reads or the chimeric reads have been trimmed before alignment!
[bam_sort_core] merging from 0 files and 30 in-memory blocks...
II. Simulation Post-processing: 2 - Joining read-ends!
Read 1 file = /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_1.sam
Read 2 file = /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_2.sam
Output file = /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2.sam
Summary of alignment results = True
Summary file path = /home/wrq/software/FreeHiC/results/summary/results_FreeHiC.summary
Training = 0
Verbose = True
------------Begin joining ends from read 1 file and read 2 file------------
Traceback (most recent call last):
File "/home/wrq/software/FreeHiC/scripts/joinEnds.py", line 165, in
with pysam.Samfile(args.readEnd1, "rb") as readEnd1, pysam.Samfile(args.readEnd2, "rb") as readEnd2:
File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.cinit
File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
II. Simulation Post-processing: 3 - Categorize read-pairs!
Paired-ended read alignmeng file does not exist!
II. Simulation Post-processing: 4 - Binning!
awk: fatal: cannot open file `/home/wrq/software/FreeHiC/results/results/simuProcess/s2_validPairs/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2.validPairs' for reading (No such file or directory)

@yezhengSTAT
Copy link
Owner

Hello Ruiqing,
Apologize for the delayed rely! I think the issue of
Traceback (most recent call last): File "scripts/freeHiC_cython.pyx", line 450, in freeHiC_cython.simulation rid = int(ridTmp) ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943(-)'
is due to line 430 in freeHiC_cython.pyx:

call(bedtools + ' getfasta -fi ' + genome + ' -bed ' + outdir + '/' + fileName + '.readPosMut1 -name -s -tab -fo ' + outdir + "/" + fileName + '.readSeqMut1', shell = True)

May I know which version of bedtools did you use? bedtools seems to updated its getfasta functions several times. Therefore, we recommended using an older version (<=2.25.0) for stability.

Hope it can address your problem.

Best,
Ye

@Ruiqing-Wang
Copy link
Author

When I tried to modify code, we found more error. Thanks, It could work now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants