scaff_bwa segmentation fault #18

Open
shannonekj opened this issue Jul 3, 2020 · 3 comments

@shannonekj
shannonekj commented Jul 3, 2020

Hi there,

I am encountering a segmentation fault when running Scaff10X.

[main] Version: 0.7.17-r1188
[main] CMD: /home/sejoslin/miniconda3/envs/scaffold_10x/bin/Scaff10X/scaff-bin/bwa mem -t 16 tarseq.fastq /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/00-raw_data/data-10X_M/Male2_S63_L004_R1_001.fastq.gz /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/00-raw_data/data-10X_M/Male2_S63_L004_R2_001.fastq.gz
[main] Real time: 140745.891 sec; CPU: 2245179.580 sec
Segmentation fault
Error running command: /home/sejoslin/miniconda3/envs/scaffold_10x/bin/Scaff10X/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out

Here is the stderr file associated with the error:
test_assembly.j23957420.run_scaff10x.hi.err.txt

Here is the stdout file associated with the job:
test_assembly.j23957420.run_scaff10x.hi.out.txt

The tmp_rununik_10443 directory contains the following files:

-rw-rw-r-- 1 sejoslin millermrgrp    0 Jul  3 06:46 align2.dat
-rw-rw-r-- 1 sejoslin millermrgrp  23G Jul  3 06:46 align.dat
-rw-rw-r-- 1 sejoslin millermrgrp 930M Jul  1 15:31 tarseq.fastq
-rw-rw-r-- 1 sejoslin millermrgrp  257 Jul  1 15:38 tarseq.fastq.amb
-rw-rw-r-- 1 sejoslin millermrgrp 201K Jul  1 15:38 tarseq.fastq.ann
-rw-rw-r-- 1 sejoslin millermrgrp 465M Jul  1 15:38 tarseq.fastq.bwt
-rw-rw-r-- 1 sejoslin millermrgrp 117M Jul  1 15:38 tarseq.fastq.pac
-rw-rw-r-- 1 sejoslin millermrgrp 233M Jul  1 15:40 tarseq.fastq.sa
-rw-rw-r-- 1 sejoslin millermrgrp 187K Jul  1 15:31 tarseq.tag
-rw-rw-r-- 1 sejoslin millermrgrp    0 Jul  1 15:31 try.out
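
A quick way to spot the empty outputs in a listing like this is to ask `find` for zero-byte files. The following is a minimal sketch; the function name `list_empty_files` is made up for illustration and is not part of Scaff10X.

```shell
# List zero-byte regular files directly inside a directory (no recursion).
# Usage: list_empty_files DIR
# Note: list_empty_files is a hypothetical helper, not a Scaff10X tool.
list_empty_files() {
    find "$1" -maxdepth 1 -type f -size 0 | sort
}
```

Called as `list_empty_files tmp_rununik_10443`, this would be expected to report `align2.dat` and `try.out` for the listing above, which points at `scaff_bwa` failing before it wrote any output.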

My align.dat file looks like this:

(base) sejoslin@farm:tmp_rununik_10443$ head align.dat
A00351:291:HVMC5DSXX:4:1101:1307:1000 69 tarseq_648 132029 0
A00351:291:HVMC5DSXX:4:1101:1398:1000 99 tarseq_462 123666 60
A00351:291:HVMC5DSXX:4:1101:1524:1000 81 tarseq_236 424566 0
A00351:291:HVMC5DSXX:4:1101:1597:1000 97 tarseq_874 104495 0
A00351:291:HVMC5DSXX:4:1101:1687:1000 99 tarseq_16 418743 60
A00351:291:HVMC5DSXX:4:1101:1940:1000 97 tarseq_2149 50794 0
A00351:291:HVMC5DSXX:4:1101:2284:1000 99 tarseq_338 173397 60
A00351:291:HVMC5DSXX:4:1101:2302:1000 83 tarseq_302 99246 60
A00351:291:HVMC5DSXX:4:1101:2483:1000 83 tarseq_5085 17812 60
A00351:291:HVMC5DSXX:4:1101:2591:1000 99 tarseq_102 116777 41
(base) sejoslin@farm:tmp_rununik_10443$ tail align.dat
A00351:291:HVMC5DSXX:4:2678:13132:36949 99 tarseq_1383 26746 60
A00351:291:HVMC5DSXX:4:2678:13277:36949 65 tarseq_1446 66935 60
A00351:291:HVMC5DSXX:4:2678:13313:36949 99 tarseq_569 12813 60
A00351:291:HVMC5DSXX:4:2678:13639:36949 83 tarseq_1401 40168 27
A00351:291:HVMC5DSXX:4:2678:13747:36949 99 tarseq_84 92647 60
A00351:291:HVMC5DSXX:4:2678:14091:36949 99 tarseq_445 203003 60
A00351:291:HVMC5DSXX:4:2678:14561:36949 99 tarseq_512 71920 11
A00351:291:HVMC5DSXX:4:2678:14597:36949 83 tarseq_411 18587 60
A00351:291:HVMC5DSXX:4:2678:14868:36949 97 tarseq_2112 1850 10
A00351:291:HVMC5DSXX:4:2678:15157:36949 83 tarseq_613 158418 60
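
Since `scaff_bwa` reads `align.dat`, one thing that may be worth ruling out is a malformed or truncated record somewhere in the middle of the file. The sketch below assumes every record has the five whitespace-separated columns seen in the excerpt (read name, flag, contig, position, mapping quality); that layout is inferred from the `head`/`tail` output, not from Scaff10X documentation, and `check_fields` is a hypothetical helper name.

```shell
# Report lines that do not have the expected number of whitespace-separated fields.
# Usage: check_fields FILE [NFIELDS]
# NFIELDS defaults to 5, an assumption based on the align.dat excerpt above.
check_fields() {
    awk -v n="${2:-5}" '
        # Print at most the first five offending lines, but count all of them.
        NF != n { bad++; if (bad <= 5) printf "line %d has %d fields: %s\n", NR, NF, $0 }
        END { printf "%d lines checked, %d malformed\n", NR, bad + 0 }
    ' "$1"
}
```

Running `check_fields align.dat` on a 23G file will take a while, but a non-zero malformed count would be a useful detail to add to a bug report.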

I ran Scaff10X with the following SLURM parameters:

#SBATCH -J hi_scf10
#SBATCH -e slurm/test_assembly.j%j.run_scaff10x.hi.err
#SBATCH -o slurm/test_assembly.j%j.run_scaff10x.hi.out
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --mem=480G
#SBATCH --time=06-10:08:07
#SBATCH -p bigmemh

and used the following command to run scaff10x:

${scaf_bin}/scaff10x \
    -nodes ${threads} \
    -align bwa \
    -matrix 2000 \
    -reads 12 \
    -link 10 \
    -plot barcode_length.png \
    ${asm} ${R1} ${R2} ${output}

I didn't think this step needed much memory, and indeed it fails at the same place if I use a partition with less memory (62000M/node).

Please advise! Thank you for your time :)

@jensbast

Hi, I get exactly the same error with exactly the same files in the tmp directory (with file size 0 for both try.out and align2.dat).
Was there a solution?

Best and thanks

@zning-sanger
Member

Many thanks for all your bug reporting efforts. I spent some time on this and, sorry to say, I haven't found the bug or the problem. VGP has used scaff10x a lot and it seems they didn't have this issue, and I have never had this problem either. If any of the users who had this problem could put the data (reads and assembly) in a location for download, I would be really grateful! I need to reproduce this issue before I can fix the bug. For the time being, please use scaffolding_reads to decode the barcodes and then use the paired reads for scaffolding.

Best regards,

Zemin

@shannonekj
Author

Hi all,

Apologies for not posting when I got things up and running. I believe I "solved" this problem (or at least I no longer run into it) by supplying the input 10X fastqs through -data input.dat (see format below) and symlinking the reference genome into the working directory.

input.dat file format:

q1=/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/00-raw_data/data-10X_M/Male2_S63_L004_R1_001.fastq.gz
q2=/group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/00-raw_data/data-10X_M/Male2_S63_L004_R2_001.fastq.gz

Run command:

scaff10x -nodes $((SLURM_CPUS_PER_TASK-2)) -longread 1 -gap 100 -matrix 2000 -reads 10 -link 8 -score 20 -edge 50000 -block 50000 -data input.dat sym-linked.reference.fasta output.scaff10x.fasta

This was run in the following working directory: /group/millermrgrp2/shannon/projects/assembly_genome_Hypomesus-transpacificus/03-assemblies/sandbox_hicanu/
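
The two preparation steps of this workaround (writing input.dat and symlinking the reference) can be sketched as a small shell function. This is only an illustration of what is described in this comment: `prepare_scaff10x_inputs` is a made-up name, the `q1=`/`q2=` layout is copied from the example above, and the reference is assumed to be given as an absolute path outside the working directory.

```shell
# Write input.dat in the q1=/q2= format and symlink the reference into the current directory.
# Usage: prepare_scaff10x_inputs R1 R2 REFERENCE [LINK_NAME]
# REFERENCE should be an absolute path; LINK_NAME defaults to the reference's basename.
# Note: prepare_scaff10x_inputs is a hypothetical helper, not shipped with Scaff10X.
prepare_scaff10x_inputs() {
    # One line per read file, as in the input.dat example above.
    printf 'q1=%s\nq2=%s\n' "$1" "$2" > input.dat
    # -f replaces a stale link left over from an earlier attempt.
    ln -sf "$3" "${4:-$(basename "$3")}"
}
```

After calling it from the intended working directory, the scaff10x command above would be run with the link name as the reference argument.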

As a note I installed scaff10x with conda.

@zning-sanger I'd be happy to upload the original files I used to a server if that would be useful. I haven't tried to recreate the error since it was happening, but maybe knowing how I fixed it helps?
