zning-sanger/Scaff10x

Scaff10x

Genome scaffolding using 10x data

Description

Scaff10x is a pipeline for genome scaffolding using 10x data. Barcoded tags are extracted from raw sequencing reads and appended to read names for further processing. Alignments are carried out with either BWA or SMALT. Barcodes are sorted together with contigs and mapping coordinates. A relation matrix is constructed to record the barcodes shared among contigs that may be linked. Contigs are ordered and oriented after a nearest-neighbour search.

Say you have an assembly and 10x reads:

    genome-assembly.fasta read_1.fastq read_2.fastq

(1) Barcode extraction

    /tmp/scaff10x/scaff_BC-reads-1 read_1.fastq read-BC_1.fastq read-BC_1.name > try.out
    /tmp/scaff10x/scaff_BC-reads-2 read-BC_1.name read_2.fastq read-BC_2.fastq > try.out

(2) Run Scaff10x

Using SMALT (https://sourceforge.net/projects/smalt/):

    /tmp/scaff10x/scaff10x -nodes 30 -matrix 2000 -reads 8 -edge 50000 -link 6 -block 2500 -align smalt genome-assembly.fasta read-BC_1.fastq read-BC_2.fastq scaffolds.fasta > try.out

    nodes  (30)    - number of CPUs requested
    matrix (2000)  - relation matrix size
    reads  (10)    - minimum number of reads per barcode
    edge   (50000) - edge length of mapped reads to consider for scaffolding
    link   (8)     - minimum number of shared barcodes
    block  (2500)  - length used to determine nearest neighbours

or BWA:

    /tmp/scaff10x/scaff10x -nodes 30 -matrix 2000 -reads 8 -edge 50000 -link 6 -block 2500 -align bwa genome-assembly.fasta read-BC_1.fastq read-BC_2.fastq scaffolds.fasta > try.out

If you already have a sam file, try this:

    /tmp/scaff10x/scaff10x -nodes 30 -matrix 2000 -reads 8 -edge 50000 -link 6 -block 2500 -sam /lustre/scratch117/sciops/team117/hpag/zn1/project/human/10xg/new/human.sam genome-assembly.fasta read-BC_1.fastq read-BC_2.fastq scaffolds.fasta > try.out

Note:

a. To run scaff10x, you need to give the full path of the executable, e.g. /tmp/scaff10x/scaff10x.
b. You also need to give the full path of the sam file. However, the target assembly file and the two read files must be in your working directory and given by file name only (a full path won't work).
c. SMALT is notably slower than BWA, so try BWA first.
d. The block value is very important. The default value of 2500 is very conservative; you may increase it to, say, 5000 or 10000 to improve the length of the scaffolds.
e. The default values of "reads" and "link" are based on 30X coverage of the data. You need to increase these values if you have, say, 60X reads.
f. To get the best results, you may run the pipeline twice, using the scaffolded output of the first run as the new genome-assembly.fasta input file.

(3) Install

    gunzip scaff10x.tar.gz
    tar xvf scaff10x.tar
    make

Please contact Zemin Ning (zn1@sanger.ac.uk) for any further information.
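The path rules in notes a and b are easy to get wrong, so a small pre-flight check can help before launching a long run. The following is a minimal sketch, not part of Scaff10x: the function names is_full_path, is_bare_name and check_paths are hypothetical, and the check only looks at how each argument is written, not at whether the files exist.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Scaff10x.
# It checks the path rules from notes a and b by inspecting the argument strings only.

# Succeeds if the argument is written as a full path (starts with "/").
is_full_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Succeeds if the argument is a non-empty file name with no directory part.
is_bare_name() {
    case "$1" in
        "")  return 1 ;;
        */*) return 1 ;;
        *)   return 0 ;;
    esac
}

# check_paths EXECUTABLE SAM FILE...
# Prints one line per violated rule; returns non-zero if any rule is violated.
check_paths() {
    status=0
    is_full_path "$1" || { echo "scaff10x must be given with its full path: $1"; status=1; }
    is_full_path "$2" || { echo "sam file must be given with its full path: $2"; status=1; }
    shift 2
    for f in "$@"; do
        is_bare_name "$f" || { echo "must be a file name in the working directory: $f"; status=1; }
    done
    return $status
}
```

For example, `check_paths /tmp/scaff10x/scaff10x /lustre/scratch117/sciops/team117/hpag/zn1/project/human/10xg/new/human.sam genome-assembly.fasta read-BC_1.fastq read-BC_2.fastq` prints nothing and succeeds, while passing `./genome-assembly.fasta` would be reported.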
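Notes d and f suggest rerunning the pipeline with a different block value and with the previous output as the new input. As a rough sketch of how the two passes chain together, the script below only prints the commands rather than running them. The option values are copied from the usage above; the function names scaff10x_cmd and two_pass and the output names pass1.fasta and pass2.fasta are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: prints the scaff10x command lines for two passes instead of running them.
# pass1.fasta and pass2.fasta are hypothetical output names.

SCAFF10X=/tmp/scaff10x/scaff10x

# scaff10x_cmd ALIGNER BLOCK ASSEMBLY OUTPUT
# Echoes one command line with the option values shown in the usage above.
scaff10x_cmd() {
    echo "$SCAFF10X -nodes 30 -matrix 2000 -reads 8 -edge 50000 -link 6 -block $2 -align $1 $3 read-BC_1.fastq read-BC_2.fastq $4"
}

# two_pass ALIGNER BLOCK
# Pass 1 scaffolds the original assembly; pass 2 takes the pass 1 output as its input.
two_pass() {
    scaff10x_cmd "$1" "$2" genome-assembly.fasta pass1.fasta
    scaff10x_cmd "$1" "$2" pass1.fasta pass2.fasta
}

# Print the two commands for a BWA run with the default block value.
two_pass bwa 2500
```

Calling `two_pass bwa 5000` instead prints the same pair of commands with the larger block value from note d. Review the printed lines before running them, and add the `> try.out` redirection if you want the log kept as in the usage above.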
