Analysis tools for split-seq

Requirements

Requires Python 3.

Additional software needed:

To install all dependencies, try running install_dependencies.sh, which installs dependencies to ~/split_seq_reqs/.

To install the package, run pip install -e . from the repository root (this might need sudo).

Generating a reference genome

Download human reference genome:

wget ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Download human reference gtf file:

wget ftp://ftp.ensembl.org/pub/release-93/gtf/homo_sapiens/Homo_sapiens.GRCh38.93.gtf.gz
gunzip Homo_sapiens.GRCh38.93.gtf.gz

Download mouse reference genome:

wget ftp://ftp.ensembl.org/pub/release-93/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

Download mouse reference gtf file:

wget ftp://ftp.ensembl.org/pub/release-93/gtf/mus_musculus/Mus_musculus.GRCm38.93.gtf.gz
gunzip Mus_musculus.GRCm38.93.gtf.gz

Generate split-seq reference:

split-seq mkref --genome hg38 mm10 \
                --fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa Mus_musculus.GRCm38.dna.primary_assembly.fa \
                --genes Homo_sapiens.GRCh38.93.gtf Mus_musculus.GRCm38.93.gtf \
                --output_dir <ref_path>/hg38_mm10/ \
                --nthreads 16

Running the pipeline

To see all options, run split-seq -h.

split-seq all --fq1 input_R1.fastq.gz \
              --fq2 input_R2.fastq.gz \
              --output_dir <output_dir> \
              --chemistry v2 \
              --genome_dir <path_to_ref>/hg38_mm10/ \
              --nthreads 16 \
              --sample sample_name1 A1:B6 \
              --sample sample_name2 A7:B12 \
              --sample sample_name3 C1:D6 \
              --sample sample_name4 C7:D12
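Each --sample option pairs a sample name with a range of wells. The sketch below shows one plausible reading of a range such as A1:B6, as a rectangular block on a 96-well plate (rows A to H, columns 1 to 12). Both the rectangular interpretation and the helper names parse_well and expand_well_range are assumptions made for illustration, not the pipeline's own code.

```python
ROWS = "ABCDEFGH"  # assumption: standard 96-well plate, rows A-H
N_COLS = 12        # assumption: columns 1-12


def parse_well(well):
    """Split a well name like 'B6' into (row_index, column_number)."""
    row, col = well[0].upper(), int(well[1:])
    if row not in ROWS or not 1 <= col <= N_COLS:
        raise ValueError(f"not a valid 96-well position: {well!r}")
    return ROWS.index(row), col


def expand_well_range(spec):
    """Expand 'A1:B6' into the wells of the rectangle spanned by A1 and B6.

    A single well such as 'C3' expands to itself.
    """
    start, _, end = spec.partition(":")
    r0, c0 = parse_well(start)
    r1, c1 = parse_well(end or start)
    if r0 > r1 or c0 > c1:
        raise ValueError(f"range must run top-left to bottom-right: {spec!r}")
    return [f"{ROWS[r]}{c}" for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# The four ranges from the command above, checked for overlap.
samples = {
    "sample_name1": "A1:B6",
    "sample_name2": "A7:B12",
    "sample_name3": "C1:D6",
    "sample_name4": "C7:D12",
}
wells = [w for spec in samples.values() for w in expand_well_range(spec)]
assert len(wells) == len(set(wells)), "sample well ranges overlap"
```

Under this reading, the four samples above occupy 12 wells each and together cover rows A to D with no well assigned twice.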

Merging Sublibraries into a Single Matrix

split-seq combine --output_dir <output_dir> \
                  --sublibraries <path_to_sublibrary1> <path_to_sublibrary2> ... \
                  --chemistry v2 \
                  --genome_dir <path_to_genome_dir> \
                  --sample sample_name1 <wells>

Outputs

Running split-seq all with --output_dir <output_dir> generates three output folders: <output_dir>, <output_dir>DGE_filtered, and <output_dir>DGE_unfiltered.

The first folder contains the read mappings and read assignments. Some important files:

read_assignments.csv - a table containing the gene assignment for every cell barcode-UMI combination.
single_cells_barcoded_head.fastq.gz - all reads from read 1 that have a valid cell barcode. Each read is labeled with its barcode-UMI combination.
single_cells_barcoded_headAligned.sorted.bam - the genome alignments for all reads in single_cells_barcoded_head.fastq.gz.

The DGE_filtered and DGE_unfiltered folders contain digital gene expression matrices. In DGE_filtered, cells are filtered by a minimum read threshold, and only cells passing that threshold are included.

In both folders, DGE.mtx is a sparse matrix in Matrix Market format with shape cells by genes; each entry holds the expression of one gene in one cell. genes.csv lists the gene names in the same order as the gene (column) index of DGE.mtx.
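To make the layout of DGE.mtx concrete, here is a minimal standard-library sketch that parses Matrix Market coordinate text and sums the counts for each cell. It assumes the general (non-symmetric) coordinate variant with explicit values. The function names read_mtx_coordinate and counts_per_cell and the tiny example matrix are made up for illustration and are not pipeline output.

```python
import io


def read_mtx_coordinate(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    entries is a list of (row, col, value) tuples with 0-based indices.
    Assumes the 'general' layout with one explicit value per entry.
    """
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket") or "coordinate" not in header:
        raise ValueError("expected a Matrix Market coordinate header")
    # Skip comment and blank lines until the size line "rows cols nonzeros".
    for line in it:
        line = line.strip()
        if line and not line.startswith("%"):
            break
    else:
        raise ValueError("missing size line")
    n_rows, n_cols, n_entries = map(int, line.split())
    entries = []
    for line in it:
        parts = line.split()
        if not parts:
            continue
        # The file format uses 1-based indices; convert to 0-based.
        entries.append((int(parts[0]) - 1, int(parts[1]) - 1, float(parts[2])))
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


def counts_per_cell(n_cells, entries):
    """Sum expression values per row (one row per cell)."""
    totals = [0.0] * n_cells
    for cell, _gene, value in entries:
        totals[cell] += value
    return totals


# Illustrative 3-cell by 4-gene matrix (made-up numbers).
example = """%%MatrixMarket matrix coordinate integer general
%
3 4 5
1 1 2
1 3 1
2 2 7
3 1 1
3 4 3
"""
n_cells, n_genes, entries = read_mtx_coordinate(io.StringIO(example))
totals = counts_per_cell(n_cells, entries)
```

For real data, scipy.io.mmread is the usual way to load a Matrix Market file into a sparse matrix; the hand-written parser above is only meant to show what the file contains.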
