yszhou2016/Cas13

This project covers two tasks: mining Cas13 proteins from metagenomic data, and detecting RNA off-target effects of Cas13 and Cas13-ADAR2dd.

a. Identify the Cas13 proteins from metagenomic samples.

a.1. Install "prodigal.linux", "bedtools" and "pilercr" in the default environment. bedtools: https://sourceforge.net/projects/bedtools/ pilercr: http://www.drive5.com/pilercr/

a.2. Run "perl 0.Cas-Finder $sample.fasta" to obtain the Cas proteins and generate the "$sample.pep.cas.fasta" file.

a.3. Run "perl 1.Cas13-Finder.pl $sample.pep.cas.fasta" to obtain the Cas13 proteins and generate the "$sample.pep.cas.RxxxxH.fa" file.
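
The "RxxxxH" in the output file name refers to the R-x4-H motif (arginine, any four residues, histidine) found in the HEPN domains of Cas13 proteins. The exact criteria applied by 1.Cas13-Finder.pl are not described here, so the following is only a rough sketch of a motif scan, not the script's logic. The function name `count_rxxxxh` is hypothetical; it prints each protein ID with its number of non-overlapping RxxxxH matches.

```shell
# Hypothetical helper, not part of this repository.
# Usage: count_rxxxxh proteins.fasta
# Output: protein_id<TAB>number of non-overlapping RxxxxH motifs
count_rxxxxh() {
    awk '
        function report(   s, n) {
            if (name == "") return
            s = seq; n = 0
            # Scan left to right, consuming each match before searching again.
            while (match(s, /R[A-Z][A-Z][A-Z][A-Z]H/)) {
                n++
                s = substr(s, RSTART + RLENGTH)
            }
            print name "\t" n
        }
        /^>/ { report(); name = substr($1, 2); seq = ""; next }
        { seq = seq toupper($0) }   # join wrapped sequence lines
        END { report() }
    ' "$1"
}
```

Proteins with at least two matches could then be kept as candidates, for example with `count_rxxxxh "$sample.pep.cas.fasta" | awk '$2 >= 2'`; that threshold is an assumption as well.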

a.4. Multiple alignment of Cas13 proteins with mafft.

mafft --maxiterate 1000 --thread 12 --globalpair Cas13.fa > Cas13.mafft.fasta

b. RNAseq off-target analysis of Cas13.

b.1. Align the RNAseq reads to the reference genome.

hisat2 -p 18 --dta-cufflinks -q -x GRCh38.genome.fa -1 ${sample}_1.clean.fq.gz -2 ${sample}_2.clean.fq.gz -S $sample.sam

samtools view -Su -q 30 $sample.sam | samtools sort -@ 18 - > $sample.sorted.bam

samtools index $sample.sorted.bam
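
The three commands above handle one sample at a time. As a convenience sketch, the hypothetical function `print_align_cmds` below prints the same commands for each sample name passed to it, so they can be reviewed first and then piped to `bash`. The sample names in the usage line are placeholders.

```shell
# Hypothetical wrapper, not part of this repository.
# Prints the b.1 commands for every sample name given as an argument.
print_align_cmds() {
    local sample
    for sample in "$@"; do
        printf 'hisat2 -p 18 --dta-cufflinks -q -x GRCh38.genome.fa -1 %s_1.clean.fq.gz -2 %s_2.clean.fq.gz -S %s.sam\n' \
            "$sample" "$sample" "$sample"
        printf 'samtools view -Su -q 30 %s.sam | samtools sort -@ 18 - > %s.sorted.bam\n' \
            "$sample" "$sample"
        printf 'samtools index %s.sorted.bam\n' "$sample"
    done
}
```

For example, `print_align_cmds Control_rep1 Treat_rep1` lists the six commands, and `print_align_cmds Control_rep1 Treat_rep1 | bash` runs them.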

b.2. Calculate the read counts.

htseq-count -f bam $sample.sorted.bam GRCh38.gtf > $sample.HTSeq.out

b.3. Convert the read counts into FPKM values.

perl HTSeq2FPKM.pl $sample.HTSeq.out GRCh38.gtf
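
For reference, FPKM is conventionally computed as count x 10^9 / (gene length in bp x total number of counted reads). How HTSeq2FPKM.pl derives gene lengths from the GTF and which total it uses is not stated here. The sketch below therefore makes two assumptions: gene lengths come from a separate two-column file (gene ID, length in bp), and the total is the sum of all gene-assigned counts. The function name `htseq_to_fpkm` is hypothetical.

```shell
# Hypothetical helper, not part of this repository.
# Usage: htseq_to_fpkm counts.tsv lengths.tsv
#   counts.tsv : htseq-count output (gene<TAB>count)
#   lengths.tsv: gene<TAB>length_bp (assumed input file)
htseq_to_fpkm() {
    awk -F'\t' '
        NR == FNR { len[$1] = $2; next }     # first file: gene lengths
        /^__/ { next }                       # skip htseq-count summary rows
        { count[$1] = $2; total += $2; order[++n] = $1 }
        END {
            for (i = 1; i <= n; i++) {
                g = order[i]
                # Genes without a known length are left out.
                if (total > 0 && len[g] > 0)
                    printf "%s\t%.2f\n", g, count[g] * 1e9 / (len[g] * total)
            }
        }
    ' "$2" "$1"
}
```

Rows beginning with `__` (such as `__no_feature`) are summary lines written by htseq-count and are excluded from both the output and the total.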

b.4. Identify differentially expressed genes using DESeq2.

perl ~/path/trinityrnaseq-Trinity-v2.8.5/Analysis/DifferentialExpression/run_DE_analysis.pl --matrix gene_count_matrix.csv.xls --method DESeq2 --output DESeq2_out --samples_file sample.txt --contrasts contrast.txt --dispersion 0.04

perl 4.get_DEG-DESeq2.pl gene_count_matrix.csv.xls.Control_vs_Treat.DESeq2.DE_results
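
run_DE_analysis.pl reads two tab-delimited text files: the samples file holds one condition and replicate name per line, and the contrasts file holds one pair of conditions to compare per line. The condition names `Control` and `Treat` are taken from the result file name above. The replicate names in this sketch are placeholders and need to match the column headers of the count matrix.

```shell
# Replicate names (Control_rep1, ...) are placeholders; replace them with
# the column names used in gene_count_matrix.csv.xls.
printf 'Control\tControl_rep1\nControl\tControl_rep2\nTreat\tTreat_rep1\nTreat\tTreat_rep2\n' > sample.txt

# One comparison per line: condition A, then condition B.
printf 'Control\tTreat\n' > contrast.txt
```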

b.5. Predict off-target sites of the spacers in the hg38 genome and transcriptome, allowing no more than eight mismatches.

perl Mismatch-search.pl Spacer.fa Hg38.genome.fa 8

perl OffTarget_gene.pl Spacer.Mismatch.fa Hg38.gtf
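
The idea behind a mismatch-tolerant search can be sketched as a sliding-window comparison: every window of spacer length is compared to the spacer position by position, and windows with at most the allowed number of mismatches are reported. This is an illustration under stated assumptions, not the logic of Mismatch-search.pl: it scans one strand of a single sequence string only, ignores insertions and deletions, and is far too slow for a whole genome. The function name `find_mismatch_sites` is hypothetical.

```shell
# Hypothetical helper, not part of this repository.
# Usage: find_mismatch_sites QUERY TEXT MAX_MISMATCHES
# Output: 1-based start<TAB>matched window<TAB>mismatch count
find_mismatch_sites() {
    awk -v q="$1" -v t="$2" -v k="$3" 'BEGIN {
        q = toupper(q); t = toupper(t)
        m = length(q)
        for (i = 1; i + m - 1 <= length(t); i++) {
            mm = 0
            # Stop comparing once the window exceeds the mismatch budget.
            for (j = 1; j <= m && mm <= k; j++)
                if (substr(q, j, 1) != substr(t, i + j - 1, 1)) mm++
            if (mm <= k) print i "\t" substr(t, i, m) "\t" mm
        }
    }'
}
```

For example, `find_mismatch_sites ACGT TTACGTACCT 1` reports the exact match at position 3 and a one-mismatch site at position 7.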

c. RNAseq off-target analysis of RNA base editor.

c.1. Align the RNAseq reads to the reference genome.

hisat2 -p 18 --dta-cufflinks -q -x GRCh38.genome.fa -1 ${sample}_1.clean.fq.gz -2 ${sample}_2.clean.fq.gz -S $sample.sam

samtools view -Su $sample.sam | samtools sort -@ 18 - > $sample.sorted.bam

c.2. Install REDItools from "https://sourceforge.net/projects/reditools/", then run REDItoolDenovo.py.

python ~/software/anaconda3/envs/py2/bin/REDItoolDenovo.py -o $sample.REDtools -i $sample.sorted.bam -f GRCh38.genome.fa -t 24 -e -d -l -U [AG,TC,CT,GA] -p -u -m60 -T5-5 -W -v 1 -n 0.0
