An iCLIP and miCLIP mapping pipeline that accounts for repetitive ncRNAs, especially snRNAs, rRNAs and tRNAs. It is currently available as a Snakemake pipeline. You need demultiplexed CLIP fastq files to begin; the code takes care of the rest.
- Clone the repo into your local directory.
git clone https://github.com/ulelab/ncawareclip.git
- Move into the newly downloaded directory, and create a conda environment with all the dependencies you will require.
cd ncawareclip
conda env create -f environment.yml
- Run the test data to make sure everything is working.
conda activate ncawareclip
- Create the full annotation and sequence databases you will need, and get your configuration file started using the handy helper tool. Currently the only supported genome is "Hs" (human). Note: you only need to run this once for each species you analyse. Note 2: all annotation sources and the code that processes them are contained in this repo, so you can swap out individual files if you want to.
For example, to create all the annotation indexes you'll need for human mapping, run:
cd Snakemake/prepare-annotation
snakemake --configfile species-specific-configs/Hs_config.yaml
To submit your jobs to a SLURM compute cluster using the provided cluster settings, add these extra flags to your snakemake command:
snakemake --keep-going --cluster 'sbatch {params.cluster}' --jobs 200 --latency-wait 60 --rerun-incomplete --configfile species-specific-configs/Hs_config.yaml
- Edit config.yaml to provide the paths to your demultiplexed fastq files and to the directory where you are running the pipeline. For UMI removal, make sure the random barcode has been moved to the fastq header as "rbc:NNNNN". I would highly recommend the speedy demultiplexer Ultraplex, for completely unbiased reasons.
- Run the Snakemake pipeline.
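Before running the pipeline, it can be worth confirming that your demultiplexed reads really carry the "rbc:" tag needed for UMI removal. The shell function below is a minimal sketch and is not part of ncawareclip: the name `check_rbc_headers` is hypothetical, and it assumes uncompressed four-line fastq records and a tag of the form `rbc:` followed by A/C/G/T/N bases somewhere on the header line. It prints how many headers lack the tag and returns a non-zero exit status if any do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ncawareclip): report fastq headers missing an rbc: tag.
# Assumptions: plain 4-line fastq records, and the tag matches "rbc:" followed by
# one or more A/C/G/T/N bases. Pass "-" as the file name to read from stdin.
check_rbc_headers() {
  local fastq="$1"
  awk '
    NR % 4 == 1 {                 # header lines are lines 1, 5, 9, ...
      total++
      if ($0 !~ /rbc:[ACGTN]+/) missing++
    }
    END {
      printf "%d of %d headers lack an rbc: tag\n", missing + 0, total + 0
      if (missing > 0) exit 1     # signal failure if any header is untagged
    }
  ' "$fastq"
}
```

For gzipped files, decompress on the fly and pass `-` so awk reads standard input, for example `gzip -cd sample.fastq.gz | check_rbc_headers -`.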
For any questions please feel free to raise a GitHub issue or contact me at charlotte.capitanchik@crick.ac.uk.