This repository contains all the scripts used to assemble and annotate the earwig genome. The pipeline is presented in three parts:
- Genome assembly
- De novo repeat library
- Genome annotation
The genome was assembled using linked reads from 10x Genomics Chromium and long reads from Oxford Nanopore. The long and linked reads were assembled separately, and the two assemblies were then merged. After multiple iterations of scaffolding, gap closing, and removal of haplotigs and contaminants, the assembly was polished with mRNA-seq reads to obtain the final assembly. A schematic representation is shown in the figure below (created with BioRender.com).
Workflow and Scripts:
- Linked read assembly
- Long read assembly
- Merging two assemblies
- Further processing with long reads
- Further processing with linked reads
- Processing with RNA-seq reads
- Final steps: haplotig removal, cleaning, and polishing
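The repeated rounds of scaffolding and gap closing described above can be monitored by counting the gaps (runs of `N`) that remain in the scaffolds. The sketch below is an illustration only and is not one of the repository's scripts; the helper names `read_fasta` and `gap_summary`, the `min_gap` threshold, and the file name in the comment are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical helper for illustration: a gap is any run of N/n bases.
GAP_RE = re.compile(r"[Nn]+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_summary(lines: Iterable[str], min_gap: int = 1) -> Dict[str, int]:
    """Count scaffolds, total length, gaps and gap bases (N runs >= min_gap)."""
    stats = {"scaffolds": 0, "length": 0, "gaps": 0, "gap_bases": 0}
    for _, seq in read_fasta(lines):
        stats["scaffolds"] += 1
        stats["length"] += len(seq)
        # Sequence lines are joined first, so gaps spanning a line break count once.
        for match in GAP_RE.finditer(seq):
            run = match.end() - match.start()
            if run >= min_gap:
                stats["gaps"] += 1
                stats["gap_bases"] += run
    return stats


if __name__ == "__main__":
    # On real data (hypothetical path): open("scaffolds.fasta") and pass the handle.
    example = [">scaf1", "ACGTNNNNNACGT", "ACNNA", ">scaf2", "ACGTACGT"]
    print(gap_summary(example))
    # {'scaffolds': 2, 'length': 26, 'gaps': 2, 'gap_bases': 7}
```

Comparing the output before and after each gap-closing round gives a quick, tool-independent count of how many gaps were filled.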
A comprehensive de novo repeat library was prepared for the assembled genome. It was used for repeat content analysis, for repeat masking, and as input to the annotation pipeline.
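For repeat content analysis, a rough estimate can be read directly from a soft-masked genome, in which masked repeats are written in lowercase (for example, RepeatMasker output produced with `-xsmall`). A minimal sketch, assuming such a soft-masked FASTA; `repeat_content` is a hypothetical helper, and excluding gap bases (`N`) from the denominator is a design choice of this sketch, not necessarily how the repository reports repeat content.

```python
from typing import Dict, Iterable


def repeat_content(lines: Iterable[str]) -> Dict[str, float]:
    """Estimate repeat content of a soft-masked FASTA (lowercase = masked)."""
    masked = unmasked = gaps = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        for base in line:
            if base in "Nn":
                gaps += 1  # assembly gaps are counted as neither repeat nor unique
            elif base.islower():
                masked += 1
            else:
                unmasked += 1
    called = masked + unmasked
    percent = 100.0 * masked / called if called else 0.0
    return {
        "masked_bases": masked,
        "non_gap_bases": called,
        "gap_bases": gaps,
        "percent_repeat": percent,
    }


if __name__ == "__main__":
    example = [">chr1", "ACGTacgtNN", "acGT"]
    print(repeat_content(example))
```

The per-base loop keeps the logic obvious; for a full genome, streaming the file handle line by line keeps memory use low even though it is not the fastest approach.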
Workflow:
- Repeat library preparation
- Concatenating, filtering, and classifying repeats
- Repeat masking the genome
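The concatenating and filtering step can be sketched as merging several repeat libraries while dropping short consensus sequences and exact duplicates, treating a sequence and its reverse complement as the same repeat. This is an assumption-laden illustration: the 50 bp default minimum length, the exact-duplicate criterion, and all function and record names are hypothetical, and the repository's own scripts may filter differently (for example, by clustering similar sequences rather than removing only identical ones). Classification is left to dedicated tools and is not shown.

```python
from typing import Iterable, Iterator, List, Tuple

# Complement table for plain A/C/G/T/N; any other character is left unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_libraries(
    libraries: Iterable[Iterable[str]], min_len: int = 50
) -> List[Tuple[str, str]]:
    """Concatenate repeat libraries, dropping short and duplicate consensi."""
    seen = set()
    merged = []
    for library in libraries:
        for name, seq in read_fasta(library):
            seq = seq.upper()
            if len(seq) < min_len:
                continue  # too short to be a useful consensus (hypothetical cutoff)
            # Canonical key: a repeat and its reverse complement count as one.
            key = min(seq, revcomp(seq))
            if key in seen:
                continue  # keep the first occurrence only
            seen.add(key)
            merged.append((name, seq))
    return merged


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Render (name, sequence) records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    lib_a = [">libA_rep1", "ACGTACGTAA", ">libA_short", "ACG"]
    lib_b = [">libB_rep1", "TTACGTACGT", ">libB_rep2", "GGGGGCCCCC"]
    print(to_fasta(merge_libraries([lib_a, lib_b], min_len=5)), end="")
```

Because the first occurrence wins, the order in which libraries are passed decides which header survives for a duplicated repeat.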
The MAKER2 pipeline was used for genome annotation. mRNA-seq data were assembled de novo using Trinity. Other relevant publicly available datasets were downloaded and used as input.
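If the Trinity transcripts are passed to MAKER2 as evidence, a quick sanity check beforehand is to count the assembled "genes" and isoforms. Trinity names transcripts like `TRINITY_DN1000_c115_g5_i1`, where the part before the trailing `_iN` identifies the gene; the sketch below relies only on that naming convention, and `summarize_trinity` is a hypothetical helper, not part of the repository.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Trinity transcript IDs look like TRINITY_DN1000_c115_g5_i1;
# everything before the trailing _i<N> identifies the Trinity "gene".
ISOFORM_RE = re.compile(r"^(?P<gene>\S+)_i\d+$")


def summarize_trinity(lines: Iterable[str]) -> Dict[str, float]:
    """Count Trinity genes and isoforms from the FASTA header lines."""
    isoforms_per_gene: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are not needed for the counts
        fields = line[1:].split()
        if not fields:
            continue
        match = ISOFORM_RE.match(fields[0])
        # Fall back to the full ID if it does not follow the Trinity pattern.
        gene = match.group("gene") if match else fields[0]
        isoforms_per_gene[gene] += 1
    genes = len(isoforms_per_gene)
    isoforms = sum(isoforms_per_gene.values())
    return {
        "genes": genes,
        "isoforms": isoforms,
        "isoforms_per_gene": isoforms / genes if genes else 0.0,
    }


if __name__ == "__main__":
    example = [
        ">TRINITY_DN0_c0_g1_i1 len=300", "ACGT",
        ">TRINITY_DN0_c0_g1_i2 len=280", "ACGA",
        ">TRINITY_DN1_c0_g1_i1 len=500", "TTGA",
    ]
    print(summarize_trinity(example))
```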