This repository contains all the scripts used to assemble and annotate the Earwig genome

upendrabhattarai/Earwig_genome_project

Earwig_genome_project

This repository contains all the scripts used to assemble and annotate the Earwig genome. The pipeline is presented in three parts:

  1. Genome assembly
  2. De novo repeat library
  3. Genome annotation

1. Genome assembly

The genome was assembled using linked reads from 10x Genomics Chromium and long reads from Oxford Nanopore. The long reads and linked reads were assembled separately, and the two assemblies were then merged. After multiple iterations of scaffolding, gap closing, and removal of haplotigs and contaminants, the assembly was polished with mRNA-seq reads to obtain the final assembly. A schematic representation is shown in the figure below (created with BioRender.com).

[Figure: schematic of the genome assembly workflow]

Workflow and Scripts:

  1. Linked read assembly
  2. Long read assembly
  3. Merging two assemblies
  4. Further processing with long reads
  5. Further processing with linked reads
  6. Processing with RNA-seq reads
  7. Final bits: Haplotigs removal, cleaning and polishing
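
Because the assembly goes through several rounds of scaffolding, gap closing, and cleaning, it can help to compare contiguity between rounds. The following is a minimal sketch, not one of the repository's scripts: it computes the sequence count, total length, longest sequence, and N50 from FASTA lines. The function names (`fasta_lengths`, `n50`, `assembly_stats`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: List[int]) -> int:
    """Return the length L of the sequence at which the running total,
    taken from longest to shortest, first reaches half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Summarise one assembly (hypothetical helper, not from the repository)."""
    lengths = list(fasta_lengths(lines))
    return {
        "sequences": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Passing an open file handle for each intermediate assembly to `assembly_stats` gives a quick way to check whether a scaffolding or gap-closing round improved contiguity.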

2. De novo repeat library

A comprehensive de novo repeat library was prepared for the assembled genome. It was used for repeat content analysis, for repeat masking, and as input to the annotation pipeline.

Workflow:

  1. Repeat library preparation
    1. RepeatModeler
    2. LTRharvest & LTRdigest
    3. TransposonPSI
    4. SINE database
  2. Concatenating, filtering, and classifying repeats
    1. RepeatClassifier
  3. Repeat masking the genome
    1. RepeatMasker
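
The concatenating and filtering step in the workflow above can be sketched as follows. This is an illustrative example only: the filtering criteria used here (dropping exact duplicate sequences and sequences shorter than `min_length`) are assumptions, not the criteria used by the repository's scripts, and the function names are hypothetical. Each header is prefixed with the name of the tool that produced it so the origin of every repeat stays traceable.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(
    libraries: Dict[str, Iterable[str]], min_length: int = 50
) -> List[Record]:
    """Concatenate repeat libraries, skipping short and duplicate sequences.

    `min_length` and exact-duplicate removal are assumed criteria.
    """
    seen = set()
    merged = []
    for source, lines in libraries.items():
        for header, seq in parse_fasta(lines):
            key = seq.upper()  # compare case-insensitively
            if len(key) < min_length or key in seen:
                continue
            seen.add(key)
            merged.append((f"{source}|{header}", seq))
    return merged


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with wrapped sequence lines."""
    out = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

The merged library would then be passed on to classification and masking. Near-identical (rather than exactly identical) repeats are not collapsed by this sketch; that would need a clustering tool.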

3. Genome annotation

The Maker2 pipeline was used for genome annotation. mRNA-seq data were de novo assembled using Trinity. Other relevant publicly available datasets were downloaded and used as input.

  1. Processing mRNA-seq data
  2. GeneMark-ES
  3. Braker
  4. Configuring and running Maker2 pipeline
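
After an annotation run, a quick sanity check is to count the predicted features in the resulting GFF3 file. The sketch below is an illustration rather than part of the repository's pipeline. It assumes that final gene models carry a source column (column 2) starting with `maker`, while evidence alignments use other source names; the function name `count_features` and the `source_prefix` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_features(lines: Iterable[str], source_prefix: str = "maker") -> Counter:
    """Count GFF3 feature types (column 3) for rows whose source (column 2)
    starts with `source_prefix`."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature row
        source, feature_type = fields[1], fields[2]
        if source.startswith(source_prefix):
            counts[feature_type] += 1
    return counts
```

Calling `count_features` on an open GFF3 file handle returns a `Counter` keyed by feature type (for example `gene`, `mRNA`, `exon`), which makes it easy to compare gene counts between annotation rounds.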
