Linked-read sequence simulator
Originally known as XENIA from the VISOR project, Mimick is a simulator for linked-read FASTQ data. It lets you simulate an arbitrary number of haplotypes, set overall coverage and molecule coverage, and mix and match barcodes with linked-read chemistries.
Supported linked-read chemistries:
- 10X
- Haplotagging
- stLFR
- TELLseq
Configurable simulation parameters:
- input chemistry and output FASTQ format
- overall coverage depth
- average molecule length
- molecule coverage / reads per molecule
- molecules per barcode (barcode clashing)
- proportion of singletons (unlinked barcodes)
- standard Illumina read characteristics (e.g. read length, insert size)
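To give a feel for how these parameters interact, here is a rough back-of-envelope sketch in Python. It is not Mimick's internal calculation: the function name, argument names, and the simplifying assumptions (paired-end reads, fixed molecule length, no singletons) are all hypothetical and chosen only for illustration.

```python
import math


def estimate_library_size(genome_size, coverage, read_length,
                          molecule_length, molecule_coverage,
                          molecules_per_barcode):
    """Back-of-envelope estimate of read pairs, molecules, and barcodes.

    Assumptions (illustrative only, not Mimick's implementation):
    paired-end reads, every molecule has the same length, and
    singletons are ignored.
    """
    bases_per_pair = 2 * read_length
    # Read pairs needed to reach the requested overall coverage depth.
    read_pairs = math.ceil(genome_size * coverage / bases_per_pair)
    # Read pairs drawn from one molecule so that the requested fraction
    # of the molecule is covered (at least one pair per molecule).
    pairs_per_molecule = max(
        1, round(molecule_length * molecule_coverage / bases_per_pair)
    )
    # Molecules needed to supply all read pairs.
    molecules = math.ceil(read_pairs / pairs_per_molecule)
    # Barcodes needed if each barcode tags a fixed number of molecules.
    barcodes = math.ceil(molecules / molecules_per_barcode)
    return {
        "read_pairs": read_pairs,
        "pairs_per_molecule": pairs_per_molecule,
        "molecules": molecules,
        "barcodes": barcodes,
    }


if __name__ == "__main__":
    estimate = estimate_library_size(
        genome_size=1_000_000, coverage=30, read_length=150,
        molecule_length=50_000, molecule_coverage=0.2,
        molecules_per_barcode=2,
    )
    print(estimate)
```

The point of the sketch is only that lowering molecule coverage or raising molecules per barcode changes how many molecules and barcodes a given overall depth requires, which is why these are exposed as separate knobs.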
Other than the fun name and logo, Mimick is an improvement over existing linked-read simulators in multiple ways:
- It's the only simulator we are aware of that isn't tied to the 10X linked-read chemistry, which was discontinued in 2019. Instead, it is generalized for the chemistries currently available, both in terms of data formats and the simulation process itself.
- Circular DNA support. Yay prokaryotes!
- Mimick provides more parameters for realistic linked-read library simulation, namely the proportion of singletons and molecule coverage. Both characteristics strongly influence how a linked-read library performs.
- As of version 2.0, Mimick uses a barcode-first simulation approach, which allows barcodes to be shared across chromosomes/contigs and haplotypes. This form of barcode sharing is common in real linked-read libraries, but it is a characteristic that existing simulators don't capture (e.g. XENIA only allowed barcode sharing within a chromosome within a haplotype). The documentation explains this in more detail.
- It's parallelized to simulate reads from one molecule per thread, making full use of threads from start to finish while accounting for back-pressure, RAM, and disk usage.
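The barcode-first idea mentioned in the list above can be illustrated with a small Python sketch. This is a conceptual toy, not Mimick's code: the function `simulate_barcode_first`, its arguments, and the length-weighted choice of contig are assumptions made for the example. The loop starts from each barcode and then draws its molecules from any haplotype and contig, so a single barcode can end up tagging molecules from several of them.

```python
import random


def simulate_barcode_first(contigs, barcodes, molecules_per_barcode,
                           molecule_length, seed=0):
    """Assign molecules to barcodes, iterating over barcodes first.

    contigs: dict mapping (haplotype, contig_name) -> contig length in bp.
    Returns a list of (barcode, haplotype, contig_name, start, end) tuples.
    Hypothetical illustration only, not Mimick's implementation.
    """
    rng = random.Random(seed)
    keys = list(contigs)
    # Longer contigs are proportionally more likely to yield a molecule.
    weights = [contigs[key] for key in keys]
    assignments = []
    for barcode in barcodes:
        for _ in range(molecules_per_barcode):
            # Any haplotype/contig may be picked, so one barcode can be
            # shared across contigs and haplotypes.
            haplotype, name = rng.choices(keys, weights=weights, k=1)[0]
            length = contigs[(haplotype, name)]
            size = min(molecule_length, length)
            start = rng.randint(0, length - size)
            assignments.append((barcode, haplotype, name, start, start + size))
    return assignments


if __name__ == "__main__":
    toy_contigs = {
        ("hap1", "chr1"): 200_000,
        ("hap1", "chr2"): 100_000,
        ("hap2", "chr1"): 200_000,
        ("hap2", "chr2"): 100_000,
    }
    for row in simulate_barcode_first(toy_contigs, ["ACGT", "TTAG"], 3, 50_000):
        print(row)
```

By contrast, a contig-first loop (iterate over each haplotype and contig, then hand out barcodes inside it) would confine each barcode to a single contig, which is the limitation described for XENIA.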
@pdimens (Mimick)
@davidebolo1993 (VISOR)
Note
Why name it "mimick"? Well, this software mimics linked-read data, and I have an affinity for naming software after fictional monsters. "Mimick" (with a "k") is the old English spelling of the word, which leaves "mimic" available for some other bioinformatician to use for a less farcical reason. Despite the lore of mimics being deadly traps, this software is anything but. I promise.