Index example

Pierre Peterlongo edited this page Oct 27, 2021 · 3 revisions

This example shows how to build a k-mer index from 2 samples, D1 and D2.

data
├── 1.fasta
├── 2.fasta
└── kmtricks.fof
> cat data/kmtricks.fof
D1: data/1.fasta
D2: data/2.fasta
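
For more than a handful of samples, the file of files can be generated rather than written by hand. The following is a minimal sketch, not part of kmtricks: `make_fof` is a hypothetical helper, and the `D<name>` identifier scheme is only an assumption chosen to match the example above.

```shell
#!/usr/bin/env bash
# make_fof DIR: print one "ID: path" line per FASTA file found in DIR.
# Hypothetical helper; the ID is "D" followed by the file name
# without its .fasta extension (an assumption matching D1/D2 above).
make_fof() {
  local dir=$1 f
  for f in "$dir"/*.fasta; do
    printf 'D%s: %s\n' "$(basename "$f" .fasta)" "$f"
  done
}

# Usage (run from the directory that contains data/):
#   make_fof data > data/kmtricks.fof
```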

Build Bloom filters

kmtricks pipeline --file ./data/kmtricks.fof \
                  --run-dir ./index_example  \
                  --kmer-size 31             \
                  --mode hash:bft:bin        \
                  --hard-min 2               \
                  --soft-min 3               \
                  --share-min 1              \
                  --bloom-size 100000        \
                  --bf-format howdesbt       \
                  --cpr
  • --hard-min 2 -> All k-mers with an abundance >= 2 are kept.
  • --soft-min 3 -> During merging, a k-mer with an abundance less than 3 in a sample is kept only if it is solid in another sample.
  • --share-min 1 -> Keep a non-solid k-mer in a sample if it is solid in one other sample.
  • --bloom-size 100000 -> Requested Bloom filter size. The final size is ROUND_UP(size/nb_parts, 8) * nb_parts.
  • --bf-format howdesbt -> Dump Bloom filters in HowDeSBT format.
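
The final Bloom filter size formula can be checked by hand. The sketch below assumes that size/nb_parts is an integer division and that ROUND_UP(x, 8) rounds x up to the next multiple of 8. The partition counts (4 and 3) are hypothetical values picked for illustration, not values kmtricks necessarily chooses for this data.

```shell
#!/usr/bin/env bash
# final_bloom_size SIZE NB_PARTS
# Computes ROUND_UP(SIZE / NB_PARTS, 8) * NB_PARTS.
# Assumption: SIZE / NB_PARTS is an integer division.
final_bloom_size() {
  local size=$1 nb_parts=$2
  local per_part=$(( size / nb_parts ))
  # Round the per-partition size up to the next multiple of 8.
  local rounded=$(( (per_part + 7) / 8 * 8 ))
  echo $(( rounded * nb_parts ))
}

final_bloom_size 100000 4   # prints 100000 (25000 is already a multiple of 8)
final_bloom_size 100000 3   # prints 100008 (33333 is rounded up to 33336)
```

With 3 partitions the final filter is therefore slightly larger than the requested size.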

If the rescue is not necessary, the parameter --skip-merge can be used to save space and time. In this case, hashes are represented by bit-vectors from the counting stage.
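
To make the rescue concrete, here is a small sketch of the rule as described by the --hard-min, --soft-min and --share-min bullets above. It is an illustration of that description only, not the kmtricks implementation: the `rescue` function, the three-letter "k-mers" and the abundance table are all hypothetical, and the real pipeline in this mode works on hashes rather than k-mer strings.

```shell
#!/usr/bin/env bash
# rescue HARD SOFT SHARE
# Reads lines of the form "<kmer> <abundance in D1> <abundance in D2> ..."
# and prints, per sample, 1 if the k-mer is kept and 0 otherwise.
# A k-mer is solid in a sample when its abundance is >= SOFT.
# A non-solid k-mer with abundance >= HARD is rescued when it is
# solid in at least SHARE other samples.
rescue() {
  awk -v hard="$1" -v soft="$2" -v share="$3" '{
    solid = 0
    for (i = 2; i <= NF; i++) if ($i >= soft) solid++
    line = $1
    for (i = 2; i <= NF; i++) {
      keep = ($i >= soft) || ($i >= hard && solid >= share)
      line = line " " (keep ? 1 : 0)
    }
    print line
  }'
}

# Hypothetical abundances for three k-mers in D1 and D2.
printf 'AAA 5 2\nCCC 2 2\nGGG 1 4\n' | rescue 2 3 1
# AAA 1 1   (rescued in D2 because it is solid in D1)
# CCC 0 0   (solid in no sample, so never rescued)
# GGG 0 1   (abundance 1 in D1 is below --hard-min)
```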

Build HowDeSBT index

kmtricks index --run-dir ./index_example --howde
  • --howde -> Build a determined brief tree, see kmtricks index for other options.

Query index

kmtricks query --run-dir ./index_example --query query.fasta --threshold 0.8 --sort > results.txt
  • --query query.fasta -> a set of queries.
  • --threshold 0.8 -> 80% of the query k-mers must be present in a leaf for it to be considered a match.
  • --sort -> sorted results with additional information.
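
As a rough guide to what the threshold means in practice, a query of length L contains L - k + 1 k-mers. The sketch below computes how many of them must be found for a given percentage. `min_hits` is a hypothetical helper, and rounding the required count up is an assumption; the exact comparison used by kmtricks may differ.

```shell
#!/usr/bin/env bash
# min_hits QUERY_LEN K PERCENT
# Number of k-mers in a query of QUERY_LEN bases, and how many of them
# must be present to reach PERCENT (rounded up; an assumption).
min_hits() {
  local len=$1 k=$2 pct=$3
  local n=$(( len - k + 1 ))          # k-mers in the query
  echo $(( (n * pct + 99) / 100 ))    # ceil(n * pct / 100)
}

min_hits 100 31 80   # prints 56 (a 100 bp query has 70 31-mers)
```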