Count matrix example

Téo Lemane edited this page May 11, 2022 · 4 revisions

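This example shows how to build a k-mer count matrix from two samples, D1 and D2. The input directory contains the two FASTA files and a file of files (kmtricks.fof) that maps each sample ID to its path.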

data
├── 1.fasta
├── 2.fasta
└── kmtricks.fof
> cat data/kmtricks.fof
D1: data/1.fasta
D2: data/2.fasta
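
For more than a handful of samples, the file of files can be generated rather than written by hand. The sketch below only reproduces the `<ID>: <path>` layout shown above; the helper name `format_fof` is hypothetical and is not part of kmtricks.

```python
from typing import Dict


def format_fof(samples: Dict[str, str]) -> str:
    """Return one '<ID>: <path>' line per sample, in insertion order.

    Hypothetical helper: it mirrors the layout of data/kmtricks.fof
    shown above and nothing more.
    """
    return "".join(f"{sample_id}: {path}\n" for sample_id, path in samples.items())


fof_text = format_fof({"D1": "data/1.fasta", "D2": "data/2.fasta"})
print(fof_text, end="")
```

Writing `fof_text` to `data/kmtricks.fof` would give the same file as the one displayed by `cat` above.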

Build matrix

kmtricks pipeline --file ./data/kmtricks.fof \
                  --run-dir ./matrix_example \
                  --mode kmer:count:bin \
                  --hard-min 1 \
                  --soft-min 3 \
                  --share-min 1 \
                  --cpr
  • --hard-min 1 -> During partition counting, all k-mers are kept (minimum abundance of 1).
  • --soft-min 3 -> During merging, a k-mer whose abundance in a sample is below 3 is non-solid in that sample and is kept only if it is solid in another sample.
  • --share-min 1 -> A non-solid k-mer is kept in a sample if it is solid in at least one other sample.
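
The interaction between --soft-min and --share-min can be illustrated on a single matrix row. This is a minimal sketch of the rule as described in the list above, not kmtricks's actual implementation; the function name `apply_soft_min` is hypothetical, and setting a discarded count to 0 is an assumption made for illustration.

```python
from typing import List


def apply_soft_min(counts: List[int], soft_min: int = 3, share_min: int = 1) -> List[int]:
    """Filter the counts of one k-mer across samples (illustrative only).

    A count is solid if it is >= soft_min. A non-solid count is kept
    ("rescued") when the k-mer is solid in at least share_min other
    samples; otherwise it is replaced by 0 (assumption of this sketch).
    """
    solid = [c >= soft_min for c in counts]
    n_solid = sum(solid)
    kept = []
    for count, is_solid in zip(counts, solid):
        # For a non-solid sample, n_solid necessarily counts other samples only.
        rescued = (not is_solid) and n_solid >= share_min
        kept.append(count if is_solid or rescued else 0)
    return kept


print(apply_soft_min([5, 1]))  # [5, 1]: the count of 1 is rescued by the solid 5
print(apply_soft_min([2, 1]))  # [0, 0]: the k-mer is solid in no sample
```

With the parameters used in this example, a low-abundance k-mer in D1 therefore survives only if its abundance in D2 is at least 3, and vice versa.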

Exploit matrix

Each sub-matrix can then be processed with the kmtricks API, or the sub-matrices can be aggregated into a single matrix with kmtricks aggregate:

kmtricks aggregate --run-dir ./matrix_example --matrix kmer --format text --cpr-in > final_matrix.txt # concatenates the sorted partitions
kmtricks aggregate --run-dir ./matrix_example --matrix kmer --format text --cpr-in --sorted > final_matrix.txt # the whole matrix is sorted

Sub-matrices can also be dumped directly as text by replacing --mode kmer:count:bin with --mode kmer:count:text.
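
The text matrix can then be loaded for downstream analysis. The sketch below assumes that each line holds a k-mer followed by one whitespace-separated count per sample, with columns in the same order as the file of files; this layout is an assumption and should be checked against the actual output. The function name `parse_matrix` and the short k-mers in the example are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_matrix(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each k-mer to its list of per-sample counts.

    Assumed line layout (not confirmed by this page):
    '<kmer> <count_sample_1> <count_sample_2> ...'
    """
    matrix: Dict[str, List[int]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kmer, *counts = fields
        matrix[kmer] = [int(c) for c in counts]
    return matrix


# In-memory stand-in for the contents of final_matrix.txt (made-up values).
example_lines = ["ACGTA 5 1", "CGTAC 3 0", ""]
matrix = parse_matrix(example_lines)
print(matrix["ACGTA"])  # [5, 1]
```

To read the real file, an open file handle can be passed in place of `example_lines`, for instance `parse_matrix(open("final_matrix.txt"))`.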