Count matrix example
Téo Lemane edited this page May 11, 2022 · 4 revisions
This example shows how to build a k-mer count matrix from 2 samples, D1 and D2.
data
├── 1.fasta
├── 2.fasta
└── kmtricks.fof
> cat data/kmtricks.fof
D1: data/1.fasta
D2: data/2.fasta
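For more than a handful of samples, the file of files can be generated rather than typed by hand. The following is a minimal sketch, assuming one FASTA file per sample and the `ID: path` line format shown above; `make_fof` is a hypothetical helper, not part of kmtricks, and the `D<n>` identifiers are simply numbered in glob order.

```shell
# Sketch: print one "ID: path" line per *.fasta file in a directory.
# make_fof is a hypothetical helper name, not a kmtricks command.
make_fof() {
    dir=$1
    i=1
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue      # skip when the glob matches nothing
        printf 'D%d: %s\n' "$i" "$f"
        i=$((i + 1))
    done
}
```

Usage would be `make_fof data > data/kmtricks.fof`. Note that the glob expands in lexicographic order, so `10.fasta` would be numbered before `2.fasta`.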
kmtricks pipeline --file ./data/kmtricks.fof \
--run-dir ./matrix_example \
--mode kmer:count:bin \
--hard-min 1 \
--soft-min 3 \
--share-min 1 \
--cpr
- --hard-min 1 -> During partition counting, all k-mers are kept.
- --soft-min 3 -> During merging, a k-mer whose abundance in a sample is less than 3 is kept only if it is solid (abundance of at least 3) in other samples.
- --share-min 1 -> A non-solid k-mer is kept in a sample if it is solid in at least one other sample.
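To make the interaction between --soft-min and --share-min concrete, the rule described above can be sketched on a toy text matrix. This is only an illustration of the rule as worded here, not kmtricks' implementation; `apply_rescue` is a hypothetical helper, and the input is assumed to be rows of the form `kmer c1 c2 ...`.

```shell
# Sketch of the rescue rule: a count is kept if it is solid (>= soft-min),
# or if it is non-zero and the k-mer is solid in at least share-min OTHER samples.
# apply_rescue is a hypothetical helper; $1 = soft-min, $2 = share-min.
apply_rescue() {
    awk -v soft="$1" -v share="$2" '
    {
        solid = 0
        for (i = 2; i <= NF; i++) if ($i >= soft) solid++   # samples where the k-mer is solid
        out = $1
        for (i = 2; i <= NF; i++) {
            others = solid - ($i >= soft ? 1 : 0)            # solid samples other than this one
            keep = ($i >= soft) || ($i > 0 && others >= share)
            out = out " " (keep ? $i : 0)                    # discarded counts are shown as 0
        }
        print out
    }'
}
```

For instance, `printf 'ACGT 5 1\nTTGA 2 1\nGGCA 1 4\n' | apply_rescue 3 1` prints `ACGT 5 1`, `TTGA 0 0` and `GGCA 1 4`: the count of 1 is rescued in the first and third rows because the k-mer is solid in the other sample, while the second row has no solid count to rescue it.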
Each sub-matrix can then be processed with the kmtricks API, or all sub-matrices can be aggregated using kmtricks aggregate:
kmtricks aggregate --run-dir ./matrix_example --matrix kmer --format text --cpr-in > final_matrix.txt # concatenate the sorted partitions
kmtricks aggregate --run-dir ./matrix_example --matrix kmer --format text --cpr-in --sorted > final_matrix.txt # sort the whole matrix
Sub-matrices can also be dumped directly as text by replacing --mode kmer:count:bin with --mode kmer:count:text.
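Once a text matrix such as final_matrix.txt is available, a quick sanity check is to count the rows and, per sample, the k-mers with a non-zero count. The sketch below assumes each row is a k-mer followed by one whitespace-separated count per sample; that layout is an assumption to verify against the actual output, and `matrix_summary` is a hypothetical helper.

```shell
# Sketch: summarize a text count matrix.
# Assumed row layout: "<kmer> <count_1> ... <count_n>" (whitespace-separated).
# matrix_summary is a hypothetical helper name, not a kmtricks command.
matrix_summary() {
    awk '
    {
        rows++
        for (i = 2; i <= NF; i++) if ($i > 0) present[i - 1]++   # non-zero counts per sample
        if (NF > maxnf) maxnf = NF
    }
    END {
        printf "kmers\t%d\n", rows
        for (s = 1; s < maxnf; s++) printf "sample_%d\t%d\n", s, present[s]
    }' "$1"
}
```

It would be called as `matrix_summary final_matrix.txt`; samples are reported by column position, in the order they appear in the matrix.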