My (Max's?) Minimal Fasta Toolkit

Minimal, simple fasta tools.

Each subcommand is self-contained in the ./src/fasta directory and shares similar boilerplate code for file handling. If you would like to contribute or add your own subcommand, please do.

Usage

Run mmft to list the subcommands, or mmft <subcommand> -h to show the usage of a specific subcommand.

Commands are added only as and when I need them. If you like what you see, please feel free to contribute a PR with your favourite subcommand.

Calculations

mmft len <fasta(s)> or cat <fasta(s)> | mmft len. Calculates the length of each fasta record.
mmft gc <fasta(s)> or cat <fasta(s)> | mmft gc. Calculates the GC content of each fasta record.
mmft n50 <fasta(s)> or cat <fasta(s)> | mmft n50. Calculates the n50 of each fasta file (or of the combined stream of fasta files when piped).
mmft num <fasta(s)> or cat <fasta(s)> | mmft num. Calculates the number of sequences and the total number of base pairs in the fasta file input(s).
mmft revcomp <fasta(s)> or cat <fasta(s)> | mmft revcomp. Reverse complements each record in the fasta file.
mmft min <fasta(s)> or cat <fasta(s)> | mmft min. Returns the lexicographically minimal rotation of each sequence, taking the reverse complement into account too.
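
The calculations above can be sketched in a few lines of Rust. This is a minimal illustration written for this README, not the code in ./src/fasta: the function names (gc_content, n50, reverse_complement, min_rotation, canonical_rotation) are hypothetical, GC content is assumed to be reported as a fraction, n50 uses the usual "at least half of all bases" definition, and the rotation search is the naive quadratic one.

```rust
/// Fraction of G/C bases in a sequence (case-insensitive); 0.0 for empty input.
/// Assumption: GC content is expressed as a fraction, not a percentage.
fn gc_content(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .iter()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    gc as f64 / seq.len() as f64
}

/// N50: the length L such that records of length >= L together
/// cover at least half of all bases. Returns None if there are no bases.
fn n50(lengths: &[usize]) -> Option<usize> {
    let total: usize = lengths.iter().sum();
    if total == 0 {
        return None;
    }
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    let mut running = 0usize;
    for len in sorted {
        running += len;
        if running * 2 >= total {
            return Some(len);
        }
    }
    None
}

/// Reverse complement of a DNA sequence; non-ACGT bytes are passed through unchanged.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            b'a' => b't',
            b'c' => b'g',
            b'g' => b'c',
            b't' => b'a',
            other => other,
        })
        .collect()
}

/// Lexicographically smallest rotation, found by comparing all rotations (O(n^2)).
fn min_rotation(seq: &[u8]) -> Vec<u8> {
    (0..seq.len())
        .map(|i| [&seq[i..], &seq[..i]].concat())
        .min()
        .unwrap_or_default()
}

/// Smaller of the minimal rotations of the sequence and of its reverse complement.
fn canonical_rotation(seq: &[u8]) -> Vec<u8> {
    let forward = min_rotation(seq);
    let reverse = min_rotation(&reverse_complement(seq));
    std::cmp::min(forward, reverse)
}
```

For record lengths 2 through 10 (54 bases in total), the three longest records (10, 9 and 8) already cover 27 bases, so n50 returns 8.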

File manipulations

mmft regex -r "<regex>" <fasta(s)> or cat <fasta> | mmft regex -r "<regex>". Extracts fasta records from one or multiple fasta files with headers matching the regex.
mmft extract -r 1-100 <fasta(s)> or cat <fasta> | mmft extract -r 1-100. Extracts first 100 nucleotides from each fasta record. You can of course choose any range, using a dash to separate the numbers.
mmft filter -f <file> <fasta(s)>. Supply a text file of one ID per line and filter will extract the corresponding fasta records.
mmft merge <fasta(s)>. Merges multiple fasta files together into the same record.
mmft sample <fasta(s)> -n <N>. Randomly samples a fasta file (or stream of fasta files) down to a specified number of records.
mmft split (-d <DIR>) -n <N> <fasta(s)>. Splits a fasta into equal chunks; if the number of records is not perfectly divisible by the chunk number, the last chunk holds the remainder.
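
To make the range syntax of extract concrete, here is a minimal sketch of how a range such as 1-100 could be parsed and applied. It assumes the range is 1-based and inclusive (so 1-100 yields the first 100 nucleotides, as described above) and that ranges running past the end of a record are clamped; parse_range and extract_range are hypothetical names, not functions from mmft.

```rust
/// Parse a "start-end" range such as "1-100" into 1-based inclusive bounds.
/// Returns None for malformed input, a start of 0, or start > end.
fn parse_range(s: &str) -> Option<(usize, usize)> {
    let (a, b) = s.split_once('-')?;
    let start: usize = a.trim().parse().ok()?;
    let end: usize = b.trim().parse().ok()?;
    if start >= 1 && start <= end {
        Some((start, end))
    } else {
        None
    }
}

/// Slice a sequence with 1-based inclusive bounds.
/// Assumption: bounds beyond the end of the sequence are clamped rather than rejected.
fn extract_range(seq: &[u8], start: usize, end: usize) -> &[u8] {
    let lo = start.saturating_sub(1).min(seq.len());
    let hi = end.min(seq.len()).max(lo);
    &seq[lo..hi]
}
```

With these definitions, parse_range("1-100") gives Some((1, 100)), and applying that range to a record shorter than 100 nucleotides returns the whole record.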

Be careful when piping into mmft: piped fasta files are not treated separately, but as one continuous stream of fasta records. Hence, while mmft n50 1.fasta 2.fasta shows the n50 of each fasta file separately, cat *.fasta | mmft n50 calculates the n50 of the files combined. In addition, mmft sample loads the entire STDIN into memory, so be careful when piping large files. Some functions don't support piping (filter, merge, sample, split).
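
The reason piped files blur together is that a stream carries no file boundaries, only records. The sketch below is an illustrative reader, not mmft's actual parser: read_records is a hypothetical name, and it collects every record into memory, which is also why reading a large stream this way can be costly.

```rust
use std::io::{self, BufRead};

/// Read every fasta record from a reader as (header, sequence) pairs.
/// Sequence lines are joined; lines before the first '>' header are ignored.
fn read_records<R: BufRead>(reader: R) -> io::Result<Vec<(String, Vec<u8>)>> {
    let mut records: Vec<(String, Vec<u8>)> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            // A new header starts a new record, whichever file it came from.
            records.push((header.to_string(), Vec::new()));
        } else if let Some((_, seq)) = records.last_mut() {
            seq.extend_from_slice(line.as_bytes());
        }
    }
    Ok(records)
}
```

Passing io::stdin().lock() as the reader would consume whatever cat *.fasta sends down the pipe as a single list of records, so any statistic computed from it describes the files combined.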

All output is printed to STDOUT.

TODOs

I'll add features as and when I have time, or when they are of use. Maybe:

Simple pattern matching, returning positions.
Potential ORFs
Any kmer stuff?
Testing
Better documentation
