FASTJ

Structured metadata for your FASTA sequences.

Format

The FASTJ format is a convention for including structured sequence metadata inside a FASTA file while remaining compatible with most software that consumes FASTA.

In FASTJ, metadata is stored as a single-line JSON object in the description field of the typical FASTA `>id description` definition line. FASTA parsers treat everything after the first whitespace in the definition line as the description. (Everything before it, minus the `>` prefix, is the id.)

A simple example of FASTJ:

>specimenA {"date":"2017-05-04", "virus":"flu"}
ATCG…
>specimenB {"date":"2017-05-13", "virus":"flu"}
CGAT…

A missing description should be treated as an empty JSON object. That is, the following two FASTJ sequence records are equivalent:

>seqA
ATCG
>seqA {}
ATCG

That's all!
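The rules above can be sketched in a few lines of Python. This is only an illustration of the format, not the fastj implementation; `parse_defline` is a hypothetical helper name, and rejecting non-object JSON is an assumption about how strict a reader should be.

```python
import json


def parse_defline(line):
    """Split a FASTJ definition line into (id, metadata dict)."""
    if not line.startswith(">"):
        raise ValueError("definition lines must start with '>'")
    # Everything before the first whitespace is the id; the rest is the description.
    parts = line[1:].rstrip("\n").split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1].strip() if len(parts) > 1 else ""
    # A missing description is equivalent to an empty JSON object.
    metadata = json.loads(description) if description else {}
    if not isinstance(metadata, dict):
        # Assumption: anything other than a JSON object is treated as an error.
        raise ValueError("FASTJ metadata must be a JSON object")
    return seq_id, metadata


print(parse_defline('>specimenA {"date":"2017-05-04", "virus":"flu"}'))
print(parse_defline(">seqA") == parse_defline(">seqA {}"))
```

Because the metadata sits entirely in the description field, a plain FASTA parser that ignores descriptions still sees the same ids and sequences.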

Command

The command-line program fastj provides tools to encode and decode FASTJ files and to otherwise work with them.

fastj encode

Converts other formats, such as FASTA with delimited fields in the sequence id or FASTA + TSV/CSV, to FASTJ.

Input is read from the files named with the input flags (such as `--fasta` or `--json` in the examples below), if given; otherwise from stdin.

Output is always FASTJ written to stdout.

Examples:

`fastj encode --fasta file.fasta --delimiter="|" --fields virus date id`

file.fasta

>flu|2017-05-04|specimenA
ATCG…
>flu|2017-05-13|specimenB
CGAT…

output

>specimenA {"date":"2017-05-04", "virus":"flu"}
ATCG…
>specimenB {"date":"2017-05-13", "virus":"flu"}
CGAT…
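The transformation in this example can be approximated in Python as follows. This is a sketch of the idea rather than how fastj itself works; `encode_delimited` is a hypothetical name, and the sketch assumes the whole definition line is the delimited id and that one of the fields is named `id`. The sorted keys and `", "` separator are chosen only to match the sample output above.

```python
import json


def encode_delimited(lines, fields, delimiter="|"):
    """Yield FASTJ lines from FASTA lines whose ids hold delimited fields."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            yield line  # sequence data passes through unchanged
            continue
        values = line[1:].split(delimiter)
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields in {line!r}")
        record = dict(zip(fields, values))
        seq_id = record.pop("id")  # the "id" field becomes the FASTJ id
        metadata = json.dumps(record, sort_keys=True, separators=(", ", ":"))
        yield f">{seq_id} {metadata}"


fasta = [">flu|2017-05-04|specimenA", "ATCG", ">flu|2017-05-13|specimenB", "CGAT"]
for out_line in encode_delimited(fasta, ["virus", "date", "id"]):
    print(out_line)
```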

`fastj encode --fasta file.fasta --metadata file.tsv`

file.fasta

>specimenA
ATCG…
>specimenB
CGAT…

file.tsv

id	virus	date
specimenA	flu	2017-05-04
specimenB	flu	2017-05-13

output

>specimenA {"date":"2017-05-04", "virus":"flu"}
ATCG…
>specimenB {"date":"2017-05-13", "virus":"flu"}
CGAT…
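Joining a FASTA file with a metadata table might look like the sketch below. It is not taken from fastj: `encode_with_metadata` is a hypothetical name, and the sketch assumes a header row with an `id` column, tab-separated values by default (pass `delimiter=","` for CSV), and an empty object for sequences that have no matching row.

```python
import csv
import io
import json


def encode_with_metadata(fasta_lines, metadata_file, delimiter="\t"):
    """Yield FASTJ lines by attaching table rows to FASTA records by id."""
    by_id = {}
    for row in csv.DictReader(metadata_file, delimiter=delimiter):
        row = dict(row)
        by_id[row.pop("id")] = row  # remaining columns become the metadata
    for line in fasta_lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            yield line
            continue
        parts = line[1:].split(None, 1)
        seq_id = parts[0] if parts else ""
        # Assumption: ids without a metadata row get an empty JSON object.
        metadata = by_id.get(seq_id, {})
        yield f">{seq_id} " + json.dumps(metadata, sort_keys=True, separators=(", ", ":"))


table = io.StringIO("id\tvirus\tdate\nspecimenA\tflu\t2017-05-04\n")
for out_line in encode_with_metadata([">specimenA", "ATCG"], table):
    print(out_line)
```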

`fastj encode --json file.json`

file.json (output from fastj decode)

[{ "id": "specimenA", "sequence": "ATCG…", "date": "2017-05-04", "virus": "flu" }
,{ "id": "specimenB", "sequence": "CGAT…", "date": "2017-05-13", "virus": "flu" }
]

output

>specimenA {"date":"2017-05-04", "virus":"flu"}
ATCG…
>specimenB {"date":"2017-05-13", "virus":"flu"}
CGAT…
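Going from the JSON array back to FASTJ is mostly a matter of pulling `id` and `sequence` out of each record. The following is an illustrative sketch under that assumption, with the hypothetical name `encode_json`; it is not fastj's own code.

```python
import json


def encode_json(text):
    """Yield FASTJ lines from a JSON array of sequence records."""
    for record in json.loads(text):
        record = dict(record)  # copy so the parsed input is left untouched
        seq_id = record.pop("id")
        sequence = record.pop("sequence")
        # Every remaining key is treated as metadata.
        yield f">{seq_id} " + json.dumps(record, sort_keys=True, separators=(", ", ":"))
        yield sequence


doc = '[{"id": "specimenA", "sequence": "ATCG", "date": "2017-05-04", "virus": "flu"}]'
for out_line in encode_json(doc):
    print(out_line)
```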

fastj decode

Converts FASTJ sequences to another format.

Input is from the listed files, if any, otherwise stdin.

Output defaults to JSON. Supported output formats are:

json: The top-level value will always be an array, even if there is only one sequence record.
fasta: Plain FASTA with delimited sequence ids constructed from the FASTJ fields.

Examples:

file.fastj for all examples

>specimenA {"date":"2017-05-04", "virus":"flu"}
ATCG…
>specimenB {"date":"2017-05-13", "virus":"flu"}
CGAT…

`fastj decode [file.fastj [file2.fastj […]]]`

[{ "id": "specimenA", "sequence": "ATCG…", "date": "2017-05-04", "virus": "flu" }
,{ "id": "specimenB", "sequence": "CGAT…", "date": "2017-05-13", "virus": "flu" }
]
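A rough Python equivalent of this default JSON output is sketched below. `decode_to_records` is a hypothetical name, not part of fastj; the sketch assumes sequence lines may wrap and should be concatenated, and it does not guard against metadata keys named `id` or `sequence`, which would overwrite the record's own values.

```python
import json


def decode_to_records(lines):
    """Return a list of dicts, one per FASTJ record (always a list)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            # A missing description means empty metadata.
            metadata = json.loads(parts[1]) if len(parts) > 1 else {}
            records.append({"id": parts[0] if parts else "", "sequence": "", **metadata})
        elif records:
            records[-1]["sequence"] += line  # join wrapped sequence lines
    return records


fastj = ['>specimenA {"date":"2017-05-04", "virus":"flu"}', "ATCG"]
print(json.dumps(decode_to_records(fastj)))
```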

`fastj decode --output=fasta --fields virus date id -- [file.fastj [file2.fastj […]]]`

>flu|2017-05-04|specimenA
ATCG…
>flu|2017-05-13|specimenB
CGAT…

`fastj decode --output=fasta --delimiter=/ --fields virus date id -- [file.fastj [file2.fastj […]]]`

>flu/2017-05-04/specimenA
ATCG…
>flu/2017-05-13/specimenB
CGAT…
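The delimited-id output shown in the last two examples can be sketched like this. The function name `decode_to_fasta` is hypothetical, and the behaviour for a field that is absent from a record (here, a `KeyError`) is an assumption, since the format description does not say what fastj does in that case.

```python
import json


def decode_to_fasta(lines, fields, delimiter="|"):
    """Yield plain FASTA lines with ids built from the chosen FASTJ fields."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            yield line  # sequence data passes through unchanged
            continue
        parts = line[1:].split(None, 1)
        record = json.loads(parts[1]) if len(parts) > 1 else {}
        record["id"] = parts[0] if parts else ""  # make the id selectable as a field
        # Assumption: a field missing from the record raises KeyError.
        yield ">" + delimiter.join(str(record[name]) for name in fields)


fastj = ['>specimenA {"date":"2017-05-04", "virus":"flu"}', "ATCG"]
for out_line in decode_to_fasta(fastj, ["virus", "date", "id"], delimiter="/"):
    print(out_line)
```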

fastj index

Tentative.

fastj search

Tentative.
