ex-294 (jebene) moved each command desc into a separate .rst file,
modified toctree, edited other .rst files

Showing 8 changed files with 408 additions and 406 deletions.

.. _expand-command:

Expand
======
The expand command explodes a VCF file into a tab-separated file. It is not
caller-dependent and will work with any VCF file.

.. figure:: images/expand_columns.jpg

**Expanding Columns:** *The INFO column and sample-specific FORMAT tags from
the input VCF file are separated into distinct columns in the output file.*

Usage
-----
``usage: jacquard expand <input_file> <output_file> [OPTIONS]``

*positional arguments:*

+--------+----------------------------------------------+
| input  | | A VCF file. Other file types are ignored.  |
+--------+----------------------------------------------+
| output | | A tab-delimited text file                  |
+--------+----------------------------------------------+

*optional arguments:*

+----------------------------------+---------------------------------------------+
| -s, --selected_columns_file FILE | | File containing an ordered list of column |
|                                  |   names to be included in the output file;  |
|                                  |   column names can include regular          |
|                                  |   expressions                               |
+----------------------------------+---------------------------------------------+

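
The ``--selected_columns_file`` option takes an ordered list of column names
that may be regular expressions. The sketch below shows one way such a list
could be applied to the expanded column names. It is an illustration, not
Jacquard's implementation: the helper ``select_columns`` is hypothetical, and
it assumes one pattern per line, whole-name matching, and an output order that
follows the pattern order.

.. code-block:: python

    import re
    from typing import Iterable, List


    def select_columns(patterns: Iterable[str], columns: List[str]) -> List[str]:
        """Return the columns matched by each pattern, in pattern order.

        Hypothetical helper (not part of Jacquard). Assumptions: each pattern is
        a regular expression that must match an entire column name, blank lines
        are skipped, and a column matched by an earlier pattern is not repeated.
        """
        selected: List[str] = []
        for raw_pattern in patterns:
            pattern = raw_pattern.strip()
            if not pattern:
                continue  # ignore blank lines in the selection file
            regex = re.compile(pattern)
            for column in columns:
                if regex.fullmatch(column) and column not in selected:
                    selected.append(column)
        return selected


    if __name__ == "__main__":
        columns = ["CHROM", "POS", "REF", "ALT", "DP|SAMPLE1", "DP|SAMPLE2"]
        # "|" is a regex metacharacter, so it is escaped in the pattern.
        print(select_columns(["CHROM", "POS", r"DP\|.*"], columns))
        # ['CHROM', 'POS', 'DP|SAMPLE1', 'DP|SAMPLE2']

Because expanded FORMAT column names contain a literal ``|``, a pattern that
targets them has to escape it as ``\|``.
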
Description
-----------
The expand command converts a VCF file into a tab-delimited table with one row
per variant record. This format is better suited than VCF for analysis and
visualization in R, pandas, Excel, or other third-party applications.

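
As an illustration of working with the tabular output, the sketch below loads
an expanded file with Python's standard ``csv`` module and keeps the rows whose
depth in one sample meets a threshold. The column name ``DP|SAMPLE1``, the
sample data, the helper ``rows_with_min_depth``, and the treatment of empty or
``.`` cells as missing values are assumptions made for this example.

.. code-block:: python

    import csv
    import io
    from typing import Dict, List, TextIO

    # Made-up excerpt of an expanded file: a header row, then one row per variant.
    EXPANDED = (
        "CHROM\tPOS\tREF\tALT\tDP|SAMPLE1\n"
        "1\t100\tA\tT\t35\n"
        "1\t200\tG\tC\t8\n"
        "2\t300\tC\tG\t.\n"
    )


    def rows_with_min_depth(handle: TextIO, column: str,
                            minimum: int) -> List[Dict[str, str]]:
        """Return rows whose integer value in `column` is at least `minimum`.

        Empty cells and "." are treated as missing and skipped (an assumption).
        """
        reader = csv.DictReader(handle, delimiter="\t")
        kept = []
        for row in reader:
            value = row[column]
            if value in ("", "."):
                continue
            if int(value) >= minimum:
                kept.append(row)
        return kept


    if __name__ == "__main__":
        deep = rows_with_min_depth(io.StringIO(EXPANDED), "DP|SAMPLE1", 20)
        print([(row["CHROM"], row["POS"]) for row in deep])  # [('1', '100')]

When reading a real file instead of a string, open it with ``newline=""``, as
the ``csv`` module documentation recommends.
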
.. figure:: images/expand_tabular.jpg

**Tabular Format of Jacquard Output:** *Jacquard transforms the dense VCF
format into a tabular format.*

The fixed fields (CHROM, POS, ID, REF, ALT, QUAL, and FILTER) are copied
directly from the input VCF file. Based on the metaheaders (the ``##INFO`` and
``##FORMAT`` header lines), each field in the INFO column is expanded into a
separate column named after its tag ID. Likewise, each FORMAT tag is expanded
into a set of columns, one per sample, named
``<FORMAT tag ID>|<sample column name>``. By default, all INFO fields and
FORMAT tags are expanded; to select specific INFO fields and FORMAT tags, use
the ``--selected_columns_file`` option.

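
The column layout described above can be sketched in a few lines of Python.
This is a simplified illustration, not Jacquard's code: the function names are
hypothetical, the record is split by hand rather than with a VCF library, the
tag IDs are assumed to come from the ``##INFO`` and ``##FORMAT`` metaheaders,
FORMAT columns are assumed to be grouped by tag, and an INFO flag (a tag with
no ``=value``) is recorded as its own tag ID.

.. code-block:: python

    from typing import Dict, List

    FIXED_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


    def expanded_header(info_ids: List[str], format_ids: List[str],
                        samples: List[str]) -> List[str]:
        """Fixed fields, one column per INFO tag, then one per FORMAT tag per sample."""
        header = list(FIXED_FIELDS) + list(info_ids)
        for tag in format_ids:
            for sample in samples:
                header.append(f"{tag}|{sample}")
        return header


    def expand_line(line: str, info_ids: List[str], format_ids: List[str],
                    samples: List[str]) -> Dict[str, str]:
        """Map one VCF data line onto the expanded column names ("" if absent)."""
        fields = line.rstrip("\n").split("\t")
        row = dict(zip(FIXED_FIELDS, fields[:7]))

        # INFO looks like "DP=30;SOMATIC"; flags carry no value.
        info: Dict[str, str] = {}
        if len(fields) > 7 and fields[7] != ".":
            for entry in fields[7].split(";"):
                key, sep, value = entry.partition("=")
                info[key] = value if sep else key  # assumption: flag -> its own ID
        for tag in info_ids:
            row[tag] = info.get(tag, "")

        # FORMAT keys (VCF column 9) pair up with each sample's colon-separated values.
        format_keys = fields[8].split(":") if len(fields) > 8 else []
        for sample, sample_field in zip(samples, fields[9:]):
            values = dict(zip(format_keys, sample_field.split(":")))
            for tag in format_ids:
                row[f"{tag}|{sample}"] = values.get(tag, "")
        return row


    if __name__ == "__main__":
        info_ids, format_ids, samples = ["DP", "SOMATIC"], ["GT", "AF"], ["S1", "S2"]
        header = expanded_header(info_ids, format_ids, samples)
        line = "1\t100\t.\tA\tT\t.\tPASS\tDP=30;SOMATIC\tGT:AF\t0/1:0.4\t0/0:0\n"
        row = expand_line(line, info_ids, format_ids, samples)
        print("\t".join(header))
        print("\t".join(row[column] for column in header))
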
The expand command also writes a tab-delimited glossary file built from the
metaheaders of the input VCF. The glossary lists each INFO and FORMAT tag ID
together with the description from its metaheader.

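
A glossary of this kind can be derived from the metaheaders with a regular
expression. The sketch below is illustrative only: the ``glossary_rows`` and
``glossary_tsv`` helpers and the three-column layout (field type, tag ID,
description) are hypothetical, and the pattern assumes that ``ID`` is the first
key in each ``##INFO`` or ``##FORMAT`` line, as in typical VCF headers.

.. code-block:: python

    import re
    from typing import Iterable, List, Tuple

    # Matches lines such as:
    #   ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">
    # Assumes ID is the first key; the description may contain escaped quotes.
    METAHEADER = re.compile(
        r'^##(INFO|FORMAT)=<ID=([^,>]+).*?Description="((?:[^"\\]|\\.)*)"'
    )


    def glossary_rows(header_lines: Iterable[str]) -> List[Tuple[str, str, str]]:
        """Return (field type, tag ID, description) for each INFO/FORMAT metaheader."""
        rows = []
        for line in header_lines:
            match = METAHEADER.match(line)
            if match:
                field_type, tag_id, description = match.groups()
                rows.append((field_type, tag_id, description))
        return rows


    def glossary_tsv(header_lines: Iterable[str]) -> str:
        """Render the glossary rows as tab-delimited text, one row per line."""
        return "".join("\t".join(row) + "\n" for row in glossary_rows(header_lines))


    if __name__ == "__main__":
        header = [
            "##fileformat=VCFv4.1",
            '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">',
            '##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        ]
        print(glossary_tsv(header), end="")
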
.. figure:: images/expand_excel.jpg

**Pattern Identification:** *The expanded output file can be visualized in a
third-party tool to identify patterns in the dataset.*

.. _merge-command:

Merge
=====
The merge command integrates a directory of VCFs into a single VCF. It is
caller-agnostic and can be used on any set of VCF files.

.. figure:: images/merge_join_step.jpg

**The Merging Process:** *Sample-specific information is grouped together
for each patient.*

Usage
-----
``usage: jacquard merge <input_dir> <output_file> [OPTIONS]``

*positional arguments:*

+--------+------------------------------------------------------------------+
| input  | | Directory containing VCF files. Other file types are ignored.  |
+--------+------------------------------------------------------------------+
| output | | A single VCF file                                              |
+--------+------------------------------------------------------------------+

*optional arguments:*

+-----------------------+------------------------------------------------------+
| --include_format_tags | | Comma-separated user-defined list of regular       |
|                       |   expressions for FORMAT tags to be included         |
|                       |   in the output                                      |
+-----------------------+------------------------------------------------------+
| --include_cells       | | valid: Only include valid variants                 |
|                       | | all: Include all variants                          |
|                       | | passed: Only include variants that passed their    |
|                       |   respective filter                                  |
|                       | | somatic: Only include somatic variants             |
+-----------------------+------------------------------------------------------+
| --include_rows        | | at_least_one_somatic: Include all variants at loci |
|                       |   where at least one variant was somatic             |
|                       | | all_somatic: Include all variants at loci where    |
|                       |   all variants were somatic                          |
|                       | | at_least_one_passed: Include all variants at loci  |
|                       |   where at least one variant passed                  |
|                       | | all_passed: Include all variants at loci where     |
|                       |   all variants passed                                |
|                       | | all: Include all variants at all loci              |
+-----------------------+------------------------------------------------------+

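
The ``--include_format_tags`` value is a comma-separated list of regular
expressions. A minimal sketch of how such a value could be applied is shown
below. The helper name is hypothetical, the default ``JQ_.*`` corresponds to
the Jacquard-translated tags that merge keeps by default, whole-name matching
is an assumption, and the tag names in the example are made up.

.. code-block:: python

    import re
    from typing import List


    def keep_format_tags(tags: List[str], option_value: str = "JQ_.*") -> List[str]:
        """Keep FORMAT tags whose whole name matches any comma-separated pattern.

        Hypothetical helper. Splitting on "," means an individual pattern cannot
        itself contain a comma (for example, a "{1,3}" quantifier).
        """
        patterns = [re.compile(p.strip()) for p in option_value.split(",") if p.strip()]
        return [tag for tag in tags if any(p.fullmatch(tag) for p in patterns)]


    if __name__ == "__main__":
        tags = ["GT", "AF", "JQ_DP_MT", "JQ_AF_SK"]
        print(keep_format_tags(tags))            # ['JQ_DP_MT', 'JQ_AF_SK']
        print(keep_format_tags(tags, "GT, AF"))  # ['GT', 'AF']
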
Description
-----------
Conceptually, merge has four basic steps, each described in detail below:

#. Integrate matching loci from different VCFs into common rows
#. Combine matching samples from different VCFs into common columns
#. Filter tag values and rows
#. Assemble the subset of FORMAT tags to be included in the final VCF

Integrate matching loci
^^^^^^^^^^^^^^^^^^^^^^^
Jacquard first builds the set of all distinct loci (CHROM, POS, REF, and ALT)
across all input VCFs. For each locus, the FORMAT tags and values from every
input VCF are merged into a single row. Record-level fields in the inputs (such
as FILTER and INFO) are ignored.

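
The locus-integration step amounts to grouping records by the key (CHROM, POS,
REF, ALT). The sketch below assumes each input VCF has already been parsed into
simple dictionaries (a hypothetical record layout, not Jacquard's data model)
and carries forward only the per-sample FORMAT data, dropping record-level
fields as described above.

.. code-block:: python

    from typing import Dict, List, Tuple

    Locus = Tuple[str, int, str, str]


    def merge_loci(records_by_file: Dict[str, List[dict]]) -> Dict[Locus, Dict[str, dict]]:
        """Group FORMAT data from every input file by (CHROM, POS, REF, ALT).

        Each record is assumed to look like:
            {"CHROM": "1", "POS": "100", "REF": "A", "ALT": "T",
             "FILTER": "PASS", "INFO": "...",
             "samples": {"SAMPLE1": {"DP": "10"}}}
        Only "samples" is carried forward; FILTER, INFO, etc. are ignored.
        """
        merged: Dict[Locus, Dict[str, dict]] = {}
        for filename, records in records_by_file.items():
            for record in records:
                locus = (record["CHROM"], int(record["POS"]), record["REF"], record["ALT"])
                merged.setdefault(locus, {})[filename] = record["samples"]
        # Sort by locus; CHROM is compared as text here, which is a simplification
        # (for example, "10" sorts before "2").
        return dict(sorted(merged.items()))


    if __name__ == "__main__":
        inputs = {
            "case_A.strelka.vcf": [
                {"CHROM": "1", "POS": "100", "REF": "A", "ALT": "T", "FILTER": "PASS",
                 "samples": {"SAMPLE1": {"DP": "10"}}},
            ],
            "case_A.mutect.vcf": [
                {"CHROM": "1", "POS": "100", "REF": "A", "ALT": "T",
                 "samples": {"SAMPLE1": {"AF": "0.3"}}},
                {"CHROM": "1", "POS": "50", "REF": "G", "ALT": "C",
                 "samples": {"SAMPLE1": {"AF": "0.1"}}},
            ],
        }
        for locus, per_file in merge_loci(inputs).items():
            print(locus, sorted(per_file))
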
MERGE_LOCI_IMAGE_HERE

Combine matching samples
^^^^^^^^^^^^^^^^^^^^^^^^
An individual sample may be called by more than one variant caller, so the
input directory can contain several VCFs for the same sample. When merging,
Jacquard combines results for the same sample into a single column. Merged
sample names are constructed by concatenating the filename prefix (for example,
``case_A`` in ``case_A.strelka.vcf``) and the sample name from the VCF column
header.

+--------------------+-----------------------------------+---------------------+
| Filename           | VCF Column header                 | Merged sample names |
+====================+===================================+=====================+
| case_A.strelka.vcf | #CHROM ... FORMAT SAMPLE1 SAMPLE2 | | case_A|SAMPLE1    |
|                    |                                   | | case_A|SAMPLE2    |
+--------------------+-----------------------------------+---------------------+
| case_A.mutect.vcf  | #CHROM ... FORMAT SAMPLE1 SAMPLE2 | | case_A|SAMPLE1    |
|                    |                                   | | case_A|SAMPLE2    |
+--------------------+-----------------------------------+---------------------+
| case_B.strelka.vcf | #CHROM ... FORMAT SAMPLE3 SAMPLE4 | | case_B|SAMPLE3    |
|                    |                                   | | case_B|SAMPLE4    |
+--------------------+-----------------------------------+---------------------+
| case_B.mutect.vcf  | #CHROM ... FORMAT SAMPLE3 SAMPLE4 | | case_B|SAMPLE3    |
|                    |                                   | | case_B|SAMPLE4    |
+--------------------+-----------------------------------+---------------------+

Given the input VCFs above, the resulting merged VCF will have four sample
columns: ``case_A|SAMPLE1``, ``case_A|SAMPLE2``, ``case_B|SAMPLE3``, and
``case_B|SAMPLE4``.

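
The naming rule and the column combination can be sketched as follows. The
helpers are hypothetical; they assume the filename prefix is everything before
the first ``.`` (which reproduces the example above) and use ``|`` as the
separator, matching the merged sample names listed above. Tag-name collisions
between callers are not handled here: with ``dict.update``, a later caller's
value would silently overwrite an earlier one.

.. code-block:: python

    from typing import Dict

    Tags = Dict[str, str]


    def merged_sample_name(filename: str, sample: str) -> str:
        """Build a merged name, e.g. case_A.strelka.vcf + SAMPLE1 -> case_A|SAMPLE1."""
        prefix = filename.split(".", 1)[0]  # assumed rule: text before the first "."
        return f"{prefix}|{sample}"


    def combine_samples(per_file: Dict[str, Dict[str, Tags]]) -> Dict[str, Tags]:
        """Fold {filename: {sample: tags}} for one locus into {merged name: tags}."""
        combined: Dict[str, Tags] = {}
        for filename, samples in per_file.items():
            for sample, tags in samples.items():
                # Same filename prefix + same sample name -> same output column.
                combined.setdefault(merged_sample_name(filename, sample), {}).update(tags)
        return combined


    if __name__ == "__main__":
        locus_data = {
            "case_A.strelka.vcf": {"SAMPLE1": {"JQ_DP_SK": "10"}},
            "case_A.mutect.vcf": {"SAMPLE1": {"JQ_AF_MT": "0.3"}},
        }
        print(combine_samples(locus_data))
        # {'case_A|SAMPLE1': {'JQ_DP_SK': '10', 'JQ_AF_MT': '0.3'}}
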
Filter tag values and rows
^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, the merged VCF contains only Jacquard-translated FORMAT tags
(``JQ_.*``) and includes all variants with valid syntax at loci where at least
one variant was somatic. The filtered output therefore has fewer rows than an
unfiltered merge would, but higher-quality data.

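
The ``--include_cells`` and ``--include_rows`` choices listed in the usage
table can be expressed as small predicates. In this sketch each variant call
at a locus is a dictionary with hypothetical boolean ``valid``, ``passed``, and
``somatic`` keys; how Jacquard derives those flags is not covered. The defaults
mirror the behavior described above, and evaluating the row rule before any
cells are removed is an assumption.

.. code-block:: python

    from typing import Callable, Dict, List

    Call = Dict[str, bool]

    # --include_rows: decide whether a locus (all calls on that row) is kept.
    ROW_RULES: Dict[str, Callable[[List[Call]], bool]] = {
        "all": lambda calls: True,
        "at_least_one_somatic": lambda calls: any(c["somatic"] for c in calls),
        "all_somatic": lambda calls: all(c["somatic"] for c in calls),
        "at_least_one_passed": lambda calls: any(c["passed"] for c in calls),
        "all_passed": lambda calls: all(c["passed"] for c in calls),
    }

    # --include_cells: decide whether an individual call (cell) is kept.
    CELL_RULES: Dict[str, Callable[[Call], bool]] = {
        "all": lambda call: True,
        "valid": lambda call: call["valid"],
        "passed": lambda call: call["passed"],
        "somatic": lambda call: call["somatic"],
    }


    def filter_locus(calls: List[Call], include_cells: str = "valid",
                     include_rows: str = "at_least_one_somatic") -> List[Call]:
        """Return the calls kept at one locus; an empty list drops the row."""
        if not ROW_RULES[include_rows](calls):
            return []
        return [call for call in calls if CELL_RULES[include_cells](call)]


    if __name__ == "__main__":
        calls = [
            {"valid": True, "passed": True, "somatic": False},
            {"valid": True, "passed": False, "somatic": True},
            {"valid": False, "passed": False, "somatic": False},
        ]
        print(len(filter_locus(calls)))                        # 2
        print(filter_locus(calls, include_rows="all_somatic"))  # []
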
Although most variant callers have their own distinct set of FORMAT tags, some
tag names are shared across callers. When FORMAT tag names collide, merge adds
a prefix (for example, ``JQ1_<original_tag>``) to disambiguate them.

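
One way to apply such a prefix is to count how many inputs define each tag and
rename only the shared ones. The numbering scheme below (``JQ1_`` for the first
input, ``JQ2_`` for the second, and so on) and the helper name are assumptions
for illustration; the text above gives ``JQ1_<original_tag>`` only as an
example.

.. code-block:: python

    from collections import Counter
    from typing import Dict, List


    def disambiguate_tags(tags_by_input: List[List[str]]) -> List[Dict[str, str]]:
        """Map each input's FORMAT tags to output names, prefixing collisions.

        tags_by_input holds one list of tag names per input VCF, in a fixed order.
        A tag defined by more than one input becomes JQ<n>_<tag>, where n is the
        1-based position of the input (an assumed numbering scheme).
        """
        counts = Counter(tag for tags in tags_by_input for tag in set(tags))
        return [
            {tag: f"JQ{n}_{tag}" if counts[tag] > 1 else tag for tag in tags}
            for n, tags in enumerate(tags_by_input, start=1)
        ]


    if __name__ == "__main__":
        print(disambiguate_tags([["GT", "DP"], ["GT", "AF"]]))
        # [{'GT': 'JQ1_GT', 'DP': 'DP'}, {'GT': 'JQ2_GT', 'AF': 'AF'}]
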
.. figure:: images/merge_filter_step.jpg

**The Filtering Process:** *Rows and specific cells in the VCF files are
filtered based on the command-line options.*

After filtering, the merge command combines all of the input VCFs into a single
merged VCF for downstream analysis.

The resulting VCF contains the distinct set of all loci (CHROM, POS, REF, and
ALT) and samples from the input files that pass the filters. Because every
locus from every input VCF becomes a row, the merged file is typically longer
than any single input. Because sample columns are combined for each patient
and carry tags from several callers, the file is also wider.

.. note:: Rather than emitting one set of sample columns per caller, merge
          emits one column per patient sample. For each sample, merge joins
          the corresponding columns from every caller into a single column.
          Grouping each patient's sample-specific information this way
          simplifies downstream analysis.

Assemble the subset of FORMAT tags | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
TODO | ||
|
.. _summarize-command:

Summarize
=========
The summarize command adds INFO fields and FORMAT tags that aggregate the
per-caller variant data in a merged VCF. It works only on VCF files that
Jacquard has already translated (that is, files that contain Jacquard ``JQ_``
tags).

.. figure:: images/summarize.jpg

**Summarizing Format Tags:** *The Jacquard-translated format tags from
each caller are aggregated and processed together to create consensus format
tags.*

Usage
-----
``usage: jacquard summarize <input_file> <output_file>``

*positional arguments:*

+--------+------------------------------------------------------+
| input  | | Jacquard-merged VCF file (or any VCF with Jacquard |
|        |   tags, for example ``JQ_SOM_MT``)                   |
+--------+------------------------------------------------------+
| output | | A single VCF file                                  |
+--------+------------------------------------------------------+

Description
-----------
The summarize command uses the Jacquard-specific tags to aggregate information
from all callers, providing a summary-level view of each variant. Summary
fields, such as averages across callers, make it easier to judge which
variants are most likely to be real.

The summarized format tags are prefixed with ``JQ_SUMMARY``.
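
As an illustration of a summary field, the sketch below averages one metric
across callers for a single sample. The per-caller tag pattern
``JQ_<metric>_<caller>`` is inferred from the ``JQ_SOM_MT`` example above, and
both the helper and the output tag name ``JQ_SUMMARY_<metric>_AVERAGE`` are
hypothetical.

.. code-block:: python

    import re
    from statistics import mean
    from typing import Dict


    def summarize_metric(sample_tags: Dict[str, str], metric: str) -> Dict[str, str]:
        """Average every per-caller tag named JQ_<metric>_<CALLER> in one sample.

        Missing values ("" or ".") are skipped; if none remain, "." is emitted.
        The output tag name is hypothetical.
        """
        per_caller = re.compile(rf"JQ_{re.escape(metric)}_[A-Z0-9]+")
        values = [
            float(value)
            for tag, value in sample_tags.items()
            if per_caller.fullmatch(tag) and value not in ("", ".")
        ]
        summary_tag = f"JQ_SUMMARY_{metric}_AVERAGE"
        return {summary_tag: f"{mean(values):.4g}" if values else "."}


    if __name__ == "__main__":
        tags = {"JQ_AF_MT": "0.2", "JQ_AF_SK": "0.4", "JQ_DP_MT": "30", "GT": "0/1"}
        print(summarize_metric(tags, "AF"))  # {'JQ_SUMMARY_AF_AVERAGE': '0.3'}
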