ex-294 (cgates): update docs
cgates committed Sep 21, 2015
1 parent b1f99ce commit 5f6c459
Showing 13 changed files with 135 additions and 144 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.rst
@@ -3,10 +3,11 @@ Changelog

0.42 (X/X/XXXX)
---------------
- Added docs on readthedocs.
- Improved workflow documentation with example data
- Merge will now disambiguate tag collisions from multiple VCs
- Translate/summarize now support GT tags

- Extended precision to 4 decimal places to support analysis of gene-panels.

0.41 (5/7/2015)
---------------
4 changes: 4 additions & 0 deletions INSTALL.rst
@@ -28,3 +28,7 @@ If you don't have root permissions, you can install locally:

``$ pip install --user jacquard``

(You may need to modify your path to include the Python install dir, e.g.
/Users/<username>/.local/bin.)
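
One way to find the directory to add to your path is to ask Python for its per-user base directory. This is a minimal sketch, assuming a POSIX-style layout where scripts are installed into a ``bin`` subdirectory; ``user_bin_dir`` is a name made up for this example and is not part of Jacquard.

```python
import os
import site


def user_bin_dir():
    """Return the directory where `pip install --user` is expected to put scripts.

    Assumption: POSIX-style layout (<user base>/bin). Windows and some
    macOS framework builds use a different subdirectory.
    """
    return os.path.join(site.getuserbase(), "bin")


if __name__ == "__main__":
    # Print the directory so it can be appended to PATH by hand.
    print(user_bin_dir())
```

If the printed directory is not already on your path, adding it should make the ``jacquard`` command visible to your shell.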


3 changes: 2 additions & 1 deletion TODO.rst
@@ -1,11 +1,12 @@
Future Directions
=================
- Parallelize [translate]
- Improve performance of [merge]
- Add [weave] command to combine [translate, merge, summarize]
- Extend [expand] to parse SnpEff/Annovar annotated results
- Extend [expand] to generate formatted results
- Improve command validation (check source tags, check "shape" of inputs)
- Enable 4.2 VCF support
- Enable 4.2/4.3 VCF support
- Add support for new somatic callers
- Add support for Germline workflows
- Add support for Galaxy integration
46 changes: 31 additions & 15 deletions doc/command_details.rst
@@ -37,25 +37,41 @@ messages are only written to the log file unless logger is initialized as
verbose (in which case debug is also echoed to console).


General usage
^^^^^^^^^^^^^
``usage: jacquard <SUBCOMMAND> [ARGUMENTS] [OPTIONS]``


For help on a specific command:


``jacquard <SUBCOMMAND> --help``


* Jacquard first writes output files to a temporary directory and only copies
the files upon successful completion of each subcommand.
* Error, warning, and info messages are written to console and log file. Debug
messages are only written to the log file (unless --verbose specified).


Input File Conventions
^^^^^^^^^^^^^^^^^^^^^^
Jacquard assumes that the first element of the filename (up to the first dot)
is a patient identifier.

| patientA-113.mutect.vcf
| patientA-113.strelka.snv.vcf
| patientA-113.strelka.indel.vcf
* Jacquard assumes that the first element of the filename (up to the first dot)
is a patient identifier. For example:

This set of three files all have the same patient identifier (patientA-113) to
represent the same tumor-normal pair; these files will be combined into a
single pair of tumor-normals in the merged VCF. See
:ref:`merge <merge-command>` for more details.
* patientA-113.mutect.vcf
* patientA-113.strelka.snv.vcf
* patientA-113.strelka.indel.vcf

This set of three files all have the same patient identifier (patientA-113).
The tumor-normal sample pairs will be combined into a single pair of
tumor-normals columns in the merged VCF. See :ref:`merge <merge-command>` for
more details.

To translate a specific VCF dialect, Jacquard determines the source variant
caller based on the VCF metaheaders. For this reason it is essential that you
preserve all metaheaders in the source VCF.
* To translate a specific VCF dialect, Jacquard determines the source variant
caller based on the VCF metaheaders. For this reason it is essential that you
preserve all metaheaders in the source VCF.

* For a specific source VCF, Jacquard automatically determines the tumor and
normal samples based on the column header and the metaheaders.

For a specific source VCF, Jacquard automatically determines the tumor and
normal samples based on the column header and the metaheaders.
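
The filename convention described above (the patient identifier is everything up to the first dot) can be sketched as follows. This is an illustration only, not Jacquard's own code; ``patient_id`` and ``group_by_patient`` are hypothetical helper names.

```python
import os
from collections import defaultdict


def patient_id(path):
    """Return the part of the file name before the first dot."""
    return os.path.basename(path).split(".", 1)[0]


def group_by_patient(paths):
    """Group VCF paths so files sharing a patient identifier end up together."""
    groups = defaultdict(list)
    for path in paths:
        groups[patient_id(path)].append(path)
    return dict(groups)


files = [
    "patientA-113.mutect.vcf",
    "patientA-113.strelka.snv.vcf",
    "patientA-113.strelka.indel.vcf",
]
groups = group_by_patient(files)
```

Here all three files fall under the single key ``patientA-113``, which mirrors how they would be treated as one tumor-normal pair when merged.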
35 changes: 21 additions & 14 deletions doc/expand.rst
@@ -18,9 +18,9 @@ Usage
*positional arguments:*

+--------+---------------------------------------------------------------------+
| input | | A VCF file. Other file types ignored |
| input | | A VCF file. |
+--------+---------------------------------------------------------------------+
| output | | A TXT file |
| output | | A tab separated text file. |
+--------+---------------------------------------------------------------------+


@@ -39,22 +39,29 @@ The expand command converts a VCF file into a tab-delimited file in a tabular
format. This format is more suitable than a VCF for analysis and visualization
in R, Pandas, Excel, or another third-party application.


.. figure:: images/expand_tabular.jpg

**Tabular Format of Jacquard Output :** *Jacquard transforms the dense VCF
format into a tabular format.*
**Tabular format of Jacquard output:** *Jacquard transforms the dense VCF format
into a tabular format.*


Note
-----
* The 'fixed' fields (i.e. CHROM, POS, ID, REF, ALT, QUAL, FILTER) are directly
copied from the input VCF file.
* Based on the metaheaders, each field in the INFO column is expanded into a
separate column named after its tag ID.
* Each FORMAT tag is expanded into a set of columns, one for each sample, named
as <FORMAT tag ID>|<sample column name>.
* By default, all INFO fields and FORMAT tags are expanded; specific INFO
fields and FORMAT tags can be selected using the --selected_columns_file
option.
* Expand also emits a tab-delimited glossary file, based on the metaheaders
in the input VCF file. FORMAT and INFO tag IDs are listed in the
glossary and are defined by their metaheader description.

The 'fixed' fields (i.e. CHROM, POS, ID, REF, ALT, QUAL, FILTER) are directly
copied from the input VCF file. Based on the metaheaders, each field in the
INFO column is expanded into a separate column named after its tag ID. Also,
based on the metaheaders, each FORMAT tag is expanded into a set of columns,
one for each sample, named as <FORMAT tag ID>|<sample column name>. By default,
all INFO fields and FORMAT tags are expanded; specific INFO fields and FORMAT
tags can be selected using a flag.

This command also emits a tab-delimited glossary file, created based on the
metaheaders in the input VCF file. FORMAT and INFO tag IDs are listed in the
glossary and are defined by their metaheader description.
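
The column naming described above can be sketched for a single VCF record. This is a simplified illustration rather than Jacquard's implementation: ``expand_line`` is a hypothetical name, the metaheaders are not consulted, and representing a flag INFO field by its own tag ID is an assumption made for the example.

```python
FIXED_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def expand_line(column_header, record_line):
    """Flatten one VCF record into a dict of column name -> value."""
    names = column_header.lstrip("#").rstrip("\n").split("\t")
    values = record_line.rstrip("\n").split("\t")
    record = dict(zip(names, values))

    # Fixed fields are copied directly.
    row = {name: record[name] for name in FIXED_FIELDS}

    # Each INFO field becomes a column named after its tag ID.
    for item in record["INFO"].split(";"):
        if item == ".":
            continue
        key, _, value = item.partition("=")
        row[key] = value or key  # assumption: flags store their own ID

    # Each FORMAT tag becomes one column per sample: <tag>|<sample>.
    format_tags = record["FORMAT"].split(":")
    for sample in names[9:]:
        for tag, value in zip(format_tags, record[sample].split(":")):
            row["{}|{}".format(tag, sample)] = value
    return row


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR"
line = "1\t100\t.\tA\tT\t.\tPASS\tDP=10;SOMATIC\tGT:AF\t0/0:0.0\t0/1:0.25"
row = expand_line(header, line)
```

The resulting ``row`` holds the fixed fields, one column per INFO tag, and FORMAT columns such as ``AF|TUMOR`` and ``GT|NORMAL``.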

.. figure:: images/expand_excel.jpg

26 changes: 15 additions & 11 deletions doc/faq.rst
@@ -2,35 +2,39 @@ Frequently Asked Questions
==========================

**Is Jacquard a variant caller?**

No, Jacquard is not a variant caller.
Jacquard is not a variant caller. It accepts VCF output from variant callers
and integrates them for simplified annotation and analysis.



**Can Jacquard annotate data?**

No, Jacquard cannot annotate data; however the output from translate, merge,
and summarize can be run through an annotation tool such as SnpEff or
Annovar.



**Can I use Jacquard with any Variant Caller?**

**Can I use Jacquard with any variant caller?**
Merge and expand are able to process VCF files from any variant caller.
Translate and summarize, however, must be run with VCF files from one or
more of the supported variant callers. Currently, Jacquard supports
MuTect, VarScan, and Strelka.



**Can I use Jacquard to show all the results from different callers without
standardization of the input data?**
**I'd like to merge my VCFs, but my caller isn't supported by Jacquard.**
Both merge and expand commands can be used to show all of the results from
different callers without standardization of the input data. However, it is
recommended that the input data be standardized whenever possible to
directly compare data across callers.



**Does Jacquard work with Germline callers?**
The translate command is optimized to work with tumor-normal sample pairs.
Germline VCFs can be used with merge and expand commands. Better support for
germline and pedigree VCFs is coming soon.

Both merge and expand can be used (either individually or together) to
show all of the results from different callers without standardization of
the input data. However, it is recommended that the input data be
standardized whenever possible to directly compare data across callers.

Still Have Questions?
^^^^^^^^^^^^^^^^^^^^^
81 changes: 0 additions & 81 deletions doc/general_usage.rst

This file was deleted.

Binary file modified doc/images/summarize.jpg
32 changes: 24 additions & 8 deletions doc/implementation_details.rst
@@ -27,29 +27,41 @@ Test Conventions
- Every command should have a functional test
- Prefer unit tests to functional tests
- Prefer tests on public methods, but note that it is sometimes easier to test
a private method.
- Attempt PEP8 compliance
- Make tests independent.
a private method. Use good judgement.
- Attempt PEP8 compliance.
- Make unit tests independent.


General Architecture:
---------------------
Modules are typically:
Modules are typically one of these:
- commands (like *translate*): these modules are invoked from the command line;
they follow a simple command pattern.
- variant caller transforms (like *mutect*): these modules contain classes that
add Jacquard annotations to a native VCF record.
- utilities (like *vcf* or *logger*): these modules provide a common method or
class used by other modules.


Note that translate is the only command that should understand variant caller
dialects; other commands should be caller agnostic.


Extending and adapting existing patterns will ensure commands/transforms stay
consistent. Here are some guidelines on how to extend functionality:


How to add a new format tag:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For all variant callers that support the new tag, you will need to extend each
variant caller transform to:
* define the new tag (set the metaheader and how the new value is derived)


* define the new tag (set the metaheader and how the new value is derived); by
convention, tag ID values are JQ_<caller_abbreviation>_<tag_name>
* add the new tag to the variant caller's reader


.. note:: If the new tag can be summarized, you will also need to add a
corresponding tag to *summarize_rollup_transform*.
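
The tag naming convention above can be sketched as a pair of small helpers. These are hypothetical functions for illustration, not Jacquard's API; the abbreviation ``MT`` and tag name ``AF`` are example values, and the metaheader layout follows the standard VCF ``##FORMAT`` line.

```python
def jacquard_tag_id(caller_abbreviation, tag_name):
    """Build a tag ID of the form JQ_<caller_abbreviation>_<tag_name>."""
    return "JQ_{}_{}".format(caller_abbreviation, tag_name)


def format_metaheader(tag_id, number, value_type, description):
    """Build the VCF FORMAT metaheader line that defines a tag."""
    return '##FORMAT=<ID={},Number={},Type={},Description="{}">'.format(
        tag_id, number, value_type, description
    )


# Example values only; the real abbreviations are defined by each transform.
tag_id = jacquard_tag_id("MT", "AF")
metaheader = format_metaheader(tag_id, "A", "Float", "Allele frequency")
```

Keeping the ID construction in one place makes it harder for a new transform to drift from the naming convention.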

@@ -63,9 +75,11 @@ How to add a new variant caller:
* Add a new class named for the variant caller; define a claim method to
recognize and claim VCF files.
* Add the new variant caller class to *variant_caller_factory*.
.. note:: The variant caller should have no dependencies on other packages
(except utils and vcf) and classes should only refer to variant
callers through *variant_caller_factory* (except tests).


.. note:: The variant caller should have no dependencies on other packages
(except utils and vcf) and classes should only refer to variant
callers through *variant_caller_factory* (except tests).
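
A claim method of the kind described above might look roughly like this. The sketch is hypothetical: the class name, the ``##source=ExampleCaller`` signature, and the ``claim`` signature (a dict of file name to metaheader lines, returning unclaimed and claimed files) are all assumptions made for illustration and may not match Jacquard's actual interface.

```python
class ExampleCaller(object):
    """Hypothetical variant caller class; not part of Jacquard."""

    name = "ExampleCaller"
    # Assumption: the caller identifies itself with this metaheader prefix.
    _SIGNATURE = "##source=ExampleCaller"

    def claim(self, metaheaders_by_file):
        """Split files into (unclaimed, claimed) based on their metaheaders.

        metaheaders_by_file maps a file name to its list of metaheader lines.
        """
        unclaimed, claimed = {}, {}
        for file_name, metaheaders in metaheaders_by_file.items():
            if any(line.startswith(self._SIGNATURE) for line in metaheaders):
                claimed[file_name] = metaheaders
            else:
                unclaimed[file_name] = metaheaders
        return unclaimed, claimed


candidates = {
    "patientA.example.vcf": ["##fileformat=VCFv4.1", "##source=ExampleCaller v1"],
    "patientA.other.vcf": ["##fileformat=VCFv4.1", "##source=OtherCaller"],
}
unclaimed, claimed = ExampleCaller().claim(candidates)
```

Returning the unclaimed files lets the next caller class have a chance to claim them, which fits the idea of a factory that tries each caller in turn.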

How to add a new command:
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -77,6 +91,8 @@ How to add a new command:
* validate_args(args).
* report_prediction
* execute(args, execution_context).


.. note:: Commands are independent and should not refer to other commands.
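
The functions listed above could be laid out in a command module along these lines. This is a sketch under stated assumptions, not an actual Jacquard command: the signature of ``report_prediction``, the ``input``/``output`` attributes on ``args``, and the treatment of ``execution_context`` as a list of metaheader strings are all guesses made for illustration.

```python
import os


def validate_args(args):
    """Fail fast on arguments that cannot work."""
    if not args.input.endswith(".vcf"):
        raise ValueError("input must be a VCF file: {}".format(args.input))


def report_prediction(args):
    """Return the set of file names this command expects to create."""
    return {os.path.basename(args.output)}


def execute(args, execution_context):
    """Write the execution context, then the input's non-metaheader lines.

    Assumption: execution_context is a list of metaheader strings.
    """
    with open(args.input) as reader, open(args.output, "w") as writer:
        for item in execution_context:
            writer.write(item + "\n")
        for line in reader:
            if not line.startswith("##"):
                writer.write(line)
```

Note that the module refers only to its own functions, in keeping with the rule that commands stay independent of one another.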

|