ex-269 (dkriti/jebene) modified architecture document, UML diagram and added citations
jebene committed May 22, 2015
1 parent a59f173 commit dd0f440
Showing 9 changed files with 125 additions and 39 deletions.
15 changes: 13 additions & 2 deletions doc/new_tag.py → doc/abstract_classes.py
@@ -1,3 +1,13 @@
"""This file contains abstract classes and methods which reflect the standard
architecture of Jacquard.
The following signatures are outlined:
* FORMAT tags
* variant callers
* commands
"""


#pylint: disable=pointless-string-statement
import abc

@@ -14,7 +24,7 @@ def add_tag_values(self, vcf_record):


"""Abstract class outlining requirements for adding a new variant caller to
Jacquard."""
Jacquard. The claim() method in this class calls _NewVcfReader()"""
class NewVariantCaller(object):
#pylint:disable=too-few-public-methods,abstract-class-not-used
__metaclass__ = abc.ABCMeta
@@ -36,7 +46,8 @@ def claim(self,file_readers):


"""Abstract class outlining requirements for adding a new variant
caller-specific VcfReader object to Jacquard."""
caller-specific VcfReader object to Jacquard. This class is called by
the claim() method in NewVariantCaller()."""
class _NewVcfReader(object):
#pylint:disable=abstract-class-not-used
__metaclass__ = abc.ABCMeta
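
The docstrings in this file say that claim() on a variant caller hands input files to a caller-specific VcfReader. The sketch below illustrates that relationship only; it is not the Jacquard implementation. The names VariantCallerBase, VcfReaderBase, ExampleCaller, and ExampleReader, the vcf_records() method, the ".example.vcf" suffix check, and the (unclaimed, claimed) return shape are all assumptions made for illustration. It uses abc.ABC for brevity, which requires Python 3.4 or later, whereas the diff uses __metaclass__.

```python
import abc


class VcfReaderBase(abc.ABC):
    """Hypothetical caller-specific reader wrapping one claimed file."""

    def __init__(self, file_name):
        self.file_name = file_name

    @abc.abstractmethod
    def vcf_records(self):
        """Return an iterator over the records of the wrapped file."""


class VariantCallerBase(abc.ABC):
    """Hypothetical caller whose claim() sorts files into two groups."""

    @abc.abstractmethod
    def claim(self, file_readers):
        """Return (unclaimed_names, claimed_readers); assumed shape."""


class ExampleReader(VcfReaderBase):
    def vcf_records(self):
        # Placeholder: a real reader would parse the file here.
        return iter(())


class ExampleCaller(VariantCallerBase):
    def claim(self, file_readers):
        unclaimed, claimed = [], []
        for name in file_readers:
            # Assumed rule: this caller recognizes files by suffix.
            if name.endswith(".example.vcf"):
                claimed.append(ExampleReader(name))
            else:
                unclaimed.append(name)
        return unclaimed, claimed
```

Because both base classes declare abstract methods, instantiating them directly raises TypeError; only concrete subclasses such as ExampleCaller can be created.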
96 changes: 78 additions & 18 deletions doc/architecture.rst
@@ -4,50 +4,110 @@ This overview is intended for contributors.

Coding Standards & Guidelines
-----------------------------
- Uses pylint to format the code.
- Uses '_' to separate the words both in variable and method names.
- Uses caps to separate the words in the class names.
- Uses absolute imports.
- Supports Python 2.7 and 3.x.
- Supports Python 2.7 and 3.x
- Uses pylint to check the code
- Separates words in variable and method names with "_"
- Separates words in class names with capital letters (CamelCase)
- Uses absolute imports
- Uses nosetests to run unit and functional tests

|
Commands
--------
Jacquard is a suite of Python command line tools. Each tool is contained in a
module of the same name and called in the jacquard module. Only jacquard.py
is aware of all of the possible commands.
Each command transforms files or directories and is indirectly executed through
the jacquard module.

Note that only jacquard.py is aware of all of the possible commands.

|
.. figure:: images/translate_uml_sequence.jpg

**UML Sequence Diagram :** *An example UML sequence diagram for Translate.
Other commands follow a similar, yet unique sequence.*


Translate
^^^^^^^^^
Translate standardizes VCF files by adding Jacquard-specific format tags.

There are two main functions that Translate implements:
Translate standardizes VCF files by adding Jacquard-specific FORMAT tags.

**1. Filter flags are added to anomalous VCF records.**
*Filter flags are added to anomalous VCF records*
Translate initializes several private classes that label anomalous records
as such.

**2. New Jacquard-specific format tags are added to VCF records.**
Translate calls out to the variant_caller_factory.py, which then adds new
format tag values for each relevant variant caller. Only
variant_caller_factory.py is aware of all of the possible variant callers.
*New Jacquard-specific FORMAT tags are added to VCF records*
Translate calls out to variant_caller_transforms/variant_caller_factory.py,
which then adds new FORMAT tag values for each relevant variant caller.


Merge
^^^^^
Merge filters data from VCF files and then merges the files together.

*VCF records are filtered*
Merge initializes a private class to filter the VCF records.

*VCF records are merged*
Merge joins the VCF records together within its own module.


Summarize
^^^^^^^^^
Summarize aggregates sample-specific data based on Jacquard-specific FORMAT
tags.

*Sample-specific data is aggregated across callers*
Summarize calls out to modules in utils/ to transform Jacquard-specific
FORMAT tags into summarized FORMAT tags.


Expand
^^^^^^
Expand converts a VCF file into a tab-delimited, tabular file.

Transforms and Tags
-------------------
*VCF fields are expanded*
Expand separates INFO and FORMAT tags into distinct columns within its
own module.

|
Variant Caller Transforms
-------------------------
Within this package are modules which transform VcfRecords. Each module
typically has a collection of tag classes, where each tag holds the metaheader
and code to transform a single VcfRecord.

Note that:

* A caller could manipulate any aspect of a VcfRecord, but (by strong
convention) typically only adds information rather than deleting it. For
example, a sample-format tag, INFO field, or FILTER field could be added.

* The only module in Jacquard that knows about all of the variant callers is
variant_caller_factory.

|
Utils
-----
This package contains modules with methods that are relevant to multiple
commands.

vcf
^^^
The vcf module contains multiple classes that handle input and output files,
i.e., VcfReader and VcfRecord.

|
Test Conventions
----------------
Both unit and functional tests are written for Jacquard using the nosetests
framework.

Note that:

* Every method must have at least one corresponding test method
* Every class must have a TestCase
* Every command must have a functional test
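
The Variant Caller Transforms section of doc/architecture.rst describes tag classes that each hold a metaheader plus the code to transform a single VcfRecord, with one factory module that knows every caller. A minimal sketch of that arrangement follows. Only the method name add_tag_values(self, vcf_record) comes from the diff; the dict-based record, the JQ_EX_DP tag ID, and the DepthTag, ExampleCaller, and callers() names are hypothetical.

```python
class DepthTag(object):
    """Hypothetical tag: one metaheader plus one record transform."""

    metaheader = ('##FORMAT=<ID=JQ_EX_DP,Number=1,Type=Integer,'
                  'Description="Example depth">')

    def add_tag_values(self, vcf_record):
        # By convention a tag only adds information; existing
        # sample tags are left untouched.
        for tags in vcf_record["samples"].values():
            if "DP" in tags:
                tags["JQ_EX_DP"] = tags["DP"]


class ExampleCaller(object):
    """Hypothetical caller that owns a collection of tags."""

    name = "Example"

    def __init__(self):
        self.tags = [DepthTag()]

    def add_tags(self, vcf_record):
        for tag in self.tags:
            tag.add_tag_values(vcf_record)


# Stand-in for the factory: the single place that lists every caller.
_CALLERS = [ExampleCaller()]


def callers():
    """Return a copy of the list of all known callers."""
    return list(_CALLERS)
```

Keeping the caller list in one module means a command can loop over callers() without importing any caller-specific module itself.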
4 changes: 2 additions & 2 deletions doc/expand.rst
@@ -18,15 +18,15 @@ The 'fixed' fields (i.e. CHROM, POS, ID, REF, ALT, QUAL, FILTER) are directly
copied from the input VCF file. Based on the metaheaders, each field in the
INFO column is expanded into a separate column named after its tag ID. Also,
based on the metaheaders, each FORMAT tag is expanded into a set of columns,
one for each sample, named as <format tag ID>|<sample column name>.
one for each sample, named as <FORMAT tag ID>|<sample column name>.

This command also emits a tab-delimited glossary file, created based on the
metaheaders in the input VCF file. FORMAT and INFO tag IDs are listed in the
glossary and are defined by their metaheader description.

.. figure:: images/expand_columns.jpg

**Expanding Columns :** *The INFO column and sample-specific format tags from
**Expanding Columns :** *The INFO column and sample-specific FORMAT tags from
the input VCF file are separated into distinct columns in the output file.*

|
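
The column naming described in doc/expand.rst (fixed fields copied as-is, one column per INFO tag ID, and one column per FORMAT tag and sample named as <FORMAT tag ID>|<sample column name>) can be sketched as below. This is an illustration under assumptions, not the Expand implementation: the expand_record function, its list-of-pairs inputs, and the use of "." for a missing sample value are all hypothetical choices.

```python
def expand_record(fixed, info, format_tags, samples):
    """Flatten one VCF record into an ordered list of (column, value).

    fixed and info are lists of (name, value) pairs, format_tags is a
    list of FORMAT tag IDs, and samples is a list of
    (sample_name, {tag: value}) pairs.
    """
    columns = list(fixed)          # fixed fields are copied directly
    columns.extend(info)           # one column per INFO tag ID
    for tag in format_tags:
        for sample_name, values in samples:
            # Column name follows <FORMAT tag ID>|<sample column name>.
            name = "{}|{}".format(tag, sample_name)
            # Assumption: "." marks a value missing for this sample.
            columns.append((name, values.get(tag, ".")))
    return columns


row = expand_record(
    [("CHROM", "1"), ("POS", "100")],
    [("SOMATIC", "1")],
    ["DP", "AF"],
    [("NORMAL", {"DP": "30"}), ("TUMOR", {"DP": "42", "AF": "0.2"})],
)
print("\t".join(name for name, _ in row))
print("\t".join(value for _, value in row))
```

Grouping the output by FORMAT tag rather than by sample keeps all values of one tag in adjacent columns, which makes them easier to compare across samples.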
2 changes: 1 addition & 1 deletion doc/faq.rst
@@ -1,7 +1,7 @@
Frequently Asked Questions
==========================

**Can I use Jacquard with other Variant Callers?**
**Can I use Jacquard with any Variant Caller?**

Merge and Expand are able to process VCF files from any variant caller.
Translate and Summarize, however, must be run with VCF files from one or
1 change: 1 addition & 0 deletions doc/index.rst
@@ -26,6 +26,7 @@ Contents:
Changelog <include_changelog>
Future Directions <include_todo>

References <citations>
License <license>


28 changes: 20 additions & 8 deletions doc/overview.rst
@@ -13,26 +13,42 @@ biological researchers.


Most variant callers have embraced the Variant Call Format (VCF) standard
[Reference]_, which clearly and succinctly describes variants from a single
[ii]_, which clearly and succinctly describes variants from a single
tumor-normal pair. However, while many callers follow the standard, they often
adopt different ways to partition results (e.g. somatic file vs. germline file,
or SNP vs. indel); likewise, each caller creates its own dialect of VCF fields
and tags. Jacquard transforms the dialects of different variant callers into a
and tags [iii]_ [v]_ [vii]_.


Each variant caller follows its own algorithms, thus producing a distinct
output. Because of this, it is valuable to run data through multiple variant
callers and compare the outputs [iii]_ [v]_ [vii]_. However, since each caller has
its own dialect, direct comparisons are difficult to make.


Jacquard transforms the dialects of different variant callers into a
controlled vocabulary of tags with a consistent representation of values.
Furthermore, it intelligently merges VCFs from different patients and callers
to create a single, unified VCF across your dataset.


The consistent tag names and representations expedite downstream analysis, and
the integrated VCF highlights both the prevalence of specific variants and the
overall mutation loads across samples.

|
.. figure:: images/overview_Diagram.jpg

**Overview of Jacquard Workflow :** *Jacquard transforms different caller
dialects into a uniform VCF format.*

At this time, the Jacquard-supported variant callers are MuTect, VarScan, and
Strelka. A subset of the Jacquard commands support VCFs from other variant
At this time, the Jacquard-supported variant callers are:

* MuTect [i]_
* VarScan [iv]_
* Strelka [vi]_

A subset of the Jacquard commands supports VCFs from other variant
callers.


@@ -51,7 +67,3 @@ Contact Us
Email bfx-jacquard@umich.edu for support and questions.

**UM BRCF Bioinformatics Core**


.. [Reference] Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et
al. The variant call format and VCFtools. Bioinformatics 2011; 27: 2156–8.
15 changes: 10 additions & 5 deletions doc/translate.rst
@@ -2,7 +2,7 @@ Translate
---------
The translate command accepts a directory of VCF files and creates a new
directory of "translated" VCF files, which include several Jacquard-specific
format tags and their corresponding metaheaders. Were a variant in the source
FORMAT tags and their corresponding metaheaders. Were a variant in the source
VCF to be malformed, the translated VCF file would label it as such in the
FILTER column.

@@ -13,14 +13,19 @@ run translate once for each input directory. When partitioning into separate
input directories, all file names must be unique.


The translated format tags contain a caller specific prefix; example: 'JQ_SK'
The translated FORMAT tags contain a caller-specific prefix, for example 'JQ_SK'
for Strelka, 'JQ_VS' for VarScan, and 'JQ_MT' for MuTect.

Currently, Translate adds Jacquard-specific FORMAT tags for:

* Allele Frequency
* Depth
* Somatic Status

.. figure:: images/translate_pic.jpg

**Addition of the Jacquard-Specific Format Tags :** *The translated VCF files
contain the original format tags from the input files as well as the
Jacquard-specific format tags.*
**Addition of the Jacquard-Specific FORMAT Tags :** *The translated VCF files
contain the original FORMAT tags from the input files as well as the
Jacquard-specific FORMAT tags.*

|
1 change: 0 additions & 1 deletion doc/workflows.rst
@@ -1,7 +1,6 @@
Workflows and Supported Variant Callers
=======================================
|
Workflows
---------
Jacquard is a suite of tools that can be either run in succession or
2 changes: 0 additions & 2 deletions jacquard/variant_caller_transforms/variant_caller_factory.py
@@ -6,8 +6,6 @@
"""
from __future__ import print_function, absolute_import, division

import jacquard.utils.logger as logger
import jacquard.utils.utils as utils
import jacquard.variant_caller_transforms.mutect as mutect
import jacquard.variant_caller_transforms.strelka as strelka
import jacquard.variant_caller_transforms.varscan as varscan