Given a MAF file or a folder of MAF files, find the optimal linear combination of COSMIC mutational signatures that best describes the sample's SNV trinucleotide context distribution.

A Python implementation of the DeconstructSigs algorithm, modeled after the R implementation coded by Rachel Rosenthal. It is also available as a workflow on FireCloud.

From the Genome Biology description:

The deconstructSigs approach determines the linear combination of pre-defined signatures that most accurately reconstructs the mutational profile of a single tumor sample. It uses a multiple linear regression model with the caveat that any coefficient must be greater than 0, as negative contributions make no biological sense.
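To make the non-negativity constraint concrete, here is a minimal sketch in plain Python. It is not the package's own solver: it swaps in a generic coordinate-descent non-negative least squares routine. The function name `nnls_coordinate_descent` and the toy four-channel signatures are assumptions made up for illustration (real COSMIC signatures have 96 channels).

```python
def nnls_coordinate_descent(signatures, profile, sweeps=500):
    """Minimize ||profile - sum_j w_j * signatures[j]||^2 subject to w_j >= 0.

    Illustrative only; not the solver used by DeconstructSigs.
    """
    weights = [0.0] * len(signatures)
    residual = list(profile)  # profile minus the current reconstruction
    norms = [sum(v * v for v in sig) for sig in signatures]
    for _ in range(sweeps):
        for j, sig in enumerate(signatures):
            if norms[j] == 0.0:
                continue
            # Best unconstrained update for w_j with the others held fixed,
            # then clipped at zero so no signature contributes negatively.
            step = sum(s * r for s, r in zip(sig, residual)) / norms[j]
            new_weight = max(0.0, weights[j] + step)
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * s for r, s in zip(residual, sig)]
                weights[j] = new_weight
    return weights


# Toy data (hypothetical): three made-up signatures over four channels.
toy_signatures = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
# Build a profile that is 60% signature 0 and 40% signature 1.
toy_profile = [0.6 * a + 0.4 * b
               for a, b in zip(toy_signatures[0], toy_signatures[1])]
toy_weights = nnls_coordinate_descent(toy_signatures, toy_profile)
print([round(w, 3) for w in toy_weights])
```

Because the toy profile is an exact mixture of the first two signatures, the fit should recover those two weights and leave the third at zero.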


Click the green Clone or Download button above, copy the presented web URL, and then paste it into your terminal, preceded by the command 'git clone'.

git clone

Or install via pip: pip install git+


The following parameters can be provided to a DeconstructSigs object upon initialization:

  • maf: Optional path to either a) a single MAF file, in which case analysis is conducted on that file, or b) a folder containing multiple MAF files, in which case analysis is conducted on all MAF files within the folder, weighting the mutation context counts according to how many total mutations are present in each originating sample MAF. This single argument replaces the former maf_file_path and mafs_folder parameters.

  • context_counts: Optional. This argument can be used to provide a dictionary of context counts rather than a MAF file. Keys are of the form 'A[C>A]A', 'A[C>A]C', etc., and values are integer counts.

  • cutoff: Optional, default value of 0.06. The weights of all signatures calculated to contribute less than the cutoff are set to zero. These signatures are grouped as 'Other' in the pie chart generated by a call to figures().

  • analysis_handle: Optional. If provided, analysis_handle will be used in the titles of all plots and figures generated.

  • hg19_fasta_path: Optional. If provided, analysis will determine the trinucleotide context of each SNV by looking it up in the provided fasta file. Earlier versions ran samtools through subprocess for this and required a local samtools installation; the lookup now uses pyfaidx (see Recent modifications). If not provided, DeconstructSigs assumes that the MAF file contains a ref_context column.

  • output_folder: Optional. If provided, calculated signature weights will be output here upon calling which_signatures(), and plot figures will be saved here as well upon calling figures(). If not provided, figures will simply display as they are generated.
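The key format expected by context_counts ('A[C>A]A', 'A[C>A]C', etc.) can be generated programmatically. The sketch below builds all 96 keys and tallies a few mutations into such a dictionary. The helper names `all_context_keys` and `context_key` are hypothetical, and folding purine reference bases onto the pyrimidine strand follows the usual COSMIC convention rather than anything confirmed about this package's internals.

```python
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
SUBSTITUTIONS = ['C>A', 'C>G', 'C>T', 'T>A', 'T>C', 'T>G']


def all_context_keys():
    """Return the 96 keys: 6 substitutions x 4 5' bases x 4 3' bases."""
    return ['{}[{}]{}'.format(five, sub, three)
            for sub in SUBSTITUTIONS
            for five in 'ACGT'
            for three in 'ACGT']


def context_key(trinucleotide, alt):
    """Build a key such as 'A[C>A]A' from a reference trinucleotide and alt base.

    Mutations whose reference base is a purine (A or G) are reverse
    complemented so that the reference is always C or T (assumed convention).
    """
    ref = trinucleotide[1]
    if ref in 'AG':
        trinucleotide = ''.join(COMPLEMENT[b] for b in reversed(trinucleotide))
        alt = COMPLEMENT[alt]
        ref = trinucleotide[1]
    return '{}[{}>{}]{}'.format(trinucleotide[0], ref, alt, trinucleotide[2])


# Tally three made-up mutations into a context_counts dictionary.
context_counts = {key: 0 for key in all_context_keys()}
for trinuc, alt in [('ACA', 'A'), ('AGC', 'T'), ('TTG', 'C')]:
    context_counts[context_key(trinuc, alt)] += 1
```

A dictionary built this way could then be passed as the context_counts argument in place of a MAF file.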

The which_signatures() method outputs the vector of weights calculated for each signature. It takes a few parameters as well:

  • signatures_limit: Optional, default None. If provided, the number of signatures allowed to contribute to the solution is capped at signatures_limit. Otherwise, up to 30 COSMIC signatures may be used.

  • associated: Optional, default None, list of integer indices of COSMIC signatures in range 0-29. Useful when it is known that only a pre-determined subset of COSMIC signatures should be tried.

  • verbose: Optional, default False. If True then logs describing weight updates on each iteration will be output to stdout.

Finally, to generate figures, use the figures() method, with the following parameters:

  • weights: Required. The vector of weights generated by the which_signatures() method.

  • explanations: Optional, default is False. If set to True, the legend of the signatures pie chart will include curated information about each selected COSMIC signature.
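The effect of the cutoff parameter on the weights, and on the 'Other' slice of the pie chart, can be sketched as follows. The function name `apply_cutoff` is hypothetical, and treating 'Other' as whatever remains after summing the kept weights is an assumption about how the chart is built, not something the documentation above states.

```python
def apply_cutoff(weights, cutoff=0.06):
    """Zero every weight below the cutoff and report the leftover share.

    Assumes the weights are proportions that sum to at most 1.
    """
    kept = [w if w >= cutoff else 0.0 for w in weights]
    # Assumption: 'Other' is the share not explained by the kept signatures.
    other = max(0.0, 1.0 - sum(kept))
    return kept, other


# Made-up weights for four signatures.
raw_weights = [0.5, 0.03, 0.4, 0.07]
kept_weights, other_share = apply_cutoff(raw_weights)
```

With the default cutoff of 0.06, only the second weight in this toy vector is zeroed, and its share moves into 'Other'.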


from deconstructSigs import DeconstructSigs

def main():
    fasta_path = '/path/to/Homo_sapiens_assembly19.fasta'
    # 'maf' accepts a single MAF file or a folder of MAF files
    ds = DeconstructSigs(maf='/path/to/snvs.maf',
                         hg19_fasta_path=fasta_path)

    # Vector of weights, one per COSMIC signature
    weights = ds.which_signatures(verbose=True)
    ds.figures(weights, explanations=True)

if __name__ == '__main__':
    main()

Output of figures():

  • Tumor profile
  • Reconstructed tumor profile
  • Difference between original and reconstructed tumor profile
  • COSMIC signatures breakdown with explanations
  • COSMIC signatures breakdown
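The relationship between the first three figures can be sketched numerically: the reconstructed profile is the weighted sum of the selected signatures, and the difference plot compares it with the original profile. The function names and toy values below are hypothetical, and taking the difference as original minus reconstructed is an assumption about the sign convention.

```python
def reconstruct_profile(signatures, weights):
    """Weighted sum of signature vectors, channel by channel."""
    n_channels = len(signatures[0])
    return [sum(w * sig[i] for w, sig in zip(weights, signatures))
            for i in range(n_channels)]


def profile_difference(profile, reconstructed):
    """Per-channel difference (assumed here: original minus reconstructed)."""
    return [p - r for p, r in zip(profile, reconstructed)]


# Toy data (hypothetical): two made-up signatures over four channels.
demo_signatures = [[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5]]
demo_weights = [0.6, 0.4]
demo_profile = [0.35, 0.25, 0.2, 0.2]

demo_reconstructed = reconstruct_profile(demo_signatures, demo_weights)
demo_difference = profile_difference(demo_profile, demo_reconstructed)
# Sum of squared errors, a common way to summarize the residual.
demo_sse = sum(d * d for d in demo_difference)
```

A small residual indicates that the chosen signatures explain the observed profile well.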

Data Sources

Recent modifications:

  1. Modified to allow installation via pip: pip install git+
  2. Switched from samtools to pyfaidx for retrieving the context sequence of mutations from the fasta file, which speeds up reading MAF files.
  3. Added Settings to set global settings:
    from deconstructSigs import Settings, DeconstructSigs
    # font family used to plot
    Settings.font_family = 'Arial'
    # font weight used to plot
    Settings.font_weight = 'bold'
    # Print verbose information/logging
    Settings.verbose     = True
    # Cutoff below which calculated signatures will be discarded
    Settings.sig_cutoff  = 0.05
    # When the iteration stops
    Settings.err_thres   = 1e-3
    # logging format
    Settings.log_format  = '[%(asctime)s %(levelname)-.1s] %(message)s'
    Settings.log_time    = '%Y-%m-%d %H:%M:%S'
  4. Combined mafs_folder and maf_file_path into a single maf argument, which can be either a directory containing MAF files or a single MAF file.
  5. Allowed gzipped MAF input files.
  6. Allowed '#' comment lines in MAF files.
  7. Formatted logging/verbose messages.
  8. Made it compatible with Python 2 and Python 3.
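As an illustration of modifications 5 and 6, the following sketch reads a MAF file that may be gzipped and may contain '#' comment lines, using only the standard library. It is not the package's own reader: the function name `read_maf_rows` is hypothetical, the demo file is fabricated on the fly, and the sketch targets Python 3 only.

```python
import csv
import gzip
import os
import tempfile


def read_maf_rows(path):
    """Return MAF rows as dictionaries, skipping '#' comment lines.

    Files ending in '.gz' are opened with gzip; anything else as plain text.
    """
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt') as handle:
        data_lines = (line for line in handle if not line.startswith('#'))
        # The first non-comment line is the tab-separated header.
        return list(csv.DictReader(data_lines, delimiter='\t'))


# Write a tiny gzipped demo MAF with one comment line and one mutation.
demo_text = ('#version 2.4\n'
             'Hugo_Symbol\tReference_Allele\tTumor_Seq_Allele2\n'
             'TP53\tC\tT\n')
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.maf.gz')
with gzip.open(demo_path, 'wt') as out:
    out.write(demo_text)

maf_rows = read_maf_rows(demo_path)
```

Filtering comment lines before handing the stream to csv.DictReader keeps the header detection simple, since the first surviving line is always the column header.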

