Skip to content

weghornlab/SigNet

 
 

Repository files navigation

SigNet

SigNet is a package to study genetic mutational processes. Check out our theoretical background page for further information on this topic. As of now, it contains 3 solutions:

Readme contents

You can use SigNet in 3 different ways depending on your workflow:

  1. Python Package

    1. Python Package Installation
    2. Python Package Usage
  2. Command Line Interface (CLI)

  3. Source Code

    1. Downloading Source Code
    2. Code-Basics

Python Package

Recommended if you want to integrate SigNet as part of your python workflow, or intending to re-train models on custom data with limited ANN architectural changes. You can install the python package running:

pip install signet

NOTE: The package hasn't yet been published to pypi. Please refer to Source Code to use it for now.*

Once installed, you can run Signet Refitter like so:

import pandas as pd
import signet

# Read your mutational data
mutations = pd.read_csv("your_input", header=0, index_col=0)

# Load & Run signet
signet = SigNet(opportunities_name_or_path="your_normalization_file")
results = signet(mutation_dataset=mutations)

# Extract results
w, u, l, c, _ = results.get_output()

# Store results
results.save(path='Output', name="this_experiment_filename")

# Plot figures
results.plot_results(save=True)

For a more usage examples: Check out the examples folder:

NOTE: It is recommended that you work on a custom python virtualenvironment to avoid package version mismatches.

Command Line Interface

Recommended if only interested in running SigNet modules independently and not willing to retrain models or change the source code.
NOTE: This option is only tested on Debian-based Linux distributions. Steps:

  1. Download the signet exectuable
  2. Change directory to wherever you downloaded it: cd <wherever/you/downloaded/the/executable/>
  3. Make it executable by your user: sudo chmod u+x signet

Refitter:

The following example shows how to use SigNet Refitter.

cd <wherever/you/downloaded/the/executable/>
./signet refitter  [--input_format {counts, bed, vcf}]
                   [--input_data INPUTFILE]
                   [--reference_genome REFGENOME]
                   [--normalization {None, exome, genome, PATH_TO_ABUNDANCES}] 
                   [--only_nnls ONLYNNLS]
                   [--cutoff CUTOFF]
                   [--output OUTPUT]
                   [--plot_figs False]
  • --input_format: Name of the format of the input. The default is 'counts'. Please refer to Mutations Input for further details.

  • --input_data: Path to the file containing the mutational counts. Please refer to Mutations Input for further details.

  • --reference_genome: Name or path to the reference genome. Needed when input_format is bed or vcf.

  • --normalization: As the INPUTFILE contain counts, we need to normalize them according to the abundances of each trinucleotide on the genome region we are counting the mutations.

    • Choose None (default): If you don't want any normalization.
    • Choose exome: If the data that is being input comes from Whole Exome Sequencing. This will normalize the counts according to the trinucleotide abundances in the exome.
    • Choose genome: If the data comes from Whole Genome Sequencing.
    • Set a PATH_TO_ABUNDANCES to use a custom normalization file. Please refer to Normalization Input for further details on the input format.
  • --only_nnls: Whether to use NNLS mode only (the finetuner is not run). Default: False.

  • --cutoff: Cutoff to be applied to the final weights. Default: 0.01.

  • --output Path to the folder where all the output files (weights guesses and figures) will be stored. By default, this folder will be called "Output" and will be created in the current directory. Please refer to SigNet Refitter Output for further details on the output format.

  • --plot_figs Whether to generate output plots or not. Possible options are True or False.

Detector:

cd <wherever/you/downloaded/the/executable/>
./signet detector  [--input_data INPUTFILE]
                   [--normalization {None, exome, genome, PATH_TO_ABUNDANCES}] 
                   [--output OUTPUT]

(Same arguments as before)

Generator:

cd <wherever/you/downloaded/the/executable/>
./signet generator  [--n_datapoints INT]
                    [--output OUTPUT]
  • --n_datapoints: Number of signature weight combinations to generate.

Source Code

Is the option which gives more flexibility. Recommended if you want to play around with the code, re-train custom models or do contributions.

Downloading Source Code

Clone the repo and install it as an editable pip package like so:

git clone git@github.com:OleguerCanal/SigNet.git
cd SigNet
pip install -e .

Refer here for the project code organization.

About

Mutational signature fitting with an ANN

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 95.0%
  • R 5.0%