SigNet is a package to study genetic mutational processes. Check out our theoretical background page for further information on this topic. As of now, it contains 3 solutions:
- SigNet Refitter: Tool for signature decomposition.
- SigNet Generator: Tool for realistic mutational data generation.
- SigNet Detector: Tool for mutational vector out-of-distribution detection.
You can use SigNet in 3 different ways depending on your workflow:
-
- Python Package Installation
- Python Package Usage
-
Command Line Interface (CLI)
-
- Downloading Source Code
- Code-Basics
Recommended if you want to integrate SigNet as part of your python workflow, or intending to re-train models on custom data with limited ANN architectural changes. You can install the python package running:
pip install signet
NOTE: The package hasn't yet been published to pypi. Please refer to Source Code to use it for now.*
Once installed, you can run Signet Refitter like so:
import pandas as pd
import signet
# Read your mutational data
mutations = pd.read_csv("your_input", header=0, index_col=0)
# Load & Run signet
signet = SigNet(opportunities_name_or_path="your_normalization_file")
results = signet(mutation_dataset=mutations)
# Extract results
w, u, l, c, _ = results.get_output()
# Store results
results.save(path='Output', name="this_experiment_filename")
# Plot figures
results.plot_results(save=True)
For a more usage examples: Check out the examples folder:
- refitter_example.py for a usage example.
- generator_example.py for a usage example.
- detector_example.py for a usage example.
NOTE: It is recommended that you work on a custom python virtualenvironment to avoid package version mismatches.
Recommended if only interested in running SigNet modules independently and not willing to retrain models or change the source code.
NOTE: This option is only tested on Debian-based Linux distributions. Steps:
- Download the signet exectuable
- Change directory to wherever you downloaded it:
cd <wherever/you/downloaded/the/executable/>
- Make it executable by your user:
sudo chmod u+x signet
Refitter:
The following example shows how to use SigNet Refitter.
cd <wherever/you/downloaded/the/executable/>
./signet refitter [--input_format {counts, bed, vcf}]
[--input_data INPUTFILE]
[--reference_genome REFGENOME]
[--normalization {None, exome, genome, PATH_TO_ABUNDANCES}]
[--only_nnls ONLYNNLS]
[--cutoff CUTOFF]
[--output OUTPUT]
[--plot_figs False]
-
--input_format
: Name of the format of the input. The default is 'counts'. Please refer to Mutations Input for further details. -
--input_data
: Path to the file containing the mutational counts. Please refer to Mutations Input for further details. -
--reference_genome
: Name or path to the reference genome. Needed when input_format is bed or vcf. -
--normalization
: As the INPUTFILE contain counts, we need to normalize them according to the abundances of each trinucleotide on the genome region we are counting the mutations.- Choose
None
(default): If you don't want any normalization. - Choose
exome
: If the data that is being input comes from Whole Exome Sequencing. This will normalize the counts according to the trinucleotide abundances in the exome. - Choose
genome
: If the data comes from Whole Genome Sequencing. - Set a
PATH_TO_ABUNDANCES
to use a custom normalization file. Please refer to Normalization Input for further details on the input format.
- Choose
-
--only_nnls
: Whether to use NNLS mode only (the finetuner is not run). Default:False
. -
--cutoff
: Cutoff to be applied to the final weights. Default: 0.01. -
--output
Path to the folder where all the output files (weights guesses and figures) will be stored. By default, this folder will be called "Output" and will be created in the current directory. Please refer to SigNet Refitter Output for further details on the output format. -
--plot_figs
Whether to generate output plots or not. Possible options areTrue
orFalse
.
Detector:
cd <wherever/you/downloaded/the/executable/>
./signet detector [--input_data INPUTFILE]
[--normalization {None, exome, genome, PATH_TO_ABUNDANCES}]
[--output OUTPUT]
(Same arguments as before)
Generator:
cd <wherever/you/downloaded/the/executable/>
./signet generator [--n_datapoints INT]
[--output OUTPUT]
--n_datapoints
: Number of signature weight combinations to generate.
Is the option which gives more flexibility. Recommended if you want to play around with the code, re-train custom models or do contributions.
Clone the repo and install it as an editable pip package like so:
git clone git@github.com:OleguerCanal/SigNet.git
cd SigNet
pip install -e .
Refer here for the project code organization.