waqasuddinkhan edited this page May 18, 2020 · 26 revisions

Welcome to the Wiki Page of MACARON

Interpretation of MACARON Output

Let's try to explore the MACARON_output.txt:

CHROM	POS	ID	REF	ALT	Gene_Name	QUAL	sample01.GT	sample02.GT	sample03.GT	sample01.GT	sample02.GT	sample03.GT	Protein_coding_Gene_Name	AA-Change	REF-codon	ALT-codon	ALT-codon_merge-2VAR	AA-Change-2VAR	ALT-codon_merge-3VAR	AA-Change-3VAR
chr22	21349676	rs412470	T	A	LZTR1	423.0	T/T	T/A	T/T	0/0	0/1	0/0	MISSENSE	S92T	Tct	Act	ATt	I	.	.
chr22	21349677	rs376419	C	T	LZTR1	423.0	C/C	C/T	C/C	0/0	0/1	0/0	MISSENSE	S92F	tCt	tTt	.	I	.	.
chr22	23247169	rs527511481	T	G	IGLJ3	719.0	T/T	T/G	T/T	0/0	0/1	0/0	MISSENSE	W39G	Tgg	Ggg	GTg	V	.	.
chr22	23247170	rs540954398	G	T	IGLJ3	716.0	G/G	G/T	G/G	0/0	0/1	0/0	MISSENSE	W39L	tGg	tTg	.	V	.	.

Overall, two pairs of pcSNVs are observed: lines 1 and 2 form the first pair (chr22:21349676-21349677), and lines 3 and 4 form the second pair (chr22:23247169-23247170). Existing SNV annotation tools annotate each SNV individually, so we first interpret each SNV of the first pair on its own. chr22:21349676 is part of the codon Tct, which codes for amino acid S at position 92 (S92) of the protein product of gene LZTR1. With allele A, Tct changes to Act, which codes for T92. chr22:21349677 is part of the same codon (tCt) and therefore affects the same amino acid position, S92. With allele T, tCt changes to tTt, which codes for F92.

Now, according to MACARON:

Rather than treating these two SNVs as two separate variation events, they should be regarded as one combined variation event (chr22:21349676-21349677), because both SNVs affect the same codon. The re-annotation therefore merges the two ALT codons, Act and tTt, into a new codon ATt, which codes for I.
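The merge can be illustrated with a minimal Python sketch. It is not MACARON's own code: the function names merge_alt_codons and translate are hypothetical, and the sketch assumes that each ALT codon marks its changed base in upper case, as in the ALT-codon column above. The codon table is the standard genetic code.

```python
# Standard genetic code, indexed by base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(codon):
    """Return the one-letter amino acid for a codon, ignoring case."""
    return CODON_TABLE[codon.upper()]


def merge_alt_codons(alt_codons):
    """Merge ALT codons of the same codon (hypothetical helper).

    Assumes the changed base of each ALT codon is written in upper case
    and all other bases in lower case.
    """
    merged = []
    for column in zip(*alt_codons):
        changed = [base for base in column if base.isupper()]
        # Take the changed base if any SNV hits this position,
        # otherwise keep the unchanged (lower-case) base.
        merged.append(changed[0] if changed else column[0])
    return "".join(merged)


merged = merge_alt_codons(["Act", "tTt"])
print(merged, translate(merged))  # ATt I
```

Applied to the second pair, merging Ggg and tTg in the same way gives GTg, which codes for V, in agreement with the AA-Change-2VAR column.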

Sample Allelic or Genotype Status of SnpCluster

Continuing with the first pair of pcSNVs, we now assess:

  • how many samples carry this pcSNV (chr22:21349676-21349677), and
  • what the allelic (or genotype) status of this pcSNV is in each sample.

sample01 and sample03 are both homozygous reference for this pcSNV, whereas sample02 is heterozygous. The user should therefore focus on sample02 when validating the pcSNV.
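As a rough illustration of how the numeric genotype columns can be read, the following Python sketch classifies VCF-style genotype strings and lists the samples that carry the ALT allele. The function name genotype_status is hypothetical and this is not part of MACARON; the genotype values are copied from the table above.

```python
def genotype_status(gt):
    """Classify a VCF-style genotype string such as '0/0', '0/1' or '1/1'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if all(allele == "0" for allele in alleles):
        return "homozygous reference"
    if len(set(alleles)) == 1:
        return "homozygous alternate"
    return "heterozygous"


# Numeric genotypes of the first pcSNV pair, taken from MACARON_output.txt.
genotypes = {"sample01": "0/0", "sample02": "0/1", "sample03": "0/0"}

carriers = [
    sample
    for sample, gt in genotypes.items()
    if genotype_status(gt) in ("heterozygous", "homozygous alternate")
]
print(carriers)  # ['sample02']
```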

Interpretation of MACARON Validation Output

Let's try to understand the output of MACARON_validate.sh:

sub1 chr22:21349676-21349677 sample02
       1 AA
       1 T
       11 AT
       14 TC

The first line, sub1 chr22:21349676-21349677 sample02, gives the numbering (sub1), the chromosome coordinates, and the sample name. Do not be confused by "sub": it simply denotes the substitution of the REF codon by the ALT codon, here for the first pair of pcSNVs observed in MACARON_output.txt.

As explained in Interpretation of MACARON Output, the ALT bases expected at chr22:21349676-21349677 are AT. The counts listed above sum to 27 reads for this pcSNV: 1 for AA, 1 for T, 11 for AT and 14 for TC. Since sample02 is heterozygous for this pcSNV, 11 of these reads carry both ALT bases (AT) on the same read, while 14 carry the reference bases TC.
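The read counts can be tallied with a short Python sketch. It assumes the validation output consists of a header line followed by "count bases" lines, exactly as in the block above; the function name parse_validation_block is hypothetical and not part of MACARON.

```python
validation_output = """\
sub1 chr22:21349676-21349677 sample02
       1 AA
       1 T
       11 AT
       14 TC
"""


def parse_validation_block(text):
    """Split one validation block into its header fields and read counts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    label, region, sample = lines[0].split()
    counts = {}
    for line in lines[1:]:
        count, bases = line.split()
        counts[bases] = int(count)
    return label, region, sample, counts


label, region, sample, counts = parse_validation_block(validation_output)
total = sum(counts.values())
# Fraction of reads that carry both ALT bases (AT) on the same read.
alt_fraction = counts.get("AT", 0) / total
print(f"{sample} {region}: {counts['AT']}/{total} reads carry AT ({alt_fraction:.0%})")
```

For a heterozygous sample, a fraction of AT reads somewhere near one half is what one would roughly expect if both ALT alleles sit on the same haplotype.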

Updates

version 0.4 [06/13/18], after patches suggested by DYLAN

  • Python 2 and 3 compatibility (Done),
  • Deprecated GATK3 option --allowMissingData is removed,
  • Handles GATK versions >= 4.0 via new option: --gatk4,
  • MACARON renaming and respect to UNIX standard (Done).

version 0.5 [06/14/18]

  • Changed shell output aesthetics ;)
  • Check that SNPEFF_HG is not empty,
  • Check that files set in GATK, HG_REF and SNPEFF exist,
  • Added verbosity option: -v , enables output from GATK and SNPEff.

version 0.6 [06/14/18]

  • All temporary files are now stored in "macaron_tmp" directory, which is removed at the end of the process,
  • Option -d has been removed,
  • Added eco-friendly mode with option -c or --eco-friendly, which disables the animation but saves a thread,
  • Added possibility to set GATK path, SnpEff path, SnpEff human genome annotation database version, and human reference genome paths as optional arguments, the user can still set default values directly in the script.

version 1.0 [05/18/20]

Update to MACARON, now version 1.0

Some news:

  • Now compatible with the latest version of GATK4 (4.1.7). GATK3 is no longer supported (the previous version of MACARON will still work great with GATK3).
  • GATK4 is now used as default, and MACARON relies on the gatk wrapper instead of the .jar file.
  • For versions of GATK4 before 4.1.4.1, the option of IndexFeatureFile is different. The option --gatk4_previous must be added when using these older versions of GATK4.
  • MACARON can also handle the snpEff wrapper (available via bioconda). The extension of the file (either '.jar' or no extension) allows MACARON to determine whether the wrapper or the .jar file is used.
  • If gatk and snpEff are accessible via $PATH, there is no need for the user to provide the paths of these programs to MACARON.
  • Animation is now an option: use the -c option to visualize the wheel turning (great to pass the time!).
  • A bug with grep has been fixed to allow the use of MACARON on macOS!
  • Global refactoring (classes dropped, and some more).
  • Option --keep_temp was added to keep temporary files instead of deleting them (useful for debugging).
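The wrapper-versus-.jar distinction described above could be sketched as follows in Python. This is only an illustration of the idea, not MACARON's actual implementation: the function name tool_command and the example path are hypothetical.

```python
import shutil


def tool_command(path_or_name):
    """Build the command prefix for a tool given as a .jar file or a wrapper.

    Hypothetical helper: a '.jar' extension means the tool is run through
    java, anything else is treated as an executable wrapper.
    """
    if path_or_name.lower().endswith(".jar"):
        return ["java", "-jar", path_or_name]
    # For a wrapper, use the full path found via $PATH when available.
    resolved = shutil.which(path_or_name)
    return [resolved or path_or_name]


# The path below is a made-up example.
print(tool_command("/opt/snpEff/snpEff.jar"))
```

With a wrapper installed via bioconda, calling tool_command("snpEff") would return the path found on $PATH, so no explicit path needs to be supplied.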

To do:

GATK4 is not completely silent despite setting QUIET and verbosity parameters. Need to find a way to hide it in the next version...