## Quick demo on how to use the `A2TEA_finished.RData` output

- Currently, the fastest way to run the pipeline is to build a single conda environment that contains ALL required software.
- Using the more fine-grained per-rule YAML files (`--use-conda` option) makes the checkpoint steps run sequentially, which takes much longer; the cause is not yet known.
- Therefore, for now: build the complete environment from `workflow/envs/complete_environment.yaml`.
  
For example:  
`mamba env create --name A2TEA --file workflow/envs/complete_environment.yaml`  
Activate this environment, then run A2TEA (replace `xx` with the number of cores to use):  
`snakemake --cores xx`

**The final output of A2TEA is found under `tea/A2TEA_finished.RData`.**

### 2.) Working with the A2TEA output (for the moment)

#### 2.1 - Load libraries and define three classes
-> These class definitions are needed so that your R session understands the custom S4 data structures stored in the output.

In [10]:
library(DESeq2)
library(tidyverse)
library(ggtree)
library(ggtreeExtra)
library(Biostrings)
library(seqinr)
library(UpSetR)
library(cowplot)
library(ggplotify)

In [11]:
# class for the expanded_OG - containing all different types of data we have on it
setClass("expanded_OG", slots=list(genes="spec_tbl_df",
                                   blast_table="tbl_df",
                                   nrow_table="numeric",
                                   num_genes_HOG="numeric",
                                   num_genes_extend="numeric",
                                   num_genes_complete="numeric",
                                   genes_HOG="tbl_df",
                                   genes_extend_hits="tbl_df",
                                   fasta_files="list",
                                   msa="AAStringSet",
                                   tree="phylo"))

# class for the hypotheses
setClass("hypothesis",
         slots=list(description="character",
                    number="character",
                    expanded_in="character",
                    compared_to="character",
                    expanded_OGs="list",
                    species_tree="phylo"))

# class for extended BLAST hits info
setClass("extended_BLAST_hits",
         slots=list(blast_table="tbl_df",
                    num_genes_HOG="numeric",
                    num_genes_extend="numeric",
                    num_genes_complete="numeric",
                    genes_HOG="tbl_df",
                    genes_extend_hits="tbl_df")
         )
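
The class definitions above can be checked and explored with the introspection helpers in the `methods` package. The following is a minimal sketch on a made-up class; `toy_hypothesis` and its contents are hypothetical and only mimic the general shape of the `hypothesis` class, they are not part of A2TEA.

```r
library(methods)

# Hypothetical toy class for illustration only (not part of A2TEA).
setClass("toy_hypothesis",
         slots = list(description  = "character",
                      expanded_in  = "character",
                      expanded_OGs = "list"))

toy <- new("toy_hypothesis",
           description  = "Expanded in species A compared to species B",
           expanded_in  = "species_A",
           expanded_OGs = list(OG1 = 1:3, OG2 = 4:5))

# Slots are read with @ or, equivalently, with slot()
toy@description
slot(toy, "expanded_in")

# Introspection: is the class registered, and which slots does it have?
class_known <- isClass("toy_hypothesis")   # TRUE once setClass() has run
slot_names  <- slotNames("toy_hypothesis") # character vector of slot names
slot_types  <- getSlots("toy_hypothesis")  # named vector: slot name -> class
```

Running `isClass("hypothesis")` in the same way before loading the `.RData` file is one way to confirm that the definitions were registered in the current session.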

#### 2.2 - read-in the A2TEA output .RData file

In [12]:
load("../A2TEA_finished.RData")
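
`load()` places objects directly into the calling environment, so it is easy to overwrite existing variables without noticing. A possible alternative, sketched here with toy stand-in objects saved to a temporary file (the names `hypotheses_toy` and `HOG_toy` are invented for this example), is to load into a separate environment and inspect what arrived:

```r
# Toy stand-ins, saved to a temporary file so the example runs
# without the real A2TEA output.
toy_path <- tempfile(fileext = ".RData")
hypotheses_toy <- data.frame(hypothesis = 1:2)
HOG_toy <- list(hypothesis_1 = 1, hypothesis_2 = 2)
save(hypotheses_toy, HOG_toy, file = toy_path)

# Load into a dedicated environment instead of the global one.
inspect_env <- new.env()
loaded_names <- load(toy_path, envir = inspect_env)

# load() returns the names of the restored objects.
loaded_names

# Class of each restored object
loaded_classes <- vapply(loaded_names,
                         function(n) class(get(n, envir = inspect_env))[1],
                         character(1))
loaded_classes
```

With the real file, the same pattern would use the path to `A2TEA_finished.RData` in place of `toy_path`.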

#### 2.3 - currently included: 4 objects

- `hypotheses`     
- `HYPOTHESES.a2tea`
- `HOG_DE.a2tea`
- `HOG_level_list`

In [49]:
hypotheses

| hypothesis | name | expanded_in | compared_to |
|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Expanded in Arabidopsis compared to Monocots | Arabidopsis_thaliana | Zea_mays;Hordeum_vulgare |
| 2 | Expanded in barley compared to maize | Hordeum_vulgare | Zea_mays |


In [48]:
summary(HYPOTHESES.a2tea)
HYPOTHESES.a2tea$hypothesis_1@description
HYPOTHESES.a2tea$hypothesis_1@number
HYPOTHESES.a2tea$hypothesis_1@expanded_in
HYPOTHESES.a2tea$hypothesis_1@compared_to
HYPOTHESES.a2tea$hypothesis_1@species_tree

#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@genes
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@blast_table
## ? HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000308@nrow_table
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@num_genes_HOG
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@num_genes_extend
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@num_genes_complete
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@genes_HOG
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@genes_extend_hits
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@fasta_files
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@msa
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@tree

             Length Class      Mode
hypothesis_1 1      hypothesis S4  
hypothesis_2 1      hypothesis S4  


Phylogenetic tree with 3 tips and 2 internal nodes.

Tip labels:
  Arabidopsis_thaliana, Hordeum_vulgare, Zea_mays
Node labels:
  N0, N1

Rooted; includes branch lengths.
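
Because `expanded_OGs` is a named list of S4 objects, per-OG slots can be collected into one table with `vapply()`. The sketch below uses a hypothetical `toy_OG` class and invented counts; only the slot names `num_genes_HOG` and `num_genes_extend` are taken from the class definitions in 2.1.

```r
library(methods)

# Hypothetical, reduced stand-in for the expanded_OG class.
setClass("toy_OG",
         slots = list(num_genes_HOG    = "numeric",
                      num_genes_extend = "numeric"))

# Invented example data: a named list of objects, one per orthologous group.
toy_OGs <- list(
  OG_a = new("toy_OG", num_genes_HOG = 12, num_genes_extend = 3),
  OG_b = new("toy_OG", num_genes_HOG = 6,  num_genes_extend = 0)
)

# Pull one slot per object into a plain vector, then assemble a data frame.
og_summary <- data.frame(
  OG               = names(toy_OGs),
  num_genes_HOG    = vapply(toy_OGs, function(og) og@num_genes_HOG, numeric(1)),
  num_genes_extend = vapply(toy_OGs, function(og) og@num_genes_extend, numeric(1)),
  row.names        = NULL
)
og_summary
```

On the real data, `toy_OGs` would presumably be replaced by `HYPOTHESES.a2tea$hypothesis_1@expanded_OGs`.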

In [19]:
str(HOG_DE.a2tea)
head(HOG_DE.a2tea)

'data.frame':	113939 obs. of  10 variables:
 $ species       : chr  "Arabidopsis_thaliana" "Arabidopsis_thaliana" "Arabidopsis_thaliana" "Arabidopsis_thaliana" ...
 $ gene          : chr  "AT1G01010" "AT1G01020" "AT1G01030" "AT1G01040" ...
 $ baseMean      : num  41.2 248.8 98.6 349.3 490.9 ...
 $ log2FoldChange: num  -0.1422 0.0483 0.1554 0.1579 -0.0106 ...
 $ lfcSE         : num  0.263 0.225 0.242 0.211 0.182 ...
 $ stat          : num  -0.5397 0.2147 0.6431 0.7492 -0.0586 ...
 $ pvalue        : num  0.589 0.83 0.52 0.454 0.953 ...
 $ padj          : num  1 1 1 1 1 ...
 $ significant   : chr  "no" "no" "no" "no" ...
 $ HOG           : chr  "N0.HOG0007254" "N0.HOG0005513" "N0.HOG0016503" "N0.HOG0011661" ...


| | species | gene | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | significant | HOG |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 1 | Arabidopsis_thaliana | AT1G01010 | 41.19541 | -0.14215259 | 0.2633998 | -0.5396837 | 0.5894152 | 0.9998987 | no | N0.HOG0007254 |
| 2 | Arabidopsis_thaliana | AT1G01020 | 248.8242 | 0.04825906 | 0.2247451 | 0.214728 | 0.8299794 | 0.9998987 | no | N0.HOG0005513 |
| 3 | Arabidopsis_thaliana | AT1G01030 | 98.5576 | 0.15535117 | 0.2415571 | 0.6431241 | 0.5201436 | 0.9998987 | no | N0.HOG0016503 |
| 4 | Arabidopsis_thaliana | AT1G01040 | 349.29246 | 0.15792841 | 0.2108101 | 0.74915 | 0.4537668 | 0.9998987 | no | N0.HOG0011661 |
| 5 | Arabidopsis_thaliana | AT1G01050 | 490.8879 | -0.01063666 | 0.1815235 | -0.0585966 | 0.9532734 | 0.9998987 | no | N0.HOG0000926 |
| 6 | Arabidopsis_thaliana | AT1G01060 | 504.75488 | -0.18652888 | 0.2859985 | -0.6522023 | 0.5142706 | 0.9998987 | no | N0.HOG0004937 |
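
A table with this layout can be summarised per HOG and species, for instance by counting significantly differentially expressed genes. The following sketch works on an invented data frame with the same column names; that significant genes are marked `"yes"` is an assumption, since only `"no"` appears in the output shown here.

```r
# Invented toy data with the same column names as HOG_DE.a2tea.
toy_DE <- data.frame(
  species     = c("sp_A", "sp_A", "sp_B", "sp_B", "sp_B"),
  gene        = paste0("g", 1:5),
  padj        = c(0.01, 0.8, 0.03, 0.02, 0.5),
  significant = c("yes", "no", "yes", "yes", "no"),  # "yes" label is assumed
  HOG         = c("HOG1", "HOG1", "HOG1", "HOG1", "HOG2"),
  stringsAsFactors = FALSE
)

# Keep only significant genes, then count them per HOG and species.
sig <- toy_DE[toy_DE$significant == "yes", ]
sig_counts <- aggregate(gene ~ HOG + species, data = sig, FUN = length)
names(sig_counts)[names(sig_counts) == "gene"] <- "n_sigDE"
sig_counts
```

Groups without any significant gene do not appear in `sig_counts` at all.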


In [22]:
str(HOG_level_list)
head(HOG_level_list$hypothesis_1)
head(HOG_level_list$hypothesis_2)

List of 2
 $ hypothesis_1: tibble [20,913 × 11] (S3: tbl_df/tbl/data.frame)
  ..$ HOG                       : chr [1:20913] "N0.HOG0000236" "N0.HOG0000865" "N0.HOG0000887" "N0.HOG0000914" ...
  ..$ tea_value                 : num [1:20913] 0.0706 0.075 0.075 0.075 0.075 ...
  ..$ cafe_pvalue               : num [1:20913] 0.001 0.181 0.181 0.181 0.181 0.019 0.062 0.062 0.062 0.062 ...
  ..$ Arabidopsis_thaliana_total: num [1:20913] 12 6 6 6 6 10 6 6 6 6 ...
  ..$ Zea_mays_total            : num [1:20913] 1 2 2 2 2 3 2 2 1 2 ...
  ..$ Hordeum_vulgare_total     : num [1:20913] 4 2 2 2 2 2 1 1 2 1 ...
  ..$ expansion                 : chr [1:20913] "yes" "yes" "yes" "yes" ...
  ..$ Arabidopsis_thaliana_sigDE: num [1:20913] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Hordeum_vulgare_sigDE     : num [1:20913] 1 2 NA 1 NA 1 NA 1 1 1 ...
  ..$ Zea_mays_sigDE            : num [1:20913] NA NA NA NA 2 1 NA NA NA NA ...
  ..$ total_sigDE               : num [1:20913] 1 2 NA 1 2 2 NA 1 1 1 ...
 $ hypot

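| HOG | tea_value | cafe_pvalue | Arabidopsis_thaliana_total | Zea_mays_total | Hordeum_vulgare_total | expansion | Arabidopsis_thaliana_sigDE | Hordeum_vulgare_sigDE | Zea_mays_sigDE | total_sigDE |
|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| N0.HOG0000236 | 0.07058824 | 0.001 | 12 | 1 | 4 | yes | NA | 1.0 | NA | 1.0 |
| N0.HOG0000865 | 0.075 | 0.181 | 6 | 2 | 2 | yes | NA | 2.0 | NA | 2.0 |
| N0.HOG0000887 | 0.075 | 0.181 | 6 | 2 | 2 | yes | NA | NA | NA | NA |
| N0.HOG0000914 | 0.075 | 0.181 | 6 | 2 | 2 | yes | NA | 1.0 | NA | 1.0 |
| N0.HOG0000951 | 0.075 | 0.181 | 6 | 2 | 2 | yes | NA | NA | 2.0 | 2.0 |
| N0.HOG0000317 | 0.08888889 | 0.019 | 10 | 3 | 2 | yes | NA | 1.0 | 1.0 | 2.0 |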


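| HOG | tea_value | cafe_pvalue | Hordeum_vulgare_total | Zea_mays_total | expansion | Arabidopsis_thaliana_sigDE | Hordeum_vulgare_sigDE | Zea_mays_sigDE | total_sigDE |
|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| N0.HOG0000047 | 0.0009032634 | 0.0 | 31 | 8 | yes | NA | 10 | NA | 10 |
| N0.HOG0000063 | 0.0022522523 | 0.0 | 35 | 2 | yes | NA | 14 | NA | 14 |
| N0.HOG0000044 | 0.0023530762 | 0.0 | 41 | 3 | yes | NA | 11 | NA | 11 |
| N0.HOG0000049 | 0.0025067751 | 0.0 | 37 | 4 | yes | NA | 9 | NA | 9 |
| N0.HOG0000296 | 0.0033482143 | 0.01 | 12 | 4 | yes | NA | 7 | NA | 7 |
| N0.HOG0000075 | 0.0033769063 | 0.0 | 31 | 3 | yes | NA | 9 | NA | 9 |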
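
One way to shortlist candidates from such a per-hypothesis table is to filter on `expansion`, `cafe_pvalue` and `total_sigDE`, and then sort by `tea_value`. The sketch below copies a few rows from the `hypothesis_1` output above into a toy data frame. The cutoff of 0.05 and the ascending sort order are assumptions made for illustration; the ascending order simply mirrors how the tables above appear to be sorted.

```r
# A few rows copied from the hypothesis_1 table (subset of columns).
toy_hogs <- data.frame(
  HOG         = c("N0.HOG0000317", "N0.HOG0000865", "N0.HOG0000236", "N0.HOG0000887"),
  tea_value   = c(0.08888889, 0.075, 0.07058824, 0.075),
  cafe_pvalue = c(0.019, 0.181, 0.001, 0.181),
  expansion   = c("yes", "yes", "yes", "yes"),
  total_sigDE = c(2, 2, 1, NA),
  stringsAsFactors = FALSE
)

p_cutoff <- 0.05  # assumed threshold, not prescribed by A2TEA

# Keep expanded HOGs below the cutoff that have at least one
# significantly DE gene (NA in total_sigDE means none were counted).
keep <- toy_hogs$expansion == "yes" &
  toy_hogs$cafe_pvalue < p_cutoff &
  !is.na(toy_hogs$total_sigDE)

candidates <- toy_hogs[keep, ]
candidates <- candidates[order(candidates$tea_value), ]
candidates
```

On the real data, `toy_hogs` would be replaced by `HOG_level_list$hypothesis_1` or `HOG_level_list$hypothesis_2`.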
