## Quick demo on how to use the `A2TEA_finished.RData` output

- Currently, the fastest way to run the pipeline is to build a single conda environment that includes all required software.
- Using finer-grained, per-rule YAML files (`--use-conda` option) causes the checkpoint steps to run sequentially, which takes much longer; the cause is not yet known.
- For now, therefore, build one large environment from `workflow/envs/complete_environment.yaml`.
  
For example:  
`mamba env create --name A2TEA --file workflow/envs/complete_environment.yaml`  
Activate this environment, then run A2TEA (replace `xx` with the number of cores to use):  
`snakemake --cores xx`

**The final output of A2TEA is written to `tea/A2TEA_finished.RData`.**

### 2.) Working with the A2TEA output (for the moment)

#### 2.1 - load libraries and define three classes
-> The class definitions are needed so that your R session can interpret the custom S4 data structures stored in the output.

In [6]:
library(DESeq2)
library(tidyverse)
library(ggtree)
library(ggtreeExtra)
library(Biostrings)
library(seqinr)
library(UpSetR)
library(cowplot)
library(ggplotify)

In [3]:
# class for the expanded_OG - containing all different types of data we have on it
setClass("expanded_OG", slots=list(genes="spec_tbl_df",
                                   blast_table="tbl_df",
                                   nrow_table="numeric",
                                   num_genes_HOG="numeric",
                                   num_genes_extend="numeric",
                                   num_genes_complete="numeric",
                                   genes_HOG="tbl_df",
                                   genes_extend_hits="tbl_df",
                                   fasta_files="list",
                                   msa="AAStringSet",
                                   tree="phylo"))

# class for the hypotheses
setClass("hypothesis",
         slots=list(description="character",
                    number="character",
                    expanded_in="character",
                    compared_to="character",
                    expanded_OGs="list",
                    species_tree="phylo"))

# class for extended BLAST hits info
setClass("extended_BLAST_hits",
         slots=list(blast_table="tbl_df",
                    num_genes_HOG="numeric",
                    num_genes_extend="numeric",
                    num_genes_complete="numeric",
                    genes_HOG="tbl_df",
                    genes_extend_hits="tbl_df")
         )

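As a minimal sketch of how such S4 classes behave, the following defines a small stand-in class and reads its slots. The class name `toy_hypothesis` and all values are hypothetical and only mirror part of the real `hypothesis` class above; it uses base slot types so it runs without the phylogenetics packages.

```r
library(methods)

# Hypothetical stand-in for the real "hypothesis" class (assumption: reduced slot set).
setClass("toy_hypothesis",
         slots = list(description  = "character",
                      number       = "character",
                      expanded_OGs = "list"))

# Create one object; the values are made up for illustration.
toy <- new("toy_hypothesis",
           description  = "Expanded in A compared to B",
           number       = "1",
           expanded_OGs = list(N0.HOG0000001 = c("geneA1", "geneA2")))

# Slots are read with `@`; elements of a list slot with `$`.
toy@description
toy@expanded_OGs$N0.HOG0000001

# List the slots that the class defines.
slotNames("toy_hypothesis")
```

The same `@` and `$` pattern applies to the objects stored in the A2TEA output once the three classes above are defined.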
#### 2.2 - read-in the A2TEA output .RData file

In [7]:
load("../tea/A2TEA_finished.RData")

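If you prefer not to load everything straight into the global workspace, one option is to load the file into a separate environment and inspect what it contains first. The sketch below demonstrates this with a temporary `.RData` file and made-up objects (`a`, `b`), since the real output file is not assumed to be present.

```r
# Create a small temporary .RData file (stand-in for the real A2TEA output).
tmp <- tempfile(fileext = ".RData")
a <- 1:3
b <- "x"
save(a, b, file = tmp)

# Load into a dedicated environment instead of the global workspace.
env <- new.env()
loaded <- load(tmp, envir = env)

# load() returns the names of the restored objects (invisibly).
sort(loaded)

# Objects are then available via the environment.
env$a
```

With the real file, `ls(env)` should then list the objects described in the next section.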
#### 2.3 - currently included: 4 objects

- `hypotheses`     
- `HYPOTHESES.a2tea`
- `HOG_DE.a2tea`
- `HOG_level_list`

In [8]:
hypotheses

| hypothesis | name | expanded_in | compared_to | Nmin_expanded_in | Nmin_compared_to | min_expansion_factor | expanded_in_all_found | compared_to_all_found |
|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 1 | Expanded in Arabidopsis compared to Monocots | Arabidopsis_thaliana | Zea_mays;Hordeum_vulgare | 1 | 2 | 2 | YES | YES |
| 2 | Expanded in barley compared to maize | Hordeum_vulgare | Zea_mays | 1 | 1 | 2 | NO | NO |
| 3 | bla | Zea_mays | Hordeum_vulgare;Arabidopsis_thaliana | 1 | 1 | 3 | YES | YES |
| 4 | bla2 | Hordeum_vulgare;Zea_mays | Arabidopsis_thaliana | 1 | 1 | 2 | NO | NO |
| 5 | bla3 | Hordeum_vulgare;Zea_mays | Hordeum_vulgare;Arabidopsis_thaliana | 1 | 1 | 2 | NO | NO |

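The `expanded_in` and `compared_to` columns hold several species in one string, separated by `;`. A minimal sketch of splitting them into per-hypothesis character vectors is shown below; the small data frame `hyp` is a hand-typed copy of the first two rows above, not the real `hypotheses` object.

```r
# Hand-typed copy of the first two rows of the table above (illustration only).
hyp <- data.frame(
  hypothesis  = c(1, 2),
  expanded_in = c("Arabidopsis_thaliana", "Hordeum_vulgare"),
  compared_to = c("Zea_mays;Hordeum_vulgare", "Zea_mays"),
  stringsAsFactors = FALSE
)

# Split the semicolon-separated species into one character vector per hypothesis.
compared <- strsplit(hyp$compared_to, ";", fixed = TRUE)
names(compared) <- paste0("hypothesis_", hyp$hypothesis)

compared$hypothesis_1

# Number of comparison species per hypothesis.
lengths(compared)
```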

In [9]:
summary(HYPOTHESES.a2tea)
HYPOTHESES.a2tea$hypothesis_1@description
HYPOTHESES.a2tea$hypothesis_1@number
HYPOTHESES.a2tea$hypothesis_1@expanded_in
HYPOTHESES.a2tea$hypothesis_1@compared_to
HYPOTHESES.a2tea$hypothesis_1@species_tree

# the slots of an individual expanded_OG object can be accessed in the same way, e.g.:
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@genes
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@blast_table
## ? HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000308@nrow_table
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@num_genes_HOG
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@num_genes_extend
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@num_genes_complete
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@genes_HOG
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@genes_extend_hits
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@fasta_files
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@msa
#HYPOTHESES.a2tea$hypothesis_1@expanded_OGs$N0.HOG0000205@tree

             Length Class      Mode
hypothesis_1 1      hypothesis S4  
hypothesis_2 1      hypothesis S4  
hypothesis_3 1      hypothesis S4  
hypothesis_4 1      hypothesis S4  
hypothesis_5 1      hypothesis S4  


Phylogenetic tree with 3 tips and 2 internal nodes.

Tip labels:
  Arabidopsis_thaliana, Hordeum_vulgare, Zea_mays
Node labels:
  N0, N1

Rooted; includes branch lengths.

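Because `HYPOTHESES.a2tea` is a named list of S4 objects, the same slot can be pulled from every hypothesis at once. The sketch below illustrates the pattern with a hypothetical stand-in class (`toy_hypothesis`) and made-up contents, so that it runs without the real output.

```r
library(methods)

# Hypothetical stand-in class with two of the slots of the real "hypothesis" class.
setClass("toy_hypothesis",
         slots = list(description = "character", expanded_OGs = "list"))

# Made-up list that mimics the shape of HYPOTHESES.a2tea.
toy_list <- list(
  hypothesis_1 = new("toy_hypothesis",
                     description  = "Expanded in A compared to B",
                     expanded_OGs = list(HOG_a = 1, HOG_b = 2)),
  hypothesis_2 = new("toy_hypothesis",
                     description  = "Expanded in B compared to C",
                     expanded_OGs = list(HOG_c = 3))
)

# One description per hypothesis.
descriptions <- vapply(toy_list, function(h) h@description, character(1))

# Number of expanded orthologous groups stored per hypothesis.
n_expanded <- vapply(toy_list, function(h) length(h@expanded_OGs), integer(1))
n_expanded
```

`vapply` is used instead of `sapply` so that an unexpected slot type raises an error rather than silently changing the result's shape.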
In [10]:
str(HOG_DE.a2tea)
head(HOG_DE.a2tea)

'data.frame':	16563 obs. of  10 variables:
 $ species       : chr  "Arabidopsis_thaliana" "Arabidopsis_thaliana" "Arabidopsis_thaliana" "Arabidopsis_thaliana" ...
 $ gene          : chr  "AT2G01008" "AT2G01021" "AT2G01023" "AT2G01035" ...
 $ baseMean      : num  89.447 6290.815 2.505 0.775 0.775 ...
 $ log2FoldChange: num  1.192 0.57 0.208 3.026 3.026 ...
 $ lfcSE         : num  1.37 1.46 1.04 1.92 1.92 ...
 $ stat          : num  0.868 0.391 0.2 1.574 1.574 ...
 $ pvalue        : num  0.386 0.696 0.842 0.115 0.115 ...
 $ padj          : num  1 1 1 1 1 ...
 $ significant   : chr  "no" "no" "no" "no" ...
 $ HOG           : chr  "singleton" "singleton" "singleton" "N0.HOG0002018" ...


| | species | gene | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | significant | HOG |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 1 | Arabidopsis_thaliana | AT2G01008 | 89.4474018 | 1.1915159 | 1.373179 | 0.8677062 | 0.3855552 | 0.9998053 | no | singleton |
| 2 | Arabidopsis_thaliana | AT2G01021 | 6290.8151248 | 0.5698162 | 1.458011 | 0.3908175 | 0.6959322 | 0.9998053 | no | singleton |
| 3 | Arabidopsis_thaliana | AT2G01023 | 2.5054909 | 0.2079018 | 1.039922 | 0.1999206 | 0.8415427 | 0.9998053 | no | singleton |
| 4 | Arabidopsis_thaliana | AT2G01035 | 0.7753431 | 3.0264544 | 1.922366 | 1.5743383 | 0.1154093 | 0.9998053 | no | N0.HOG0002018 |
| 5 | Arabidopsis_thaliana | AT2G01045 | 0.7753431 | 3.0264544 | 1.922366 | 1.5743383 | 0.1154093 | 0.9998053 | no | N0.HOG0002018 |
| 6 | Arabidopsis_thaliana | AT2G01050 | 0.0 | NA | NA | NA | NA | NA | no | N0.HOG0001718 |

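A typical next step is to count differentially expressed genes per HOG. The sketch below does this on a small made-up data frame with the same `gene`, `padj` and `HOG` columns as `HOG_DE.a2tea`. The 0.05 cutoff is an assumption for illustration, not necessarily the threshold the pipeline uses for its `significant` column. Note that `padj` can be `NA` (as in row 6 above), so missing values are excluded explicitly.

```r
# Made-up rows with the same column names as HOG_DE.a2tea (illustration only).
de <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  padj = c(0.001, 0.9998053, NA, 0.03, 0.02),
  HOG  = c("N0.HOG0000001", "singleton", "N0.HOG0000001",
           "N0.HOG0000001", "N0.HOG0000002"),
  stringsAsFactors = FALSE
)

# Keep genes with a non-missing adjusted p-value below the assumed cutoff
# that belong to a HOG (i.e. are not labelled "singleton").
keep <- !is.na(de$padj) & de$padj < 0.05 & de$HOG != "singleton"

# Count the retained genes per HOG.
sig_per_HOG <- table(de$HOG[keep])
sig_per_HOG
```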

In [11]:
str(HOG_level_list)
head(HOG_level_list$hypothesis_1)
head(HOG_level_list$hypothesis_2)

List of 5
 $ hypothesis_1: tibble [3,439 × 10] (S3: tbl_df/tbl/data.frame)
  ..$ HOG                       : chr [1:3439] "N0.HOG0000012" "N0.HOG0000025" "N0.HOG0000031" "N0.HOG0000033" ...
  ..$ tea_value                 : num [1:3439] Inf Inf Inf Inf Inf ...
  ..$ cafe_pvalue               : num [1:3439] 0.449 0 0.236 0.832 0.079 0.004 0.161 0.832 0.019 0.04 ...
  ..$ Arabidopsis_thaliana_total: num [1:3439] 3 13 4 2 8 10 5 2 8 7 ...
  ..$ Zea_mays_total            : num [1:3439] 1 1 1 1 2 1 2 1 2 1 ...
  ..$ Hordeum_vulgare_total     : num [1:3439] 1 1 1 1 3 2 1 1 1 3 ...
  ..$ expansion                 : chr [1:3439] "yes" "yes" "yes" "yes" ...
  ..$ Hordeum_vulgare_sigDE     : num [1:3439] NA 1 NA 1 NA NA 1 NA NA NA ...
  ..$ Zea_mays_sigDE            : num [1:3439] NA NA NA NA NA NA NA NA NA NA ...
  ..$ total_sigDE               : num [1:3439] NA 1 NA 1 NA NA 1 NA NA NA ...
 $ hypothesis_2: tibble [3,439 × 9] (S3: tbl_df/tbl/data.frame)
  ..$ HOG                  : chr [1:3439] 

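| HOG | tea_value | cafe_pvalue | Arabidopsis_thaliana_total | Zea_mays_total | Hordeum_vulgare_total | expansion | Hordeum_vulgare_sigDE | Zea_mays_sigDE | total_sigDE |
|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| N0.HOG0000012 | Inf | 0.449 | 3 | 1 | 1 | yes | NA | NA | NA |
| N0.HOG0000025 | Inf | 0.0 | 13 | 1 | 1 | yes | 1.0 | NA | 1.0 |
| N0.HOG0000031 | Inf | 0.236 | 4 | 1 | 1 | yes | NA | NA | NA |
| N0.HOG0000033 | Inf | 0.832 | 2 | 1 | 1 | yes | 1.0 | NA | 1.0 |
| N0.HOG0000039 | Inf | 0.079 | 8 | 2 | 3 | yes | NA | NA | NA |
| N0.HOG0000040 | Inf | 0.004 | 10 | 1 | 2 | yes | NA | NA | NA |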


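| HOG | tea_value | cafe_pvalue | Hordeum_vulgare_total | Zea_mays_total | expansion | Hordeum_vulgare_sigDE | Zea_mays_sigDE | total_sigDE |
|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| N0.HOG0000018 | 0.007619048 | 0.0 | 16 | 5 | yes | 4 | NA | 4 |
| N0.HOG0000068 | 0.037037037 | 0.128 | 6 | 3 | yes | 2 | NA | 2 |
| N0.HOG0000085 | 0.042857143 | 0.004 | 6 | 1 | yes | 4 | NA | 4 |
| N0.HOG0000145 | 0.05952381 | 0.067 | 5 | 2 | yes | 2 | NA | 2 |
| N0.HOG0000158 | 0.05952381 | 0.067 | 5 | 2 | yes | 2 | NA | 2 |
| N0.HOG0000104 | 0.0625 | 0.032 | 6 | 2 | yes | 2 | NA | 2 |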

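One way to shortlist candidate HOGs from such a table is to combine the CAFE p-value with the differential-expression counts. The sketch below uses a hand-typed copy of four rows of the `hypothesis_2` table above. The 0.05 cutoff and the choice of filter are assumptions for illustration, not a recommendation from the pipeline.

```r
# Hand-typed copy of four rows of the hypothesis_2 table above (illustration only).
hog <- data.frame(
  HOG         = c("N0.HOG0000018", "N0.HOG0000068", "N0.HOG0000085", "N0.HOG0000145"),
  tea_value   = c(0.007619048, 0.037037037, 0.042857143, 0.05952381),
  cafe_pvalue = c(0.0, 0.128, 0.004, 0.067),
  total_sigDE = c(4, 2, 4, 2),
  stringsAsFactors = FALSE
)

# Keep HOGs below the assumed CAFE p-value cutoff that also contain
# at least one significantly differentially expressed gene (non-NA count).
hits <- hog[hog$cafe_pvalue < 0.05 & !is.na(hog$total_sigDE), ]

# Order the shortlist by CAFE p-value, smallest first.
hits <- hits[order(hits$cafe_pvalue), ]
hits$HOG
```

The `!is.na()` check matters for tables such as `hypothesis_1`, where `total_sigDE` is `NA` for many HOGs.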

In [22]:
head(HOG_level_list$hypothesis_1)

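| HOG | tea_value | cafe_pvalue | Arabidopsis_thaliana_total | Zea_mays_total | Hordeum_vulgare_total | expansion | Hordeum_vulgare_sigDE | Zea_mays_sigDE | total_sigDE |
|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| N0.HOG0000012 | Inf | 0.435 | 3 | 1 | 1 | yes | NA | NA | NA |
| N0.HOG0000025 | Inf | 0.0 | 13 | 1 | 1 | yes | 1.0 | NA | 1.0 |
| N0.HOG0000031 | Inf | 0.242 | 4 | 1 | 1 | yes | NA | NA | NA |
| N0.HOG0000033 | Inf | 0.863 | 2 | 1 | 1 | yes | 1.0 | NA | 1.0 |
| N0.HOG0000039 | Inf | 0.07 | 8 | 2 | 3 | yes | NA | NA | NA |
| N0.HOG0000040 | Inf | 0.004 | 10 | 1 | 2 | yes | NA | NA | NA |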


In [18]:
HYPOTHESES.a2tea$hypothesis_2@expanded_OGs$N0.HOG0000814@