# Phylogenetic inference

Next, we'll build a couple of quick phylogenetic trees of our samples from the ipyrad assembly. We'll use two programs with different theoretical underpinnings: IQ-TREE and SVDQuartets. IQ-TREE is a maximum likelihood method for estimating phylogenies from concatenated alignments. Most models in IQ-TREE and similar maximum likelihood and Bayesian phylogeny programs assume that all sites in an alignment share a single underlying tree. This assumption can be violated for many reasons, including incomplete lineage sorting and gene flow. Gene flow is still notoriously hard to detect and adequately model in phylogenetic inference, but many programs now exist that are robust to variation in phylogenetic signal across sites/genes caused by incomplete lineage sorting.

SVDQuartets is one such method. It uses a quartet approach that is statistically consistent under the multi-species coalescent without the computational burden of explicitly modeling the multi-species coalescent process.

A full discussion of how these programs perform in different scenarios is beyond the scope of this tutorial, but if your research involves phylogenetic inference, we highly recommend that you do some deeper reading.



<br>

## IQ-TREE

IQ-TREE is a very commonly used and very easy to use program for generating maximum likelihood phylogenies. We'll start with this.


The programs that we need for this are already in the container we've been using, but if you need to use them in other contexts you can download them at these links: [PAUP](https://paup.phylosolutions.com/get-paup/) and [IQ-TREE](http://www.iqtree.org/#download).


If you did not run through the previous tutorial or are running this tutorial from a fresh instance, you can download the ipyrad output we provide in the "radseq_cloud" Google bucket. Only uncomment and run these next lines if you want to download the ipyrad assembly.

In [None]:
#! gsutil -m cp -r gs://radseq_cloud/ .
#! mkdir -p ./ipyrad_out/ruber_reduced_denovo_outfiles/
#! cp ./radseq_cloud/ruber-ipyrad-out/* ./ipyrad_out/ruber_reduced_denovo_outfiles/

IQ-TREE is very easy to run. We mostly just need to point it to the input file, which will be the PHYLIP-formatted output from ipyrad, the `.phy` file.


We'll set up our input and output paths as variables so that these can easily be changed and we shouldn't need to change much in the actual program call for different datasets, just these variables.

`INFILE` will be the name and path to the input file.

`OUTFIX` will be the prefix that gets prepended to each output file.

`outdir` is the directory that we want all output to go into.


 Options that we'll set in the program call include:

`-s $INFILE` sets the input sequence file.

`-m MFP` tells IQ-TREE to run ModelFinder Plus instead of using a model we specify: it selects the best-fitting model of sequence evolution and then infers the tree under that model.

`-T auto` tells IQ-TREE to automatically determine the best number of threads to use, up to the maximum we set with `-ntmax`.

`--prefix $OUTFIX` sets the prefix for our output files to what we define in our `OUTFIX` environment variable.

`-B 1000` tells IQ-TREE to run 1000 ultrafast bootstrap (UFBoot) replicates for assessing branch support.

`-alrt 1000` runs the SH-aLRT test with 1000 replicates (a likelihood-based metric of branch support).

`-ntmax 16` sets the maximum number of threads to use. This should not exceed the number of cores in your instance.
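Since the thread cap should not exceed the cores on your instance, one way to choose it is to ask Python how many cores it can see. This is a minimal sketch: the helper name `pick_ntmax` and the requested value of 16 are illustrative assumptions, not part of IQ-TREE.

```python
import os

def pick_ntmax(requested: int = 16) -> int:
    """Return a thread cap no larger than the cores visible to Python."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, available))

ntmax = pick_ntmax(16)
print(f"Use -ntmax {ntmax}")
```

If the printed value is smaller than the one in the IQ-TREE call, lowering `-ntmax` to match is the safer choice.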


In [None]:
import os

# set up the input file, outfile prefix, and output directory
os.environ["INFILE"] = "/home/jupyter/RADseq_cloud_learn/ipyrad_out/ruber_reduced_denovo_outfiles/ruber_reduced_denovo.phy"
os.environ["OUTFIX"] = "ruber"
outdir = "/home/jupyter/RADseq_cloud_learn/iqtree_out"

In [None]:
os.makedirs(outdir, exist_ok=True) # create the output directory if it doesn't already exist
os.chdir(outdir)

In [None]:
## Execute iqtree


! iqtree2 -s $INFILE -m MFP -T auto --prefix $OUTFIX -B 1000 -alrt 1000 -ntmax 16

You may see a bunch of "likelihood is underflown" warnings. These aren't ideal, but the tree we get is reasonable even with them, so we'll ignore them for now.

If IQ-TREE runs successfully, you should see something that ends like this:


<img src="images/IQ_end.png" width=40% />

and your output directory should end up with various files, most importantly `ruber.treefile`. We'll visualize the tree you estimated in R in the next tutorial.
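When both `-alrt` and `-B` are used, IQ-TREE labels each internal branch in the tree file with two values in the form `SH-aLRT/UFBoot`. Here is a rough sketch for pulling those pairs out of a Newick string. The function name `support_pairs` is hypothetical, the tree is a made-up toy rather than output from this dataset, and the regular expression assumes unquoted labels.

```python
import re

# Matches an internal-node label such as ")98.5/100" (SH-aLRT/UFBoot).
SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)")

def support_pairs(newick: str) -> list[tuple[float, float]]:
    """Return (SH-aLRT, UFBoot) pairs for every labelled internal branch."""
    return [(float(a), float(b)) for a, b in SUPPORT.findall(newick)]

# Toy tree for illustration only, not output from this dataset.
toy = "(A:0.1,B:0.2,(C:0.1,D:0.1)98.5/100:0.05,(E:0.1,F:0.2)71.2/88:0.03);"
for alrt, ufboot in support_pairs(toy):
    # The IQ-TREE documentation suggests relying on clades with
    # SH-aLRT >= 80 and UFBoot >= 95.
    print(alrt, ufboot, alrt >= 80 and ufboot >= 95)
```

To try it on the real output, read the text of `ruber.treefile` and pass it to the function in place of `toy`.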



<br>
<br>
<br>


## SVDQuartets


SVDQuartets is a quartet-based method designed to build species trees from SNPs, but it can also be used with full concatenated alignments to generate trees of individuals, as we've done with IQ-TREE.

It is somewhat more involved to set up, and we'll again set it up with a bunch of bash variables.

What we'll do is run a single search for the best tree, save it, then run a search that includes bootstrapping and save those trees. Later, in R, we'll plot the bootstrap support values onto the best tree. Note that if you run a bootstrap analysis and just plot the tree that comes out of it with bootstrap values at nodes, the bootstraps will be plotted on a consensus of the bootstrap trees, not on the best tree estimated from your actual data. I consider this to be highly undesirable.
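To get a feel for the size of the problem SVDQuartets solves: a quartet is any set of four samples, so a dataset with n samples contains "n choose 4" quartets. The short sketch below simply counts them. The sample sizes are arbitrary examples, not the number of samples in this dataset.

```python
from math import comb

def n_quartets(n_samples: int) -> int:
    """Number of distinct four-sample subsets among n_samples tips."""
    return comb(n_samples, 4)

for n in (10, 30, 100):
    print(n, n_quartets(n))
```

The count grows quickly with sample size. Evaluating every quartet (`evalQuartets=all`, as in the PAUP block in this tutorial) is practical for small datasets, while PAUP's random quartet sampling is the usual fallback for large ones.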

### Edit the nexus file

You will need to manually edit the nexus file created by ipyrad to create a nexus file that SVDQuartets/PAUP will correctly read in. The character sets specified in the file we got from ipyrad will cause issues, and so we need to delete them. 

Use the editor to make a copy of `ruber_reduced_denovo.nex` named `ruber_reduced_denovoPAUP.nex`: right-click the former, click "Duplicate", then right-click the new duplicate and select "Rename". Open `ruber_reduced_denovoPAUP.nex` and delete the line `BEGIN SETS;`,


<img src="images/start_sets.png" width=30% />


all of the lines that begin with `charset`, and the `END;` that marks the end of the charsets block.

<img src="images/end_sets.png" width=30% />



Note that there are other `end;` statements in the nexus file that you do not want to delete. The end of your Nexus file should look like this after deleting the charsets block:
 
<img src="images/no_charsets.png" width=40% />
 




<br>

Once that is done, you can proceed. We will set this up to use variables to specify the input, output, and some options for SVDQuartets so that in most cases, you should not need to edit anything in the second part of this code block. Note, however, that there are some options that we have defined in the program call, and in some cases you may want to change these.

Instead of editing the file by hand, you can also remove the charsets block programmatically with a `sed` one-liner. Be sure to modify the paths depending on whether you ran ipyrad yourself or copied the provided output.

In [None]:
! sed '/BEGIN SETS;/,/^END;/d' /home/jupyter/RADseq_cloud_learn/radseq_cloud/ruber-ipyrad-out/ruber_reduced_denovo.nex > /home/jupyter/RADseq_cloud_learn/radseq_cloud/ruber-ipyrad-out/ruber_reduced_denovoPAUP.nex
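For readers more comfortable in Python, the same deletion can be sketched as a small function. This is an illustrative equivalent, not part of the tutorial's pipeline: the name `strip_sets_block` is hypothetical and the Nexus text is a toy example. Unlike the `sed` pattern, this version ignores case and leading whitespace when matching `BEGIN SETS;` and `END;`.

```python
def strip_sets_block(nexus_text: str) -> str:
    """Drop everything from 'BEGIN SETS;' through the next 'END;' line."""
    kept, in_sets = [], False
    for line in nexus_text.splitlines(keepends=True):
        stripped = line.strip().upper()
        if not in_sets and stripped.startswith("BEGIN SETS;"):
            in_sets = True   # start skipping at the block header
            continue
        if in_sets:
            if stripped.startswith("END;"):
                in_sets = False  # skip the closing END; then resume keeping
            continue
        kept.append(line)
    return "".join(kept)

# Toy Nexus text for illustration only.
toy = (
    "#NEXUS\nBEGIN DATA;\n  MATRIX\n  ;\nEND;\n"
    "BEGIN SETS;\n  charset locus1 = 1-50;\nEND;\n"
)
print(strip_sets_block(toy))
```

Only the `END;` that closes the sets block is removed; the `END;` that closes the data block is kept.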

In [None]:
%%bash
PAUP="/usr/bin/paup4" # set up PAUP path
OUTDIR="/home/jupyter/RADseq_cloud_learn/svdq_out"


#define  variables for the PAUPblock
filebname="ruber_reduced_denovo" #basename for all produced files
# double check that you have this path
infile="/home/jupyter/RADseq_cloud_learn/radseq_cloud/ruber-ipyrad-out/ruber_reduced_denovoPAUP.nex" #name of input nexus file; can give a path so the input files don't have to be part of the working directory
nthreads=16 #number of threads to use
nreps=200 #number of replicates for bootstrapping



################################################################################################################################################################
################################################################################################################################################################
####    Run based on the parameters set above
################################################################################################################################################################
################################################################################################################################################################


#change working directory to where your output files will go; stop if that fails
mkdir -p "$OUTDIR"
cd "$OUTDIR" || exit 1


cat <<EOF > $filebname.paup.txt
Begin paup;
set autoclose=yes warntree=no warnreset=no flock=no;
log start file=$filebname.log ;
execute $infile;
svdQuartets evalQuartets=all showScores=no ambigs=distribute bootstrap=no nthreads=$nthreads;
savetrees file=$filebname.besttree.tre;
svdQuartets evalQuartets=all showScores=no ambigs=distribute bootstrap=standard nreps=$nreps nthreads=$nthreads treefile=$filebname.svdqboots.tre;  
quit; 
end;
EOF

$PAUP $filebname.paup.txt #execute your new paup block file





If SVDQuartets runs successfully, your output directory should end up with various files, most importantly `ruber_reduced_denovo.svdqboots.tre` and `ruber_reduced_denovo.besttree.tre`. You should also see a text representation of the tree that was inferred. We'll visualize the tree you estimated in R in the next tutorial.
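A quick sanity check on the bootstrap file is to count how many trees it holds. If it contains one tree per bootstrap replicate, the count should match `nreps`. The sketch below counts `tree` statements in NEXUS text. It assumes each statement starts on its own line, the name `count_trees` is hypothetical, and the toy text is made up, so the exact layout of the files PAUP writes may differ.

```python
import re

# A NEXUS TREES block holds statements of the form "tree NAME = ...;".
TREE_LINE = re.compile(r"^\s*tree\s", re.IGNORECASE)

def count_trees(nexus_text: str) -> int:
    """Count TREE statements in the text of a NEXUS tree file."""
    return sum(1 for line in nexus_text.splitlines() if TREE_LINE.match(line))

# Toy tree file for illustration only.
toy = """#NEXUS
Begin trees;
tree rep1 = [&U] (A,B,(C,D));
tree rep2 = [&U] (A,C,(B,D));
End;
"""
print(count_trees(toy))
```

To use it on real output, read the text of `ruber_reduced_denovo.svdqboots.tre` and pass it in place of `toy`.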