vijay120/xscape
xscape Tools
Ran Libeskind-Hadas, Jessica Yi-Chieh Wu, Mukul Bansal
January 2014

See also: www.cs.hmc.edu/~hadas/xscape


WHAT'S INCLUDED

The xscape programs are found in the bin directory, which includes:

* costscape.py: Visualize the landscape of optimal solutions
* sigscape.py: Visualize the landscape of statistical significance
* eventscape.py: Enumerate the events in each region and the events
  common to multiple regions
* tree2newick: Convert trees in .tree format into .newick format
* view_tanglegram: View the tanglegram


REQUIREMENTS

These tools require Python 2.7.x and the following packages:

* matplotlib
* BioPython
* shapely

All of these packages are included in the Enthought Python Canopy suite,
which provides a full version of Python 2.7.3.

https://www.enthought.com/products/epd/free/


USAGE

Run from the command line using

    costscape
    sigscape
    eventscape
    view_tanglegram
    tree2newick

If the path variables on the local machine are not set appropriately, the
following will set them...

    export PATH=$PATH:<xscape path>/bin
    export PYTHONPATH=$PYTHONPATH:<xscape path>/python

... where <xscape path> denotes the path to the directory where the xscape
tools have been installed.


INPUT AND OUTPUT

Costscape, sigscape, and eventscape prompt the user for the names of input
and output files and arguments.


INPUT FILE

The input file comprises a species/host tree followed by a gene/parasite
tree in newick format with internal node names:

    (LeftTree, RightTree) Root;

The node names cannot be numeric (although they can be alpha-numeric).
Immediately following the two newick trees is a list of tip associations
with one entry per line of the form...

    g:s

... where g is the name of a gene/parasite tip and s is the name of a
species/host tip.

The name of the input file should end with .newick

Sample files are provided in the examples directory.


OUTPUT FILE

All three programs will save to output files.
Costscape and sigscape will save the plots to .pdf files and eventscape
saves to a .csv file that can be opened and manipulated in programs such as
Excel. In costscape and sigscape the output file name is optional: these
programs can alternatively display the plots using matplotlib's display
facilities. They also print summary data in the terminal window.


ARGUMENTS

In addition, these programs prompt for the range of transfer and loss costs
relative to the normalized unit cost of duplication. Speciation cost is
fixed to 0.


COSTSCAPE IN DETAIL

The program produces a plot in which the x-axis represents the range of
loss costs and the y-axis represents the range of transfer costs. The cost
space is then divided into color-coded "regions" where each region
represents a subset of the cost space where optimal solutions will be the
same. Each region is labeled by a "cost vector" (also referred to below as
the event count vector) of the form <c, d, t, l> representing the number of
speciations, duplications, transfers, and losses, respectively, in an
optimal solution.

Costscape also prints the following information in the terminal window for
each region:

* The event count vector and the number of distinct solutions in that region
* The vertices representing the boundary of the region
* The area of the region


SIGSCAPE IN DETAIL

Sigscape performs randomization trials to determine the fraction of random
trials whose costs are at least as good as those of the original input
dataset. Each trial comprises a permutation of the leaf associations between
the two trees. Sigscape then computes an empirical p-value for each
combination of costs, indicating the fraction of random trials whose cost is
less than or equal to that of the original input data. The cost space is
colored green for significance at the 0.01 level, yellow for significance
between 0.01 and 0.05, and red for lack of significance at the 0.05 level.
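
The empirical p-value and the color coding described above can be sketched
in a few lines of Python. This is only an illustration and not sigscape's
actual code: the function names are hypothetical, the trial costs are assumed
to have been computed already, and treating the 0.01 and 0.05 thresholds as
inclusive is an assumption, since the text does not say how ties at the
boundaries are handled.

```python
def empirical_p_value(original_cost, trial_costs):
    """Fraction of random trials whose cost is <= the original cost."""
    if not trial_costs:
        raise ValueError("at least one trial is required")
    at_least_as_good = sum(1 for cost in trial_costs if cost <= original_cost)
    # float() keeps the division exact on Python 2.7 as well as Python 3.
    return at_least_as_good / float(len(trial_costs))


def significance_color(p_value):
    """Map a p-value to the plot color (inclusive thresholds are assumed)."""
    if p_value <= 0.01:
        return "green"
    if p_value <= 0.05:
        return "yellow"
    return "red"


# Hypothetical costs: two of the four random trials cost no more than
# the original data, so the point would be colored red.
p = empirical_p_value(10, [8, 10, 12, 15])
print(p, significance_color(p))
```

With many trials the same two functions would be applied once per
combination of transfer and loss costs.
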
Because the permutation testing can be slow for large trees and large
numbers of trials, sigscape is multithreaded and prompts for the number of
cores to allocate to the permutation testing.


EVENTSCAPE IN DETAIL

Eventscape has two modes, Union and Intersection, and the user is prompted
to select one. In Union mode, each region (i.e., event count vector) records
every event in every reconciliation in that region, that is, the union of
all events in the reconciliations for that region. In Intersection mode,
each region records those events that are common to all reconciliations in
that region, that is, the intersection of the events taken over the
reconciliations in that region.

Eventscape's output .csv file contains one line per region, indicating the
event count vector for that region, the number of distinct reconciliations,
followed by a list of the events for that region (either the union or
intersection, depending on the specified mode of operation). Next,
eventscape partitions all of these events by the number of regions in which
they occur, from those found in all regions down to those found in only 1
region.

The reported events are as follows:

    p h eventType

... where p is a node in the parasite tree, h is a node in the host tree,
and eventType is the type of event. For example...

    p5 h4 cospeciation

... means that parasite tree node p5 cospeciates with host node h4.
Similarly,

    p5 h4 duplication

... means that parasite tree node p5 duplicates on the edge terminating at
h4. And...

    p5 h4 loss h7

... means that the parasite edge terminating at p5 passes through host
vertex h4 and continues on the host edge terminating at h7. Finally,

    p5 h4 switch h9

... means that the parasite node p5 performs a duplication and switch on the
host edge terminating at h4 and one of p5's children switches to the host
edge terminating at h9.


VIEW_TANGLEGRAM

The view_tanglegram program renders the input file (tanglegram). This is
particularly useful when interpreting the events that are output by
eventscape. Run...

    view_tanglegram -h

...
to see the command line options. For example, a typical usage is:

    view_tanglegram -n -g outputFile.svg inputFile.newick

The -n option displays the names of the internal nodes in the trees (useful
for interpreting the eventscape events which refer to these internal nodes)
and the -g option saves the file to the specified .svg output file.


TREE2NEWICK

Jane users may prefer to use the .tree format because Jane saves files in
that format (http://www.cs.hmc.edu/~hadas/jane/fileformats.html). Jane also
has a GUI editor that allows users to construct trees and save them in .tree
format. A program called tree2newick.py is also provided. It is run from the
command line, prompts the user for the .tree input file name and the .newick
output file name, and writes the newick tree to the output file. When using
the .tree format, only the HOSTTREE, PARASITETREE, and PHI entries are
required. All others are ignored.


ASSUMPTIONS

The trees are untimed and switches are permitted from an edge h to any other
edge h' as long as h' is neither ancestral nor descendant with respect to h.
Timing incompatibilities are therefore theoretically possible.


ACKNOWLEDGEMENTS AND DATA SOURCES

The Heliconius example dataset was taken from:

    Jennifer Cuthill and Michael Charleston
    Phylogenetic Codivergence Supports Coevolution of Mimetic Heliconius
    Butterflies
    PLoS One 7(5): e36464. doi:10.1371/journal.pone.0036464

The Gopher-Louse dataset was taken from:

    Hafner MS and Nadler SA
    Phylogenetic trees support the coevolution of parasites and their hosts
    Nature 1988, 332:258-259
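

EXAMPLE: UNION AND INTERSECTION MODES

As an illustration of the two eventscape modes described under EVENTSCAPE IN
DETAIL, the following sketch treats each reconciliation in a region as a
collection of events and combines them with set operations. It is not
eventscape's own code: the function name, the tuple representation of
events, and the sample events are all hypothetical and chosen only to mirror
the "p h eventType" notation used above.

```python
def combine_events(reconciliations, mode):
    """Combine the events of all reconciliations in one region.

    reconciliations: list of iterables of event tuples, for example
                     ("p5", "h4", "cospeciation")
    mode: "union" or "intersection"
    """
    event_sets = [set(events) for events in reconciliations]
    if not event_sets:
        return set()
    if mode == "union":
        # Every event that appears in at least one reconciliation.
        return set.union(*event_sets)
    if mode == "intersection":
        # Only the events shared by all reconciliations.
        return set.intersection(*event_sets)
    raise ValueError("mode must be 'union' or 'intersection'")


# Two hypothetical reconciliations belonging to the same region.
rec_a = [("p5", "h4", "cospeciation"), ("p3", "h2", "duplication")]
rec_b = [("p5", "h4", "cospeciation"), ("p3", "h2", "switch", "h9")]

print(sorted(combine_events([rec_a, rec_b], "union")))
print(sorted(combine_events([rec_a, rec_b], "intersection")))
```

In this example the union holds three distinct events, while the
intersection holds only the cospeciation event that both reconciliations
share.
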