Clustering cancer evolutionary trees

phyC: Clustering Cancer Evolutionary Trees

The R package for clustering cancer evolutionary trees.

Version: 0.2.0 (2016.08.01)



Dependencies: ape, igraph, ggplot2, grDevices, png, RColorBrewer


Author: Yusuke Matsui



General overview

Multi-regional sequencing provides new opportunities to investigate genetic heterogeneity within or between common tumours from an evolutionary perspective. State-of-the-art methods have been proposed for reconstructing cancer sub-clonal evolutionary trees from multi-regional sequencing data in order to develop models of cancer evolution. However, the methods developed thus far are not sufficient to characterize and interpret the diversity of these trees. We propose a clustering method (phyC) for cancer sub-clonal evolutionary trees, in which sub-groups of trees are identified based on topology and edge-length attributes. For interpretation, we also propose a method for evaluating the diversity of the trees in each cluster, which provides insight into the acceleration of sub-clonal expansion.

Input/Output and brief description of phyC

phyC (Phylogenetic tree Clustering) is designed for classifying cancer evolutionary trees.

The main input of phyC is

  • Variant allele frequency (VAF)

or alternatively

  • Edge matrix and edge length vector from reconstructed sub-clonal evolutionary trees

Table 1. Example of VAF

| | Normal | Region 1 | Region 2 | Region 3 | Region 4 | Region 5 |
|---|---|---|---|---|---|---|
| gene 1 | 0.00 | 0.24 | 0.18 | 0.24 | 0.24 | 0.22 |
| gene 2 | 0.00 | 0.15 | 0.08 | 0.12 | 0.25 | 0.24 |
| gene 3 | 0.00 | 0.19 | 0.18 | 0.22 | 0.26 | 0.26 |
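As a rough illustration of this input, the VAF values of Table 1 can be held in a plain R matrix with genes in rows and Normal/Region samples in columns. The cutoff of 0.05 used below mirrors the `thr = 0.05` value that appears in the usage section; treating it as a presence/absence threshold is an assumption made for this sketch, not a statement about how `par.tree` works internally.

```r
## VAF values copied from Table 1: rows are genes, columns are samples.
vaf <- matrix(c(0.00, 0.24, 0.18, 0.24, 0.24, 0.22,
                0.00, 0.15, 0.08, 0.12, 0.25, 0.24,
                0.00, 0.19, 0.18, 0.22, 0.26, 0.26),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste("gene", 1:3),
                              c("Normal", paste("Region", 1:5))))

## VAFs are allele fractions, so every entry must lie in [0, 1].
stopifnot(all(vaf >= 0 & vaf <= 1))

## ASSUMPTION: a mutation counts as present in a sample when its VAF
## reaches the cutoff. This is only an illustration of thresholding.
cutoff  <- 0.05
present <- vaf >= cutoff
```

Here `present` is a logical matrix of the same shape as `vaf`; the Normal column is entirely `FALSE` and every region column is `TRUE`, because all regional VAFs in Table 1 exceed the cutoff.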

Table 2. Example of edge matrix and corresponding edge length vector

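| | Node 1 | Node 2 |
|---|---|---|
| Edge 1 | 9 | 8 |
| Edge 2 | 8 | 6 |
| Edge 3 | 6 | 7 |
| Edge 4 | 6 | 4 |
| Edge 5 | 4 | 1 |
| Edge 6 | 1 | 3 |
| Edge 7 | 8 | 10 |
| Edge 8 | 4 | 5 |
| Edge 9 | 1 | 2 |

| Edge 1 | Edge 2 | Edge 3 | Edge 4 | Edge 5 | Edge 6 | Edge 7 | Edge 8 | Edge 9 |
|---|---|---|---|---|---|---|---|---|
| 209.00 | 0.00 | 17.00 | 0.00 | 8.00 | 2.00 | 44.00 | 44.00 | 19.00 |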

With VAF input, the cancer sub-clonal evolutionary trees must first be reconstructed using existing methods. We implement two methods: a maximum parsimony approach and a clustering-based rooted-constraint network approach. The former is based on acctran in the phangorn package (Schliep, 2011) and the latter is based on LICHeE (Popic, et al. 2015).

The elements of a reconstructed tree have the following meaning: the root and the node directly below it represent a normal cell and the founder cell, respectively. Sub-clones are the nodes below the founder cell, and each edge length is the number of somatic single-nucleotide variants (SSNVs) newly accumulated in the descendant node.
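To make this reading concrete, the sketch below takes the edge matrix and edge length vector of Table 2 and picks out the normal cell, the founder cell and the terminal sub-clones with base R. It assumes that the first column ("Node 1") is the parent and the second ("Node 2") is the child, as in the `edge` matrix of an ape `phylo` object; the variable names are illustrative only.

```r
## Edge matrix and edge lengths copied from Table 2.
## ASSUMPTION: column 1 is the parent node, column 2 the child node.
edge <- matrix(c(9, 8,   8, 6,   6, 7,   6, 4,   4, 1,
                 1, 3,   8, 10,  4, 5,   1, 2),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste("Edge", 1:9), c("parent", "child")))
edge_len <- c(209, 0, 17, 0, 8, 2, 44, 44, 19)

## The root (normal cell) is the only node that never appears as a child.
root <- setdiff(edge[, "parent"], edge[, "child"])

## The founder cell is the child of the root; the length of that edge is
## the number of SSNVs shared by every sub-clone.
from_root <- edge[, "parent"] == root
founder   <- unname(edge[from_root, "child"])
trunk     <- edge_len[from_root]

## Terminal sub-clones are the nodes that never appear as a parent.
leaves <- setdiff(edge[, "child"], edge[, "parent"])
```

For Table 2 this gives node 9 as the normal cell, node 8 as the founder cell with 209 trunk SSNVs, and nodes 2, 3, 5, 7 and 10 as terminal sub-clones.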

phyC performs clustering on a set of reconstructed evolutionary trees after transforming their topologies and edge attributes so that the trees can be compared effectively, a procedure we refer to as tree registration.

The main outputs are

  • Cluster assignments
  • Multidimensional scaling (MDS) configuration of trees in clusters
  • Sub-clonal diversity of trees in clusters

Clustering is based on a standard hierarchical clustering algorithm; the default in phyC is Ward's method. To help interpret the clustering results, we provide two visualization tools: an MDS plot and a sub-clonal diversity plot. In the MDS plot, we approximately embed the registered trees into a lower-dimensional Euclidean space and overlay the tree shapes on the embedded points. In the sub-clonal diversity plot, we apply the concept of a lineage-through-time (LTT) plot to visualize how sub-clones evolve as SSNVs accumulate. An LTT plot generally shows the number of lineages against time; in the present case, it shows the number of sub-clones (y-axis) against the fraction of accumulated SSNVs (x-axis).
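The clustering step can be pictured with base R alone. The sketch below is not phyC's implementation: it assumes each registered tree has already been reduced to a numeric vector of equal length (the matrix `reg` is made-up toy data), and it uses `hclust` with `method = "ward.D2"`; which Ward variant phyC uses is not stated here.

```r
## HYPOTHETICAL registered trees: one row per tree, one column per edge
## of a shared reference topology (toy values, not real data).
reg <- rbind(
  t1 = c(0.80, 0.10, 0.10, 0.00),
  t2 = c(0.75, 0.15, 0.10, 0.00),
  t3 = c(0.20, 0.30, 0.25, 0.25),
  t4 = c(0.25, 0.25, 0.30, 0.20),
  t5 = c(0.50, 0.00, 0.00, 0.50),
  t6 = c(0.55, 0.00, 0.05, 0.40)
)

d  <- dist(reg)                        # Euclidean distances between trees
hc <- hclust(d, method = "ward.D2")    # Ward's hierarchical clustering
cl <- cutree(hc, k = 3)                # cut the dendrogram into 3 clusters
```

`cl` is a named integer vector of cluster labels, one per tree. With this toy data the three visibly similar pairs (t1/t2, t3/t4, t5/t6) end up in separate clusters.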

## Usage

##### Installation

library(devtools) ## provides install_git
install_git(url = "")

##### Tree reconstruction from VAF profile

phyC implements a reconstruction method for VAF profiles based on the maximum parsimony method, implemented as acctran in the phangorn package (Schliep, 2011).


tree <- par.tree(VAF, thr = 0.05) ## maximum parsimony method. 

An example is here.

vaf <- lapply(ccRCC, function(x) x[, -(1:3)]) ## drop the first three columns; par.tree only needs a matrix of gene * (Normal / Region VAF).
label <- names(vaf)
trees <- vector("list", length(vaf))
for (i in seq_along(vaf)) {
  trees[[i]] <- par.tree(vaf[[i]]) ## reconstruct one tree per VAF matrix
}
edgeList <- lapply(trees, function(x) x$edge) ## obtain edge list
edgeLenList <- lapply(trees, function(x) x$edge.length) ## obtain edge length list

##### Registering trees and clustering

###### phyC

The main function phyC requires at minimum edgeList, edgeLenList and cluster (the number of clusters).

result <- phyC(edgeList,edgeLenList,cluster)

Here is an example using the edgeList and edgeLenList obtained with the maximum parsimony method.

result <- phyC(edgeList,edgeLenList,cluster=3,type='h')
result$cluster # Assignment of clusters

###### Sub-clonal diversity plot

To calculate the diversity of each cluster, we use the diversity function. This function requires only the phyC object.

result2 <- diversity(phyC.obj)

Here is an example.

result2 <- diversity(result)
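The idea behind the diversity curve can be sketched for a single tree without phyC. The code below reuses the Table 2 tree and counts how many sub-clones have appeared by each fraction of accumulated SSNVs. It is a simplified stand-in, not the `diversity` function: the helper `cumulative_ssnv` is hypothetical, column 1 of the edge matrix is assumed to be the parent, and a node reached by a zero-length edge is assumed not to be a new sub-clone.

```r
## Table 2 tree. ASSUMPTION: column 1 = parent, column 2 = child.
edge <- matrix(c(9, 8,   8, 6,   6, 7,   6, 4,   4, 1,
                 1, 3,   8, 10,  4, 5,   1, 2),
               ncol = 2, byrow = TRUE)
edge_len <- c(209, 0, 17, 0, 8, 2, 44, 44, 19)

## Hypothetical helper: SSNVs accumulated from the root down to each node,
## returned as a vector indexed by node id.
cumulative_ssnv <- function(edge, edge_len) {
  root  <- setdiff(edge[, 1], edge[, 2])
  depth <- numeric(max(edge))
  todo  <- root
  while (length(todo) > 0) {
    parent <- todo[1]
    todo   <- todo[-1]
    for (i in which(edge[, 1] == parent)) {
      child        <- edge[i, 2]
      depth[child] <- depth[parent] + edge_len[i]
      todo         <- c(todo, child)
    }
  }
  depth
}

depth  <- cumulative_ssnv(edge, edge_len)
clones <- edge[edge_len > 0, 2]              # nodes that carry new SSNVs
x <- sort(depth[clones]) / max(depth)        # fraction of accumulated SSNVs
y <- seq_along(x)                            # sub-clones present so far
```

Calling `plot(x, y, type = "s")` then draws the step curve. For this tree the first sub-clone (the founder) appears at 209/253 of the accumulated SSNVs and the remaining six appear in the last fifth of the axis, which is the kind of late expansion the plot is meant to expose.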

###### phyCMD

To obtain the configuration of trees in Euclidean space, we use the phyCMD function. The input is a phyC object.

result3 <- phyCMD(phyC.obj)

Here is an example.

result3 <- phyCMD(result)
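For readers unfamiliar with MDS, the embedding step can be approximated with classical MDS from base R. This is a sketch under the assumption that registered trees are compared by Euclidean distance between equal-length vectors; the matrix `reg` is made-up toy data and `cmdscale` stands in for whatever phyCMD does internally.

```r
## HYPOTHETICAL registered trees (toy values): one row per tree.
reg <- rbind(
  t1 = c(0.80, 0.10, 0.10, 0.00),
  t2 = c(0.75, 0.15, 0.10, 0.00),
  t3 = c(0.20, 0.30, 0.25, 0.25),
  t4 = c(0.25, 0.25, 0.30, 0.20),
  t5 = c(0.50, 0.00, 0.00, 0.50),
  t6 = c(0.55, 0.00, 0.05, 0.40)
)

d      <- dist(reg)                        # pairwise Euclidean distances
fit    <- cmdscale(d, k = 2, eig = TRUE)   # classical MDS in two dimensions
coords <- fit$points                       # one (x, y) coordinate pair per tree
gof    <- fit$GOF[1]                       # share of variation kept in 2-D
```

`coords` can be passed straight to `plot(coords)`, with `text(coords, labels = rownames(coords))` to label the trees. Because the embedding is a projection, distances between points in the plot never exceed the original distances, so `gof` is worth checking before reading too much into the layout.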


  1. Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. (2015) Cancer evolution: mathematical models and computational inference. Syst Biol. 64(1):e1-e25.
  2. Billera LJ, Holmes SP, Vogtmann K. (2001) Geometry of the space of phylogenetic trees. Adv Appl Math. 27(4):733-767.
  3. Schliep KP. (2011) phangorn: phylogenetic analysis in R. Bioinformatics. 27(4):592-593.
  4. Matsui Y, Niida A, Uchi R, Mimori K, Miyano S, Shimamura T. (2017) Clustering cancer evolutionary trees. PLoS Comput Biol. 13(5):e1005509. doi: 10.1371/journal.pcbi.1005509.
  5. Popic V, Salari R, Hajirasouliha I, Kashef-Haghighi D, West RB, Batzoglou S. (2015) Fast and scalable inference of multi-sample cancer lineages. Genome Biol. 16:91.