RF-clustering

Step 1: Create synthetic label

Addcl1: ‘synthetic-labeled’ data are added by randomly sampling from the product of the empirical marginal distributions of the variables. The Addcl1 RF dissimilarity weights the contribution of each variable to the dissimilarity according to how dependent it is on the other variables.
Addcl2: ‘synthetic-labeled’ data are added by randomly sampling from the hyperrectangle that contains the observed data. (Suitable when the MDS plot shows two distinct point clouds corresponding to the values of the binary data.)
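The two sampling schemes can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code in this repository; the function names `addcl1`, `addcl2`, and `label_observed_vs_synthetic`, the toy data, and the class codes 1 and 2 are assumptions made for the example. Addcl2 is sketched as uniform sampling between each variable's observed minimum and maximum.

```python
import random

def addcl1(data, rng):
    """Addcl1 sketch: draw each variable independently from its observed values,
    i.e. sample from the product of the empirical marginal distributions."""
    n = len(data)
    columns = list(zip(*data))
    synthetic_cols = [[rng.choice(col) for _ in range(n)] for col in columns]
    return [list(row) for row in zip(*synthetic_cols)]

def addcl2(data, rng):
    """Addcl2 sketch: draw each variable uniformly between its observed
    minimum and maximum (the hyperrectangle containing the observed data)."""
    n = len(data)
    bounds = [(min(col), max(col)) for col in zip(*data)]
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]

def label_observed_vs_synthetic(data, synthetic):
    """Stack observed and synthetic rows; label 1 = observed, 2 = synthetic."""
    features = [list(row) for row in data] + [list(row) for row in synthetic]
    labels = [1] * len(data) + [2] * len(synthetic)
    return features, labels

rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
syn1 = addcl1(observed, rng)
syn2 = addcl2(observed, rng)
X, y = label_observed_vs_synthetic(observed, syn1)
```

In this toy data the two variables are perfectly dependent, so Addcl1 breaks that dependence in the synthetic rows, which is what lets a classifier separate observed from synthetic data.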

Step 2: Fit the Random Forest Predictor

Number of forests: 100
Number of trees: 4000
The idea is to use the similarity matrix generated from an RF predictor that distinguishes observed from “synthetic” data. The RF dissimilarity easily deals with a large number of variables due to its intrinsic variable selection.
Here we use the Addcl1 proximity to calculate the dissimilarity as $\text{dissimilarity} = \sqrt{1 - \text{proximity}}$. The proximity between two input data points is based on the frequency with which the pair falls in the same terminal node.
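The proximity and dissimilarity computation can be illustrated with a small sketch. It assumes the terminal node of every data point in every tree is already known (here a hypothetical hand-written table `leaves`, not output from a real forest) and counts how often each pair shares a terminal node. The function names are illustrative.

```python
import math

def proximity_matrix(leaves):
    """leaves[t][i] is the terminal node id of data point i in tree t.
    Proximity = fraction of trees in which two points share a terminal node."""
    n_trees = len(leaves)
    n = len(leaves[0])
    counts = [[0] * n for _ in range(n)]
    for tree in leaves:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]

def rf_dissimilarity(prox):
    """dissimilarity = sqrt(1 - proximity), clipped at 0 against rounding error."""
    return [[math.sqrt(max(0.0, 1.0 - p)) for p in row] for row in prox]

# Hypothetical terminal node assignments: 4 trees, 3 data points.
leaves = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
    [5, 5, 7],
]
prox = proximity_matrix(leaves)
dissim = rf_dissimilarity(prox)
print(prox[0][1], dissim[0][1])  # 0.75 0.5
```

Points 0 and 1 share a terminal node in three of the four trees, so their proximity is 0.75 and their dissimilarity is 0.5.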

Step 3: Approximating the RF dissimilarity

When dealing with quantitative variables, one can sometimes find a Euclidean distance-based approximation of the Addcl1 RF dissimilarity if each variable is equally important for distinguishing observed from synthetic observations.
The RF dissimilarity depends only on variable ranks, since the underlying tree node splitting criterion (Gini index) considers only variable ranks.
The approximation is obtained by replacing the variable values with their ranks and using the resulting variables in a Euclidean distance.
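A rank-based Euclidean distance of this kind might look like the sketch below. The standardization of the ranks to mean 0 and variance 1 before taking distances is an assumption of this example (it puts all variables on the same scale, matching the "equally important" condition); the function names and toy data are illustrative.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

def standardize(values):
    """Scale to mean 0 and sample variance 1 (assumed step; needs >= 2 values)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def rank_euclidean(data):
    """Euclidean distance matrix computed on standardized ranks of each variable."""
    cols = [standardize(ranks(list(col))) for col in zip(*data)]
    rows = list(zip(*cols))
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

data = [[1.0, 100.0], [2.0, 1000.0], [3.0, 10.0]]
d_raw = rank_euclidean(data)
# A monotone transformation of each variable leaves the ranks, and so the distances, unchanged.
d_transformed = rank_euclidean([[x ** 3, math.log(y)] for x, y in data])
```

Because only ranks enter the distance, `d_raw` and `d_transformed` are identical, mirroring the rank dependence of the RF dissimilarity.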

Classical MDS for Addcl1 dissimilarity

The RF dissimilarity can be used as input to MDS, which yields a set of points in a Euclidean space such that the Euclidean distances between these points are approximately equal to the dissimilarities.
Multidimensional scaling (MDS) algorithms start with a matrix of item-item distances and then assign coordinates for each item in a low-dimensional space to represent the distances graphically. Unlike other ordination methods, MDS makes few assumptions about the nature of the data. For example, principal components analysis assumes linear relationships and reciprocal averaging assumes modal relationships. MDS makes neither of these assumptions, so is well suited for a wide variety of data.
Usually, we use classical MDS for the Addcl1 dissimilarity to avoid the detection of spurious patterns. There is empirical evidence that the Addcl1 RF dissimilarity can be superior to standard distance measures in several applications.
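To make the MDS step concrete, here is a minimal sketch of classical MDS in plain Python: square the dissimilarities, double-center them, and scale the leading eigenvectors by the square roots of their eigenvalues. The eigenvectors are found by power iteration with deflation, which is a simplification chosen to avoid external libraries and assumes the leading eigenvalues are positive and well separated. All names and the three-point example are illustrative.

```python
import math
import random

def double_center(dissim):
    """B = -1/2 * J * D^2 * J, where J is the centering matrix."""
    n = len(dissim)
    sq = [[d * d for d in row] for row in dissim]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(mat, rng, iters=1000):
    """Power iteration: eigenpair with the largest-magnitude eigenvalue."""
    n = len(mat)
    vec = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(vec[i] * mv[i] for i in range(n)), vec  # Rayleigh quotient

def classical_mds(dissim, k=2, seed=0):
    """Return k-dimensional coordinates whose distances approximate dissim."""
    rng = random.Random(seed)
    b = double_center(dissim)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        value, vec = top_eigenpair(b, rng)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][dim] = scale * vec[i]
        # Deflate: remove the component just extracted.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords

# Distances between the corners of a 3-4-5 right triangle.
dissim = [[0.0, 3.0, 4.0],
          [3.0, 0.0, 5.0],
          [4.0, 5.0, 0.0]]
coords = classical_mds(dissim, k=2)
recovered = [[math.dist(coords[i], coords[j]) for j in range(3)] for i in range(3)]
```

For this exactly Euclidean input the recovered distances match the input up to numerical error; an RF dissimilarity is generally only approximately Euclidean, so the match would be approximate.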

Next steps

Reduce Spearman rank correlation threshold in Step 1
Examine imaging features of clusters: imaging features used in analysis and imaging features not used in analysis
Compare clusters based on their clinical characteristics and RNA sequencing data (when available)

[Reference] Tao Shi & Steve Horvath (2006) Unsupervised Learning With Random Forest Predictors, Journal of Computational and Graphical Statistics, 15:1, 118-138, DOI: 10.1198/106186006X94072 http://dx.doi.org/10.1198/106186006X94072
