# RMixtComp Prostate example

## Data

Load the CSV data file as a data frame.

In [None]:
data <- read.table("mixtcomp-example.csv", sep = ";", header = TRUE)
head(data)
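Missing values are written as `?` in MixtComp data, the same marker used in the prediction cell of this notebook. A minimal sketch for counting them per column is given below; the `toy` data frame is made up for illustration and is not the prostate data.

```r
# Made-up data, for illustration only: "?" marks a missing value.
toy <- data.frame(Age = c("75", "?", "68"),
                  Wt = c("?", "?", "80"),
                  stringsAsFactors = FALSE)

# Number of missing entries in each column.
nMissing <- colSums(toy == "?")
print(nMissing)
```

The same call applied to `data` should give the number of values to impute for each variable, assuming every column was read as text.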

## Clustering with RMixtComp

Launch the RMixtComp package.

In [None]:
library(RMixtComp)

Define the distribution used for each variable.

In [None]:
model <- list(Age = "Gaussian", Wt = "Gaussian", PF = "Multinomial", 
              HX = "Multinomial", SBP = "Gaussian", DBP = "Gaussian", 
              EKG = "Multinomial", HG = "Gaussian", SZ = "Gaussian", 
              SG = "Gaussian", AP = "Gaussian", BM = "Multinomial")
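Before fitting, it can help to confirm that every variable named in the model exists in the data. The sketch below uses a made-up data frame (`toy`) and a hypothetical helper `checkModel`; neither is part of RMixtComp.

```r
# Hypothetical helper (not part of RMixtComp): list the model variables
# that have no matching column in the data frame.
checkModel <- function(data, model) {
  setdiff(names(model), colnames(data))
}

# Made-up data and model, for illustration only.
toy <- data.frame(Age = c("75", "?"), PF = c("1", "2"))
toyModel <- list(Age = "Gaussian", PF = "Multinomial", Wt = "Gaussian")

# Variables declared in the model but absent from the data.
missingVars <- checkModel(toy, toyModel)
print(missingVars)
```

An empty result means that every declared variable was found.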

Define the parameters of the SEM algorithm.

In [None]:
algo <- list(nbBurnInIter = 50,
             nbIter = 100,
             nbGibbsBurnInIter = 50,
             nbGibbsIter = 100,
             nInitPerClass = floor(nrow(data)/2),
             nSemTry = 5,
             confidenceLevel = 0.95,
             ratioStableCriterion = 0.99,
             nStableCriterion = 10)

Choose the numbers of classes to try and the number of runs for each number of classes.

In [None]:
nClass <- 1:8
nRun <- 3

In [None]:
res <- mixtCompLearn(data, model, algo, nClass = nClass, criterion = "ICL", nRun = nRun, nCore = 1)

## Output Analysis

### Criterion

Plot the criterion values (BIC and ICL) for each fitted model. The higher the value (the closer to 0), the better the model.

In [None]:
plotCrit(res, pkg = "plotly")
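Reading the plot amounts to picking the number of classes with the largest criterion value. The sketch below shows that rule on made-up numbers; the `crit` table is hypothetical and does not come from this data set.

```r
# Made-up criterion values, for illustration only.
crit <- data.frame(nClass = 1:4,
                   BIC = c(-1520.3, -1475.8, -1468.1, -1481.6),
                   ICL = c(-1520.3, -1490.2, -1495.7, -1512.4))

# The preferred model has the largest (least negative) criterion value.
bestBIC <- crit$nClass[which.max(crit$BIC)]
bestICL <- crit$nClass[which.max(crit$ICL)]
print(c(BIC = bestBIC, ICL = bestICL))
```

In this toy table the two criteria disagree, which can also happen in practice; the criterion passed to `mixtCompLearn` (here `"ICL"`) decides which model is retained.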

### Estimation

Inspect the estimates of the missing Age values.

In [None]:
res$variable$data$Age$completed # imputed

In [None]:
res$variable$data$Age$stat # confidence interval

The same applies to the other variables.

In [None]:
res$variable$data$BM$completed # imputed

In [None]:
res$variable$data$BM$stat # confidence interval

Choose the number of classes to study in the rest of the analysis.

In [None]:
K <- 3
resK <- extractMixtCompObject(res, K)

### Variables

Draw the discriminating level of each variable. A high value (close to one) means that the variable is highly discriminating. A low value (close to zero) means that the variable is poorly discriminating.

In [None]:
plotDiscrimVar(resK, pkg = "plotly")
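To give an intuition for a discriminating level between zero and one, the sketch below computes an entropy-based score from class-membership probabilities obtained with a single variable. The helper `discrimScore`, the formula, and the probability matrices are assumptions made for illustration; whether this matches exactly what `plotDiscrimVar` displays should be checked against the package documentation.

```r
# Hypothetical helper: entropy-based discriminating score for one variable.
# `p` is an n x K matrix; p[i, k] is the probability that individual i
# belongs to class k given that variable alone (each row sums to 1).
discrimScore <- function(p) {
  n <- nrow(p)
  K <- ncol(p)
  # Treat 0 * log(0) as 0.
  plogp <- ifelse(p > 0, p * log(p), 0)
  1 + sum(plogp) / (n * log(K))
}

# Made-up probabilities for two individuals and two classes.
sharp <- matrix(c(1, 0,
                  0, 1), ncol = 2, byrow = TRUE)
flat <- matrix(0.5, nrow = 2, ncol = 2)

scoreSharp <- discrimScore(sharp)  # 1: the classes are perfectly separated
scoreFlat <- discrimScore(flat)    # 0: no class information
print(c(sharp = scoreSharp, flat = scoreFlat))
```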

Draw the similarity between every pair of variables. A high value (close to one) means that the two variables provide the same information for the clustering task (i.e. similar partitions). A low value (close to zero) means that the two variables provide different information for the clustering task (i.e. different partitions).

In [None]:
heatmapVar(resK, pkg = "plotly")

Select a variable to draw its distribution.

In [None]:
variable <- "SG"
plotDataBoxplot(resK, variable, grl = TRUE, pkg = "plotly")

### Classes

Draw the proportion of individuals in each class.

In [None]:
plotProportion(resK, pkg = "plotly")

Draw the similarity level between each pair of classes. A high value (close to one) means that the two classes are similar for the clustering task (i.e. high overlap). A low value (close to zero) means that the two classes are strongly different (i.e. low overlap).

In [None]:
heatmapClass(resK, pkg = "plotly")

Draw the discriminating level of each variable for the selected class.

In [None]:
class <- 2
plotDiscrimVar(resK, class = class, pkg = "plotly")

Select a variable to draw its distribution for the selected class.

In [None]:
variable <- "SG"
plotDataBoxplot(resK, variable, class = class, grl = TRUE, pkg = "plotly")

### Probabilities

Draw the probability of assignment to a class for each individual. Individuals are sorted by decreasing assignment probability.

In [None]:
heatmapTikSorted(resK, pkg = "plotly")
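The reordering can be sketched in base R on a made-up probability matrix. Grouping individuals by their most probable class before sorting is an assumption about how the heatmap is laid out, and the `tik` values below are invented for illustration.

```r
# Made-up assignment probabilities: 4 individuals (rows), 2 classes (columns).
tik <- matrix(c(0.90, 0.10,
                0.20, 0.80,
                0.60, 0.40,
                0.05, 0.95), ncol = 2, byrow = TRUE)

# Most probable class and its probability for each individual.
z <- apply(tik, 1, which.max)
pMax <- apply(tik, 1, max)

# Order by class, then by decreasing probability within each class.
ord <- order(z, -pMax)
tikSorted <- tik[ord, ]
print(ord)
```

Rows near the top of each class block are the most confidently assigned individuals; rows near the bottom are the ambiguous ones.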

### Prediction

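Predict the class of a partially observed individual with the model learnt above. The first patient is reused, with Age set to missing, Wt restricted to an interval, and EKG restricted to a set of modalities.

In [None]:
# Reuse the first patient and hide part of the information.
Patient1 <- data[1, ]
Patient1["Age"] <- "?"        # completely missing value
Patient1["Wt"] <- "[70:80]"   # value known to lie in the interval [70, 80]
Patient1["EKG"] <- "{4,5}"    # modality known to be either 4 or 5
resPredict <- mixtCompPredict(Patient1, resLearn = resK)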
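The three notations used in the cell above (`?`, `[70:80]`, `{4,5}`) can be told apart with simple pattern matching. The helper `cellType` below is hypothetical and only illustrates the notations; it is not part of RMixtComp and does not validate the values themselves.

```r
# Hypothetical helper: label a cell as observed, missing, interval, or set,
# based on the notations used for partially observed values.
cellType <- function(x) {
  if (x == "?") {
    "missing"
  } else if (grepl("^\\[.*:.*\\]$", x)) {
    "interval"
  } else if (grepl("^\\{.*\\}$", x)) {
    "set"
  } else {
    "observed"
  }
}

cells <- c("75", "?", "[70:80]", "{4,5}")
types <- vapply(cells, cellType, character(1), USE.NAMES = FALSE)
print(types)
```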

In [None]:
resPredict$variable$data$z_class$stat # estimated class probabilities

In [None]:
resPredict$variable$data$z_class$completed # predicted class

In [None]:
resPredict$variable$data$Age$completed

In [None]:
resPredict$variable$data$Age$stat

In [None]:
resPredict$variable$data$Wt$completed

In [None]:
resPredict$variable$data$Wt$stat

In [None]:
resPredict$variable$data$EKG$completed

In [None]:
resPredict$variable$data$EKG$stat