# A CyTOF dataset example using LDA

Using the **AML** dataset

Set the working directory to the 'AML' dataset folder

In [1]:
setwd('AML')

Load the AML dataset

In [2]:
AML.data <- read.csv('AML_benchmark.csv', header = TRUE)
head(AML.data)

Time,Cell_length,DNA1,DNA2,CD45RA,CD133,CD19,CD22,CD11b,CD4,...,CD117,CD49d2,HLA.DR,CD64,CD41,Viability,file_number,event_number,cell_type,subject
2693,22,201.7833,253.0166,0.81704921,-0.1479468,-0.033481941,0.3321835,-0.04592244,1.85833371,...,0.26537463,4.8048577,12.7340918,-0.02687777,-0.009804348,3.4741678,94,307,Basophils,H1
3736,35,191.8286,308.8691,3.80138493,-0.1914464,-0.08327385,0.3723878,4.494378567,-0.1771584,...,0.44890141,0.9955558,2.5581648,0.7266016,4.905976295,2.9566925,94,545,Basophils,H1
7015,32,116.1119,200.8392,3.20443869,-0.1611056,0.369612783,-0.2149521,-0.009404267,-0.04390361,...,0.23119387,33.0254593,8.5743637,-0.05480448,-0.052066747,3.4432089,94,1726,Basophils,H1
7099,29,176.2485,313.0225,2.23738217,-0.1380714,-0.088311136,-0.2204303,4.006597996,-0.09533478,...,0.33259615,8.8794279,0.7049295,-0.06724661,-0.130210981,-0.1326317,94,1766,Basophils,H1
7700,25,133.3328,226.4678,-0.04404699,-0.1515095,0.402548134,2.581769,6.742060184,2.90662718,...,-0.03111706,0.9095623,0.9930771,0.38120484,-0.202496067,1.4354575,94,2031,Basophils,H1
8333,28,132.1282,326.0217,1.15033615,-0.1475202,-0.001792617,-0.149773,1.529571056,-0.17544551,...,-0.21846008,2.01228,0.9860064,0.53348595,-0.023030506,-0.0666722,94,2300,Basophils,H1


Remove the cells labeled 'NotDebrisSinglets' and check the dimensions of the remaining data

In [3]:
AML.data <- AML.data[AML.data$cell_type != 'NotDebrisSinglets',]
dim(AML.data)
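Filtering rows does not remove the factor level itself. If `cell_type` was read as a factor (the default for `read.csv` before R 4.0), 'NotDebrisSinglets' remains as an empty level, which is what the caret warning further down refers to. A minimal sketch on a made-up data frame (the `toy` values are illustrative, not taken from the AML file) shows the effect and how `droplevels` removes the empty level:

```r
# Toy data standing in for AML.data; the values are made up for illustration
toy <- data.frame(CD4 = c(1.2, 0.3, 2.5, 0.1),
                  cell_type = factor(c('Basophils', 'NotDebrisSinglets',
                                       'Monocytes', 'NotDebrisSinglets')))

# Same filter as above: keep rows whose label is not 'NotDebrisSinglets'
toy <- toy[toy$cell_type != 'NotDebrisSinglets', ]

# The filtered-out level is still listed, with a count of zero
counts_before <- table(toy$cell_type)

# droplevels() removes factor levels that no longer occur in the data
toy$cell_type <- droplevels(toy$cell_type)
counts_after <- table(toy$cell_type)

print(counts_before)
print(counts_after)
```

Dropping the empty level is optional here; it only avoids the warning and keeps later summaries free of zero-count classes.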

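Split the dataset into non-overlapping training and testing sets, stratified by cell type, and write them to CSV files

In [4]:
library(caret)
# Sample half of each cell type for training
TrainIndex <- createDataPartition(AML.data$cell_type, p = 0.5, list = FALSE)

# The remaining cells form the test set, so the two sets share no rows
AML.Train <- AML.data[TrainIndex,]
AML.Test <- AML.data[-TrainIndex,]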

write.table(AML.Train,file = 'AML_train.csv',col.names = FALSE,row.names = FALSE,sep = ',')
write.table(AML.Test,file = 'AML_test.csv',col.names = FALSE,row.names = FALSE,sep = ',')

Loading required package: lattice
Loading required package: ggplot2
"Some classes have no records ( NotDebrisSinglets ) and these will be ignored"
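A stratified split keeps the class proportions similar in both halves, and the training and testing rows should not overlap. The following base-R sketch, which uses made-up labels and is not the caret implementation, samples half of each class for training, takes the complement for testing, and checks that the two index sets are disjoint:

```r
set.seed(1)

# Made-up class labels: 10 cells of type A, 6 of type B, 4 of type C
cell_labels <- factor(rep(c('A', 'B', 'C'), times = c(10, 6, 4)))

# Within each class, draw half of the row indices for training.
# Indexing with sample.int() avoids the sample() pitfall for length-one vectors.
train_idx <- unlist(
  lapply(split(seq_along(cell_labels), cell_labels), function(idx) {
    idx[sample.int(length(idx), ceiling(length(idx) / 2))]
  }),
  use.names = FALSE
)

# Everything not used for training goes to the test set
test_idx <- setdiff(seq_along(cell_labels), train_idx)

# The two sets must not share any row
stopifnot(length(intersect(train_idx, test_idx)) == 0)

print(table(cell_labels[train_idx]))
print(table(cell_labels[test_idx]))
```

The same disjointness check can be run on the real row indices before writing the CSV files.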

Manually create two new directories, 'AML Train' and 'AML Test'. Next, move 'AML_train.csv' and 'AML_test.csv' into the new directories, respectively
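Instead of doing this by hand, the same step can be scripted from R. This is a minimal sketch using base functions only; `move_into_dir` is a hypothetical helper, not part of the CyTOF LDA scripts, and it assumes the CSV files are in the current working directory:

```r
# Hypothetical helper: create `dir` if it does not exist yet, then move
# `file` into it. Returns TRUE if the move succeeded.
move_into_dir <- function(file, dir) {
  dir.create(dir, showWarnings = FALSE)
  file.rename(file, file.path(dir, basename(file)))
}

# Only act if the files written in the previous step are present
if (file.exists('AML_train.csv')) move_into_dir('AML_train.csv', 'AML Train')
if (file.exists('AML_test.csv')) move_into_dir('AML_test.csv', 'AML Test')
```

Keeping each file in its own directory matters because the functions below are pointed at a directory, not at a single file.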

Use the CyTOF_LDAtrain function to train the classifier on 'AML_train.csv'

Use the CyTOF_LDApredict function to obtain predictions for 'AML_test.csv'

In [5]:
source('CyTOF_LDAtrain.R')
source('CyTOF_LDApredict.R')

# Cell type labels are in column 40
LDA.Model <- CyTOF_LDAtrain(TrainingSamplesExt = 'AML Train/', TrainingLabelsExt = '', mode = 'CSV',
                            RelevantMarkers = c(5:36), LabelIndex = 40, Transformation = 'arcsinh')

Predictions <- CyTOF_LDApredict(LDA.Model,TestingSamplesExt = 'AML Test/', mode = 'CSV', RejectionThreshold = 0)
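The `RejectionThreshold` argument is not explained here. One plausible reading, stated as an assumption and not taken from `CyTOF_LDApredict.R`, is that a cell is labeled as unknown when its highest posterior probability falls below the threshold, so that a threshold of 0 never rejects a cell. The sketch below illustrates that idea on a made-up posterior matrix; `reject_low_confidence` and the 'unknown' label are hypothetical names:

```r
# Hypothetical helper: pick the most probable class per row and replace it
# with 'unknown' when the top posterior probability is below `threshold`
reject_low_confidence <- function(posterior, threshold) {
  best <- colnames(posterior)[max.col(posterior, ties.method = 'first')]
  top <- apply(posterior, 1, max)
  best[top < threshold] <- 'unknown'
  best
}

# Made-up posterior probabilities for three cells and two classes
posterior <- matrix(c(0.90, 0.10,
                      0.55, 0.45,
                      0.20, 0.80),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c('Basophils', 'Monocytes')))

labels_t0 <- reject_low_confidence(posterior, 0)     # nothing is rejected
labels_t07 <- reject_low_confidence(posterior, 0.7)  # the second cell is rejected

print(labels_t0)
print(labels_t07)
```

Under this assumption, raising the threshold trades coverage for confidence: fewer cells receive a label, but the labels that remain are backed by higher posterior probabilities.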

Compare the predicted cell labels with the original labels in **AML.Test**

In [6]:
Predictions <- unlist(Predictions)
Accuracy <- sum(Predictions == AML.Test$cell_type)/length(AML.Test$cell_type) * 100
Accuracy
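Overall accuracy can hide poor performance on rare cell types, so a confusion table and per-class figures are a useful complement. The sketch below uses made-up label vectors in place of `AML.Test$cell_type` and `Predictions`; the names `truth`, `predicted`, and `per_class_recall` are illustrative:

```r
# Made-up true and predicted labels for six cells
truth <- factor(c('Basophils', 'Basophils', 'Monocytes',
                  'Monocytes', 'Monocytes', 'CD4 T cells'))
predicted <- factor(c('Basophils', 'Monocytes', 'Monocytes',
                      'Monocytes', 'Monocytes', 'CD4 T cells'),
                    levels = levels(truth))

# Rows are the true classes, columns the predicted classes
confusion <- table(Truth = truth, Predicted = predicted)

# Same accuracy definition as above, written with mean()
overall_accuracy <- mean(predicted == truth) * 100

# Percentage of each true class that was predicted correctly
per_class_recall <- diag(confusion) / rowSums(confusion) * 100

print(confusion)
print(overall_accuracy)
print(per_class_recall)
```

Giving both factors the same levels keeps the confusion table square, which is what makes `diag` line up with the row totals.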