theresascharl/RNAseq_clustering

This repository contains R code to reproduce the analysis of the paper:

A Clustering Procedure for Three-Way RNA Sequencing Data Using Data Transformations and Matrix-Variate Gaussian Mixture Models

1 Figure1.R

Contains code to reproduce Figure 1:

  • Generate the two artificial data sets.
  • Visualise the data in the simplex and in R^2.
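
The move from the simplex to R^2 can be illustrated with a minimal sketch. It assumes the mapping is the additive log-ratio (ALR) transform with the last part as reference, which is the transform named in the output files of the pre-processing script. The toy data and the helper names `alr` and `alr_inv` are hypothetical and not taken from Figure1.R.

```r
# Additive log-ratio (ALR) transform. Rows are compositions and the
# last part is the reference (an assumption made for this sketch).
alr <- function(x) {
  x <- as.matrix(x)
  D <- ncol(x)
  log(x[, -D, drop = FALSE] / x[, D])
}

# Inverse ALR: map points in R^(D-1) back to the simplex.
alr_inv <- function(y) {
  y <- as.matrix(y)
  e <- cbind(exp(y), 1)
  e / rowSums(e)
}

set.seed(1)
# Hypothetical toy data: 50 three-part compositions (rows sum to 1).
raw  <- matrix(rgamma(150, shape = 2), ncol = 3)
comp <- raw / rowSums(raw)

# Coordinates in R^2: log(x1/x3) and log(x2/x3).
y <- alr(comp)
```

Calling `plot(y)` then gives a scatter plot of the transformed data in R^2, and `alr_inv(y)` recovers the original compositions.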

2 fission_data_preprocessing.R

Contains code for Step 1: Pre-processing RNA-seq data:

  • Obtain normalised expression profiles of the genes across time points for a biological unit and experiment.
  • Take averages across biological replicates.
  • Identify differentially expressed genes to reduce the number of observations.
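
The first two steps above can be sketched in base R. This is a rough illustration under stated assumptions, not the code in fission_data_preprocessing.R: the array layout (genes x replicates x time points x experimental conditions), the toy dimensions, and the choice of the last time point as ALR reference are all hypothetical.

```r
set.seed(2)
n_genes <- 4; n_rep <- 3; n_time <- 3; n_cond <- 2

# Hypothetical normalised expression values:
# genes x replicates x time points x conditions.
expr <- array(rgamma(n_genes * n_rep * n_time * n_cond, shape = 5),
              dim = c(n_genes, n_rep, n_time, n_cond))

# Average across biological replicates: genes x time x condition.
mean_profiles <- apply(expr, c(1, 3, 4), mean)

# ALR across time points with the last time point as reference.
# Rescaling each profile to proportions first would cancel in the ratio.
ref <- mean_profiles[, n_time, , drop = FALSE]
alr_array <- log(mean_profiles[, -n_time, , drop = FALSE]) -
  log(ref[, rep(1, n_time - 1), , drop = FALSE])

dim(alr_array)  # genes x (time points - 1) x conditions
```

Each gene is then represented by a small matrix, which is the shape of input that a matrix-variate mixture model expects.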

Output:

  • ALR profiles of differentially expressed genes: fission_alr_de_flat.RData
  • mean profiles of differentially expressed genes: fission_mean_profiles_de_flat.RData
  • ALR profiles of differentially expressed genes as an array: fission_ALR_de_array.RData

3 Figure2.R

Contains code to reproduce Figure 2, which shows the pre-processing steps for three-way RNA-seq data.

4 fission_threeway_clustering.R

  • Fit finite mixture models with matrix-normal components, using the G-VVI-VV parameterisation and varying numbers of components.
  • In a refinement step, obtain a modified three-way clustering solution in which the column-wise covariance matrix has an AR1 (first-order autoregressive) structure.
  • Select the number of clusters based on BIC and ICL.
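
Two ingredients of the steps above can be sketched in base R: an AR1 covariance matrix and the BIC and ICL criteria. The function names are hypothetical, and the sign convention (larger values are better, as used by the mclust package) is an assumption; the script itself may use a different convention or package.

```r
# AR1 covariance matrix of dimension p: sigma2 * rho^|i - j|.
# It has far fewer free parameters than an unstructured p x p matrix.
ar1_cov <- function(p, rho, sigma2 = 1) {
  sigma2 * rho^abs(outer(seq_len(p), seq_len(p), "-"))
}

# BIC in the "larger is better" convention (assumption).
bic <- function(loglik, n_par, n_obs) {
  2 * loglik - n_par * log(n_obs)
}

# ICL: BIC plus a penalty for uncertain assignments.
# z is the n x G matrix of posterior membership probabilities.
icl <- function(loglik, n_par, z) {
  map_prob <- apply(z, 1, max)  # posterior probability of the assigned cluster
  bic(loglik, n_par, nrow(z)) + 2 * sum(log(map_prob))
}

ar1_cov(3, rho = 0.5)
```

When every observation is assigned with probability 1, the penalty vanishes and ICL equals BIC; fuzzier assignments lower the ICL, which is why it tends to favour well-separated clusters.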

Output:

  • fission_ALR_G-VVI-VV_1to20.RData
  • fission_ALR_G-VVI-VAR1_1to20.RData

5 fission_twoway_clustering.R

  • Fit finite mixture models with multivariate normal components, using unrestricted (VVV) variance-covariance matrices and varying numbers of components.
  • Select the number of clusters based on ICL.
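
For the two-way analysis each gene has to be represented by a vector rather than a matrix. The sketch below shows one way to flatten a three-way array and counts the free parameters of a VVV mixture. It assumes that genes are stored along the first dimension of the array; the helper names are hypothetical and are not taken from fission_twoway_clustering.R.

```r
# Flatten a genes x p x q array into a genes x (p * q) matrix.
# Assumes genes are the first dimension (hypothetical layout).
flatten_threeway <- function(a) {
  d <- dim(a)
  matrix(a, nrow = d[1], ncol = d[2] * d[3])
}

# Free parameters of a G-component Gaussian mixture in d dimensions
# with unrestricted (VVV) covariance matrices:
# mixing proportions + means + covariance matrices.
n_par_vvv <- function(G, d) {
  (G - 1) + G * d + G * d * (d + 1) / 2
}

a <- array(1:24, dim = c(2, 3, 4))
m <- flatten_threeway(a)
dim(m)
n_par_vvv(G = 4, d = ncol(m))
```

The parameter count grows quadratically in the flattened dimension, which is one motivation for the more parsimonious matrix-normal components used in the three-way analysis.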

Output:

  • fission_Malr_VVV_4.RData

6 fission_cluster_postprocessing.R

Contains code for post-processing of the three-way clustering:

  • dbsi information plot: Figure3.pdf
  • cluster map: Figure4.pdf
  • visualisation of cluster solution: Figure5.pdf

Contains code for post-processing of the two-way clustering:

  • dbsi information plot: Figure6a.pdf
  • cluster map: Figure6b.pdf
  • visualisation of cluster solution: Figure7.pdf

7 simulation_study.R

  • Generate 100 artificial datasets drawn from the three-way cluster solution of the fission data with AR1 column-wise covariance matrix.
  • Perform three-way clustering, the refinement step using AR1, two-way model-based clustering, and k-means clustering.
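
The data-generating step can be illustrated with a minimal base R sketch that draws matrix-normal observations with an AR1 column-wise covariance matrix and clusters the flattened data with k-means. The dimensions, parameter values, and the helper `rmatnorm` are hypothetical; simulation_study.R draws from the fitted cluster solution instead.

```r
# Draw one p x q matrix from a matrix-normal distribution with mean M,
# row covariance U (p x p) and column covariance V (q x q).
rmatnorm <- function(M, U, V) {
  Z <- matrix(rnorm(length(M)), nrow(M), ncol(M))
  M + t(chol(U)) %*% Z %*% chol(V)
}

set.seed(3)
p <- 2; q <- 5
U <- diag(c(1, 0.5))                                  # diagonal row covariance
V <- 0.6^abs(outer(seq_len(q), seq_len(q), "-"))      # AR1 column covariance

# Two hypothetical clusters that differ only in their mean matrix.
means  <- list(matrix(0, p, q), matrix(3, p, q))
labels <- rep(1:2, each = 30)

# One flattened observation per row: 60 x (p * q).
X <- t(sapply(labels, function(g) as.vector(rmatnorm(means[[g]], U, V))))

# k-means on the flattened data as a simple baseline.
km <- kmeans(X, centers = 2, nstart = 10)
table(true = labels, kmeans = km$cluster)
```

The cross-table compares the true labels with the k-means partition; in a full study this comparison would be repeated over many simulated data sets and over all clustering methods.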

Contains code to create:

  • visualisation of the performance: Figure8a.pdf
  • visualisation of the number of selected clusters: Figure8b.pdf
