Introduction

Please cite as:

Tariq Faquih. (2022). tofaquih/imputation_of_untargeted_metabolites: Official Release v1.4 (v1.4). Zenodo. https://doi.org/10.5281/zenodo.6347808

Faquih T, van Smeden M, Luo J, le Cessie S, Kastenmüller G, Krumsiek J, Noordam R, van Heemst D, Rosendaal FR, van Hylckama Vlieg A, Willems van Dijk K, Mook-Kanamori DO. A Workflow for Missing Values Imputation of Untargeted Metabolomics Data. Metabolites. 2020 Nov 26;10(12):486. doi: 10.3390/metabo10120486. PMID: 33256233; PMCID: PMC7761057.

This script imputes missing values in Metabolon HD4 metabolomics datasets using one of two imputation methods: MICE or kNN. It is technically possible to use it with any metabolomics dataset.

Workflow:

The user provides two lists of metabolites: group1, to be imputed using MICE-pmm or kNN-obs-sel, and group2, to be imputed with zero.
The user must also provide the variables to be used in the analysis after the imputation, including the outcome.
A correlation matrix is created for all the group1 metabolites.
The group1 metabolites are split into complete-case metabolites (ccm) and incomplete-case metabolites (icm).
For each icm, the 10 ccm with the highest absolute correlation (R) are selected. These are used to impute the missing values in the icm.
- An icm is considered invalid if it has more than 90% missing values OR if its number of non-missing values is less than the number of predictor variables + 20.
- This is done for two reasons:
- to eliminate possibly mis-annotated metabolites or unannotated metabolites that are xenobiotic in nature;
- to ensure that enough cases are available to perform the imputation. Too few non-missing values prevent the MICE package from performing the imputation altogether.
- The invalid icm are imputed to zero.
The imputed results are returned as 3 objects: the imputed data, the summary of the imputation, and the mean R of the ccm used for each icm.
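
The selection and validity rules above can be sketched in base R as follows. This is a minimal illustration written from the description, not the UnMetImp source: the helper name select_predictors, its arguments, the use of pairwise-complete correlations, and reporting the mean absolute R are assumptions. It also counts only the selected ccm as "predictor variables"; when use_covars = TRUE the covariates would presumably count as well.

```r
# Hypothetical helper sketching the ccm/icm split, top-maxN predictor
# selection and validity check described above (not the UnMetImp source).
select_predictors <- function(mets, maxN = 10, max_missing = 0.9, extra_cases = 20) {
  # Complete-case metabolites (ccm) have no NA; incomplete ones (icm) do
  has_na <- vapply(mets, anyNA, logical(1))
  ccm <- names(mets)[!has_na]
  icm <- names(mets)[has_na]

  # Correlation matrix over all metabolites, using pairwise-complete rows
  r <- cor(mets, use = "pairwise.complete.obs")

  lapply(setNames(icm, icm), function(m) {
    # Rank ccm by absolute correlation with this icm and keep the top maxN
    abs_r <- setNames(abs(r[m, ccm]), ccm)
    preds <- head(names(abs_r)[order(abs_r, decreasing = TRUE)], maxN)
    n_obs <- sum(!is.na(mets[[m]]))
    # Invalid (zero-imputed) if >90% missing or fewer than predictors + 20 observed values
    valid <- mean(is.na(mets[[m]])) <= max_missing && n_obs >= length(preds) + extra_cases
    list(predictors = preds, mean_abs_r = mean(abs_r[preds]), valid = valid)
  })
}

# Toy data: d and e are incomplete; a, b and c are complete
set.seed(1)
n <- 100
a <- rnorm(n)
mets <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n),
                   d = a + rnorm(n, sd = 0.5), e = rnorm(n))
mets$d[1:10] <- NA  # 10% missing
mets$e[1:95] <- NA  # 95% missing, so e is flagged invalid
res <- select_predictors(mets, maxN = 2)
```

Here d is strongly correlated with a and b, so those two are selected as its predictors, while e is flagged as invalid because 95% of its values are missing.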

How to use:

Import the script: source('UnMetImp.r')
Create vectors with the names of the group1 (endogenous and/or unannotated) and group2 (xenobiotic) metabolites.
Use the UnMetImp function.

Usage
- UnMetImp(DataFrame, imp_type = 'mice', number_m = 5, group1, group2 = NULL, outcome = NULL, covars = NULL, fileoutname = NULL, use_covars = FALSE, logScale = TRUE, covars_only_mode = FALSE, maxN_input = 10)
Arguments
- DataFrame: Data frame. Required. The full data frame with all the metabolites, covariables and the outcome. All values must be numeric.
- imp_type: String. Type of imputation to be used: mice or knn. Default is mice.
- number_m: Numeric. For imp_type == "mice" only. Number of imputations to be used. Default = 5.
- group1: Vector. Required. Vector with the names of metabolite columns. Will be imputed using the provided imp_type.
- group2: Vector. Optional. Vector with the names of metabolite columns. Will be imputed to zero.
- outcome: String. Required. The outcome variable to be used in the future analysis.
- covars: Vector. Recommended. Variables used in the future analysis. They are returned with the imputed data.
- fileoutname: String. Optional. If provided, the imputed output is saved to a file.
- use_covars: Logical. Optional. Whether the covars will be used to impute the missing values in the metabolites. Default = FALSE.
- logScale: Logical. Optional. Whether the values need to be log-transformed and scaled for the imputation. If TRUE, the values are log-transformed and scaled, then back-transformed (unscaled and un-logged) before the imputed output is returned. If FALSE, the script assumes the values are already log-transformed. Default = TRUE.
- covars_only_mode: Logical. Optional. Uses only the covariables for the imputation, ignoring all other metabolites. Useful in case of collinear/constant variables. Only works with MICE imputation. Default = FALSE.
- maxN_input: Numeric. Optional. Maximum number of ccm metabolites used to impute each icm. Default = 10. Overridden if covars_only_mode == TRUE. Useful in case of collinear/constant variables.
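
As a rough illustration of logScale = TRUE, the sketch below log-transforms and z-scales one metabolite, fills the missing value with a placeholder, and then back-transforms. The natural logarithm, per-metabolite z-scaling, and the helper names log_scale and unlog_unscale are assumptions; the script's exact transformation may differ. The values must be strictly positive for the log to be defined.

```r
# Hypothetical helpers illustrating logScale = TRUE; not the UnMetImp code.
# Assumption: natural log followed by z-score scaling per metabolite.
log_scale <- function(x) {
  lx <- log(x)
  mu <- mean(lx, na.rm = TRUE)
  s <- sd(lx, na.rm = TRUE)
  list(z = (lx - mu) / s, mu = mu, s = s)
}

unlog_unscale <- function(z, mu, s) {
  exp(z * s + mu)  # invert the scaling first, then the log
}

x <- c(1.5, 20, NA, 300, 4)
tr <- log_scale(x)
z <- tr$z
z[is.na(z)] <- 0  # placeholder "imputation": the mean on the log-scaled axis
back <- unlog_unscale(z, tr$mu, tr$s)
```

Observed values are recovered unchanged, while the placeholder 0 maps back to the geometric mean of the observed values.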

Examples of running the scripts:

Please try out the script using the provided NEO metabolomics data (with simulated characteristic variables) in the Binder Docker environment.

mydata: your data table, as a data.frame.

endoids: a user-created vector containing the column names of the endogenous metabolites

unknowns: a user-created vector containing the column names of the unannotated metabolites

xeon: a user-created vector containing the column names of the xenobiotic metabolites

Running default knn:

source('UnMetImp.r')

knnimp <- UnMetImp(DataFrame = mydata, group1 = c(endoids, unknowns), group2 = xeon, covars = c('age', 'sex'), imp_type = 'knn', outcome = 'BMI', logScale = TRUE)
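
kNN-obs-sel, as its name suggests, imputes a missing value from the nearest observations (samples), measured on the selected predictor metabolites. The following is a generic kNN-over-observations sketch under that assumption, not the UnMetImp implementation: the helper knn_obs_impute, the default k = 5, Euclidean distance on standardized predictors, and averaging the neighbours' values are all illustrative choices.

```r
# Hypothetical sketch: kNN imputation over observations (rows) for one icm,
# measuring distance only on the selected (complete) predictor columns.
knn_obs_impute <- function(data, target, predictors, k = 5) {
  x <- scale(as.matrix(data[predictors]))  # standardize predictors (must not be constant)
  y <- data[[target]]
  donors <- which(!is.na(y))
  for (i in which(is.na(y))) {
    # Euclidean distance from row i to every donor row
    d <- sqrt(colSums((t(x[donors, , drop = FALSE]) - x[i, ])^2))
    nn <- donors[order(d)[seq_len(min(k, length(donors)))]]
    y[i] <- mean(y[nn])  # average the k nearest donors' observed values
  }
  y
}

toy <- data.frame(p = 1:10, y = c(2, 4, 6, 8, NA, 12, 14, 16, 18, 20))
res <- knn_obs_impute(toy, target = "y", predictors = "p", k = 2)
```

With k = 2, the missing value in row 5 is filled from rows 4 and 6, its two nearest neighbours on p.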

Running default MICE:

source('UnMetImp.r')

miceimp <- UnMetImp(DataFrame = mydata, group1 = c(endoids, unknowns), group2 = xeon, covars = c('age', 'sex'), imp_type = 'mice', number_m = 5, outcome = 'BMI', use_covars = FALSE, logScale = TRUE)
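
MICE-pmm uses predictive mean matching: a regression model predicts each incomplete metabolite, and each missing value is replaced by an observed value borrowed from a donor whose prediction is close. The base-R sketch below shows a single, simplified PMM step for one variable. Unlike mice's pmm method, it does not draw the regression coefficients from their posterior and does not iterate over variables; the helper name pmm_impute is an assumption, and k = 5 mirrors mice's default number of donors.

```r
# Hypothetical helper: one simplified predictive-mean-matching pass for a
# single variable y given complete predictors X (no posterior draw of betas).
pmm_impute <- function(y, X, k = 5) {
  X1 <- cbind(1, as.matrix(X))  # design matrix with intercept
  obs <- !is.na(y)
  beta <- lm.fit(X1[obs, , drop = FALSE], y[obs])$coefficients
  yhat <- drop(X1 %*% beta)     # predicted means for all cases
  obs_idx <- which(obs)
  for (i in which(!obs)) {
    # k observed cases whose predicted means are closest to case i's
    ord <- order(abs(yhat[obs_idx] - yhat[i]))
    donors <- obs_idx[ord[seq_len(min(k, length(obs_idx)))]]
    # Borrow the observed value of one randomly chosen donor
    y[i] <- y[donors[sample.int(length(donors), 1)]]
  }
  y
}

set.seed(42)
X <- matrix(rnorm(200), nrow = 100, ncol = 2)
y <- drop(X %*% c(1, -1)) + rnorm(100, sd = 0.1)
y[1:10] <- NA
imputed <- pmm_impute(y, X)
```

Because imputed values are always borrowed from observed cases, PMM never produces values outside the observed range.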

The mice package stores the output of the imputation step in an object of class mids by default. This object stores information about the imputation process and the imputed datasets created. The with() function needs a mids object as input: it runs the analysis on each imputed dataset, and pool() then combines the per-dataset estimates and standard errors using Rubin's Rules. To convert the mids object to a "long" format:

library(mice)

IMP <- miceimp$mids

Longformat <- complete(IMP, action = 'long', include = TRUE)

To convert the Longformat back to mids class:

IMP <- as.mids(Longformat)

To run the analysis on the mids class and pool the estimates by Rubin's Rules:

Model_String <- 'BMI ~ age + sex + ...'

# Build the formula inside with() so that lm() looks up the variables in each imputed dataset
Mysummary <- summary(pool(with(data = IMP, expr = lm(as.formula(Model_String)))), conf.int = TRUE)
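
For a single coefficient, the pooling by Rubin's Rules works as follows: the pooled estimate is the mean of the m estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below (the helper name rubin_pool is hypothetical) computes only these two quantities; pool() additionally reports degrees of freedom and other diagnostics.

```r
# Hypothetical helper: Rubin's Rules for one coefficient across m imputations
rubin_pool <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                # pooled point estimate
  ubar <- mean(se^2)               # average within-imputation variance
  b <- var(est)                    # between-imputation variance
  total <- ubar + (1 + 1 / m) * b  # total variance
  c(estimate = qbar, std.error = sqrt(total))
}

# Example: estimates and standard errors from m = 3 imputed datasets
pooled <- rubin_pool(est = c(1, 2, 3), se = c(1, 1, 1))
```

Here the between-imputation variance is 1, so the pooled standard error is sqrt(1 + 4/3) rather than the naive 1.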