# Example of running the Imputation Script

We show an example of running the imputation workflow presented in our paper. The example data contain measured metabolites from a subset of the NEO cohort, together with the following simulated variables:
    - age (simage)
    - sex (simsex)
    - body mass index (simbmi)
The metabolites are grouped into three categories based on their source:
    - Endogenous
    - Unannotated
    - Xenobiotic 
This information is stored in the MetaboliteGroups.rds file.

Our script requires the following:
    - R version 3.6.3 (2020-02-29)
    - mice_3.8.0
    - docstring_1.0.0
    - VIM_5.1.1
    - dplyr_0.8.5
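
Before sourcing the script, it can help to compare the installed package versions against the ones listed above. The following is a minimal sketch, not part of the original script: the `listed` vector simply restates the versions from the list, and `version_report` is a name introduced here for illustration. Packages that are not installed are reported as `NA` rather than raising an error.

```r
# Versions as listed in the requirements above.
listed <- c(mice = "3.8.0", docstring = "1.0.0", VIM = "5.1.1", dplyr = "0.8.5")

# Look up the installed version of each package; NA if it is not installed.
installed <- vapply(names(listed), function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

# Side-by-side report (illustrative helper object, not created by UnMetImp.r).
version_report <- data.frame(
  package   = names(listed),
  listed    = unname(listed),
  installed = unname(installed),
  stringsAsFactors = FALSE
)

print(R.version.string)
print(version_report)
```

Newer package versions may well work; the list above only records the versions the example was run with.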


Load the imputation script

In [5]:
source('UnMetImp.r')

Load the example dataset

In [10]:
NEOexample <- readRDS(file = 'ExampleNeoData.rds')

Load the metabolites group information

In [7]:
MG <- readRDS('MetaboliteGroups.rds')
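
Because the group vectors are later passed to the workflow as `group1` and `group2`, a quick consistency check can catch mismatched identifiers early. The sketch below uses small toy objects (`MG_toy`, `data_toy`) instead of the real files. The element names `endoids`, `unannotated` and `xeno` are the ones used later in this example, but the assumption that each element is a character vector of column names is ours and should be verified with `str(MG)`.

```r
# Toy stand-in for MG; assumed structure: a list of character vectors.
MG_toy <- list(
  endoids     = c("MB_1", "MB_2"),
  unannotated = c("MB_3"),
  xeno        = c("MB_4", "MB_5")
)

# Toy stand-in for the example dataset.
data_toy <- data.frame(
  simage = c(54.8, 45.5), simsex = c(1, 2), simbmi = c(23.5, 27.4),
  MB_1 = c(1.1, 2.2), MB_2 = c(NA, 3.3), MB_3 = c(4.4, NA),
  MB_4 = c(NA, NA), MB_5 = c(5.5, 6.6)
)

group1 <- c(MG_toy$endoids, MG_toy$unannotated)
group2 <- MG_toy$xeno

# Identifiers listed in a group but absent from the data.
missing_ids <- setdiff(c(group1, group2), names(data_toy))

# Identifiers that appear in both groups.
overlap <- intersect(group1, group2)

# Number of metabolites per category.
print(lengths(MG_toy))
```

With the real objects, the same checks would use `MG` and `NEOexample` in place of the toy versions.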

Preview the first rows of the example dataset

In [11]:
head(NEOexample)

simage,simsex,simbmi,MB_38768,MB_38296,MB_63436,MB_62533,MB_63380,MB_57814,MB_43264,...,MB_62715,MB_62716,MB_62717,MB_62719,MB_62729,MB_62749,MB_62877,MB_62937,MB_62963,MB_62967
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,...,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
54.83518,1,23.45444,38777636,2647052,1068098,,88554.63,1866252,861090.1,...,4969569,1060881.4,1120531,730705.4,,,166978.9,509234.1,1007105.8,
45.52874,1,27.44471,65388204,3950235,1752222,,,4044048,1481673.9,...,4146539,1526446.2,1161347,1099129.0,,119995.2,317710.5,149103.9,1707608.5,65312.05
62.95567,2,26.56372,60871776,5670276,1165632,,636072.62,1635275,8675656.0,...,4153697,1807133.6,1393197,1904349.6,,108866.1,308771.2,,998462.6,
45.341,2,32.08951,31664244,2742438,1426454,147693.6,233949.89,1940137,729727.4,...,4733615,627964.5,1319957,2533104.8,384640.2,250226.2,,,1120093.2,
56.2628,1,20.20521,68640904,4087804,1065901,,,1899449,415342.0,...,4116790,3823185.8,2222088,1765556.5,,,242860.0,,972111.2,167862.17
60.0677,2,27.4515,34128596,2455512,969471,,208226.83,2305378,1062821.0,...,4709995,6318873.0,1742431,902106.1,,,235400.0,,1198085.1,
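
The empty cells in the preview appear to correspond to missing values (`NA`), which is what the workflow imputes. A minimal sketch of summarising missingness per metabolite is shown below. It uses a toy data frame with made-up column names (`MB_a`, `MB_b`, `MB_c`) and assumes, as in the preview, that metabolite columns share the `MB_` prefix.

```r
# Toy data frame mimicking the layout of the preview (names are illustrative).
toy <- data.frame(
  simage = c(54.8, 45.5, 63.0, 45.3),
  simsex = c(1, 1, 2, 2),
  MB_a   = c(1068098, 1752222, 1165632, 1426454),
  MB_b   = c(NA, NA, NA, 147693.6),
  MB_c   = c(88554.63, NA, 636072.62, 233949.89)
)

# Select metabolite columns by their prefix.
metabolite_cols <- grep("^MB_", names(toy), value = TRUE)

# Fraction of missing values in each metabolite column.
missing_fraction <- colMeans(is.na(toy[metabolite_cols]))

print(sort(missing_fraction, decreasing = TRUE))
```

Applied to `NEOexample`, the same two lines give a quick overview of which metabolites are most affected by missingness.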


## Run the imputation workflow using the MICE-pmm method

In [19]:
MiceImp <- UnMetImp(DataFrame = NEOexample, group1 = c(MG$endoids, MG$unannotated), group2 = MG$xeno,
                    outcome = 'simbmi', covars = c('simage', 'simsex'))

* The imputed datasets are returned as an object of class **mids**. See the *mice* package documentation for details (https://cran.r-project.org/web/packages/mice/).

In [20]:
head(complete(MiceImp$mids , 1))

simbmi,MB_52614,MB_63251,MB_34397,MB_52450,MB_46115,MB_61827,MB_32457,MB_1558,MB_43582,...,MB_62106,MB_62484,MB_52913,MB_39787,MB_22196,MB_53240,MB_61848,MB_42593,simage,simsex
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,...,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
23.45444,819063.9,745697.1,3247325,408950.3,716964.3,379035.8,2866876,650750.4,646646.1,...,0,0,0,0,0,0,0,0,54.83518,1
27.44471,1137114.9,575323.9,2688095,520289.6,704031.4,721442.2,4631964,996591.5,559277.8,...,0,0,0,0,0,0,0,0,45.52874,1
26.56372,1360086.8,404075.6,3119761,643582.4,536390.6,678225.6,4806823,683196.1,495206.5,...,0,0,0,0,0,0,0,0,62.95567,2
32.08951,697560.4,551867.8,3265406,639703.7,566401.3,523526.7,3367916,832350.0,749931.2,...,0,0,0,0,0,0,0,0,45.341,2
20.20521,1695474.0,390705.9,1618906,1031283.1,518932.5,718904.8,2863784,654769.1,570709.7,...,0,0,0,0,0,0,0,0,56.2628,1
27.4515,1116891.7,1061266.9,1644631,763573.8,1509880.0,445548.0,2994279,905334.9,659497.2,...,0,0,0,0,0,0,0,0,60.0677,2
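
A **mids** object is usually analysed by fitting the same model in every imputed dataset and pooling the estimates. The sketch below illustrates this standard *mice* pattern on the `nhanes` data bundled with *mice*, not on the NEO example. The number of imputations, the seed and the regression formula are arbitrary choices made here for illustration, and we assume that `MiceImp$mids` can be used in the same way as `imp`.

```r
library(mice)

# nhanes ships with mice; it stands in for the NEO example data here.
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Fit the same linear model in each of the imputed datasets.
fits <- with(imp, lm(bmi ~ age + chl))

# Combine the per-dataset estimates using Rubin's rules.
pooled <- pool(fits)
pooled_summary <- summary(pooled)

print(pooled_summary)
```

For the example above, the analogous call would start from `with(MiceImp$mids, ...)` with a model that uses `simbmi`, `simage`, `simsex` and the metabolite of interest.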


## Run the imputation workflow using the kNN-obs-sel method

In [22]:
knnImp <- UnMetImp(DataFrame = NEOexample, imp_type = 'knn', group1 = c(MG$endoids, MG$unannotated), group2 = MG$xeno)

Loading required package: VIM

Loading required package: colorspace

Loading required package: grid

Loading required package: data.table


Attaching package: 'data.table'


The following objects are masked from 'package:dplyr':

    between, first, last


Registered S3 methods overwritten by 'car':
  method                          from
  influence.merMod                lme4
  cooks.distance.influence.merMod lme4
  dfbeta.influence.merMod         lme4
  dfbetas.influence.merMod        lme4

VIM is ready to use. 
 Since version 4.0.0 the GUI is in its own package VIMGUI.

          Please use the package to use the new (and old) GUI.


Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues


Attaching package: 'VIM'


The following object is masked from 'package:datasets':

    sleep




* Note: the kNN output is *not* an object of class **mids**. It is a plain **data frame**, although it is still accessed through the same element (knnImp$mids).

In [25]:
head(knnImp$mids)

simage,simsex,simbmi,MB_38768,MB_38296,MB_63436,MB_62533,MB_63380,MB_57814,MB_43264,...,MB_62715,MB_62716,MB_62717,MB_62719,MB_62729,MB_62749,MB_62877,MB_62937,MB_62963,MB_62967
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,...,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
54.83518,1,23.45444,-0.01476515,-0.13375319,1068098,0.0,88554.63,-0.6234432,861090.1,...,4969569,1060881.4,-1.1779071,-1.1793813,183176.7,65331.42,166978.9,509234.1,-0.5191385,123567.14
45.52874,1,27.44471,1.27019775,0.96863033,1752222,0.0,0.0,2.3706281,1481673.9,...,4146539,1526446.2,-1.1106328,-0.3236051,178178.9,119995.18,317710.5,149103.9,0.7207262,65312.05
62.95567,2,26.56372,1.09418241,1.96398991,1165632,0.0,636072.62,-1.1349814,8675656.0,...,4153697,1807133.6,-0.768375,0.8284803,144082.9,108866.07,308771.2,172169.0,-0.5393778,89559.49
45.341,2,32.08951,-0.5131495,-0.03627073,1426454,147693.6,233949.89,-0.4731163,729727.4,...,4733615,627964.5,-0.8699163,1.4265197,384640.2,250226.19,215577.1,210086.6,-0.2694545,103975.33
56.2628,1,20.20521,1.3895872,1.06289743,1065900,0.0,0.0,-0.5551766,415342.0,...,4116790,3823185.8,0.1094487,0.6698557,164050.7,76816.16,242860.0,108879.0,-0.6021832,167862.17
60.0677,2,27.4515,-0.32883346,-0.340587,969471,0.0,208226.83,0.1947041,1062821.0,...,4709994,6318873.0,-0.3477831,-0.7376791,185744.8,85859.62,235400.0,272947.2,-0.1113931,296998.86
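
Since the two methods return different object types under the same element name, downstream code may need to branch on the class. The helper below, `as_completed_df()`, is hypothetical and not part of UnMetImp.r: it returns a single completed data frame from either kind of result, calling `mice::complete()` only when it is given a **mids** object.

```r
# Hypothetical helper (not provided by UnMetImp.r): return one completed
# data frame regardless of which imputation method produced the result.
as_completed_df <- function(imputed, which = 1) {
  if (inherits(imputed, "mids")) {
    if (!requireNamespace("mice", quietly = TRUE)) {
      stop("Package 'mice' is needed to extract data from a mids object.")
    }
    # Extract the requested imputed dataset.
    mice::complete(imputed, which)
  } else if (is.data.frame(imputed)) {
    # Single imputation (e.g. kNN): the data frame is already complete.
    imputed
  } else {
    stop("Expected a 'mids' object or a data frame.")
  }
}

# Toy data frame standing in for a kNN result such as knnImp$mids.
toy_knn <- data.frame(simbmi = c(23.5, 27.4, 26.6), MB_1 = c(1.2, 0.4, -0.3))
completed <- as_completed_df(toy_knn)

print(completed)
```

Note that a single completed dataset does not carry the between-imputation variability that pooling over a **mids** object accounts for, so the two kinds of result are not interchangeable for inference.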
