Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About checked$outliers #1

Open
Vanhesling opened this issue Jun 14, 2017 · 1 comment
Open

About checked$outliers #1

Vanhesling opened this issue Jun 14, 2017 · 1 comment

Comments

@Vanhesling
Copy link

Hi!
I would like your help. I have read your paper titled "Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq" which is a excellent work. So I plan to use the MDSeq to detect the gene expression variability. However, I get error messages when I try to detect the outlier. It seems that all the genes were detected as outlier.

Why is that? Could you give me any hints and how to fixed this issue?

> dat.checked <- remove.outlier(dat.normalized[1:1000, ], X=covars, U=covars,
+                               contrast = groups, mc.cores = 4)
4 threads are using!
Total time elapsed:3.9 seconds
> head(dat.checked$outliers)
                  status num.outliers
ENSG00000227232.4      2           NA
ENSG00000238009.2      2           NA
ENSG00000237683.5      2           NA
ENSG00000239906.1      2           NA
ENSG00000241860.2      2           NA
ENSG00000228463.4      2           NA
> table(dat.checked$outliers$status)

   2 
1000 
> 
> covars
          GENDER AGE RACE   BMI SMRIN SMTSISCH            X1            X2            X3            X4
Normal1        1  63    3 29.53   5.9     1017  0.1630506541 -1.095766e-01 -0.0056186635  0.0540478146
Normal2        1  62    3 30.78   8.1      133 -0.0688636685 -1.749981e-02  0.0345506232  0.0869748239
COPD1          2  66    3 25.82   7.6      840  0.1037386933 -9.394710e-02 -0.0855297718  0.0384818582
Normal3        1  58    3 29.83   6.2      825  0.1425086228  6.758439e-02  0.0474234960 -0.0076811603
COPD2          1  58    3 27.02   6.4     1218  0.0328722772  5.871772e-02  0.0569128696 -0.0170814274
COPD3          1  62    3 29.54   5.8      991  0.1076800161  7.706005e-02  0.0719706122 -0.0013471444
COPD4          2  66    3 20.12   7.6      670 -0.0160420656  9.598426e-02 -0.0436259659  0.1806682021
Normal4        2  66    3 29.05   6.9      773  0.0994010357  3.385225e-02 -0.0921304267  0.0165312554
Normal5        2  51    3 29.35   7.2       80 -0.0787204805 -6.110502e-02 -0.0235813528  0.0084930880

Finally, my used sessionInfo()

Best wishes
Kevin

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gtools_3.5.0               edgeR_3.18.1               limma_3.32.2              
 [4] MDSeq_1.0.5                BiocInstaller_1.26.0       devtools_1.13.2           
 [7] org.Hs.eg.db_3.4.0         AnnotationDbi_1.38.1       BiocParallel_1.10.1       
[10] WGCNA_1.51                 fastcluster_1.1.22         dynamicTreeCut_1.63-1     
[13] R.matlab_3.6.1             RColorBrewer_1.1-2         clusterProfiler_3.4.3     
[16] DOSE_3.2.0                 gplots_3.0.1               ggplot2_2.2.1             
[19] sva_3.24.0                 genefilter_1.58.1          mgcv_1.8-17               
[22] nlme_3.1-131               DESeq2_1.16.1              SummarizedExperiment_1.6.3
[25] DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
[28] GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2            
[31] S4Vectors_0.14.3           BiocGenerics_0.22.0  
@zjdaye
Copy link
Owner

zjdaye commented Jun 14, 2017

Hi Kevin,

Thanks for your interest!

It seems that an error is encountered in the process, so the detection stopped and no outlier detection was performed. (A non-zero status indicates an error, and the num.outliers is NA or not available.) Usually, this might be encountered on only a few unusually genes when the data cannot be properly estimated; for example, if all counts were zero. Please see Supplementary materials for more detailed discussions.

I suggest that you check your data first to see if there might be anything peculiar and compare with the vignette. You can also try outlier check on individual genes with check.outlier function (setting verbose=TRUE to get more info). Finally, you can try glm.MD function to see if the counts data were properly fitted.

Please let us know how it goes! If necessary for us to look in more details, please send us an example data and your codes/outputs. (Can send to my email address in the paper.)

Good luck!

Best,
John.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants