p.mat matrix not ordered correctly #120

Closed
pohndorff opened this issue Jan 8, 2018 · 4 comments


pohndorff commented Jan 8, 2018

I encountered an issue when plotting with corrplot while using a p-value matrix to mark correlations with high p-values.

It appears that the p-value matrix is not reordered to match when corrplot reorders the correlation matrix.

To reproduce the issue, you can use this code:

library(tidyverse)
library(Hmisc)
library(corrplot)

set_1 <- select_if(mtcars, is.numeric)


cor_set <- rcorr(as.matrix(set_1))
set_cor <- cor(set_1[,-1], method = "pearson", use = "complete.obs")
p_mat <- cor_set$P
corrplot(set_cor, type = "upper", order = "FPC",
         method = "color", tl.pos = "td", tl.cex = 0.5, 
         diag = FALSE, p.mat = p_mat, sig.level = 0.01)

[Image: corrplot_pmat]

In the plot, vs seems to have many correlations with p > 0.01. But the data show otherwise. Here is round(p_mat, 4):

        mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
mpg      NA 0.0000 0.0000 0.0000 0.0000 0.0000 0.0171 0.0000 0.0003 0.0054 0.0011
cyl  0.0000     NA 0.0000 0.0000 0.0000 0.0000 0.0004 0.0000 0.0022 0.0042 0.0019
disp 0.0000 0.0000     NA 0.0000 0.0000 0.0000 0.0131 0.0000 0.0004 0.0010 0.0253
hp   0.0000 0.0000 0.0000     NA 0.0100 0.0000 0.0000 0.0000 0.1798 0.4930 0.0000
drat 0.0000 0.0000 0.0000 0.0100     NA 0.0000 0.6196 0.0117 0.0000 0.0000 0.6212
wt   0.0000 0.0000 0.0000 0.0000 0.0000     NA 0.3389 0.0010 0.0000 0.0005 0.0146
qsec 0.0171 0.0004 0.0131 0.0000 0.6196 0.3389     NA 0.0000 0.2057 0.2425 0.0000
vs   0.0000 0.0000 0.0000 0.0000 0.0117 0.0010 0.0000     NA 0.3570 0.2579 0.0007
am   0.0003 0.0022 0.0004 0.1798 0.0000 0.0000 0.2057 0.3570     NA 0.0000 0.7545
gear 0.0054 0.0042 0.0010 0.4930 0.0000 0.0005 0.2425 0.2579 0.0000     NA 0.1290
carb 0.0011 0.0019 0.0253 0.0000 0.6212 0.0146 0.0000 0.0007 0.7545 0.1290     NA

pohndorff (Author) commented

You get even weirder results when using order = "original":

corrplot(set_cor, type = "upper", order = "original",
         method = "color", tl.pos = "td", tl.cex = 0.5, 
         diag = FALSE, p.mat = p_mat, sig.level = 0.01)

[Image: corrplot_bug_original]

vsimko added the bug label Jan 11, 2019

N1h1l1sT commented

What I've seen is that it puts an "X" where the p-value is higher than the significance level, but it also puts X's elsewhere.
In fact, it seems to plot twice as many X's as it should.
For instance, on a 6x6 p_mat matrix with only 2 values above the significance level, I get 4 X's in the upper triangle (and, of course, their mirror images in the lower triangle).

This is probably the most serious bug in the package, as it actually produces a false result, which means we can't trust it to draw conclusions, rendering the whole package unusable.

It's very unfortunate that almost 2 years have passed since this bug was posted and since any updates were made to the package, so I'm guessing corrplot is pretty much dead.
It's the most complete correlation visualisation package I've seen, and it doesn't really need much more active development, but this specific bug is a very big deal, preventing me (and, I'm guessing, many others) from actually using it.
Or even worse, there are people who do use it without noticing the false results and are therefore drawing false conclusions.

taiyun removed the bug label May 12, 2021

taiyun (Owner) commented May 12, 2021

@pohndorff

The corr and p.mat matrices you passed in are not paired!

> colnames(set_cor)
 [1] "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
> colnames(p_mat)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

You SHOULD pair them first, so that both contain the same variables in the same order.

ind=c('cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs',  'am', 'gear', 'carb')
corr = set_cor[ind, ind]
p = p_mat[ind, ind]

corrplot(corr, type = "upper", 
         method = "color", tl.pos = "td", tl.cex = 0.5, 
         diag = FALSE, p.mat = p, sig.level = 0.05)
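
The manual pairing above can also be done by name rather than by typing out the variable list. The following is a minimal sketch in base R, assuming both matrices carry row and column names. `pair_matrices` is a hypothetical helper, not part of corrplot, and the p-values here are computed with `cor.test` only so that the example runs without Hmisc.

```r
# Hypothetical helper (not part of corrplot): keep only the variables that
# appear in both matrices, in the order they have in `corr`.
pair_matrices <- function(corr, p.mat) {
  shared <- intersect(colnames(corr), colnames(p.mat))
  if (length(shared) == 0) stop("corr and p.mat share no variable names")
  list(corr  = corr[shared, shared, drop = FALSE],
       p.mat = p.mat[shared, shared, drop = FALSE])
}

# Stand-ins for the matrices in this thread, built with base R only.
set_cor <- cor(mtcars[, -1])          # 10 x 10, mpg dropped
vars <- colnames(mtcars)
p_mat <- sapply(vars, function(a) sapply(vars, function(b) {
  # Pearson p-value for each pair; NA on the diagonal
  if (a == b) NA_real_ else cor.test(mtcars[[a]], mtcars[[b]])$p.value
}))                                   # 11 x 11, mpg included

paired <- pair_matrices(set_cor, p_mat)
dim(paired$p.mat)                     # now 10 x 10, matching set_cor
```

`paired$corr` and `paired$p.mat` can then be passed to corrplot as `corr` and `p.mat` respectively.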

@vsimko @N1h1l1sT

taiyun added a commit that referenced this issue May 12, 2021

taiyun (Owner) commented May 12, 2021

The latest version on GitHub gives a warning message when the rownames and colnames of p.mat and corr are not the same.

> set_1 <- select_if(mtcars, is.numeric)
> 
> 
> cor_set <- rcorr(as.matrix(set_1))
> set_cor <- cor(set_1[,-1], method = "pearson", use = "complete.obs")
> p_mat <- cor_set$P
> corrplot(set_cor, type = "upper", order = "FPC",
+          method = "color", tl.pos = "td", tl.cex = 0.5, 
+          diag = FALSE, p.mat = p_mat, sig.level = 0.01) 
Warning message:
In corrplot(set_cor, type = "upper", order = "FPC", method = "color",  :
  p.mat and corr may be not paired, their rownames and colnames are not totally same!
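
On older versions that do not emit this warning, a similar check can be run by hand before plotting. This is a rough sketch of such a check, not corrplot's actual implementation. `is_paired` is a hypothetical name.

```r
# Hypothetical pre-flight check (not corrplot's own code): TRUE only when
# both matrices have identical row names and identical column names.
is_paired <- function(corr, p.mat) {
  identical(rownames(corr), rownames(p.mat)) &&
    identical(colnames(corr), colnames(p.mat))
}

full <- cor(mtcars)                # 11 x 11, includes mpg
is_paired(full[-1, -1], full)      # FALSE: one matrix lacks mpg
is_paired(full, full)              # TRUE
```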

taiyun closed this as completed May 12, 2021