Incorrect AUC value and CI [bug] #128

Closed
UnixJunkie opened this issue Apr 23, 2024 · 5 comments

@UnixJunkie

Describe the bug

The reported AUC is x, but the obvious value is 1.0-x.

To Reproduce
Steps to reproduce the behavior:

  1. What packages were loaded? Run sessionInfo() and report the output.

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /home/fbr/lib/R/lib/libRblas.so
LAPACK: /home/fbr/lib/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pROC_1.18.5

loaded via a namespace (and not attached):
[1] compiler_4.1.3 plyr_1.8.7 tools_4.1.3 Rcpp_1.0.8.3

  2. What command did you run?

library(pROC, quietly=TRUE)

args <- commandArgs(TRUE)

# create numerical vectors

scores <- read.table(args[1])
labels <- read.table(args[2])

# the roc function requires a data frame

df <- data.frame(scores, labels)
colnames(df) <- c("scores", "labels")
roc_curve <- roc(df, "labels", "scores")
auc(roc_curve)
ci.auc(roc_curve)

  3. What data did you use? Use save(myData, file="data.RData") or save.image("data.RData")

score_labels.txt

  4. What error or output did you get?

The reported AUC is obviously wrong (the correct value is less than 0.5).

Expected behavior

The reported ROC AUC and associated CI must be correct.

Additional context

Apparently, reported AUCs are always >= 0.5.
In reality, an AUC can take any value in [0, 1].

@UnixJunkie
Author

See this gist for more such examples:
https://gist.github.com/UnixJunkie/d38de911f7fac31aca48651c8684d1de

UnixJunkie changed the title from "Incorrect AUC value and CI" to "Incorrect AUC value and CI [bug]" on Apr 23, 2024
@xrobin
Owner

xrobin commented Apr 23, 2024

The value of AUC is obviously correct.

With the dataset you provided in score_labels.txt:

> roc_curve <- roc(df, "labels", "scores")
Setting levels: control = 0, case = 1
Setting direction: controls > cases
> auc(roc_curve)
Area under the curve: 0.6789
> ci.auc(roc_curve)
95% CI: 0.6382-0.7196 (DeLong)

If you have prior knowledge of which group has higher values of the predictor, you should change the direction argument to match.

> roc_curve <- roc(df, "labels", "scores", direction = "<")
Setting levels: control = 0, case = 1
> auc(roc_curve)
Area under the curve: 0.3211
> ci.auc(roc_curve)
95% CI: 0.2804-0.3618 (DeLong)

See ?roc for more details.
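
To illustrate the point about the direction argument, here is a minimal sketch on synthetic data (an assumption; it does not use score_labels.txt). It pins both levels and direction so that a higher score always means "more likely a case", and compares the result with the default automatic choice. The variable names are illustrative only.

```r
library(pROC)

set.seed(42)
# Synthetic data (assumption): cases (label 1) tend to score LOWER
# than controls (label 0), so a fixed "<" direction gives AUC < 0.5.
labels <- rep(c(0, 1), each = 100)
scores <- c(rnorm(100, mean = 1), rnorm(100, mean = 0))

# Default: direction = "auto" picks the direction from the data,
# which typically yields a reported AUC of at least 0.5.
roc_auto <- roc(labels, scores, quiet = TRUE)

# Fixed convention: controls < cases, i.e. higher score means "case".
roc_fixed <- roc(labels, scores, levels = c(0, 1), direction = "<",
                 quiet = TRUE)

auc_auto  <- as.numeric(auc(roc_auto))
auc_fixed <- as.numeric(auc(roc_fixed))

print(c(auto = auc_auto, fixed = auc_fixed))
print(ci.auc(roc_fixed))  # the CI follows the fixed direction as well
```

When comparing several predictors or benchmarking against other tools, setting levels and direction explicitly like this keeps every AUC on the same convention, and quiet = TRUE is then safe because nothing is auto-detected.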

xrobin closed this as completed on Apr 23, 2024
@UnixJunkie
Author

This direction parameter is very dangerous.
The default behavior is not always right.

@xrobin
Owner

xrobin commented Apr 23, 2024

There's no default that will be right all the time - otherwise there would be no need for a parameter in the first place.

This behavior is documented in ?roc, in the FAQs, and extensively in forums and on Stack Overflow. The messages printed by roc() (the "Setting levels" and "Setting direction" lines above) are there to communicate the decision transparently.
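
For readers wondering why the two reported values are x and 1 - x, the following base-R sketch may help. It is not pROC's code: auc_lt and auto_direction are hypothetical helpers, and the median comparison is only an approximation of the "auto" rule as described in ?roc. The toy data are made up.

```r
# Hypothetical helper: AUC under the fixed convention "controls < cases",
# i.e. the probability that a random case outscores a random control
# (ties count as 1/2).
auc_lt <- function(scores, labels) {
  cases    <- scores[labels == 1]
  controls <- scores[labels == 0]
  mean(outer(cases, controls, function(a, b) (a > b) + 0.5 * (a == b)))
}

# Hypothetical helper approximating the documented "auto" rule:
# pick the direction from the group medians.
auto_direction <- function(scores, labels) {
  if (median(scores[labels == 0]) > median(scores[labels == 1])) ">" else "<"
}

# Toy data (assumption): controls mostly score higher than cases.
labels <- c(0, 0, 0, 1, 1, 1)
scores <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)

auc_fixed <- auc_lt(scores, labels)          # fixed "<" convention
dir       <- auto_direction(scores, labels)  # what "auto" would choose
# Flipping the direction swaps the roles of ">" and "<" in every pair,
# so the AUC becomes 1 minus the fixed-direction value.
auc_auto  <- if (dir == ">") 1 - auc_fixed else auc_fixed

print(c(fixed = auc_fixed, auto = auc_auto))
```

Only one case/control pair out of nine is ordered "case above control" here, so the fixed-direction AUC is 1/9 and the automatically chosen direction reports 8/9.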

@UnixJunkie
Author

Funnily, the Python CROC package never gets the AUC wrong...
