
Precision-recall curves #18

Closed · topepo opened this issue Jun 8, 2017 · 6 comments

topepo commented Jun 8, 2017

Any chance that you might add these?

xrobin (Owner) commented Jun 9, 2017

I believe that for the coords function it would be pretty easy: I just need to add precision and recall to the list of returned coordinates. Confidence intervals follow automatically, and plotting is easy.
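
(For reference: recall is just sensitivity, and precision can be derived from sensitivity, specificity and the prevalence, so a sketch of such coordinates could look like the following. All names here are hypothetical, not pROC code.)

# Hypothetical sketch: precision/recall from ROC-style coordinates.
# sens and spec are sensitivity/specificity at each threshold,
# prev is the prevalence of the positive class.
pr_coords <- function(sens, spec, prev) {
  recall    <- sens  # recall is sensitivity by definition
  precision <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
  data.frame(recall = recall, precision = precision)
}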

I assume you're more interested in the AUC. At the moment the code is very ROC-centric and requires the curve to increase monotonically. That's not the case for PR curves, so a new method would have to be written. Once we have that, it should be possible to bootstrap and calculate variances, p-values, etc. I have been working on cleaning up the current mess in my bootstrapping code, but it's still very convoluted (I need to keep the parameters used to build the ROC curve, such as partial AUC, smoothing, direction, etc., and handle both stratified and non-stratified sampling).
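
For the AUC itself, a non-monotonic curve can still be integrated numerically; a rough sketch (again hypothetical, not pROC code) using the trapezoidal rule over recall:

# Hypothetical sketch: numeric AUC of a PR curve. Sorting by recall and
# using the trapezoidal rule copes with the non-monotonic precision
# values that break the ROC-centric code.
pr_auc <- function(recall, precision) {
  o <- order(recall)
  r <- recall[o]
  p <- precision[o]
  sum(diff(r) * (head(p, -1) + tail(p, -1)) / 2)
}

Note that linear interpolation in PR space is known to be optimistic, so a step-wise "average precision" variant may be preferable.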

Smoothing is out of the picture and I'd have no idea how to integrate that.

I guess that makes it a pretty big rewrite. Except for §1 it's unlikely I'll have time to do it in the foreseeable future unless someone else steps in.

topepo (Author) commented Jun 9, 2017

Everything is simple for the person not doing it =]

> I guess that makes it a pretty big rewrite. Except for §1 it's unlikely I'll have time to do it in the foreseeable future unless someone else steps in.

Thanks for the assessment.

topepo closed this as completed Jun 9, 2017
sebastienwood commented

Hi,
Sorry for unearthing this thread; I would be pretty interested if the coords function worked with PR curves as well. The use case would be to simply call
pROC::coords(roc(something), "bestPR", "threshold")
Is there any chance this could be considered for the package? Thanks! :)

xrobin (Owner) commented May 11, 2019

Precision and recall were implemented in version 1.10.0. I don't know why that wasn't mentioned here.

library(pROC)
# with recent pROC versions, coords() returns a data.frame of the requested columns
co <- coords(rocobj, "all", ret = c("recall", "precision"))
plot(precision ~ recall, co, type = "l")

Now regarding the "bestPR" bit: I don't know what that would mean, how "best" is defined on a PR curve, or whether it is defined at all. PR curves aren't very intuitive, to say the least, and many things that "work" for ROC just make no sense in PR. Do you have any reliable reference for this? If so I will re-open the issue, or please feel free to open a new one.

sebastienwood commented

Thanks for the input :) I would have thought the approach would be the same as in ROC curve analysis: drawing a line from the "ideal" corner and gradually moving it until it is tangent to the PR curve (the EER, if I'm not mistaken). This link offers a few options that seem legitimate from what I understand: https://stats.stackexchange.com/questions/7718/how-to-choose-a-good-operation-point-from-precision-recall-curves
They also suggest another interesting approach, a user-defined cost function, but I don't know how easy that would be to implement.
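
One concrete definition of "best" from that thread would be the threshold maximizing the F-measure; a minimal sketch of what I mean (hypothetical, not an existing pROC call):

# Hypothetical sketch: pick the "best" PR threshold as the F1 maximizer.
# thresholds, recall and precision are parallel vectors along the curve.
best_pr_threshold <- function(thresholds, recall, precision) {
  f1 <- 2 * precision * recall / (precision + recall)
  thresholds[which.max(f1)]  # which.max skips NaN from 0/0 points
}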

xrobin (Owner) commented May 12, 2019

There are a lot of legitimate things that could be done. The real question is which of them are worth implementing.

Equal Error Rate (EER) is not something pROC can do with ROC curves at the moment. It's tricky to calculate, as one needs to interpolate sensitivity and specificity together, and the result will likely not correspond to an actual threshold. I am not aware of it ever being used in practice; however, if there is interest it can be done. For PR curves I don't know whether the equation has a single solution, but again, that can be worked out.
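
To illustrate the interpolation issue, an EER estimate could look something like this (a sketch only, not existing pROC functionality):

# Sketch only: approximate the EER by locating the crossing of
# sensitivity and specificity with linear interpolation.
# sens and spec are parallel vectors, ordered along the thresholds.
eer <- function(sens, spec) {
  d <- sens - spec
  i <- which(diff(sign(d)) != 0)[1]  # first sign change of the difference
  if (is.na(i)) return(NA)
  w <- d[i] / (d[i] - d[i + 1])      # interpolation weight at the crossing
  1 - (sens[i] + w * (sens[i + 1] - sens[i]))  # EER = 1 - sens = 1 - spec there
}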

You can already specify the prevalence and the relative cost of misclassifications, with a formula given by Perkins and Schisterman.
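
For example, with rocobj as above and purely illustrative values for the cost and the prevalence:

# best.weights = c(relative cost of a false negative vs. a false positive,
#                  prevalence); the values below are illustrative only
coords(rocobj, "best", best.weights = c(5, 0.2))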

Please feel free to open a new feature request with the specific feature you'd like to see implemented. It would also help if you can provide some evidence that it's been used in published research, and an algorithm to calculate it.
