WIP per-class metrics #1538

Open · wants to merge 12 commits into base: develop
Conversation

swsuggs (Contributor) commented May 27, 2022

For #492

swsuggs (Contributor, Author) commented May 27, 2022

@davidslater The poisoning scenario adds this metric automatically, right in the code. Is this an acceptable way to make it available to other scenarios through the config/metric/task section?
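For context, a sketch of what the config route might look like, shown as the parsed Python dict rather than the JSON file on disk. "per_class_accuracy" is a hypothetical registration name for the new metric; the surrounding fields mirror the usual metric block in existing scenario configs.

```python
# Sketch of a scenario config's "metric" block (normally JSON on disk),
# shown here as the parsed Python dict. "per_class_accuracy" is an
# assumed name for the new metric, not the PR's final choice.
metric_config = {
    "means": True,
    "perturbation": "linf",
    "record_metric_per_sample": False,
    "task": ["categorical_accuracy", "per_class_accuracy"],  # new metric added here
}
```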

Review thread on armory/instrument/config.py (outdated, resolved):
davidslater (Contributor):
I like adding it to the config part of instrumentation.

swsuggs (Contributor, Author) commented May 31, 2022

@davidslater So far I've added (besides per-class accuracy) a confusion matrix and per-class precision and recall. Two questions: Will you check that I'm computing the right thing (I described my understanding in the comments) and that the output is in a sufficiently useful format (dict, array, etc.)? And can you think of any other metrics or statistics that would be nice to have?
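A minimal numpy sketch of the three computations under discussion, assuming integer labels in [0, N) and a 2D y_pred of logits or probabilities; the function name and output format are illustrative, not the PR's:

```python
import numpy as np

def per_class_stats(y, y_pred):
    # Number of classes taken from the width of the prediction array
    N = y_pred.shape[1]
    y_hat = np.argmax(y_pred, axis=1)

    # Confusion matrix: C[i, j] counts items of true class i predicted as class j
    C = np.zeros((N, N), dtype=int)
    np.add.at(C, (y, y_hat), 1)

    tp = np.diag(C).astype(float)
    precision = tp / C.sum(axis=0)  # TP / total predicted as each class (nan if never predicted)
    recall = tp / C.sum(axis=1)     # TP / total actually in each class (nan if class absent)
    return C, precision, recall
```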

davidslater (Contributor) left a comment:

Let's move the metrics to task, as they take in the paired y and y_pred as inputs.

Comment on lines 60 to 62:

```python
if y_pred.ndim == 2:
    y_pred = np.argmax(y_pred, axis=1)
N = len(np.unique(y))
```
davidslater (Contributor):

If y_pred is 2D, you can use that to derive N. (Or at least check to ensure that they match).

swsuggs (Contributor, Author):

I may be misunderstanding, but N is the number of classes, not the total number of items; hence len(np.unique(y)) and not len(y). I don't think we can assume every class will show up in y_pred. For that matter, it seems a little risky to assume they will all be present in y.

davidslater (Contributor):

If y_pred is 2D, then it outputs either logits or probability distributions over the set of predicted classes, so you can do N = y_pred.shape[1].

If y_pred is 1D, however, that doesn't work.
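A sketch of that derivation; the 1D fallback here is only illustrative, since the thread later settles on requiring 2D input:

```python
if y_pred.ndim == 2:
    # Width of the logits/probability array gives N directly,
    # even when some classes never appear in y
    N = y_pred.shape[1]
    y_pred = np.argmax(y_pred, axis=1)
else:
    # 1D labels: the best available inference, and it can undercount
    N = len(np.unique(np.concatenate([y, y_pred])))
```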

davidslater (Contributor):

I think this implicitly assumes that the classes are all integers from 0 to N - 1. However, if y has missing classes, then there will be some misalignment.
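A small example of the misalignment (illustrative values):

```python
y = np.array([0, 0, 2, 2])  # class 1 never appears in y
N = len(np.unique(y))       # 2, though the model predicts over 3 classes
# A (2, 2) confusion matrix would put true class 2 at row index 1,
# silently shifting every class above the missing one.
```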

swsuggs (Contributor, Author):

Oh right, of course. Is it true that Armory scenarios will always have a 2D y_pred, or does that just depend on how the meter and probes are set up? So far the only source of a 1D y_pred I've encountered is my own unit tests, but I can expand those to 2D and then get N the way you described.

davidslater (Contributor):

Right now it's dependent on the underlying model, unfortunately.

swsuggs (Contributor, Author):

Well, after all, if a class is totally absent from y, its row in the matrix would be all zeros, since no item was ever classified as it. So maybe what I need to do is make this a dictionary after all and key it by class label, so that if a class is missing, it is at least clear which row corresponds to which class. Alternatively, I could insert a row of zeros at the index of each missing class label, but that only works for missing labels smaller than the greatest label that is present.
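A sketch of the dictionary version, keyed by class label so a missing class stays visible instead of shifting the indices (names illustrative):

```python
y_hat = np.argmax(y_pred, axis=1)
labels = np.unique(np.concatenate([y, y_hat]))
C = {
    int(c): {int(k): int(np.sum((y == c) & (y_hat == k))) for k in labels}
    for c in labels
}
# A class absent from y but predicted at least once keeps its own key
# with an all-zero row, rather than displacing the rows below it.
```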

davidslater (Contributor):

Let's just have the function assume that y_pred is 2D (and add that to the docstring). Other things can be handled by the user.
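A sketch of what that assumption could look like, with the docstring carrying it; the signature and wording are illustrative, not the PR's:

```python
import numpy as np

def per_class_accuracy(y, y_pred):
    """Accuracy computed separately for each class.

    y: 1D array of integer class labels, shape (n_samples,).
    y_pred: 2D array of logits or probabilities, shape (n_samples, n_classes).
        1D label predictions are not supported.
    """
    if y_pred.ndim != 2:
        raise ValueError("y_pred must be 2D: (n_samples, n_classes)")
    y_hat = np.argmax(y_pred, axis=1)
    return {c: float(np.mean(y_hat[y == c] == c))
            for c in range(y_pred.shape[1]) if np.any(y == c)}
```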

Comment on:

```python
total_selected = C[:, class_].sum()
precision = tp / total_selected

# recall: true positives / number of actual items in class_
```
swsuggs (Contributor, Author):

Per-class recall is exactly the same as per-class accuracy, which I didn't realize until now. Is it still useful to have two separate per_class_accuracy and per_class_precision_and_recall functions?
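The equivalence in a quick check (illustrative values): recall for class c is TP over the number of items truly in c, which is exactly accuracy restricted to those items.

```python
import numpy as np

y = np.array([0, 0, 1, 2, 2, 2])
y_hat = np.array([0, 1, 1, 2, 0, 2])

c = 2
mask = y == c
recall_c = np.sum(y_hat[mask] == c) / mask.sum()  # TP / actual items in class c
accuracy_c = np.mean(y_hat[mask] == y[mask])      # accuracy over items of class c
assert recall_c == accuracy_c                     # same ratio by definition
```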

davidslater (Contributor):

Sure, let's keep both.

davidslater (Contributor):
See my two recent comments. Beyond that, I think what needs to be done is:
A) Remove the changes to armory/instrument/config.py and armory/utils/config_schema.json.
B) Move the new functions from statistical to task, since we're registering them as task populationwise metrics anyhow.
C) Probably easiest to rebase off of develop.
