GH-642: Added return probability distribution over all classes #782
Conversation
Hi @Philipduerholt, thanks for implementing this really useful feature. I tried out your implementation and it works as expected! See my comments below :)
@@ -587,11 +601,21 @@ def _viterbi_decode(self, feats):
    _, idx = torch.max(backscore, 0)
    prediction = idx.item()
    best_scores.append(softmax[prediction].item())
    scores.append([elem.item() for elem in softmax.flatten()])
    # This has been taken from https://github.com/zalandoresearch/flair/pull/642
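To make the diff above concrete: what gets appended to `scores` is a softmax over the Viterbi backscores, i.e. a probability distribution over all tags for one token. A minimal stand-alone illustration (made-up scores, plain Python instead of torch):

```python
import math

# Made-up backscores for 3 tags at one token position.
backscore = [2.0, 1.0, 0.1]

# Softmax turns the raw scores into a probability distribution.
exps = [math.exp(s) for s in backscore]
softmax = [e / sum(exps) for e in exps]

# Index of the most probable tag, analogous to torch.max(backscore, 0).
prediction = max(range(len(softmax)), key=softmax.__getitem__)

print(prediction)              # 0 -- the tag with the highest backscore
print(round(sum(softmax), 6))  # 1.0 -- a valid probability distribution
```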
It is not entirely clear to me what this code actually does. I surrounded it with a simple assertion and it seems that scores remains unchanged.
_scores_before = scores.copy()
swap_best_path, swap_max_score = (
best_path[0],
scores[-1].index(max(scores[-1])),
)
scores[-1][swap_best_path], scores[-1][swap_max_score] = (
scores[-1][swap_max_score],
scores[-1][swap_best_path],
)
assert _scores_before == scores
@mauryaland Since you added this code in your original PR (#642), can you perhaps shed some light on this? :)
Sorry for the late reply! This is done when using a CRF because sometimes the best path is not related to the max score. However, I'm not sure that my implementation is optimal or always correct.
Glad to see this feature added, and thanks @Philipduerholt for enhancing and finishing the work I started.
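The point that the Viterbi-best path can disagree with the per-token max score is easy to see in a toy example. The numbers below are invented for illustration and have nothing to do with flair's actual scores: a strong transition score can pull the best path through a tag that does not have the highest emission score at that position.

```python
import itertools

# Toy CRF in log-space: 2 positions, tags A and B. All numbers are made up.
tags = ["A", "B"]
emissions = [{"A": 0.6, "B": 0.4},   # position 0: per-token argmax is A
             {"A": 0.9, "B": 0.1}]   # position 1: per-token argmax is A
transitions = {("A", "A"): 0.1, ("A", "B"): 0.1,
               ("B", "A"): 0.9, ("B", "B"): 0.1}  # B -> A is very likely

def path_score(path):
    # Score of a tag sequence = emission scores + transition scores.
    score = sum(emissions[i][t] for i, t in enumerate(path))
    score += sum(transitions[(a, b)] for a, b in zip(path, path[1:]))
    return score

# Exhaustive search stands in for Viterbi on this tiny example.
best_path = max(itertools.product(tags, repeat=2), key=path_score)
greedy = [max(tags, key=lambda t: emissions[i][t]) for i in range(2)]

print(best_path)  # ('B', 'A') -- the strong B->A transition wins
print(greedy)     # ['A', 'A'] -- per-position argmax disagrees at position 0
```

So the tag chosen by the best path at position 0 (`B`) is not the tag with the maximum per-token score (`A`), which is the situation the swap in the code above tries to account for.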
@@ -514,26 +515,28 @@ def _calculate_loss_old(self, features, lengths, tags) -> float:
        score /= len(features)
        return score

-    def _obtain_labels(self, feature, sentences) -> List[List[Label]]:
+    def _obtain_labels(self, feature, sentences) -> (List[List[Label]], List[List[List[Label]]]):
I think it's not very transparent what the return values correspond to. Maybe we can add some explanation along the lines of:
"""
Returns a tuple of two lists:
- The first list corresponds to the most likely `Label` per token in each sentence.
- The second list contains a probability distribution over all `Labels` for each token in a sentence for all sentences.
"""
Hi @jantrienes, thank you for your timely feedback. I'm glad everything worked as expected. If there are any other suggestions, please let me know.
@Philipduerholt thanks for adding this! A lot of people will find this useful.
👍
@Philipduerholt and @mauryaland: thank you for adding this!
Hi,
this PR is about issue #745. It:

- adds `tags_proba_dist` to `Token`, which contains a list of `Label`. The list of labels represents the probability distribution over all possible labels for this token.
- changes `SequenceTagger._obtain_labels` to not only return the most probable label, but in addition a complete list of all possible labels with their respective scores.

The required changes in `SequenceTagger._viterbi_decode` are taken from #642.

Disclaimer: This is my first PR to the flair project. I'm happy for any feedback or ideas on how to improve this.
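The per-token data layout the PR describes could be sketched as follows. Note that `Label` and `Token` below are illustrative stubs written for this sketch, not flair's actual classes, and the scores are invented; only the attribute name `tags_proba_dist` comes from the PR text.

```python
# Illustrative stubs -- not flair's real Label/Token implementations.
class Label:
    def __init__(self, value, score):
        self.value, self.score = value, score

class Token:
    def __init__(self, text):
        self.text = text
        self.tags = {}             # best label per tag type
        self.tags_proba_dist = {}  # full label distribution per tag type

# A token that is most likely a location, with the full distribution kept.
token = Token("Berlin")
token.tags["ner"] = Label("S-LOC", 0.92)
token.tags_proba_dist["ner"] = [Label("S-LOC", 0.92),
                                Label("O", 0.05),
                                Label("S-PER", 0.03)]

print(token.tags["ner"].value)  # "S-LOC" -- the single most probable label
total = sum(l.score for l in token.tags_proba_dist["ner"])
print(round(total, 2))          # 1.0 -- the distribution covers all labels
```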