Hi,
In most papers, we only see mAP, which is a good aggregate metric for overall comparison but provides little further insight. The inclusion of F1 in the leaderboard is a great step! Going one step further, between precision and recall, specific applications may prefer one over the other. If we are searching for an important object in satellite imagery, we would usually accept lower precision to gain high recall, for example when searching for objects of national-security concern in the vicinity of national borders.
If we include Precision/Recall in the leaderboard, we can potentially answer questions such as the following:
- For the same recall, which model provides better precision?
- Which model provides the highest recall at a specified precision threshold? For example, we may want to bound precision to be > 0.5 (sketched below).
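To make the second question concrete, here is a minimal sketch of how "max recall subject to a precision bound" could be computed per model. It assumes per-detection confidence scores and true-positive/false-positive match flags (as produced when matching detections to ground truth during standard mAP evaluation) are available; `max_recall_at_precision`, the flag names, and the numbers below are hypothetical, not part of any existing leaderboard code.

```python
import numpy as np

def max_recall_at_precision(tp_flags, scores, num_gt, min_precision=0.5):
    """Highest recall achievable while keeping precision >= min_precision.

    tp_flags: 1 if a detection matched a ground-truth box, else 0 (false positive).
    scores:   detector confidence for each detection.
    num_gt:   total number of ground-truth boxes (the recall denominator).
    """
    order = np.argsort(-np.asarray(scores))        # sweep the confidence threshold downward
    tp = np.cumsum(np.asarray(tp_flags)[order])    # cumulative true positives
    fp = np.cumsum(1 - np.asarray(tp_flags)[order])  # cumulative false positives
    precision = tp / (tp + fp)
    recall = tp / num_gt
    feasible = precision >= min_precision
    return float(recall[feasible].max()) if feasible.any() else 0.0

# Hypothetical detections from one model, already matched at some IoU threshold
scores   = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40]
tp_flags = [1,    1,    0,    1,    0,    1,    0,    1]
print(max_recall_at_precision(tp_flags, scores, num_gt=6))  # -> 0.833...
```

Reporting this number for each model at one or two agreed precision bounds would let users pick the model that fits their application's operating point, rather than the one with the best aggregate score.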