ActEV Scoring Software
Date: November 19, 2020
This software package contains a scoring script for the TRECVID
Activities in Extended Video (ActEV) task. The script,
ActEV_Scorer.py, is a Python 3.7 script that will validate and score
a system output file adhering to the JSON format defined in the ActEV evaluation plan. A collection of unit and integration test cases has also been included; see the setup section for more detail.
ActEV_Scorer.py, when run with the '-h' option, will show the script's usage text. The
example_run.sh script contains an example of typical usage (using the provided test data).
This package assumes a Unix-like environment. Included Python files are written for Python 3.7.
- Install Python 3.7+, jq and required dependencies using
- Run the tests (optional, but strongly recommended) using
SCORING_PROTOCOL - Positional argument, from a fixed set of values (e.g. ActEV18_AD). This required argument controls what system output formats are valid, and what metrics are computed. A description of each supported protocol can be found in the Protocols section of this document.
-s SYSTEM_OUTPUT_FILE - Required; path to the system output JSON file to be scored
-r REFERENCE_FILE - Required unless running validation only; path to the reference JSON file to score against
-a ACTIVITY_INDEX - Required; path to activity index JSON file. This file lists what activities the system output will be evaluated on
-f FILE_INDEX - Required; path to file index JSON file. This file lists the files, and temporal ranges within those files, that the system output will be evaluated on (NOTE: currently the temporal ranges specified in FILE_INDEX are not used when considering what portion of the system output and reference to evaluate, and are only used to compute the duration of files. This will be implemented in a future release.)
-o OUTPUT_DIR - Required unless running validation only; directory for computed scores and figures
-d - Optional; by default, the script will produce a Detection Error Tradeoff (DET) curve figure for each activity and a combined figure of all activity curves. If the '-d' option is set, no DET curves will be produced by the script
-p SCORING_PARAMETERS_FILE - Optional; path to a scoring parameter JSON file. If provided, overwrites the default parameters used for scoring
-v - Optional; if enabled, the script will be more verbose (i.e. provide some scoring progress information)
-V - Optional; if enabled, the SYSTEM_OUTPUT_FILE will be validated but not scored. REFERENCE_FILE and OUTPUT_DIR parameters are not required if this option is enabled
-F - Optional; if enabled, ignores extraneous "filesProcessed" and ignores system and reference instance localizations for extraneous files. Note that extraneous files in this sense are those not included in the FILE_INDEX
-t DET_Point_Resolution - Optional; if set, the number of points used for the DET curves will be the given integer value rather than the maximum
-P PERCENTAGE - Optional; if set, the system output will be pruned, keeping maximum MAX_FRAMES of instances per activity.
-i - Optional; if set, ignore no score regions.
-n - Optional; if set, defines the number of processes to use for alignments and results computation.
-c - Optional; if set, specifies the path for the plotting parameters JSON file (see test_17_0 for an example).
-I - Optional; if set, do not ignore activities that are not in the reference activity instances.
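As an illustration of how the required and optional arguments above fit together, the following is a minimal Python sketch that assembles a command line for the scorer. The helper name build_scorer_command and the JSON file names are hypothetical and only serve the example; the flags themselves are the ones listed above.

```python
import sys


def build_scorer_command(protocol, system_output, activity_index, file_index,
                         reference=None, output_dir=None,
                         validation_only=False, verbose=False):
    """Assemble an argument list for ActEV_Scorer.py (hypothetical helper)."""
    cmd = [sys.executable, "ActEV_Scorer.py", protocol,
           "-s", system_output, "-a", activity_index, "-f", file_index]
    if validation_only:
        # With -V, the reference file and output directory are not required.
        cmd.append("-V")
    else:
        if reference is None or output_dir is None:
            raise ValueError("-r and -o are required unless validating only")
        cmd += ["-r", reference, "-o", output_dir]
    if verbose:
        cmd.append("-v")
    return cmd


# File names below are placeholders, not files shipped with the package.
validate_cmd = build_scorer_command(
    "ActEV18_AD", "system_output.json", "activity_index.json",
    "file_index.json", validation_only=True)
print(" ".join(validate_cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory containing ActEV_Scorer.py.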
ActEV18_AD - Scoring protocol for the ActEV 2018 Activity Detection task. The following measures are computed:
- PMiss at RFA for RFA values of 1, 0.2, 0.15, 0.1, 0.03, and 0.01
- NMIDE (NOTE: currently using a no-score collar size of 0 frames; this will likely change in a future release)
- NMIDE at RFA for RFA values of 1, 0.2, 0.15, 0.1, 0.03, and 0.01
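To give a rough idea of what a "PMiss at RFA" measure expresses, here is a simplified Python sketch. It is not the scorer's implementation: it assumes each system instance has already been labeled as correct or as a false alarm by the alignment step, that RFA is counted in false alarms per minute of video, and that tied confidence scores are accepted together. The function name pmiss_at_rfa and its inputs are hypothetical.

```python
def pmiss_at_rfa(detections, num_ref, duration_minutes, target_rfa):
    """Lowest miss probability reachable without exceeding target_rfa.

    detections: list of (presence_conf, is_correct) pairs (assumed input).
    num_ref: number of reference instances.
    duration_minutes: total duration of the evaluated video, in minutes.
    """
    if num_ref <= 0 or duration_minutes <= 0:
        raise ValueError("num_ref and duration_minutes must be positive")
    best = 1.0  # accepting nothing: no false alarms, every reference missed
    hits = 0
    false_alarms = 0
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Lower the threshold one distinct confidence value at a time.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1]:
                hits += 1
            else:
                false_alarms += 1
            i += 1
        if false_alarms / duration_minutes <= target_rfa:
            best = min(best, 1.0 - hits / num_ref)
    return best


# Toy data: five system instances, four reference instances, ten minutes.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
for rfa in (0.2, 0.1, 0.01):
    print(rfa, pmiss_at_rfa(toy, num_ref=4, duration_minutes=10, target_rfa=rfa))
```

With no system instances the sketch returns 1.0, which matches the behavior noted in the changelog for Version 0.3.1.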
ActEV18_AOD - Scoring protocol for the ActEV 2018 Activity and Object Detection task. This protocol computes both the PMiss at RFA and NMIDE measures reported for the ActEV18_AD protocol, but over an activity instance alignment that also considers object detections. The following additional measures are computed:
- minMODE: The minimum NMODE score for an object detection alignment; reported for each aligned activity instance pair
- PMiss at RFA for RFA values of 0.5, 0.2, 0.1, 0.033 for object detections (these measures are prefixed with "object-" to differentiate them from PMiss at RFA measures on activity detections)
ActEV18_AODT - Scoring protocol for the ActEV 2018 Activity and Object Detection and Tracking task. This protocol computes the PMiss at RFA, NMIDE, and minMODE measures reported for the ActEV18_AD and ActEV18_AOD protocols, but over an activity instance alignment that also considers object detections. The following additional measures are computed:
- MOTE: the Multiple Object Tracking Error for an object detection and tracking alignment.
ActEV19_AD - Scoring protocol for the ActEV 2019 Activity Detection task. The differences between ActEV18_AD and ActEV19_AD are that correct instances require at least 1 second of overlap with the reference, and that Time-based False Alarms (TFA) are used.
ActEV19_AD_V2 - Scoring protocol for Version 2 of the ActEV 2019 Activity Detection task. The difference between ActEV19_AD and ActEV19_AD_V2 is that, for an instance to be correct, more than a specified percentage of the reference activity must be overlapped by the system activity.
ActEV_SDL_V1 - Scoring protocol for Version 1 of the ActEV Sequestered Data Leaderboard Activity. This version revises the computation of Time-based False Alarm to include false alarm time during reference instances when the system produces detections in excess of the reference instances.
ActEV_SDL_V2 - Scoring protocol for Version 2 of the ActEV Sequestered Data Leaderboard Activity.
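The two overlap criteria described for ActEV19_AD and ActEV19_AD_V2 can be sketched as follows. This is an illustrative Python sketch only, assuming instances are given as (start, end) pairs in seconds; the function names and the example threshold are hypothetical, and the scorer's actual alignment logic involves more than this check.

```python
def overlap_seconds(ref, hyp):
    """Length in seconds of the intersection of two (start, end) intervals."""
    return max(0.0, min(ref[1], hyp[1]) - max(ref[0], hyp[0]))


def meets_min_overlap(ref, hyp, min_seconds=1.0):
    """ActEV19_AD-style check: at least min_seconds of temporal overlap."""
    return overlap_seconds(ref, hyp) >= min_seconds


def meets_min_fraction(ref, hyp, min_fraction):
    """ActEV19_AD_V2-style check: more than min_fraction of the reference
    duration is covered by the system instance."""
    ref_duration = ref[1] - ref[0]
    if ref_duration <= 0:
        return False
    return overlap_seconds(ref, hyp) / ref_duration > min_fraction


reference = (10.0, 20.0)
print(meets_min_overlap(reference, (19.5, 30.0)))        # only 0.5 s shared
print(meets_min_overlap(reference, (15.0, 30.0)))        # 5 s shared
print(meets_min_fraction(reference, (15.0, 30.0), 0.4))  # half of reference covered
```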
The scoring script writes to several files in the specified OUTPUT_DIR directory (all *.csv files are pipe separated):
scores_by_activity.csv - Scores by activity; computed with respect to the selected scoring protocol
scores_aggregated.csv - Aggregated scores over all activities
scoring_parameters.json - Lists the scoring/protocol parameters used
alignment.csv - Lists each of the matched and unmatched system and reference instances, along with the matching kernel components and similarity scores
pair_metrics.csv - Metrics computed on the matched system/reference instance pairs
figures/DET_<activity>.png - Unless disabled with the '-d' option, the DET curve figure for <activity>
object_alignment.csv - For the "ActEV18_AOD" and "ActEV18_AODT" protocols, enabled with '-j'; lists the frame-by-frame object alignments for each pair of aligned activity instances
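Because the *.csv outputs are pipe separated, they need an explicit delimiter when read back. The following is a minimal Python sketch using the standard csv module; the helper name and the column names in the sample are hypothetical stand-ins, since the actual headers depend on the file and scoring protocol.

```python
import csv
import io


def read_pipe_separated(handle):
    """Return the rows of a pipe-separated file as dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="|"))


# Hypothetical sample content; real column names come from the scorer output.
sample = io.StringIO("activity|metric_name|metric_value\nClosing|p_miss|0.5\n")
rows = read_pipe_separated(sample)
print(rows[0]["activity"], rows[0]["metric_value"])
```

For a real run, the same helper can be given a file opened with open(path, newline=""), where path points at one of the CSV files in OUTPUT_DIR.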
Jan 31, 2018 - Version 0.0.1
- Initial release
Feb 22, 2018 - Version 0.0.2
- Fixed an issue where system or reference files without instances for an activity listed in the activity index would cause the script to fail
- Fixed an issue with older versions of the matplotlib package, which would cause the script to fail when plotting activities with no false alarms
March 16, 2018 - Version 0.1.0
- Renamed the
ActEV18_AD, as it's specifically for the Activity Detection task
- Added scoring protocol for ActEV18_AOD. This protocol includes an option (-j) to dump out the frame-by-frame object alignments for each pair of aligned activity instances
- Updated schema and code to expect "presenceConf" instead of "decisionScore" for activity detections. The headers of some output files have been updated to reflect this change
- Changed "config.csv" output to be a serialized JSON, named "scoring_parameters.json"
- Added a command line option to accept a scoring parameters JSON file to overwrite the protocol's default parameters for the scoring run
- The script now checks the file index against the "filesProcessed" reported in the system output file for congruence
- Added N-MIDE measure to aggregate scores, which is computed over the entire alignment (regardless of activity) in addition to the already reported N-MIDE macro-averaged over activities (currently reported as "mean-n-mide")
- The N-MIDE computation now ignores pairs where the reference instance has been reduced to a duration of zero due to the size of the no-score collar. The number of ignored pairs is reported as "n-mide_num_rejected"
March 27, 2018 - Version 0.1.1
- Added object detection PMiss@RFA measures for the
- Fixed object congruence calculation for ActEV18_AOD. Should be calculated as 1 - minMODE, rather than simply minMODE (NOTE: this change affects alignment, and as a result your scores may have changed from the previous version)
- For ActEV18_AOD, updated the default object congruence delta to be 0 instead of 1 to reflect the updated object congruence calculation
- Fixed an issue where DET curve points with PMiss of 1 or 0 weren't being plotted. Note that these points will not be directly visible within the DET curve figures due to the y-axis scaling, but will connect to other points within the view
April 9, 2018 - Version 0.1.2
- Fixed an issue where a reference activity instance spanning the entire duration of the source video would cause the N-MIDE computation to fail. These instances are now ignored for N-MIDE, and are included in the "n-mide_num_rejected" count
- Added an optimization to the kernel builder function whereby unnecessary filter computations are skipped
April 23, 2018 - Version 0.2.0
- Now using global range of "presenceConf" scores for detection congruence score component of alignment kernel for both
- The "ActEV18_AOD" protocol can now accept an "objectTypeMap" for each activity in the provided activity index. Reference and system "objectType" strings are passed through the map (if provided) prior to alignment
- Added additional columns to the "object_alignment.csv" output file, which specifies both the original "objectType" strings and re-mapped strings for both reference and system instances
- The ActEV18_AOD protocol will now ignore any objects provided by the reference or system output with an "objectType" not included in the list of "objectTypes" for a given activity in the activity index file. If the "objectTypes" property is provided as an empty list, or is simply omitted, no such filtering takes place
April 25, 2018 - Version 0.2.1
- Added an option (-V, --validation-only) where the system output file will be validated but not scored. With this option enabled, the reference file (-r), and output directory (-o) parameters are not required
- Fixed an issue where the global range of "presenceConf" scores for object detections was being computed more often than necessary
April 27, 2018 - Version 0.2.2
- Fixed a divide by zero issue when computing MODE for the ActEV_AOD object detection congruence kernel component, specifically when there are no reference objects
- Added "temporal_fa" and "temporal_miss" to the pair metrics output
May 16, 2018 - Version 0.3.0
- Renamed existing scoring parameters to be less ambiguous. Added scoring parameters to control error weights for some metrics as well as the target rates of false alarm
- Refactored portions of the alignment and metric computation code to improve performance
May 24, 2018 - Version 0.3.1
- Now reporting a PMiss@RFA of 1.0 instead of None when there are no system reported instances
- For the ActEV_AOD protocol, now reporting a mean object PMiss@RFA of 1.0 instead of None when there are no aligned activity instances
June 13, 2018 - Version 0.3.2
- Added an option (-F, --ignore-extraneous-files) to ignore "filesProcessed" and reference and system localizations for files not included in the provided FILE_INDEX
- Miscellaneous improvements
June 25, 2018 - Version 0.3.3
- Added NMIDE at RFA measures for both
- Moved DET curve plot legend to the right of the plot
September 24, 2018 - Version 0.3.4
- Added AODT Task
- Added MOTE to output files
- Added integration test 8_0
June 12, 2019
- Added AUC at the various tfa and rfa thresholds
June 13, 2019
- Add nAUC using nAUC@Xtfa = AUC@Xtfa / X
August 30, 2019
- Added git.commit info to scoring parameters
- Fixed tfa calculation
- Added DMRender for graphing
September 5, 2019
- Added -t option for det point resolution scores to use for processing
- Added test 14-0
September 18, 2019
- AUDC now calculated using dm files
- Added tests 15-0, 15-1, 15-2, 15-3
December 16, 2019
- Added tests 11-4, 11-5
May 01, 2020 - Version 0.5.0
- Updated to Python 3.7
- Updated tests
- diff is no longer used during tests. A custom diff is used because float precision differs between Python 2 and 3.
- Ignored output files are now
- Updated README and made it more user-friendly, using MarkDown
install recipe for Makefile
June 17, 2020 - Version 0.5.1
- Add parallelization
August 19, 2020 - Version 0.5.2
- Add pruning option
September 11, 2020 - Version 0.5.3
- Added metric computations for email@example.comTFA
September 21, 2020 - Version 0.5.4
October 16, 2020 - Version 0.5.5
--no-ppf option as it is now the default behavior.
November 9, 2020 - Version 0.5.6
--include-zero-ref-instances option for legacy purposes.
November 16, 2020 - Version 0.5.7
- Add checks on
Please send any issues, questions, or comments to firstname.lastname@example.org
- David Joy
- Andrew Delgado
- Baptiste Chocot