# Mask Scorer

## Description

This script calculates performance scores that measure how accurately a system output mask matches a reference mask. It generates two CSV report tables: one containing the scores for each mask and another containing the average of the scores in the first. If the -html option is set, the script also generates a detailed HTML index file for the mask region performance results.

The metrics below use the following terminology:
 * $GT$ refers to the ground truth mask
 * $sys$ refers to the system output mask
 * $TP$ refers to True Positives computed between the ground truth mask and the system output
 * $TN$ refers to True Negatives computed between the ground truth mask and the system output
 * $FN$ refers to False Negatives computed between the ground truth mask and the system output
 * $FP$ refers to False Positives computed between the ground truth mask and the system output
 * $weights$ is a matrix of 1's and 0's that denotes the set of pixels scored by the system; it is generated from the difference between the erosion and the dilation of the manipulated area of $GT$.

The following metrics are used to score each mask:

### Nimble Mask Metric (NMM)
\begin{equation*}
NMM(GT,sys,weights,c)=\max{\left(\frac{TP - FN - FP}{\Sigma_{px\in GT}weights(px)},c\right)}
\end{equation*}

$\Sigma_{px \in GT}$ refers to the sum over the pixels in the ground truth that are marked black. $c$ denotes a minimum cutoff value for the scoring to have any meaning; by default, $c=-1$.
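
As a rough illustration of the formula above, the following pure-Python sketch computes the NMM on small masks stored as nested lists. It is not the scorer's own code: the function names, the convention that a truthy value marks a manipulated pixel, and the `nan` returned for an empty denominator are all assumptions made for this example.

```python
def confusion_counts(gt, sys_mask, weights):
    """Count TP, TN, FP, FN over the scored pixels (weights == 1).

    Assumption of this sketch: in gt and sys_mask a truthy value marks a
    manipulated pixel. All three inputs are equally sized nested lists.
    """
    tp = tn = fp = fn = 0
    for gt_row, sys_row, w_row in zip(gt, sys_mask, weights):
        for g, s, w in zip(gt_row, sys_row, w_row):
            if not w:
                continue  # pixel lies in a no-score zone
            if g and s:
                tp += 1
            elif g:
                fn += 1
            elif s:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def nmm(gt, sys_mask, weights, c=-1.0):
    """Nimble Mask Metric with cutoff c (default -1, as in the text)."""
    tp, _tn, fp, fn = confusion_counts(gt, sys_mask, weights)
    # Denominator: weights summed over the manipulated ground-truth pixels.
    denom = sum(w for gt_row, w_row in zip(gt, weights)
                for g, w in zip(gt_row, w_row) if g)
    if denom == 0:
        return float("nan")  # assumption: undefined when nothing is scored
    return max((tp - fn - fp) / denom, c)


gt = [[1, 1, 0], [0, 0, 0]]
all_scored = [[1, 1, 1], [1, 1, 1]]
print(nmm(gt, gt, all_scored))                             # 1.0
print(nmm(gt, [[1, 0, 0], [0, 1, 0]], all_scored))         # -0.5
print(nmm(gt, [[0, 0, 1], [1, 1, 1]], all_scored))         # -1.0 (cutoff applied)
```

The last call shows the role of $c$: the raw ratio is -3, but the reported score is clipped to the cutoff.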

### Matthews Correlation Coefficient (MCC)
\begin{equation*}
MCC(GT,sys) = \frac{TP*TN - FP*FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
\end{equation*}

An MCC of 1 denotes perfect correlation, an MCC of 0 denotes no correlation at all, and an MCC of -1 denotes perfect anti-correlation.
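
The three reference values above can be checked with a minimal sketch that evaluates the formula directly from the four counts. The function name `mcc` and the choice to return 0 when the denominator vanishes are assumptions of this example, not necessarily the scorer's behavior.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for the degenerate case
    return (tp * tn - fp * fn) / denom


print(mcc(tp=5, tn=5, fp=0, fn=0))  # 1.0  (perfect correlation)
print(mcc(tp=1, tn=1, fp=1, fn=1))  # 0.0  (no correlation)
print(mcc(tp=0, tn=0, fp=5, fn=5))  # -1.0 (perfect anti-correlation)
```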

### Weighted L1 Loss (BWL1 and GWL1)
\begin{equation*}
WL1(GT,sys,weights)=\frac{(FP+FN)_{weights > 0}}{\Sigma weights(px)}
\end{equation*}

A Weighted L1 of 0 denotes a perfect match on every scored pixel (differences may remain where the weights are 0); 1 denotes a complete mismatch. $(FP+FN)_{weights > 0}$ refers to the total number of $FP$ and $FN$ pixels where weights are greater than 0.

The Weighted L1 is applied separately to the original grayscale system output and the binarized mask, producing the grayscale Weighted L1 (GWL1) and binarized Weighted L1 (BWL1) metrics respectively. For the original grayscale output, the numerator is the sum of the weighted differences in pixel intensity.
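
A single function can illustrate both variants, assuming 8-bit masks whose intensity difference is normalized by 255. That normalization, the function name `weighted_l1`, and the `nan` result for all-zero weights are assumptions of this sketch. With a binarized mask (values 0 or 255) each mismatching pixel contributes exactly 1, which reproduces the $(FP+FN)_{weights > 0}$ count.

```python
def weighted_l1(gt, sys_mask, weights, max_value=255):
    """Weighted L1 loss between two equally sized nested-list masks.

    Assumption: pixel values lie in [0, max_value]; weights are 0 or 1.
    """
    numerator = 0.0
    total_weight = 0
    for gt_row, sys_row, w_row in zip(gt, sys_mask, weights):
        for g, s, w in zip(gt_row, sys_row, w_row):
            numerator += w * abs(g - s) / max_value
            total_weight += w
    if total_weight == 0:
        return float("nan")  # assumption: undefined when nothing is scored
    return numerator / total_weight


gt = [[0, 255], [255, 255]]
weights = [[1, 1], [1, 0]]          # bottom-right pixel is not scored

binarized = [[0, 0], [255, 255]]    # one mismatch among three scored pixels
grayscale = [[51, 255], [255, 0]]   # 51/255 = 0.2 difference on one pixel

bwl1 = weighted_l1(gt, binarized, weights)   # 1/3
gwl1 = weighted_l1(gt, grayscale, weights)   # 0.2/3
print(bwl1, gwl1)
```

Note that the unscored pixel in `grayscale` differs completely from the ground truth yet does not affect the result.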

## Command-line Options

The command-line options for the mask scorer can be categorized as follows:

### Task Type Options

-t --task [manipulation, splice]

  * Specify the task type for evaluation (default = manipulation)

### Input Options

All CSV files passed to the Mask Scorer must contain headers and must have their fields separated by pipe characters ('|'). Fields and values in the CSV should <i>not</i> be enclosed in quotes ( ' or " ) if possible. For example, the entries 'foo', an empty field, and 'bar', in that order, should look like this in the CSV: foo||bar. Additional specifications for the index and system output files can be found in the ValidatorNotebook.html file under the Validator directory.
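
To make the format concrete, here is a small sketch that parses a pipe-separated, unquoted file with Python's standard `csv` module. The helper name `read_pipe_csv` and the in-memory sample are illustrative only; the header shown is the manipulation-task system output header described under --inSys.

```python
import csv
import io


def read_pipe_csv(handle):
    """Read a headered, pipe-separated file into a list of dicts."""
    # QUOTE_NONE: quote characters get no special treatment, matching the
    # recommendation not to enclose fields in quotes.
    reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    return list(reader)


# Illustrative sample: a header row followed by the 'foo||bar' row.
sample = io.StringIO(
    "ProbeFileID|ConfidenceScore|ProbeOutputMaskFileName\n"
    "foo||bar\n"
)
rows = read_pipe_csv(sample)
print(rows[0]["ProbeFileID"])                # foo
print(repr(rows[0]["ConfidenceScore"]))      # '' (the empty field)
print(rows[0]["ProbeOutputMaskFileName"])    # bar
```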

--refDir

  * Specify the reference and index data path (e.g. "/NC2016_Test0601") (default = .)


-r --inRef

  * Specify the reference CSV file within refDir that contains the ground-truth information and metadata about each image. Key fields are TaskID, ProbeFileID, ProbeFileName, and ProbeMaskFileName, plus DonorFileID, DonorFileName, and DonorMaskFileName when scoring on the 'splice' task. The file IDs for the Probe and Donor are often the same as the file names, minus the extension. Additional fields, especially metadata pertaining to the ground-truth manipulation, may be included.

-x --inIndex

  * Define the index CSV file within refDir. The index file contains the TaskID, ProbeFileID, ProbeFileName, ProbeWidth, and ProbeHeight fields, and if scoring on the splice task, the DonorFileID, DonorFileName, DonorWidth, and DonorHeight fields as well. No additional fields are permitted for the index file.

--sysDir

  * Specify the system output data path, for example "mysysoutput/" (default = .) 


-s --inSys

  * Specify the CSV file of the system performance results formatted according to NC2016 specification. The file must contain the ProbeFileID, ConfidenceScore, and ProbeOutputMaskFileName fields, in that order, and if scoring on the splice task, the ProbeFileID, DonorFileID, ConfidenceScore, ProbeOutputMaskFileName, and DonorOutputMaskFileName fields, in that order. The ProbeOutputMaskFileNames and DonorOutputMaskFileNames (where relevant) should be directory strings relative to the location of the system performance CSV.

--rbin

  * Binarize the reference mask to black and white with a numeric threshold in the interval [0,255]. Choose -1 to not binarize and leave the mask as is. (default = -1)

--sbin

  * Binarize the system output mask to black and white with a numeric threshold in the interval [0,255]. Pick -1 to binarize by the threshold that gives the maximum absolute MCC. (default = -1)
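
The behavior of `--sbin -1` can be sketched as a search over candidate thresholds that keeps the one with the largest absolute MCC. This is an illustration only: the function names are hypothetical, the ground truth is given as booleans (True = manipulated), pixels at or below the threshold are assumed to count as manipulated, and only intensity values present in the mask are tried.

```python
import math


def mcc_at_threshold(gt_manipulated, sys_gray, threshold):
    """MCC after binarizing the grayscale system mask at `threshold`."""
    tp = tn = fp = fn = 0
    for gt_row, sys_row in zip(gt_manipulated, sys_gray):
        for is_manipulated, intensity in zip(gt_row, sys_row):
            # Assumption: dark pixels (<= threshold) mean "manipulated".
            predicted = intensity <= threshold
            if is_manipulated and predicted:
                tp += 1
            elif is_manipulated:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(gt_manipulated, sys_gray):
    """Return the candidate threshold with the maximum absolute MCC."""
    candidates = sorted({value for row in sys_gray for value in row})
    return max(
        candidates,
        key=lambda t: abs(mcc_at_threshold(gt_manipulated, sys_gray, t)),
    )


gt_manipulated = [[True, True], [False, False]]
sys_gray = [[10, 40], [200, 250]]
print(best_threshold(gt_manipulated, sys_gray))  # 40
```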

### Output Options

-oR --outRoot

  * Specify the report output path and the file name prefix for saving the plot and table (e.g., test/sys_xxx). For example, if you specify "--outRoot test/NIST_001", you will find the aggregate score report "NIST_001.csv" and the per-image report "NIST_001-perimage.csv" in the "test" folder.


### Scoring Options

--eks

  * Erosion kernel size. (number must be odd; default = 15)
  
--dks

  * Dilation kernel size. (number must be odd; default = 9)
  
-k kernel

  * The shape of the kernel to be used, for both erosion and dilation. Choose from 'box','disc','diamond','gaussian', or 'line'. The default kernel is 'box'.
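
How the erosion and dilation kernel sizes shape the scored region can be illustrated with a pure-Python sketch for the 'box' kernel only. The scorer's own implementation may differ; the function names here are hypothetical, the mask is a nested list of booleans (True = manipulated), and neighborhoods are simply clipped at the image border.

```python
def _window(mask, row, col, size):
    """Values of mask inside the size x size box centred on (row, col)."""
    half = size // 2
    n_rows, n_cols = len(mask), len(mask[0])
    return [mask[i][j]
            for i in range(max(0, row - half), min(n_rows, row + half + 1))
            for j in range(max(0, col - half), min(n_cols, col + half + 1))]


def dilate(mask, size):
    """A pixel is set if any pixel in its box neighbourhood is set."""
    return [[any(_window(mask, r, c, size)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask, size):
    """A pixel stays set only if its whole box neighbourhood is set."""
    return [[all(_window(mask, r, c, size)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def boundary_weights(manipulated, eks=15, dks=9):
    """Weights are 0 in the band between erosion and dilation, else 1."""
    if eks % 2 == 0 or dks % 2 == 0:
        raise ValueError("kernel sizes must be odd")
    eroded = erode(manipulated, eks)
    dilated = dilate(manipulated, dks)
    return [[0 if (d and not e) else 1 for d, e in zip(d_row, e_row)]
            for d_row, e_row in zip(dilated, eroded)]


# 7x7 image with a 3x3 manipulated square in the centre.
mask = [[2 <= r <= 4 and 2 <= c <= 4 for c in range(7)] for r in range(7)]
weights = boundary_weights(mask, eks=3, dks=3)
for row in weights:
    print(row)
```

In this example only the centre pixel of the square and the outer frame of the image remain scored; the ring around the region boundary is excluded.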

### Performance Evaluation by Query

This option allows the user to evaluate their algorithm performance on either subsets or partitions of the data based on the specified queries and query options. The reference and index CSV files contain a list of factors (e.g., ProbePostProcessed|DonorPostProcessed|ManipulationQuality|IsManipulationTypeRemoval|...). Selecting none of the following factors will output a single report table (CSV) over the entire computed dataset.

-q query
 * Evaluate algorithm performance on a partitioned dataset using multiple factor queries, one at a time (e.g. "Collection==['NC2017'] & Purpose==['add','remove']" will average over the rows that fit this criterion for one queried average, but "Collection==['NC2017']" "Purpose==['add','remove']" will average over the first and then the second independently for two queried averages). The option generates N report tables (CSV), one for each query.
   * Syntax: -q "query1" "query2" "query3" ...
   * The syntax is the same as Pandas' query syntax. For the detailed query rules, see http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.
   
   Examples:
   
   % -q "Collection==['Nimble-SCI']" => 1 query
   
   % -q "Collection==['Nimble-SCI'] and PostProcessing==['rescale']" => 1 query
   
   % -q "Collection==['Nimble-SCI','Nimble-WEB']" "PostProcessing==['rescale']" "200<ProbeWidth<=3000" => 3 queries

-qp queryPartition
 * Uses one factor query to evaluate algorithm performance on a partitioned dataset through its individual sub-queries (e.g. "Collection==['NC2017'] & Purpose==['add','remove']" will average over "Collection==['NC2017'] & Purpose==['add']" and "Collection==['NC2017'] & Purpose==['remove']" for a total of two queried averages). This option generates a single report table (CSV) that contains M partition results, one for each sub-query.
   * Syntax: -qp "query"
   
   Examples:
   
   % -qp "Collection==['Nimble-SCI']" => 1 partition
   
   % -qp "Collection==['Nimble-SCI','Nimble-WEB'] & PostProcessing==['rescale']" => 2 partitions
   
   % -qp "Collection==['Nimble-SCI','Nimble-WEB'] & PostProcessing==['rescale','noise']" => 4 partitions
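
The partition counts in the examples above follow from taking the Cartesian product of the value lists. The sketch below shows that expansion for factors already given as a dictionary; it does not parse a query string, and the function name `partition_queries` is hypothetical.

```python
from itertools import product


def partition_queries(factors):
    """Expand {factor: [values]} into one sub-query per value combination."""
    names = list(factors)
    queries = []
    for combo in product(*(factors[name] for name in names)):
        terms = [f"{name}==['{value}']" for name, value in zip(names, combo)]
        queries.append(" & ".join(terms))
    return queries


sub_queries = partition_queries({
    "Collection": ["Nimble-SCI", "Nimble-WEB"],
    "PostProcessing": ["rescale", "noise"],
})
for query in sub_queries:
    print(query)
# Collection==['Nimble-SCI'] & PostProcessing==['rescale']
# Collection==['Nimble-SCI'] & PostProcessing==['noise']
# Collection==['Nimble-WEB'] & PostProcessing==['rescale']
# Collection==['Nimble-WEB'] & PostProcessing==['noise']
```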
   
-qm queryManipulation
 * Filters the dataset before scoring takes place for some number of queries. It is functionally similar to the -q query option. The option generates M report tables (CSV), one for each query.
   * Syntax: -qm "query1" "query2" "query3" ...
   * Like the -q option, the syntax is the same as Pandas' query syntax.
   
   Examples:
   
   % -qm "Purpose==['remove']" => 1 query
   
   % -qm "Operation==['PasteSplice']" "Operation==['FillContentAwareFill']" => 2 queries
   
   % -qm "Purpose==['remove']" "Purpose==['add']" "Purpose==['splice']" => 3 queries

### Report Options

-v verbose

  * Control print output. Select 1 to print all non-error related output and 0 to suppress all print output (bar argument-parsing errors).
  
--precision

  * The number of digits to which computed scores are rounded (e.g. a score of 0.3333333333333... will round to 0.33333 for a precision of 5; default = 16).
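
As a quick illustration of the rounding described above, the following sketch applies Python's built-in `round` to a dictionary of scores. The helper name `round_scores` and the score dictionary are hypothetical.

```python
def round_scores(scores, precision=16):
    """Round every score to `precision` digits after the decimal point."""
    return {name: round(value, precision) for name, value in scores.items()}


print(round_scores({"NMM": 1 / 3}, precision=5))  # {'NMM': 0.33333}
```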

-html

  * Output the report to HTML files. Set the flag to choose this option.

## Examples

    python MaskScorer.py -t manipulation --refDir ../../data/test_suite/maskScorerTests \
    -r reference/manipulation/NC2017-manipulation-ref.csv -x indexes/NC2017-manipulation-index.csv \
    -s ../../data/test_suite/maskScorerTests/B_NC2017_Manipulation_ImgOnly_c-me2_1/B_NC2017_Manipulation_ImgOnly_c-me2_1.csv \
    -oR outputs/maniptest -html -q "ConfidenceScore < 0.5" "ManMade==['no']"

Output:

    Current query: ConfidenceScore < 0.5
    Current query: ManMade==['no']


Running this code in the tools/MaskScorer directory will generate an aggregate report of the computed mask scores titled B_NC2017_Manipulation_ImgOnly_c-me2_1-mask_scores.csv and a per-image score report titled B_NC2017_Manipulation_ImgOnly_c-me2_1-mask_scores_perimage.csv for the manipulation task.

The -html flag is also set, allowing the code to generate an HTML per-image <a href="outputs/maniptest/index.html">index file</a> with the scores and metadata containing links to individual detailed reports of each image.

The user may also select which manipulation regions to score, depending on the manipulations listed under the "Purpose" column in the journalMask file. Other regions are dilated by a separate factor and counted as selective no-score zones in addition to the boundary no-score zones applied to the regions of interest. Multiple pre-filters can be applied independently to the data, resulting in the output of multiple output indices corresponding to the number of queries passed to -qm.

    python MaskScorer.py -t manipulation --refDir ../../data/test_suite/maskScorerTests \
    -r reference/manipulation/NC2017-manipulation-ref.csv -x indexes/NC2017-manipulation-index.csv \
    -s ../../data/test_suite/maskScorerTests/B_NC2017_Manipulation_ImgOnly_c-me2_1/B_NC2017_Manipulation_ImgOnly_c-me2_1.csv \
    -oR outputs/maniptargets -html -qm "Purpose==['remove']" "Purpose==['clone','add']"

The sample HTML index files for the "Purpose==['remove']" and "Purpose==['clone','add']" queries can be found <a href="outputs/maniptargets/index_0/index.html">here</a> and <a href="outputs/maniptargets/index_1/index.html">here</a> respectively.

## Disclaimer

This software was developed at the National Institute of Standards
and Technology (NIST) by employees of the Federal Government in the
course of their official duties. Pursuant to Title 17 Section 105
of the United States Code, this software is not subject to copyright
protection and is in the public domain. NIST assumes no responsibility
whatsoever for use by other parties of its source code or open source
server, and makes no guarantees, expressed or implied, about its quality,
reliability, or any other characteristic.