# Mask Scorer

## Description

This script calculates performance scores that measure how accurately a system output mask matches a reference mask. It generates two CSV report tables: one containing the scores for each mask and another containing the averages of the scores in the first. When the -html option is set, the script also generates a detailed HTML index file for the mask region performance results.

The metrics below use the following terminology:
 * $GT$ refers to the ground truth mask
 * $sys$ refers to the system output mask
 * $TP$ refers to True Positives computed between the ground truth mask and the system output
 * $TN$ refers to True Negatives computed between the ground truth mask and the system output
 * $FN$ refers to False Negatives computed between the ground truth mask and the system output
 * $FP$ refers to False Positives computed between the ground truth mask and the system output
 * $weights$ is a matrix of 1's and 0's that marks which pixels are scored (1) and which are not (0); it is generated from the difference between the erosion and the dilation of the manipulated area of $GT$.
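
As a rough illustration of how these counts relate to the masks, the sketch below tallies $TP$, $TN$, $FP$, and $FN$ over the pixels whose weight is 1. It is not the scorer's implementation: the function name `confusion_counts` is hypothetical, and the masks are assumed to be already binarized nested lists in which 1 marks a manipulated pixel.

```python
def confusion_counts(gt, sys_mask, weights):
    """Tally TP, TN, FP and FN over the pixels whose weight is 1.

    Assumption: gt, sys_mask and weights are equally sized nested lists,
    and 1 marks a manipulated pixel in both masks.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for gt_row, sys_row, w_row in zip(gt, sys_mask, weights):
        for g, s, w in zip(gt_row, sys_row, w_row):
            if w == 0:
                continue  # pixel lies in a no-score zone
            if g == 1 and s == 1:
                counts["TP"] += 1
            elif g == 0 and s == 0:
                counts["TN"] += 1
            elif g == 0 and s == 1:
                counts["FP"] += 1
            else:
                counts["FN"] += 1
    return counts


# Toy 2x3 masks; the top-right pixel is excluded from scoring.
gt = [[1, 1, 0],
      [0, 0, 0]]
sys_mask = [[1, 0, 0],
            [0, 1, 0]]
weights = [[1, 1, 0],
           [1, 1, 1]]
print(confusion_counts(gt, sys_mask, weights))
```

Counting only the weighted pixels keeps the four totals consistent with the denominators used in the metrics.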

The following metrics are used to score each mask:

### Nimble Mask Metric (NMM)
\begin{equation*}
NMM(GT,sys,weights,c)=\max{\left(\frac{TP - FN - FP}{\Sigma_{px\in GT}weights(px)},c\right)}
\end{equation*}

$\Sigma_{px \in GT}$ refers to the sum over the pixels in the ground truth that are marked black. $c$ is the minimum cutoff value below which the score carries no further meaning, so any lower score is reported as $c$; by default, $c=-1$.
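
The formula can be sketched in a few lines of Python. The function name `nmm` and its parameters are hypothetical. The example assumes the denominator equals $TP + FN$ (the scored ground-truth pixels), which is consistent with the sample reports at the bottom of this page but is not confirmed as the scorer's own computation.

```python
def nmm(tp, fn, fp, gt_weight_sum, c=-1.0):
    """NMM = max((TP - FN - FP) / sum of weights over GT pixels, c).

    Assumption: gt_weight_sum is nonzero; an empty ground-truth region
    is not handled in this sketch.
    """
    return max((tp - fn - fp) / float(gt_weight_sum), c)


# Counts from the first sample report; the raw ratio falls below the
# cutoff, so the score is clamped to c.
print(nmm(14055, 124, 168868, 14055 + 124))  # -1.0
```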

### Matthews Correlation Coefficient (MCC)
\begin{equation*}
MCC(GT,sys) = \frac{TP*TN - FP*FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
\end{equation*}

An MCC of 1 denotes perfect correlation, an MCC of 0 denotes no correlation at all, and an MCC of -1 denotes perfect anti-correlation.
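
A minimal Python sketch of the formula follows. The function name `mcc` is hypothetical, and returning 0 when the denominator is zero is an assumption made for this example, not documented scorer behavior.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from the four confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumption: treat the undefined case as no correlation
    return (tp * tn - fp * fn) / denom


# Counts from the first sample report at the bottom of this page.
print(round(mcc(14055, 5819054, 168868, 124), 3))
```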

### Weighted L1 Loss (WL1)
\begin{equation*}
WL1(GT,sys,weights)=\frac{(FP+FN)_{weights > 0}}{\Sigma weights(px)}
\end{equation*}

A Weighted L1 of 0 denotes a perfect match on every scored pixel (differences at pixels whose weight is 0 are ignored); 1 denotes perfect mismatch. $(FP+FN)_{weights > 0}$ refers to the total number of $FP$ and $FN$ pixels where weights are greater than 0.
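
The formula reduces to a single division once the counts are restricted to scored pixels. In the sketch below the function name `wl1` is hypothetical. Taking the weight sum as the total of scored pixels (the N column of the sample report) is an assumption that matches the sample numbers.

```python
def wl1(fp, fn, weight_sum):
    """Weighted L1 loss: scored FP and FN pixels over the sum of weights.

    Assumption: fp and fn are counted only where weights > 0, and
    weight_sum is nonzero.
    """
    return (fp + fn) / float(weight_sum)


# Counts from the first sample report at the bottom of this page.
print(round(wl1(168868, 124, 6002101), 3))
```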

## Command-line Options

Example:
```
python2 MaskScorer.py -t removal --refDir ../../data/test_suite/maskScorerTests/ -r reference/removal/NC2016-removal-ref.csv -x index/NC2016-removal-index.csv -s ../../data/test_suite/maskScorerTests/B_NC2016_Removal_ImgOnly_c-me2_2/B_NC2016_Removal_ImgOnly_c-me2_2.csv -oR maskoutputs/sample -html
```
Running this command produces, under the maskoutputs directory, an aggregate report of the computed mask scores titled sample.csv and a per-image score report titled sample-perimage.csv for the removal task. Because the -html flag is set, the scorer also generates an HTML per-image index file containing the scores and metadata, with links to a detailed report for each image (described in the Sample HTML Output section at the bottom of the page).

The command-line options for the mask scorer can be categorized as follows:

### Task Type Options

-t --task [manipulation, clone, removal, splice, provenance]

  * Specify the task type for evaluation (default = manipulation)

### Input Options

All CSV files passed to the Mask Scorer must contain headers and must have their fields separated by pipe characters ('|'). Fields and values in the CSV should <i>not</i> be enclosed in quotes ( ' or " ) if possible (e.g. the entries 'foo', an empty field, and 'bar', in that order, should look like this in the CSV: foo||bar). Additional specifications for the index and system output files can be found in the ValidatorNotebook.html file under the Validator directory.
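
To check that a file follows this layout, it can be read with Python's standard `csv` module. The snippet below is a sketch in Python 3; the sample row is hypothetical and reuses the foo||bar example under the system output headers.

```python
import csv
import io

# Hypothetical file contents: a header row, then 'foo', an empty field, 'bar'.
sample = "ProbeFileID|ConfidenceScore|ProbeOutputMaskFileName\nfoo||bar\n"

# QUOTE_NONE treats quote characters as ordinary data, matching the
# requirement that fields are not wrapped in quotes.
reader = csv.DictReader(io.StringIO(sample), delimiter="|", quoting=csv.QUOTE_NONE)
rows = list(reader)

print(rows[0]["ProbeFileID"], repr(rows[0]["ConfidenceScore"]))
```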

--refDir

  * Specify the reference and index data path (e.g. "/NC2016_Test0601") (default = .)


-r --inRef

  * Specify the reference CSV file within refDir that contains the ground-truth information and metadata about each image. Key fields are TaskID, ProbeFileID, ProbeFileName, and ProbeMaskFileName, and if scoring on the 'splice' task, DonorFileID, DonorFileName, and DonorMaskFileName as well. Often the file IDs for the Probe and Donor will be the same as the file names, minus the extension. Additional fields, especially metadata pertaining to the ground-truth manipulation, may be included.

-x --inIndex

  * Define the index CSV file within refDir. The index file contains the TaskID, ProbeFileID, ProbeFileName, ProbeWidth, and ProbeHeight fields, and if scoring on the splice task, the DonorFileID, DonorFileName, DonorWidth, and DonorHeight fields as well. No additional fields are permitted for the index file.

--sysDir

  * Specify the system output data path, for example "mysysoutput/" (default = .) 


-s --inSys

  * Specify the CSV file of the system performance results, formatted according to the NC2016 specification. The file must contain the ProbeFileID, ConfidenceScore, and ProbeOutputMaskFileName fields, in that order. If scoring on the splice task, it must instead contain the ProbeFileID, DonorFileID, ConfidenceScore, ProbeOutputMaskFileName, and DonorOutputMaskFileName fields, in that order. The ProbeOutputMaskFileName and DonorOutputMaskFileName values (where relevant) should be paths relative to the location of the system performance CSV.

--rbin

  * Binarize the reference mask to black and white with a numeric threshold in the interval [0,255]. Choose -1 to not binarize and leave the mask as is. (default = -1)

--sbin

  * Binarize the system output mask to black and white with a numeric threshold in the interval [0,255]. Pick -1 to binarize by the threshold that gives the maximum absolute MCC. (default = -1)
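
Fixed-threshold binarization can be sketched as follows. This is an illustration only: the function name `binarize` is hypothetical, and mapping pixels at or below the threshold to black (0) is an assumption about the scorer's convention, not something stated above.

```python
def binarize(mask, threshold):
    """Map a grayscale mask (values 0-255) to black (0) and white (255).

    Assumption: pixels at or below the threshold become black, i.e. are
    treated as manipulated; the scorer's actual rule may differ.
    """
    return [[0 if px <= threshold else 255 for px in row] for row in mask]


gray = [[0, 100, 200],
        [255, 128, 127]]
print(binarize(gray, 127))
```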

-tmt --targetManiType

  * An array of manipulation tasks to be scored, separated by commas, e.g. 'Removal,Splice'. The default 'all' means that the system output will be scored against every manipulated region of the reference masks. (default: 'all')

### Output Options

-oR --outRoot

  * Specify the report output path and the file name prefix for saving the plot and table (e.g., test/sys_xxx). For example, if you specify "--outRoot test/NIST_001", you will find the aggregate score report "NIST_001.csv" and the per-image report "NIST_001-perimage.csv" in the "test" folder.
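
The naming rule in this example can be written out as a small helper. The function name `report_paths` is hypothetical, and the sketch covers only the two CSV reports named above.

```python
def report_paths(out_root):
    """Return the (aggregate, per-image) CSV paths for an --outRoot value."""
    return out_root + ".csv", out_root + "-perimage.csv"


aggregate, per_image = report_paths("test/NIST_001")
print(aggregate)
print(per_image)
```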


### Scoring Options

--eks

  * Erosion kernel size. (number must be odd; default = 15)
  
--dks

  * Dilation kernel size. (number must be odd; default = 9)
  
-k kernel

  * The shape of the kernel to be used, for both erosion and dilation. Choose from 'box','disc','diamond','gaussian', or 'line'. The default kernel is 'box'.

### Performance Evaluation by Factors

These options allow the user to evaluate algorithm performance on subsets or partitions of the data based on the specified factors. The reference and index CSV files contain a list of factors (e.g., ProbePostProcessed|DonorPostProcessed|ManipulationQuality|IsManipulationTypeRemoval|...). If none of the following options is selected, the scorer outputs a single report table (CSV) over the entire computed dataset.

-q query
 * Evaluate algorithm performance on a partitioned dataset using multiple factor queries. The option generates N report tables (CSV), one for each query.
   * Syntax: -q "query1" "query2" "query3" ...

-qp queryPartition
 * Evaluate algorithm performance on a partitioned dataset using one factor query. This option generates a single report table (CSV) that contains M partition results, one result for each partition.
   * Syntax: -qp "query"
   
-qm queryManipulation
 * Identical to -q for the mask scorer in behavior and result.
   * Syntax: -qm "query1" "query2" "query3" ...

### Report Options

-v verbose

  * Control print output. Select 1 to print all non-error-related output and 0 to suppress all print output (except argument-parsing errors).
  
--precision

  * The number of digits to which computed scores are rounded (e.g. a score of 0.3333333333333... will round to 0.33333 for a precision of 5). (default = 16)

-html

  * Output the report to HTML files. Set the flag to choose this option.

## Sample HTML Output

The following is a sample of what you should see in the HTML report for each image. You can access each image's report from the corresponding link in the index.html file that is produced in your chosen output directory. A sample index file follows:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: center;">
      <th></th>
      <th>TaskID</th>
      <th>ProbeFileID</th>
      <th>ProbeFileName</th>
      <th>ProbeMaskFileName</th>
      <th>IsTarget</th>
      <th>OutputProbeMaskFileName</th>
      <th>ConfidenceScore</th>
      <th>NMM</th>
      <th>MCC</th>
      <th>WL1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>manipulation</td>
      <td>morelight</td>
      <td><font style="color:#0000FF"><u>morelight.jpg</u></font></td>
      <td>reference/manipulation/mask/light-mask.png</td>
      <td>Y</td>
      <td>mask/sys-light.png</td>
      <td>1.0</td>
      <td>-1.0</td>
      <td>0.272</td>
      <td>0.028</td>
    </tr>
  </tbody>
</table>
<h3>Average Scores</h3>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: center;">
      <th></th>
      <th>TaskID</th>
      <th>TargetManipulations</th>
      <th>NMM</th>
      <th>MCC</th>
      <th>WL1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>manipulation</td>
      <td>all</td>
      <td>-1.0</td>
      <td>0.272</td>
      <td>0.028</td>
    </tr>
  </tbody>
</table>

The columns in this table have been truncated for the sake of display. Multiple rows may be displayed in the Average Scores table depending on the query option and query(ies) being passed.

For images that can be evaluated, the report for each image is linked from its ProbeFileName entry (the link in the above example is styled for display only). In the splice task, the probe mask and the donor mask for each ProbeFileID will have one report each.

Following is an example of basic output being evaluated by the mask scorer:

<h2>Manipulated Image: morelight.jpg </h2><br/>
<img src="notebookImgs/morelight.jpg" alt="manipulated image" style="width:640px;">
<br/>
<h3>Composite with Color Mask: sys-light_composite.jpg </h3>
<img src="notebookImgs/sys-light_composite.jpg" alt="manip image plus mask" style="width:640px;">
<br/>
<table border="1">
  <tbody>
    <tr>
      <td><img src="notebookImgs/light-mask.png" alt="reference mask" style="width:640px;"><br/><b>Colorized Reference Mask</b></td>
      <td><img src="notebookImgs/sys-light.png" alt="system output mask" style="width:640px;"><br/><b>System Output Mask</b></td>
    </tr>
    <tr>
      <td><img src="notebookImgs/light-mask-bin.png" alt="binarized reference mask" style="width:640px;"><br/><b>Binarized Reference Mask (Black = Manipulated, Yellow = Boundary No-Score Zone, Pink = Selective No-Score Zone)</b></td>
      <td><img src="notebookImgs/sys-light-bin.png" alt="binarized system output mask" style="width:640px;"><br/><b>Binarized System Output Mask (Black = Manipulated)</b></td>
    </tr>
    <tr>
      <td><img src="notebookImgs/sys-light-weights.png" alt="no-score zone" style="width:640px;"><br/><b>No-Score Zone (Yellow = Boundary No-Score Zone, Pink = Selective No-Score Zone)</b></td>
      <td><img src="notebookImgs/sys-light_colored.jpg" alt="color mask" style="width:640px;"><br/><b>Evaluation Result Visualization</b></td>
    </tr>
    <tr>
    <td><b>NIMBLE Mask Metric (NMM)</b>: -1.0 <br/>
      <b>Matthews Correlation Coefficient (MCC)</b>: 0.272 <br/>
      <b>Weighted L1 Loss (WL1)</b>: 0.028 <br/></td>
      <td><b>Total Pixels</b>: 6002101 <br/>
<table style = "border:1;background-color:#C8C8C8">
  <tbody>
    <tr>
      <th>
      Confusion Measures
      </th>
      <th>
      Pixels
      </th>      
      <th>
      Proportion
      </th>      
    </tr>
    <tr>
      <td>
<b>True Positives (<font style="color:#00CF00">TP: green</font>)</b>:
      </td>
      <td>
      14055
      </td>
      <td>
      0.002
      </td>
    </tr>
    <tr>
      <td>
<b>False Positives (<font style="color:#FF0000">FP: red</font>)</b>:
      </td>
      <td>
      168868
      </td>
      <td>
      0.028
      </td>
    </tr>
    <tr>
      <td>
<b>True Negatives (<font style="color:#FFFFFF">TN: white</font>)</b>:
      </td>
      <td>
      5819054
      </td>
      <td>
      0.97
      </td>
    </tr>
    <tr>
      <td>
<b>False Negatives (<font style="color:#3333FF">FN: blue</font>)</b>:
      </td>
      <td>
      124
      </td>
      <td>
      0.0
      </td>
    </tr>
    <tr>
      <td>
<b>Boundary No-Score Zone (<font style="color:#FFFF00">BNS: yellow</font>)</b>:
      </td>
      <td>
      13899
      </td>
      <td>
      0.002
      </td>
    </tr>
    <tr>
      <td>
<b>Selective No-Score Zone (<font style="color:#FFB6C1">SNS: pink</font>)</b>:
      </td>
      <td>
      0
      </td>
      <td>
      0.0
      </td>
    </tr>
  </tbody>
</table>
</td>
    </tr>
  </tbody>
</table>
<br/>
<h4>Target Manipulations: all</h4><br/>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Purpose</th>
      <th>Color</th>
      <th>Evaluated</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>clone</td>
      <td bgcolor="#000000"></td>
      <td>Y</td>
    </tr>
  </tbody>
</table>
<br/>
<b>Measures for Each Threshold</b><br/>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Threshold</th>
      <th>NMM</th>
      <th>MCC</th>
      <th>WL1</th>
      <th>TP</th>
      <th>TN</th>
      <th>FP</th>
      <th>FN</th>
      <th>N</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0.0</td>
      <td>-1.0</td>
      <td>0.271986</td>
      <td>0.028155</td>
      <td>14055</td>
      <td>5819054</td>
      <td>168868</td>
      <td>124</td>
      <td>6002101</td>
    </tr>
  </tbody>
</table>

---
The page below is a sample page of a manipulation with selected regions (based on manipulation type) being scored. Other regions are dilated by a separate factor and counted as selective no-score zones (pink) in addition to the boundary no-score zones applied to the regions of interest (yellow).

<h2>Manipulated Image: 43e1d6f0a9306a629a51062729549d76.jpg </h2><br/>
<img src="notebookImgs/43e1d6f0a9306a629a51062729549d76.jpg" alt="manipulated image" style="width:640px;">
<br/>
<h3>Composite with Color Mask: leaves-mask_composite.jpg </h3>
<img src="notebookImgs/leaves-mask_composite.jpg" alt="manip image plus mask" style="width:640px;">
<br/>
<table border="1">
  <tbody>
    <tr>
      <td><img src="notebookImgs/77232d3f2f9e866ad7bbc5faccd25119.png" alt="reference mask" style="width:640px;"><br/><b>Colorized Reference Mask</b></td>
      <td><img src="notebookImgs/leaves-mask.png" alt="system output mask" style="width:640px;"><br/><b>System Output Mask</b></td>
    </tr>
    <tr>
      <td><img src="notebookImgs/77232d3f2f9e866ad7bbc5faccd25119-bin.png" alt="binarized reference mask" style="width:640px;"><br/><b>Binarized Reference Mask (Black = Manipulated, Yellow = Boundary No-Score Zone, Pink = Selective No-Score Zone)</b></td>
      <td><img src="notebookImgs/leaves-mask-bin.png" alt="binarized system output mask" style="width:640px;"><br/><b>Binarized System Output Mask (Black = Manipulated)</b></td>
    </tr>
    <tr>
      <td><img src="notebookImgs/leaves-mask-weights.png" alt="no-score zone" style="width:640px;"><br/><b>No-Score Zone (Yellow = Boundary No-Score Zone, Pink = Selective No-Score Zone)</b></td>
      <td><img src="notebookImgs/leaves-mask_colored.jpg" alt="color mask" style="width:640px;"><br/><b>Evaluation Result Visualization</b></td>
    </tr>
    <tr>
      <td><b>NIMBLE Mask Metric (NMM)</b>: 0.292536767785 <br/>
      <b>Matthews Correlation Coefficient (MCC)</b>: 0.722353353812 <br/>
      <b>Weighted L1 Loss (WL1)</b>: 0.0717813808655 <br/></td>
      <td><b>Total Pixels</b>: 9227609 <br/>
<table style = "border:1;background-color:#C8C8C8">
  <tbody>
    <tr>
      <td>
<b>True Positives (<font style="color:#00CF00">TP: green</font>)</b>:
      </td>
      <td>
      1012095
      </td>
      <td>
      0.109681175264
      </td>
    </tr>
    <tr>
      <td>
<b>False Positives (<font style="color:#FF0000">FP: red</font>)</b>:
      </td>
      <td>
      588762
      </td>
      <td>
      0.063804393966
      </td>
    </tr>
    <tr>
      <td>
<b>True Negatives (<font style="color:#FFFFFF">TN: white</font>)</b>:
      </td>
      <td>
      7874451
      </td>
      <td>
      0.853357679113
      </td>
    </tr>
    <tr>
      <td>
<b>False Negatives (<font style="color:#3333FF">FN: blue</font>)</b>:
      </td>
      <td>
      98456
      </td>
      <td>
      0.0106697195341
      </td>
    </tr>
    <tr>
      <td>
<b>Boundary No-Score Zone (<font style="color:#FFFF00">BNS: yellow</font>)</b>:
      </td>
      <td>
      160636
      </td>
      <td>
      0.0174081931733
      </td>
    </tr>
    <tr>
      <td>
<b>Selective No-Score Zone (<font style="color:#FFB6C1">SNS: pink</font>)</b>:
      </td>
      <td>
      348211
      </td>
      <td>
      0.0377357774912
      </td>
    </tr>
  </tbody>
</table>
</td>
    </tr>
  </tbody>
</table>
<br/>
<h4>Target Manipulations: add</h4><br/>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Purpose</th>
      <th>Color</th>
      <th>Evaluated</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>add</td>
      <td bgcolor="#00F7FA"></td>
      <td>Y</td>
    </tr>
    <tr>
      <td>remove</td>
      <td bgcolor="#FF00F0"></td>
      <td>N</td>
    </tr>
    <tr>
      <td>add</td>
      <td bgcolor="#00FF06"></td>
      <td>Y</td>
    </tr>
    <tr>
      <td>add</td>
      <td bgcolor="#F4FD00"></td>
      <td>Y</td>
    </tr>
    <tr>
      <td>clone</td>
      <td bgcolor="#FF0C00"></td>
      <td>N</td>
    </tr>
    <tr>
      <td>heal</td>
      <td bgcolor="#FFFF7F"></td>
      <td>N</td>
    </tr>
  </tbody>
</table>

## Disclaimer

This software was developed at the National Institute of Standards
and Technology (NIST) by employees of the Federal Government in the
course of their official duties. Pursuant to Title 17 Section 105
of the United States Code, this software is not subject to copyright
protection and is in the public domain. NIST assumes no responsibility
whatsoever for use by other parties of its source code or open source
server, and makes no guarantees, expressed or implied, about its quality,
reliability, or any other characteristic.