# Detection Scorer

## Description

<p>This script evaluates the accuracy of image tampering detection algorithms in multimedia forensics. It currently supports four detection tasks: 1) Manipulation, 2) Removal, 3) Splice, and 4) Clone. The Manipulation detection task is to detect whether a probe image has been manipulated. The Removal detection task is to detect whether a region of a probe image has been removed. The Splice detection task is to detect whether a region of another source (donor) image has been spliced into a probe image. The Clone detection task is to detect whether a region within a probe image has been cloned.</p>
<p>The script calculates two performance measures, AUC (Area Under Curve) and EER (Equal Error Rate), from a system's output (e.g., confidence scores) for the tasks described above. The output is a report table (CSV) that includes AUC, EER, and AUC_CI (confidence interval for AUC), and a graphical plot (PDF) showing either the ROC (receiver operating characteristic) or the DET (detection error tradeoff) curve. In addition, the script supports factor-based evaluations of algorithm performance.</p>
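
To make the two measures concrete, here is a minimal pure-Python sketch of how AUC and EER can be derived from confidence scores and ground-truth labels. It is not the scorer's own implementation: the function names, the toy data, and the "closest point" EER approximation are assumptions made for illustration.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists swept over descending score thresholds.

    labels: 1 for a manipulated (target) probe, 0 for a non-manipulated one.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort by descending confidence; tied scores share one threshold step.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
               for k in range(1, len(fpr)))


def equal_error_rate(fpr, tpr):
    """Approximate EER: the ROC point where FPR and miss rate are closest."""
    best = min(range(len(fpr)), key=lambda k: abs(fpr[k] - (1.0 - tpr[k])))
    return (fpr[best] + (1.0 - tpr[best])) / 2.0


# Toy data (hypothetical): confidence scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
auc = auc_trapezoid(fpr, tpr)
eer = equal_error_rate(fpr, tpr)
print("AUC = %.4f, EER = %.4f" % (auc, eer))
```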


## Command-line Options

Example:
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/sample -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/sample -s D_NC2016_Manipulation_ImgOnly_p-me_1/D_NC2016_Manipulation_ImgOnly_p-me_1.csv --outRoot ./testcases/NC16_01
```
This command generates the ROC plot and the report table for evaluating algorithm performance given the reference data and the system output.

The command-line options for detection scorer can be categorized as follows:

### Task Type Options:

-t --task [manipulation, removal, splice, clone]

  * Define the target manipulation task type for evaluation (default = manipulation). This is a value of the "TaskID" column in the index file. 

### Input Options

--refDir

  * Specify the reference and index data path, for example "/NC2016_Test" (default = .)


-r --inRef

  * Specify the reference CSV file (under the refDir folder) that contains the ground-truth information
  * For example, the fields are: TaskID|ProbeFileID|ProbeFileName|ProbeMaskFileName|...             

-x --inIndex

  * Specify the index CSV file
  * For example, the fields are: TaskID|ProbeFileID|ProbeFileName|ProbeWidth|ProbeHeight

--sysDir

  * Specify the system output data path, for example "mysysoutput/" (default = .) 


-s --inSys

  * Specify the CSV file of the system output, formatted according to the NC2016 specification

### Metric Options

--farStop

 * Specify the stop point of FAR for calculating partial AUC. The default (1) provides the full AUC value.
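
As an illustration of what a FAR stop point means, the sketch below integrates a ROC curve only up to a given false alarm rate, interpolating the curve at the cutoff. The function name and the toy ROC points are hypothetical, and whether the scorer normalizes the partial area is not stated here; this sketch leaves it unnormalized.

```python
def partial_auc(fpr, tpr, far_stop=1.0):
    """Trapezoidal area under the ROC curve for FAR in [0, far_stop]."""
    area = 0.0
    for k in range(1, len(fpr)):
        x0, x1 = fpr[k - 1], fpr[k]
        y0, y1 = tpr[k - 1], tpr[k]
        if x0 >= far_stop:
            break
        if x1 > far_stop:
            # Interpolate the curve at the stop point and truncate the segment.
            y1 = y0 + (y1 - y0) * (far_stop - x0) / (x1 - x0)
            x1 = far_stop
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy ROC points (hypothetical), sorted by increasing FAR.
roc_fpr = [0.0, 0.0, 0.5, 1.0]
roc_tpr = [0.0, 0.5, 1.0, 1.0]
full_area = partial_auc(roc_fpr, roc_tpr)            # far_stop = 1 gives the full AUC
partial_area = partial_auc(roc_fpr, roc_tpr, 0.25)   # area for FAR <= 0.25 only
```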
    


### Output Options

--outRoot

  * Specify the report output path and the file name prefix for saving the plot(s) and table(s). For example, if you specify "--outRoot test/NIST_001", you will find the plot "NIST_001_det.png" and the table "NIST_001_report.csv" in the "test" folder (default = .)


--dump

   * Save the DM files (binary format) that contain the FAR, FPR, TPR, threshold, AUC, and EER values. The dump files let you reload these point values for further analysis without recalculating them.


-v --verbose

   * Print progress messages to the command line if this option is specified.

--ci

   * Calculate the lower and upper bounds of the confidence interval for AUC if this option is specified. This option slows down scoring because the interval is estimated by bootstrapping.
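
For readers unfamiliar with bootstrapped confidence intervals, the following is a rough sketch of a percentile bootstrap for AUC. It only illustrates the general technique; the resampling scheme, the number of resamples, and all names here are assumptions, not the scorer's actual procedure.

```python
import random


def auc_pairwise(scores, labels):
    """AUC as the fraction of (target, non-target) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_boot:
        # Resample trials with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            estimates.append(auc_pairwise([scores[i] for i in idx], ys))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (hypothetical).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
point_auc = auc_pairwise(toy_scores, toy_labels)
ci_lower, ci_upper = bootstrap_auc_ci(toy_scores, toy_labels)
```

Each resample recomputes the AUC, which is why enabling the interval costs noticeably more time than a single scoring pass.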

### Plot Options

--plotType [det, roc]

  * Define the plot type (default = roc)
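
The two plot types show the same operating points on different axes. A DET curve conventionally plots the miss rate against the false alarm rate on a normal-deviate (probit) scale. The sketch below shows that transformation with the standard library only; the clipping constant and the function names are assumptions, and the scorer's own plotting code may differ.

```python
from statistics import NormalDist


def probit(p, eps=1e-6):
    """Normal deviate of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


def det_points(fpr, tpr):
    """Map ROC points (FPR, TPR) to DET coordinates (probit FAR, probit miss rate)."""
    return [(probit(f), probit(1.0 - t)) for f, t in zip(fpr, tpr)]


# Toy ROC points (hypothetical).
det = det_points([0.1, 0.5, 0.9], [0.5, 0.9, 0.99])
```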


--display

  * Display a window with the plot(s) if this option is specified (default = False)


### Custom Plot Options

The user can customize the plot (e.g. change the title font size) by adjusting the json files located in the "plotJsonFiles" folder (e.g., plotJsonFiles/plot_options.json).

An example:
```json
{"title": "DET",
 "plot_type": "DET",
 "title_fontsize": 15,
 "xticks_size": "medium",
 "yticks_size": "medium",
 "xlabel": "False Alarm Rate [%]",
 "xlabel_fontsize": 12,
 "ylabel": "Miss Detection Rate [%]",
 "ylabel_fontsize": 12}
 ```
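
If you prefer to adjust these options from a script rather than by hand, a small helper along the following lines could work. This is a sketch only: `update_plot_options` is a hypothetical helper, not part of the scorer, and the demo writes to a temporary file so that the real `plotJsonFiles/plot_options.json` is left untouched.

```python
import json
import os
import tempfile


def update_plot_options(path, **overrides):
    """Load a plot-options JSON file, apply overrides, and write it back."""
    with open(path) as f:
        options = json.load(f)
    options.update(overrides)
    with open(path, "w") as f:
        json.dump(options, f, indent=1)
    return options


# Demo on a temporary file holding a subset of the keys shown above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "plot_options.json")
    with open(demo_path, "w") as f:
        json.dump({"title": "DET", "plot_type": "DET", "title_fontsize": 15}, f)
    updated = update_plot_options(demo_path, title_fontsize=20)
    with open(demo_path) as f:
        reloaded = json.load(f)
```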

### Performance Evaluation by Factors

This option allows the user to evaluate their algorithm performance on either subsets or partitions of the data based on the specified factors. The reference and index CSV files contain a list of factors (e.g., ProbePostProcessed|DonorPostProcessed|ManipulationQuality|IsManipulationTypeRemoval|...).
The following figure illustrates the data query option along with corresponding outputs for factor-based evaluations.


<img src="./notebookImgs/partition_overview1.png" width="700" height="400">

-f --factor

* Evaluate algorithm performance on a partitioned dataset using multiple factor queries. Depending on the number (N) of queries, the option generates N report tables (CSV) and one plot (PDF) that contains N curves.
  + Syntax : -f "query1" "query2" "query3" ... 
   ```
   - The syntax is the same as the Pandas query syntax. For the detailed query rules, see http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.
   Examples:
   % -f "Collection==['Nimble-SCI']" => 1 query
   % -f "Collection==['Nimble-SCI'] and PostProcessing==['rescale']" => 1 query
   % -f "Collection==['Nimble-SCI','Nimble-WEB']" "PostProcessing==['rescale']" "200<ProbeWidth<=3000" => 3 queries
   ```
   ```
  + Output: 
   ```
   - CSV report: NIST_001_f_query0.csv, NIST_001_f_query1.csv, ...
   - PDF plot: NIST_001_roc_all.pdf
   - DM file (using --dump): NIST_001_query0.dm, NIST_001_query1.dm, ...
   ```
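
Because the queries follow Pandas query syntax, you can try them on a DataFrame before passing them to the scorer. The sketch below runs three queries on a small made-up table; the rows are hypothetical, and only the column names (`Collection`, `PostProcessing`, `ProbeWidth`) come from the examples above.

```python
import pandas as pd

# Hypothetical reference rows, for illustration only.
ref = pd.DataFrame({
    "ProbeFileID": ["a", "b", "c", "d"],
    "Collection": ["Nimble-SCI", "Nimble-WEB", "Nimble-SCI", "Other"],
    "PostProcessing": ["rescale", "rescale", "noise", "rescale"],
    "ProbeWidth": [150, 640, 1024, 4000],
})

queries = [
    "Collection==['Nimble-SCI','Nimble-WEB']",
    "PostProcessing==['rescale']",
    "200<ProbeWidth<=3000",
]

# Each query selects its own subset; subsets may overlap.
subsets = [ref.query(q) for q in queries]
sizes = [len(s) for s in subsets]
for q, n in zip(queries, sizes):
    print(q, "->", n, "rows")
```

Each query is evaluated independently, which matches the description of N queries producing N report tables and N curves.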

-fp --factorp

* Evaluate algorithm performance on a partitioned dataset using one factor query. Depending on the number (M) of partitions provided by the cartesian product on query conditions, this option generates a single report table (CSV) that contains M partition results and one plot that contains M curves.
  + Syntax : -fp "query"
   ```
   - The query syntax only allows the three operators "==[]", "<", and "<=".
   Examples: 
   % -fp "Collection==['Nimble-SCI']" => 1 partition
   % -fp "Collection==['Nimble-SCI','Nimble-WEB'] & PostProcessing==['rescale']" => 2 partitions
   % -fp "Collection==['Nimble-SCI','Nimble-WEB'] & PostProcessing==['rescale','noise']" => 4 partitions
   ```
  + Output: 
   ``` 
   - CSV report: NIST_001_fp_query.csv
   - PDF plot: NIST_001_roc_all.pdf
   - DM file (using --dump): NIST_001_query0.dm, NIST_001_query1.dm, ...
   ```
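
The partition count follows from the Cartesian product of the values listed for each factor. The sketch below enumerates those combinations with `itertools.product`. It illustrates the counting rule only: `expand_partitions` is a hypothetical helper, and how the scorer itself parses the `-fp` query is not shown in this document.

```python
from itertools import product


def expand_partitions(conditions):
    """Expand {factor: [values]} into one query string per partition."""
    names = list(conditions)
    return [
        " & ".join(f"{name}==['{value}']" for name, value in zip(names, combo))
        for combo in product(*(conditions[n] for n in names))
    ]


partitions = expand_partitions({
    "Collection": ["Nimble-SCI", "Nimble-WEB"],
    "PostProcessing": ["rescale", "noise"],
})
for i, q in enumerate(partitions):
    print("Partition_%d: %s" % (i, q))
```

Two values for `Collection` and two for `PostProcessing` yield 2 x 2 = 4 partitions, as in the last example above.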

--multiFigs
* Generate a single curve plot per partition
   ``` 
   - Plot output: NIST_001_f_roc_0.pdf, NIST_001_f_roc_1.pdf, ...
   ```

## Command-line Usage

### Default Output

* For rendering the ROC curve and the report table:
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_001 --ci --display
```
<img src="./notebookImgs/NC16_001_roc_all.png" alt="Default ROC curve" width="500" height="400" align="left">

Report table (`NC16_001_all.csv`):

| AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|
| 0.517917 | 1 | 0.516901 | 0.492906 | 0.54328 |


* For rendering DET curve:
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_002 --plotType det --display
```
<img src="./notebookImgs/NC16_002_det_all.png" alt="ROC2" width="500" height="400" align="left">

### -f and -fp option usage example

#### -f option with one query
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_003 -f "Collection==['Nimble-SCI','Nimble-WEB']" --display
```
<img src="./notebookImgs/NC16_003_roc_all.png" alt="ROC3" width="700" height="600" align="left">

Report table (`NC16_003_f_query_0.csv`):

| Query | auc | fpr_stop | eer | auc_ci_lower | auc_ci_upper |
|---|---|---|---|---|---|
| `Collection==['Nimble-SCI','Nimble-WEB']` | 0.517917 | 1 | 0.516901 | 0.492906 | 0.54328 |


#### -f option with two queries
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_004 -f "Collection==['Nimble-SCI'] & 300 <= ProbeWidth"  "Collection==['Nimble-WEB'] & 300 <= ProbeWidth" --ci --display
```
<img src="./notebookImgs/NC16_004_roc_all.png" alt="ROC4" width="700" height="600"  align="left">

Report table for the first query (`NC16_004_f_query_0.csv`):

| Query | auc | fpr_stop | eer | auc_ci_lower | auc_ci_upper |
|---|---|---|---|---|---|
| `Collection==['Nimble-SCI'] & 300 <= ProbeWidth` | 0.462995 | 1 | 0.565625 | 0.411921 | 0.502428 |

Report table for the second query (`NC16_004_f_query_1.csv`):

| Query | auc | fpr_stop | eer | auc_ci_lower | auc_ci_upper |
|---|---|---|---|---|---|
| `Collection==['Nimble-WEB'] & 300 <= ProbeWidth` | 0.556042 | 1 | 0.479198 | 0.523205 | 0.592781 |


#### -fp option with one partition
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_005 -fp "Collection==['Nimble-SCI'] & 300 <= ProbeWidth" --ci --display
```
<img src="./notebookImgs/NC16_005_roc_all.png" alt="ROC4" width="700" height="600" align="left">

Report table (`NC16_005_fp_query.csv`):

| Partition | Collection | ProbeWidth | auc | fpr_stop | eer | auc_ci_lower | auc_ci_upper |
|---|---|---|---|---|---|---|---|
| Partition_0 | 'Nimble-SCI' | 300<=ProbeWidth | 0.462995 | 1 | 0.565625 | 0.411921 | 0.502428 |


#### -fp option with two partitions
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_006 -fp "Collection==['Nimble-SCI','Nimble-WEB'] & 300 <= ProbeWidth" --ci --display
```
<img src="./notebookImgs/NC16_006_roc_all.png" alt="ROC6" width="700" height="600" align="left">

Report table (`NC16_006_fp_query.csv`):

| Partition | Collection | ProbeWidth | auc | fpr_stop | eer | auc_ci_lower | auc_ci_upper |
|---|---|---|---|---|---|---|---|
| Partition_0 | 'Nimble-SCI' | 300<=ProbeWidth | 0.462995 | 1 | 0.565625 | 0.411921 | 0.502428 |
| Partition_1 | 'Nimble-WEB' | 300<=ProbeWidth | 0.556042 | 1 | 0.479198 | 0.523205 | 0.592781 |


#### --multiFigs option
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_007 -fp "Collection==['Nimble-SCI','Nimble-WEB'] & 300 <= ProbeWidth" --multiFigs --ci --display
```
<img src="./notebookImgs/NC16_007_roc_combine.png" alt="ROC7" width="1000" height="700" align="left">

#### --dump option
```
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s dct02.csv --outRoot ./testcases/NC16_008 -fp "Collection==['Nimble-SCI','Nimble-WEB'] & 300 <= ProbeWidth" --dump --display
```
<img src="./notebookImgs/NIST_005_roc_0_1.png" alt="ROC7" width="1000" height="700" align="left">

### Splice task example

* An example:
```
python DetectionScorer.py -t splice --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-splice-ref.csv -x NC2016-splice-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s splice.csv --outRoot ./testcases/NC16_100 --ci --display
```
<img src="./notebookImgs/NC16_100_roc_all.png" alt="ROC8" width="500" height="400" align="left">

Report table (`NC16_100_all.csv`):

| AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|
| 0.872003 | 1 | 0.2148 | 0.853012 | 0.88796 |


## Disclaimer

This software was developed at the National Institute of Standards
and Technology (NIST) by employees of the Federal Government in the
course of their official duties. Pursuant to Title 17 Section 105
of the United States Code, this software is not subject to copyright
protection and is in the public domain. NIST assumes no responsibility
whatsoever for use by other parties of its source code or open source
server, and makes no guarantees, expressed or implied, about its quality,
reliability, or any other characteristic.
