# Detection Scorer

## Description

<p>This script evaluates the accuracy of image tampering detection algorithms in multimedia forensics. It currently supports two detection tasks: 1) Manipulation Detection and 2) Splice Detection. The Manipulation Detection task is to detect whether a probe image has been manipulated. The Splice Detection task is to detect whether a region from another source (donor) image has been spliced into a probe image. For details, please refer to the document "Nimble Challenge 2017 Evaluation Data and Tool".</p>
<p>The script calculates the AUC (Area Under the Curve) and EER (Equal Error Rate) performance measures from a system's output (e.g., confidence scores) for the tasks described above. It produces a report table (CSV) with the AUC, EER, and AUC_CI (confidence interval for AUC) measures, and a plot (PDF) showing either the ROC (receiver operating characteristic) or the DET (detection error tradeoff) curve. In addition, the script lets the user evaluate algorithm performance on subsets or partitions of the data set defined by queries.</p>
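
As a rough illustration of the two headline measures, the following pure-Python sketch derives ROC points, AUC, and EER from confidence scores. It assumes that a higher score means "more likely manipulated" and that targets are labeled 1 and non-targets 0. AUC uses the trapezoidal rule, and EER is taken at the operating point where FAR and the miss rate (1 - TPR) are closest. The function names are hypothetical, and DetectionScorer's own interpolation may differ.

```python
def roc_points(scores, labels):
    """Return (far, tpr) lists, sweeping the threshold from high to low.

    labels: 1 for a target (manipulated) trial, 0 for a non-target trial.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both target and non-target trials")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    far, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores cross the threshold together, giving one ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        far.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return far, tpr


def auc(far, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((far[k] - far[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(far)))


def eer(far, tpr):
    """Approximate EER at the point where FAR and miss rate are closest."""
    k = min(range(len(far)), key=lambda j: abs(far[j] - (1 - tpr[j])))
    return (far[k] + (1 - tpr[k])) / 2


far, tpr = roc_points([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0])
print(round(auc(far, tpr), 4), round(eer(far, tpr), 4))  # 0.8889 0.3333
```

An AUC of 0.5 corresponds to chance-level detection, which is why the NC2016 manipulation baseline below (AUC around 0.52) is close to random.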


## Command-line Options

Syntax:
<code>
python DetectionScorer.py [OPTIONS] -r inRef -x inIndex -s inSys
</code>


The command-line options for detection scorer can be categorized as follows:

### Task Type Options:

-t --task [manipulation, splice]

  * Define the task type for evaluation (default = manipulation). This value corresponds to the "TaskID" column in the index file.

### Input Options:

--refDir

  * Specify the reference and index data path, for example "/NC2016_Test" (default = .)


-r --inRef

  * Specify the reference CSV file (under the refDir folder) that contains the ground-truth and metadata information
  * For example, the fields are: TaskID|ProbeFileID|ProbeFileName|ProbeMaskFileName|...             

-x --inIndex

  * Specify the index CSV file
  * For example, the fields are: TaskID|ProbeFileID|ProbeFileName|ProbeWidth|ProbeHeight

--sysDir

  * Specify the system output data path, for example "mysysoutput/" (default = .) 


-s --inSys

  * Specify the system output CSV file (under the sysDir folder), formatted according to the evaluation specification
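
The three inputs are typically combined per probe before scoring. The sketch below shows one way to do that with pandas. It assumes pipe-delimited files (as the field lists above suggest), a join on TaskID and ProbeFileID, an IsTarget column in the reference file, and a system output column named ConfidenceScore; these column choices and the helper name load_trials are assumptions for illustration, not the specification.

```python
import io

import pandas as pd


def load_trials(ref_file, index_file, sys_file):
    """Join index, reference, and system output into one row per probe."""
    ref = pd.read_csv(ref_file, sep="|")
    index = pd.read_csv(index_file, sep="|")
    sysout = pd.read_csv(sys_file, sep="|")
    # Start from the index so every probe that must be scored is kept.
    trials = index.merge(ref, on=["TaskID", "ProbeFileID"], how="left")
    trials = trials.merge(sysout, on="ProbeFileID", how="left")
    # ConfidenceScore is an assumed column name for the system's score.
    missing = trials["ConfidenceScore"].isna()
    if missing.any():
        raise ValueError(f"{int(missing.sum())} indexed probes have no system score")
    return trials


# Tiny in-memory example files (hypothetical content).
ref = io.StringIO("TaskID|ProbeFileID|IsTarget\nmanipulation|p1|Y\nmanipulation|p2|N\n")
index = io.StringIO("TaskID|ProbeFileID|ProbeFileName|ProbeWidth|ProbeHeight\n"
                    "manipulation|p1|probe/p1.jpg|640|480\n"
                    "manipulation|p2|probe/p2.jpg|800|600\n")
sysout = io.StringIO("ProbeFileID|ConfidenceScore\np1|0.91\np2|0.12\n")
trials = load_trials(ref, index, sysout)
print(trials[["ProbeFileID", "IsTarget", "ConfidenceScore"]])
```

Starting the join from the index file makes a missing system score an explicit error rather than a silently dropped trial.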

### Metric Options:

--farStop

  * Specify the FAR stop point for calculating a partial AUC. The default (1) gives the full AUC.
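
A minimal sketch of a partial AUC, assuming the ROC points are already sorted by increasing FAR and start at (0, 0): the area is accumulated with the trapezoidal rule up to the FAR stop, and the last segment is clipped by linear interpolation. This sketch reports the raw area (at most farStop); whether DetectionScorer normalizes the partial value is not stated here. The function name partial_auc is hypothetical.

```python
def partial_auc(far, tpr, far_stop=1.0):
    """Area under the ROC curve for FAR in [0, far_stop] (not normalized).

    far, tpr: ROC points sorted by increasing FAR, starting at (0, 0).
    """
    area = 0.0
    for k in range(1, len(far)):
        x0, x1 = far[k - 1], far[k]
        y0, y1 = tpr[k - 1], tpr[k]
        if x0 >= far_stop:
            break
        if x1 > far_stop:
            # Clip the segment at far_stop by linear interpolation.
            y1 = y0 + (y1 - y0) * (far_stop - x0) / (x1 - x0)
            x1 = far_stop
        area += (x1 - x0) * (y0 + y1) / 2
    return area


far = [0.0, 0.5, 1.0]
tpr = [0.0, 1.0, 1.0]
print(partial_auc(far, tpr))        # 0.75 (full AUC)
print(partial_auc(far, tpr, 0.25))  # 0.0625
```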
    


--ci

   * Calculate the lower and upper bounds of the confidence interval for AUC. Because the interval is estimated by bootstrapping, this option slows down scoring.

--ciLevel

   * Specify the confidence level (range [0.8, 0.99]) for the lower and upper bounds of the AUC confidence interval. The default is 0.9.
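
Since --ci estimates the interval by bootstrapping, the sketch below shows one common variant: a percentile bootstrap that resamples targets and non-targets separately and computes AUC as the probability that a random target outscores a random non-target (ties count one half). The resampling scheme, the number of replicates, the seed, and the helper names are assumptions, not DetectionScorer's documented behavior.

```python
import random


def auc_mw(target_scores, nontarget_scores):
    """AUC as P(target score > non-target score), counting ties as 1/2."""
    wins = 0.0
    for t in target_scores:
        for n in nontarget_scores:
            wins += 1.0 if t > n else 0.5 if t == n else 0.0
    return wins / (len(target_scores) * len(nontarget_scores))


def bootstrap_auc_ci(target_scores, nontarget_scores, ci_level=0.9,
                     n_boot=1000, seed=0):
    """Percentile bootstrap interval for AUC at the given confidence level."""
    if not 0.8 <= ci_level <= 0.99:
        raise ValueError("ci_level must be in [0.8, 0.99]")
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        # Resample each class with replacement, keeping class sizes fixed.
        t = rng.choices(target_scores, k=len(target_scores))
        n = rng.choices(nontarget_scores, k=len(nontarget_scores))
        aucs.append(auc_mw(t, n))
    aucs.sort()
    alpha = (1.0 - ci_level) / 2
    lower = aucs[int(alpha * (n_boot - 1))]
    upper = aucs[int((1 - alpha) * (n_boot - 1))]
    return lower, upper


targets = [0.9, 0.8, 0.6]
nontargets = [0.7, 0.5, 0.4]
print(auc_mw(targets, nontargets))
print(bootstrap_auc_ci(targets, nontargets, ci_level=0.9))
```

Each replicate recomputes the AUC, so the cost grows linearly with the number of replicates; this is why the option is noticeably slower.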

--dLevel

  * Define the lower and upper exclusion levels (range [0, 0.3]) for the d-prime calculation. The default is 0.
    Note that the d-prime shown in the plot is the maximum d-prime.
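
For reference, d-prime at an operating point is commonly computed as d' = z(TPR) - z(FAR), where z is the inverse of the standard normal CDF. The sketch below takes the maximum over the ROC points, as the note above describes. How --dLevel excludes points is an assumption here: points whose FAR or TPR lies within dLevel of 0 or 1 are skipped, which also avoids the infinite z-scores at exactly 0 or 1. The name max_dprime is hypothetical.

```python
from statistics import NormalDist


def max_dprime(far, tpr, d_level=0.0):
    """Maximum d' = z(TPR) - z(FAR) over ROC points inside the exclusions.

    Assumed exclusion rule: keep a point only if both FAR and TPR lie
    strictly inside (d_level, 1 - d_level). Returns None if none qualify.
    """
    z = NormalDist().inv_cdf
    best = None
    for f, t in zip(far, tpr):
        if d_level < f < 1 - d_level and d_level < t < 1 - d_level:
            d = z(t) - z(f)
            if best is None or d > best:
                best = d
    return best


far = [0.0, 0.1, 0.5, 1.0]
tpr = [0.0, 0.5, 0.75, 1.0]
print(round(max_dprime(far, tpr), 4))       # 1.2816
print(round(max_dprime(far, tpr, 0.2), 4))  # 0.6745
```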

### Output Options:

--outRoot

  * Specify the report output path and the file name prefix for the saved plot(s) and table(s). For example, if you specify "--outRoot test/NIST_001", you will find the plot "NIST_001_det.png" and the table "NIST_001_report.csv" in the "test" folder (default = .)


--dump

   * Save dump files (in binary format) containing the FAR, FPR, TPR, threshold, AUC, and EER values. The dump files let you reload these points for further analysis without recomputing them.
   <pre>
   - Dump file output: NIST_001_query_0.dm, NIST_001_query_1.dm, ...
   </pre>

-v --verbose

   * Print procedure messages to the command line while scoring.

### Plot Options:

--plotType [det, roc]

  * Define the plot type (default = roc)
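
ROC curves plot TPR against FAR on linear axes, while DET curves conventionally plot the miss rate (1 - TPR) against FAR on normal-deviate (probit) axes, which turns the curve into a straight line when both score distributions are normal. Whether DetectionScorer warps its DET axes this way is an assumption; the sketch only shows the coordinate transform, using a hypothetical det_points helper.

```python
from statistics import NormalDist


def det_points(far, tpr):
    """Map ROC points (FAR, TPR) to DET coordinates on normal-deviate axes.

    Returns (z(FAR), z(miss rate)) pairs; points with a rate of exactly
    0 or 1 are dropped because their z-score is infinite.
    """
    z = NormalDist().inv_cdf
    points = []
    for f, t in zip(far, tpr):
        miss = 1.0 - t
        if 0.0 < f < 1.0 and 0.0 < miss < 1.0:
            points.append((z(f), z(miss)))
    return points


print(det_points([0.0, 0.5, 0.1], [0.0, 0.5, 0.9]))
```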


--display

  * Display the plot(s) in a window (default = False)


--multiFigs

  * Generate a separate single-curve plot for each partition
  <pre>
  Plot output: NIST_001_f_roc_0.pdf, NIST_001_f_roc_1.pdf, ...
  </pre>

### Custom Plot Options:

--configPlot

  * Load a JSON file that customizes the plot (e.g., the title font size). Start from the JSON files in the "plotJsonFiles" folder (e.g., plotJsonFiles/plot_options.json) and modify them as needed.

An example:
```json
{"title": "DET",
 "plot_type": "DET",
 "title_fontsize": 15,
 "xticks_size": "medium",
 "yticks_size": "medium",
 "xlabel": "False Alarm Rate [%]",
 "xlabel_fontsize": 12,
 "ylabel": "Miss Detection Rate [%]",
 "ylabel_fontsize": 12}
 ```
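
One way a plotting script might consume such a file is to overlay the user's keys on a set of defaults and pass the values to matplotlib calls such as plt.title(label, fontsize=...). The sketch below only builds those arguments. The defaults (copied from the example above), the overlay behavior, and the helper names are assumptions, not DetectionScorer's documented behavior.

```python
import json

# Hypothetical defaults, copied from the example file above.
DEFAULTS = {
    "title": "DET", "plot_type": "DET", "title_fontsize": 15,
    "xticks_size": "medium", "yticks_size": "medium",
    "xlabel": "False Alarm Rate [%]", "xlabel_fontsize": 12,
    "ylabel": "Miss Detection Rate [%]", "ylabel_fontsize": 12,
}


def load_plot_options(text, defaults=DEFAULTS):
    """Overlay user-supplied JSON options on the defaults."""
    options = dict(defaults)
    options.update(json.loads(text))
    return options


def matplotlib_kwargs(options):
    """Group options into (label, kwargs) pairs for title/xlabel/ylabel
    and kwargs dicts for xticks/yticks, ready for pyplot calls."""
    return {
        "title": (options["title"], {"fontsize": options["title_fontsize"]}),
        "xlabel": (options["xlabel"], {"fontsize": options["xlabel_fontsize"]}),
        "ylabel": (options["ylabel"], {"fontsize": options["ylabel_fontsize"]}),
        "xticks": {"fontsize": options["xticks_size"]},
        "yticks": {"fontsize": options["yticks_size"]},
    }


opts = load_plot_options('{"title": "My DET", "title_fontsize": 18}')
args = matplotlib_kwargs(opts)
print(args["title"])  # ('My DET', {'fontsize': 18})
```

With pyplot imported, the pairs would be applied as, for example, `plt.title(args["title"][0], **args["title"][1])`.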

### OptOut Options:

--optOut

  * Evaluate algorithm performance only on trials whose IsOptOut value is 'N' (i.e., trials the system did not opt out of).
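
The effect of --optOut can be pictured as a row filter applied before AUC and EER are computed. This pandas sketch assumes the IsOptOut column comes from the system output file (the OptOut example below uses a separate *_optout.csv system file); the ConfidenceScore column and the helper name are hypothetical.

```python
import pandas as pd


def keep_scored_trials(trials):
    """Keep only trials the system did not opt out of (IsOptOut == 'N')."""
    return trials[trials["IsOptOut"] == "N"].reset_index(drop=True)


trials = pd.DataFrame({
    "ProbeFileID": ["p1", "p2", "p3"],
    "IsTarget": ["Y", "N", "Y"],
    "ConfidenceScore": [0.9, 0.2, 0.5],  # hypothetical column name
    "IsOptOut": ["N", "N", "Y"],
})
print(keep_scored_trials(trials)["ProbeFileID"].tolist())  # ['p1', 'p2']
```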

### Performance Evaluation by Query Options:

The following options let the user evaluate algorithm performance on subsets or partitions of the data set defined by queries. They use Pandas queries over the metadata in the reference file (e.g., Operation| Color| Purpose| OperationArgument| ...) to subset or partition the scored data and produce scoring reports. Three types of analysis are supported:

* Query (-q --query): this option allows the user to specify multiple queries. Each query filters both target and non-target trials and then processes one scoring run of the system output to generate the requested scoring report.
* Query for partitions (-qp --queryPartition): this option allows the user to specify one query. The query separates the data set into M partitions by filtering both target and non-target trials and processes one or multiple scoring runs of the system output to generate the requested scoring report.
* Query for selective manipulations (-qm --queryManipulation): this option allows the user to specify multiple queries. Each query restricts filtering to target trials only (while using all non-target trials) to generate the requested scoring report.
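
To see why -qm exists, compare the two filters on a toy trial table, assuming that a target-only factor such as Purpose is empty for non-target trials. In that case a plain -q query drops every non-target, whereas the -qm rewrite (the conversion described under -qm below) keeps them. The table and helper names are hypothetical; the query strings follow the Pandas syntax used in this document.

```python
import pandas as pd

# Toy trials: Purpose is assumed to be empty (None) for non-targets.
trials = pd.DataFrame({
    "ProbeFileID": ["t1", "t2", "t3", "n1", "n2"],
    "IsTarget": ["Y", "Y", "Y", "N", "N"],
    "Purpose": ["remove", "add", "remove", None, None],
})


def select_q(trials, query):
    """-q style: the query filters targets and non-targets alike."""
    return trials.query(query)


def select_qm(trials, query):
    """-qm style: the query restricts targets only; all non-targets are kept."""
    return trials.query(f"({query} and IsTarget == ['Y']) or IsTarget == ['N']")


print(select_q(trials, "Purpose==['remove']")["ProbeFileID"].tolist())   # ['t1', 't3']
print(select_qm(trials, "Purpose==['remove']")["ProbeFileID"].tolist())  # ['t1', 't3', 'n1', 'n2']
```

Without non-targets there is no false alarm rate, so a -q query on a target-only factor cannot produce a meaningful ROC curve.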

-q --query

* Evaluate algorithm performance on the targets and non-targets selected by each query. Multiple queries can be given. For N queries, the option generates N report tables (CSV) and one plot (PDF) containing N curves.
  + Syntax : -q "query1" "query2" "query3" ... 
   ```
   - The syntax is the same as Pandas' query syntax. See the detailed query rules at: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.
   Examples:
   % -q "Collection==['Nimble-SCI']" => 1 query
   % -q "Collection==['Nimble-SCI'] and PostProcessing==['rescale']" => 1 query
   % -q "Collection==['Nimble-SCI','Nimble-WEB']" "PostProcessing==['rescale']" "200<ProbeWidth<=3000" => 3 queries
   ```
  + Output: 
   ```
   - CSV report: NIST_001_q_query0.csv, NIST_001_q_query1.csv, ...
   - PDF plot: NIST_001_roc_all.pdf
   - DM file (using --dump): NIST_001_query0.dm, NIST_001_query1.dm, ...
   ```

-qp --queryPartition

* Evaluate algorithm performance on partitions of the data set defined by a query. The partitions are determined automatically from the metadata factors in the query. For M partitions, given by the Cartesian product of the query conditions, this option generates a single report table (CSV) containing the M partition results and one plot containing M curves.
  + Syntax : -qp "query"
   ```
   - The query syntax only allows the three operators "==[]", "<", and "<=".
   Examples: 
   % -qp "Collection==['Nimble-SCI']" => 1 partition
   % -qp "Collection==['Nimble-SCI','Nimble-WEB'] & PostProcessing==['rescale']" => 2 partitions
   % -qp "Collection==['Nimble-SCI','Nimble-WEB'] & PostProcessing==['rescale','noise']" => 4 partitions
   ```
  + Output: 
   ``` 
   - CSV report: NIST_001_qp_query.csv
   - PDF plot: NIST_001_roc_all.pdf
   - DM file (using --dump): NIST_001_query0.dm, NIST_001_query1.dm, ...
   ```
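
The partition count follows from the Cartesian product of the listed values. The sketch below expands factors into one query per partition, assuming the factors and values have already been parsed from the query string (parsing is omitted) and that range conditions such as 300 <= ProbeWidth apply unchanged to every partition, as in the -qp examples further below. The helper partition_queries is hypothetical.

```python
from itertools import product


def partition_queries(factors, shared=()):
    """Expand {factor: [values]} into one query string per partition.

    `shared` holds conditions (e.g. range limits) added to every partition.
    """
    names = list(factors)
    queries = []
    for combo in product(*(factors[name] for name in names)):
        terms = [f"{name}==['{value}']" for name, value in zip(names, combo)]
        queries.append(" & ".join(terms + list(shared)))
    return queries


for q in partition_queries({"Collection": ["Nimble-SCI", "Nimble-WEB"],
                            "PostProcessing": ["rescale", "noise"]}):
    print(q)
```

Two collections times two post-processing values yields the four partitions listed in the last -qp example.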

-qm --queryManipulation

* This option is similar to the '-q' option; however, the queries are applied only to the target trials (IsTarget == 'Y'), while all non-target trials are used. For N queries, the option generates N report tables (CSV) and one plot (PDF) containing N curves.
  + Syntax : -qm "query1" "query2" "query3" ... 
   ```
   - The syntax is the same as Pandas' query syntax.
     Note that each query is converted to ("query" and IsTarget == ['Y']) or IsTarget == ['N']
   Examples:
   % -qm "Purpose==['remove']" => 1 query
   % -qm "Operation==['PasteSplice']" "Operation==['FillContentAwareFill']" => 2 queries
   % -qm "Purpose==['remove']" "Purpose==['add']" "Purpose==['splice']" => 3 queries
   ```
  + Output: 
   ```
   - CSV report: NIST_001_qm_query0.csv, NIST_001_qm_query1.csv, ...
   - PDF plot: NIST_001_roc_all.pdf
   - DM file (using --dump): NIST_001_query0.dm, NIST_001_query1.dm, ...
   ```

## Command-line Usage

### NC2016 Dataset Testcases

#### * Manipulation Task

* Full scoring, rendering the ROC curve and the report table:
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_001 --ci --display
</code>
<img src="./notebookImgs/NC16_001_roc_all1.png" alt="Default ROC curve" width="500" height="400" align="left">

Report table (`./notebookImgs/NC16_001_all.csv`):

| AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|
| 0.517917 | 1 | 0.516901 | 0.492906 | 0.54328 |


* Full scoring, rendering the DET curve:
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_002 --plotType det --display
</code>
<img src="./notebookImgs/NC16_002_det_all.png" alt="ROC2" width="500" height="400" align="left">

* OptOut (IsOptOut =='N') scoring
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02_optout.csv --outRoot ./testcases/NC16_101 --optOut --dLevel 0.1 --ci --display
</code>
<img src="./notebookImgs/NC16_101_roc_all.png" alt="Default ROC curve" width="500" height="400" align="left">

* Query (-q) with one query
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_003 <b>-q "Collection==['Nimble-SCI','Nimble-WEB']"</b> --ci --display
</code>
<img src="./notebookImgs/NC16_003_roc_all.png" alt="ROC3" width="700" height="600" align="left">

Report table (`./notebookImgs/NC16_003_q_query_0_report.csv`):

| QUERY | AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|---|
| `Collection==['Nimble-SCI','Nimble-WEB']` | 0.517917 | 1 | 0.516901 | 0.492906 | 0.54328 |


* Query (-q) with two queries
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_004 <b>-q "Collection==['Nimble-SCI'] & 300 <= ProbeWidth"  "Collection==['Nimble-WEB'] & 300 <= ProbeWidth"</b> --ci --display
</code>
<img src="./notebookImgs/NC16_004_roc_all.png" alt="ROC4" width="700" height="600"  align="left">

Report table (`./notebookImgs/NC16_004_q_query_0_report.csv`):

| QUERY | AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|---|
| `Collection==['Nimble-SCI'] & 300 <= ProbeWidth` | 0.462995 | 1 | 0.565625 | 0.411921 | 0.502428 |


Report table (`./notebookImgs/NC16_004_q_query_1_report.csv`):

| QUERY | AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|---|
| `Collection==['Nimble-WEB'] & 300 <= ProbeWidth` | 0.556042 | 1 | 0.479198 | 0.523205 | 0.592781 |


* Query for partition (-qp) with one partition
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_005 <b>-qp "Collection==['Nimble-SCI'] & 300 <= ProbeWidth"</b> --ci --display
</code>
<img src="./notebookImgs/NC16_005_roc_all.png" alt="ROC4" width="700" height="600" align="left">

Report table (`./notebookImgs/NC16_005_qp_query_report.csv`):

| Partition | Collection | ProbeWidth | AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|---|---|---|
| Partition_0 | 'Nimble-SCI' | `300<=ProbeWidth` | 0.462995 | 1 | 0.565625 | 0.411921 | 0.502428 |


* Query for partition (-qp) with two partitions
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_006 <b>-qp "Collection==['Nimble-SCI','Nimble-WEB'] & 300 <= ProbeWidth"</b> --ci --display
</code>
<img src="./notebookImgs/NC16_006_roc_all.png" alt="ROC6" width="700" height="600" align="left">

Report table (`./notebookImgs/NC16_006_qp_query_report.csv`):

| Partition | Collection | ProbeWidth | AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|---|---|---|
| Partition_0 | 'Nimble-SCI' | `300<=ProbeWidth` | 0.462995 | 1 | 0.565625 | 0.411921 | 0.502428 |
| Partition_1 | 'Nimble-WEB' | `300<=ProbeWidth` | 0.556042 | 1 | 0.479198 | 0.523205 | 0.592781 |


* Query for selective manipulation (-qm) with two queries
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_010 <b>-qm "Collection==['Nimble-SCI'] & IsManipulationTypeRemoval==['Y']" "Collection==['Nimble-WEB'] & IsManipulationTypeRemoval==['Y']"</b> --display
</code>
<img src="./notebookImgs/NC16_010_roc_all.png" alt="ROC111" width="700" height="600" align="left">

Report table (`./notebookImgs/NC16_010_qm_query_0_report.csv`):

| QUERY | AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|---|
| `Collection==['Nimble-SCI'] & IsManipulationTyp...` | 0.4525 | 1 | 0.594643 | 0 | 0 |


Report table (`./notebookImgs/NC16_010_qm_query_1_report.csv`):

| QUERY | AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|---|
| `Collection==['Nimble-WEB'] & IsManipulationTyp...` | 0.590206 | 1 | 0.436607 | 0 | 0 |


* --multiFigs with the query partition (-qp) option
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-manipulation-ref.csv -x NC2016-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Manipulation_ImgOnly_p-dct_02.csv --outRoot ./testcases/NC16_007 <b>-qp "Collection==['Nimble-SCI','Nimble-WEB'] & 300 <= ProbeWidth" --multiFigs</b> --ci --display
</code>
<img src="./notebookImgs/NC16_007_roc_combine.png" alt="ROC7" width="1000" height="700" align="left">

#### * Splice Task

<code>
python DetectionScorer.py <b>-t splice</b> --refDir ../../data/test_suite/detectionScorerTests/reference -r NC2016-splice-ref.csv -x NC2016-splice-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2016_Splice_ImgOnly_p-splice_01.csv --outRoot ./testcases/NC16_100 --ci --display
</code>
<img src="./notebookImgs/NC16_100_roc_all.png" alt="ROC8" width="500" height="400" align="left">

Report table (`./notebookImgs/NC16_100_all.csv`):

| AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|
| 0.872003 | 1 | 0.2148 | 0.853012 | 0.88796 |


### NC2017 Dataset Testcases

#### * Manipulation Task

* Full scoring, rendering the ROC curve:
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/ -r reference/NC2017-manipulation-ref.csv -x reference/NC2017-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2017_Manipulation_ImgOnly_p-copymove_01.csv --outRoot ./testcases/NC17_001 --ci --display
</code>
<img src="./notebookImgs/NC17_001_copymove_roc_all.png" alt="ROC9" width="500" height="400" align="left">

Report table (`./notebookImgs/NC17_001_copymove_all.csv`):

| AUC | FAR_STOP | EER | AUC_CI_LOWER | AUC_CI_UPPER |
|---|---|---|---|---|
| 0.679533 | 1 | 0.328889 | 0.620826 | 0.735491 |


<b>The tables are omitted from now on.</b>

* Query (-q) with two queries
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/ -r reference/NC2017-manipulation-ref.csv -x reference/NC2017-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2017_Manipulation_ImgOnly_p-copymove_01.csv --outRoot ./testcases/NC17_002 <b>-q "(Purpose ==['remove'] and IsTarget == ['Y']) or IsTarget == ['N']" "(Purpose ==['clone'] and IsTarget == ['Y']) or IsTarget == ['N']"</b> --display
</code>
<img src="./notebookImgs/NC17_002_roc_all.png" alt="ROC17_002" width="700" height="600" align="left">

* Query for selective manipulation (-qm) with the factor "Purpose"
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/ -r reference/NC2017-manipulation-ref.csv -x reference/NC2017-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2017_Manipulation_ImgOnly_p-copymove_01.csv --outRoot ./testcases/NC17_003 <b>-qm "Purpose==['remove']" "Purpose==['clone']"</b> --display
</code>
<img src="./notebookImgs/NC17_003_roc_all.png" alt="ROC17_003" width="700" height="600" align="left">

* Query for selective manipulation (-qm) with the factor "OperationArgument"
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/ -r reference/NC2017-manipulation-ref.csv -x reference/NC2017-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2017_Manipulation_ImgOnly_p-copymove_01.csv --outRoot ./testcases/NC17_004 <b>-qm "OperationArgument==['people','face']" "OperationArgument==['man-made object','landscape']"</b> --display
</code>
<img src="./notebookImgs/NC17_004_roc_all.png" alt="ROC17_004" width="700" height="600" align="left">

* Query for selective manipulation (-qm) with a mix of factors
<code>
python DetectionScorer.py -t manipulation --refDir ../../data/test_suite/detectionScorerTests/ -r reference/NC2017-manipulation-ref.csv -x reference/NC2017-manipulation-index.csv --sysDir ../../data/test_suite/detectionScorerTests/baseline -s Base_NC2017_Manipulation_ImgOnly_p-copymove_01.csv --outRoot ./testcases/NC17_005 <b>-qm "Purpose==['remove'] and Operation ==['FillContentAwareFill']"</b> --display
</code>
<img src="./notebookImgs/NC17_005_roc_all.png" alt="ROC17_005" width="700" height="600" align="left">

### Disclaimer

This software was developed at the National Institute of Standards
and Technology (NIST) by employees of the Federal Government in the
course of their official duties. Pursuant to Title 17 Section 105
of the United States Code, this software is not subject to copyright
protection and is in the public domain. NIST assumes no responsibility
whatsoever for use by other parties of its source code or open source
server, and makes no guarantees, expressed or implied, about its quality,
reliability, or any other characteristic.
