
This tool is a modified version of the tool presented by Marmot. A new/separate version is being created.


unkown512/Evalatuion_tool_MOP_v1

Evalatuion_tool_MOP_v1

This is a customized evaluation tool for the dataset presented in the paper "Semi-Automatic LaTeX-Based Labeling of Mathematical Objects in PDF Documents: MOP Data Set" (ACM DocEng 2019).

Acknowledgment goes to the authors of "Performance evaluation of mathematical formula identification", who developed the original version of the evaluation tool for the Marmot dataset. Unlike Marmot, our dataset does not differentiate between isolated (display) and embedded (in-line) mathematical objects, so the reported results are adjusted accordingly.

NOTE

We are currently developing a new tool that will also test the performance of LaTeX generation and subject prediction.

FOR PDF OFFSETS

Depending on which tool you use to extract coordinates, you may have to subtract an offset from them.
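As a minimal sketch of such a correction, the snippet below subtracts a constant offset from both corners of a box. The function name `apply_offset` and the offset values are hypothetical; the actual offset (if any) depends on your extraction tool and is not specified here.

```python
from typing import Tuple

# (X1, Y1, X2, Y2), in the same order the BBox values use.
BBox = Tuple[float, float, float, float]


def apply_offset(bbox: BBox, dx: float = 0.0, dy: float = 0.0) -> BBox:
    """Subtract a constant (dx, dy) offset from both corners of a box."""
    x1, y1, x2, y2 = bbox
    return (x1 - dx, y1 - dy, x2 - dx, y2 - dy)


# Hypothetical offset values, for illustration only.
shifted = apply_offset((100.0, 200.0, 150.0, 220.0), dx=10.0, dy=5.0)
print(shifted)
```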

HOW TO USE

  1. Download the project with git clone https://github.com/unkown512/Evalatuion_tool_MOP_v1.git
  2. From the root directory run cd Evaluation_Tool
  3. To ensure the project builds correctly, run python evaluate.py. This should output the information below:

cor 0 mis 0 fal 1 par 0 exp 1 pae 0 mer 0 spl 0
cor 0 mis 0 fal 0 par 0 exp 1 pae 0 mer 0 spl 0
cor 1 mis 0 fal 0 par 0 exp 0 pae 0 mer 0 spl 0
cor 0 mis 1 fal 0 par 0 exp 0 pae 0 mer 0 spl 0
cor 1 mis 2 fal 8 par 0 exp 1 pae 1 mer 0 spl 0
MO False rate 0.6
MO Mis rate 0.333333333333

The tool outputs 8 counts, listed here with their abbreviations in the output:

  1. Correct (cor)
  2. Miss (mis)
  3. False (fal)
  4. Partial (par)
  5. Expanded (exp)
  6. Partial and Expanded (pae)
  7. Merged (mer)
  8. Split (spl)

How to evaluate your performance?

Create an XML file named <ID>_page_<#>.xml, where <ID> and <#> match the naming schema of the corresponding PDF_File and Ground_Truth files. See MOP for more information.
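The naming schema can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that `<#>` is a decimal page number and that `<ID>` may be any non-empty string; check the MOP file names to confirm both.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: <ID>_page_<#>.xml, with <#> a decimal page number.
_NAME_RE = re.compile(r"^(?P<id>.+)_page_(?P<page>\d+)\.xml$")


def prediction_filename(doc_id: str, page: int) -> str:
    """Build a prediction file name from a document ID and page number."""
    return f"{doc_id}_page_{page}.xml"


def parse_prediction_filename(name: str) -> Optional[Tuple[str, int]]:
    """Return (ID, page) if the name follows the schema, else None."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    return match.group("id"), int(match.group("page"))


# Hypothetical document ID, for illustration only.
print(prediction_filename("example_doc", 3))
print(parse_prediction_filename("example_doc_page_3.xml"))
```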

For each extracted MO (mathematical object), insert a new element row as follows:

[Image: example XML row for a predicted MO]

Note that each row may optionally contain additional elements such as <char BBox>. However, these are currently ignored during evaluation.

Follow the examples in the data/prediction directory. NOTE: You can add additional information such as the BBox for each character, size, and value without affecting the result.

BBox values X1, Y1, X2, Y2 represent the full tight bounding box of the predicted MO in the PDF document:

  • X1 = Leftmost x-coordinate

  • Y1 = Lowest y-coordinate

  • X2 = Rightmost x-coordinate

  • Y2 = Highest y-coordinate
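A minimal sketch of computing such a tight box from a set of points (for example, the corners of an MO's character boxes) is shown below. The function name `tight_bbox` is hypothetical, and the sketch reads "lowest" and "highest" as the numerically smallest and largest y values, which is an assumption about the coordinate system.

```python
from typing import Iterable, Tuple


def tight_bbox(
    points: Iterable[Tuple[float, float]],
) -> Tuple[float, float, float, float]:
    """Return (X1, Y1, X2, Y2) enclosing all (x, y) points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    # X1/Y1 are the minima, X2/Y2 the maxima.
    return (min(xs), min(ys), max(xs), max(ys))


print(tight_bbox([(5.0, 9.0), (2.0, 12.0), (7.0, 10.0)]))
```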

For each file in the data/pdf directory, ensure that a corresponding file exists in both the data/ground_truth and data/prediction directories. All three directories must contain the same number of files, and each file must have an <ID> that is unique within its own directory and matches the <ID> of its counterparts in the other two.
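This consistency requirement can be checked before running the evaluation. The sketch below is not part of the tool; it assumes that corresponding files share the same name apart from the extension, which may not hold for your data, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Set


def stems(directory: Path) -> Set[str]:
    """Collect file names without their extension."""
    return {p.stem for p in directory.iterdir() if p.is_file()}


def find_mismatches(root: Path) -> Dict[str, Set[str]]:
    """Compare data/pdf against the other two directories.

    Returns, per directory, the stems present on only one side.
    An empty dict means the three directories line up.
    """
    pdf = stems(root / "pdf")
    mismatches: Dict[str, Set[str]] = {}
    for name in ("ground_truth", "prediction"):
        diff = pdf ^ stems(root / name)  # symmetric difference
        if diff:
            mismatches[name] = diff
    return mismatches


# Usage (assuming the working directory contains data/):
#   problems = find_mismatches(Path("data"))
#   print(problems or "all directories match")
```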
