A utility for mapping the file formats embedded within a single file

PolyFile



A utility to identify and map the semantic structure of files, including polyglots, chimeras, and schizophrenic files. It can be used in conjunction with its sister tool PolyTracker for Automated Lexical Annotation and Navigation of Parsers, a backronym devised solely for the purpose of collectively referring to the tools as The ALAN Parsers Project.

Quickstart

In the same directory as this README, run:

pip3 install -e .

This will automatically install the polyfile executable in your path.

Usage

$ polyfile --help
usage: polyfile [-h] [--html HTML] [--debug] [--quiet] FILE

A utility to recursively map the structure of a file.

positional arguments:
  FILE                  The file to analyze

optional arguments:
  -h, --help            show this help message and exit
  --html HTML, -t HTML  Path to write an interactive HTML file for exploring
                        the PDF
  --debug, -d           Print debug information
  --quiet, -q           Suppress all log output (overrides --debug)

To generate a JSON mapping of a file, run:

polyfile INPUT_FILE > output.json

You can optionally have PolyFile output an interactive HTML page containing a labeled, interactive hexdump of the file:

polyfile INPUT_FILE --html output.html > output.json
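The same two invocations can be driven from a script. The following is a minimal sketch, not part of PolyFile itself: the helper names `build_command` and `map_file` are hypothetical, it assumes the `polyfile` executable is on your path, and it assumes standard output contains only the JSON mapping, as the redirects above suggest. It uses only the `--html` flag shown in the help text.

```python
import json
import subprocess
from typing import List, Optional


def build_command(input_file: str, html_path: Optional[str] = None) -> List[str]:
    """Assemble a polyfile command line (hypothetical helper).

    Only the positional FILE argument and the --html flag from
    `polyfile --help` are used.
    """
    cmd = ["polyfile", input_file]
    if html_path is not None:
        cmd += ["--html", html_path]
    return cmd


def map_file(input_file: str, html_path: Optional[str] = None):
    """Run polyfile and parse whatever JSON it prints to stdout.

    Assumes stdout carries only the JSON mapping; raises
    subprocess.CalledProcessError if polyfile exits with an error.
    """
    result = subprocess.run(
        build_command(input_file, html_path),
        stdout=subprocess.PIPE,
        check=True,
    )
    return json.loads(result.stdout)
```

For example, `map_file("INPUT_FILE", html_path="output.html")` would write the HTML page and return the parsed mapping. The structure of the returned object is not described here, and the JSON schema is expected to change (see Current Status and Known Deficiencies), so inspect it before relying on specific keys.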

File Support

PolyFile can identify all 10,000+ file formats in the TrID database. It currently has support for parsing and semantically mapping the following formats:

For an example that exercises all of these file formats, run:

curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html - > ESultanikResume.json

Current Status and Known Deficiencies

  • The instrumented Kaitai Struct parser generator implementation has only been tested on the JPEG/JFIF grammar; other KSY definitions may exercise portions of the KSY specification that have not yet been implemented
  • The JSON output schema will soon be replaced with the similar SBuD format

License and Acknowledgements

This research was developed by Trail of Bits with funding from the Defense Advanced Research Projects Agency (DARPA) under the SafeDocs program as a subcontractor to Galois. It is licensed under the Apache 2.0 license. The PDF parser is modified from the parser developed by Didier Stevens and released into the public domain. © 2019, Trail of Bits.
