PDF Analysis

Many PDF analyses have already been published. I have gathered a lot of them here, along with additional tips and tools.

PDF Format 📄

(Image: pdf_ange_albertini.png)

Tools list 🔧

Tool URL
Didier Stevens suite
PDF Xray
poppler-utils (pdftotext, pdfimages, pdftohtml, pdftops, pdfinfo, pdffonts, pdfdetach, pdfseparate, pdfsig, pdftocairo, pdftoppm, pdfunite)

Existing list

Quick Analysis 🚀

Basic information

$ file file.pdf
$ pdfinfo -box -meta -js -rawdates file.pdf

Displaying objects and actions structure

$ python -aefv file.pdf

Search for /OpenAction, /AA, /Launch, /GoTo, /GoToR, /SubmitForm, /RichMedia (for Flash), /JS, /JavaScript and /URI, as well as signs of encoding, encryption, shellcode and obfuscation.
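When none of the tools is at hand, the keyword search above can be approximated in a few lines of Python. This is only a rough sketch: `count_keywords` is a hypothetical helper, it looks at the raw bytes only, and it will miss keywords hidden in compressed object streams or written with hex escapes.

```python
import re

# Keywords from the list above.
SUSPICIOUS = [b"/OpenAction", b"/AA", b"/Launch", b"/GoTo", b"/GoToR",
              b"/SubmitForm", b"/RichMedia", b"/JS", b"/JavaScript", b"/URI"]

def count_keywords(data: bytes) -> dict:
    """Count each keyword as a whole PDF name in the raw bytes."""
    counts = {}
    for kw in SUSPICIOUS:
        # The lookahead rejects longer names, so /GoTo does not match /GoToR.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9_.#-])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts

# Example on an in-memory fragment; for a real file, pass open(path, "rb").read().
sample = b"<< /Type /Catalog /OpenAction 5 0 R >>\n5 0 obj << /S /GoToR /JS (x) >>"
hits = {name: n for name, n in count_keywords(sample).items() if n}
print(hits)
```

A non-zero count is a reason to look closer, not a verdict: many harmless PDFs contain /URI or /GoTo.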

Automatically with ParanoiDF

$ python -fl file.pdf

Or with pdf-parser

$ pdf-parser -v file.pdf

With a hexadecimal editor

$ bless file.pdf

Extract files / scripts / Objects

Use pdf-parser to extract a JavaScript object, for example

$ pdf-parser --object 32 --raw file.pdf > extractedObject.js

pdfextract from Origami

$ pdfextract file.pdf

Online analysis

Be careful not to leak any important, professional or personal data, or to expose your research, when submitting files to online services.

Complete Analysis 🔎

Basic information

$ file file.pdf
$ pdfinfo file.pdf
$ pdfinfo -box -meta -js -rawdates file.pdf

Powerful Python tool to analyze PDFs and exploits

$ pyew file.pdf 	

Other Python tool to explore PDF

$ peepdf -fl file.pdf
$ peepdf --interactive file.pdf

Analysis under Windows

PDF Stream Dumper


Get metadata

$ exiftool -a -u -g2 file.pdf

Get metadata recursively from the current directory

$ exiftool -r -ext pdf .

Change an element

$ exiftool -Title="New title" file.pdf

Remove metadata

$ exiftool -all= file.pdf && exiftool -all:all= file.pdf && qpdf --linearize file.pdf filewithoutmeta.pdf
$ mat file.pdf # latest version of mat doesn't support pdf format anymore...

Remove metadata recursively from the current directory. Quick and dirty, but it works well. Variables are quoted so that file names containing spaces are handled.

$ find . -name "*.pdf" -print0 | while IFS= read -r -d '' file; do f="${file:2}"; echo "$f" && mv "$f" "$f.pdf" && exiftool -all= "$f.pdf" && exiftool -all:all= "$f.pdf" && qpdf --linearize "$f.pdf" "$f" && rm "$f.pdf" "$f.pdf_original"; done

Search for older versions

Search for older "hidden" versions

$ pdfresurrect file.pdf -i
$ exiftool -pdf-update:all= file.pdf
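A PDF that was saved incrementally keeps its earlier revisions in the file, each one normally ending with its own `%%EOF` marker. The sketch below counts those markers and cuts the file after each of them; it is not how pdfresurrect works internally, and the function names are hypothetical. One caveat: a linearized PDF can contain two `%%EOF` markers without ever having been edited.

```python
EOF_MARKER = b"%%EOF"

def count_revisions(data: bytes) -> int:
    """Number of %%EOF markers; more than one suggests incremental updates."""
    return data.count(EOF_MARKER)

def earlier_revisions(data: bytes) -> list:
    """Return the file truncated after each %%EOF marker except the last one."""
    cuts = []
    pos = 0
    while True:
        idx = data.find(EOF_MARKER, pos)
        if idx == -1:
            break
        pos = idx + len(EOF_MARKER)
        cuts.append(data[:pos])
    # The last cut is (almost) the whole file, so it is not an older version.
    return cuts[:-1]
```

Each returned byte string can be written to its own file and opened in a viewer to see what that revision looked like.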

Online Analysis

Name URL
Hybrid analysis
Malware Tracker
PDF examiner
Document Analyzer
PDF X-ray
PDF Online
Extract PDF
Char conversion


Compute byte statistics (minimum and maximum entropy, ASCII count, ...) for a PDF

$ python file.pdf
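To make clear what these statistics mean, here is a minimal sketch that computes them by hand. It is not the script used above: `entropy` and `byte_stats` are hypothetical names, and the 1024-byte bucket size is an arbitrary assumption. High entropy (close to 8 bits per byte) usually points to compressed or encrypted data.

```python
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())

def byte_stats(data: bytes, bucket: int = 1024) -> dict:
    """Overall statistics plus min/max entropy over fixed-size buckets."""
    chunks = [data[i:i + bucket] for i in range(0, len(data), bucket)] or [b""]
    per_bucket = [entropy(chunk) for chunk in chunks]
    return {
        "size": len(data),
        "printable_ascii": sum(1 for b in data if 0x20 <= b <= 0x7E),
        "entropy": entropy(data),
        "bucket_entropy_min": min(per_bucket),
        "bucket_entropy_max": max(per_bucket),
    }
```

For a real file, call `byte_stats(open("file.pdf", "rb").read())`.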

Visual analysis

Visual analysis of a PDF or a binary file

Go deeper in the analysis

Displaying objects and actions structure

$ python --all --extra --force --verbose file.pdf

Map of the object flow

$ pdf-parser file.pdf | ./pdfobjflow
$ eog pdfobjflow.png


Search for:
/OpenAction and /AA specify the script or action to run automatically.
/Names, /AcroForm and /Action can also specify and launch scripts or actions.
/JavaScript specifies JavaScript to run.
/GoTo changes the view to a specified destination within the PDF or in another PDF file.
/Launch launches a program or opens a document.
/URI accesses a resource by its URL.
/SubmitForm and /GoToR can send data to a URL.
/RichMedia can be used to embed Flash in a PDF.
/ObjStm can hide objects inside an object stream.
Beware of obfuscation with hex codes: /JavaScript can be written as /J#61vaScript.
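The hex-code obfuscation works because any character of a PDF name can be written as `#` followed by two hex digits. A small sketch of how to undo it before searching (the helper names `normalize_name` and `find_names` are hypothetical, and the name-matching regex is a simplification of the real tokenizing rules):

```python
import re

def normalize_name(name: bytes) -> bytes:
    """Replace each #xx escape in a PDF name with the byte it encodes."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  name)

def find_names(data: bytes) -> list:
    """Return the name tokens found in the raw bytes, with escapes decoded."""
    # A name starts with "/" and runs until whitespace or a delimiter.
    return [normalize_name(n) for n in re.findall(rb"/[^\s/<>\[\](){}%]+", data)]

print(normalize_name(b"/J#61vaScript"))  # b'/JavaScript'
```

Searching the output of `find_names` instead of the raw file catches names such as /J#61vaScript that a plain `grep` would miss.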

With ParanoiDF

$ python -fl file.pdf

With pdf-parser

$ pdf-parser -v file.pdf

With a hexadecimal editor

$ bless file.pdf

With dumppdf

$ dumppdf -a file.pdf


Search for compression

$ strings file.pdf | grep --color "/Filter"

2 ways to decompress a PDF

$ pdftk compressed.pdf output uncompressed.pdf uncompress
$ qpdf --stream-data=uncompress compressed.pdf uncompressed.pdf 
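If neither tool is installed, streams compressed with /FlateDecode (the most common filter) can be inflated with Python's standard `zlib` module. This is a naive sketch under several assumptions: it finds streams with a regular expression instead of honoring /Length, it handles no other filter or filter chain, and `inflate_streams` is a hypothetical name.

```python
import re
import zlib

# Stream data sits between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def inflate_streams(data: bytes) -> list:
    """Try to zlib-decompress every stream; keep only those that succeed."""
    results = []
    for match in STREAM_RE.finditer(data):
        try:
            results.append(zlib.decompress(match.group(1)))
        except zlib.error:
            continue  # not FlateDecode, or additional filters are applied
    return results
```

Streams that fail to inflate are skipped silently, so an empty result does not prove that the file contains no compressed data.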

Embedded files

4 ways to search for embedded files/scripts inside a PDF

$ binwalk file.pdf
$ foremost -a -v file.pdf
$ hachoir-subfile file.pdf
$ scalpel file.pdf
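These carving tools work, broadly speaking, by looking for known file signatures ("magic bytes") inside the file. The idea can be sketched as follows; the signature table is a tiny hand-picked sample, `scan_signatures` is a hypothetical name, and short signatures such as `FWS` will produce false positives. Only data stored uncompressed in the PDF can be found this way.

```python
# A few well-known file signatures (far from complete).
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",
    "swf (uncompressed)": b"FWS",
    "swf (zlib)": b"CWS",
}

def scan_signatures(data: bytes) -> list:
    """Return sorted (offset, label) pairs for every signature occurrence."""
    hits = []
    for label, magic in SIGNATURES.items():
        idx = data.find(magic)
        while idx != -1:
            hits.append((idx, label))
            idx = data.find(magic, idx + 1)
    return sorted(hits)
```

Each offset is a starting point for manual extraction, for example with `dd` or a hex editor.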

Extract files / scripts / objects

Extract the file corresponding to an object ID, a JPEG for example

$ -i 32 -r file.pdf > image.jpg

Extract JavaScript from an object, for example

$ pdf-parser --object 32 --raw file.pdf > extractedObject.js

pdfextract from Origami

$ pdfextract file.pdf


PDF to Postscript

$ pdftops file.pdf


PDF to text

$ pdftotext file.pdf


PDF to image

$ convert file.pdf image.jpg

Non-exhaustive list of possible conversions

LZWDecode filter

Convert a PDF to Postscript without the LZWDecode filter

$ qpdf --stream-data=uncompress original.pdf decoded.pdf # Decompress it
$ pdftops decoded.pdf # Convert it


PDF supports RC4 encryption (40- to 128-bit keys) and AES encryption (128-bit keys, or 256-bit keys with Extension Level 3).
Beware of empty passwords: a file can be encrypted and still open without prompting for one.

Password recovery

Dictionary attack on a PDF with pdfcrack

$ pdfcrack -w yourDictionnary.txt file.pdf

With john

$ file.pdf > x.hash
$ john --wordlist=yourDictionnary.txt x.hash


2 ways to search for JavaScript

$ pdf-parser --search=JavaScript file.pdf 
$ pdfinfo -js file.pdf

Extract an object

With jsunpack

$ jsunpack-extractjs file.pdf

With pdf-parser

$ pdf-parser --object 32 --raw file.pdf > file.js

With pdfextract from Origami

$ pdfextract --js file.pdf


Online :

Malzilla and SpiderMonkey can also help deobfuscate JavaScript.
More details coming soon.

Add Javascript to PDF

Disarming a PDF

$ python --disarm file.pdf


Search for Flash

$ python --search flash file.pdf

Extract Flash with swf_mastah

$ python -f file.pdf -o ./
$ file *.swf

With pdf-parser

$ pdf-parser --object 32 --filter --raw file.pdf > flashFile.swf
$ file flashFile.swf

Analysing a Flash program

$ swfdump -Ddu flashFile.swf > flashFile.txt

More details coming soon.

Sources ℹ️
