Several PDF analyses, gathered together with additional tips and tools
Latest commit fe328da Apr 9, 2017

PDF Analysis

Many PDF analyses have already been published. This page gathers a number of them, together with additional tips and tools.

PDF Format 📄

[Image: pdf_ange_albertini.png]


https://www.adobe.com/devnet/pdf/pdf_reference.html
https://blog.didierstevens.com/2008/04/09/quickpost-about-the-physical-and-logical-structure-of-pdf-files/
https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF

Tools list 🔧

Tool URL
AnalyzePDF.py https://github.com/hiddenillusion/AnalyzePDF
ByteForce https://github.com/weaknetlabs/ByteForce
Caradoc https://github.com/ANSSI-FR/caradoc
Didier Stevens suite https://github.com/DidierStevens/DidierStevensSuite
dumppdf https://packages.debian.org/jessie/python-pdfminer
forensics-all https://packages.debian.org/jessie-backports/forensics-all
Origami https://code.google.com/archive/p/origami-pdf/
ParanoiDF https://github.com/patrickdw123/ParanoiDF
peepdf https://github.com/jesparza/peepdf
PDF Xray https://github.com/9b/pdfxray_public
pdf-parser http://didierstevens.com/files/software/pdf-parser_V0_6_4.zip
pdf2john.py https://github.com/magnumripper/JohnTheRipper/blob/unstable-jumbo/run/pdf2john.py
pdfcrack https://packages.debian.org/jessie/pdfcrack
pdfextract https://github.com/CrossRef/pdfextract
pdfobjflow.py https://bitbucket.org/sebastiendamaye/pdfobjflow
pdfresurrect https://packages.debian.org/jessie/pdfresurrect
PdfStreamDumper.exe http://sandsprite.com/CodeStuff/PDFStreamDumper_Setup.exe
pdftk https://packages.debian.org/en/jessie/pdftk
pdfxray_lite.py https://github.com/9b/pdfxray_lite
poppler-utils https://packages.debian.org/en/jessie/poppler-utils (pdftotext, pdfimages, pdftohtml, pdftops, pdfinfo, pdffonts, pdfdetach, pdfseparate, pdfsig, pdftocairo, pdftoppm, pdfunite)
pyew https://packages.debian.org/en/jessie/pyew
qpdf https://packages.debian.org/jessie/qpdf
swf_mastah.py https://github.com/9b/pdfxray_public/blob/master/builder/swf_mastah.py

Existing lists

http://blog.didierstevens.com/programs/pdf-tools/
https://github.com/sans-dfir/sift-files/tree/master/pdf-tools

Quick Analysis 🚀

Basic information

$ file file.pdf
$ pdfinfo -box -meta -js -rawdates file.pdf

Display the structure of objects and actions

$ python pdfid.py -aefv file.pdf

Look for /OpenAction, /AA, /Launch, /GoTo, /GoToR, /SubmitForm, /RichMedia (for Flash), /JS, /JavaScript and /URI, as well as signs of encoding, encryption, shellcode or obfuscation.

Automatically with ParanoiDF

$ python paranoiDF.py -fl file.pdf

Or with pdf-parser

$ python pdf-parser.py -v file.pdf

With a hexadecimal editor

$ bless file.pdf

Extract files / scripts / Objects

Use pdf-parser to extract a JavaScript object, for example

$ pdf-parser --object 32 --raw file.pdf > extractedObject.js

pdfextract from Origami

$ pdfextract file.pdf

Online analysis

Be careful not to leak any important, professional or personal data, or to expose your research.
https://www.hybrid-analysis.com/

Complete Analysis 🔎

Basic information

$ file file.pdf
$ pdfinfo file.pdf
$ pdfinfo -box -meta -js -rawdates file.pdf

Powerful Python tool to analyze PDFs and exploits

$ pyew file.pdf

Other Python tool to explore PDF

$ peepdf -fl file.pdf
$ peepdf --interactive file.pdf

Analysis under Windows

PDF Stream Dumper
https://github.com/dzzie/pdfstreamdumper

Metadata

Get metadata

$ exiftool -a -u -g2 file.pdf

Get metadata recursively from the current directory

$ exiftool -r -ext pdf .

Change an element

$ exiftool -Title="New title" file.pdf

Remove metadata

$ exiftool -all= file.pdf && exiftool -all:all= file.pdf && qpdf --linearize file.pdf filewithoutmeta.pdf
$ mat file.pdf # latest version of mat doesn't support pdf format anymore...

Remove metadata recursively from the current directory. This is quick and dirty, but it works well. The variables are quoted so that file names containing spaces are handled.

$ find . -name "*.pdf" -print0 | while IFS= read -r -d '' file; do f="${file:2}"; echo "$f" && mv "$f" "$f.pdf" && exiftool -all= "$f.pdf" && exiftool -all:all= "$f.pdf" && qpdf --linearize "$f.pdf" "$f" && rm "$f.pdf" "$f.pdf_original"; done

Search for older versions

Search for older "hidden" versions

$ pdfresurrect file.pdf -i
$ exiftool -pdf-update:all= file.pdf
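
Older versions can survive in a file because an incremental update appends a new revision after the previous one, and each revision normally ends with its own %%EOF marker. The sketch below is only a rough illustration of that idea, not how pdfresurrect works internally. The function names are hypothetical, and a count above one is a hint rather than proof, since some files (linearized ones, for example) can contain more than one marker without having been edited.

```python
def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers; more than one suggests appended revisions."""
    return data.count(b"%%EOF")


def first_revision(data: bytes) -> bytes:
    """Return the bytes up to and including the first %%EOF marker.

    If no marker is found, the data is returned unchanged.
    """
    marker = b"%%EOF"
    end = data.find(marker)
    return data if end == -1 else data[:end + len(marker)]


if __name__ == "__main__":
    with open("file.pdf", "rb") as handle:
        content = handle.read()
    print("%%EOF markers:", count_eof_markers(content))
```

Writing the result of first_revision() to a new file gives a candidate "original" document that can be opened and compared with the current one.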

Online Analysis

Name URL
Malwr https://malwr.com/submission/
Hybrid analysis https://www.hybrid-analysis.com/
Malware Tracker https://www.malwaretracker.com/pdf.php
VirusTotal http://www.virustotal.com/
PDF examiner http://www.pdfexaminer.com/
Document Analyzer http://www.document-analyzer.net/
Jotti https://virusscan.jotti.org/
PDF X-ray http://www.pdfxray.com/
PDF Online https://www.pdf-online.com/
Extract PDF http://www.extractpdf.com
Char conversion https://kt.pe/tools.html#conv/

Statistics

Compute byte statistics (minimum and maximum entropy, ASCII count, ...) for a PDF

$ python byte-stats.py file.pdf
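
To make these figures less abstract, here is a minimal sketch of the same kind of measurement in plain Python. It is not the byte-stats.py implementation: the 256-byte window size, the function names and the returned keys are assumptions chosen for illustration. High entropy (close to 8 bits per byte) usually points at compressed or encrypted data.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def byte_stats(data: bytes, window: int = 256) -> dict:
    """Overall entropy, min/max entropy over fixed windows, printable ASCII count."""
    chunks = [data[i:i + window] for i in range(0, len(data), window)] or [b""]
    entropies = [shannon_entropy(chunk) for chunk in chunks]
    return {
        "size": len(data),
        "entropy": shannon_entropy(data),
        "min_window_entropy": min(entropies),
        "max_window_entropy": max(entropies),
        "printable_ascii": sum(1 for b in data if 0x20 <= b <= 0x7E),
    }


if __name__ == "__main__":
    with open("file.pdf", "rb") as handle:
        for key, value in byte_stats(handle.read()).items():
            print(key, value)
```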

Visual analysis

Visual analysis of a PDF or a binary file
http://binvis.io

Go deeper in the analysis

Display the structure of objects and actions

$ python pdfid.py --all --extra --force --verbose file.pdf

Map of the object flow

$ pdf-parser file.pdf | ./pdfobjflow.py
$ eog pdfobjflow.png

Actions

Search for:
/OpenAction and /AA specify a script or action to run automatically.
/Names, /AcroForm and /Action can also specify and launch scripts or actions.
/JavaScript specifies JavaScript to run.
/GoTo changes the view to a specified destination within the PDF; /GoToR does the same in another PDF file.
/Launch launches a program or opens a document.
/URI accesses a resource by its URL.
/SubmitForm and /GoToR can send data to a URL.
/RichMedia can be used to embed Flash in a PDF.
/ObjStm can hide objects inside an object stream.
/JavaScript can also be written /J#61vaScript: beware of name obfuscation with hex codes.
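
The hex-code trick is easy to defeat once names are normalized before being compared. The sketch below decodes #xx escapes in PDF names and counts the names listed above. It is a simplified illustration, not the logic of pdfid or pdf-parser: the keyword list, the regular expressions and the function names are assumptions, and names hidden inside compressed streams or object streams are not seen.

```python
import re
from collections import Counter

# Names taken from the list above; extend as needed.
SUSPICIOUS = {"/OpenAction", "/AA", "/JavaScript", "/JS", "/Launch",
              "/URI", "/SubmitForm", "/GoToR", "/RichMedia", "/ObjStm"}

# A name starts with "/" and runs until whitespace or a PDF delimiter.
NAME_RE = re.compile(rb"/[^\s()<>\[\]{}/%]+")
HEX_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx hex escapes so that /J#61vaScript becomes /JavaScript."""
    decoded = HEX_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_suspicious_names(data: bytes) -> Counter:
    """Count suspicious names found in the raw bytes of a PDF."""
    counts = Counter()
    for match in NAME_RE.finditer(data):
        name = normalize_name(match.group(0))
        if name in SUSPICIOUS:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    with open("file.pdf", "rb") as handle:
        for name, count in count_suspicious_names(handle.read()).items():
            print(name, count)
```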

With ParanoiDF

$ python paranoiDF.py -fl file.pdf

With pdf-parser

$ python pdf-parser.py -v file.pdf

With a hexadecimal editor

$ bless file.pdf

With dumppdf

$ dumppdf -a file.pdf

Compression

Search for compression

$ strings file.pdf | grep --color "/Filter"

2 ways to decompress a PDF

$ pdftk compressed.pdf output uncompressed.pdf uncompress
$ qpdf --stream-data=uncompress compressed.pdf uncompressed.pdf 
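
When neither tool is at hand, streams that use the common /FlateDecode filter can be inflated with Python's zlib module. This is a rough sketch under strong assumptions: it finds streams with a regular expression instead of parsing objects, ignores /Length, predictors and every other filter, and does not handle encrypted files. The function name is hypothetical.

```python
import re
import zlib

# Everything between a "stream" keyword and the next "endstream" keyword.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return the inflated content of every stream that zlib can decode."""
    results = []
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes left after the data.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (other filter, image, encrypted, ...)
        if inflated:
            results.append(inflated)
    return results


if __name__ == "__main__":
    with open("compressed.pdf", "rb") as handle:
        for index, content in enumerate(inflate_streams(handle.read())):
            print(index, len(content), content[:60])
```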

Embedded files

4 ways to search for embedded files/scripts inside a PDF

$ binwalk file.pdf
$ foremost -a -v file.pdf
$ hachoir-subfile file.pdf
$ scalpel file.pdf
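
These carving tools rely largely on recognizing file signatures (magic bytes) inside the data. The sketch below shows the idea on a handful of signatures. It is far simpler than any of the tools above: the signature table and the function name are assumptions for illustration, short signatures such as FWS produce false positives, and files stored inside compressed streams stay invisible until the PDF has been decompressed.

```python
# A few well-known magic numbers; real carvers know many more.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",
    "swf": b"FWS",
    "swf (zlib)": b"CWS",
}


def find_signatures(data: bytes) -> list:
    """Return sorted (offset, label) pairs for every signature occurrence."""
    hits = []
    for label, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, label))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    with open("file.pdf", "rb") as handle:
        for offset, label in find_signatures(handle.read()):
            print(f"0x{offset:08x} {label}")
```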

Extract files / scripts / objects

Extract file corresponding to object ID, jpg for example

$ dumppdf.py -i 32 -r file.pdf > image.jpg

Extract js from an object for example

$ pdf-parser --object 32 --raw file.pdf > extractedObject.js

pdfextract from Origami

$ pdfextract file.pdf

Conversion

PDF to PostScript

$ pdftops file.pdf

PDF to TXT

$ pdftotext file.pdf

PDF to JPG

$ convert file.pdf image.jpg

This is a non-exhaustive list of possible conversions.

LZWDecode filter

Convert a PDF to PostScript without the LZWDecode filter

$ qpdf --stream-data=uncompress original.pdf decoded.pdf # Decompress it
$ pdftops decoded.pdf decoded.ps # Convert it

Encryption

PDF supports RC4 encryption (40- to 128-bit keys) and AES encryption (128-bit keys, or 256-bit keys with Extension Level 3).
Beware of empty passwords: a file can be encrypted and still open without any password prompt.

Password recovery

Brute force a PDF with pdfcrack

$ pdfcrack -w yourDictionary.txt file.pdf

With john

$ pdf2john.py file.pdf > x.hash
$ john --wordlist=yourDictionary.txt x.hash

JavaScript

2 ways to search for JavaScript

$ pdf-parser --search=JavaScript file.pdf 
$ pdfinfo -js file.pdf

Extract JavaScript with jsunpack

$ jsunpack-extractjs file.pdf

With pdf-parser

$ pdf-parser --object 32 --raw file.pdf > file.js

With pdfextract from Origami

$ pdfextract --js file.pdf

De-obfuscate

https://github.com/urule99/jsunpack-n

Online :
http://jsunpack.jeek.org/java/

Malzilla and SpiderMonkey can also help deobfuscate JavaScript.
Malzilla :
http://www.malzilla.org/downloads.html
SpiderMonkey :
http://www.didierstevens.com/files/software/js-1.7.0-mod.tar.gz
More details coming soon.

Add JavaScript to a PDF

https://didierstevens.com/files/software/make-pdf_V0_1_6.zip
https://neonprimetime.blogspot.fr/2015/03/how-to-add-javascript-to-pdf.html

Disarming a PDF

$ python pdfid.py --disarm file.pdf

Flash

Search for Flash

$ python pdf-parser.py --search flash file.pdf

Extract flash with swf_mastah

$ python swf_mastah.py -f file.pdf -o ./
$ file *.swf

With pdf-parser

$ pdf-parser.py --object 32 --filter --raw file.pdf > flashFile.swf
$ file flashFile.swf
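
A quick way to confirm that the extracted object really is Flash is to read the 8-byte SWF header: a 3-byte signature (FWS for uncompressed, CWS for zlib-compressed, ZWS for LZMA-compressed), a version byte and a little-endian 32-bit length. The sketch below parses that header and inflates CWS files. The function names are hypothetical, and ZWS files are recognized but not decompressed.

```python
import struct
import zlib


def parse_swf_header(data: bytes) -> dict:
    """Read the signature, version and declared length from a SWF header."""
    if len(data) < 8 or data[:3] not in (b"FWS", b"CWS", b"ZWS"):
        raise ValueError("not a SWF file")
    version, length = struct.unpack("<BI", data[3:8])
    return {"signature": data[:3].decode("ascii"),
            "version": version,
            "declared_length": length}


def inflate_swf(data: bytes) -> bytes:
    """Turn a zlib-compressed CWS file into an uncompressed FWS file."""
    if data[:3] != b"CWS":
        return data  # already uncompressed, or a format not handled here
    return b"FWS" + data[3:8] + zlib.decompress(data[8:])


if __name__ == "__main__":
    with open("flashFile.swf", "rb") as handle:
        swf = handle.read()
    print(parse_swf_header(swf))
    with open("flashFile_uncompressed.swf", "wb") as handle:
        handle.write(inflate_swf(swf))
```

Strings and URLs that are unreadable in a CWS file often become visible to strings or a hexadecimal editor once the file has been inflated.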

Analyzing a Flash program

$ swfdump -Ddu flashFile.swf > flashFile.txt

More details coming soon.

Sources ℹ️

https://blog.didierstevens.com/category/pdf/
http://www.decalage.info/file_formats_security/pdf
https://zeltser.com/analyzing-malicious-documents/
https://code.google.com/archive/p/corkami/wikis/PDFTricks.wiki
https://www.sans.org/reading-room/whitepapers/malicious/owned-malicious-pdf-analysis-33443
https://digital-forensics.sans.org/blog/2009/12/14/pdf-malware-analysis/
http://fileformats.archiveteam.org/wiki/PDF
