I needed a parsable copy of Intel's x86 instruction set documentation for a personal project, so I downloaded volumes 2A and 2B of the Intel® 64 and IA-32 Architectures Software Developer's Manual (which can be found here and here, respectively) and used an online PDF-to-HTML tool to convert them to HTML files. Unfortunately, the result was beyond terrible and absolutely unusable.
They say that you're never better served than by yourself, so I took matters into my own pdfminer-gloved hands and extracted HTML pages straight from the documentation PDFs themselves.
This is still not perfect, but it's already much better than the other solution (and it doesn't involve an ugly third party).
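To give an idea of what extracting HTML straight from a PDF involves, here is a minimal sketch of one core step: turning positioned text fragments back into lines of HTML. It is not how extract.py actually works. It assumes the fragments have already been pulled out of the PDF as hypothetical (x, y, text) tuples, for instance from the x0/y0 coordinates of pdfminer's layout objects, and the helper name fragments_to_html is made up for illustration.

```python
import html
from collections import defaultdict


def fragments_to_html(fragments, tolerance=2.0):
    """Group positioned text fragments into lines and render them as <p> elements.

    `fragments` is a list of (x, y, text) tuples (a hypothetical input format),
    where (x, y) is the bottom-left corner of a fragment in PDF coordinates.
    PDF pages have their origin at the bottom-left, so a larger y is higher up.
    """
    lines = defaultdict(list)
    for x, y, text in fragments:
        # Bucket baselines into bands `tolerance` points tall, so fragments
        # at nearly the same height end up on the same line.
        key = round(y / tolerance)
        lines[key].append((x, text))

    out = []
    # Walk lines top to bottom (descending y), fragments left to right (ascending x).
    for key in sorted(lines, reverse=True):
        words = [text for _, text in sorted(lines[key])]
        # Escape &, <, > so instruction text like "r/m32 & imm8" stays valid HTML.
        out.append("<p>" + html.escape(" ".join(words)) + "</p>")
    return "\n".join(out)


if __name__ == "__main__":
    page = [
        (120, 700.9, "Add"),
        (72, 700.4, "ADD"),
        (72, 680.0, "Opcode"),
        (140, 680.2, "Instruction"),
    ]
    print(fragments_to_html(page))
```

Run on the sample page, this prints the two lines "ADD Add" and "Opcode Instruction" as paragraphs. Real manual pages also need tables, fonts and multi-column layouts handled, which is where most of the actual effort goes.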
How To Run
- Get yourself a copy of the Volume 2A and Volume 2B PDFs;
- pdfminer doesn't understand how these are encrypted, so print each of them to a new PDF, including only the pages from the first instruction onward (not the whole document);
- Run python extract.py vol2a.pdf vol2b.pdf;
- Go grab a coffee;
- Enjoy your documentation set.
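Before launching a long extraction, it may help to confirm that the re-printed PDFs really are unencrypted. The sketch below is a crude heuristic, not part of this project: it assumes that an encrypted PDF names its encryption dictionary with an /Encrypt key in the trailer (or cross-reference stream dictionary), which is stored uncompressed, and simply searches the raw bytes for it. The helper names and the script itself are hypothetical.

```python
import sys


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Crude heuristic: encrypted PDFs reference their encryption
    dictionary through an /Encrypt key in the (uncompressed) trailer."""
    return b"/Encrypt" in pdf_bytes


def check_inputs(paths):
    """Return the paths that still appear encrypted and should be re-printed."""
    flagged = []
    for path in paths:
        with open(path, "rb") as f:
            if looks_encrypted(f.read()):
                flagged.append(path)
    return flagged


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_pdfs.py vol2a.pdf vol2b.pdf
    for path in check_inputs(sys.argv[1:]):
        print(f"{path}: found /Encrypt; print it to a new PDF first")
```

A byte search can give false positives if the string happens to appear elsewhere in the file, so treat a hit as a hint to re-print rather than as proof.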
The set is also available online at felixcloutier.com/x86.