Skip to content
Basic SAT model of x86 instructions using Z3, autogenerated from Intel docs
Python
Branch: master
Clone or download
Cannot retrieve the latest commit at this time.
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
sprdpl @ bf971e4 Update sprdpl version Dec 19, 2019
.gitmodules
README.md Convert example.py into a test script Jan 16, 2020
evaluate.py Print pretty tracebacks into pseudocode when something goes wrong, an… Jan 16, 2020
intr_builtins.py Move builtin functions to a new module, intr_builtins.py Dec 22, 2019
optimize.py Add a simple superoptimizer, and some minor API cleanups Dec 22, 2019
overview-1.png
parse.py
pos_popcnt.py Add a pretty non-trivial example: the equivalence proof of two differ… Dec 22, 2019
test.py Convert example.py into a test script Jan 16, 2020

README.md

x86-sat

This is a rudimentary attempt to build an autogenerated formal-ish model of x86 intrinsics by interpreting Intel's instruction pseudocode, transforming it into a model for Z3.

overview image

Description

Intel's Intrinsics Guide provides an interactive guide with data for each x86 intrinsic instruction, including a pseudocode that specifies the instruction's behavior. The intrinsics guide is backed by an XML file with all of this data in an easily parseable format, which we use here to build a Z3 model.

So far, only these features of Intel's pseudocode are supported:

  • Basic unary/binary arithmetic/bitwise operations, and ternary conditionals
  • Bit slices, both reading and writing
  • IF/CASE conditionals (which are predicated when they can't be statically resolved)
  • FOR loops
  • Function definitions/calls See tokens/rules in parse.py for the most up-to-date information.

There are many functions used in Intel's documentation that are not explicitly given in the XML. For now, these are almost all unsupported (see functions defined in intr_builtins.py for the current list).

This method of generating models is inherently limited. The intrinsics only cover a subset of x86 instructions (presumably until a PDF reader is added), so this will mostly be useful for investigating hand-rolled SIMD code, etc. Handling memory might be possible but would certainly be very slow. Handling control flow is likely out of scope for now too.

There are quite possibly bugs in this implementation, which can be pretty hard to find and fix. There are also definitely bugs in Intel's code (see below), so this shouldn't be relied on for anything serious.

So far, this can do some interesting non-trivial things, like derive lookup tables for vpternlogd:

check_print(_mm512_ternarylogic_epi32(_mm512_set1_epi8(0xAA),
       _mm512_set1_epi8(0xCC), _mm512_set1_epi8(0xF0), i) == _mm512_set1_epi8(0x57))
# -> [y = 0x1f]

...or find an index vector for vpermb that reverses its input:

values = range(2, 3*64, 3)
check(_mm512_set_epi8(*values) == _mm512_permutexvar_epi8(b, _mm512_set_epi8(*reversed(values))))
# <- [b = 0x000102030405060708090a0b0c0d0e0f1011...]

...or find a bug in Intel's pseudocode. Turns out that last one gives unsat, which didn't make sense. Investigating further, I noticed that I wasn't getting the right result for a _mm512_set_epi8(*range(64)), which led me to find this bug in the _mm512_set_epi8 pseudocode:

 dst[495:488] := e61
 dst[503:496] := e62
-dst[511:503] := e63
+dst[511:504] := e63
 dst[MAX:512] := 0

All of these intrinsic functions used in these examples (and the ones in test.py) are completely autogenerated from Intel's data; there's no manual intrinsics created whatsoever.

Usage

To use this, you must first download the latest XML from Intel's Intrinsics Guide (currently here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/data-3.4.6.xml) and save it as data.xml in the current directory. I don't believe I can legally redistribute this file myself.

This project requires Z3, and my sprdpl parsing library (included as a submodule).

The library has two primary APIs for looking up and using intrinsics:

  • The parse_whitelist() function, which returns a dictionary of intrinsic objects, suitable for injecting directly into global scope:
intrinsics = parse_whitelist('data.xml', regex='_mm256_set1_epi(8|32)')
globals().update(intrinsics)
check_print(_mm256_set1_epi8(0) == _mm256_set1_epi32(0))
  • The parse_meta() function, which returns a magic object that lazily parses intrinsics pseudocode whenever attributes are accessed, like so:
meta = parse_meta('data.xml')
check_print(meta._mm256_xor_si256(0, 1) == 1)
# .prefixed() allows reducing duplicated prefixes:
avx = meta.prefixed('_mm256_')
check_print(avx.set1_epi8(0) == 0)

See test.py for various example use cases.

You can’t perform that action at this time.