August is an assembler written from scratch in Ink for me to learn about assemblers, linkers, executable file formats, and compiler backends. It currently supports assembling and linking (in a single step) x86_64 ELF binaries for Linux, and might in the future support ELF executables for ARM, RISC-V, and x86 architectures. In the far long term, August might also become a code generation backend for a compiler written in Ink for some small subset of C if I feel adventurous. But for now, August is an educational project that assembles a subset of x86_64 to a Linux ELF binary.
August currently supports the following features:
- A good portable subset of the integer x86_64 instruction set
- Support for arguments as immediates, registers, and labels
- Embedded read-only data segments
- Symbol tables for debugging and disassembly
You can see some example assembly code that August can assemble and link under test/.
August provides a CLI, ./src/cli.ink, that currently takes a single assembly program and emits a single statically-linked x86_64 ELF executable. Under the hood, August reads the assembly program, parses it into a simple representation of symbols and sections in the source, assembles it into machine code, and links it all together with a minimal ELF linker.
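As a rough illustration of the parsing step, here is a minimal Python sketch that splits a program into sections holding labels and instructions. It is not August's parser (which is written in Ink); the `parse` function and the tuple shapes are hypothetical, and the syntax handling is inferred only from the Hello World example in this README.

```python
import shlex

def parse(source):
    """Split assembly source into sections of labels and instructions.

    Hypothetical representation: each section maps to a list of
    ("label", name) or ("inst", opcode, args) tuples.
    """
    sections = {".text": []}  # the example treats .text as the implicit section
    current = ".text"
    for line in source.splitlines():
        lex = shlex.shlex(line, posix=True)
        lex.whitespace_split = True  # split on whitespace only
        lex.commenters = ";"         # ';' starts a comment outside quotes
        tokens = list(lex)
        if not tokens:
            continue
        if tokens[0] == "section":
            current = tokens[1]
            sections.setdefault(current, [])
        elif len(tokens) == 1 and tokens[0].endswith(":"):
            sections[current].append(("label", tokens[0][:-1]))
        else:
            sections[current].append(("inst", tokens[0], tokens[1:]))
    return sections

SOURCE = '''
; Hello World
section .text ; implicit
_start:
mov eax 0x1 ; write syscall
syscall
section .rodata
msg:
db "Hello, World!" 0xa
'''

for name, items in parse(SOURCE).items():
    print(name, items)
```

Note that this sketch strips the quotes from string operands, so a real parser would need to keep track of which operands were string literals.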
At the moment, the assembler and linker are pretty tightly integrated. The ELF linker assumes that only two sections are used, .text and .rodata, and the assembler generates code with that assumption. The virtual address table for the generated executable is also currently hard-coded into the linker and relied on by the assembler when resolving symbols.
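To make the idea of a hard-coded address table concrete, here is a small Python sketch of resolving labels against fixed section base addresses. The `SECTION_BASE` table and `resolve_symbols` helper are hypothetical; the two base addresses are simply read off the objdump output in the Hello World transcript in this README, and the actual table in August's linker may be laid out differently.

```python
# Assumed fixed load addresses, inferred from the disassembly in this README.
# These are an illustration, not a copy of August's linker table.
SECTION_BASE = {".text": 0x401000, ".rodata": 0x6B5000}

def resolve_symbols(labels):
    """Map each label to an absolute virtual address.

    `labels` is a list of (name, section, offset) tuples, where offset is
    the number of bytes emitted into that section before the label.
    """
    return {
        name: SECTION_BASE[section] + offset
        for name, section, offset in labels
    }

symbols = resolve_symbols([
    ("_start", ".text", 0x00),
    ("exit", ".text", 0x16),
    ("msg", ".rodata", 0x00),
])
print(hex(symbols["exit"]))  # 0x401016
```

Because the bases are known before any code is emitted, an assembler working this way can substitute a label's final address directly into an instruction without a separate relocation pass.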
Here's a transcript of a shell session that demonstrates what August can do today. We take a bare-bones Hello World program for Linux on x86_64, assemble it with August, run it, and disassemble the generated executable with objdump.
$ cat test/asm/004-sym.asm
; Hello World
section .text ; implicit
_start:
mov eax 0x1 ; write syscall
mov edi 0x1 ; stdout
mov esi msg ; string to print
mov edx len ; length
syscall
exit:
mov eax 60 ; exit syscall
mov edi 0 ; exit code
syscall
section .rodata
msg:
db "Hello, World!" 0xa
len:
eq 14
Assemble the program and run the emitted executable, which prints "Hello, World!" and exits cleanly.
$ august test/asm/004-sym.asm ./hello-world
executable written.
$ ./hello-world
Hello, World!
$ echo $?
0
If we disassemble the generated executable, we find the assembly we began with.
$ objdump -d ./hello-world
./hello-world: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <_start>:
401000: b8 01 00 00 00 mov eax,0x1
401005: bf 01 00 00 00 mov edi,0x1
40100a: be 00 50 6b 00 mov esi,0x6b5000
40100f: ba 0e 00 00 00 mov edx,0xe
401014: 0f 05 syscall
0000000000401016 <exit>:
401016: b8 3c 00 00 00 mov eax,0x3c
40101b: bf 00 00 00 00 mov edi,0x0
401020: 0f 05 syscall
...
The instruction encoding is handled by the ./src/asm.ink library within the project. Currently, August can assemble simple programs that work with 32-bit registers and the ALU, handle branches and jumps, make system calls and function calls per the x86 calling convention, and read or write to memory. Even with these basic building blocks, we can write programs that do interesting things like loop, manipulate memory, and make recursive calls. You can check out some examples in test/asm/.
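For a feel of what instruction encoding involves, here is a Python sketch of one encoding that appears in the disassembly above: `mov r32, imm32`, which is a single opcode byte (0xB8 plus the register number) followed by the immediate as four little-endian bytes. The `encode_mov_r32_imm32` helper is hypothetical and is not how ./src/asm.ink is structured; it only reproduces the bytes shown by objdump.

```python
import struct

# Register numbers for the eight legacy 32-bit general-purpose registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg, imm):
    """Encode `mov r32, imm32` as opcode 0xB8+reg, then a little-endian imm32."""
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", imm & 0xFFFFFFFF)

print(encode_mov_r32_imm32("eax", 1).hex(" "))         # b8 01 00 00 00
print(encode_mov_r32_imm32("esi", 0x6B5000).hex(" "))  # be 00 50 6b 00
```

Both outputs match the corresponding lines of the objdump listing for the Hello World program.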
August uses a library for constructing ELF executable files located at ./src/elf.ink. The ELF generated by the ELF library in August currently makes use of three sections:
- .text, containing the program text, i.e. translated x64 assembly
- .rodata, containing read-only data loaded into process memory as read-only
- .shstrtab, containing the names of the section headers
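In the ELF format, .shstrtab is a string table: a run of null-terminated section names, which each section header refers to by byte offset. A minimal Python sketch of building one for the three sections above follows; `build_shstrtab` is a hypothetical helper, and August's own layout may order the names differently.

```python
def build_shstrtab(names):
    """Build a section-name string table and the offset of each name in it."""
    table = b"\x00"  # offset 0 holds the empty name by ELF convention
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name.encode("ascii") + b"\x00"
    return table, offsets

table, offsets = build_shstrtab([".text", ".rodata", ".shstrtab"])
print(offsets)  # {'.text': 1, '.rodata': 7, '.shstrtab': 15}
```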
The content of .text and .rodata sections can be provided to the ELF library, which will return a fully linked ELF binary as the result. All labels found in the assembly code are treated as local function symbols and placed into the generated symbol table.
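Every ELF binary starts with a fixed-size file header, so that is the first thing a library like this has to emit. The Python sketch below packs a 64-byte ELF64 header for a little-endian x86_64 executable. The `elf64_header` function and the argument values in the usage line are hypothetical, chosen for illustration rather than taken from ./src/elf.ink.

```python
import struct

def elf64_header(entry, phoff, shoff, phnum, shnum, shstrndx):
    """Pack a 64-byte ELF64 file header for a little-endian x86_64 executable."""
    # Magic, ELFCLASS64 (2), little-endian (1), version 1, System V ABI (0), padding.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,        # e_type: ET_EXEC
        0x3E,     # e_machine: EM_X86_64
        1,        # e_version
        entry,    # e_entry: virtual address of the first instruction
        phoff,    # e_phoff: file offset of the program header table
        shoff,    # e_shoff: file offset of the section header table
        0,        # e_flags
        64,       # e_ehsize: size of this header
        56,       # e_phentsize: size of one program header
        phnum,    # e_phnum
        64,       # e_shentsize: size of one section header
        shnum,    # e_shnum
        shstrndx, # e_shstrndx: index of the .shstrtab section header
    )

# Illustrative values only; the entry point matches _start in the transcript above.
header = elf64_header(entry=0x401000, phoff=64, shoff=0, phnum=2, shnum=0, shstrndx=0)
print(len(header))  # 64
```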
The ELF file format is quite well documented, especially in source bases of various linkers, assemblers, and kernels, but the available reference material for implementing an ELF linker is not...what you would call super accessible. In the process of building August, I've found the following references particularly helpful.
- man elf on Linux and the elf header file in the kernel sources, which provide the canonical reference for implementations of ELF files
- A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux, which breaks down the ELF format for executable files at a high level
- LWN's write-up of the Linux kernel's view of ELF executables, with another breakdown of ELF executables
- Solaris's documentation on ELF object files, a good in-depth reference
- Notes on the ELF specification, which is long but very, very comprehensive, occasionally useful for studying edge cases
In writing an x86/x64 assembler, the following were especially helpful to get me up to speed.
- The x86asm.net ISA reference, which is comprehensive enough for a toy assembler and easy to navigate once you get used to the compact notation
- Encoding x86 Instructions, which was a helpful guide to understanding how x86 and x64 instructions are encoded
- The x64 cheat sheet for a handy list of the core x86/x64 instruction set
- The Calling Conventions article on OSDev Wiki
To work on August, you obviously need Ink installed. Inkfmt is also useful for auto-formatting code, which you can run with make format or make f.
When I work on August (especially the instruction encoder), I usually have two other panes open, running:
- ls test/asm/*.asm lib/*.ink src/*.ink | entr -cr make, so every file change assembles and runs a program to test
- ls ./b.out | entr -cr objdump -d -Mintel ./b.out, so that every time the executable is re-compiled, I can see the disassembly of the executable and check it against the intended assembly code
There is a growing test suite for the assembler / x86 instruction encoder, which you can run with make check or make t.