
Assembler from scratch written in Ink, supporting ELF on x86_64 and more.


thesephist/august


August 🪓

August is an assembler written from scratch in Ink, built so I could learn about assemblers, linkers, executable file formats, and compiler backends. It currently assembles and links (in a single step) x86_64 ELF binaries for Linux, and may eventually support ELF executables for ARM, RISC-V, and x86. Further out, August might also become the code generation backend for a compiler, written in Ink, for some small subset of C, if I feel adventurous. For now, though, August is an educational project that assembles a subset of x86_64 into a Linux ELF binary.

August currently supports the following features:

  • A portable subset of the integer x86_64 instruction set
  • Operands given as immediates, registers, or labels
  • Embedded read-only data segments
  • Symbol tables for debugging and disassembly

You can see some example assembly code that August can assemble and link under test/.

Design

August provides a CLI, ./src/cli.ink, that currently takes a single assembly program and emits a single statically-linked x86_64 ELF executable. Under the hood, August reads the assembly program, parses it into a simple representation of symbols and sections in the source, assembles it into machine code, and links it all together with a minimal ELF linker.
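
To make the parsing step concrete, here is a hypothetical Python sketch. August itself is written in Ink, and this is not its parser. The sketch splits source written in the syntax of the Hello World example into labels and instructions, grouped by section. It assumes three things from that example: `;` starts a comment, operands are separated by spaces, and code before any `section` directive belongs to .text.

```python
import shlex

def tokenize(line):
    """Split one source line into tokens, honoring "quoted strings" and ';' comments."""
    lexer = shlex.shlex(line, posix=True)
    lexer.whitespace_split = True  # operands are separated by whitespace only
    lexer.commenters = ";"         # everything after ';' is a comment
    return list(lexer)

def parse(source):
    """Return {section_name: [item, ...]} (hypothetical representation).

    Each item is ("label", name) or ("instr", mnemonic, [operands]).
    """
    sections = {".text": []}  # .text is the implicit default section
    current = ".text"
    for line in source.splitlines():
        tokens = tokenize(line)
        if not tokens:
            continue  # blank or comment-only line
        if tokens[0] == "section":
            current = tokens[1]
            sections.setdefault(current, [])
        elif len(tokens) == 1 and tokens[0].endswith(":"):
            sections[current].append(("label", tokens[0][:-1]))
        else:
            sections[current].append(("instr", tokens[0], tokens[1:]))
    return sections
```

For example, `mov eax 0x1 ; write syscall` becomes `("instr", "mov", ["eax", "0x1"])`, and `db "Hello, World!" 0xa` becomes `("instr", "db", ["Hello, World!", "0xa"])`.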

At the moment, the assembler and linker are pretty tightly integrated. The ELF linker assumes that only two sections are used, .text and .rodata, and the assembler generates code with that assumption. The virtual address table for the generated executable is also currently hard-coded into the linker and relied on by the assembler when resolving symbols.
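
Because the section base addresses are fixed, resolving a label reduces to adding its offset within its section to that section's base. Below is a hypothetical Python sketch of that arithmetic. The two base addresses are taken from the objdump transcript in this README (`_start` at 0x401000, `msg` loaded as 0x6b5000); they are assumptions here, not values read from August's source. The sketch handles only labels that name addresses, not constants such as `len`.

```python
# Assumed base addresses, taken from the objdump output in this README.
SECTION_BASE = {".text": 0x401000, ".rodata": 0x6b5000}

def resolve_labels(sections, size_of):
    """Map each label to its virtual address.

    `sections` is {section_name: [item, ...]}; an item is either
    ("label", name) or something whose encoded length is size_of(item).
    """
    symbols = {}
    for name, items in sections.items():
        offset = 0  # bytes emitted so far in this section
        for item in items:
            if item[0] == "label":
                symbols[item[1]] = SECTION_BASE[name] + offset
            else:
                offset += size_of(item)
    return symbols
```

In the Hello World program, the four 5-byte `mov`s and the 2-byte `syscall` occupy 22 (0x16) bytes, so `exit` lands at 0x401016, which matches the disassembly.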

Here's a transcript of a shell session that demonstrates what August can do today. We take a bare-bones Hello World program for Linux on x86_64, assemble it with August, run it, and disassemble the resulting executable with objdump.

$ cat test/asm/004-sym.asm
; Hello World

section .text   ; implicit

_start:
    mov eax 0x1     ; write syscall
    mov edi 0x1     ; stdout
    mov esi msg     ; string to print
    mov edx len     ; length
    syscall

exit:
    mov eax 60      ; exit syscall
    mov edi 0       ; exit code
    syscall

section .rodata

msg:
    db "Hello, World!" 0xa
len:
    eq 14

Assemble the program with August, then run the emitted executable, which prints "Hello, World!" and exits cleanly.

$ august test/asm/004-sym.asm ./hello-world
executable written.

$ ./hello-world
Hello, World!

$ echo $?
0

If we disassemble the generated executable, we find the assembly we began with.

$ objdump -d -Mintel ./hello-world

./hello-world:     file format elf64-x86-64

Disassembly of section .text:

0000000000401000 <_start>:
  401000:       b8 01 00 00 00          mov    eax,0x1
  401005:       bf 01 00 00 00          mov    edi,0x1
  40100a:       be 00 50 6b 00          mov    esi,0x6b5000
  40100f:       ba 0e 00 00 00          mov    edx,0xe
  401014:       0f 05                   syscall

0000000000401016 <exit>:
  401016:       b8 3c 00 00 00          mov    eax,0x3c
  40101b:       bf 00 00 00 00          mov    edi,0x0
  401020:       0f 05                   syscall
        ...
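
Each `mov r32, imm32` in the dump is five bytes: the opcode 0xB8 plus the register's 3-bit number, followed by the 32-bit immediate in little-endian order. `mov esi msg` is encoded the same way, with the resolved address of `msg` as the immediate. The Python sketch below reproduces those bytes. It is a hypothetical helper for illustration, not August's Ink encoder.

```python
import struct

# 3-bit register numbers used by x86 encodings for the 32-bit GPRs.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg, imm):
    """MOV r32, imm32: opcode 0xB8 + register number, then a 4-byte LE immediate."""
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", imm & 0xFFFFFFFF)

SYSCALL = b"\x0f\x05"  # two-byte SYSCALL opcode

# The _start block from the dump, assembled by hand with the helper.
start = b"".join([
    encode_mov_r32_imm32("eax", 0x1),
    encode_mov_r32_imm32("edi", 0x1),
    encode_mov_r32_imm32("esi", 0x6b5000),  # address of msg
    encode_mov_r32_imm32("edx", 14),        # value of len
    SYSCALL,
])
```

`start.hex()` gives the same byte sequence as the `_start` section of the objdump output, 22 bytes in total.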

Assembler

The instruction encoding is handled by the ./src/asm.ink library within the project. Currently, August can assemble simple programs that work with 32-bit registers and the ALU, handle branches and jumps, make system calls and function calls per the x86 calling convention, and read or write to memory. Even with these basic building blocks, we can write programs that do interesting things like loop, manipulate memory, and make recursive calls. You can check out some examples in test/asm/.
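
As a rough illustration of two more encodings such a subset needs, here is a Python sketch. It is not August's asm.ink API, and every function name is hypothetical. The first encoding covers register-to-register ALU instructions, which use a ModRM byte. The second covers jumps and calls with a 32-bit displacement measured from the end of the instruction. The opcodes are the Intel SDM's `OP r/m32, r32` forms and the rel32 forms of JMP, CALL, and Jcc.

```python
import struct

REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# "OP r/m32, r32" opcodes for common ALU instructions.
ALU_RM_R = {"add": 0x01, "or": 0x09, "and": 0x21,
            "sub": 0x29, "xor": 0x31, "cmp": 0x39}

def modrm(mod, reg, rm):
    """Pack a ModRM byte: 2-bit mod, 3-bit reg, 3-bit r/m."""
    return (mod << 6) | (reg << 3) | rm

def encode_alu_reg_reg(op, dst, src):
    """e.g. add eax, ebx -> 01 D8; mod=0b11 selects register-direct r/m."""
    return bytes([ALU_RM_R[op], modrm(0b11, REG32[src], REG32[dst])])

JMP = b"\xe9"       # jmp rel32
CALL = b"\xe8"      # call rel32
JE = b"\x0f\x84"    # je/jz rel32
JNE = b"\x0f\x85"   # jne/jnz rel32

def encode_rel32(opcode, addr, target):
    """Encode a branch at `addr` to `target`; the displacement is relative
    to the address of the next instruction (addr + instruction length)."""
    length = len(opcode) + 4
    return opcode + struct.pack("<i", target - (addr + length))
```

Since the displacement depends on the instruction's own address, an assembler generally needs instruction sizes and label addresses before it can emit branches, for example through a first pass that measures the code. This sketch always uses the rel32 forms and ignores the shorter rel8 encodings.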

ELF Linker

August uses a library for constructing ELF executable files, located at ./src/elf.ink. The ELF files it generates currently use three sections:

  • .text, containing the program text, i.e. the assembled x86_64 machine code.
  • .rodata, containing read-only data, loaded into process memory as read-only.
  • .shstrtab, the section header string table, which holds the names of the sections.
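
In the ELF format, .shstrtab is a string table: each section header names its section by a byte offset into it. A minimal Python sketch of building such a table follows. The function name is hypothetical, but the layout is the standard ELF one: a leading NUL byte, so that offset 0 means "no name", followed by NUL-terminated strings.

```python
def build_strtab(names):
    """Build an ELF string table; return (table_bytes, {name: offset})."""
    table = bytearray(b"\x00")  # offset 0 is reserved for the empty name
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name.encode("ascii") + b"\x00"
    return bytes(table), offsets
```

For `[".text", ".rodata", ".shstrtab"]`, the offsets are 1, 7, and 15. Those numbers would go into each section header's `sh_name` field.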

The content of .text and .rodata sections can be provided to the ELF library, which will return a fully linked ELF binary as the result. All labels found in the assembly code are treated as local function symbols and placed into the generated symbol table.
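
To show what a "local function symbol" looks like on disk, here is a Python sketch that packs one 24-byte `Elf64_Sym` entry using the standard ELF64 field layout. The helper name is hypothetical. The section index passed for .text is an assumption, because it depends on the order in which the linker writes its section headers.

```python
import struct

STB_LOCAL = 0  # binding: not visible outside this file
STT_FUNC = 2   # type: the symbol names code

def elf64_sym(name_off, shndx, value, size=0, bind=STB_LOCAL, typ=STT_FUNC):
    """Pack an Elf64_Sym (little-endian): st_name u32, st_info u8,
    st_other u8, st_shndx u16, st_value u64, st_size u64."""
    st_info = (bind << 4) | (typ & 0xF)
    return struct.pack("<IBBHQQ", name_off, st_info, 0, shndx, value, size)

# e.g. _start: name at offset 1 in .strtab, in section 1 (assumed .text),
# at the address shown in the objdump transcript.
start_sym = elf64_sym(1, 1, 0x401000)
```

A complete symbol table also begins with an all-zero null entry, and the symbol table section header's `sh_info` must be one greater than the index of the last local symbol.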

References and further reading

The ELF file format is quite well documented, especially in the source code of various linkers, assemblers, and kernels, but the available reference material for implementing an ELF linker is not... what you would call super accessible. In the process of building August, I've found the following references particularly helpful.

In writing an x86/x64 assembler, the following were especially helpful to get me up to speed.

Development

To work on August, you obviously need Ink installed. Inkfmt is also useful for auto-formatting code, which you can run with make format or make f.

When I work on August (especially the instruction encoder), I usually have two other panes open, running:

  • ls test/asm/*.asm lib/*.ink src/*.ink | entr -cr make (so that every file change re-assembles and runs a test program)
  • ls ./b.out | entr -cr objdump -d -Mintel ./b.out (so that every time the executable is rebuilt, I can see its disassembly and check it against the intended assembly code)

There is a growing test suite for the assembler / x86 instruction encoder, which you can run with make check or make t.
