This article is a complete reference of the spectnetide's Z80 Assembler implementation. The assembler was designed with high performance in mind. Internally it uses the Antlr tool for parsing the code.
The assembler language uses a special way of case-sensitivity. You can write the reserved words (such as assembly instructions, pragmas, or directives) either with lowercase or uppercase letters, but you cannot mix these cases. For example, this instructions use proper syntax:
LD c,A
JP #12ac
ldir
djnz MyLabel
However, in these samples, character cases are mixed, and do the compiler will refuse them:
Ld c,A
Jp #12ac
ldIR
djNZ MyLabel
In symbolic names (labels, identifiers, etc.), you can mix lowercase and uppercase letters. Nonetheless, the compiler applies case-insensitive comparison when mathcing symbolic names. So, these statement pairs are totally equivalent with each other:
jp MainEx
jp MAINEX
djnz mylabel
djnz MyLabel
ld hl,ErrNo
ld hl,errNo
Each line of the source code is a declaration unit and is parsed in its own context. Such a source code line can be one of these constructs:
- A Z80 instruction can be directly compiled to binary code (such as
ld bc,#12AC
) - A directive that is used by the preprocessor of the compiler (e.g
#include
,#if
, etc.) - A pragma that emits binary output or instructs the compiler for something (
.org
,.defb
, etc) - A comment that helps the understanding of the code.
Comments start with a semicolon (;
). The compiler takes the rest of the line into account as the body
of the comment. This sample illustrates this concept:
; This line is comment-only line
Wait: ld b,8
Wait1: djnz Wait1 ; wait while the counter reaches zero
If you need multi-line comments, you can add single-line comments after each other. The Z80 assembly in spectnetide does not have separate multi-line comment syntax.
The language syntax provides these types of literals:
- Decimal numbers. You can use up to 5 digits (0..9) to declare a decimal number. Examples: 16, 32768, 2354.
- Hexadecimal numbers. You can use up to 4 hexadecimal digits (0..9, a..f or A..F) to declare
a hexadecimal literal. The compiler looks for one of the
#
or0x
prefix, or one of theh
orH
suffixes to recognize them as hexadecimal. Here are a few samples:
#12AC
0x12ac
12ACh
12acH
You can use negative number with the minus sign in front of them. Actually, the sign is not the part of the numeric literal, it is an operator.
- Characters. You can put a character between double quotes (for example:
"Q"
). - Strings. You can put a series of character between double quotes (for example:
"Sinclair"
).
You can use escape sequences to define non-visible or control characters, as you will learn soon.
- The
$
token. This literal represents the current assembly address.
You can use identifiers to refer to labels and other constants. Identifiers must start with
a letter (a..z or A..Z) or the underscore character (_
). The subsequent characters
can be digits (0..9), too. Here are a few samples:
MyCycle
ERR_NO
Cycle_4_Wait
Theoretically, you can use as long identifiers as you want. I suggest you to make them no longer than 32 characters so that readers may read your code easily.
You have already learned that you can utilize character and string literals (wrapped into double quotes), such as in these samples:
"This is a string. The next sample is a single character"
"c"
ZX Spectrum has a character set with special control characters such as AT, INK, PAPER, and so on.
The spectnetide assembler allows you special escape sequences to define them:
Escape | Code | Character |
---|---|---|
\i |
0x10 | INK |
\p |
0x11 | PAPER |
\f |
0x12 | FLASH |
\b |
0x13 | BRIGHT |
\I |
0x14 | INVERSE |
\o |
0x15 | OVER |
\a |
0x16 | AT |
\t |
0x17 | TAB |
\P |
0x60 | pound sign |
\C |
0x7F | copyright sign |
\\ |
0x5C | backslash |
\' |
0x27 | single quote |
\" |
0x22 | double quote |
\0 |
0x00 | binary zero |
Observe, some of these sequences have different values than their corresponding pairs in other languages, such as C, C++, C#, or Java.
To declare a character by its binary code, you can use the \xH
or
\xHH
sequences (H
is a hexadecimal digit). For example, these
escape sequence pairs are equivalent:
"\i"
"\x10"
"\C by me"
"\x7f \x62y me"
The Z80 assembler has a rich syntax for evaluating expressions. You can use operands and operators just like in most programming languages. Nevertheless, the spectnetide implementation has its particular way of evaluating expressions:
- All expressions are evaluated as 16-bit unsigned integers. Overflows are ignored: the compiler internally uses 32-bit integers during the evaluations and keeps only the rightmost 16 bits of the result.
- When a Z80 operation (for example,
ld a,NN
) needs an 8-bit value, the rightmost 8-bits are used.
In the future, these compiler features may change by issuing a warning in case of arithmetic overflow.
- In logical expressions, any non-zero value represents
true
; the zero value - stands for
false
. - Instead of parentheses —
(
and)
— you can use square brackets —[
and]
— to group operations.
; This is a syntax error
ld hl,(Offset+#20)*2+BaseAddr
; This is valid
ld hl,[Offset+#20]*2+BaseAddr
You can use the following operands in epressions:
- Decimal and hexadecimal literals
- Character literals
- Identifiers
- The
$
token
String literals cannot be used as operands.
You can use about a dozen operators, including unary, binary and ternary ones. In this section you will learn about them. I will introduce them in descending order of their precendence.
The assembler supports using only one ternary operator, the conditional operator:
conditional-expression ?
true-value :
false-value
This operation results in -1:
2 > 3 ? 2 : -1
When the conditional-expression evaluates to true, the operation results in true-value; otherwise in false-value.
Conditional expressions are evaluated from right to left, in contrast to binary operators, which use left-to-right evaluation.
Operator token | Precedence | Description |
---|---|---|
``` | ``` | 1 |
^ |
2 | Bitwise XOR |
& |
3 | Bitwise AND |
Operator token | Precedence | Description |
---|---|---|
== |
4 | Equality |
!= |
4 | Non-equality |
< |
5 | Less than |
<= |
5 | Less than or equal |
> |
5 | Greater than |
>= |
5 | Greater than or equal |
The bits of the left operand are shifted by the number of bits given by the right operand.
Operator token | Precedence | Description |
---|---|---|
<< |
6 | Shift left |
>> |
6 | Shift right |
Operator token | Precedence | Description |
---|---|---|
+ |
7 | Addition |
- |
7 | Subtraction |
* |
8 | Multiplication |
/ |
8 | Division |
% |
8 | Modulo calculation |
Operator token | Precedence | Description |
---|---|---|
+ |
9 | Unary plus |
- |
9 | Unary minus |
Do not forget, you can change the defult precendence with
[
and]
.
spectnetide implements Every officially documented Z80 instruction as well as the non-official ones. During the implementation I used ClrHome.org as a reference.
Z80 instructions may start with a label. Labels are identifiers that can be terminated by an
optional colon (:
). Both labels in these samples are accepted by the compiler:
Start: ld b,#f0
Wait djnz Wait
The compiler accepts these mnemonics:
ADC
, ADD
, AND
, BIT
, CALL
, CCF
, CP
, CPD
,
CPDR
, CPI
, CPIR
, CPL
, DAA
, DEC
, DI
, DJNZ
,
EI
, EX
, EXX
, HALT
, IM
, IN
, INC
, IND
,
INDR
, INI
, INIR
, JP
, JR
, LD
, LDD
, LDDR
,
LDI
, LDIR
, NEG
, NOP
, OR
, OTDR
, OTIR
, OUT
,
OUTD
, OUTI
, POP
, PUSH
, RES
, RET
, RETI
, RETN
,
RL
, RLA
, RLC
, RLCA
, RLD
, RR
, RRA
, RRC
,
RRCA
, RRD
, RST
, SBC
, SCF
, SET
, SLA
, SLL
SRA
, SRL
, SUB
, XOR
.
The compiler uses the standard 8-bit and 16-bit register names, as specified in the official Zilog Z80 documentation:
- 8-bit registers:
A
,B
,C
,D
,E
,H
,L
,I
,R
- 16-bit registers:
AF
,BC
,DE
,HL
,SP
,IX
,IY
- For the 8-bit halves of the
IX
andIY
index registers, the compiler uses these names:XL
,XH
,YL
,YH
. Alternatively, the compiler accepts these names, too:IXL
,IXH
,IYL
,IYH
. As a kind of exception to general naming conventions, these mixed-case names are also accepted:IXl
,IXh
,IYl
,IYh
.
Z80 assemblers use two different syntax for the indirect JP
statements:
; Notation #1
jp hl
jp ix
jp iy
; Notation #2
jp (hl)
jp (ix)
jp (iy)
The spectnetide compiler accepts both notation.
Three standard ALU operations between A
and other operands (ADD
, ADC
, and SBC
) sign A
as their first operand:
add a,b
adc a,(hl)
sbc a,e
Hovewer, the five other standard ALU operations between A
and other operands (SUB
, AND
, XOR
, OR
, and CP
) omit A
from their notation:
sub e
and (hl)
xor e
or c
cp b
The spectnetide compiler accepts the second group of ALU operations with using the explicit A
operand, too:
sub a,e
and a,(hl)
xor a,e
or a,c
cp a,b
The compiler evaluates expressions in two trips. First, when the lines of the assembly codes are successfully parsed, and the compiler emits the output. This time might be expressions that cannot be promptly evaluated. So, after the code is emitted, the compiler start fixup trip to resolve the values of symbols that remained unresolved in the first trip. This a sample demonstrates this situation:
ld hl,Table
ld a,#20
; ...
Table: ld bc,#4000
When the compiler emits the code for the ld hl,Table
instruction, it does not know the
value of Table
, as this symbol will receive its value only later. In the first trip, the
compiler records the fact that later it should use the value of Table
to complete the
ld hl,NNNN
instruction.
While emitting the code, the compiler reaches the ld bc,#4000
instruction and at that point
it has a value for Table
.
After the code is emitted, in the second — fixup — trip, the compiler is able to replace
Table
with its value.
There might be situations where the compiler cannot resolve symbolic values. In this case it signs an error.
For example, in this example there is a circular reference between Addr1
and Addr2
:
Addr1: .equ Addr2+#20
; ...
Addr2 .equ Addr1-#10
The compiler understands several pragmas that — thought they are not Z80 instructions — influence the emitted code. Each pragma has two alternative syntax, one with a dot prefix and another without it.
For example, you can write ORG
or .ORG
to use the ORG pragma.
I prefer using the dot-prefixed versions of pragmas.
With the ORG pragma, you define where to place the compiled Z80 code when you run it. For example, the following line sets this location to the 0x6000 address:
.org #6000
If you do not use ORG, the default address is 0x8000.
You can apply multiple ORG in your source code. Each usage creates a new segment in the assembler output. Take a look at this code:
ld h,a
.org #8100
ld d,a
.org #8200
ld b,a
This code generates three output segment, each with one emitted byte that represents the
corresponding LD
operation. The first segment will start at 0x8000 (default),
the second at 0x8100, whilst the third at 0x8200.
The ENT pragma defines the entry code of the program when you run it from Visual Studio. If you do not apply ENT in your code, the entry point will be the first address of the very first output code segment. Here's a sample:
.org #6200
ld hl,#4000
.ent $
jp #6100
.org #6100
call MyCode
...
The .ent $
pragma will sign the address of the jp #6100
isntruction as the entry
address of the code. Should you omit the ENT pragma from this code, the entry point would be
0x6200, for this is the start of the very first output segment, even though there is another
segment starting at 0x6100.
The IDE provides a command, Export Z80 Program, which allows you to create a LOAD block
that automatically starts the code. The Run Z80 Program and Debug Z80 Program command
simply jump to the address you specify with the ENT pragma. However, the auto LOAD block uses
the RANDOMIZE USR address
pattern where you need to define a different entry address that
can be closed with a RET
statement. The XENT pragma sets this address.
Here's a sample:
start:
.org #8000
.ent #8000
call SetBorder
jp #12ac
SetBorder:
.xent $
ld a,4
out (#fe),a
ret
The IDE will use #8000 — according to the .ent #8000
pragma — when starting
the code with the Run Z80 Program. Nonetheless, the Export Z80 Program will offer #8006
— according to the .xent $
pragma — as the startup code address.
The DISP pragma allows you to define a displacement for the code. The value affects the
$
token that represents the current assembly address. Your code is placed according
to the ORG of the particular output segment, but the assembly address is always displaced
with the value according to DISP. Take a look at this sample:
.org #6000
.disp #1000
ld hl,$
The ld hl,$
instruction will be placed to the 0x6000 address, but it will be equivalent
with the ld hl,#7000
statement due to the .disp #1000
displacement.
Of course, you can use negative displacement, too.
The EQU pragma allows you assign a value to an identifier. The label before EQU is the name of the identifier (or symbol), the exression used in EQU is the value of the variable. This is a short sample:
.org #6200
ld hl,Sym1
Sym1: .equ #4000
ld bc,Sym2
Sym2: .equ $+4
This sample is equivalent with this one:
.org #6200
ld hl,#4000 ; Sym1 <-- #4000
ld bc,#620a ; Sym2 <-- #620a as an ld bc,NNNN operation and
an ld hl,NNNN each takes 3 bytes
The VAR pragma works similarly to EQU. However, while EQU does not allow using the same symbol with mulitple value assignments, VAR assigns a new value to the symbol every time it is used.
The DEFB pragma emits 8-bit expressions (bytes) from the current assembly position. here is a sample:
.org #6000
.defb #01, #02, $, #04
The DEFB pragma will emit these four bytes starting at 0x6000: 0x01, 0x02, 0x03, 0x04.
The $
expression will emit 0x03, because at the emission point the current assembly
address is 0x6003. The DEFB program takes into account only the rightmost 8 bits of any
expression: this is how $
results in 0x03.
The DEFW pragma is similar to DEFB, but it emits 16-bit values with LSB, MSB order.
.defw #1234, #abcd
This simple code above will emit these four bytes: 0x34, 0x12, 0xcd, 0xab.
The DEFM pragma emits the byte-array representation of a string. Each character in the string is replaced with the correcponding byte. Tak a look at this code:
.defm "\C by me"
Here, the DEFM pragma emits 7 bytes for the seven characters (the first escape sequence represents the copyrigh sign) : 0x7f, 0x20, 0x62, 0x69, 0x20, 0x6d, 0x65.
You can emit zero (0x00
) bytes with this pragma. It accepts a single argument,
the number of zeros to emit. This code sends 16 zeros to the generated output:
.defs 16
With FILLB, you can emit a particular count of a specific byte. The first argument
of the pragma sets the count, the second specifies the byte to emit. This code emits 24
bytes of #A5
values:
.fillb 24,#a5
With FILLW, you can emit a particular count of a specific 16-bit word. The first argument
of the pragma sets the count, the second specifies the word to emit. This code emits 8
words (16 bytes) of #12A5
values:
.fillw 8,#12a5
Of course, the bytes of a word are emitted in LSB/MSB order.
The SKIP pragma — as its name suggests — skips the number of bytes as specified in its argument. It fills up the skipped bytes with 0xff.
The EXTERN pragma is kept for future extension. The current compiler accepts it, but does not do any action when observing this pragma.
This pragma is used when you run or debug your Z80 code within the emulator. With Spectrum 128K, Spectrum +3,
and Spectrum Next models, you can run the Z80 code in differend contexts. The MODEL pragma lets you
specify on which model you want to run the code. You can use the SPECTRUM48
, SPECTRUM128
,
SPECTRUMP3
, or NEXT
identifiers to choose the model (identifiers are case-insensitive):
.model Spectrum48
.model Spectrum128
.model SpectrumP3
.model Next
For example, when you create code for Spectrum 128K, and add the .model Spectrum48
pragma to the code,
the Run Z80 Code command will start the virtual machine, turns the machine into Spectrum 48K mode, and ignites
the code just after that.
Note: With the #ifmod
and #ifnmod
directives, you can check the model type. For example, the following
Z80 code results green background on Spectrum 48K, cyan an Spectrum 128K:
.model Spectrum48
#ifmod Spectrum128
BorderColor: .equ 5
RetAddr: .equ #2604
#else
BorderColor: .equ 4
RetAddr: .equ #12a2
#endif
Start:
.org #8000
ld a,BorderColor
out (#fe),a
jp RetAddr
The directives of the spectnetide Z80 Assembler representation are used for preprocessing — similarly as in the C and C++ programming languages — though their semantics are different.
Although you can add comments to the end of directives, they may not have labels.
You can use this directive for conditional compilation. The argument of the directive is a conditional expression, and it determines on which branch the compilation goes on. #IF works in concert with #ELSE and #ENDIF:
; Block #1
#if 2 > 3
ld a,b
#endif
; Block #2;
#if 2 < 3
nop
#else
ld b,c
#endif
; Block #3
#if $ > $+2
nop
#else
ld b,c
#endif
Here, Block #1 does not generate output, since the condition is false. Block #2 emits
a nop
, as the condition is true. The fals condition value in Block #3 moves code
parsing to the #else
branch, so it emits a ld b,c
instruction.
These directives works similarly to #IF. However, these check if a particular symbol has (#IFDEF) or has not (#IFNDEF) defined. So their single argument is an identifier name.
These directives works similarly to #IF. However, these check if the code's current model is the one specified with the identifier following the #IFMOD or #IFNMOD pragma. Here is a short sample of using this directive:
.model Spectrum48
#ifmod Spectrum128
BorderColor: .equ 5
RetAddr: .equ #2604
#else
BorderColor: .equ 4
RetAddr: .equ #12a2
#endif
Start:
.org #8000
ld a,BorderColor
out (#fe),a
jp RetAddr
You can use only these identifiers with this pragma (case-insensitively): SPECTRUM48
,
SPECTRUM128
, SPECTRUMP3
, NEXT
.
With the #DEFINE directive, you can explicitly define a symbol. Such a symbol has no concrete value, just its existence. With #UNDEF you may declare a symbol undefined.
#define SYMB
; Block #1
#ifdef SYMB
ld a,b
#endif
#undef SYMB
; Block #2;
#ifdef SYMB
nop
#else
ld b,c
#endif
According to this definition, the first block emits a ld, a,b
instruction, the second one a
ld b,c
instruction.
You can use this directive to load and process a source file from within another source file.
#INCLUDE accepts a string that names a file with its extension. The file name may contain either an absolute or a relative path. When a relative path is provided, its strating point is always the source file that holds the #INCLUDE directive.
Assume that this code is in the C:\Work
folder:
#include "Symbol.z80Asm"
#include "./MyRules.z80Asm"
#include "/Common/scroll.z80Asm"
The compiler will check the C:\Work
folder for the first two include filem and
C:\Work\Commmon
for the third one.