z80asm: implement zx next opcodes (was: serving fpga z80 variants) #312

Closed
aralbrec opened this Issue Aug 10, 2017 · 100 comments

5 participants

Member

aralbrec commented Aug 10, 2017

Paulo, do you think there could be an easy way to define additional opcodes for the z80? I am thinking in general here, since z80 implementations exist for FPGAs and we now have a specific case where new opcodes are being added for a machine.

The zx next, which I am working on now, is adding these instructions:

   swapnib    ED 23    A bits 7-4 swap with A bits 3-0
   mul        ED 30    multiply HL*DE = HLDE (no flags set)
   add  hl,a  ED 31    Add A to HL (no flags set)
   add  de,a  ED 32    Add A to DE (no flags set)
   add  bc,a  ED 33    Add A to BC (no flags set)
   outinb     ED 90    out (c),(hl), hl++
   ldix       ED A4    As LDI,  but if byte==A does not copy
   ldirx      ED B4    As LDIR, but if byte==A does not copy
   lddx       ED AC    As LDD,  but if byte==A does not copy, and DE is incremented
   lddrx      ED BC    As LDDR,  but if byte==A does not copy

That's a partial list as more is coming.
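As a rough illustration of what the first few opcodes do, here is a Python sketch written only from the one-line descriptions above. It is not the FPGA implementation or anything in z80asm. The function names are hypothetical, flags are not modelled, and two details are assumptions: the 16-bit wrap-around in add rr,a, and the HL/DE/BC bookkeeping in ldix, which is copied from LDI.

```python
# Hypothetical model of a few early ZX Next opcodes, written from the
# one-line descriptions in the list above. Flags are not modelled.

def swapnib(a: int) -> int:
    """swapnib (ED 23): exchange bits 7-4 of A with bits 3-0."""
    return ((a << 4) | (a >> 4)) & 0xFF


def add_rr_a(rr: int, a: int) -> int:
    """add hl,a / add de,a / add bc,a (ED 31..33).

    A is zero-extended (not sign extended); wrapping at 16 bits is assumed.
    """
    return (rr + (a & 0xFF)) & 0xFFFF


def ldix_step(mem: bytearray, hl: int, de: int, bc: int, a: int):
    """One ldix step: like LDI, but the byte is not written if it equals A.

    HL and DE increment and BC decrements as in LDI (assumed).
    """
    byte = mem[hl]
    if byte != a:
        mem[de] = byte
    return (hl + 1) & 0xFFFF, (de + 1) & 0xFFFF, (bc - 1) & 0xFFFF


def ldirx(mem: bytearray, hl: int, de: int, bc: int, a: int):
    """ldirx: repeat ldix_step until BC reaches zero (BC=0 means 65536, as LDIR)."""
    while True:
        hl, de, bc = ldix_step(mem, hl, de, bc, a)
        if bc == 0:
            return hl, de, bc


if __name__ == "__main__":
    mem = bytearray(16)
    mem[0:3] = bytes([0x01, 0xE3, 0x02])   # source; 0xE3 is the "do not copy" byte
    mem[8:11] = bytes([0xAA, 0xAA, 0xAA])  # existing destination contents
    print(ldirx(mem, 0, 8, 3, 0xE3))       # (3, 11, 0)
    print(mem[8:11].hex())                 # 01aa02
```

The byte equal to A leaves the destination untouched, so A behaves like a transparent value during the copy.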

I have also thought about adding them via m4, but that is not ideal because all asm would have to be in an m4 file, and this would require changes in zcc. I could also do it as a pre-processing step with copt.

pauloscustodio Aug 10, 2017

Member

Yes. I'm currently working on the full Rabbit implementation, and it would be easy to add a new cpu-type to the input files.

You can have a look at the still incomplete parser generator:
https://github.com/z88dk/z88dk/blob/feature/z80asm_rabbit/src/z80asm/tools/cpu.def
https://github.com/z88dk/z88dk/blob/feature/z80asm_rabbit/src/z80asm/tools/cpu.pl

aralbrec Aug 11, 2017

Member

So it's possible to do in mainline z80asm now?

Then the assembler invocation would be "z80asm --cpu=cpuname". Have you thought of any naming to accommodate these devices? Maybe "z80asm --cpu=z80-zxn" (z80 variant of the ZX Next).

pauloscustodio Aug 11, 2017

Member

Yes. Give me a couple of days.

I agree with your suggestion for the naming. I assume this z80-zxn CPU does not have any of the z180 and rabbit opcodes and it has all the undocumented z80 opcodes. Correct?

aralbrec Aug 11, 2017

Member

Yes, everything z80 is in there; it's just the additional instructions, plus the ones yet to come.

pauloscustodio self-assigned this Aug 11, 2017

pauloscustodio Aug 11, 2017

Member

Just a note: MUL will be confusing:

  • in zxn it is coded as ED 30 and means HLDE := HL*DE
  • in r2k it is coded as F7 and means HLBC := BC*DE
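A small Python sketch of the two meanings shows why one mul mnemonic needs the CPU type to select an encoding: the source registers and the register that receives the low word both differ. The helper names are hypothetical. Both products are treated as unsigned here for simplicity; this thread does not say whether the Rabbit's MUL is signed, so check the Rabbit manual before relying on that. (In the updated Next list, ED 30 later became mul d,e, DE = D*E.)

```python
# Hypothetical comparison of the two MUL meanings discussed above.
# Both are treated as unsigned 16x16 -> 32-bit products for simplicity.

def mul_zxn_proposed(hl: int, de: int) -> tuple:
    """zxn proposal (ED 30): HLDE := HL*DE. Returns (new_hl, new_de)."""
    product = (hl & 0xFFFF) * (de & 0xFFFF)
    return product >> 16, product & 0xFFFF


def mul_r2k(bc: int, de: int) -> tuple:
    """Rabbit 2000 (F7): HLBC := BC*DE. Returns (new_hl, new_bc)."""
    product = (bc & 0xFFFF) * (de & 0xFFFF)
    return product >> 16, product & 0xFFFF


if __name__ == "__main__":
    # Same high word in HL, but the low word lands in DE on zxn and in BC on r2k.
    print(tuple(hex(v) for v in mul_zxn_proposed(0x1234, 0x0100)))  # ('0x12', '0x3400')
```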

feilipu Aug 12, 2017

Contributor

And, we're clear to use the MLT instruction with the z180 too?
It seems to generate correct code.

The MLT performs unsigned multiplication on two 8-bit numbers yielding a 16-bit result. MLT may specify BC, DE, HL, or SP registers. The 8-bit operands are loaded into each half of the 16-bit register and the 16-bit result is returned in that register.
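The MLT behaviour described above fits in a couple of lines. This Python sketch (hypothetical function name, flags ignored) splits the pair into its high and low bytes and returns the product that replaces the pair. It has the same shape as the Next's later mul d,e (DE = D*E).

```python
def mlt(rr: int) -> int:
    """z180 MLT rr: unsigned (high byte) * (low byte) of the pair rr.

    The 16-bit result replaces rr; 255 * 255 = 0xFE01 always fits.
    """
    high, low = (rr >> 8) & 0xFF, rr & 0xFF
    return high * low


if __name__ == "__main__":
    print(hex(mlt(0x0C0A)))  # 12 * 10 = 120 -> 0x78
```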

aralbrec Aug 12, 2017

Member

Yes z80asm has separate tables for z80, z80-zxn, z180, r2k and r3k.

aralbrec Aug 16, 2017

Member

Another update... sorry, this may be a frequent activity now, but people find it helpful to get it in.

New Z80 opcodes on the NEXT (more to come)
======================================================================================
T=4+        8T*   swapnib           ED 23            A bits 7-4 swap with A bits 3-0
T=4+        8T    mul d,e           ED 30            multiply DE = D*E (no flags set)
T=4+        8T    add hl,a          ED 31            Add A to HL (no flags set), not sign extended
T=4+        8T*   add de,a          ED 32            Add A to DE (no flags set), not sign extended
T=4+        8T*   add bc,a          ED 33            Add A to BC (no flags set), not sign extended
M=3+, T=4   16T   add hl,NNNN       ED 34 LO HI      Add NNNN to HL (no flags set)
M=3+, T=4   16T*  add de,NNNN       ED 35 LO HI      Add NNNN to DE (no flags set)
M=3+, T=4   16T*  add bc,NNNN       ED 36 LO HI      Add NNNN to BC (no flags set)
M=4+        16T   ldix              ED A4            As LDI, but if byte==A does not copy
M=4+        21T   ldirx             ED B4            As LDIR, but if byte==A does not copy
M=4+        16T*  lddx              ED AC            As LDD, but if byte==A does not copy, and DE is incremented
M=4+        21T*  lddrx             ED BC            As LDDR, but if byte==A does not copy, and DE is incremented
M=4+        14T   ldws              ED A5            (de)=(hl), l++, d++ for layer 2 vertical tile copy
M=4+        21T*  ldirscale         ED B6            As LDIRX, if(hl)!=A then (de)=(hl); HL_A'+=BC'; DE+=DE'; dec BC; Loop.
M=4+        12T*  ldpirx            ED B7            (de) = ( (hl&$fff8)+(E&7) ) when != A
T=4+        8T    mirror a          ED 24            mirror the bits in A
M=6+        22T*  push NNNN         ED 8A HI LO      push 16-bit immediate value (note big-endian order)

M=5+        16T*  nextreg reg,val   ED 91 reg,val    Set a NEXT register (like doing out($243b),reg then out($253b),val)
M=4+        12T*  nextreg reg,a     ED 92 reg        Set a NEXT register using A (like doing out($243b),reg then out($253b),A)
** reg,val are both 8-bit numbers

T=4+        8T    pixeldn           ED 93            Move down a line on the ULA screen
T=4+        8T    pixelad           ED 94            Using D,E (as Y,X), calculate the ULA screen address and store it in HL
T=4+        8T    setae             ED 95            Using the lower 3 bits of E (X coordinate), set the correct bit value in A
M=2+        11T   test NN           ED 27 NN         AND A with NN and set all flags. A is not affected.
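The ULA helpers are easiest to understand against the classic Spectrum screen layout. The Python sketch below is a model built on assumptions, not the hardware. It assumes that pixelad uses the standard 48K bitmap at 0x4000 with D=y (0..191) and E=x (0..255), that setae's mask puts the leftmost pixel in bit 7, and that pixeldn behaves like the well-known software routine (inc h; on a character-cell boundary add 32 to l and fix up h). mul d,e and mirror a are included for completeness. All function names are hypothetical.

```python
# Hypothetical models of some Next helper opcodes (flags not modelled).

def mul_de(de: int) -> int:
    """mul d,e (ED 30): DE = D*E, unsigned 8x8 -> 16 bits."""
    return ((de >> 8) & 0xFF) * (de & 0xFF)


def mirror(a: int) -> int:
    """mirror a (ED 24): reverse the bit order of A (bit 0 <-> bit 7, ...)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (a & 1)
        a >>= 1
    return result


def pixelad(y: int, x: int) -> int:
    """pixelad (ED 94): ULA bitmap address of pixel (x, y), assuming the 48K layout."""
    return (0x4000
            | ((y & 0xC0) << 5)   # screen third            -> address bits 11-12
            | ((y & 0x07) << 8)   # pixel row within a cell -> address bits 8-10
            | ((y & 0x38) << 2)   # cell row within a third -> address bits 5-7
            | (x >> 3))           # byte column             -> address bits 0-4


def setae(e: int) -> int:
    """setae (ED 95): bit mask for pixel column E & 7, bit 7 = leftmost (assumed)."""
    return 0x80 >> (e & 7)


def pixeldn(hl: int) -> int:
    """pixeldn (ED 93): move HL one pixel line down, like the classic software routine."""
    h, l = hl >> 8, hl & 0xFF
    h = (h + 1) & 0xFF
    if h & 0x07 == 0:            # crossed a character-cell boundary
        l += 32
        if l > 0xFF:             # carry: moved into the next screen third
            l &= 0xFF
        else:
            h = (h - 8) & 0xFF   # same third: undo the carry out of the row bits
    return (h << 8) | l


if __name__ == "__main__":
    print(hex(pixelad(191, 255)))  # 0x57ff
    print(hex(setae(13)))          # 0x4
```

A quick consistency check is that pixeldn(pixelad(y, x)) equals pixelad(y + 1, x) for every line except the last.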

** Instructions that are expected to be removed due to limited fpga space.
   They have been removed from z80asm in the current main branch.

"mul" "ld a32,dehl" "ld dehl,a32" "ex a32, dehl" "ld hl,sp" "inc dehl" "dec dehl" "add dehl,a"
"add dehl,bc" "add dehl,NN" "sub dehl,a" "sub dehl,bc" "popx" "fillde"

Memory mapping: specify which 8k RAM page is placed into
the corresponding 8k slot of the z80's 64k memory space.

Originally, the `mmuN` instructions (`mmu0` to `mmu7`) were intended as real
opcodes, but they have since been demoted to TBBLUE registers set via
`nextreg`. We're keeping them as macros that expand to `nextreg`.

16T*  mmu0 NN           ED 91 50 NN      macro: Ram page in slot 0-8k
16T*  mmu1 NN           ED 91 51 NN      macro: Ram page in slot 8k-16k
16T*  mmu2 NN           ED 91 52 NN      macro: Ram page in slot 16k-24k
16T*  mmu3 NN           ED 91 53 NN      macro: Ram page in slot 24k-32k
16T*  mmu4 NN           ED 91 54 NN      macro: Ram page in slot 32k-40k
16T*  mmu5 NN           ED 91 55 NN      macro: Ram page in slot 40k-48k
16T*  mmu6 NN           ED 91 56 NN      macro: Ram page in slot 48k-56k
16T*  mmu7 NN           ED 91 57 NN      macro: Ram page in slot 56k-64k

12T*   mmu0 a            ED 92 50         macro: Ram page in slot 0-8k
12T*   mmu1 a            ED 92 51         macro: Ram page in slot 8k-16k
12T*   mmu2 a            ED 92 52         macro: Ram page in slot 16k-24k
12T*   mmu3 a            ED 92 53         macro: Ram page in slot 24k-32k
12T*   mmu4 a            ED 92 54         macro: Ram page in slot 32k-40k
12T*   mmu5 a            ED 92 55         macro: Ram page in slot 40k-48k
12T*   mmu6 a            ED 92 56         macro: Ram page in slot 48k-56k
12T*   mmu7 a            ED 92 57         macro: Ram page in slot 56k-64k
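Since each mmuN form is just nextreg on register 0x50+N, the expansion can be written as a tiny encoder. The Python helpers below are hypothetical (they are not z80asm's code) and only restate the byte patterns from the tables above. push_imm and add_hl_imm are included to contrast the big-endian push NNNN operand with the usual little-endian add hl,NNNN.

```python
from typing import Optional

MMU_BASE_REG = 0x50  # mmu0..mmu7 -> NEXT registers 0x50..0x57 (per the table)


def _check_range(name: str, value: int, hi: int) -> None:
    if not 0 <= value <= hi:
        raise ValueError(f"{name} out of range 0..{hi}: {value}")


def nextreg(reg: int, val: Optional[int] = None) -> bytes:
    """nextreg reg,val -> ED 91 reg val; nextreg reg,a (val=None) -> ED 92 reg."""
    _check_range("reg", reg, 0xFF)
    if val is None:
        return bytes([0xED, 0x92, reg])
    _check_range("val", val, 0xFF)
    return bytes([0xED, 0x91, reg, val])


def mmu(slot: int, page: Optional[int] = None) -> bytes:
    """mmuN NN / mmuN a: nextreg on register 0x50+N (8k slot N)."""
    _check_range("slot", slot, 7)
    return nextreg(MMU_BASE_REG + slot, page)


def push_imm(value: int) -> bytes:
    """push NNNN -> ED 8A HI LO (big-endian operand)."""
    _check_range("value", value, 0xFFFF)
    return bytes([0xED, 0x8A, value >> 8, value & 0xFF])


def add_hl_imm(value: int) -> bytes:
    """add hl,NNNN -> ED 34 LO HI (little-endian operand, as usual on the z80)."""
    _check_range("value", value, 0xFFFF)
    return bytes([0xED, 0x34, value & 0xFF, value >> 8])


if __name__ == "__main__":
    print(mmu(3, 0x0A).hex())        # ed91530a
    print(mmu(7).hex())              # ed9257
    print(push_imm(0x1234).hex())    # ed8a1234
    print(add_hl_imm(0x1234).hex())  # ed343412
```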

* Times are guesses based on other instruction times. All of this is subject to change.

Pseudo-instructions for the copper unit have been defined so that copper instructions can be
generated within z80asm. The instructions are namespaced with a leading "cu."
so that they are not confused with regular z80 instructions. Each is 16 bits.

cu.wait VER,HOR    (0<=VER<=311, 0<=HOR<=55)
cu.move REG,VAL    (0<=REG<=127, 0<=VAL<=255)
cu.nop             (0x0000 equivalent to ignored "cu.move 0,0")
cu.stop            (0xffff equivalent to impossible "cu.wait 511,63")
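The copper encodings can be inferred from the ranges and the two fixed values given above: cu.nop is 0x0000 (cu.move 0,0) and cu.stop is 0xffff (cu.wait 511,63). That is consistent with a 16-bit word in which cu.wait is a 1 bit, a 6-bit horizontal position and a 9-bit line number, and cu.move is a 0 bit, a 7-bit register and an 8-bit value. This Python sketch encodes that inferred layout. The function names are hypothetical, and the byte order in which z80asm emits each word is not covered here.

```python
# Hypothetical encoder for the copper pseudo-instructions, using a bit layout
# inferred from the ranges above (not taken from z80asm's source).

def cu_wait(ver: int, hor: int) -> int:
    """cu.wait VER,HOR -> 1 | HOR (6 bits) | VER (9 bits)."""
    if not (0 <= ver <= 311 and 0 <= hor <= 55):
        raise ValueError(f"cu.wait {ver},{hor}: need 0<=VER<=311, 0<=HOR<=55")
    return 0x8000 | (hor << 9) | ver


def cu_move(reg: int, val: int) -> int:
    """cu.move REG,VAL -> 0 | REG (7 bits) | VAL (8 bits)."""
    if not (0 <= reg <= 127 and 0 <= val <= 255):
        raise ValueError(f"cu.move {reg},{val}: need 0<=REG<=127, 0<=VAL<=255")
    return (reg << 8) | val


CU_NOP = cu_move(0, 0)              # 0x0000
CU_STOP = 0x8000 | (63 << 9) | 511  # 0xffff, the "impossible" wait 511,63

if __name__ == "__main__":
    print(hex(CU_NOP), hex(CU_STOP))  # 0x0 0xffff
    print(hex(cu_wait(311, 55)))      # 0xef37
```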

Pseudo-instructions for the DMA unit have been defined so that DMA programs can be
generated within z80asm. The instructions are namespaced with a leading "dma." so that they
are not confused with regular z80 instructions. The parameters to the DMA instructions are
error checked.

dma.wr0 ...
dma.wr1 ...
dma.wr2 ...
dma.wr3 ...
dma.wr4 ...
dma.wr5 ...
dma.wr6 ... \
dma.cmd ... / aliases

This post gives an example dma program and this link gives technical information.

suborb Aug 16, 2017

Member

Any chance of an ld hl,(sp+dd) - that would be really useful.

And yes, swapping round to use dehl would allow us to use it without an ex.

aralbrec Aug 16, 2017

Member

They've changed the order to DEHL for us, very nice :) I'll update the instruction list above.

Any chance of an ld hl,(sp+dd) - that would be really useful.

It sounds like they are open to that too.

What is really missing?

ld r,(sp+d)
ld rp,(sp+d)
ld (sp+d),r
ld (sp+d),rp

ld ix,sp ;; mirrors ld hl,sp already added
ld iy,sp

add sp,nn

suborb Aug 16, 2017

Member

That's brilliant. Creating a zxn_rules.1 along the lines of the rabbit version should be easy and allow us to use some of those opcodes in a trivial way.

From the rabbit file, there are these extra addressing modes that are useful for C code:

ld hl,(sp+n) - n unsigned byte
ld ix,(sp+n)
ld iy,(sp+n)
ld (sp+n),hl
ld (sp+n),ix
ld (sp+n),iy
ld hl,(hl + d) - d is signed byte
ld hl,(ix + d)
ld hl,(iy + d)
add sp,d
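To show why these modes matter to a C compiler: on a stock z80, reading a 16-bit stack slot into hl typically takes a sequence like ld hl,n / add hl,sp / ld a,(hl) / inc hl / ld h,(hl) / ld l,a, whereas ld hl,(sp+n) does it in one instruction. A minimal Python model of the effective addresses might look like this. It assumes the usual little-endian word order and the signedness noted above (n unsigned, d signed); the helper names are hypothetical.

```python
# Hypothetical model of stack-relative / index-relative word loads and stores.

def signed8(d: int) -> int:
    """Interpret an 8-bit displacement byte as two's complement."""
    d &= 0xFF
    return d - 0x100 if d & 0x80 else d


def read16(mem: bytearray, addr: int) -> int:
    """Little-endian 16-bit read, wrapping at the top of the 64k space."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def write16(mem: bytearray, addr: int, value: int) -> None:
    """Little-endian 16-bit write, wrapping at the top of the 64k space."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


def ld_hl_sp_n(mem: bytearray, sp: int, n: int) -> int:
    """ld hl,(sp+n): n is an unsigned byte, so only slots at or above SP."""
    if not 0 <= n <= 0xFF:
        raise ValueError("n must be 0..255")
    return read16(mem, sp + n)


def ld_sp_n_hl(mem: bytearray, sp: int, n: int, hl: int) -> None:
    """ld (sp+n),hl."""
    if not 0 <= n <= 0xFF:
        raise ValueError("n must be 0..255")
    write16(mem, sp + n, hl)


def ld_hl_ix_d(mem: bytearray, ix: int, d: int) -> int:
    """ld hl,(ix+d): d is the signed displacement byte."""
    return read16(mem, ix + signed8(d))


def add_sp_d(sp: int, d: int) -> int:
    """add sp,d: signed 8-bit adjustment of SP."""
    return (sp + signed8(d)) & 0xFFFF


if __name__ == "__main__":
    mem = bytearray(0x10000)
    sp = 0xFF00
    ld_sp_n_hl(mem, sp, 4, 0x1234)              # store a "local" 4 bytes above SP
    print(hex(ld_hl_sp_n(mem, sp, 4)))          # 0x1234
    print(hex(ld_hl_ix_d(mem, 0xFF08, 0xFC)))   # ix-4 is the same slot: 0x1234
    print(hex(add_sp_d(sp, 0xF8)))              # sp-8: 0xfef8
```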

Supporting the other pairs instead of hl would be useful, but not hugely critical really.

From the rabbit file, it looks like I made quite a lot of use of bool hl, which basically turns hl into a boolean and sets the flags. Thus a comparison to zero is easy and a true boolean value is yielded. In the Rabbit world this is a single-byte opcode, which makes it particularly efficient.

These are useful:

and hl,de
or hl,de
xor hl,de

So: and|or|xor hl,NNNN to cut out the ld de,NNNN

neg hl
rr hl
rl hl
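These 16-bit helpers are simple to state precisely. This Python sketch assumes Rabbit-style semantics: bool hl sets HL to 1 if it is non-zero, neg hl is two's complement negation, and rr hl / rl hl rotate through the carry as a 17-bit quantity. Only the carry flag is modelled, and the function names are hypothetical.

```python
# Hypothetical model of Rabbit-style 16-bit helpers; only carry is modelled.
MASK16 = 0xFFFF


def bool_hl(hl: int) -> int:
    """bool hl: HL becomes 1 if it was non-zero, else 0."""
    return 1 if hl & MASK16 else 0


def and_hl_de(hl: int, de: int) -> int:
    return hl & de & MASK16


def or_hl_de(hl: int, de: int) -> int:
    return (hl | de) & MASK16


def xor_hl_de(hl: int, de: int) -> int:
    return (hl ^ de) & MASK16


def neg_hl(hl: int) -> int:
    """neg hl: two's complement negation of HL."""
    return (-hl) & MASK16


def rr_hl(hl: int, carry: int):
    """rr hl: rotate HL right through carry. Returns (new_hl, new_carry)."""
    new_carry = hl & 1
    return ((carry & 1) << 15) | ((hl & MASK16) >> 1), new_carry


def rl_hl(hl: int, carry: int):
    """rl hl: rotate HL left through carry. Returns (new_hl, new_carry)."""
    new_carry = (hl >> 15) & 1
    return ((hl << 1) | (carry & 1)) & MASK16, new_carry


if __name__ == "__main__":
    # With and|or|xor hl,NNNN the constant would not need loading into DE first.
    print(hex(and_hl_de(0xF0F0, 0xFF00)))   # 0xf000
    print(bool_hl(0x8000), hex(neg_hl(1)))  # 1 0xffff
```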

suborb Aug 16, 2017

Member

I've checked in a zxn_rules.1 file but haven't hooked it in yet; you'll need to add:

COPTRULESCPU    DESTDIR/lib/zxn_rules.1

to the zxn.cfg.

Test file attached to show it working (probably only with sccz80 at the moment).

Issue_312_zxn_optimisations.txt

aralbrec Aug 17, 2017

Member

Currently only one emulator accepts the opcodes, and it is only a partial emulation, so I think we should wait a bit before enabling the opcodes outside the assembler.

aralbrec Aug 19, 2017

Member

@pauloscustodio Any chance the new list above can be included soon?

A complete emulator (ZEsarUX) is going to implement the instructions so once that's there, I think it's safe to enable them in the entire toolchain.

pauloscustodio Aug 19, 2017

Member

I'm on it...

  1. Shouldn't fillde be fill de (as in verb-object)?
  2. Shouldn't test N be tst N to be the same as on the Z180?

pauloscustodio Aug 19, 2017

Member

I've committed the change to implement the current opcodes.
I've changed test to be the same as the Z180, tst N.

aralbrec Aug 20, 2017

Member

We don't have any control over the opcode names. They should probably match existing instructions on other z80 derivatives where the operation is the same, but all we can do is suggest.

test should be tst (z180)
swapnib should be swap (gameboy z80)

I'm not sure if there are any others in there.

pauloscustodio Aug 20, 2017

Member

I've committed the following variants of z80-zxn opcodes:

  • swap as synonym to swapnib
  • fill de as synonym to fillde
  • test as synonym to tst

aralbrec Aug 24, 2017

Member

New instructions that would help most directly would include stack relative addressing.

I looked at two sources for inspiration:

  1. The Rabbit family of processors which are based on the z80 and z180. They were
    also constrained by opcode space and added some instructions for their compiler.

https://github.com/z88dk/techdocs/blob/master/rabbit/RabbitInstructions.pdf

  2. The Z380 family by Zilog. They expanded the instruction set and have a full
    set of stack relative instructions as well as 16-bit logical, multiplication
    and division instructions, and finer-grained access to the exx set. I highly
    recommend that Victor look at how Zilog added the instructions while remaining
    compatible (e.g. they defined escape opcodes in the instruction listing) and
    borrow some of their names. Their degree of compatibility won't be as high
    because the z80 in the Next must also implement the crazy undocumented opcodes.

See Appendix B page 228 for an alphabetical listing:
https://github.com/z88dk/techdocs/blob/master/zilog/z380_cpu_um.pdf

Instructions that would help the C compilers most, given the constraints:

  • ld hl,(sp+n)
    ld rp,(sp+n) \
    ld r,(sp+n) / Take up opcode space, in z380

  • ld (sp+n),hl
    ld (sp+n),rp \
    ld (sp+n),r / Take up opcode space, in z380

  • add sp,d

  • ld ix,sp

More important if stack-relative addressing isn't possible:

  • ld hl,(ix+d)
    ld rp,(ix+d)

  • ld (ix+d),hl
    ld (ix+d),rp

In the above, "d" is a two's complement 8-bit number but "n"
is an unsigned 8-bit number. The z380 treats "n" as signed
for stack-relative addressing too, but it doesn't make much
sense to index below the stack pointer. If you're sharing
the ix+d logic with sp+n, then two's complement will happen.

The Rabbit constrained itself to these, but the z380 goes whole hog,
including forms with a constant NNNN operand:

andw hl,de
orw hl,de
xorw hl,de

negw hl
rlw de
rrw hl
rrw de

The rabbit added this:

bool hl (or rp)

to convert a non-zero value to 1.

If you can wait another week I will be rewriting the integer math so maybe
something will come out of that.

BTW, the "TEST" instruction seems to be the same as "TST" for z180 and later.
"SWAPNIB" has similar function to the gameboy's swap and the z380's SWAP.
It may be a good idea to use the same mnemonics where it makes sense.

aralbrec referenced this issue Aug 25, 2017: Ticks debugger #325 (Merged)

aralbrec Aug 27, 2017

Member

@pauloscustodio @suborb

A couple of new instructions also added to the main list above:

21T*   ldirscale         ED B6           As LDIRX,  if(hl)!=A then (de)=(hl); HL_A'+=BC'; DE+=DE'; dec BC; Loop.
14T*   ldpirx            ED B7           (de) = ( (hl&$fff8)+(E&7) ) when != A

* = guessing

ldirscale is going to scale a source graphic up or down in size. If DE' > 1 then there will be an exploding effect in the destination (pixels will be skipped).

ldpirx is intended as a pattern lookup for fills.
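The ldpirx description can be read as a pattern fill from an 8-byte aligned table: HL with its low 3 bits cleared gives the table base, and the low 3 bits of E pick the entry, so consecutive destination bytes cycle through the pattern. In this Python sketch the loop bookkeeping (BC counts down, DE increments, HL is left unchanged) is an assumption modelled on LDIRX, and the function name is hypothetical.

```python
def ldpirx(mem: bytearray, hl: int, de: int, bc: int, a: int):
    """Hypothetical model of ldpirx. Repeat until BC reaches zero:
         byte = mem[(HL & 0xFFF8) + (E & 7)]; if byte != A then (DE) = byte
         DE += 1; BC -= 1; HL is left unchanged (assumed)
    Returns the final (hl, de, bc).
    """
    while True:
        e = de & 0xFF
        byte = mem[(hl & 0xFFF8) + (e & 7)]
        if byte != a:
            mem[de] = byte
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc


if __name__ == "__main__":
    mem = bytearray(0x100)
    mem[0x40:0x48] = bytes([1, 0, 3, 4, 5, 6, 7, 8])  # aligned pattern; 0 = skip
    mem[0x80:0x90] = bytes([0xAA] * 16)              # existing destination row
    print(ldpirx(mem, 0x43, 0x80, 16, 0))            # (67, 144, 0)
    print(mem[0x80:0x90].hex())                      # 01aa03040506070801aa030405060708
```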

aralbrec reopened this Aug 27, 2017

aralbrec Mar 11, 2018

Member

I assume that here too the OR is not to be done.

In the rest of them the OR should be there. Those bits must be set to identify wr1, wr2, etc.

pauloscustodio added a commit that referenced this issue Mar 12, 2018

pauloscustodio added a commit that referenced this issue Mar 14, 2018

pauloscustodio added a commit that referenced this issue Mar 14, 2018

pauloscustodio added a commit that referenced this issue Mar 14, 2018

pauloscustodio added a commit that referenced this issue Mar 15, 2018

pauloscustodio Mar 15, 2018

Member

I've completed the implementation of the dma commands on branch feature/z80asm_zxn_dma.
Please give it a try before I merge.

The dma commands are available in all CPUs, not only z80-zxn, but I suppose some of the error messages are ZX Next-specific. Should the dma commands be available only in --cpu=z80-zxn?

aralbrec Mar 16, 2018

Member

That's great. I will give them a try now.

Yes, for now they should be z80-zxn specific, and I will open issues for z80 and z180. I don't know whether the Rabbits have DMA units. The z80 DMA is very close, but it has a few more features.

@aralbrec

aralbrec Mar 16, 2018

Member

Yes, I think it's working. I also tested a few externs in the argument list.

The example dma program from a few posts up:

include "include/zxn-dma.inc"

dma.wr0 __D_WR0_TRANSFER_A_TO_B | __D_WR0_X34_A_START | __D_WR0_X56_LEN, 0xc000, 6912  ;; 6912 was 0x4000
dma.wr1 __D_WR1_A_IS_MEM_INC | __D_WR1_X6_A_TIMING, __D_WR1X6_A_CLEN_2
dma.wr2 __D_WR2_B_IS_MEM_INC | __D_WR2_X6_B_TIMING, __D_WR2X6_B_CLEN_2
dma.wr4 __D_WR4_CONT | __D_WR4_X23_B_START, 0x4000
dma.wr5 __D_WR5_STOP
dma.cmd __D_LOAD
dma.cmd __D_ENABLE_DMA
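As a hypothetical illustration of how an argument list like the one in dma.wr0 above could expand into bytes, here is a Python sketch that follows the Z80 DMA convention: WR0 bits D3..D6 announce which follow-on bytes come next (port A start low/high, block length low/high), and each 16-bit value is emitted low byte first. The base byte 0x7D used below is an assumption for illustration (A->B transfer with all four follow-on bytes). The real values of the __D_WR0_* constants come from zxn-dma.inc.

```python
def encode_wr0(base: int, port_a_start: int = 0, block_len: int = 0) -> bytes:
    """Expand a WR0 base byte plus its arguments into the bytes sent to the DMA."""
    out = [base & 0xFF]
    # Each "follows" bit in the base byte selects one extra byte, in this order.
    followers = [
        (0x08, port_a_start & 0xFF),         # D3: port A start address, low byte
        (0x10, (port_a_start >> 8) & 0xFF),  # D4: port A start address, high byte
        (0x20, block_len & 0xFF),            # D5: block length, low byte
        (0x40, (block_len >> 8) & 0xFF),     # D6: block length, high byte
    ]
    for bit, value in followers:
        if base & bit:
            out.append(value)
    return bytes(out)


if __name__ == "__main__":
    # Roughly: dma.wr0 <A->B, A start, length>, 0xc000, 6912
    print(" ".join(f"{b:02x}" for b in encode_wr0(0x7D, 0xC000, 6912)))
```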

Also, can you add pop x, with a space before the x? This is going to be the official mnemonic. At the moment we have popx without the space.

@pauloscustodio

pauloscustodio Mar 16, 2018

Member

Please create a new issue in the future; with all the re-opening it gets hard to track what was done when.
I will create an issue for pop x.

@spth

spth May 11, 2018

The Z180 multiplication instructions have also been quite useful in SDCC, so having them (or at least one of them, preferably the one for hl) would help. After all, an 8x8->16 multiplication is often sufficient. It seems even more useful than the Rabbit's 16x16->32 one, since the 16x16->32 multiply makes too many registers unavailable for other purposes.

Philipp

@aralbrec

aralbrec May 12, 2018

Member

They only have the one for de; is that still useful without excessive shuffling to hl?

We've started on some integer multiplies for the target library here.

I would like to see them add signed versions of some of these instructions (add hl,a; mul d,e) and bring back the 16x16->32 bit multiply they dropped due to limited FPGA space. But this is unlikely to come from the core team itself; it would more likely be a patch proposed by reviewers, so whether it will or can happen is an open question.
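To make the signed/unsigned distinction concrete, here is a Python sketch. It models add hl,a as documented (A zero-extended, 16-bit wrap) next to a hypothetical sign-extending variant. It also shows the usual software correction that turns the unsigned 8x8->16 product from mul d,e into a signed one. The signed instruction variants do not exist; the function names are illustrative.

```python
def add_hl_a(hl: int, a: int) -> int:
    """add hl,a as documented: A is zero-extended, no flags, 16-bit wrap."""
    return (hl + (a & 0xFF)) & 0xFFFF


def add_hl_a_signed(hl: int, a: int) -> int:
    """Hypothetical signed variant: A is sign-extended to 16 bits first."""
    a &= 0xFF
    delta = a - 0x100 if a & 0x80 else a
    return (hl + delta) & 0xFFFF


def mul_de_signed(d: int, e: int) -> int:
    """Signed 8x8->16 product built from the unsigned one (mul d,e).

    A negative signed byte reads 256 too large as unsigned, so subtract
    (other operand << 8) per negative operand; the 65536 cross term vanishes
    modulo 2^16.
    """
    d &= 0xFF
    e &= 0xFF
    product = d * e              # what mul d,e leaves in DE
    if d & 0x80:
        product -= e << 8
    if e & 0x80:
        product -= d << 8
    return product & 0xFFFF
```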

@spth

spth May 12, 2018

Where can I find the instruction set?
http://devnext.referata.com/wiki/Extended_Z80_instruction_set
That page mentions the 16x16->32 multiply, but not the 8x8->16 one.
Also, it seems quite unfortunate that there is no equivalent of the Rabbit's 16-bit load instructions ld hl, (hl+d), ld hl, (ix+d)/ld hl, (sp+d) and ld (sp+d), hl/ld (ix+d), hl. Those really make a big difference for 16-bit code.

mlt de is fine; the advantage of mlt hl over mlt de should be very small.

Philipp

@aralbrec

aralbrec May 13, 2018

Member

The most accurate information is in z88dk. The wiki is maintained by the community, who don't keep up with (or don't have access to) some of the information.

#312 (comment)

@spth

spth May 25, 2018

So here are my comments on the instructions in #312 (comment), from a compiler writer's perspective (SDCC). I'll comment on how useful I consider the new instructions, and suggest renaming some (since they already exist in other Z80 derivatives under different names). I will make a later post with suggestions for additional instructions.

T=4+           8T*    swapnib           ED 23           A bits 7-4 swap with A bits 3-0

This one is useful for speeding up some shifts. SDCC already emits this instruction for gbz80. For consistency with gbz80, I suggest renaming this instruction "swap" (as it is called on the Game Boy).
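A small Python model of why swapnib helps with shifts: swapping the nibbles and then masking (an ordinary and n) gives a 4-bit shift in two instructions instead of four single-bit shifts. The function names are illustrative.

```python
def swapnib(a: int) -> int:
    """Exchange bits 7-4 and bits 3-0 of A."""
    a &= 0xFF
    return ((a << 4) | (a >> 4)) & 0xFF


def shl4(a: int) -> int:
    """A << 4 (low nibble moved up): swapnib, then and 0xF0."""
    return swapnib(a) & 0xF0


def shr4(a: int) -> int:
    """Logical A >> 4: swapnib, then and 0x0F."""
    return swapnib(a) & 0x0F
```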

T=4+           8T     mul d,e           ED 30           multiply DE = D*E (no flags set)

This one is very useful, both for the very common 8x8->16 multiplications (either explicit or for array addressing) and as a building block for the support routines for wider multiplications. SDCC already emits this instruction for z180. I suggest renaming this instruction "mlt de" (as it is called on the Z180).
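As a sketch of the "building block" use, here is a Python model in which mul8 stands for mul d,e / mlt de (unsigned 8x8->16). Wider products are assembled from partial products: three 8x8 multiplies give the low 16 bits of a 16x16 product (a C int multiply), and four give the full 32-bit result. This is the standard schoolbook decomposition, not SDCC's actual support routine.

```python
def mul8(d: int, e: int) -> int:
    """Model of mul d,e / mlt de: unsigned 8x8 -> 16-bit product."""
    return (d & 0xFF) * (e & 0xFF)


def mul16_low(x: int, y: int) -> int:
    """Low 16 bits of x*y from three 8x8 products (xh*yh only affects bits 16+)."""
    xh, xl = (x >> 8) & 0xFF, x & 0xFF
    yh, yl = (y >> 8) & 0xFF, y & 0xFF
    cross = mul8(xh, yl) + mul8(xl, yh)   # only its low byte survives the << 8
    return (mul8(xl, yl) + (cross << 8)) & 0xFFFF


def mul16_full(x: int, y: int) -> int:
    """Full unsigned 16x16 -> 32-bit product from four 8x8 products."""
    xh, xl = (x >> 8) & 0xFF, x & 0xFF
    yh, yl = (y >> 8) & 0xFF, y & 0xFF
    return (mul8(xh, yh) << 16) + ((mul8(xh, yl) + mul8(xl, yh)) << 8) + mul8(xl, yl)
```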

T=4+           8T     add  hl,a         ED 31           Add A to HL (no flags set) not sign extended
T=4+           8T*    add  de,a         ED 32           Add A to DE (no flags set) not sign extended
T=4+           8T*    add  bc,a         ED 33           Add A to BC (no flags set) not sign extended

I do not have experience with such instructions, but I believe they will be useful, e.g. for using an 8-bit index into a char array.

M=3+, T=4           16T    add  hl,NNNN    ED 34 LO HI     Add NNNN to HL (no flags set)
M=3+, T=4           16T*   add  de,NNNN     ED 35 LO HI     Add NNNN to DE (no flags set)

I don't see much point in those; the only benefit is reducing register pressure a bit. I don't see a good use case in SDCC that benefits from not setting flags, so they are not really an advantage over something like

ld de, NNNN
add hl, de

This is the same code size as the first proposed new instruction. Using ex de, hl (e.g. ld hl, NNNN; add hl, de; ex de, hl), the same approach can also replace the second proposed new instruction at just one byte of extra code size.

M=3+, T=4           16T*   add  bc,NNNN     ED 36 LO HI     Add NNNN to BC (no flags set)

This one seems a bit more useful, since transferring the addition result from hl to bc would take two bytes. However, again I don't see an advantage in not setting flags. In fact, if this one set flags (or were changed to adc), it could be even more useful as a building block for 32- and 64-bit additions.
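Here is a Python sketch of why flags matter for wider additions. A 32-bit add is a 16-bit add whose carry feeds a 16-bit add-with-carry (on the Z80, an add for the low word followed by an adc for the high word). If the low half came from a flag-less add such as add bc,NNNN, the carry would be lost and extra code would be needed to recover it. The function names are illustrative.

```python
def add16(x: int, y: int, carry_in: int = 0):
    """16-bit add/adc: returns (result, carry_out)."""
    total = (x & 0xFFFF) + (y & 0xFFFF) + carry_in
    return total & 0xFFFF, total >> 16


def add32(a: int, b: int) -> int:
    """32-bit add from two 16-bit halves, propagating carry (add, then adc)."""
    lo, carry = add16(a & 0xFFFF, b & 0xFFFF)
    hi, _ = add16(a >> 16, b >> 16, carry)
    return (hi << 16) | lo


def add32_flagless_low(a: int, b: int) -> int:
    """Same, but the low-half add sets no carry (like add bc,NNNN): can be wrong."""
    lo, _ = add16(a & 0xFFFF, b & 0xFFFF)   # carry is discarded
    hi, _ = add16(a >> 16, b >> 16)
    return (hi << 16) | lo
```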

M=4+           16T*      outinb            ED 90           out (c),(hl), hl++

I can see the point in this one, even though SDCC would not emit it (but it looks good for use in asm code).

M=4+           16T    ldix              ED A4           As LDI,  but if byte==A does not copy
M=4+           21T    ldirx             ED B4           As LDIR, but if byte==A does not copy
M=4+           16T*   lddx              ED AC           As LDD,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   lddrx             ED BC           As LDDR,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   ldirscale         ED B6           As LDIRX,  if(hl)!=A then (de)=(hl); HL_A'+=BC'; DE+=DE'; dec BC; Loop.
M=4+           12T*   ldpirx            ED B7           (de) = ( (hl&$fff8)+(E&7) ) when != A

I currently do not see the point in those. They look like they were made for some specific use case that I do not know about. Do they really just skip a byte if it equals A? (If they stopped instead, they might be useful for string processing.)

T=4+           8T     mirror a          ED 24           mirror the bits in A     
T=4+           8T     mirror de         ED 26           mirror the bits in DE     

Those don't look that useful to me. I can see that mirroring bits is hard to do without them in the Z80 instruction set, but in my experience the need for mirroring bits is very rare. Also, SDCC would not be able to detect C code that mirrors bits easily, so it would not emit those.
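For reference, here is a Python model of what mirror a computes, written the way a software fallback would do it: swap nibbles, then bit pairs, then single bits. It also includes mirror de. The table above does not say whether mirror de reverses all 16 bits as one value, as assumed here, or each byte separately.

```python
def mirror8(a: int) -> int:
    """mirror a: reverse the bit order of an 8-bit value."""
    a &= 0xFF
    a = ((a & 0xF0) >> 4) | ((a & 0x0F) << 4)   # swap nibbles (what swapnib does)
    a = ((a & 0xCC) >> 2) | ((a & 0x33) << 2)   # swap bit pairs
    a = ((a & 0xAA) >> 1) | ((a & 0x55) << 1)   # swap adjacent bits
    return a


def mirror16(de: int) -> int:
    """mirror de, assumed here to reverse all 16 bits: mirrored bytes, swapped."""
    return (mirror8(de & 0xFF) << 8) | mirror8((de >> 8) & 0xFF)
```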

M=6+           22T*   push NNNN        ED 8A HI LO     push 16bit immediate value note big endian order

I don't see much point in this one. The only advantage it provides is reducing register pressure. Otherwise it provides no advantage over

ld qq, NNNN
push qq

That is also just 4 bytes of code.

M=3+           8T*    pop x            ED 8B           discard word on stack (inc sp; inc sp)

I don't see the point at all.

inc sp
inc sp

This does exactly the same, at exactly the same cost in code size.
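The size and speed comparisons above can be checked with a little arithmetic. The Python sketch below uses standard Z80 figures (ld rr,nn 3 bytes/10T, add hl,rr 1 byte/11T, ex de,hl 1 byte/4T, push qq 1 byte/11T, inc sp 1 byte/6T) and the ZX Next sizes and timings from the table quoted above. It only totals numbers and does not model which registers each sequence clobbers.

```python
# (bytes, T-states) per plain Z80 instruction.
Z80 = {
    "ld rr,nn": (3, 10),
    "add hl,rr": (1, 11),
    "ex de,hl": (1, 4),
    "push qq": (1, 11),
    "inc sp": (1, 6),
}
# (bytes, T-states) for the ZX Next instructions, from the table above.
ZXN = {
    "add hl,NNNN": (4, 16),
    "add de,NNNN": (4, 16),
    "push NNNN": (4, 22),
    "pop x": (2, 8),
}
# Plain-Z80 replacement sequence for each ZX Next instruction.
EQUIVALENTS = {
    "add hl,NNNN": ["ld rr,nn", "add hl,rr"],              # ld de,NNNN / add hl,de
    "add de,NNNN": ["ld rr,nn", "add hl,rr", "ex de,hl"],  # ld hl,NNNN / add hl,de / ex de,hl
    "push NNNN": ["ld rr,nn", "push qq"],                  # ld qq,NNNN / push qq
    "pop x": ["inc sp", "inc sp"],
}


def cost(sequence):
    """Total (bytes, T-states) of a list of plain Z80 instructions."""
    return tuple(sum(Z80[op][i] for op in sequence) for i in (0, 1))


if __name__ == "__main__":
    for zxn_op, seq in EQUIVALENTS.items():
        print(f"{zxn_op:12} {ZXN[zxn_op]}  vs  {' / '.join(seq)} {cost(seq)}")
```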

M=5+           16T*   nextreg reg,val   ED 91 reg,val   Set a NEXT register (like doing out($243b),reg then out($253b),val)
M=4+           12T*   nextreg reg,a     ED 92 reg       Set a NEXT register using A (like doing out($243b),reg then out($253b),A )
** reg,val are both 8-bit numbers

T=4+           8T   pixeldn           ED 93           Move down a line on the ULA screen
T=4+           8T   pixelad           ED 94           using D,E (as Y,X) calculate the ULA screen address and store in HL
T=4+           8T   setae             ED 95           Using the lower 3 bits of E (X coordinate), set the correct bit value in A

These look like features specific to the device's peripherals. Is it really worth spending opcodes on them, as opposed to using some I/O location?

M=2+          11T   test NN           ED 27 NN       And A with NN and set all flags. A is not affected.

This can be useful for some code, but such code is not that common. SDCC already emits this instruction for z180. I suggest renaming it "tst" (as it is called on the Z180).

Summary:

Great, very useful in SDCC:
mul d, e/mlt de

Useful for SDCC:
swapnib/swap a
add hl,a
add de,a
add bc,a
add bc,NNNN
test/tst NN

Marginally useful for SDCC:
push NNNN
add de,NNNN

Nearly useless for SDCC:
add hl,NNNN
mirror a
mirror de
pop x

@spth

spth May 25, 2018

Having worked on various SDCC backends, including all the Z80-related ones (gbz80, z180, r2k, r3ka, tlcs90), I noticed some instructions being particularly useful and making a big difference in code size and speed. In particular, there are some Rabbit instructions used by SDCC that result in much lower code size for the r2k/r3ka backends than for the z80 backend. If possible, I'd like to see some of them implemented in the zxn.

ld hl, (hl+NN)

Load hl with the value at the address given by the sum of hl and an 8-bit offset (unsigned is preferable, but it doesn't matter much).
This instruction, present in the Rabbits, is very useful when working with pointers. Pointers are very common in C code, sometimes explicit, sometimes implicit. Example use cases: reading a 16-bit value through a pointer, reading a 16-bit value at a fixed offset into an array, and reading a 16-bit value from a member of a pointed-to struct (e.g. when traversing linked lists).
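Here is a Python sketch of the linked-list use case. A 64 KiB bytearray stands in for memory, and ld_hl_hl_off models the Rabbit-style ld hl,(hl+NN) as described above: a little-endian 16-bit load from hl plus an unsigned 8-bit offset. The node layout (value at offset 0, next pointer at offset 2) and all helper names are assumptions for illustration.

```python
VALUE, NEXT = 0, 2   # assumed layout: struct node { int value; struct node *next; }


def poke16(mem: bytearray, addr: int, value: int) -> None:
    """Store a 16-bit value little-endian, as the Z80 does."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


def ld_hl_hl_off(mem: bytearray, hl: int, offset: int) -> int:
    """Model of ld hl,(hl+NN): load the word at hl + unsigned 8-bit offset."""
    addr = (hl + (offset & 0xFF)) & 0xFFFF
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)


def walk(mem: bytearray, head: int) -> list:
    """Traverse the list, reading each field with the ld hl,(hl+NN) model."""
    values = []
    hl = head
    while hl != 0:
        values.append(ld_hl_hl_off(mem, hl, VALUE))
        hl = ld_hl_hl_off(mem, hl, NEXT)   # hl = hl->next
    return values


def build_demo():
    """Three nodes at 0x8000, 0x8010, 0x8020 holding 10, 20, 30."""
    mem = bytearray(0x10000)
    nodes = [0x8000, 0x8010, 0x8020]
    for i, addr in enumerate(nodes):
        poke16(mem, addr + VALUE, (i + 1) * 10)
        poke16(mem, addr + NEXT, nodes[i + 1] if i + 1 < len(nodes) else 0)
    return mem, nodes[0]


if __name__ == "__main__":
    mem, head = build_demo()
    print(walk(mem, head))  # [10, 20, 30]
```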

ld (sp+N), hl
ld (sp+N), de
ld (sp+N), bc
ld (ix+d), hl
ld (ix+d), de
ld (ix+d), bc
ld hl, (sp+N)
ld de, (sp+N)
ld bc, (sp+N)
ld hl, (ix+d)
ld de, (ix+d)
ld bc, (ix+d)

The ix variant is essentially an alternative to the sp variant. Implementing both probably doesn't make that much sense. These instructions are present in the Rabbit.

In C, variables (and function arguments) that cannot be allocated to registers are placed on the stack. Using ix as a frame pointer is an OK-ish way of accessing the stack for 8-bit variables, but it still comes with too much overhead. These instructions allow efficient transfer of 16-bit values between registers and the stack.

bool hl

This instruction, present in the Rabbit, casts the value in hl to bool and sets the flags accordingly (the z flag is what really matters).
The instruction has a variety of uses, the first obviously being casts to bool. It also helps a lot with testing 16-bit values (including pointers) for zero, which is quite common. Another use is efficiently zeroing h (e.g. before loading an 8-bit value into l and then adding some 16-bit value held in de or bc).
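Here is a Python model of bool hl as described here: hl becomes 1 if it was non-zero and 0 otherwise, with Z reflecting the result. For comparison, it also models the usual plain-Z80 zero test ld a,h / or l, which sets Z but neither produces a 0/1 value nor clears h. Only HL, A and the Z flag are modelled; other flag effects are left out.

```python
def bool_hl(hl: int):
    """bool hl: returns (new_hl, z_flag); new_hl is 1 if hl was non-zero, else 0."""
    result = 1 if hl & 0xFFFF else 0
    return result, result == 0


def z80_test_hl_zero(hl: int):
    """ld a,h / or l: returns (a, z_flag); HL itself is left unchanged."""
    a = ((hl >> 8) | hl) & 0xFF
    return a, a == 0
```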

sex gg

This instruction (not implemented in any of the architectures currently supported by SDCC) would, for a 16-bit register pair gg, sign-extend the value in the lower 8 bits into the full 16-bit register. Even having this instruction for just one register pair out of hl, de, bc would be very useful.
For efficiency, 8-bit values are used a lot on 8-bit systems. But sometimes 16 bits are needed for range; the Z80 has 16-bit addresses, and the C standard sometimes requires promotion to int. Thus 8-bit values often need to be cast to 16-bit values. For unsigned values this is easily done by zero-extending, but for signed values one has to generate relatively complex code. Having direct support in the instruction set would be quite useful; the instruction could also be used as a building block for wider casts (e.g. 8 to 32 bits, 16 to 32 bits, or to 64 bits).
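Here is a Python sketch comparing a model of the proposed (hypothetical) sex gg with the common plain-Z80 idiom for sign-extending l into hl. The idiom is ld a,l / add a,a / sbc a,a / ld h,a: 4 bytes, and it clobbers A and the flags. The sketch also shows the same idea used for a wider 16-to-32-bit cast. Function names are illustrative.

```python
def sex_gg(gg: int) -> int:
    """Hypothetical 'sex gg': sign-extend the low byte of gg into 16 bits."""
    low = gg & 0xFF
    return low | 0xFF00 if low & 0x80 else low


def z80_sign_extend_l(hl: int) -> int:
    """Plain Z80: ld a,l / add a,a / sbc a,a / ld h,a."""
    l = hl & 0xFF
    carry = (l + l) > 0xFF          # add a,a: carry <- old bit 7 of L
    h = 0xFF if carry else 0x00     # sbc a,a: 0 - 0 - carry
    return (h << 8) | l


def widen16_to_32(x: int) -> int:
    """Wider cast built the same way: high word is all ones iff bit 15 is set."""
    x &= 0xFFFF
    return x | 0xFFFF0000 if x & 0x8000 else x
```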

add sp, d

This instruction, present on the Rabbit, adds a signed 8-bit value to the stack pointer.
After a function call with stack parameters, the stack pointer needs to be adjusted. The same applies on entry to and exit from functions that store local variables on the stack. Adjusting the stack pointer is thus a very common task. Unfortunately, on the Z80 doing so is quite complex (except for small values, where inc/dec sp and push/pop can be used).
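To make "quite complex" concrete, here is a rough Python size estimate for moving SP by n bytes on a plain Z80. One option is one pop/push per 2 bytes plus an inc sp/dec sp for an odd byte (1 byte each, but a free register pair must absorb or supply the data). The other is ld hl,n / add hl,sp / ld sp,hl (5 bytes, clobbering HL). Preserving a return value in HL is modelled here as an extra ex de,hl before and after. The 2-byte size for add sp,d (opcode plus displacement) is an assumption, and the function names are illustrative.

```python
def z80_sp_adjust_bytes(n: int, hl_live: bool = False) -> int:
    """Smallest plain-Z80 code size (bytes) to move SP by n, in either direction."""
    n = abs(n)
    via_pops = n // 2 + n % 2             # pop/push rr per word, inc/dec sp for an odd byte
    via_hl = 5 + (2 if hl_live else 0)    # ld hl,n / add hl,sp / ld sp,hl (+ ex de,hl twice)
    return min(via_pops, via_hl)


def add_sp_d_bytes(n: int) -> int:
    """Assumed size of a Rabbit-style add sp,d: 2 bytes for -128 <= n <= 127."""
    if not -128 <= n <= 127:
        raise ValueError("displacement out of range")
    return 2
```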

Philipp

@spth

spth May 25, 2018

I've just done a quick test of how often some of the proposed instructions are actually used by SDCC (by compiling the SDCC regression tests for gbz80, z180 and r2k). Of course, for the total effect one needs to consider more than just frequency (after all, a rare instruction could save a lot of code at each of the few places where it can be used). Still, the data seems helpful.

tst [z180]: 2
bool [r2k]: 14
ld hl, d(iy) [r2k]: 26
ld d(ix), hl [r2k]: 39
mlt [z180]: 103
swap [gbz80]: 141
ld hl, d(ix) [r2k]: 143
ld hl, d(hl) [r2k]: 211
ld d(sp), hl [r2k]: 1361
ld hl, d(sp) [r2k]: 3969
add sp, d [r2k]: 15281

Philipp

@aralbrec

aralbrec May 26, 2018

Member

The special instructions:

M=4+           16T    ldix              ED A4           As LDI,  but if byte==A does not copy
M=4+           21T    ldirx             ED B4           As LDIR, but if byte==A does not copy
M=4+           16T*   lddx              ED AC           As LDD,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   lddrx             ED BC           As LDDR,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   ldirscale         ED B6           As LDIRX,  if(hl)!=A then (de)=(hl); HL_A'+=BC'; DE+=DE'; dec BC; Loop.
M=4+           12T*   ldpirx            ED B7           (de) = ( (hl&$fff8)+(E&7) ) when != A

T=4+           8T     mirror a          ED 24           mirror the bits in A     
T=4+           8T     mirror de         ED 26           mirror the bits in DE     

T=4+           8T   pixeldn           ED 93           Move down a line on the ULA screen
T=4+           8T   pixelad           ED 94           using D,E (as Y,X) calculate the ULA screen address and store in HL
T=4+           8T   setae             ED 95           Using the lower 3 bits of E (X coordinate), set the correct bit value in A

are specifically for games and graphics.

The ldix family of instructions is for copying graphics while skipping over transparent bytes.
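Here is a Python model of ldirx following the table entry above ("As LDIR, but if byte==A does not copy"). HL, DE and BC advance exactly as for LDIR, and a source byte equal to A (the transparent value) leaves the destination byte untouched. Only the documented behaviour is modelled, not timing. The transparent value 0xE3 in the example is arbitrary.

```python
def ldirx(mem: bytearray, hl: int, de: int, bc: int, a: int):
    """Copy BC bytes from (HL) to (DE), skipping bytes equal to A.

    Returns the final (hl, de, bc). As with LDIR, BC = 0 on entry means 65536.
    """
    while True:
        byte = mem[hl]
        if byte != (a & 0xFF):
            mem[de] = byte          # opaque byte: copy it
        # transparent or not, both pointers advance and the count drops
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x0000:0x0004] = bytes([1, 0xE3, 2, 0xE3])   # sprite row, 0xE3 = transparent
    mem[0x0100:0x0104] = bytes([9, 9, 9, 9])         # background already on screen
    ldirx(mem, 0x0000, 0x0100, 4, 0xE3)
    print(list(mem[0x0100:0x0104]))  # [1, 9, 2, 9]
```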

ldirscale is for exploded sprites; the additions implement fixed-point adjustments to the display position and source address.

mirror is for reversing images; pixeldn / pixelad / setae are very specialized for the Spectrum's native display file organization.
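The Spectrum display-file layout these instructions hide can be sketched in Python. pixelad is modelled with the well-known ULA address formula: y bits 7-6 select the screen third, bits 2-0 the pixel line within a character cell, bits 5-3 the character row, and x bits 7-3 the byte column. pixeldn is modelled with the classic software "next line down" routine, and setae as producing the pixel mask with bit 7 as the leftmost pixel. These models are assumptions based on the one-line descriptions in the table, not a specification of the Next hardware.

```python
SCREEN = 0x4000  # start of the ULA display file


def pixelad(d: int, e: int) -> int:
    """HL = display-file address of pixel (x=E, y=D), y in 0..191."""
    y, x = d & 0xFF, e & 0xFF
    return (SCREEN
            | ((y & 0xC0) << 5)   # screen third -> address bits 12-11
            | ((y & 0x07) << 8)   # pixel line within the cell -> bits 10-8
            | ((y & 0x38) << 2)   # character row -> bits 7-5
            | (x >> 3))           # byte column -> bits 4-0


def pixeldn(hl: int) -> int:
    """Address of the same byte one pixel line further down the screen."""
    h, l = (hl >> 8) & 0xFF, hl & 0xFF
    h += 1                        # next pixel line within the character cell
    if h & 0x07 == 0:             # left the cell: move to the next character row
        l += 0x20
        if l > 0xFF:              # carry: crossed into the next screen third
            l &= 0xFF
        else:
            h -= 0x08             # same third: undo the carry from inc h into the third bits
    return (h << 8) | l


def setae(e: int) -> int:
    """A = pixel mask for x = E; bit 7 is the leftmost pixel of the byte."""
    return 0x80 >> (e & 0x07)
```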

nextreg is for controlling the hardware state of the machine. It is very useful in practice; I would say it's one of the better additions.

I wouldn't expect the compiler to generate any of the above; however, they would appear in the libraries and in user code.

I agree pop x really doesn't have much use; it would make sense to replace it with add sp,d. As for push nnnn, I would keep an eye out for uses such as pushing constants onto the stack for function calls.

There are some other issues with, e.g., the 8x8->16 multiply: there are requests for a signed counterpart, and for bringing back the 32-bit multiply, which was found to be very useful for fixed-point calculations.

The added instructions do not affect flags because they are implemented outside the Z80 ALU, but I do agree that many would be more useful if they did.

Available space on the FPGA also limits what can be added. We'll see what happens; there is a deadline approaching.

@spth

spth May 26, 2018

There won't be much use for push NNNN as long as there is at least one free register pair. It is a 4-byte, 22T instruction. Using two old Z80 instructions (ld qq, NNNN; push qq) also takes 4 bytes, and at 21T is actually faster!

Philipp

@feilipu

feilipu May 26, 2018

Contributor

And just to note (because it is buried in lots of comments): the NNNN in push NNNN is stored big-endian, too. I'll be keeping a safe distance from that very bodged opcode.
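The byte-order difference matters for anything that emits or parses this opcode. Here is a Python sketch of what an assembler has to produce: push NNNN is ED 8A followed by the high byte and then the low byte (per the opcode table quoted in this thread). Every ordinary 16-bit immediate, e.g. ld bc,NNNN (opcode 01), is little-endian. The function names are illustrative.

```python
import struct


def encode_push_nnnn(nn: int) -> bytes:
    """ZX Next push NNNN: ED 8A HI LO (big-endian immediate)."""
    return bytes([0xED, 0x8A]) + struct.pack(">H", nn & 0xFFFF)


def encode_ld_bc_nnnn(nn: int) -> bytes:
    """Standard Z80 ld bc,NNNN: 01 LO HI (little-endian immediate)."""
    return bytes([0x01]) + struct.pack("<H", nn & 0xFFFF)
```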

@spth's work showing the use of add sp,d looks very promising.
Perhaps that could be a strong proposal from this community?
Can we assist / vote / agitate anywhere?

@suborb

suborb May 26, 2018

Member

add sp,d would be usable from both sdcc and sccz80, and would stop sccz80 from having to jump through hoops to preserve the return value when releasing a large stack frame.
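
Presumably the hoops come from the usual way of dropping a frame too large for a few pops, ld hl,n; add hl,sp; ld sp,hl, which goes through HL. The hypothetical Python model below assumes, as is common for Z80 C compilers, that the return value is in HL, and contrasts that idiom with add sp,d, which adjusts SP alone. It also assumes d is a signed 8-bit displacement, as on the Rabbit.

```python
MASK16 = 0xFFFF


class Regs:
    """Hypothetical minimal register model: just SP and HL."""

    def __init__(self, sp: int, hl: int):
        self.sp, self.hl = sp, hl


def release_frame_classic(r: Regs, n: int) -> None:
    """ld hl,n ; add hl,sp ; ld sp,hl
    SP ends up correct, but HL (assumed to hold the return value) is overwritten."""
    r.hl = n & MASK16
    r.hl = (r.hl + r.sp) & MASK16
    r.sp = r.hl


def release_frame_add_sp(r: Regs, d: int) -> None:
    """add sp,d: SP += signed 8-bit displacement; HL is left untouched."""
    if not -128 <= d <= 127:
        raise ValueError("d must fit in a signed byte")
    r.sp = (r.sp + d) & MASK16


a = Regs(sp=0xFF00, hl=0x002A)  # 0x2A stands in for a return value
release_frame_classic(a, 100)
b = Regs(sp=0xFF00, hl=0x002A)
release_frame_add_sp(b, 100)
print(hex(a.sp), hex(a.hl))  # 0xff64 0xff64 (return value lost)
print(hex(b.sp), hex(b.hl))  # 0xff64 0x2a (return value kept)
```

One limit of the sketch, and of add sp,d itself: with a signed byte displacement, frames bigger than 127 bytes would still need more than one adjustment.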

ld hl,(sp+n) and ld (sp+n),hl significantly improve the Rabbit code generator in sccz80, and are also used by sdcc.

On the Rabbit these instructions are very cheap (2 bytes and 11 or fewer Rabbit clocks, so ~22T). I think @aralbrec has made the case for them on several occasions, but sadly it has been rejected.
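
For readers less familiar with the Rabbit, this is roughly what ld hl,(sp+n) and ld (sp+n),hl do. The Python model is a hypothetical sketch; it treats n as an unsigned 8-bit offset and stores words little-endian. On a plain Z80 the same access takes several instructions, for example a pair of IX-relative byte loads once IX has been set up as a frame pointer.

```python
class Machine:
    """Hypothetical 64 KiB memory + SP/HL model for the Rabbit stack ops."""

    def __init__(self, sp: int):
        self.mem = bytearray(0x10000)
        self.sp = sp
        self.hl = 0

    def _addr(self, n: int) -> int:
        # n is treated as an unsigned 8-bit offset (an assumption of this sketch)
        if not 0 <= n <= 0xFF:
            raise ValueError("n must be 0..255")
        return (self.sp + n) & 0xFFFF

    def ld_hl_sp_n(self, n: int) -> None:
        """ld hl,(sp+n): load a little-endian word from the stack frame."""
        a = self._addr(n)
        self.hl = self.mem[a] | (self.mem[(a + 1) & 0xFFFF] << 8)

    def ld_sp_n_hl(self, n: int) -> None:
        """ld (sp+n),hl: store HL into the stack frame, low byte first."""
        a = self._addr(n)
        self.mem[a] = self.hl & 0xFF
        self.mem[(a + 1) & 0xFFFF] = self.hl >> 8


m = Machine(sp=0xF000)
m.hl = 0xBEEF
m.ld_sp_n_hl(4)   # write a local at sp+4
m.hl = 0
m.ld_hl_sp_n(4)   # read it back
print(hex(m.hl), m.mem[0xF004:0xF006].hex())  # 0xbeef efbe
```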

In terms of what sccz80 uses, I think add hl,nnnn is the only one at the moment (I think we do ld bc,nnnn; add hl,bc for structure access). I can see that a rules file uses push nnnn, but that will date from the days when it was a quick instruction.

feilipu referenced this issue Jun 20, 2018: z180 - integer math routines #820 (Merged)

feilipu referenced this issue Jul 4, 2018: z80-zxn - integer math routines #837 (Merged)

@feilipu

feilipu Jul 5, 2018

Contributor

I dislike the mul d,e mnemonic.
It wrongly suggests that d is multiplied by e and the result stored in d.

By contrast, mul de (or the Z180 mnemonic mlt de) correctly and more obviously indicates that both the d and e registers are modified.

There seems to be no history of discussion about the instruction mnemonics on the SpecNEXT forum or elsewhere, so there seems to be no avenue for raising this.

Would it be appropriate to make mul de a synonym for mul d,e, as was done with swap, fill and tst?

For #837.
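
A synonym has no effect on the generated code: both spellings assemble to the same bytes. Here is a hypothetical Python sketch of how an assembler might map mul de onto mul d,e, using the ED 30 encoding given for mul in the opening post. The tables and function names are illustrative only, not z80asm's actual implementation.

```python
# Hypothetical fragment of an assembler's tables; only the entry discussed here.
OPCODES = {"mul d,e": bytes([0xED, 0x30])}

# Alternative spellings mapped onto the canonical mnemonic.
SYNONYMS = {"mul de": "mul d,e"}


def normalize(line: str) -> str:
    """Lower-case, collapse runs of whitespace and drop spaces around commas."""
    s = " ".join(line.lower().split())
    return s.replace(" ,", ",").replace(", ", ",")


def assemble(line: str) -> bytes:
    key = normalize(line)
    key = SYNONYMS.get(key, key)  # resolve a synonym to its canonical form
    try:
        return OPCODES[key]
    except KeyError:
        raise ValueError(f"unknown instruction: {line!r}") from None


print(assemble("mul d,e").hex(), assemble("MUL DE").hex())  # ed30 ed30
```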