z80asm: implement zx next opcodes as z80n (was: serving fpga z80 variants) #312

Closed
aralbrec opened this issue Aug 10, 2017 · 107 comments

@aralbrec
Member

Paulo, do you think there could be an easy way to define additional opcodes for the z80? I am thinking in general here, as z80 implementations exist for fpgas, and we now have a specific case where new opcodes are being added for a machine.

The zx next, which I am working on now, is adding these instructions:

   swapnib    ED 23    A bits 7-4 swap with A bits 3-0
   mul        ED 30    multiply HL*DE = HLDE (no flags set)
   add  hl,a  ED 31    Add A to HL (no flags set)
   add  de,a  ED 32    Add A to DE (no flags set)
   add  bc,a  ED 33    Add A to BC (no flags set)
   outinb     ED 90    out (c),(hl), hl++
   ldix       ED A4    As LDI,  but if byte==A does not copy
   ldirx      ED B4    As LDIR, but if byte==A does not copy
   lddx       ED AC    As LDD,  but if byte==A does not copy, and DE is incremented
   lddrx      ED BC    As LDDR,  but if byte==A does not copy

That's a partial list as more is coming.

I have also been thinking about adding via m4 but that is not ideal because all asm would have to be in an m4 file and this would require changes in zcc. I could also do it in pre-processing with copt.

@pauloscustodio
Member

pauloscustodio commented Aug 10, 2017

Yes. I'm currently working on the full Rabbit implementation, and it would be easy to add a new cpu-type to the input files.

You can have a look at the still incomplete parser generator:
https://github.com/z88dk/z88dk/blob/feature/z80asm_rabbit/src/z80asm/tools/cpu.def
https://github.com/z88dk/z88dk/blob/feature/z80asm_rabbit/src/z80asm/tools/cpu.pl

@aralbrec
Member Author

So it's possible to do in mainline z80asm now?

Then the assembler invocation would be "z80asm --cpu=cpuname". Have you thought of any naming to accommodate these devices? Maybe "z80asm --cpu=z80-zxn" (z80 variant of zx next).

@pauloscustodio
Member

pauloscustodio commented Aug 11, 2017

Yes. Give me a couple of days.

I agree with your suggestion for the naming. I assume this z80-zxn CPU does not have any of the z180 and rabbit opcodes and it has all the undocumented z80 opcodes. Correct?

@aralbrec
Member Author

Yes everything z80 is in there, it's just the additional instructions and ones yet to come.

@pauloscustodio pauloscustodio self-assigned this Aug 11, 2017
@pauloscustodio
Member

Just a note: MUL will be confusing:

  • in zxn it is coded as ED 30 and means HLDE := HL*DE
  • in r2k it is coded as F7 and means HLBC := BC*DE
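As an illustration of why the shared mnemonic can still be assembled unambiguously: if each CPU has its own opcode table, `mul` resolves to different bytes depending on the selected `--cpu`. The sketch below is a toy model only; the `OPCODES` dictionary and the `encode` helper are hypothetical names and not z80asm's actual data structures. The two encodings are the ones quoted above.

```python
# Hypothetical per-CPU opcode tables (not z80asm's real layout).
# Only the two "mul" encodings quoted in this thread are filled in.
OPCODES = {
    "z80-zxn": {"mul": bytes([0xED, 0x30])},  # HLDE := HL*DE
    "r2k":     {"mul": bytes([0xF7])},        # HLBC := BC*DE
}


def encode(cpu: str, mnemonic: str) -> bytes:
    """Look up a mnemonic in the table of the selected CPU."""
    try:
        return OPCODES[cpu][mnemonic]
    except KeyError:
        raise ValueError(f"{mnemonic!r} is not defined for cpu {cpu!r}") from None


print(encode("z80-zxn", "mul").hex(" "))  # ed 30
print(encode("r2k", "mul").hex(" "))      # f7
```

The same source line therefore means different things per target, so code using `mul` is not portable between the two CPUs even though it assembles for both.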

@feilipu
Collaborator

feilipu commented Aug 12, 2017

And, we're clear to use the MLT instruction with the z180 too?
It seems to generate correct code.

The MLT performs unsigned multiplication on two 8-bit numbers yielding a 16-bit result. MLT may specify BC, DE, HL, or SP registers. The 8-bit operands are loaded into each half of the 16-bit register and the 16-bit result is returned in that register.
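A minimal Python model of the MLT behaviour described above may help when comparing it with the zx next multiply. The function name `mlt` and the use of a plain integer for the register pair are assumptions made for this sketch.

```python
def mlt(rp: int) -> int:
    """Model of Z180 MLT: multiply the high and low bytes of a 16-bit
    register pair (unsigned) and return the 16-bit product."""
    hi = (rp >> 8) & 0xFF
    lo = rp & 0xFF
    return hi * lo  # at most 255 * 255 = 65025, so it always fits in 16 bits


de = 0x0C0A          # D = 12, E = 10
print(hex(mlt(de)))  # 0x78 (12 * 10 = 120)
```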

@aralbrec
Member Author

Yes z80asm has separate tables for z80, z80-zxn, z180, r2k and r3k.

@aralbrec
Member Author

aralbrec commented Aug 16, 2017

Another update... sorry this may be a frequent activity now but people find it helpful to get it in.

New Z80 opcodes on the NEXT (more to come)
======================================================================================
T=4+           8T     swapnib           ED 23           A bits 7-4 swap with A bits 3-0
T=4+           8T     mirror a          ED 24           mirror the bits in A
M=2+          11T   test NN (tst A,NN)  ED 27 NN       AND A with NN and set all flags. A is not affected.
               8T   bsla de,b          ED 28          shift DE left by B places - uses bits 4..0 of B only
               8T   bsra de,b          ED 29          arithmetic shift right DE by B places - uses bits 4..0 of B only
               8T   bsrl de,b          ED 2A          logical shift right DE by B places - uses bits 4..0 of B only
               8T   bsrf de,b          ED 2B          shift right DE by B places, filling from left with 1s - uses bits 4..0 of B only
               8T   brlc de,b          ED 2C          rotate DE left by B places - uses bits 3..0 of B only
T=4+           8T     mul d,e (mlt de)  ED 30           multiply DE = D*E (no flags set)
T=4+           8T     add  hl,a         ED 31           Add A to HL (no flags set) not sign extended
T=4+           8T     add  de,a         ED 32           Add A to DE (no flags set) not sign extended
T=4+           8T     add  bc,a         ED 33           Add A to BC (no flags set) not sign extended
M=3+, T=4           16T    add  hl,NNNN    ED 34 LO HI     Add NNNN to HL (no flags set)
M=3+, T=4           16T    add  de,NNNN     ED 35 LO HI     Add NNNN to DE (no flags set)
M=3+, T=4           16T    add  bc,NNNN     ED 36 LO HI     Add NNNN to BC (no flags set)
M=6+           23T    push NNNN        ED 8A HI LO     push 16bit immediate value, note big endian order
              16T   outinb             ED 90          outi without modifying B

M=5+           20T    nextreg reg,val   ED 91 reg,val   Set a NEXT register (like doing out($243b),reg then out($253b),val )
M=4+           17T    nextreg reg,a     ED 92 reg       Set a NEXT register using A (like doing out($243b),reg then out($253b),A )
** reg,val are both 8-bit numbers

T=4+           8T   pixeldn           ED 93           move down a line on the ULA screen
T=4+           8T   pixelad           ED 94           using D,E (as Y,X) calculate the ULA screen address and store in HL
T=4+           8T   setae             ED 95           Using the lower 3 bits of E (X coordinate), set the correct bit value in A

              13T   jp (c)             ED 98          PC[13:0] = IN (C) << 6

M=4+           16T    ldix              ED A4           As LDI,  but if byte==A does not copy
M=4+           14T    ldws              ED A5           (de)=(hl), l++, d++ for layer 2 vertical tile copy
M=4+           16T    lddx              ED AC           As LDD,  but if byte==A does not copy, and DE is incremented
M=4+           21T    ldirx             ED B4           As LDIR, but if byte==A does not copy
M=4+           21T*   ldpirx            ED B7           (de) = ( (hl&$fff8)+(E&7) ) when != A
M=4+           21T    lddrx             ED BC           As LDDR,  but if byte==A does not copy, and DE is incremented

** Instructions that have been removed due to limited fpga space.
   They have been removed from z80asm in the current main branch.

"mul" "ld a32,dehl" "ld dehl,a32" "ex a32, dehl" "ld hl,sp" "inc dehl" "dec dehl" "add dehl,a"
"add dehl,bc" "add dehl,NN" "sub dehl,a" "sub dehl,bc" "popx" "fillde" "ldirscale"
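
For a few of the simpler entries in the opcode list above (swapnib, mirror a, bsla de,b and brlc de,b), the described behaviour can be sketched in Python. This is a literal reading of the one-line descriptions in the table, not a verified model of the hardware; the function names are made up for the example, and inputs are assumed to already be in 8-bit or 16-bit range.

```python
def swapnib(a: int) -> int:
    """swapnib: exchange bits 7-4 of A with bits 3-0."""
    return ((a << 4) | (a >> 4)) & 0xFF


def mirror(a: int) -> int:
    """mirror a: reverse the order of the 8 bits in A."""
    out = 0
    for bit in range(8):
        if a & (1 << bit):
            out |= 0x80 >> bit
    return out


def bsla(de: int, b: int) -> int:
    """bsla de,b: shift DE left; only bits 4..0 of B are used."""
    return (de << (b & 0x1F)) & 0xFFFF


def brlc(de: int, b: int) -> int:
    """brlc de,b: rotate DE left; only bits 3..0 of B are used."""
    n = b & 0x0F
    return ((de << n) | (de >> (16 - n))) & 0xFFFF


print(hex(swapnib(0x3C)))      # 0xc3
print(hex(mirror(0x01)))       # 0x80
print(hex(bsla(0x0001, 4)))    # 0x10
print(hex(brlc(0x1234, 4)))    # 0x2341
```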

Memory mapping - specify which 8k ram page is placed into
the corresponding 8k slot of the z80's 64k memory space.

Originally, `mmux` were intended as instructions but they have since
been demoted to TBBLUE registers set via `nextreg`.  We're keeping
these as effective macros.

20T  mmu0 NN           ED 91 50 NN      macro: Ram page in slot 0-8k
20T  mmu1 NN           ED 91 51 NN      macro: Ram page in slot 8k-16k
20T  mmu2 NN           ED 91 52 NN      macro: Ram page in slot 16k-24k
20T  mmu3 NN           ED 91 53 NN      macro: Ram page in slot 24k-32k
20T  mmu4 NN           ED 91 54 NN      macro: Ram page in slot 32k-40k
20T  mmu5 NN           ED 91 55 NN      macro: Ram page in slot 40k-48k
20T  mmu6 NN           ED 91 56 NN      macro: Ram page in slot 48k-56k
20T  mmu7 NN           ED 91 57 NN      macro: Ram page in slot 56k-64k

17T   mmu0 a            ED 92 50         macro: Ram page in slot 0-8k
17T   mmu1 a            ED 92 51         macro: Ram page in slot 8k-16k
17T   mmu2 a            ED 92 52         macro: Ram page in slot 16k-24k
17T   mmu3 a            ED 92 53         macro: Ram page in slot 24k-32k
17T   mmu4 a            ED 92 54         macro: Ram page in slot 32k-40k
17T   mmu5 a            ED 92 55         macro: Ram page in slot 40k-48k
17T   mmu6 a            ED 92 56         macro: Ram page in slot 48k-56k
17T   mmu7 a            ED 92 57         macro: Ram page in slot 56k-64k

* Times are guesses based on other instruction times.  All of this subject to change.
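
The mmu macros above are just `nextreg` with register numbers 0x50 to 0x57, so their encoding can be sketched in a few lines. The helper names `nextreg` and `mmu` are hypothetical and only mirror the byte patterns listed in the tables; passing no value selects the `reg,a` form.

```python
from typing import Optional


def nextreg(reg: int, val: Optional[int] = None) -> bytes:
    """Encode 'nextreg reg,val' (ED 91 reg val) or, when val is None,
    'nextreg reg,a' (ED 92 reg)."""
    if not 0 <= reg <= 0xFF:
        raise ValueError("reg must be an 8-bit number")
    if val is None:
        return bytes([0xED, 0x92, reg])
    if not 0 <= val <= 0xFF:
        raise ValueError("val must be an 8-bit number")
    return bytes([0xED, 0x91, reg, val])


def mmu(slot: int, page: Optional[int] = None) -> bytes:
    """Encode the 'mmuN NN' / 'mmuN a' macros as nextreg 0x50+N."""
    if not 0 <= slot <= 7:
        raise ValueError("slot must be 0..7")
    return nextreg(0x50 + slot, page)


print(mmu(3, 0x20).hex(" "))  # ed 91 53 20
print(mmu(7).hex(" "))        # ed 92 57
```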

Pseudo-instructions for the copper unit have been defined so that copper instructions can be
generated within z80 asm.  The instructions are namespaced with a leading "cu."
so that they are not confused with regular z80 instructions.  Each is 16-bits.

cu.wait VER,HOR    (0<=VER<=311, 0<=HOR<=55)
cu.move REG,VAL    (0<=REG<=127, 0<=VAL<=255)
cu.nop             (0x0000 equivalent to ignored "cu.move 0,0")
cu.stop            (0xffff equivalent to impossible "cu.wait 511,63")
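
The two equivalences above (cu.nop is "cu.move 0,0" and cu.stop is "cu.wait 511,63") suggest a 16-bit layout with one selector bit, which the sketch below assumes: bit 15 clear for move with a 7-bit register and 8-bit value, bit 15 set for wait with a 6-bit horizontal and 9-bit vertical field. The exact field order for wait is an assumption of this sketch, as are the function names; the range checks are the ones given in the list.

```python
def cu_move(reg: int, val: int) -> int:
    """cu.move REG,VAL -> 0rrrrrrr vvvvvvvv (assumed layout)."""
    if not (0 <= reg <= 127 and 0 <= val <= 255):
        raise ValueError("cu.move operand out of range")
    return (reg << 8) | val


def cu_wait(ver: int, hor: int) -> int:
    """cu.wait VER,HOR -> 1hhhhhhv vvvvvvvv (assumed layout)."""
    if not (0 <= ver <= 311 and 0 <= hor <= 55):
        raise ValueError("cu.wait operand out of range")
    return 0x8000 | (hor << 9) | ver


CU_NOP = 0x0000   # same word as the ignored "cu.move 0,0"
CU_STOP = 0xFFFF  # same word as the impossible "cu.wait 511,63"

print(hex(cu_move(0, 0)))     # 0x0
print(hex(cu_wait(311, 55)))  # 0xef37
```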

Pseudo-instructions for the dma unit have been defined so that dma programs can be
generated within z80 asm.  The instructions are namespaced with a leading "dma." so that they
are not confused with regular z80 instructions.  The parameters to the dma instructions are
error checked.

dma.wr0 ...
dma.wr1 ...
dma.wr2 ...
dma.wr3 ...
dma.wr4 ...
dma.wr5 ...
dma.wr6 ... \
dma.cmd ... / aliases

This post gives an example dma program and this link gives technical information.

@suborb
Member

suborb commented Aug 16, 2017

Any chance of an ld hl,(sp+dd) - that would be really useful.

And yes, swapping round to use dehl would allow us to use it without an ex.

@aralbrec
Member Author

They've changed the order to DEHL for us, very nice :) I'll update the instruction list above.

> Any chance of an ld hl,(sp+dd) - that would be really useful.

It sounds like they are open to that too.

What is missing really?

ld r,(sp+d)
ld rp,(sp+d)
ld (sp+d),r
ld (sp+d),rp

ld ix,sp ;; mirrors ld hl,sp already added
ld iy,sp

add sp,nn

@suborb
Member

suborb commented Aug 16, 2017

That's brilliant. Creating a zxn_rules.1 along the lines of the rabbit version should be easy and allow us to use some of those opcodes in a trivial way.

From the rabbit file, there are these extra addressing modes that are useful for C code:

ld hl,(sp+n) - n unsigned byte
ld ix,(sp +n)
ld iy,(sp +n)
ld (sp+n),hl
ld (sp+n),ix
ld (sp+n),iy
ld hl,(hl + d) - d is signed byte
ld hl,(ix + d)
ld hl,(iy + d)
add sp,d

Supporting the other pairs instead of hl would be useful, but not hugely critical really.

From the rabbit file, it looks like I made quite a lot of use of bool hl, this basically turns hl into a boolean and sets flags. Thus a comparison to zero is easy and a true boolean value is yielded. In the rabbit world, this is a single byte opcode which makes it particularly efficient.

These are useful:

and hl, de
or hl, de
xor hl,de

So: and|or|xor hl,NNNN to cut out the ld de,NNNN

neg hl
rr hl
rl hl

@suborb
Member

suborb commented Aug 16, 2017

I've checked in a zxn_rules.1 file, but not hooked it in as of yet, you'll need to add:

COPTRULESCPU    DESTDIR/lib/zxn_rules.1

to the zxn.cfg.

Test file attached to show it working (probably only with sccz80 at the moment)

Issue_312_zxn_optimisations.txt

@aralbrec
Member Author

There is only one emulator currently that is accepting the opcodes and it's only a partial emulation so I think we should wait a bit before enabling the opcodes outside the assembler.

@aralbrec
Member Author

@pauloscustodio Any chance the new list up there can be included soon?

A complete emulator (ZEsarUX) is going to implement the instructions so once that's there, I think it's safe to enable them in the entire toolchain.

@pauloscustodio
Member

pauloscustodio commented Aug 19, 2017

I'm on it...

  1. Shouldn't fillde be fill de (as in verb-object)?
  2. Shouldn't test N be tst N to be the same as on the Z180?

@pauloscustodio
Member

I've committed the change to implement the current opcodes.
I've changed test to be the same as the Z180, tst N.

@aralbrec
Member Author

We don't have any control of the opcode names... they should probably conform to existing instructions on other z80 derivatives where they are the same but all we can do is suggest.

test should be tst (z180)
swapnib should be swap (gameboy z80)

I'm not sure if there are any others in there.

@pauloscustodio
Member

I've committed the following variants of z80-zxn opcodes:

  • swap as synonym to swapnib
  • fill de as synonym to fillde
  • test as synonym to tst

@aralbrec
Member Author

aralbrec commented Aug 24, 2017

New instructions that would help most directly would include stack relative addressing.

I looked at two sources for inspiration:

  1. The Rabbit family of processors which are based on the z80 and z180. They were
    also constrained by opcode space and added some instructions for their compiler.

https://github.com/z88dk/techdocs/blob/master/rabbit/RabbitInstructions.pdf

  2. The Z380 family by Zilog. They expanded the instruction set and have a full
    set of stack relative instructions as well as 16-bit logical, multiplication,
    division and accessing the exx set in a finer grain. I highly recommend that Victor
    look at how Zilog added the instructions while remaining compatible (they defined
    escape opcodes in the instruction listing, eg) and borrow some of their names.
    Their degree of compatibility won't be as high because the z80 in the next must
    also implement crazy undocumented opcodes.

See Appendix B page 228 for an alphabetical listing:
https://github.com/z88dk/techdocs/blob/master/zilog/z380_cpu_um.pdf

Instructions that would help most for the c compilers, given constraints:

  • ld hl,(sp+n)
    ld rp,(sp+n) \
    ld r,(sp+n) / Take up opcode space, in z380

  • ld (sp+n),hl
    ld (sp+n),rp \
    ld (sp+n),r / Take up opcode space, in z380

  • add sp,d

ld ix,sp

more important if stack relative addressing isn't possible:

  • ld hl,(ix+d)
    ld rp,(ix+d)

  • ld (ix+d),hl
    ld (ix+d),rp

In the above "d" is a two's complement 8-bit number but "n"
is an unsigned 8-bit number. The z380 treats "n" as signed
for stack relative addressing too but it doesn't make too
much sense to index below the stack pointer. If you're sharing
the ix+d logic with sp+n then two's complement will happen.
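
The difference between the unsigned "n" and the two's complement "d" described above is easiest to see with the same offset byte applied both ways. This is a small sketch with hypothetical helper names, not a description of any existing implementation.

```python
def signed8(byte: int) -> int:
    """Interpret an 8-bit value as two's complement (-128..127)."""
    return byte - 0x100 if byte & 0x80 else byte


def ea_sp_n(sp: int, n: int) -> int:
    """Effective address for (sp+n): n is an unsigned byte, 0..255."""
    return (sp + (n & 0xFF)) & 0xFFFF


def ea_ix_d(ix: int, d: int) -> int:
    """Effective address for (ix+d): d is a two's complement byte."""
    return (ix + signed8(d & 0xFF)) & 0xFFFF


print(hex(ea_sp_n(0xFF00, 0xFE)))  # 0xfffe: 254 bytes above sp
print(hex(ea_ix_d(0xFF00, 0xFE)))  # 0xfefe: 2 bytes below ix
```

With an unsigned offset the whole 0..255 range reaches upwards into the stack frame, which is where locals and arguments live; a signed offset spends half of the range on addresses below the stack pointer.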

The rabbit constrained itself to these, but the z380 goes whole hog,
including a constant NNNN operand:

andw hl,de
orw hl,de
xorw hl,de

negw hl
rlw de
rrw hl
rrw de

The rabbit added this:

bool hl (or rp)

to convert non-zero value to 1.
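
As a sketch of that behaviour, assuming the Z flag simply reflects whether the result is zero (the function name and the returned tuple are choices made for this example):

```python
from typing import Tuple


def bool_hl(hl: int) -> Tuple[int, bool]:
    """Model of 'bool hl': any non-zero 16-bit value becomes 1.
    Returns (new_hl, z_flag), where z_flag is True when the result is 0."""
    result = 1 if (hl & 0xFFFF) != 0 else 0
    return result, result == 0


print(bool_hl(0x1234))  # (1, False)
print(bool_hl(0x0000))  # (0, True)
```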

If you can wait another week I will be rewriting the integer math so maybe
something will come out of that.

BTW, the "TEST" instruction seems to be the same as "TST" for z180 and later.
"SWAPNIB" has similar function to the gameboy's swap and the z380's SWAP.
It may be a good idea to use the same mnemonics where it makes sense.

@aralbrec
Member Author

aralbrec commented Aug 27, 2017

@pauloscustodio @suborb

A couple of new instructions also added to the main list above:

21T*   ldirscale         ED B6           As LDIRX,  if(hl)!=A then (de)=(hl); HL_A'+=BC'; DE+=DE'; dec BC; Loop.
14T*   ldpirx            ED B7           (de) = ( (hl&$fff8)+(E&7) ) when != A

* = guessing

ldirscale is going to scale a source graphic up or down in size. If DE' > 1 then there will be an exploding effect in the destination (pixels will be skipped).

ldpirx is intended as a pattern lookup for fills.

@aralbrec aralbrec reopened this Aug 27, 2017
@spth

spth commented May 11, 2018

The Z180 multiplication instructions also were quite useful in SDCC (or having at least one of them, preferably the one for hl). After all, often an 8x8->16 multiplication is sufficient. It seems even more useful than the 16x16->32 Rabbit one, since the 16x16->32 makes too many registers unavailable for other purposes.

Philipp

@aralbrec
Member Author

aralbrec commented May 12, 2018

They have the one for de only, is that still useful without excessive shuffling to hl?

We've started on some integer multiplies for the target library here.

I would like to see them bring in signed versions of some of these instructions (add hl,a ; mul d,e) and return of the 16x16->32 bit multiply they dropped due to limited fpga space. But this is not likely to happen from the core team itself - more as a proposed patch from reviewers - so whether it will or can happen is a question.

@spth

spth commented May 12, 2018

Where can I find the instruction set?
http://devnext.referata.com/wiki/Extended_Z80_instruction_set
Mentions the 16x16->32 multiply, but not 8x8->16.
Also, it seems quite unfortunate that there is no equivalent of the Rabbit's 16-bit load instructions ld hl, (hl+d) and ld hl, (ix+d)/ld hl, (sp+d) and ld (sp+d), hl/ld (ix+d), hl. Those really make a big difference for 16-bit code.

mlt de is fine, the advantage of mlt hl over mlt de should be very small.

Philipp

@aralbrec
Member Author

The most accurate information is in z88dk. The wiki is maintained by the community who don't keep up / have access to some of the information.

#312 (comment)

@spth

spth commented May 25, 2018

So here's my comment on the instructions in #312 (comment) from a compiler writer perspective (SDCC). I'll comment on how useful I consider the new instructions, and suggest renaming some (since they already exist in other Z80-derivatives under a different name). I will make a later post with suggestions for additional instructions.

T=4+           8T*    swapnib           ED 23           A bits 7-4 swap with A bits 3-0

This one is useful for speeding up some shifts. SDCC already emits this instruction for gbz80. For consistency with gbz80, I suggest renaming this instruction "swap" (as it is called on the GameBoy).

T=4+           8T     mul d,e           ED 30           multiply DE = D*E (no flags set)

This one is very useful. Both for the very common 8x8->16 multiplications (either explicit or for array addressing) and as building block for the support routines for wider multiplications. SDCC already emits this instruction for z180. I suggest to rename this instruction "mlt de" (as it is called on the Z180).

T=4+           8T     add  hl,a         ED 31           Add A to HL (no flags set) not sign extended
T=4+           8T*    add  de,a         ED 32           Add A to DE (no flags set) not sign extended
T=4+           8T*    add  bc,a         ED 33           Add A to BC (no flags set) not sign extended

I do not have experience with such instructions. But I believe they will be useful, e.g. for using an 8-bit index into a char array.

M=3+, T=4           16T    add  hl,NNNN    ED 34 LO HI     Add NNNN to HL (no flags set)
M=3+, T=4           16T*   add  de,NNNN     ED 35 LO HI     Add NNNN to DE (no flags set)

I don't see much point in those. The only point is reducing register pressure a bit. I don't see a good use case in SDCC that benefits from not setting flags. So those are not really an advantage over something like

ld de, NNNN
add hl, de

Which is the same code size as the first proposed new instruction, and via ex de, hl can also be used instead of the second proposed new instruction at just one byte of extra code size.

M=3+, T=4           16T*   add  bc,NNNN     ED 36 LO HI     Add NNNN to BC (no flags set)

This one seems a bit more useful, since transferring the addition result from hl to bc would take two bytes. However, I again don't see an advantage in not setting flags. In fact, if this one set flags (or were changed to adc) it could be even more useful as a building block for 32- and 64-bit additions.

M=4+           16T*      outinb            ED 90           out (c),(hl), hl++

I can see the point in this one, even though SDCC would not emit it (but it looks good for use in asm code).

M=4+           16T    ldix              ED A4           As LDI,  but if byte==A does not copy
M=4+           21T    ldirx             ED B4           As LDIR, but if byte==A does not copy
M=4+           16T*   lddx              ED AC           As LDD,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   lddrx             ED BC           As LDDR,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   ldirscale         ED B6           As LDIRX,  if(hl)!=A then (de)=(hl); HL_A'+=BC'; DE+=DE'; dec BC; Loop.
M=4+           12T*   ldpirx            ED B7           (de) = ( (hl&$fff8)+(E&7) ) when != A

I currently do not see the point in those. They look like they were made for some specific use case that I do not know about. Do they really just skip a byte if it is A (if they stopped instead they might be useful for string processing)?

T=4+           8T     mirror a          ED 24           mirror the bits in A     
T=4+           8T     mirror de         ED 26           mirror the bits in DE     

Those don't look that useful to me. I can see that mirroring bits is hard to do without them in the Z80 instruction set. But the need for mirroring bits is very rare in my experience. And SDCC would not be able to detect C code mirroring bits easily, so it would not emit those.

M=6+           22T*   push NNNN        ED 8A HI LO     push 16bit immediate value note big endian order

I don't see much point in this one. The only advantage it provides is reducing register pressure. Otherwise it provides no advantage over

ld qq, NNNN
push qq

Which is also just 4 bytes of code.

M=3+           8T*    pop x            ED 8B           discard word on stack (inc sp; inc sp)

I don't see the point at all.

inc sp
inc sp

Does exactly the same at exactly the same cost in code size.

M=5+           16T*   nextreg reg,val   ED 91 reg,val   Set a NEXT register (like doing out($243b),reg then out($253b),val )
M=4+           12T*   nextreg reg,a     ED 92 reg       Set a NEXT register using A (like doing out($243b),reg then out($253b),A )
** reg,val are both 8-bit numbers

T=4+           8T   pixeldn           ED 93           Move down a line on the ULA screen
T=4+           8T   pixelad           ED 94           using D,E (as Y,X) calculate the ULA screen address and store in HL
T=4+           8T   setae             ED 95           Using the lower 3 bits of E (X coordinate), set the correct bit value in A

Looks like stuff specific to the peripherals of the device. Is it really worth using opcodes as opposed to some I/O location?

M=2+          11T   test NN           ED 27 NN       And A with NN and set all flags. A is not affected.

This can be useful for some code, but such code is not that common. SDCC already emits this instruction for z180. I suggest to rename it "tst" (as it is called on the Z180).

Summary:

Great, very useful in SDCC:
mul d, e/mlt de

Useful for SDCC:
swapnib/swap a
add hl,a
add de,a
add bc,a
add bc,NNNN
test/tst NN

Marginally useful for SDCC:
push NNNN
add de,NNNN

Nearly useless for SDCC:
add hl,NNNN
mirror a
mirror de
pop x

@spth

spth commented May 25, 2018

Having worked on various SDCC backends, including all z80-related ones (gbz80, z180, r2k, r3ka, tlcs90), I noticed some instructions being particularly useful, and making a big difference in code size and code speed. In particular, there are some instructions in the Rabbit that are used by SDCC, resulting in much lower code size for the r2k/r3ka backends vs. the z80 backend. If possible, I'd like to see some implemented in the zxn.

ld hl, (hl+NN)

Load hl with the value at the address sum of hl and an 8-bit offset (unsigned is preferable but it doesn't matter much).
This instruction, present in the Rabbits is very useful when working with pointers. Pointers are very common in C code, sometimes explicit, sometimes implicit. Example use cases: Reading a 16-bit value from a pointer. Reading a 16-bit value from a fixed offset into an array. Reading a 16-bit value from a member of a pointed-to struct (e.g. for traversing linked lists).

ld (sp+N), hl
ld (sp+N), de
ld (sp+N), bc
ld (ix+d), hl
ld (ix+d), de
ld (ix+d), bc
ld hl, (sp+N)
ld de, (sp+N)
ld bc, (sp+N)
ld hl, (ix+d)
ld de, (ix+d)
ld bc, (ix+d)

The ix variant is essentially an alternative to the sp variant. Implementing both probably doesn't make that much sense. These instructions are present in the Rabbit.

For C, variables (and function arguments) that cannot be allocated to registers are placed on the stack. Using ix as a stack pointer is an okish way of accessing the stack for 8-bit variables, but it still comes with too much overhead. These instructions allow efficient transfer of 16-bit values between registers and the stack.

bool hl

This instruction present in the Rabbit casts the value in hl to bool and sets the flags accordingly (the z flag is what really matters).
The instruction has a variety of uses. The first one obviously being casts to bool. It also helps a lot with testing 16-bit values for being zero, which is quite common, also for pointers. Another use is efficient zeroing of h (e.g. before loading an 8-bit value into l, and then adding some 16-bit value present in de or bc).

sex gg

This instruction (not implemented in any of the architectures currently supported by SDCC) would, for a 16-bit register gg, sign-extend the value in the lower 8 bits into the full 16-bit register. Even having this instruction for just one register pair out of hl, de, bc would be very useful.
For efficiency, on 8-bit systems 8-bit values are used a lot. But sometimes 16 bits are needed for range; the Z80 has 16-bit addresses and the C standard sometimes requires promotion to int. Thus 8-bit values often need to be cast to 16-bit values. For unsigned values, this can easily be done by zero-extending. But for signed values one has to generate relatively complex code. Having direct support in the instruction set would be quite useful; the instruction could also be used as a building block for wider casts (i.e. 8 to 32 bits, 16 to 32 bits, to 64 bits).
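
The proposed behaviour of "sex gg" can be modelled in a couple of lines. This is only a sketch of the semantics described above; the instruction does not exist on any of the listed CPUs, and the function name is chosen for the example.

```python
def sex(gg: int) -> int:
    """Sign-extend the low byte of a 16-bit register into all 16 bits."""
    low = gg & 0xFF
    # Bit 7 of the low byte decides whether the high byte becomes 0xFF or 0x00.
    return (low | 0xFF00) if low & 0x80 else low


print(hex(sex(0x12FE)))  # 0xfffe (low byte 0xFE is -2)
print(hex(sex(0xFF7F)))  # 0x7f   (low byte 0x7F is +127)
```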

add sp, d

This instruction present on the Rabbit adds a signed 8-bit value to the stack pointer.
After a function call with stack parameters, the stack pointer needs to be adjusted. The same applies at function entry and exit for functions that store local variables on the stack. Adjusting the stack pointer is thus a very common task. Unfortunately, on the Z80 doing so is quite complex (except for small values, where inc/dec sp and push/pop can be used).

Philipp

@spth

spth commented May 25, 2018

I've just done a quick test on how often some of the proposed instructions are actually used by SDCC (by compiling the SDCC regression tests for gbz80, z180, r2k). Of course for the total effect one needs to consider more than just their frequency (after all, a rare instruction could save a lot of code at each of the few places where it can be used). Still the data seems helpful.

tst [z180]: 2
bool [r2k]: 14
ld hl, d(iy) [r2k]: 26
ld d (ix), hl [r2k]: 39
mlt [z180]: 103
swap [gbz80]: 141
ld hl, d(ix) [r2k]: 143
ld hl, d(hl) [r2k]: 211
ld d(sp), hl [r2k]: 1361
ld hl, d(sp) [r2k]: 3969
add sp, d [r2k]: 15281

Philipp

@aralbrec
Member Author

aralbrec commented May 26, 2018

The special instructions:

M=4+           16T    ldix              ED A4           As LDI,  but if byte==A does not copy
M=4+           21T    ldirx             ED B4           As LDIR, but if byte==A does not copy
M=4+           16T*   lddx              ED AC           As LDD,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   lddrx             ED BC           As LDDR,  but if byte==A does not copy, and DE is incremented
M=4+           21T*   ldirscale         ED B6           As LDIRX,  if(hl)!=A then (de)=(hl); HL_A'+=BC'; DE+=DE'; dec BC; Loop.
M=4+           12T*   ldpirx            ED B7           (de) = ( (hl&$fff8)+(E&7) ) when != A

T=4+           8T     mirror a          ED 24           mirror the bits in A     
T=4+           8T     mirror de         ED 26           mirror the bits in DE     

T=4+           8T   pixeldn           ED 93           Move down a line on the ULA screen
T=4+           8T   pixelad           ED 94           using D,E (as Y,X) calculate the ULA screen address and store in HL
T=4+           8T   setae             ED 95           Using the lower 3 bits of E (X coordinate), set the correct bit value in A

are specifically for games and graphics.

The ldix-family of instructions is for copying graphics while skipping over transparent bytes.

ldirscale is for exploded sprites - the additions are implementing fixed point adjustments to display position and source address.

mirror is for reversing images, pixeldn / pixelad / setae are very specialized for the spectrum's native display file organization.
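
The transparent-copy idea behind the ldix family can be sketched as follows for ldirx, reading the table entry "As LDIR, but if byte==A does not copy" literally. The function signature and the use of a `bytearray` as memory are assumptions of this sketch; flags and timing are not modelled.

```python
def ldirx(mem: bytearray, hl: int, de: int, bc: int, a: int):
    """Model of ldirx: like LDIR, but a source byte equal to A is skipped,
    leaving the destination byte untouched. Returns final (hl, de, bc)."""
    while True:
        byte = mem[hl]
        if byte != a:          # A holds the "transparent" value
            mem[de] = byte
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc


mem = bytearray(16)
mem[0:4] = b"\x11\x00\x22\x00"    # sprite row, 0x00 is transparent
mem[8:12] = b"\xaa\xaa\xaa\xaa"   # background
ldirx(mem, hl=0, de=8, bc=4, a=0x00)
print(mem[8:12].hex(" "))         # 11 aa 22 aa
```

The background shows through wherever the source holds the transparent value, which is the masking behaviour wanted for sprites.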

nextreg is for controlling the hardware state of the machine. These are very useful in practice - I would say it's one of the better additions.

All the above I wouldn't expect the compiler to generate, however they would be present in the libraries and user code.

I agree pop x really doesn't have much use. It would make sense to replace it with add sp,d. As for push NNNN, I would keep an eye out for using it to push constants on the stack for function calls.

There are some other issues with, eg, the 8x8->16 multiply. There are requests for having a signed counterpart and for bringing back the 32-bit multiply which was found to be very useful for fixed point calculations.

The added instructions do not affect flags because they are implemented outside the z80 alu but I do agree many would be more useful if they did affect flags.

Available space on the fpga also cramps what can be added. We'll see what happens - there is a deadline approaching.

@spth

spth commented May 26, 2018

There won't be much use for push NNNN as long as there is at least one free register pair. It is a 4-byte, 22T instruction. Using two old Z80 instructions (ld qq, NNNN; push qq) is 4 bytes, too and at 21T actually faster!

Philipp

@feilipu
Collaborator

feilipu commented May 26, 2018

And just to note (because it is buried in lots of comments) the NNNN in push NNNN is stored big endian too. I'll be keeping a safe distance from that very bodged op code.
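
To make the byte-order difference concrete, here is a small encoder sketch for push NNNN next to add hl,NNNN, using the byte patterns from the main opcode list (ED 8A HI LO versus ED 34 LO HI). The function names are hypothetical.

```python
def push_imm(nnnn: int) -> bytes:
    """push NNNN: ED 8A HI LO (operand stored high byte first)."""
    return bytes([0xED, 0x8A, (nnnn >> 8) & 0xFF, nnnn & 0xFF])


def add_hl_imm(nnnn: int) -> bytes:
    """add hl,NNNN: ED 34 LO HI (the usual Z80 low-byte-first order)."""
    return bytes([0xED, 0x34, nnnn & 0xFF, (nnnn >> 8) & 0xFF])


print(push_imm(0x1234).hex(" "))    # ed 8a 12 34
print(add_hl_imm(0x1234).hex(" "))  # ed 34 34 12
```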

@spth's work showing the use of add sp,d looks very promising.
Perhaps that's a strong proposal from this community?
Can we assist / vote / agitate anywhere?

@suborb
Member

suborb commented May 26, 2018

add sp,d would be usable from both sdcc and sccz80 and stop sccz80 from jumping through hoops to preserve the return value with a large frame.

ld hl,(sp+n) and ld (sp+n),hl make significant improvements to the Rabbit generator in sccz80 and are also used by sdcc.

On the Rabbit these instructions are very cheap (2 bytes and 11 or less Rabbit clocks so ~22T). I think @aralbrec has put forward a case for these on several occasions but it has sadly been rejected.

In terms of what's being used by sccz80, I think add hl,nnnn is the only one used at the moment (I think we do a ld bc,nnnn; add hl,bc for structure access). I can see a rules file uses push nnnn, but this will be from the days when that was a quick instruction.

@feilipu
Collaborator

feilipu commented Jul 5, 2018

I dislike the mul d,e mnemonic.
It incorrectly states that it does mul with e, and the result is stored in d.

Whereas mul de, or the z180 mnemonic mlt de, more obviously and correctly refers to both the d and e registers being modified.

There seems to be no history of discussion about the instruction mnemonics on the SpecNEXT forum, or elsewhere. So, there seems no avenue to discuss this.

Would it be appropriate to make mul de a synonym of mul d,e, like was done with swap, fill and tst?

For #837.

@feilipu
Collaborator

feilipu commented Dec 22, 2018

Noted on the SpecNext Kickstarter - Update 41 that... Z80N has been enhanced with six more instructions: 5 x barrel shift/rotate and 1 JP.

Is there any information on these additional 6 instructions, and their opcodes, etc?

And, should/could they flow into z80asm and into sdcc, too?

@pauloscustodio
Member

I can add them to z80asm, if someone tells which they are.

@aralbrec
Member Author

I have the information; I'll put it here later today. I've been busy lately. I think we should settle on z?80 mnemonics as alternates where there are equivalents and the official mnemonics so there isn't a proliferation. So "mul de" becomes "mlt de" only, eg.

@aralbrec
Member Author

The main list is updated:
#312 (comment)

The additions are barrel shifts and a special jp(c) being used for instruction dispatch from disk streaming. It's important for video and other speculated uses.

ldirscale has been dropped for the time being.

I think we should prune away any instruction aliases except for ones that match other zilog related processors. So mlt de accepted as mul d,e (and we lose mul de which will mean fixes in the library) and tst A,NN accepted as test NN.

The list is updated to reflect these things too.

@aralbrec
Member Author

There are other assemblers accepting mul de so maybe that one should be kept.

@pauloscustodio
Member

z80asm will be updated.

pauloscustodio added a commit that referenced this issue Dec 24, 2018
Fix #312: new barrel shift operators in z80-zxn
@pauloscustodio
Member

pauloscustodio commented Dec 24, 2018 via email

@feilipu feilipu changed the title z80asm: implement zx next opcodes (was: serving fpga z80 variants) z80asm: implement zx next opcodes as z80n (was: serving fpga z80 variants) Jul 15, 2019