z80asm: implement zx next opcodes (was: serving fpga z80 variants) #312
Paulo, do you think there is an easy way to define additional opcodes for the z80? I am thinking in general terms here, since z80 implementations exist for FPGAs and we now have a specific case where new opcodes are being added for a machine.
The zx next, which I am working on now, is adding these instructions:
That's a partial list; more are coming.
I have also been thinking about adding via m4 but that is not ideal because all asm would have to be in an m4 file and this would require changes in zcc. I could also do it in pre-processing with copt.
Yes. I'm currently working on the full Rabbit implementation, and it would be easy to add a new cpu-type to the input files.
You can have a look at the still incomplete parser generator:
And, we're clear to use the
Another update... sorry this may be a frequent activity now but people find it helpful to get it in.
They've changed the order to DEHL for us, very nice :) I'll update the instruction list above.
It sounds like they are open to that too.
What is missing really?
ld ix,sp ;; mirrors ld hl,sp already added
That's brilliant. Creating a zxn_rules.1 along the lines of the rabbit version should be easy and allow us to use some of those opcodes in a trivial way.
From the rabbit file there's these extra addressing modes that are useful for C code:
ld hl,(sp+n) - n unsigned byte
Supporting the other pairs instead of hl would be useful, but not hugely critical really.
From the rabbit file, it looks like I made quite a lot of use of bool hl, this basically turns hl into a boolean and sets flags. Thus a comparison to zero is easy and a true boolean value is yielded. In the rabbit world, this is a single byte opcode which makes it particularly efficient.
These are useful:
and hl, de
So: and|or|xor hl,NNNN to cut out the ld de,NNNN
We don't have any control of the opcode names... they should probably conform to existing instructions on other z80 derivatives where they are the same but all we can do is suggest.
test should be tst (z180)
I'm not sure if there are any others in there.
New instructions that would help most directly would include stack relative addressing.
I looked at two sources for inspiration:
See Appendix B page 228 for an alphabetical listing:
Instructions that would help most for the c compilers, given constraints:
more important if stack relative addressing isn't possible:
In the above "d" is a two's complement 8-bit number but "n"
the rabbit constrained itself to these but the z380 goes whole hog
The rabbit added this:
bool hl (or rp)
to convert non-zero value to 1.
If you can wait another week I will be rewriting the integer math so maybe
BTW, the "TEST" instruction seems to be the same as "TST" for z180 and later.
A couple of new instructions also added to the main list above:
ldirscale is going to scale a source graphic up or down in size. If DE' > 1 then there will be an exploding effect in the destination (pixels will be skipped).
ldpirx is intended as a pattern lookup for fills.
I've completed the implementation of the dma commands on branch feature/z80asm_zxn_dma.
The dma commands are available in all CPUs, not only z80-zxn, but I suppose some of the error messages are ZX Next-specific. Should the dma commands be available only in --cpu=z80-zxn?
Yes I think it's working. I also tested a few externs with the argument list.
The example dma program from a few posts up:
Also can you add
The Z180 multiplication instructions also were quite useful in SDCC (or having at least one of them, preferably the one for hl). After all, often an 8x8->16 multiplication is sufficient. It seems even more useful than the 16x16->32 Rabbit one, since the 16x16->32 makes too many registers unavailable for other purposes.
They have the one for de only, is that still useful without excessive shuffling to hl?
We've started on some integer multiplies for the target library here.
I would like to see them bring in signed versions of some of these instructions (add hl,a ; mul d,e) and return of the 16x16->32 bit multiply they dropped due to limited fpga space. But this is not likely to happen from the core team itself - more as a proposed patch from reviewers - so whether it will or can happen is a question.
Where can I find the instruction set?
mlt de is fine, the advantage of mlt hl over mlt de should be very small.
So here are my comments on the instructions in #312 (comment) from a compiler writer's perspective (SDCC). I'll comment on how useful I consider the new instructions, and suggest renaming some (since they already exist in other Z80 derivatives under a different name). I will make a later post with suggestions for additional instructions.
This one is useful for speeding up some shifts. SDCC already emits this instruction for gbz80. For consistency with gbz80, I suggest renaming this instruction "swap" (as it is called on the GameBoy).
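To illustrate why such an instruction speeds up shifts, here is a minimal Python model. It assumes the instruction exchanges the high and low nibbles of the accumulator, as the GameBoy's swap does; the function names are made up for this sketch and are not part of any assembler or compiler.

```python
def swap_nibbles(a: int) -> int:
    """Model of a nibble swap on an 8-bit value (assumed semantics)."""
    a &= 0xFF
    return ((a << 4) | (a >> 4)) & 0xFF


def shift_left_4(a: int) -> int:
    """Shift left by four: one swap plus a mask instead of four shifts."""
    return swap_nibbles(a) & 0xF0


def shift_right_4(a: int) -> int:
    """Logical shift right by four: one swap plus a mask."""
    return swap_nibbles(a) & 0x0F
```

On a plain Z80 a shift by four costs four single-bit shifts, so a swap followed by an and is where the saving would come from.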
This one is very useful, both for the very common 8x8->16 multiplications (either explicit or for array addressing) and as a building block for the support routines for wider multiplications. SDCC already emits this instruction for z180. I suggest renaming this instruction "mlt de" (as it is called on the Z180).
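As a rough sketch of the "building block" point, the Python model below composes a 16x16->32 multiplication out of four 8x8->16 products. The helper names are hypothetical and the code only models the arithmetic; it says nothing about how a real support routine would allocate registers.

```python
def mul8(d: int, e: int) -> int:
    """Model of an unsigned 8x8->16 multiply (result fits in 16 bits)."""
    return (d & 0xFF) * (e & 0xFF)


def mul16(x: int, y: int) -> int:
    """16x16->32 multiply built from four 8x8->16 partial products."""
    xl, xh = x & 0xFF, (x >> 8) & 0xFF
    yl, yh = y & 0xFF, (y >> 8) & 0xFF
    low = mul8(xl, yl)                  # contributes at bit 0
    mid = mul8(xl, yh) + mul8(xh, yl)   # contributes at bit 8
    high = mul8(xh, yh)                 # contributes at bit 16
    return (low + (mid << 8) + (high << 16)) & 0xFFFFFFFF
```

When only the low 16 bits of the result are needed, as is usual for C int arithmetic, the high product can be dropped and three multiplies suffice.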
I do not have experience with such instructions. But I believe they will be useful, e.g. for using an 8-bit index into a char array.
I don't see much point in those. The only point is reducing register pressure a bit. I don't see a good use case in SDCC that benefits from not setting flags. So those are not really an advantage over something like
Which is the same code size as the first proposed new instruction, and via ex de, hl can also be used instead of the second proposed new instruction at just one byte of extra code size.
This one seems a bit more useful, since transferring the addition result from hl to bc would take two bytes. However, I again don't see an advantage in not setting flags. In fact, if this one set flags (or were changed to adc) it could be even more useful as a building block for 32- and 64-bit additions.
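The reason flags matter for wide additions can be sketched in Python: each 16-bit step has to hand its carry to the next one. The functions below are an illustrative model of an add-with-carry chain, not the behaviour of the proposed instruction, and their names are invented for this example.

```python
def adc16(x: int, y: int, carry_in: int = 0):
    """16-bit add with carry in; returns (16-bit sum, carry out)."""
    total = (x & 0xFFFF) + (y & 0xFFFF) + (carry_in & 1)
    return total & 0xFFFF, total >> 16


def add32(x: int, y: int) -> int:
    """32-bit add composed of two chained 16-bit additions."""
    lo, carry = adc16(x & 0xFFFF, y & 0xFFFF)
    hi, _ = adc16((x >> 16) & 0xFFFF, (y >> 16) & 0xFFFF, carry)
    return (hi << 16) | lo
```

An addition that leaves the carry flag untouched cannot take part in such a chain, because the low half has no way to report its overflow.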
I can see the point in this one, even though SDCC would not emit it (but it looks good for use in asm code).
I currently do not see the point in those. They look as if they were made for some specific use case that I do not know about. Do they really just skip a byte if it is A (if they stopped instead they might be useful for string processing)?
Those don't look that useful to me. I can see that mirroring bits is hard to do without them in the Z80 instruction set. But the need for mirroring bits is very rare in my experience. And SDCC would not be able to detect C code mirroring bits easily, so it would not emit those.
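For reference, this is what mirroring costs without hardware help. The sketch assumes that "mirroring" means reversing the order of the eight bits in a byte; the function name is hypothetical.

```python
def mirror_bits(a: int) -> int:
    """Reverse the bit order of an 8-bit value (bit 0 becomes bit 7, etc.)."""
    a &= 0xFF
    result = 0
    for _ in range(8):
        result = (result << 1) | (a & 1)  # move the lowest bit of a into result
        a >>= 1
    return result
```

The loop runs eight shift-and-merge steps, which is roughly the work a plain Z80 routine would need unless it spends 256 bytes on a lookup table.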
I don't see much point in this one. The only advantage it provides is reducing register pressure. Otherwise it provides no advantage over
Which is also just 4 bytes of code.
I don't see the point at all.
Does exactly the same at exactly the same cost in code size.
Looks like stuff specific to the peripherals of the device. Is it really worth using opcodes as opposed to some I/O location?
This can be useful for some code, but such code is not that common. SDCC already emits this instruction for z180. I suggest to rename it "tst" (as it is called on the Z180).
Great, very useful in SDCC:
Useful for SDCC:
Marginally useful for SDCC:
Nearly useless for SDCC:
Having worked on various SDCC backends, including all z80-related ones (gbz80, z180, r2k, r3ka, tlcs90), I noticed some instructions being particularly useful, making a big difference in code size and code speed. In particular, there are some instructions in the Rabbit that are used by SDCC, resulting in much lower code size for the r2k/r3ka backends vs. the z80 backend. If possible, I'd like to see some of them implemented in the zxn.
ld hl, (hl+NN)
Load hl with the value at the address given by the sum of hl and an 8-bit offset (unsigned is preferable but it doesn't matter much).
ld (sp+N), hl
The ix variant is essentially an alternative to the sp variant. Implementing both probably doesn't make that much sense. These instructions are present in the Rabbit.
For C, variables (and function arguments) that cannot be allocated to registers are placed on the stack. Using ix as a stack pointer is an okish way of accessing the stack for 8-bit variables, but it still comes with too much overhead. These instructions allow efficient transfer of 16-bit values between registers and the stack.
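A small Python model of stack-relative 16-bit access may make the semantics concrete. It assumes an unsigned 8-bit offset and the Z80's little-endian byte order; the function names and the flat 64 KiB memory are illustration only.

```python
def store_hl_at_sp(mem: bytearray, sp: int, n: int, hl: int) -> None:
    """Model of ld (sp+n),hl: store hl little-endian at sp plus unsigned n."""
    if not 0 <= n <= 0xFF:
        raise ValueError("offset must be an unsigned byte")
    addr = (sp + n) & 0xFFFF
    mem[addr] = hl & 0xFF
    mem[(addr + 1) & 0xFFFF] = (hl >> 8) & 0xFF


def load_hl_from_sp(mem: bytearray, sp: int, n: int) -> int:
    """Model of ld hl,(sp+n): fetch a 16-bit value from sp plus unsigned n."""
    if not 0 <= n <= 0xFF:
        raise ValueError("offset must be an unsigned byte")
    addr = (sp + n) & 0xFFFF
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)


memory = bytearray(0x10000)           # hypothetical flat 64 KiB address space
store_hl_at_sp(memory, 0xFF00, 4, 0x1234)
local = load_hl_from_sp(memory, 0xFF00, 4)
```

Each function corresponds to a single instruction in this model, whereas the plain Z80 needs an address calculation plus two byte transfers to reach a 16-bit local.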
This instruction present in the Rabbit casts the value in hl to bool and sets the flags accordingly (the z flag is what really matters).
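In Python terms the effect can be modelled as below. Only the z flag is modelled, since that is the one the text singles out, and the function name is made up for this sketch.

```python
def bool_hl(hl: int):
    """Model of bool hl: returns (new hl, z flag)."""
    result = 1 if (hl & 0xFFFF) != 0 else 0
    z_flag = result == 0  # z is set exactly when hl was zero
    return result, z_flag
```

This is the operation a C compiler needs for `!x`, `x != 0` and conditions such as `if (x)` on 16-bit values.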
This instruction (not implemented in any of the architectures currently supported by SDCC) would, for a 16-bit register gg, sign-extend the value in the lower 8 bits into the full 16-bit register. Even having this instruction for just one register pair out of hl, de, bc would be very useful.
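The proposed behaviour can be written down as a short Python model. The function name is hypothetical; the instruction itself is only a suggestion in this thread.

```python
def sign_extend_low_byte(gg: int) -> int:
    """Sign-extend bits 0..7 of a 16-bit register into all 16 bits."""
    low = gg & 0xFF
    if low & 0x80:
        return 0xFF00 | low  # negative: fill the upper byte with ones
    return low               # non-negative: clear the upper byte
```

This is the conversion C performs whenever a signed char is promoted to int, which is why it comes up so often in compiled code.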
add sp, d
This instruction present on the Rabbit adds a signed 8-bit value to the stack pointer.
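A minimal Python model of the operation, assuming 16-bit wraparound of the stack pointer; the helper names are invented for this example.

```python
def to_signed8(byte: int) -> int:
    """Interpret an operand byte as a two's complement 8-bit number."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def add_sp(sp: int, d: int) -> int:
    """Model of add sp,d with d in the range -128..127."""
    if not -128 <= d <= 127:
        raise ValueError("displacement must fit in a signed byte")
    return (sp + d) & 0xFFFF
```

A negative displacement reserves space for locals on function entry and a positive one releases it on exit, so one instruction covers both ends of a stack frame of up to 128 bytes.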
I've just done a quick test on how often some of the proposed instructions are actually used by SDCC (by compiling the SDCC regression tests for gbz80, z180, r2k). Of course for the total effect one needs to consider more than just their frequency (after all, a rare instruction could save a lot of code at each of the few places where it can be used). Still the data seems helpful.
tst [z180]: 2
The special instructions:
are specifically for games and graphics.
The ldix-family of instructions is for copying graphics while skipping over transparent bytes,
ldirscale is for exploded sprites - the additions are implementing fixed point adjustments to display position and source address.
mirror is for reversing images, pixeldn / pixelad / setae are very specialized for the spectrum's native display file organization.
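The transparent copy idea behind the ldix family can be sketched in Python as follows. This models only the general technique; the assumption that both pointers advance for every source byte, whether or not it is written, is mine and is not confirmed by the instruction list.

```python
def copy_skip_transparent(src: bytes, dst: bytearray, transparent: int) -> None:
    """Copy src over dst, leaving dst untouched where src is transparent."""
    for i, value in enumerate(src):
        if value != transparent:
            dst[i] = value  # opaque byte: overwrite the background


sprite = bytes([1, 0, 2, 0])
background = bytearray([9, 9, 9, 9])
copy_skip_transparent(sprite, background, transparent=0)
```

After the call the background keeps its original bytes wherever the sprite holds the transparent value.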
nextreg is for controlling the hardware state of the machine. These are very useful in practice - I would say it's one of the better additions.
All the above I wouldn't expect the compiler to generate, however they would be present in the libraries and user code.
There are some other issues with, eg, the 8x8->16 multiply. There are requests for having a signed counterpart and for bringing back the 32-bit multiply which was found to be very useful for fixed point calculations.
The added instructions do not affect flags because they are implemented outside the z80 alu but I do agree many would be more useful if they did affect flags.
Available space on the fpga also cramps what can be added. We'll see what happens - there is a deadline approaching.
And just to note (because it is buried in lots of comments) the
@spth work showing the use of
add sp,d would be usable from both sdcc and sccz80 and stop sccz80 from jumping through hoops to preserve the return value with a large frame.
ld hl,(sp+n) and ld (sp+n),hl make significant improvements to the Rabbit generator in sccz80 and are also used by sdcc.
On the Rabbit these instructions are very cheap (2 bytes and 11 or less Rabbit clocks so ~22T). I think @aralbrec has put forward a case for these on several occasions but has sadly been rejected.
In terms of what's being used by sccz80, I think add hl,nnnn is the only one used at the moment (I think we do a ld bc,nnn then add hl,bc for structure access). I can see a rules file uses push nnnn, but this will be from the days when that was a quick instruction.
I dislike the
There seems to be no history of discussion about the instruction mnemonics on the SpecNEXT forum, or elsewhere. So, there seems no avenue to discuss this.
Would it be appropriate to make