INT0 IM1 & RET - Hangs - Help Wanted #117

Closed
feilipu opened this Issue Mar 19, 2017 · 11 comments

Contributor

feilipu commented Mar 19, 2017

I'm trying to write an interrupt service routine (ISR) which is driven by INT0 in IM1 mode.

In initialising (all) the RST and IM1 jump locations I've put RET (C9) instructions, as is standard practice. For the APU, I insert the RET at the jump location 0x3800.

My issue is that as soon as I try to remove or replace the INT0 RET instruction, the machine hangs.
I've tried to be tricky, and insert the ISR one byte off, and then write the remaining RET to a NOP. But this doesn't help.

I've confirmed that the ISR is not broken. It works as expected when called from another program.

I've tried to establish the ISR with interrupts generally disabled (DI), and with them enabled but turned off for INT0. No difference.

OK. Definitely it's a hardware problem, you'd think. Something holding the INT0 line low, causing infinite interrupts.
[image: int0_yaz180]
Trouble is, I've held a logic analyser on the !END (INT0) line and it never goes low, unless it should (at the end of processing a command), and then it is reset high by the !ENDACK.

I've found no reference on the Internet for anyone having this kind of issue, and I'm at a loss.

It seems that the sentient machine is waiting patiently for a window to escape out of (the INT0 RET), and as soon as it is removed, it is gone!

Any help or suggestion welcome, please.

aralbrec Mar 19, 2017

Member

You may not be able to get away with a RET to terminate ISRs on the z180. Try using the proper RETI instead.

A RET is not going to work for im2-aware devices. It was never 'right' on the z80 either but the z80 does not have im2 hardware built in, unlike the z180 which is im2 aware all the way through.

aralbrec Mar 19, 2017

Member

It looks like you've forgotten to save DE in "APU_ISR". Although DE is not directly used there, it is used in "APU_ISR_OP_ENT" which is branched to.

@aralbrec aralbrec added the question label Mar 19, 2017

feilipu Mar 19, 2017

Contributor

Good catch. Thanks.
Relevant push/pop are now added.
Also added a RETI to the ISR.
But, no joy.

Even though INT0 is disabled in the ITC register, and there is no external trigger being applied, the location for the INT0 IM1 jump at 0x3800 is being hit or tested constantly.

As soon as I remove the RET (C9) at 0x3800 (jumped to by the 0x0038 location) by "poking" a NOP or a DI, or programmatically from assembly, the machine hangs because it is running "random" code, even if there is good code placed from 0x3801 onwards.

Might need to work on a simple replication... too many pieces of code at play here.

feilipu Mar 19, 2017

Contributor

Experimenting with code fragments. It is not so simple.
Although there is no reason or trigger for the INT0 interrupt to run when it does, as it is disabled, my ISR code is also faulty, which causes the hang when it runs.

The mystery for me is why the ISR is being run at all, without an INT0 signal.

;        XOR A           ; Zero Accumulator
                         ; Set INT/TRAP Control Register (ITC)             
;        OUT0 (ITC),A    ; Disable all external interrupts.

;        di

        ld hl, INT0_APU  ; load the address of the APU INT0 jump
                         ; initially there is a RET 0xC9 there.
;        ld (hl), $00    ; load a EI 0xFB, or DI 0xF3, or NOP 0x00
;        inc hl
;        ld (hl), $ed    ; load a RET 0xC9 or a RETI 0xED4D
;        inc hl
;        ld (hl), $4d

        ld (hl), $c3    ; load a JP $0020
        inc hl
        ld (hl), $20
        inc hl
        ld (hl), $00

;        ei
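
For reference, the active stores in the fragment above write the three bytes C3 20 00 at INT0_APU, which the CPU decodes as JP $0020: opcode first, then the target address with its low byte first. A minimal host-side model of that encoding in Python, where `encode_jp` is a hypothetical helper name used only for illustration:

```python
def encode_jp(target: int) -> bytes:
    """Encode an unconditional Z80 `JP nn`: opcode 0xC3 plus a little-endian address."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("target must fit in 16 bits")
    return bytes([0xC3, target & 0xFF, target >> 8])

# The bytes poked by the three `ld (hl), ...` stores above.
print(encode_jp(0x0020).hex(" "))  # c3 20 00
```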

I guess I'll just work on fixing the ISR, and perhaps it will become obvious what's happening.

@feilipu feilipu closed this Mar 19, 2017

aralbrec Mar 20, 2017

Member

Whenever weird stuff starts happening, I take a look at the stack and make sure it's big enough and in a place that won't overrun or overlap anything.

feilipu Mar 24, 2017

Contributor

This really is the bug that keeps on giving.
I've learned a lot, that I hope will be useful for others, so I'm writing it down here.

I was still getting lots of interrupts to the APU, which was not good, so I decided that I needed to be a bit more robust with the Z80 interrupt jump table. So I implemented one. Pretty standard. Just builds a prototype at 0x0040 in ROM, and then copies it during INIT to 0x2000. I've moved some buffers around and it all works well on the RC2014, and on the YAZ180.

The connection to this issue became apparent when I improved the code by correctly inserting an EI before the RETI in the jump location for a null interrupt INT0. Somehow the EI wasn't working for the Z180 based machine, but it was for the Z80 machine. Hmm. Curious, and connected.

Here's the trap which I fell into.

  • Free ROM space was being filled with FF bytes. And,
  • The Op Code for the RST 38 is FF. And,
  • The RST 38 jump location goes to the same location as the INT0 jump, being 0x0038.

What these things mean is that if the Program Counter gets to any unusual location in unused ROM, then an RST 38 call to the INT0 code at 0x0038 is executed, which would just RET if the INT0 code said to.
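
The opcode arithmetic behind that trap can be checked with a small model. The Z80 RST instructions are encoded as 11ttt111 in binary, and the call target is ttt × 8, so an erased byte 0xFF decodes as RST 38h. The sketch below is Python rather than Z80 assembly, and `rst_target` is a hypothetical helper name:

```python
def rst_target(opcode: int):
    """Return the call target if `opcode` is a Z80 RST instruction, else None."""
    if (opcode & 0xC7) == 0xC7:    # bit pattern 11ttt111
        return opcode & 0x38       # ttt * 8: 0x00, 0x08, ..., 0x38
    return None

print(hex(rst_target(0xFF)))  # 0x38
print(rst_target(0xC9))       # None (0xC9 is RET, not an RST)
```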

So it was not spurious INT0 calls that were the problem, but rather executing RST 38 instructions.

That explains so much. Two days to get to this point... But it doesn't fix the problem.

First step to fixing the problem is to fill free ROM space with 0x76 bytes, 0x76 being the opcode for HALT. Then I could see which address was being jumped to, because the TIL311 address display on the YAZ180 just shows me.
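
A minimal sketch of that padding step, assuming the ROM image is post-processed as a flat binary on the host. The thread does not say how the fill was actually done; `pad_rom`, the 16-byte ROM size and the four sample bytes are all hypothetical:

```python
HALT = 0x76  # Z80/Z180 HALT opcode

def pad_rom(image: bytes, rom_size: int, fill: int = HALT) -> bytes:
    """Pad `image` up to `rom_size` bytes with `fill` instead of the erased-state 0xFF."""
    if len(image) > rom_size:
        raise ValueError("image is larger than the ROM")
    return image + bytes([fill]) * (rom_size - len(image))

rom = pad_rom(bytes([0xF3, 0xC3, 0x00, 0x01]), 16)  # DI ; JP 0x0100, then padding
print(rom.hex(" "))
```

This only pads the tail of the image; gaps between sections inside the image would have to be filled by the assembler or linker.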

The problematic destination is 0x00C3.

And, this bad jump happens after three bytes of printing. Knowing that the TX0 routine does immediate printing for the first two bytes, and then uses the buffer for the following bytes, it should be simple to find the issue, right?

Well, no... I'm still looking for a loose RET or similar that could pop the PC at the wrong time. And it is not easy to find where the code starts executing out of the normal bounds.

@feilipu feilipu reopened this Mar 24, 2017

feilipu Apr 2, 2017

Contributor

I've spent the last 8 days looking at this code, and I still can't see what is causing the bad jump. There's nothing that I can see affecting the PC.

Something is loading 0x00C3 into the PC and causing code to be executed there. The situation is as described above. Three bytes are output, then oops.

  • If 0x00C3 contains HALT, execution just stops at 0x00C4.
  • If 0x00C3 contains NOP, then execution just harmlessly NOPs along to the 0x0100 interrupt routine, where recovery happens.

Any sanity preserving input, gratefully accepted.

aralbrec Apr 4, 2017

Member

It's failing right at START then?

START:                                     
            LD      HL,SIGNON1      ; Sign-on message
            CALL    TX0_PRINT       ; Output string      

feilipu Apr 4, 2017

Contributor

Yes. Exactly.

In fact, it is failing after the third character is transmitted. The first two characters slip straight into the TX0 immediate transmit case (as the ASCI is double buffered). The third character is the first to run through the TX0 buffer code, where the TIE flag enables the Tx interrupt, and then the ASCI interrupt fires.

Therefore it must be something in the interrupt code or the TX0 code that causes a jump to 0x00C3.

My issue is that I can't find anything that would cause that jump, and no reference to 0xC3 anywhere which could get it incorrectly loaded into PC.

feilipu Apr 10, 2017

Contributor

Finally, I've resolved my issue.
What we had here was a classic "failure to understand".

Somewhat embarrassed to leave this here for Internet eternity.

  • Z80 vectors are supported by a JUMP table.
  • Z180 vectors are supported by an ADDRESS table.

Inserting JP instructions into an address vector table will win no friends.
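
To illustrate the difference: for INT1, INT2 and its internal interrupt sources, the Z180 builds a pointer from the I register (high byte), the top three bits of the IL register, and a fixed five-bit code per source, then reads a 16-bit little-endian handler address from that location. The Python sketch below models that lookup. The table base (I = 0x20, IL = 0x00) is an assumption chosen for illustration, the handler address 0x0100 is taken from the interrupt routine mentioned earlier in the thread, and `vector_pointer` and `fetch_handler` are hypothetical names. It shows one way a JP opcode byte (0xC3) can end up inside the fetched address, which is consistent with the 0x00C3 destination reported above. It is not a reconstruction of the actual table layout, which the thread does not show.

```python
ASCI0_CODE = 0x0E  # fixed 5-bit vector code for ASCI channel 0 (0b01110)

def vector_pointer(i_reg: int, il_reg: int, fixed_code: int) -> int:
    """Address from which the CPU reads the handler address."""
    return (i_reg << 8) | (il_reg & 0xE0) | (fixed_code & 0x1F)

def fetch_handler(memory: dict, pointer: int) -> int:
    """Read a 16-bit little-endian handler address from a sparse memory model."""
    return memory.get(pointer, 0) | (memory.get(pointer + 1, 0) << 8)

ptr = vector_pointer(0x20, 0x00, ASCI0_CODE)            # 0x200E (assumed base)

address_table = {ptr: 0x00, ptr + 1: 0x01}              # word 0x0100: what a Z180 expects
jump_entry = {ptr: 0xC3, ptr + 1: 0x00, ptr + 2: 0x01}  # JP 0x0100: a Z80-style entry

print(hex(fetch_handler(address_table, ptr)))  # 0x100
print(hex(fetch_handler(jump_entry, ptr)))     # 0xc3
```

With the jump-style entry, the opcode byte becomes the low byte of the handler address and the low byte of the JP target becomes the high byte, so the CPU starts executing at 0x00C3 rather than 0x0100.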

[image: z180_internal_vectors]

That cost so much time. But at least on the up side, I've written robust Z80 and Z180 vector tables, improved the ASCI code, and cleaned up initialisation code, in trying to track this down.
Also finally, I now understand.

@feilipu feilipu closed this Apr 10, 2017

aralbrec Apr 10, 2017

Member

I didn't spot that either, so it must be something that sits in a mental blind spot!
