Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Corrupted console output with large buffers. #1697

Closed
gendx opened this issue Mar 18, 2020 · 10 comments · Fixed by #1960
Closed

[Bug] Corrupted console output with large buffers. #1697

gendx opened this issue Mar 18, 2020 · 10 comments · Fixed by #1960
Labels
blocked Waiting on something, like a different PR or a dependency.

Comments

@gendx
Copy link
Contributor

gendx commented Mar 18, 2020

System specifications

Commands run

On Tock repository.

cd boards/nordic/nrf52840dk
make flash

On libtock-rs at my console-check branch.

./run_example console_triangle

Expected behavior

The console_triangle example from my console-check branch on libtock-rs is using the console driver with buffers of increasing size (hence the "triangle") https://github.com/gendx/libtock-rs/blob/console-check/examples/console_triangle.rs.

The output is supposed to look like this:

1
22
333
4444
55555
666666
7777777
88888888
999999999
0000000000
11111111111
222222222222
3333333333333
44444444444444
555555555555555
...

Observed behavior

With a maximum buffer size of 128 bytes, I observe corrupted console output such as the following:

...
6666666666666666666666666666666666666666666666666666666666666666666666666666
77777777777777777777777777777777777777777777777777777777777777777777777777777
888888888888888888888888888888888888888888888888888888888888888888888888888888
9999999999999999999999999999999999999999999999999999999999999999999999999999999
00000000000000000000000000000000000000000000000000000000000000000000000000000000
111111111111111111111111111111111111111111111111111111111111111111111111111111111
2222222222222222222222222222222222222222222222222222222222222222222222222222222222
3333333333333333333333333333333333333334444444444
333333333333333333333333333333333
4444444444444445555555555555555555555555555555555555555555555
4444444444444444444444
5566666666666666666666666666666666666666666666666666666666666666666666666
7777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777
88888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
...

When the maximum buffer size is 64 bytes however, there's no such corrupted output.

Analysis

Incidentally, the console driver has buffers of 64 bytes, which leads me to believe that the bug could come from there.

pub static mut WRITE_BUF: [u8; 64] = [0; 64];
pub static mut READ_BUF: [u8; 64] = [0; 64];

It could also be another side-effect of alarms not working properly (see #1651).

Help wanted

I can currently only test on nRF52840-DK via Segger RTT output, so it would be helpful to know if anyone can reproduce on another board and/or on UART output.

@alevy
Copy link
Member

alevy commented Mar 18, 2020

Let's confirm that the same behavior does not exist on a board with a "normal" UART (e.g. Hail or imix). So we can triangulate that this is probably specific to the Segger RTT stack (which is pretty different).

@gendx
Copy link
Contributor Author

gendx commented Mar 18, 2020

Let's confirm that the same behavior does not exist on a board with a "normal" UART (e.g. Hail or imix). So we can triangulate that this is probably specific to the Segger RTT stack (which is pretty different).

Yes if someone could check on a traditional UART that would be very helpful.

@bradjc
Copy link
Contributor

bradjc commented Mar 18, 2020

on nrf52840dk with kernel on master no changes:

Initialization complete. Entering main loop


Kernel panic at <::core::macros::panic macros>:5:
	"Process console_triangle had a fault"
	Kernel version release-1.4-822-g527afb416

---| Debug buffer not empty. Flushing. May repeat some of last message(s):
NRF52 HW INFO: Variant: AACA, Part: N52840, Package: QI, Ram: K256, Flash: K1024

---| No debug queue found. You can set it with the DebugQueue component.

---| Fault Status |---
Data Access Violation:              true
Forced Hard Fault:                  true
Faulting Memory Address:            0x20006800
Fault Status Register (CFSR):       0x00000082
Hard Fault Status Register (HFSR):  0x40000000

---| App Status |---
App: console_triangle   -   [Fault]
 Events Queued: 0   Syscall Count: 3   Dropped Callback Count: 0
 Restart Count: 0
 Last Syscall: Some(MEMOP { operand: 0, arg0: 536897556 })

 ╔═══════════╤══════════════════════════════════════════╗
 ║  Address  │ Region Name    Used | Allocated (bytes)  ║
 ╚0x20006000═╪══════════════════════════════════════════╝
             │ ▼ Grant         940 |    940
  0x20005C54 ┼───────────────────────────────────────────
             │ Unused
  0x20004C00 ┼───────────────────────────────────────────
             │ ▲ Heap            ? |      ?               S
  ?????????? ┼─────────────────────────────────────────── R
             │ Data              ? |      ?               A
  0x20004800 ┼─────────────────────────────────────────── M
             │ ▼ Stack          32 |   2048
  0x200047E0 ┼───────────────────────────────────────────
             │ Unused
  0x20004000 ┴───────────────────────────────────────────
             .....
  0x00031000 ┬─────────────────────────────────────────── F
             │ App Flash      4032                        L
  0x00030040 ┼─────────────────────────────────────────── A
             │ Protected        64                        S
  0x00030000 ┴─────────────────────────────────────────── H

  R0 : 0x20006814    R6 : 0x00030040
  R1 : 0x20006800    R7 : 0x20004800
  R2 : 0x20006800    R8 : 0x20006814
  R3 : 0x00000010    R10: 0x00000000
  R4 : 0x20006814    R11: 0x00000000
  R5 : 0x00000001    R12: 0x00000000
  R9 : 0x00000000 (Static Base Register)
  SP : 0x200047E0 (Process Stack Pointer)
  LR : 0x000301D3
  PC : 0x00030220
 YPC : 0x000300D0

 APSR: N 0 Z 0 C 1 V 0 Q 0
       GE 0 0 0 0
 EPSR: ICI.IT 0x00
       ThumbBit true

 Cortex-M MPU
  Region 0: [0x20004000:0x20006000], length: 8192 bytes; ReadWrite (0x3)
    Sub-region 0: [0x20004000:0x20004400], Enabled
    Sub-region 1: [0x20004400:0x20004800], Enabled
    Sub-region 2: [0x20004800:0x20004C00], Enabled
    Sub-region 3: [0x20004C00:0x20005000], Enabled
    Sub-region 4: [0x20005000:0x20005400], Disabled
    Sub-region 5: [0x20005400:0x20005800], Disabled
    Sub-region 6: [0x20005800:0x20005C00], Disabled
    Sub-region 7: [0x20005C00:0x20006000], Disabled
  Region 1: [0x00030000:0x00031000], length: 4096 bytes; ReadOnly (0x6)
    Sub-region 0: [0x00030000:0x00030200], Enabled
    Sub-region 1: [0x00030200:0x00030400], Enabled
    Sub-region 2: [0x00030400:0x00030600], Enabled
    Sub-region 3: [0x00030600:0x00030800], Enabled
    Sub-region 4: [0x00030800:0x00030A00], Enabled
    Sub-region 5: [0x00030A00:0x00030C00], Enabled
    Sub-region 6: [0x00030C00:0x00030E00], Enabled
    Sub-region 7: [0x00030E00:0x00031000], Enabled
  Region 2: Unused
  Region 3: Unused
  Region 4: Unused
  Region 5: Unused
  Region 6: Unused
  Region 7: Unused

To debug, run `make debug RAM_START=0x20004000 FLASH_INIT=0x30081`
in the app's folder and open the .lst file.

on hail, again on master:

Initialization complete. Entering main loop


Kernel panic at tock/kernel/src/process.rs:373:
	"Restart threshold surpassed!"
	Kernel version release-1.4-822-g527afb416

---| No debug queue found. You can set it with the DebugQueue component.

---| Fault Status |---
Data Access Violation:              true
Forced Hard Fault:                  true
Faulting Memory Address:            0x20006800
Fault Status Register (CFSR):       0x00000082
Hard Fault Status Register (HFSR):  0x40000000

---| App Status |---
App: console_triangle   -   [StoppedFaulted]
 Events Queued: 0   Syscall Count: 3   Dropped Callback Count: 0
 Restart Count: 5
 Last Syscall: Some(MEMOP { operand: 0, arg0: 536897556 })

 ╔═══════════╤══════════════════════════════════════════╗
 ║  Address  │ Region Name    Used | Allocated (bytes)  ║
 ╚0x20006000═╪══════════════════════════════════════════╝
             │ ▼ Grant         952 |    952
  0x20005C48 ┼───────────────────────────────────────────
             │ Unused
  0x20004C00 ┼───────────────────────────────────────────
             │ ▲ Heap            ? |      ?               S
  ?????????? ┼─────────────────────────────────────────── R
             │ Data              ? |      ?               A
  0x20004800 ┼─────────────────────────────────────────── M
             │ ▼ Stack          32 |   2048
  0x200047E0 ┼───────────────────────────────────────────
             │ Unused
  0x20004000 ┴───────────────────────────────────────────
             .....
  0x00031000 ┬─────────────────────────────────────────── F
             │ App Flash      4032                        L
  0x00030040 ┼─────────────────────────────────────────── A
             │ Protected        64                        S
  0x00030000 ┴─────────────────────────────────────────── H

  R0 : 0x20006814    R6 : 0x00030040
  R1 : 0x20006800    R7 : 0x20004800
  R2 : 0x20006800    R8 : 0x20006814
  R3 : 0x00000010    R10: 0x00000000
  R4 : 0x20006814    R11: 0x00000000
  R5 : 0x00000001    R12: 0x00000000
  R9 : 0x00000000 (Static Base Register)
  SP : 0x200047E0 (Process Stack Pointer)
  LR : 0x000301D3
  PC : 0x00030220
 YPC : 0x000300D0

 APSR: N 0 Z 0 C 1 V 0 Q 0
       GE 0 0 0 0
 EPSR: ICI.IT 0x00
       ThumbBit true

 Cortex-M MPU
  Region 0: [0x20004000:0x20006000], length: 8192 bytes; ReadWrite (0x3)
    Sub-region 0: [0x20004000:0x20004400], Enabled
    Sub-region 1: [0x20004400:0x20004800], Enabled
    Sub-region 2: [0x20004800:0x20004C00], Enabled
    Sub-region 3: [0x20004C00:0x20005000], Enabled
    Sub-region 4: [0x20005000:0x20005400], Disabled
    Sub-region 5: [0x20005400:0x20005800], Disabled
    Sub-region 6: [0x20005800:0x20005C00], Disabled
    Sub-region 7: [0x20005C00:0x20006000], Disabled
  Region 1: [0x00030000:0x00031000], length: 4096 bytes; ReadOnly (0x6)
    Sub-region 0: [0x00030000:0x00030200], Enabled
    Sub-region 1: [0x00030200:0x00030400], Enabled
    Sub-region 2: [0x00030400:0x00030600], Enabled
    Sub-region 3: [0x00030600:0x00030800], Enabled
    Sub-region 4: [0x00030800:0x00030A00], Enabled
    Sub-region 5: [0x00030A00:0x00030C00], Enabled
    Sub-region 6: [0x00030C00:0x00030E00], Enabled
    Sub-region 7: [0x00030E00:0x00031000], Enabled
  Region 2: Unused
  Region 3: Unused
  Region 4: Unused
  Region 5: Unused
  Region 6: Unused
  Region 7: Unused

To debug, run `make debug RAM_START=0x20004000 FLASH_INIT=0x30081`
in the app's folder and open the .lst file.

@bradjc
Copy link
Contributor

bradjc commented Mar 18, 2020

I ported the app to c and it runs fine on hail and nrf52840dk using uart. On the nrf52840dk with RTT it gets corrupted:

...
000000000000000000000000000000000000000000000000000000000000
1111111111111111111111111111111111111111111111111111111111111
22222222222222222222222222222222222222222222222222222222222222
333333333333333333333333333333333333333333333333333333333333333
4444444444444444444444444444444444444444444444444444444444444444
55555555555555555555555555555555555555555555555555555555555555555
666666666666666666666666666666666666666666666666666666666666666666
7777777777777777777777777777777777777777777777777777777777777777777
88888888888888888888888888811111
88888888888888888888888888888888888
9999999999222222222222222222222222222222222222
9999999999999999999999
333333333333333333333333333333333333333333333333333333333333
000444444444444444444444444444444444444444444444444444444444444444444444444444444444
5555555555555555555555555555555555555555555555555555555555555555555555555555555555555
66666666666666666666666666666666666666666666666666666666666666666666666666666666666666
...

@gendx
Copy link
Contributor Author

gendx commented Mar 18, 2020

on nrf52840dk with kernel on master no changes:

Initialization complete. Entering main loop


Kernel panic at <::core::macros::panic macros>:5:
	"Process console_triangle had a fault"
	Kernel version release-1.4-822-g527afb416

[...]

Faulting Memory Address:            0x20006800

[...]

             │ ▲ Heap            ? |      ?               S
  ?????????? ┼─────────────────────────────────────────── R
             │ Data              ? |      ?               A
  0x20004800 ┼─────────────────────────────────────────── M
             │ ▼ Stack          32 |   2048
  0x200047E0 ┼───────────────────────────────────────────
             │ Unused
  0x20004000 ┴───────────────────────────────────────────

Your kernel loads apps from 0x20004000 whereas the app's layout expects to start from 0x20006000. As mentioned in #1696 (comment), this is because with USB_DEBUGGING enabled, the kernel's RAM ends at 0x20006000, whereas without it ends at 0x20004000. I put 0x20006000 as I was testing with USB_DEBUGGING enabled.

on hail, again on master:

Same thing. With libtock-rs the app must currently be compiled with a (board-dependent) fixed start address, because relocating libtock-rs apps currently doesn't work (tock/libtock-rs#28).

@bradjc
Copy link
Contributor

bradjc commented Mar 18, 2020

Gotcha. We really need to improve the tooling around all of this.

@gendx
Copy link
Contributor Author

gendx commented Mar 18, 2020

We really need to improve the tooling around all of this.

What happens would be a bit clearer with debug_load_processes enabled in the config. Maybe we should make it opt-out instead of opt-in?
Although one still needs to remember which address the libtock-rs app expected.

/// Whether the kernel should show debugging output when loading processes.
///
/// If enabled, the kernel will show from which addresses processes are loaded in flash and
/// into which SRAM addresses. This can be useful to debug whether the kernel could
/// successfully load processes, and whether the allocated SRAM is as expected.
crate debug_load_processes: bool,

@alevy
Copy link
Member

alevy commented Mar 19, 2020

I ported the app to c and it runs fine on hail and nrf52840dk using uart. On the nrf52840dk with RTT it gets corrupted:

libtock-rs tooling issues aside, this implies that the problem is likely in the Segger-RTT driver, right?

@bradjc
Copy link
Contributor

bradjc commented Mar 19, 2020

libtock-rs tooling issues aside, this implies that the problem is likely in the Segger-RTT driver, right?

I think so

@gendx
Copy link
Contributor Author

gendx commented Mar 20, 2020

In that case, unless someone else is willing to debug this, I'd label this issue as "blocked" until #1651 is resolved. I don't want to scratch my head on the Segger RTT while the basics it depends on (virtual alarms) are not working properly.

Fixing the alarms may even turn out to be sufficient to fix this corrupted output.

@ppannuto ppannuto added the blocked Waiting on something, like a different PR or a dependency. label Apr 9, 2020
bors bot added a commit that referenced this issue Jun 19, 2020
1960: Increase delay in Segger RTT debugging (fix #1697). r=brghena a=gendx

### Pull Request Overview

This pull request fixes #1697 by increasing the delay in Segger RTT debugging.

I tried various parameters, and 1ms of delay seems reliable enough, even for large buffers (tested up to 1500 bytes).

Somehow, when I originally filed #1697, I don't recall that increasing this delay would work, but it seems to fix it in the current kernel.


### Testing Strategy

This pull request was tested on a nRF52840-DK board with the "console triangle" test mentioned in #1697 and located on https://github.com/gendx/libtock-rs/blob/console-check/examples/console_triangle.rs. I also tested increasing the "width" of the triangle up to 1500 bytes, and the output works properly.

I had a look at https://www.segger.com/downloads/jlink but couldn't find a reliable source of ground truth on the acceptable frequency to refresh the RTT buffers...


### TODO or Help Wanted

This change makes the console slower, especially if one only uses very small buffers. However, reliability is likely preferable over premature optimization. Better start from a working code and optimize from there than the opposite.


### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make prepush`.


Co-authored-by: Guillaume Endignoux <guillaumee@google.com>
Co-authored-by: gendx <gendx@users.noreply.github.com>
@bors bors bot closed this as completed in c17fa3b Jun 19, 2020
@bors bors bot closed this as completed in #1960 Jun 19, 2020
hudson-ayers pushed a commit that referenced this issue Mar 7, 2022
BUG=none
TEST=make

(cherry picked from commit c17fa3b)
Change-Id: I11769236fdf5cdfbe853c09f953ff59bb1ecf0df
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Waiting on something, like a different PR or a dependency.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants