New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel: init: allow to jitter boot delay #25627
kernel: init: allow to jitter boot delay #25627
Conversation
Open issue: How to make sure that the printk() is flushed before k_msleep()? |
As this mentions networking usecase, adding @jukkar to reviewers. I myself would point that if networking is the usecase, than rather e.g. DHCP client should have a randomized delay support. Handling that with a boot delay seems like a crude approach. |
It is mostly a wireless/BT/openThread networking use case, not all of the networking options necessarily require DHCP. It might be a crude approach, but the implementation is rather straight forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable. And as already commented, the DHCP provides similar support but if one is not using DHCP then something like this is useful.
c4978f7
to
6d8db01
Compare
6d8db01
to
cce769a
Compare
rebased on master post 2.3 release. |
cce769a
to
b14f096
Compare
b14f096
to
e904700
Compare
e904700
to
0f8096b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick about docs, but one that does need to be fixed. Also confused about that noop printk wrapper.
help | ||
This option randomizes the bootup delay between 0 and the | ||
value of BOOT_DELAY. This can be used e.g., to jitter boot-up | ||
of wireless network devices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantry: this needs to specify units. How long is "1"? One reasonable choice IMHO is to bake it into the API and call this CONFIG_BOOT_DELAY_RANDOM_MS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it says, the value is between 0 and BOOT_DELAY. BOOT_DELAY is in milliseconds:
https://github.com/zephyrproject-rtos/zephyr/pull/25627/files#diff-a0dbba97dac305f5927e75a408d79a25R309-R310
va_start(ap, fmt); | ||
vprintk(fmt, ap); | ||
va_end(ap); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. What is the point of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
printk() is calling log_printk() when CONFIG_LOG_PRINTK is on. Then the output would be happening in the logging thread which is halted while the busy-loop is running. The output of the delay would only happen after the delay and thus leaving the user wondering what is happening.
This way the message is outputted before the busy-loop executes and the user knows that boot is delayed.
@@ -191,6 +182,15 @@ void z_data_copy(void) | |||
|
|||
bool z_sys_post_kernel; | |||
|
|||
void immediate_printk(const char *fmt, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you adding it ? What it the problem with printk
? I presume is something related with log_printk
... but I don't know what it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
help | ||
This option randomizes the bootup delay between 0 and the | ||
value of BOOT_DELAY. This can be used e.g., to jitter boot-up | ||
of wireless network devices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is worth to add a note about it in BOOT_DELAY
option. Because there is saying that the entire value set will be used to delay the boot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If BOOT_DELAY_RANDOM is not set, then BOOT_DELAY is used.
Would you rather add a random delay on top of BOOT_DELAY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my comments. If I'm right the random value is not being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather you not modify the core kernel for this use-case.
Have you considered defining a SYS_INIT() in your application (at PRE_KERNEL_1) which just introduces a delay there?
0f8096b
to
4640172
Compare
Have not considered this. How would you delay network set-up with SYS_INIT()? Do you have an example? |
When a big amount of networked devices collectively gets switched on, e.g. by a mains switch, a low-bandwidth wireless network is quickly overwhelmed. This PR allows to jitter the start-up of the devices. Cleaned-up the boot banner after the boot delay. At that point in time the boot banner is irrelevant. Signed-off-by: Markus Becker <markus.becker@tridonic.com>
4640172
to
63f83af
Compare
OK... I get it. And this is clever, but... no, sorry. :) Really what you're doing here is exploiting a bug in the logging subsystem. CONFIG_LOG_PRINTK is supposed to mean that printk() output goes to logging instead, but apparently the feature missed a spot, or put the hook in the wrong place, and it happens that vprintk() doesn't do that and goes to the original output. But that's wrong. When you have CONFIG_LOG=y and CONFIG_LOG_IMMEDIATE=n, logging is asynchronous. That's the way it's supposed to work, and it's desired behavior[1]. Subsystems shouldn't be silently working around the intended behavior of other subsystems by using backdoors. Basically: if the user asked for async logging give them what they want. If you don't think they should want that, let's deal with that via a patch to the log layer. [1] By someone (cough, cough, @nordic-krch). I actually agree that for most development usage asynchronous logging is the devil's work and don't understand why we decided to make it default. But it's the system we have, so let's not hack holes around it. |
Thanks :)
Hmm.
Will the bug be fixed?
True, that the user wants to use async logging (or has to because it is pulled in by default). But this tripped me several times, looking at debug on UART and expecting to see something, only to notice several trials later that the device is delaying the boot. For my information, would be switching the logging backend to RTT make any difference?
What was the reason to move it to async? Would disabling async logging be possible? What would be the side-effects? |
Do we have a bug opened on GitHub for this issue? That's the first step. Consider enabling CONFIG_LOG_MINIMAL and just ignoring the async log mechanism. |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
When a big amount of networked devices collectively gets switched on,
e.g. by a mains switch, quickly overwhelms a wireless network.
This PR allows to jitter the start-up of the devices.
Cleaned-up the boot banner after the boot delay. At that point in time
is irrelevant.
Signed-off-by: Markus Becker markus.becker@tridonic.com