Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Stack Allocation API #26999

Closed
Tracked by #25569
cfriedt opened this issue Jul 20, 2020 · 9 comments · Fixed by #44379
Closed
Tracked by #25569

Add a Stack Allocation API #26999

cfriedt opened this issue Jul 20, 2020 · 9 comments · Fixed by #44379
Labels
area: Kernel Enhancement Changes/Updates/Additions to existing features priority: medium Medium impact/importance bug

Comments

@cfriedt
Copy link
Member

cfriedt commented Jul 20, 2020

Relates to #24714

We want an API such that we can pass in a desired stack size, and a suitably aligned pointer to a stack object will be returned, with all reserved space, alignment, and permissions considerations taken care of.

@andrewboie andrewboie added this to the v2.4.0 milestone Jul 20, 2020
@andrewboie
Copy link
Contributor

With the updates to sys_heap this shouldn't be hard, one of my PRs needs to go in #24714 and then it should be easy.

@andrewboie andrewboie added Enhancement Changes/Updates/Additions to existing features area: Kernel labels Jul 20, 2020
@andrewboie
Copy link
Contributor

For cases where user mode compatibility is not needed, for a stack with requested buffer size N just allocate a blob with size Z_KERNEL_STACK_SIZE_ADJUST(N) and aligned to Z_KERNEL_STACK_OBJ_ALIGN.

To get this completely working with user mode the following steps need to be taken:

  • At creation time it must be decided whether the created stack will have the capability to host a user thread, or only a supervisor thread.
  • To create a stack with requested size N, requires reserving a memory region of size X with alignment Y, where both X and Y are functions of N that vary per-arch. We now have macros to help with this:
    • For size, X = Z_THREAD_STACK_SIZE_ADJUST(N) or X = Z_KERNEL_STACK_SIZE_ADJUST(N), for user and supervisor-only stacks respectively
    • For alignment, Y = Z_THREAD_STACK_OBJ_ALIGN(N) or Y = Z_KERNEL_STACK_OBJ_ALIGN, for user and supervisor-only stacks respectively. In the latter case, the alignment is not a function of N.
  • If the stack can host user thread, kernel object metadata must be set up properly
    • A struct z_object metadata must be created for the stack object and placed within the dynamic tables, this is handled by z_dynamic_object_create()
    • If CONFIG_GEN_PRIV_STACKS=y, then the privilege mode stack memory must also be allocated, this is a fixed-size kernel-only stack. A struct z_stack_data must also be allocated and properly configured.

@nashif nashif added the priority: medium Medium impact/importance bug label Aug 13, 2020
@cfriedt
Copy link
Member Author

cfriedt commented Oct 9, 2020

Looking at this again, and we were just chatting about this on Slack. Some additional notes:

#29064 - @andrewboie provides the struct k_thread * as an argument to fn_abort()
^^ this will also be necessary for #25973 as we need to free the stack from a safe context (either another thread or the idle thread). I would for sure use this function pointer in pthread.c .

I thought about adding another flag K_STACK_ON_HEAP and oring that into user_options. That would be easiest, as long as the original void * from k_malloc() can be recreated from the members of struct _thread_stack_info, because that same void * will need to be passed back to k_free().

@andrewboie
Copy link
Contributor

I thought about adding another flag K_STACK_ON_HEAP and oring that into user_options. That would be easiest, as long as the original void * from k_malloc() can be recreated from the members of struct _thread_stack_info, because that same void * will need to be passed back to k_free().

I don't think you should be touching anything inside the k_thread object, that is reserved for the kernel.
Do this accounting at the pthread level.

@cfriedt cfriedt pinned this issue Feb 3, 2021
@cfriedt
Copy link
Member Author

cfriedt commented Feb 3, 2021

For cases where user mode compatibility is not needed, for a stack with requested buffer size N just allocate a blob with size Z_KERNEL_STACK_SIZE_ADJUST(N) and aligned to Z_KERNEL_STACK_OBJ_ALIGN.

That was easy - 15/20 platforms are passing as-is right now with that approach. I think that the stack size might be too small on some 64-bit architectures, so I'll have to increase the default.

To get this completely working with user mode the following steps need to be taken:

  • At creation time it must be decided whether the created stack will have the capability to host a user thread, or only a supervisor thread.

I have not done that yet. Currently, I've mainly considered kernel mode threads.

  • To create a stack with requested size N, requires reserving a memory region of size X with alignment Y, where both X and Y are functions of N that vary per-arch. We now have macros to help with this:

    • For size, X = Z_THREAD_STACK_SIZE_ADJUST(N) or X = Z_KERNEL_STACK_SIZE_ADJUST(N), for user and supervisor-only stacks respectively
    • For alignment, Y = Z_THREAD_STACK_OBJ_ALIGN(N) or Y = Z_KERNEL_STACK_OBJ_ALIGN, for user and supervisor-only stacks respectively. In the latter case, the alignment is not a function of N.
  • If the stack can host user thread, kernel object metadata must be set up properly

    • A struct z_object metadata must be created for the stack object and placed within the dynamic tables, this is handled by z_dynamic_object_create()
    • If CONFIG_GEN_PRIV_STACKS=y, then the privilege mode stack memory must also be allocated, this is a fixed-size kernel-only stack. A struct z_stack_data must also be allocated and properly configured.

Today I took a stab at the creating a k_thread_stack_alloc() function that would create a dynamic k_object under the assumption that it was a compile-time issue whether to use kernel or user mode threads, but that was a bad assumption. It would be nice to have a consistent API for that that would work for both user and kernel threads, such that whether it was user or kernel mode would be decided by the mode of the calling thread.

For kernel threads, it should be fine to deallocate the stack with a call to abort_fn(), but for user mode threads, it looks as though the stack could be automatically cleaned up when the number of references to the stack drop to zero. Is that true?

@nashif - I'm hoping to finish this up in the near future. I'll need to come up with a better system than the K_STACK_ON_HEAP flag since @andrewboie does not want me to touch the k_thread object. Probably the next-best thing I could do would be to keep a linked-list of pointers to stacks that have been dynamically allocated, and then k_free() if necessary in the abort_fn() callback.

@nashif nashif unpinned this issue Feb 3, 2021
@cfriedt
Copy link
Member Author

cfriedt commented Feb 27, 2021

z_dynamic_object_aligned_create() aligns the dynamic object, not the data section of the dynamic object, so it does not, by itself, appear to be able to be able to allocate a suitably aligned stack area. It would be ideal to be able to re-use k_aligned_alloc() to allocate the stack area, then to allocate a dynamic object, and then to embed a pointer to the stack area inside of it, but I'm not sure if that's exactly how kernel objects work in terms of MPU / MMU protection.

cfriedt added a commit to cfriedt/zephyr that referenced this issue Mar 14, 2022
This change adds the syscall k_alloc_thread_stack() which
dynamically allocates suitably aligned and padded thread stacks.

Fixes zephyrproject-rtos#26999

Signed-off-by: Christopher Friedt <chrisfriedt@gmail.com>
cfriedt added a commit to cfriedt/zephyr that referenced this issue Mar 14, 2022
This change adds tests for dynamically allocated thread stacks.

Fixes zephyrproject-rtos#26999

Signed-off-by: Christopher Friedt <chrisfriedt@gmail.com>
@cfriedt cfriedt modified the milestones: v2.4.0, v3.3.0 Oct 21, 2022
@stephanosio stephanosio modified the milestones: v3.3.0, v3.4.0 Jan 26, 2023
@joelguittet
Copy link
Contributor

Hello @cfriedt

I would like to know the status and goals on this feature, I see you have also #44379 that is a draft PR.

My understanding is actually what you have develop permits to reserve an area (looks it is a good idea !) to declare dynamically some stacks, that can be passed to k_thread_create. It seems then you need to k_thread_join waiting the termination of the task to free the stack itself when it's no more used. Is it correct ?

The last point is annoying in my opinion, because this means using dynamic stack we need to implement a kind of controller that will check for tasks created with dynamic stack, verify if they are terminated and then release the corresponding stacks.

If I refer to FreeRTOS, this is what is done: when a task is deleted, it is added to a list for termination. Then the idle task periodically check this list and free memory.

So my question will be more exactly and more directly: why not having the same behavior in Zephyr, idle task checking terminated tasks to free the dynamically created stacks that are no more used ?

Thanks,
Joel

@cfriedt
Copy link
Member Author

cfriedt commented Apr 1, 2023

Hi @joelguittet - sorry this issue has not moved very much, but I really don't get very many opportunities to work on this feature these days. Most of this is done in my "spare time".

The goal is to be able to dynamically allocate a stack. There have been many attempts with PRs that try and do this sort of thing using a pool of statically allocated stacks. That is technically quite easy.

What is not as easy is using arbitrary heap memory for a thread stack. For that, we need a corresponding kernel object as well for when CONFIG_USERSPACE is enabled. Additionally, on systems with an MMU, we want to be able to set the correct MMU flags for the allocated stack regions before the thread begins to run, and clear the flags after the thread has joined.

All of that is done in #44379 except for the MMU bits. I have not really had an opportunity to figure those out. Also, from what I understand, it may be a different procedure for different architectures (but I'm not certain of that).

There is additional complexity for systems that have an MPU or PMP, because those units can sometimes have insane memory alignment requirements for thread stacks.

So yes, there is the typical requirements for stack setup / tear-down, but also additional complexities.

I have no ETA for when this will be finished, as it is quite low on my list of priorities at the moment.

@joelguittet
Copy link
Contributor

@cfriedt
Thanks for the prompt answer! You made a big picture which is very interesting, looks like the topic is more expanded that I was thinking.
Regarding ETA, no worries, I understand perfectly, myself I'm also developing in my free time :-)
I will try to have a specific implementation for my use case.
Joel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Kernel Enhancement Changes/Updates/Additions to existing features priority: medium Medium impact/importance bug
Projects
None yet
5 participants