Device-level Cache API #33122

Closed
nika-nordic opened this issue Mar 8, 2021 · 14 comments · Fixed by #34607
Labels
RFC Request For Comments: want input from the community

Comments

@nika-nordic
Collaborator

Introduction

Zephyr has a public API for cache management, which was recently enhanced in this PR: #30398.
There is also an open ticket reflecting the original view of what a “complete” API should look like: #2647.

One limitation of the current API is that it always treats the cache as an arch-specific feature (i.e. a Kernel -> ARCH API). However, some SoCs may implement a cache controller as a peripheral separate from the core.

Problem description

Some SoCs may implement a cache management controller as a separate peripheral outside the core. In that case cache management becomes more of a device-management issue, i.e. the cache should not be managed at arch level.
The current implementation of the Zephyr Cache API assumes that the cache is managed at arch level only.

Proposed change

Rework the implementation of the existing Zephyr Cache API so that both arch-level and device-level cache management APIs can be supported.

Detailed RFC

Proposed change (Detailed)

The current implementation of the cache API looks like this:

static inline int z_impl_sys_dcache_all(int op)
{
        if (IS_ENABLED(CONFIG_CACHE_MANAGEMENT)) {
                return arch_dcache_all(op);
        }
        return -ENOTSUP;
}

There are two approaches to support both arch & device cache:

  1. use the device-level API or the arch-level API depending on a DT setting:
static inline int z_impl_sys_dcache_all(int op)
{
        if (IS_ENABLED(CONFIG_CACHE_MANAGEMENT)) {
                if (<cache is in arch>) {
                        return arch_dcache_all(op);
                } else if (<cache is a device>) {
                        return dcache_all(op); // Example of a device-level cache API call
                }
        }
        return -ENOTSUP;
}
  2. use the device-level API underneath the arch-level API - this would probably imply adding a cache.c file to the arch directory of the affected SoC
    (analogously to https://github.com/zephyrproject-rtos/zephyr/blob/master/arch/arc/core/cache.c).
    This option may sound counter-intuitive (as a peripheral is obviously not part of the arch), but I believe something similar is already done in Zephyr
    (example - a call to the PM subsystem from within arch code: https://github.com/zephyrproject-rtos/zephyr/blob/master/arch/arm/core/aarch32/irq_manage.c#L188). A rough sketch of this option follows below.
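
A rough sketch of option 2 (all names below are made up for illustration; this is not an agreed-on device-level API):

/* Hypothetical SoC-specific cache.c: the arch-level hook just forwards
 * to the device-level cache-controller driver.
 */
extern int cache_device_dcache_all(int op); /* assumed driver-provided call */

int arch_dcache_all(int op)
{
        return cache_device_dcache_all(op);
}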

Dependencies

As far as I know, the cache is not widely used in Zephyr currently. The arc arch, which uses the enhanced cache API, will be unaffected by the proposed changes.

Concerns and Unresolved Questions

  • which approach should be used for introducing device-level cache API?
  • should cache device API (to be added to /include/drivers) be introduced to Zephyr as a separate RFC?
@nika-nordic nika-nordic added the RFC Request For Comments: want input from the community label Mar 8, 2021
@carlocaione
Collaborator

I have possibly done something like this for another problem, see #32673; not sure if it is applicable here.

This is how it works for the pm_cpu_ops subsystem:

  1. The general subsystem interface is defined in https://github.com/zephyrproject-rtos/zephyr/blob/master/include/drivers/pm_cpu_ops.h
  2. A weak implementation is given in https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/pm_cpu_ops/pm_cpu_ops_weak_impl.c
  3. The interface can be hooked up to arch-specific code if needed, but the implementation can also be provided by some kind of device, as in this case PSCI, see https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/pm_cpu_ops/pm_cpu_ops_psci.c (a minimal sketch of this pattern follows below).
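
For reference, the pattern boils down to a weak default that a driver can override with a strong definition; a minimal sketch with made-up cache function names (these are not the actual pm_cpu_ops symbols):

/* drivers/cache/cache_weak_impl.c (hypothetical) - compiled unconditionally,
 * analogous to pm_cpu_ops_weak_impl.c.
 */
#include <errno.h>

int __weak cache_flush_all(void)
{
        /* Default when neither arch code nor a driver provides the hook. */
        return -ENOTSUP;
}

/* drivers/cache/cache_soc_xyz.c (hypothetical) - a DT-probed driver,
 * analogous to pm_cpu_ops_psci.c; the linker picks this strong definition
 * over the weak default above.
 */
int cache_flush_all(void)
{
        /* ... program the cache controller registers here ... */
        return 0;
}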

Can something like this work for your cache case?

@ioannisg
Member

I think this can work, yes. Basically, we need a way to make the cache API available at both device and arch level.

In the example you refer to @carlocaione , the API was defined at device-level, and the implementation hooks up arch- functions.
In this current scenario, the arch-level API already exists, so to follow your suggestion we need an - additional - device level API. I'd like to get some more opinions before adopting the recommendation.

@ioannisg
Member

CC @carlescufi @ceolin

@carlocaione
Collaborator

In the example you refer to @carlocaione , the API was defined at device-level, and the implementation hooks up arch- functions.

Isn't it exactly the opposite in the pm_cpu_ops case? The API is defined at an agnostic level (neither arch- nor device-, just a general interface implemented by weak functions) while the implementation hooks up device- functions (the PSCI driver that is implementing the functions is indeed a device probed by DT).

@carlescufi
Member

API meeting 2021.03.16:

Proposal:

  • Take the arch code that is currently present in the tree and move it to drivers/cache/. Each architecture would have its own cache driver.
  • Add additional drivers for SoC-specific cache controllers that override the architecture ones

New API would be in include/drivers/cache.h.

drivers/cache/ would likely contain:

cache_aarch32.c
cache_aarch64.c
cache_arc.c
cache_x86.c
cache_soc_xyz.c

Use the interrupt controller and the timer (drivers/timer) as examples of drivers used directly by the kernel.
Additional details on the actual architecture are required to be able to make progress.

@carlocaione
Collaborator

So, back at this again.

This is a possible proposal based on the @carlescufi proposal.

  1. The cache code should be moved to live in drivers/cache/, with the API defined in include/drivers/cache/cache.h (it's better to have a cache directory as well in case we need to host cache-related header files).
  2. The drivers/cache/ directory should at least host the arch cache management code (so cache_aarch32.c for ARM, cache_aarch64.c, etc.).
  3. A cache.c file should be present and compiled unconditionally. This file hosts the weak implementation of the API hooks, see for example sys_clock_init.c. During the API meeting someone raised the point that weak functions can be problematic, so a couple of notes about this:
    • The timer subsystem is apparently already using this.
    • We can avoid using weak functions if we force each arch to also provide a dummy implementation of the API.
  4. The same directory should also host the cache-controller-as-a-peripheral drivers.
  5. Assuming that these drivers and the arch implementation are mutually exclusive, we should manage this at Kconfig time, maybe adding a CONFIG_HAS_SOC_CACHE or something similar to avoid compiling in the arch cache implementation (a minimal Kconfig sketch follows after this list).
  6. The cache-controller-as-a-peripheral driver is probed using DT as usual and should simply hook up the cache APIs.
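
A minimal Kconfig sketch of the mutual exclusion in point 5 (symbol names and prompts are placeholders, not a naming proposal):

# drivers/cache/Kconfig (hypothetical)
config HAS_SOC_CACHE
        bool
        help
          Selected by SoCs that expose the cache controller as a peripheral
          and provide their own cache driver.

config CACHE_ARCH_OPS
        bool "Use the arch-level cache operations"
        depends on !HAS_SOC_CACHE
        default y
        help
          Compile in the arch cache implementation only when no SoC cache
          controller driver is in use.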

tagging some more people for feedback: @nashif @npitre @ceolin @andyross @stephanosio

@ceolin
Member

ceolin commented Apr 21, 2021

The problem with weak functions is if you end up having two strong definitions (that would cause a build failure) or two weak symbols, for which the behavior is not defined. Is there an explicit precedence between arch / peripheral? If yes, we could establish that architectures would always implement a weak function and peripherals the strong symbol. The drawback is that all arches would have to implement this API.
There is another possibility to avoid this problem: add a shim. For example, we can have this in the proposed cache.c:

extern int __weak dcache_all(int op);

static inline int z_impl_sys_dcache_all(int op)
{
        if (dcache_all != NULL) {
                return dcache_all(op);
        }
        return -ENOTSUP;
}

In this case, if dcache_all is not implemented it will return -ENOTSUP; otherwise it will call the strong symbol or the weak symbol, in this order of precedence.

@carlocaione
Collaborator

The problem with weak functions is if you end up having two strong definitions (that would cause a build failure) or two weak symbols, for which the behavior is not defined.

Build failure is "good" (because you can catch the problem early) and I don't see how you can end up with two weak symbols. The only place where the weak symbol is supposed to be declared is inside cache.c. Both arch and peripheral drivers should use non-weak functions.

Is there an explicit precedence between arch / peripheral?

Yes, that should be enforced using Kconfig. For example CONFIG_HAS_SOC_CACHE should disable the arch and only use the peripheral driver.

If yes, we could establish that architectures would always implement a weak function and peripherals the strong symbol. The drawback is that all arches would have to implement this API.

Yes, that's why I think my proposal is more complex but less problematic I guess:

  • If no arch / peripheral driver is present the weak functions in cache.c return -ENOTSUP
  • If only one arch / peripheral driver is present with a non-weak function there is no problem
  • If both drivers are present, Kconfig is enforcing some priority, otherwise a build-time error is raised (as it correctly should)

There is another possibility to avoid this problem: add a shim. For example, we can have this in the proposed cache.c:

Yes, that could be a solution.

Just to put down here PROs and CONs:

extern int __weak dcache_all(int op);

static inline int z_impl_sys_dcache_all(int op)
{
        if (dcache_all != NULL) {
                return dcache_all(op);
        }
        return -ENOTSUP;
}

PROs: slim and elegant
CONs: we have to actually mark some function as weak to enforce implicit ordering

static inline int z_impl_sys_dcache_all(int op)
{
        if (IS_ENABLED(CONFIG_HAS_ARCH_CACHE)) {
                return arch_dcache_all(op);
        } else if (IS_ENABLED(CONFIG_HAS_PERIPH_CACHE)) {
                return dcache_all(op);
        }
        return -ENOTSUP;
}

PROs: no weak functions
CONs: arch and peripheral functions have different names

int __weak dcache_all(int op)
{
        return -ENOTSUP;
}

static inline int z_impl_sys_dcache_all(int op)
{
        return dcache_all(op);
}

PROs: fallback weak function
CONs: conflicts between arch and peripheral drivers have to be resolved by Kconfig

Or a mix of course.

I don't have a strong preference TBH, so I'm open to whatever implementation the majority thinks is the best.

@carlocaione
Collaborator

Tagging @dcpleung because the X86 people may be interested in this.

@dcpleung
Member

Moving into drivers/cache means arch code no longer only resides under arch/, which is going to cause some confusion.

We had this issue with weak functions for the timing subsys, as there can be arch-/soc-/board-specific timing routines. We decided to use Kconfig to specify which level to use. I think the caching API has a similar problem where we can have cache control at different levels, and we cannot use weak functions.

Do you think we would need to support all these levels? Or do you prefer to have a 'one or the other' approach (i.e. arch or external)? If the latter, this is easy. We can declare arch_dcache_all() as an alias to dcache_all() with __weak FUNC_ALIAS(...), similar to what we have in newlib's libc hooks. So if there is an actual implementation of dcache_all(), it will be used instead of the arch one.
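
One possible shape of that, reading the suggestion as making dcache_all() a weak alias of the arch implementation so that a strong dcache_all() from a cache-controller driver wins (a sketch, assuming FUNC_ALIAS keeps its (real_func, new_alias, return_type) form):

#include <toolchain.h>

/* The alias has to live in the same file that defines arch_dcache_all(). */
int arch_dcache_all(int op)
{
        /* ... arch-specific cache maintenance ... */
        return 0;
}

/* dcache_all() becomes a weak alias of arch_dcache_all(), mirroring
 * __weak FUNC_ALIAS(_sbrk, sbrk, void *) in the newlib libc hooks.
 * A strong dcache_all() provided elsewhere overrides the alias.
 */
__weak FUNC_ALIAS(arch_dcache_all, dcache_all, int);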

@carlocaione
Collaborator

carlocaione commented Apr 23, 2021

Moving into drivers/cache means arch code no longer only resides under arch/, which is going to cause some confusion.

In drivers/cache you only have a function call to something possibly defined somewhere under arch/ (if you decide to use the arch driver). I mean, the interrupt controller drivers also call into code under arch/ after all.

We had this issue with weak functions for the timing subsys, as there can be arch-/soc-/board-specific timing routines. We decided to use Kconfig to specify which level to use. I think the caching API has a similar problem where we can have cache control at different levels, and we cannot use weak functions.

This case is easier though. You do not have levels, you have a choice: either arch-implemented cache operations or driver-implemented cache operations, that's it (or an architecture without cache management at all).

Do you think we would need to support all these levels? Or do you prefer to have a 'one or the other' approach (i.e. arch or external)? If the latter, this is easy.

Exactly my point. At least for now we do not need to support all the levels, only arch or external / device.

We can declare arch_dcache_all() as an alias to dcache_all() with __weak FUNC_ALIAS(...), similar to what we have in newlib's libc hooks. So if there is an actual implementation of dcache_all(), it will be used instead of the arch one.

Ok, this is along the same lines as what was already proposed. But what about architectures that implement neither arch nor device cache drivers? Considering that, I'd probably prefer to have a shim as proposed by @ceolin in #33122 (comment) to catch architectures without cache management at all.

@dcpleung
Member

Instead of dealing with weak functions, maybe we can use a Kconfig choice to choose which one to use (including none). Each choice would depend on a CONFIG_*_HAS_CACHE_CONTROL (or similar) symbol. Then the inline functions in cache.h can call the appropriate arch or driver cache functions. This would allow future expansion for SoC- and board-level cache code (though I am not sure how much of this matters).
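
A sketch of what such a Kconfig choice could look like (all symbol names below are placeholders):

# Hypothetical Kconfig fragment
choice
        prompt "Cache operations implementation"
        default CACHE_OPS_NONE

config CACHE_OPS_NONE
        bool "None"

config CACHE_OPS_ARCH
        bool "Arch-level cache operations"
        depends on ARCH_HAS_CACHE_CONTROL

config CACHE_OPS_EXTERNAL
        bool "External cache-controller driver"
        depends on SOC_HAS_CACHE_CONTROL

endchoice

The inline functions in cache.h would then pick the right call based on the selected symbol.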

@carlocaione
Collaborator

That's basically solution 2 in #33122 (comment):

static inline int z_impl_sys_dcache_all(int op)
{
        if (IS_ENABLED(CONFIG_HAS_ARCH_CACHE)) {
                return arch_dcache_all(op);
        } else if (IS_ENABLED(CONFIG_HAS_PERIPH_CACHE)) {
                return dcache_all(op);
        }
        return -ENOTSUP;
}

Right?

@dcpleung
Member

Oh yeah... forgot I read about it yesterday. (I need more caffeine.)

Since we have 3 choices (for now) - none, arch and device - Kconfig seems to be a more straightforward approach.
