Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arc SMP support #17747

Conversation

vonhust
Copy link

@vonhust vonhust commented Jul 24, 2019

This PR is used to contain all necessary work related to ARC SMP support. It will be frequently updated

This PR should be rebased and merged after #16662 and #17420

  • update (2019.08.06 )
    • lock issue fixed in timer_int_handler, z_arc_connect_gfrc_read should be protected by spin lock
    • lock issue fixed in z_clock_elapsed, z_arc_connect_gfrc_read should be protected by spin lock

@vonhust vonhust requested review from abrodkin and ruuddw July 24, 2019 10:05
@vonhust vonhust requested a review from andrewboie as a code owner July 24, 2019 10:05
@zephyrbot zephyrbot added area: ARC ARC Architecture area: API Changes to public APIs labels Jul 24, 2019
@andrewboie
Copy link
Contributor

We need a new build target for NSIM which runs all our tests with SMP enabled, to show that this works.

@abrodkin
Copy link
Collaborator

@andrewboie well, it's not that easy unfortunately. The point is "Free nSIM" which is available for free doesn't support multi-processor setups (by design - that's one of limitations of nSIM's free version).

Full MetaWare debugger which includes full nSIM version does support multi-processor configurations and SMP in particular but it requires a license and generally is not available for free.

The plan though is to use QEMU for ARC when SMP support is implemented (our expectation is a couple of months from now) or as an alternative real HSDK board (with quad-ore ARC HS38) could be used (https://www.synopsys.com/dw/ipdir.php?ds=arc-hs-development-kit). If ARC SMP target is really required I think we may discuss donation of the board to the community.

And finally this pull-request is named not very correct as it is still a preparatory step before having real SMP target integrated... but stay tuned we're almost there :)

Copy link
Collaborator

@abrodkin abrodkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all this pull-request title is not correct. We're not introducing SMP support for ARC just yet instead we just introduce support of "ARConnect" which will be required for SMP.

Then I suggest to remove any atomic-related changes from this pull-request because it's a completely different topic.

Also I think it makes more sense to implement functional blocks rather than have many very specific functions. Based on my summary in #16661 we need:

  • Interrupt Distribution Unit (IDU) which is basically an intermediate interrupt controller for which it makes sense to implement: init(), irq_enable() & irq_disable.

  • Global Free-Runing Counter which is just a counter for which we need to implement corresponding driver. But given it's just a counter but not timer (i.e. it cannot generate interrupts on overflow) I may assume we don't really need it for starters at least.

  • Inter-core Debug Unit is useful but we just need to setup it properly on start so a simple open-coded snippet in some cpu_init() function will be enough.

  • And finally Inter-core Interrupt Unit is used as a tool for cross-core signaling.

With that functional approach code will be:

  1. Much simpler and slimmer, i.e. easier to understand & review
  2. It all will be used, i.e. no dead-code

arch/arc/core/CMakeLists.txt Outdated Show resolved Hide resolved
arch/arc/core/atomic.S Outdated Show resolved Hide resolved
@@ -417,6 +420,6 @@ SECTION_FUNC(TEXT, atomic_cas)

/* failed comparison */
nanoAtomicCas_fail:
scond r1, [r0] /* write old value to clear the access lock */
scond r3, [r0] /* write old value to clear the access lock */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this? There's no such thing as "access lock". There's an internal "access flag" which gets set by llock and is checked by scond (if it is still set or was reset by a write to the cache line which contains our precious data word). That said the next llock will set that flag again and here we don't need to do anything as it will be a pure waste of time.

I.e. I suggest to remove that scond at all.

And BTW that's why I suggested @ruuddw to not spend time now on this implementation of custom atomics as it obviously requires a special and careful inspection. Instead file a separate issue for ARC custom atomics face-lift and do int either as a bug-fix past merge-window or as a brand new feature in the next release if there're more chnages needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that llock and scond must be used in pair. that's why scond is added at the end.
Based on your experience on linux, is it ok to use llocak only ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llock/scond indeed are meant to be used as a pair but in their essence they are just a little bit specific load (ld on ARC parlance) & store (st) instructions. In fact you may use llock instead of normal ld easily and it will do the job - it will read specified memory location on each invocation.

But it's scond ("store conditional") which makes that pair useful for implementation of atomic data operations because scond only succeeds if it sees hidden architectural flag set previously by llock and not yet reset by IRQ, exception or any write to that same location.

That said it's OK to not execute scond if there's no need to like in this case - we already know data loaded by llock doesn't match our expectations. Again there's no requirement to use scond if llock was used previously.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will try and test it on HSDK, if no problem , i will optimize it as you suggested

extern u32_t arc_connect_ici_check_src(void);
extern void arc_connect_ici_clear(void);

/* inter-core debug related functions */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure if we do need all those features for Inter-core Debugging.
From practical point of view and based on our experience with SMP Linux (which we've been using for years now) what's important is to set Inter-core Debug unit in such a way so that whenever any ARC core from the cluster gets halted for any reason (be it external debugger via JTAG or flag 1 instruction executed the core itself on die() etc) all the rest are halted as well. That helps a lot for debugging because no cores are progressing as well as gives a better post-mortem picture as all the cores are halted not very far from where real problem happened.

That said we only really need arc_connect_debug_mask_set() and all the rest might be removed for now.
See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/mcip.c#n94 as a reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you want to keep code as simple as possible. But I prefer keeping codes belonging to one unit together. As zephyr goes forward, these functions may be used.

Even we plan to remove unused code, it's better be in the final clean up.

extern void arc_connect_gfrc_clear(void);
extern u32_t arc_connect_gfrc_lo_read(void);
extern u32_t arc_connect_gfrc_hi_read(void);
extern u64_t arc_connect_gfrc_read(void);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we have arc_connect_gfrc_read() which reads u64 value already why keep arc_connect_gfrc_lo_read() & arc_connect_gfrc_hi_read()? Let's only implement what we're going to use.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan though is to use QEMU for ARC when SMP support is implemented

Works for me.

arch/arc/core/arc_connect.c Outdated Show resolved Hide resolved
@abrodkin
Copy link
Collaborator

The plan though is to use QEMU for ARC when SMP support is implemented

Works for me.

@andrewboie so just there's no misunderstanding - I meant when SMP support gets added to QEMU port for ARC, not SMPS support in Zephyr :)

@vonhust
Copy link
Author

vonhust commented Jul 25, 2019

First of all this pull-request title is not correct. We're not introducing SMP support for ARC just yet instead we just introduce support of "ARConnect" which will be required for SMP.

Then I suggest to remove any atomic-related changes from this pull-request because it's a completely different topic.

Also I think it makes more sense to implement functional blocks rather than have many very specific functions. Based on my summary in #16661 we need:

  • Interrupt Distribution Unit (IDU) which is basically an intermediate interrupt controller for which it makes sense to implement: init(), irq_enable() & irq_disable.
  • Global Free-Runing Counter which is just a counter for which we need to implement corresponding driver. But given it's just a counter but not timer (i.e. it cannot generate interrupts on overflow) I may assume we don't really need it for starters at least.
  • Inter-core Debug Unit is useful but we just need to setup it properly on start so a simple open-coded snippet in some cpu_init() function will be enough.
  • And finally Inter-core Interrupt Unit is used as a tool for cross-core signaling.

With that functional approach code will be:

  1. Much simpler and slimmer, i.e. easier to understand & review
  2. It all will be used, i.e. no dead-code

Alexey, this PR contains all necessary board-indepent codes for SMP, it will be useful for you to add hs smp in nsim. If you, ruud or others prefer smaller PR, i will split PR into smaller ones, like one PR for arc connect only, one PR for arcv2 timer, etc.

/* gfrc is the wall clock */
curr_time = arc_connect_gfrc_read();

key = k_spin_lock(&lock);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that same code is supposed to be executed by all cores in the cluster?
If so it might be a bit of overkill I guess, but if only 1 core will be doing that then spinlock is not required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accoridng to my debug on hsdk, it's possible for all cores to raise local timer interrupt, so spinlock is required to get the wall time

Currently, Zephyr's SMP is still simple. In the future, it's possible to optimize this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that's perfectly possible but do we really need slave cores to have Timer0 interrupt for anything except timekeeping? If not we may disable it. Note since we do have cross-core interrupts I guess it's OK for slave cores to not have their own periodic interrupt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Alexey has a point, what if multiple CPUs announce the same clocktick? I think then the first one does the 'real' announce, and next ones do a z_clock_announce(0) ? I think z_clock_announce can handle that, but it seems a waste of cycles.

if (IS_ENABLED(CONFIG_TICKLESS_IDLE) && idle && ticks == K_FOREVER) {
timer0_control_register_set(0);
timer0_count_register_set(0);
timer0_limit_register_set(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given ARC core timers are actually free running counters capable of interrupt generation on reaching a preset "limit" value, there's no point in messing with all 3 registers here. The only thing we need to do is clear control.IE bit so that IRQs are not generated, all the rest is waste of cycles.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's kind of optimization, but i prefer letting it in a determined status.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then add a comment explicitly saying what happens here like:

In reality ARC core timers (Timer0 & Timer1) are always running and we cannot stop them. The only thing we may do is disable interrupt generation which is done by writing 0 to IE bit in timer's control register. But for XXX (nicer debugging experience whatever) we also set 0 in count & limit registers.

Still IMHO this code might fool a reader. Moreover on power-on timer has interrupts disabled control.IE = 0. That's what PRM says:

When the processor is Reset, the IE flag is set to 0

So nothing needs to be done here at all.

/* ensure slave core's internal timer doesn't work */
timer0_control_register_set(0);
timer0_count_register_set(0);
timer0_limit_register_set(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here no point of setting count & limit registers as it won't stop the timer/counter. Just set control.IE bit to 0.

timer0_limit_register_set(0);

z_irq_priority_set(IRQ_TIMER0, CONFIG_ARCV2_TIMER_IRQ_PRIORITY, 0);
irq_enable(IRQ_TIMER0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment above says "ensure slave core's internal timer doesn't work" still we enable Timer0 interrupt here even though in the peripheral (Timer0) we just disabled interrupt generation by writing 0 to its control register. That doesn't look good.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment needs to be updated. It's possible that each core's local timer interrupt is raised.

boards/arc/nsim/doc/index.rst Outdated Show resolved Hide resolved
boards/arc/nsim/doc/index.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@abrodkin abrodkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Obviously that improves performance (removal of spin-lock wrapper) and helps in bring-up on new targets both virtual and real HW (check for critical HW features presence).

@vonhust
Copy link
Author

vonhust commented Aug 2, 2019

@vonhust please add definition of ARC_CONNECT_IRQ_START to include/arch/arc/v2/arc_connect.h since w/o it we won't be able to prepare real platforms for execution and that property (offset of ARC core's interrupt controller input for IRQs coming via IDU) really belongs to ARConnect.

I.e. something like that:

diff --git a/include/arch/arc/v2/arc_connect.h b/include/arch/arc/v2/arc_connect.h
index 8e7e274ed1..941a983d5c 100644
--- a/include/arch/arc/v2/arc_connect.h
+++ b/include/arch/arc/v2/arc_connect.h
@@ -158,6 +158,12 @@ struct arc_connect_idu_bcr {
        };
 };

+/*
+ * The interrupts distributed by the IDU to each CPU core are connected to
+ * sequential interrupt lines: irq24, irq25, and so on.
+ */
+#define ARC_CONNECT_IRQ_START          24
+
 static inline void z_arc_connect_cmd(u32_t cmd, u32_t param)
 {
        struct arc_connect_cmd regval;

@abrodkin it's already in arc_connect.h, line 103

/* the start intno of common interrupt managed by IDU */
#define ARC_CONNECT_IDU_IRQ_START	

@abrodkin
Copy link
Collaborator

abrodkin commented Aug 2, 2019

@ruuddw could you please review this one and approve if there're no objections.

Copy link
Member

@ruuddw ruuddw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok in general, left few comments on gfrc timer, and SMP interrupt stack and initialization design explanation.

@@ -229,20 +290,28 @@ u32_t z_clock_elapsed(void)
return 0;
}

#ifdef CONFIG_SMP
return (z_arc_connect_gfrc_read() - last_time) / CYC_PER_TICK;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a spinlock here? Earlier it was argued gfrc requires protection. last_time could be updated by other core, couldn't it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read is save, only need to do protection for write.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not fully sure, what if last_time got updated just after the z_arc_connect_gfrc_read() call returned? then last_time could be larger then this value, resulting in a very large z_clock_elpased() time...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because z_clock_elapsed will be called frequently called, so last_time will be correct finally.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already fixed, z_arc_connect_gfrc_read() should be protected by spin lock

}

u32_t z_timer_cycle_get_32(void)
{
#ifdef CONFIG_SMP
return z_arc_connect_gfrc_read() - start_time;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the symantics for timer_cycle_get_32, is the start_time needed, or would it be ok to simply return cycles after reset, instead of cycles after init?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a spinlock isn't required here, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have considered to reset the gfrc, but in hsdk, the reset/halt of gfrc don't work, i.e, gfrc cannot be controlled, so i introdcued start_time. Because start_time is just set once, no need of a spinlock

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was: do we care about the start_time, or is it ok to return the slightly larger hw timer value directly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, e.g, soft reset, debug, the gfrc will not stop, if no start_time, when z_timer_cycl_get_32 first,called, it may return a very big value, like 0xa5a5a5a5, it will be unreasonable, the whole system will work uncorrectly. If we can reset and clear the gfrc during init, the start_time can be removed.

@@ -35,10 +35,16 @@

static struct k_spinlock lock;


#ifdef CONFIG_SMP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is CONFIG_SMP the right define here? It's the ARC_CONNECT gfrt that you depend on, not SMP (though SMP implies ARC_CONNECT and GFRT are configured).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when smp is not configured, we just use arc internal timer even ARC_CONNECT is there. If someone just wants to use arc_connect gfrc on single core, better to provide a separate sys tick driver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would it be better to provide a new, separate driver in that case? Feels to me the 'SMP' code would simply work in that case, too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep things simple. the only usage of gfrc until know is for SMP case. Just like i remove lots of unused codes in arc_connect driver. Obviously, Zephyr's SMP feautre will be changed a lot in future. We can do that when required.



volatile u32_t arc_cpu_wake_flag;
volatile u32_t _curr_irq_stack[CONFIG_MP_NUM_CPUS];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments would help here. Why is the variable name _curr_irq_stack, curr stands for current? Isn't it the per-cpu interrupt stack, already part of per CPU kernel context?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_curr_irq_stack is used to simplfy the record of per_cpu interrupt stack. Because zephyr's high level design,
per_cpu stack pointly is deep in cpu struct, not easy to use in assembly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, understood.
With 'comment', I meant comment in the code, not comment in the review, can you please add the explanation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments added

arc_cpu_wake_flag = cpu_num;

/* wait slave cpu to start */
while (arc_cpu_wake_flag != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments explaining the init sequence and the synchronization protocol using this flag would be appreciated. Any implicit assumptions of boot order of different CPUs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

master core must init first, no assumptions for slave cores

static ALWAYS_INLINE void kernel_arch_init(void)
{
z_irq_setup();
_current_cpu->irq_stack =
Z_THREAD_STACK_BUFFER(_interrupt_stack) + CONFIG_ISR_STACK_SIZE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single '_interrupt_stack' used by all CPUs? See earlier remarks, some additional comments or general overview on the SMP interrupt/stack design would help understanding this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kernel_arch_init is only called by master core, slave core will not call this.

* arc connect is a component to connect multiple arc cores
* it's necessary for arc smp support
* the following features are implemented
  * inter-core interrupt unit
  * gloabl free running counter
  * inter-core debug unit
  * interrupt distribute unit

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
@vonhust vonhust dismissed dbkinder’s stale review August 6, 2019 08:25

This PR has doc now after refactor and rebase

Wayne Ren added 3 commits August 6, 2019 17:23
* use global free running counter as global wall clock (clock source)
* use arc internal timer 0 as local time event (clock event)

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
add macros for assembly and C to get current cpu id

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
* modify the reset flow for SMP
* add smp related initialization
* implement ipi related functions
* implement thread switch in isr/exception

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
Copy link
Member

@ruuddw ruuddw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the timer spinlock issues, I consider this PR now ready for merging.

@vonhust
Copy link
Author

vonhust commented Aug 7, 2019

Ping for merge @ioannisg @carlescufi @galak @andrewboie @nashif . We have several other PRs waiting for this PR

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor comment from my side

config ARC_CONNECT
bool "ARC has ARC connect"
select SCHED_IPI_SUPPORTED
default n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default n is not needed

@carlescufi
Copy link
Member

Merging, please remove the default n in a follow-up PR

@carlescufi carlescufi merged commit e11be42 into zephyrproject-rtos:master Aug 7, 2019
ioannisg pushed a commit to foss-for-synopsys-dwc-arc-processors/zephyr that referenced this pull request Aug 10, 2019
as suggested in comments of PR zephyrproject-rtos#17747, no need of default n
for arc_connect

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
ioannisg pushed a commit that referenced this pull request Aug 10, 2019
as suggested in comments of PR #17747, no need of default n
for arc_connect

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
@vonhust vonhust deleted the arc_smp branch August 11, 2019 06:38
LeiW000 pushed a commit to LeiW000/zephyr that referenced this pull request Sep 2, 2019
as suggested in comments of PR zephyrproject-rtos#17747, no need of default n
for arc_connect

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants