Project C v5.7.5-r2
cchalpha committed Aug 30, 2021
1 parent 7d2a07b commit f746b58
Showing 35 changed files with 6,935 additions and 31 deletions.
6 changes: 6 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -4947,6 +4947,12 @@

sbni= [NET] Granch SBNI12 leased line adapter

sched_timeslice=
[KNL] Time slice in us for BMQ scheduler.
Format: <int> (must be >= 1000)
Default: 4000
See Documentation/scheduler/sched-BMQ.txt

sched_verbose [KNL] Enables verbose scheduler debug messages.

schedstats= [KNL,X86] Enable or disable scheduled statistics.
10 changes: 10 additions & 0 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -1542,3 +1542,13 @@ is 10 seconds.

The softlockup threshold is (``2 * watchdog_thresh``). Setting this
tunable to zero will disable lockup detection altogether.

yield_type:
===========

BMQ CPU scheduler only. This determines what type of yield calls to
sched_yield will perform.

0 - No yield.
1 - Deboost and requeue task. (default)
2 - Set run queue skip task.
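
As a quick illustration, the tunable above can be read and changed at runtime through sysctl. This is a hypothetical session: the `kernel.yield_type` entry exists only on a kernel built with the BMQ scheduler, and writing it requires root.

```shell
# Read the current yield behaviour (BMQ kernels only).
sysctl kernel.yield_type

# Switch to "no yield" (value 0).
sysctl -w kernel.yield_type=0

# Equivalent direct write through procfs.
echo 0 > /proc/sys/kernel/yield_type
```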
110 changes: 110 additions & 0 deletions Documentation/scheduler/sched-BMQ.txt
@@ -0,0 +1,110 @@
BitMap queue CPU Scheduler
--------------------------

CONTENT
=======

Background
Design
Overview
Task policy
Priority management
BitMap Queue
CPU Assignment and Migration


Background
==========

BitMap Queue CPU scheduler, referred to as BMQ from here on, is an evolution
of the previous Priority and Deadline based Skiplist multiple queue scheduler
(PDS), and is inspired by the Zircon scheduler. Its goal is to keep the
scheduler code simple, while remaining efficient and scalable for interactive
workloads such as desktop use, movie playback and gaming.

Design
======

Overview
--------

BMQ uses a per-CPU run queue design: each (logical) CPU has its own run
queue and is responsible for scheduling the tasks placed into it.

The run queue is a set of priority queues. In terms of data structure,
these queues are FIFO queues for non-rt tasks and priority queues for rt
tasks; see BitMap Queue below for details. BMQ is optimized for non-rt
tasks, since most applications are non-rt tasks. Whether a queue is FIFO or
priority-ordered, each queue is an ordered list of runnable tasks awaiting
execution, and the data structures are the same. When it is time for a new
task to run, the scheduler simply looks for the lowest-numbered queue that
contains a task and runs the first task from the head of that queue. The
per-CPU idle task is also kept in the run queue, so the scheduler can
always find a task to run.
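
The pick-next logic described above can be sketched in a few lines of Python. This is an illustrative model, not the kernel implementation; the queue count and names are assumptions.

```python
from collections import deque

NUM_QUEUES = 64  # illustrative queue count

class RunQueue:
    """Toy model of a per-CPU BMQ run queue: one FIFO deque per
    priority level, plus a bitmap of non-empty queues."""
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]
        self.bitmap = 0  # bit i set <=> queue i is non-empty

    def enqueue(self, prio, task):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if self.bitmap == 0:
            return None  # the real scheduler always finds the queued idle task
        # Isolate the lowest set bit: the lowest-numbered non-empty queue.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task

rq = RunQueue()
rq.enqueue(20, "editor")
rq.enqueue(5, "audio")
print(rq.pick_next())  # lower-numbered queue wins -> audio
```

The bitmap makes the "find the lowest-numbered queue that contains a task" step a single find-first-bit operation rather than a scan.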

Each task is assigned the same time slice (default 4 ms) when it is picked to
start running. A task is reinserted at the end of the appropriate priority
queue when it uses up its whole time slice. When the scheduler selects a new
task from the priority queue, it sets the CPU's preemption timer for the
remainder of the previous time slice. When that timer fires, the scheduler
stops execution of that task, selects another task and starts over again.
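
The time-slice accounting can be modelled roughly as follows. This is a sketch: the 4 ms default comes from the text above, while the `Task` and `tick` names are made up for illustration.

```python
DEFAULT_TIMESLICE_NS = 4_000_000  # 4 ms, the documented default

class Task:
    def __init__(self, name):
        self.name = name
        self.time_slice = DEFAULT_TIMESLICE_NS

def tick(task, delta_ns, requeue):
    """Charge delta_ns of runtime to the task; once the whole slice is
    used up, refresh the slice and requeue the task at the tail of its
    priority queue. Returns True when the caller should pick a new task."""
    task.time_slice -= delta_ns
    if task.time_slice <= 0:
        task.time_slice = DEFAULT_TIMESLICE_NS
        requeue(task)  # goes to the end of the appropriate queue
        return True
    return False

t = Task("worker")
print(tick(t, 4_000_000, lambda task: None))  # True: slice exhausted
```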

If a task blocks waiting for a shared resource, it is taken out of its
priority queue and placed in a wait queue for that resource. When it is
unblocked, it is reinserted into the appropriate priority queue of an
eligible CPU.

Task policy
-----------

BMQ supports the DEADLINE, FIFO, RR, NORMAL, BATCH and IDLE task policies,
like the mainline CFS scheduler. However, BMQ is heavily optimized for
non-rt tasks, that is, NORMAL/BATCH/IDLE policy tasks. The implementation
details of each policy follow.

DEADLINE
A DEADLINE task is squashed into a priority-0 FIFO task.

FIFO/RR
All rt tasks share a single priority queue in the BMQ run queue design. The
insert operation has O(n) complexity. BMQ is not designed for systems that
run mostly rt policy tasks.
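
The O(n) insert into the shared rt queue can be pictured as plain ordered-list insertion. This is an illustrative sketch, not kernel code; the dictionary layout is an assumption.

```python
def rt_insert(queue, task):
    """Insert task into the single shared rt queue, keeping it sorted by
    rt priority (higher first). The linear scan is what makes each
    insert O(n) in the number of queued rt tasks."""
    for i, queued in enumerate(queue):
        if task["rt_prio"] > queued["rt_prio"]:
            queue.insert(i, task)
            return
    queue.append(task)

q = []
rt_insert(q, {"name": "a", "rt_prio": 10})
rt_insert(q, {"name": "b", "rt_prio": 50})
rt_insert(q, {"name": "c", "rt_prio": 30})
print([t["name"] for t in q])  # ['b', 'c', 'a']
```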

NORMAL/BATCH/IDLE
BATCH and IDLE tasks are treated as the same policy. They compete for CPU
with NORMAL policy tasks, but they never receive a priority boost. To
control the priority of NORMAL/BATCH/IDLE tasks, simply use the nice level.

ISO
The ISO policy is not supported in BMQ. Please use a nice level -20 NORMAL
policy task instead.

Priority management
-------------------

RT tasks have priorities from 0 to 99. For non-rt tasks, two different
factors are used to determine the effective priority of a task; the
effective priority is what determines which queue the task will be in.

The first factor is simply the task's static priority, which is derived
from the task's nice level: [-20, 19] from userland's point of view and
[0, 39] internally.

The second factor is the priority boost. This is a value bounded to
[-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] that is used to offset the base
priority; it is modified in the following cases:

* When a task has used up its entire time slice, it is always deboosted by
  increasing its boost value by one.
* When a task gives up CPU control (voluntarily or involuntarily) to
  reschedule, and its switch-in time (the time since it was last switched
  in to run) is below the threshold based on its priority boost, it is
  boosted by decreasing its boost value by one, capped at 0 (it will not
  go negative).
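
Putting the two factors together, the effective priority and the boost updates can be sketched as follows. This is illustrative Python, not the kernel code; the constant comes from the MAX_PRIORITY_ADJ definition in this commit, and the cap-at-0 rule is taken from the text above.

```python
MAX_PRIORITY_ADJ = 4  # from include/linux/sched/prio.h in this commit

def effective_prio(static_prio, boost):
    # static_prio is the nice level mapped into [0, 39]; a lower result
    # selects a higher-priority (lower-numbered) queue.
    return static_prio + boost

def on_timeslice_expired(boost):
    # Deboost: push the task toward a lower-priority queue.
    return min(boost + 1, MAX_PRIORITY_ADJ)

def on_quick_reschedule(boost):
    # Boost a task that gave up the CPU quickly, capped at 0 as described.
    return max(boost - 1, 0)

b = 2
b = on_timeslice_expired(b)   # 3
b = on_quick_reschedule(b)    # back to 2
print(effective_prio(20, b))  # nice-0 task with boost 2 -> queue 22
```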

The intent of this system is to ensure that interactive threads are
serviced quickly. These are usually the threads that interact directly with
the user and cause user-perceivable latency. Such threads usually do little
work and spend most of their time blocked, awaiting another user event, so
they get a priority boost when unblocking, while background threads that do
most of the processing receive a priority penalty for using their entire
time slice.
5 changes: 0 additions & 5 deletions arch/powerpc/platforms/cell/spufs/sched.c
@@ -51,11 +51,6 @@ static struct task_struct *spusched_task;
static struct timer_list spusched_timer;
static struct timer_list spuloadavg_timer;

/*
* Priority of a normal, non-rt, non-niced'd process (aka nice level 0).
*/
#define NORMAL_PRIO 120

/*
* Frequency of the spu scheduler tick. By default we do one SPU scheduler
* tick for every 10 CPU scheduler ticks.
2 changes: 1 addition & 1 deletion fs/proc/base.c
@@ -476,7 +476,7 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
seq_puts(m, "0 0 0\n");
else
seq_printf(m, "%llu %llu %lu\n",
(unsigned long long)task->se.sum_exec_runtime,
(unsigned long long)tsk_seruntime(task),
(unsigned long long)task->sched_info.run_delay,
task->sched_info.pcount);

2 changes: 1 addition & 1 deletion include/asm-generic/resource.h
@@ -23,7 +23,7 @@
[RLIMIT_LOCKS] = { RLIM_INFINITY, RLIM_INFINITY }, \
[RLIMIT_SIGPENDING] = { 0, 0 }, \
[RLIMIT_MSGQUEUE] = { MQ_BYTES_MAX, MQ_BYTES_MAX }, \
[RLIMIT_NICE] = { 0, 0 }, \
[RLIMIT_NICE] = { 30, 30 }, \
[RLIMIT_RTPRIO] = { 0, 0 }, \
[RLIMIT_RTTIME] = { RLIM_INFINITY, RLIM_INFINITY }, \
}
32 changes: 30 additions & 2 deletions include/linux/sched.h
@@ -680,13 +680,19 @@ struct task_struct {
unsigned int flags;
unsigned int ptrace;

#ifdef CONFIG_SMP
#if defined(CONFIG_SMP) || defined(CONFIG_SCHED_ALT)
int on_cpu;
struct __call_single_node wake_entry;
#endif
#if defined(CONFIG_SMP) && !defined(CONFIG_SCHED_ALT)
struct llist_node wake_entry;
#endif

#ifdef CONFIG_SMP
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* Current CPU: */
unsigned int cpu;
#endif
#ifndef CONFIG_SCHED_ALT
unsigned int wakee_flips;
unsigned long wakee_flip_decay_ts;
struct task_struct *last_wakee;
@@ -700,6 +706,7 @@ struct task_struct {
*/
int recent_used_cpu;
int wake_cpu;
#endif /* !CONFIG_SCHED_ALT */
#endif
int on_rq;

@@ -708,6 +715,17 @@ struct task_struct {
int normal_prio;
unsigned int rt_priority;

#ifdef CONFIG_SCHED_ALT
u64 last_ran;
s64 time_slice;
int boost_prio;
#ifdef CONFIG_SCHED_BMQ
int bmq_idx;
struct list_head bmq_node;
#endif /* CONFIG_SCHED_BMQ */
/* sched_clock time spent running */
u64 sched_time;
#else /* !CONFIG_SCHED_ALT */
const struct sched_class *sched_class;
struct sched_entity se;
struct sched_rt_entity rt;
@@ -718,6 +736,7 @@ struct task_struct {
unsigned long core_cookie;
unsigned int core_occupation;
#endif
#endif /* !CONFIG_SCHED_ALT */

#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
@@ -1417,6 +1436,15 @@ struct task_struct {
*/
};

#ifdef CONFIG_SCHED_ALT
#define tsk_seruntime(t) ((t)->sched_time)
/* replace the uncertain rt_timeout with 0UL */
#define tsk_rttimeout(t) (0UL)
#else /* CFS */
#define tsk_seruntime(t) ((t)->se.sum_exec_runtime)
#define tsk_rttimeout(t) ((t)->rt.timeout)
#endif /* !CONFIG_SCHED_ALT */

static inline struct pid *task_pid(struct task_struct *task)
{
return task->thread_pid;
11 changes: 11 additions & 0 deletions include/linux/sched/deadline.h
@@ -1,5 +1,15 @@
/* SPDX-License-Identifier: GPL-2.0 */

#ifdef CONFIG_SCHED_ALT

#ifdef CONFIG_SCHED_BMQ
#define __tsk_deadline(p) (0UL)
#endif

#else

#define __tsk_deadline(p) ((p)->dl.deadline)

/*
* SCHED_DEADLINE tasks has negative priorities, reflecting
* the fact that any of them has higher prio than RT and
@@ -19,6 +29,7 @@ static inline int dl_task(struct task_struct *p)
{
return dl_prio(p->prio);
}
#endif /* CONFIG_SCHED_ALT */

static inline bool dl_time_before(u64 a, u64 b)
{
5 changes: 5 additions & 0 deletions include/linux/sched/prio.h
@@ -18,6 +18,11 @@
#define MAX_PRIO (MAX_RT_PRIO + NICE_WIDTH)
#define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)

#ifdef CONFIG_SCHED_ALT
/* +/- priority levels from the base priority */
#define MAX_PRIORITY_ADJ 4
#endif

/*
* Convert user-nice values [ -20 ... 0 ... 19 ]
* to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
2 changes: 2 additions & 0 deletions include/linux/sched/rt.h
@@ -24,8 +24,10 @@ static inline bool task_is_realtime(struct task_struct *tsk)

if (policy == SCHED_FIFO || policy == SCHED_RR)
return true;
#ifndef CONFIG_SCHED_ALT
if (policy == SCHED_DEADLINE)
return true;
#endif
return false;
}

28 changes: 27 additions & 1 deletion init/Kconfig
@@ -786,9 +786,33 @@ config GENERIC_SCHED_CLOCK

menu "Scheduler features"

menuconfig SCHED_ALT
bool "Alternative CPU Schedulers"
default y
help
This feature enables the alternative CPU schedulers.

if SCHED_ALT

choice
prompt "Alternative CPU Scheduler"
default SCHED_BMQ

config SCHED_BMQ
bool "BMQ CPU scheduler"
help
The BitMap Queue CPU scheduler for excellent interactivity and
responsiveness on the desktop and solid scalability on normal
hardware and commodity servers.

endchoice

endif

config UCLAMP_TASK
bool "Enable utilization clamping for RT/FAIR tasks"
depends on CPU_FREQ_GOV_SCHEDUTIL
depends on !SCHED_BMQ
help
This feature enables the scheduler to track the clamped utilization
of each CPU based on RUNNABLE tasks scheduled on that CPU.
@@ -874,6 +898,7 @@ config NUMA_BALANCING
depends on ARCH_SUPPORTS_NUMA_BALANCING
depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
depends on SMP && NUMA && MIGRATION
depends on !SCHED_BMQ
help
This option adds support for automatic NUMA aware memory/task placement.
The mechanism is quite primitive and is based on migrating memory when
@@ -960,7 +985,7 @@ menuconfig CGROUP_SCHED
bandwidth allocation to such task groups. It uses cgroups to group
tasks.

if CGROUP_SCHED
if CGROUP_SCHED && !SCHED_BMQ
config FAIR_GROUP_SCHED
bool "Group scheduling for SCHED_OTHER"
depends on CGROUP_SCHED
@@ -1231,6 +1256,7 @@ config CHECKPOINT_RESTORE

config SCHED_AUTOGROUP
bool "Automatic process group scheduling"
depends on !SCHED_BMQ
select CGROUPS
select CGROUP_SCHED
select FAIR_GROUP_SCHED
15 changes: 15 additions & 0 deletions init/init_task.c
@@ -75,9 +75,15 @@ struct task_struct init_task
.stack = init_stack,
.usage = REFCOUNT_INIT(2),
.flags = PF_KTHREAD,
#ifdef CONFIG_SCHED_ALT
.prio = DEFAULT_PRIO + MAX_PRIORITY_ADJ,
.static_prio = DEFAULT_PRIO,
.normal_prio = DEFAULT_PRIO + MAX_PRIORITY_ADJ,
#else
.prio = MAX_PRIO - 20,
.static_prio = MAX_PRIO - 20,
.normal_prio = MAX_PRIO - 20,
#endif
.policy = SCHED_NORMAL,
.cpus_ptr = &init_task.cpus_mask,
.cpus_mask = CPU_MASK_ALL,
@@ -87,13 +93,22 @@ struct task_struct init_task
.restart_block = {
.fn = do_no_restart_syscall,
},
#ifdef CONFIG_SCHED_ALT
.boost_prio = 0,
#ifdef CONFIG_SCHED_BMQ
.bmq_idx = 15,
.bmq_node = LIST_HEAD_INIT(init_task.bmq_node),
#endif
.time_slice = HZ,
#else
.se = {
.group_node = LIST_HEAD_INIT(init_task.se.group_node),
},
.rt = {
.run_list = LIST_HEAD_INIT(init_task.rt.run_list),
.time_slice = RR_TIMESLICE,
},
#endif
.tasks = LIST_HEAD_INIT(init_task.tasks),
#ifdef CONFIG_SMP
.pushable_tasks = PLIST_NODE_INIT(init_task.pushable_tasks, MAX_PRIO),
