.. _sched-ext:

==========================
Extensible Scheduler Class
==========================

sched_ext is a scheduler class whose behavior can be defined by a set of BPF
programs - the BPF scheduler.

* sched_ext exports a full scheduling interface so that any scheduling
  algorithm can be implemented on top.

* The BPF scheduler can group CPUs however it sees fit and schedule them
  together, as tasks aren't tied to specific CPUs at the time of wakeup.

* The BPF scheduler can be turned on and off dynamically anytime.

* The system integrity is maintained no matter what the BPF scheduler does.
  The default scheduling behavior is restored anytime an error is detected,
  a runnable task stalls, or on invoking the SysRq key sequence
  `SysRq-S`.

* When the BPF scheduler triggers an error, debug information is dumped to
  aid debugging. The debug dump is passed to and printed out by the
  scheduler binary. The debug dump can also be accessed through the
  `sched_ext_dump` tracepoint. The SysRq key sequence `SysRq-D`
  triggers a debug dump. This doesn't terminate the BPF scheduler and can
  only be read through the tracepoint.

Switching to and from sched_ext
===============================

``CONFIG_SCHED_CLASS_EXT`` is the config option to enable sched_ext and
``tools/sched_ext`` contains the example schedulers. The following config
options should be enabled to use sched_ext:

.. code-block:: none

    CONFIG_BPF=y
    CONFIG_SCHED_CLASS_EXT=y
    CONFIG_BPF_SYSCALL=y
    CONFIG_BPF_JIT=y
    CONFIG_DEBUG_INFO_BTF=y
    CONFIG_BPF_JIT_ALWAYS_ON=y
    CONFIG_BPF_JIT_DEFAULT_ON=y

sched_ext is used only when the BPF scheduler is loaded and running.

If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until the
BPF scheduler is loaded.

When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
``SCHED_EXT`` tasks are scheduled by sched_ext.

However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
``SCHED_IDLE`` policies are scheduled by the fair-class scheduler, which has
higher sched_class precedence than ``SCHED_EXT``.

Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
BPF scheduler and reverts all tasks back to the fair-class scheduler.

.. code-block:: none

    # make -j16 -C tools/sched_ext
    # tools/sched_ext/build/bin/scx_simple
    local=0 global=3
    local=5 global=24
    local=9 global=44
    local=13 global=56
    local=17 global=72
    ^CEXIT: BPF scheduler unregistered

The current status of the BPF scheduler can be determined as follows:

.. code-block:: none

    # cat /sys/kernel/sched_ext/state
    enabled
    # cat /sys/kernel/sched_ext/root/ops
    simple

You can check if any BPF scheduler has ever been loaded since boot by examining
this monotonically incrementing counter (a value of zero indicates that no BPF
scheduler has been loaded):

.. code-block:: none

    # cat /sys/kernel/sched_ext/enable_seq
    1

Each running scheduler also exposes a per-scheduler ``events`` file under
``/sys/kernel/sched_ext/<scheduler-name>/events`` that tracks diagnostic
counters. Each counter occupies one ``name value`` line:

.. code-block:: none

    # cat /sys/kernel/sched_ext/simple/events
    SCX_EV_SELECT_CPU_FALLBACK 0
    SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE 0
    SCX_EV_DISPATCH_KEEP_LAST 123
    SCX_EV_ENQ_SKIP_EXITING 0
    SCX_EV_ENQ_SKIP_MIGRATION_DISABLED 0
    SCX_EV_REENQ_IMMED 0
    SCX_EV_REENQ_LOCAL_REPEAT 0
    SCX_EV_REFILL_SLICE_DFL 456789
    SCX_EV_BYPASS_DURATION 0
    SCX_EV_BYPASS_DISPATCH 0
    SCX_EV_BYPASS_ACTIVATE 0
    SCX_EV_INSERT_NOT_OWNED 0
    SCX_EV_SUB_BYPASS_DISPATCH 0

The counters are described in ``kernel/sched/ext_internal.h``; briefly:

* ``SCX_EV_SELECT_CPU_FALLBACK``: ops.select_cpu() returned a CPU unusable by
  the task and the core scheduler silently picked a fallback CPU.
* ``SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE``: a local-DSQ dispatch was redirected
  to the global DSQ because the target CPU went offline.
* ``SCX_EV_DISPATCH_KEEP_LAST``: a task continued running because no other
  task was available (only when ``SCX_OPS_ENQ_LAST`` is not set).
* ``SCX_EV_ENQ_SKIP_EXITING``: an exiting task was dispatched to the local DSQ
  directly, bypassing ops.enqueue() (only when ``SCX_OPS_ENQ_EXITING`` is not
  set).
* ``SCX_EV_ENQ_SKIP_MIGRATION_DISABLED``: a migration-disabled task was
  dispatched to its local DSQ directly (only when
  ``SCX_OPS_ENQ_MIGRATION_DISABLED`` is not set).
* ``SCX_EV_REENQ_IMMED``: a task dispatched with ``SCX_ENQ_IMMED`` was
  re-enqueued because the target CPU was not available for immediate
  execution.
* ``SCX_EV_REENQ_LOCAL_REPEAT``: a re-enqueue of the local DSQ triggered
  another re-enqueue; recurring counts indicate incorrect ``SCX_ENQ_REENQ``
  handling in the BPF scheduler.
* ``SCX_EV_REFILL_SLICE_DFL``: a task's time slice was refilled with the
  default value (``SCX_SLICE_DFL``).
* ``SCX_EV_BYPASS_DURATION``: total nanoseconds spent in bypass mode.
* ``SCX_EV_BYPASS_DISPATCH``: number of tasks dispatched while in bypass mode.
* ``SCX_EV_BYPASS_ACTIVATE``: number of times bypass mode was activated.
* ``SCX_EV_INSERT_NOT_OWNED``: attempted to insert a task not owned by this
  scheduler into a DSQ; such attempts are silently ignored.
* ``SCX_EV_SUB_BYPASS_DISPATCH``: tasks dispatched from sub-scheduler bypass
  DSQs (only relevant with ``CONFIG_EXT_SUB_SCHED``).

``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
detailed information:

.. code-block:: none

    # tools/sched_ext/scx_show_state.py
    ops           : simple
    enabled       : 1
    switching_all : 1
    switched_all  : 1
    enable_state  : enabled (2)
    bypass_depth  : 0
    nr_rejected   : 0
    enable_seq    : 1

Whether a given task is on sched_ext can be determined as follows:

.. code-block:: none

    # grep ext /proc/self/sched
    ext.enabled : 1

The Basics
==========

Userspace can implement an arbitrary BPF scheduler by loading a set of BPF
programs that implement ``struct sched_ext_ops``. The only mandatory field
is ``ops.name`` which must be a valid BPF object name. All operations are
optional. The following modified excerpt is from
``tools/sched_ext/scx_simple.bpf.c`` showing a minimal global FIFO scheduler.

.. code-block:: c

    /*
     * Decide which CPU a task should be migrated to before being
     * enqueued (either at wakeup, fork time, or exec time). If an
     * idle core is found by the default ops.select_cpu() implementation,
     * then insert the task directly into SCX_DSQ_LOCAL and skip the
     * ops.enqueue() callback.
     *
     * Note that this implementation has exactly the same behavior as the
     * default ops.select_cpu implementation. The behavior of the scheduler
     * would be exactly the same if the implementation just didn't define
     * the simple_select_cpu() struct_ops prog.
     */
    s32 BPF_STRUCT_OPS(simple_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
        s32 cpu;
        /* Need to initialize or the BPF verifier will reject the program */
        bool direct = false;

        cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &direct);

        if (direct)
            scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);

        return cpu;
    }

    /*
     * Do a direct insertion of a task to the global DSQ. This ops.enqueue()
     * callback will only be invoked if we failed to find a core to insert
     * into in ops.select_cpu() above.
     *
     * Note that this implementation has exactly the same behavior as the
     * default ops.enqueue implementation, which just dispatches the task
     * to SCX_DSQ_GLOBAL. The behavior of the scheduler would be exactly
     * the same if the implementation just didn't define the simple_enqueue
     * struct_ops prog.
     */
    void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
    {
        scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
    }

    s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
    {
        /*
         * By default, all SCHED_EXT, SCHED_OTHER, SCHED_IDLE, and
         * SCHED_BATCH tasks should use sched_ext.
         */
        return 0;
    }

    void BPF_STRUCT_OPS(simple_exit, struct scx_exit_info *ei)
    {
        exit_type = ei->type;
    }

    SEC(".struct_ops")
    struct sched_ext_ops simple_ops = {
        .select_cpu = (void *)simple_select_cpu,
        .enqueue    = (void *)simple_enqueue,
        .init       = (void *)simple_init,
        .exit       = (void *)simple_exit,
        .name       = "simple",
    };

Dispatch Queues
---------------

To match the impedance between the scheduler core and the BPF scheduler,
sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.

A CPU always executes a task from its local DSQ. A task is "inserted" into a
DSQ. A task in a non-local DSQ is "moved" into the target CPU's local DSQ.

When a CPU is looking for the next task to run, if the local DSQ is not
empty, the first task is picked. Otherwise, the CPU tries to move a task
from the global DSQ. If that doesn't yield a runnable task either,
``ops.dispatch()`` is invoked.

Scheduling Cycle
----------------

The following briefly shows how a waking task is scheduled and executed.

1. When a task is waking up, ``ops.select_cpu()`` is the first operation
   invoked. This serves two purposes. First, it provides a CPU selection
   optimization hint. Second, it wakes up the selected CPU if idle.

   The CPU selected by ``ops.select_cpu()`` is an optimization hint and not
   binding. The actual decision is made at the last step of scheduling.
   However, there is a small performance gain if the CPU
   ``ops.select_cpu()`` returns matches the CPU the task eventually runs on.

   A side-effect of selecting a CPU is waking it up from idle. While a BPF
   scheduler can wake up any CPU using the ``scx_bpf_kick_cpu()`` helper,
   using ``ops.select_cpu()`` judiciously can be simpler and more efficient.

   Note that the scheduler core will ignore an invalid CPU selection, for
   example, if it's outside the allowed cpumask of the task.

   A task can be immediately inserted into a DSQ from ``ops.select_cpu()``
   by calling ``scx_bpf_dsq_insert()`` or ``scx_bpf_dsq_insert_vtime()``.

   If the task is inserted into ``SCX_DSQ_LOCAL`` from
   ``ops.select_cpu()``, it will be added to the local DSQ of whichever CPU
   is returned from ``ops.select_cpu()``. Additionally, inserting directly
   from ``ops.select_cpu()`` will cause the ``ops.enqueue()`` callback to
   be skipped.

   Any other attempt to store a task in BPF-internal data structures from
   ``ops.select_cpu()`` does not prevent ``ops.enqueue()`` from being
   invoked. This is discouraged, as it can introduce racy behavior or
   inconsistent state.

2. Once the target CPU is selected, ``ops.enqueue()`` is invoked (unless the
   task was inserted directly from ``ops.select_cpu()``). ``ops.enqueue()``
   can make one of the following decisions:

   * Immediately insert the task into either the global or a local DSQ by
     calling ``scx_bpf_dsq_insert()`` with one of the following options:
     ``SCX_DSQ_GLOBAL``, ``SCX_DSQ_LOCAL``, or ``SCX_DSQ_LOCAL_ON | cpu``.

   * Immediately insert the task into a custom DSQ by calling
     ``scx_bpf_dsq_insert()`` with a DSQ ID which is smaller than 2^63.

   * Queue the task on the BPF side.

   **Task State Tracking and ops.dequeue() Semantics**

   A task is in the "BPF scheduler's custody" when the BPF scheduler is
   responsible for managing its lifecycle. A task enters custody when it is
   dispatched to a user DSQ or stored in the BPF scheduler's internal data
   structures. Custody is entered only from ``ops.enqueue()`` for those
   operations. The only exception is dispatching to a user DSQ from
   ``ops.select_cpu()``: although the task is not yet technically in BPF
   scheduler custody at that point, the dispatch has the same semantic
   effect as dispatching from ``ops.enqueue()`` for custody-related
   purposes.

   Once ``ops.enqueue()`` is called, the task may or may not enter custody
   depending on what the scheduler does:

   * **Directly dispatched to terminal DSQs** (``SCX_DSQ_LOCAL``,
     ``SCX_DSQ_LOCAL_ON | cpu``, or ``SCX_DSQ_GLOBAL``): the BPF scheduler
     is done with the task - it either goes straight to a CPU's local run
     queue or to the global DSQ as a fallback. The task never enters (or
     exits) BPF custody, and ``ops.dequeue()`` will not be called.

   * **Dispatched to user-created DSQs** (custom DSQs): the task enters the
     BPF scheduler's custody. When the task later leaves BPF custody
     (dispatched to a terminal DSQ, picked by core-sched, or dequeued for
     sleep/property changes), ``ops.dequeue()`` will be called exactly
     once.

   * **Stored in BPF data structures** (e.g., internal BPF queues): the
     task is in BPF custody. ``ops.dequeue()`` will be called when it
     leaves (e.g., when ``ops.dispatch()`` moves it to a terminal DSQ, or
     on property change / sleep).

   When a task leaves BPF scheduler custody, ``ops.dequeue()`` is invoked.
   The dequeue can happen for different reasons, distinguished by flags:

   1. **Regular dispatch**: when a task in BPF custody is dispatched to a
      terminal DSQ from ``ops.dispatch()`` (leaving BPF custody for
      execution), ``ops.dequeue()`` is triggered without any special flags.

   2. **Core scheduling pick**: when ``CONFIG_SCHED_CORE`` is enabled and
      core scheduling picks a task for execution while it's still in BPF
      custody, ``ops.dequeue()`` is called with the
      ``SCX_DEQ_CORE_SCHED_EXEC`` flag.

   3. **Scheduling property change**: when a task property changes (via
      operations like ``sched_setaffinity()``, ``sched_setscheduler()``,
      priority changes, CPU migrations, etc.) while the task is still in
      BPF custody, ``ops.dequeue()`` is called with the
      ``SCX_DEQ_SCHED_CHANGE`` flag set in ``deq_flags``.

   **Important**: Once a task has left BPF custody (e.g., after being
   dispatched to a terminal DSQ), property changes will not trigger
   ``ops.dequeue()``, since the task is no longer managed by the BPF
   scheduler.

3. When a CPU is ready to schedule, it first looks at its local DSQ. If
   empty, it then looks at the global DSQ. If there still isn't a task to
   run, ``ops.dispatch()`` is invoked, which can use the following two
   functions to populate the local DSQ.

   * ``scx_bpf_dsq_insert()`` inserts a task to a DSQ. Any target DSQ can be
     used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``,
     ``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dsq_insert()``
     currently can't be called with BPF locks held, this is being worked on
     and will be supported. ``scx_bpf_dsq_insert()`` schedules insertions
     rather than performing them immediately. There can be up to
     ``ops.dispatch_max_batch`` pending tasks.

   * ``scx_bpf_dsq_move_to_local()`` moves a task from the specified
     non-local DSQ to the dispatching CPU's local DSQ. This function cannot
     be called with any BPF locks held. ``scx_bpf_dsq_move_to_local()``
     flushes the pending insertions before trying to move from the
     specified DSQ.

4. After ``ops.dispatch()`` returns, if there are tasks in the local DSQ,
   the CPU runs the first one. If empty, the following steps are taken:

   * Try to move from the global DSQ. If successful, run the task.

   * If ``ops.dispatch()`` has dispatched any tasks, retry #3.

   * If the previous task is an SCX task and still runnable, keep executing
     it (see ``SCX_OPS_ENQ_LAST``).

   * Go idle.

Note that the BPF scheduler can always choose to dispatch tasks immediately
in ``ops.enqueue()`` as illustrated in the above simple example. If only the
built-in DSQs are used, there is no need to implement ``ops.dispatch()`` as
a task is never queued on the BPF scheduler and both the local and global
DSQs are consumed automatically.

``scx_bpf_dsq_insert()`` inserts the task on the FIFO of the target DSQ. Use
``scx_bpf_dsq_insert_vtime()`` for the priority queue. Internal DSQs such as
``SCX_DSQ_LOCAL`` and ``SCX_DSQ_GLOBAL`` do not support priority-queue
dispatching, and must be dispatched to with ``scx_bpf_dsq_insert()``. See
the function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c``
for more information.

Task Lifecycle
--------------

The following pseudo-code presents a rough overview of the entire lifecycle
of a task managed by a sched_ext scheduler:

.. code-block:: c

    ops.init_task();            /* A new task is created */
    ops.enable();               /* Enable BPF scheduling for the task */

    while (task in SCHED_EXT) {
        if (task can migrate)
            ops.select_cpu();   /* Called on wakeup (optimization) */

        ops.runnable();         /* Task becomes ready to run */

        while (task_is_runnable(task)) {
            if (task is not in a DSQ || task->scx.slice == 0) {
                ops.enqueue();  /* Task can be added to a DSQ */

                /* Task property change (i.e., affinity, nice, etc.)? */
                if (sched_change(task)) {
                    ops.dequeue();   /* Exiting BPF scheduler custody */
                    ops.quiescent();

                    /* Property change callback, e.g. ops.set_weight() */

                    ops.runnable();
                    continue;
                }

                /* Any usable CPU becomes available */

                ops.dispatch();      /* Task is moved to a local DSQ */
                ops.dequeue();       /* Exiting BPF scheduler custody */
            }

            ops.running();           /* Task starts running on its assigned CPU */

            while (task_is_runnable(task) && task->scx.slice > 0) {
                ops.tick();          /* Called every 1/HZ seconds */

                if (task->scx.slice == 0)
                    ops.dispatch();  /* task->scx.slice can be refilled */
            }

            ops.stopping();          /* Task stops running (slice expiry or wait) */
        }

        ops.quiescent();             /* Task releases its assigned CPU (wait) */
    }

    ops.disable();              /* Disable BPF scheduling for the task */
    ops.exit_task();            /* Task is destroyed */

Note that the above pseudo-code does not cover all possible state transitions
and edge cases. To name a few examples:

* ``ops.dispatch()`` may fail to move the task to a local DSQ due to a racing
  property change on that task, in which case ``ops.dispatch()`` will be
  retried.

* The task may be direct-dispatched to a local DSQ from ``ops.enqueue()``,
  in which case ``ops.dispatch()`` and ``ops.dequeue()`` are skipped and we
  go straight to ``ops.running()``.

* Property changes may occur at virtually any point during the task's
  lifecycle, not just when the task is queued and waiting to be dispatched.
  For example, changing a property of a running task will lead to the
  callback sequence ``ops.stopping()`` -> ``ops.quiescent()`` -> (property
  change callback) -> ``ops.runnable()`` -> ``ops.running()``.

* A sched_ext task can be preempted by a task from a higher-priority
  scheduling class, in which case it will exit the tick-dispatch loop even
  though it is still runnable and has a non-zero slice.

See the "Scheduling Cycle" section for a more detailed description of how
a freshly woken up task gets on a CPU.

Where to Look
=============

* ``include/linux/sched/ext.h`` defines the core data structures, ops table
  and constants.

* ``kernel/sched/ext.c`` contains the sched_ext core implementation and
  helpers. The functions prefixed with ``scx_bpf_`` can be called from the
  BPF scheduler.

* ``kernel/sched/ext_idle.c`` contains the built-in idle CPU selection
  policy.

* ``tools/sched_ext/`` hosts example BPF scheduler implementations.

  * ``scx_simple[.bpf].c``: Minimal global FIFO scheduler example using a
    custom DSQ.

  * ``scx_qmap[.bpf].c``: A multi-level FIFO scheduler supporting five
    levels of priority implemented with ``BPF_MAP_TYPE_QUEUE``.

  * ``scx_central[.bpf].c``: A central FIFO scheduler where all scheduling
    decisions are made on one CPU, demonstrating ``LOCAL_ON`` dispatching,
    tickless operation, and kthread preemption.

  * ``scx_cpu0[.bpf].c``: A scheduler that queues all tasks to a shared DSQ
    and only dispatches them on CPU0 in FIFO order. Useful for testing
    bypass behavior.

  * ``scx_flatcg[.bpf].c``: A flattened cgroup hierarchy scheduler
    implementing hierarchical weight-based cgroup CPU control by
    compounding each cgroup's share at every level into a single flat
    scheduling layer.

  * ``scx_pair[.bpf].c``: A core-scheduling example that always makes
    sibling CPU pairs execute tasks from the same CPU cgroup.

  * ``scx_sdt[.bpf].c``: A variation of ``scx_simple`` demonstrating BPF
    arena memory management for per-task data.

  * ``scx_userland[.bpf].c``: A minimal scheduler demonstrating user space
    scheduling. Tasks with CPU affinity are direct-dispatched in FIFO
    order; all others are scheduled in user space by a simple vruntime
    scheduler.

Module Parameters
=================

sched_ext exposes two module parameters under the ``sched_ext.`` prefix that
control bypass-mode behavior. These knobs are primarily for debugging; there
is usually no reason to change them during normal operation. They can be read
and written at runtime (mode 0600) via
``/sys/module/sched_ext/parameters/``.

``sched_ext.slice_bypass_us`` (default: 5000 µs)
    The time slice assigned to all tasks when the scheduler is in bypass
    mode, i.e. during BPF scheduler load, unload, and error recovery. Valid
    range is 100 µs to 100 ms.

``sched_ext.bypass_lb_intv_us`` (default: 500000 µs)
    The interval at which the bypass-mode load balancer redistributes tasks
    across CPUs. Set to 0 to disable load balancing during bypass mode.
    Valid range is 0 to 10 s.

ABI Instability
===============

The APIs provided by sched_ext to BPF scheduler programs have no stability
guarantees. This includes the ops table callbacks and constants defined in
``include/linux/sched/ext.h``, as well as the ``scx_bpf_`` kfuncs defined in
``kernel/sched/ext.c`` and ``kernel/sched/ext_idle.c``.

While we will attempt to provide a relatively stable API surface when
possible, they are subject to change without warning between kernel
versions.