Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict

Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
the sibling's partition state becomes invalid. This is overly harsh and
is probably not necessary.

The cpuset.cpus.exclusive control file, if set, will override the
cpuset.cpus of the same cpuset when creating a cpuset partition.
So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
a partition. However, it cannot override a conflicting cpuset.cpus file
in a sibling cpuset and the partition creation process will fail. This
is inconsistent. That will also make using cpuset.cpus.exclusive less
valuable as a tool to set up cpuset partitions as the users have to
check if such a cpuset.cpus conflict exists or not.

Fix these problems by making sure that once a cpuset.cpus.exclusive
is set without failure, it will always be allowed to form a valid
partition as long as at least one CPU can be granted from its parent
irrespective of the state of the siblings' cpuset.cpus values. Of
course, setting cpuset.cpus.exclusive will fail if it conflicts with
the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value
of a sibling.

A partition can still be created by setting only cpuset.cpus without
setting cpuset.cpus.exclusive. However, any conflicting CPUs in a sibling's
cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
be removed from its cpuset.cpus.exclusive.effective as long as one or
more CPUs remain that can be granted from its parent. This
CPU stripping is currently done in rm_siblings_excl_cpus().
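
The stripping rule described above can be modelled in a few lines. This is a userspace sketch only, not the kernel's rm_siblings_excl_cpus(): CPU masks are plain Python sets, and the `Sibling` class and `effective_xcpus_without_exclusive()` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sibling:
    """Hypothetical stand-in for a sibling cpuset (not a kernel structure)."""
    exclusive_cpus: set = field(default_factory=set)   # cpuset.cpus.exclusive
    effective_xcpus: set = field(default_factory=set)  # cpuset.cpus.exclusive.effective


def effective_xcpus_without_exclusive(cpus, parent_xcpus, siblings):
    """Model the exclusive CPUs granted when only cpuset.cpus is set.

    Returns the CPUs left after removing everything the siblings already
    claim. An empty result means no valid partition can be formed.
    """
    # Only CPUs that the parent can grant are candidates.
    granted = set(cpus) & set(parent_xcpus)
    # Strip CPUs claimed by a sibling's exclusive or effective exclusive mask.
    for sib in siblings:
        granted -= sib.exclusive_cpus | sib.effective_xcpus
    return granted


parent = set(range(8))                    # assumption: parent can grant CPUs 0-7
a1 = Sibling(effective_xcpus={0, 1, 2})   # existing partition holding CPUs 0-2
b1 = effective_xcpus_without_exclusive({2, 3, 4}, parent, [a1])
print(sorted(b1))  # [3, 4]
```

In this model the new cpuset asks for CPUs 2-4 but only receives 3-4, because CPU 2 already belongs to the sibling partition. A request that is entirely covered by siblings yields an empty set, which corresponds to a partition that cannot become valid.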

The new code will now try its best to enable the creation of new
partitions with only cpuset.cpus set without invalidating existing ones.
However, it is not guaranteed that all the CPUs requested in cpuset.cpus
will be used in the new partition even when all these CPUs can be
granted from the parent.

This is similar to the fact that cpuset.cpus.effective may not be
able to include all the CPUs requested in cpuset.cpus. In this case,
the parent may not be able to grant all the exclusive CPUs requested in
cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
already been granted to other partitions earlier.

With the creation of multiple sibling partitions by setting
only cpuset.cpus, this does have the side effect that their exact
cpuset.cpus.exclusive.effective settings will depend on the order of
partition creation if there are conflicts. Due to the exclusive nature
of the CPUs in a partition, it is not easy to make it fair other than
the old behavior of invalidating all the conflicting partitions.

For example,
  # echo "0-2" > A1/cpuset.cpus
  # echo "root" > A1/cpuset.cpus.partition
  # cat A1/cpuset.cpus.partition
  root
  # cat A1/cpuset.cpus.exclusive.effective
  0-2
  # echo "2-4" > B1/cpuset.cpus
  # echo "root" > B1/cpuset.cpus.partition
  # cat B1/cpuset.cpus.partition
  root
  # cat B1/cpuset.cpus.exclusive.effective
  3-4
  # cat B1/cpuset.cpus.effective
  3-4
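
The order dependence shown in this example can be replayed with a small model. It is a sketch under simplifying assumptions: a single parent that can grant CPUs 0-7, partitions created through cpuset.cpus alone, and sets standing in for CPU masks. `create_in_order()` is a hypothetical helper, not a kernel function.

```python
def create_in_order(requests, parent_cpus):
    """Grant exclusive CPUs to sibling partitions in creation order.

    `requests` is a list of (name, cpuset.cpus) pairs. Each new partition
    receives its requested CPUs minus those already held by earlier siblings.
    """
    granted = {}
    taken = set()
    for name, cpus in requests:
        granted[name] = (set(cpus) & set(parent_cpus)) - taken
        taken |= granted[name]
    return granted


parent = set(range(8))  # assumption: parent can grant CPUs 0-7
a_first = create_in_order([("A1", {0, 1, 2}), ("B1", {2, 3, 4})], parent)
b_first = create_in_order([("B1", {2, 3, 4}), ("A1", {0, 1, 2})], parent)
print({name: sorted(cpus) for name, cpus in a_first.items()})
print({name: sorted(cpus) for name, cpus in b_first.items()})
```

Creating A1 first leaves B1 with CPUs 3-4, as in the shell session. Creating B1 first leaves A1 with CPUs 0-1 instead, so the contested CPU 2 goes to whichever partition is created first.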

Users who want to be sure of getting most of the CPUs they request
should set cpuset.cpus.exclusive instead, provided that the write
succeeds. Once cpuset.cpus.exclusive has been set, sibling conflicts
are no longer possible from then onward.

To make this change, we have to separate out the is_cpu_exclusive()
check in cpus_excl_conflict() into a cgroup v1 only
cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
helper is now no longer needed and can be removed.
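
The split can be pictured as a two-way dispatch. The sketch below mirrors the conflict rules listed in the cpus_excl_conflict() comment of this patch, with Python sets in place of cpumasks. The `Cpuset` class, the `v2` parameter and the function names are stand-ins for illustration, not the kernel definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Cpuset:
    """Hypothetical model of the fields the conflict checks look at."""
    cpus_allowed: set = field(default_factory=set)    # cpuset.cpus
    exclusive_cpus: set = field(default_factory=set)  # cpuset.cpus.exclusive
    cpu_exclusive: bool = False                       # cgroup v1 cpu_exclusive flag


def v1_cpus_excl_conflict(cs1, cs2):
    """v1 rule: if either cpuset is CPU exclusive, cpus_allowed cannot intersect."""
    if cs1.cpu_exclusive or cs2.cpu_exclusive:
        return bool(cs1.cpus_allowed & cs2.cpus_allowed)
    return False


def cpus_excl_conflict(trial, sibling, xcpus_changed, v2=True):
    """Dispatch to the v1 helper, otherwise apply the two v2 rules."""
    if not v2:
        return v1_cpus_excl_conflict(trial, sibling)
    # New exclusive_cpus cannot be a superset of a sibling's non-empty cpus_allowed.
    if (xcpus_changed and sibling.cpus_allowed
            and sibling.cpus_allowed <= trial.exclusive_cpus):
        return True
    # The exclusive_cpus values of two siblings cannot overlap.
    return bool(trial.exclusive_cpus & sibling.exclusive_cpus)


# Overlapping cpuset.cpus alone is no longer a v2 conflict.
a1 = Cpuset(cpus_allowed={0, 1, 2})
b1 = Cpuset(cpus_allowed={2, 3, 4})
print(cpus_excl_conflict(a1, b1, xcpus_changed=False))  # False

# An exclusive request that swallows a sibling's whole cpuset.cpus is rejected.
a1_new = Cpuset(cpus_allowed={0, 1, 2, 3}, exclusive_cpus={3, 4, 5})
b1_old = Cpuset(cpus_allowed={4, 5})
print(cpus_excl_conflict(a1_new, b1_old, xcpus_changed=True))  # True
```

The second case corresponds to the "X3-5" failure entry in test_cpuset_prs.sh, where the sibling's cpuset.cpus of 4-5 would be fully covered by the new exclusive CPUs.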

Some existing tests in test_cpuset_prs.sh are updated and new ones are
added to reflect the new behavior. The cgroup-v2.rst doc file is also
updated to clarify what exclusive CPUs will be used when a partition
is created.

Reported-by: Sun Shaojie <sunshaojie@kylinos.cn>
Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Authored by Waiman Long and committed by Tejun Heo
2a360203 6e6f13f6

+90 -71
+23 -10
Documentation/admin-guide/cgroup-v2.rst
···
 	of this file will always be a subset of its parent's
 	"cpuset.cpus.exclusive.effective" if its parent is not the root
 	cgroup. It will also be a subset of "cpuset.cpus.exclusive"
-	if it is set. If "cpuset.cpus.exclusive" is not set, it is
-	treated to have an implicit value of "cpuset.cpus" in the
-	formation of local partition.
+	if it is set. This file should only be non-empty if either
+	"cpuset.cpus.exclusive" is set or when the current cpuset is
+	a valid partition root.
 
   cpuset.cpus.isolated
 	A read-only and root cgroup only multiple values file.
···
 	There are two types of partitions - local and remote. A local
 	partition is one whose parent cgroup is also a valid partition
 	root. A remote partition is one whose parent cgroup is not a
-	valid partition root itself. Writing to "cpuset.cpus.exclusive"
-	is optional for the creation of a local partition as its
-	"cpuset.cpus.exclusive" file will assume an implicit value that
-	is the same as "cpuset.cpus" if it is not set. Writing the
-	proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
-	before the target partition root is mandatory for the creation
-	of a remote partition.
+	valid partition root itself.
+
+	Writing to "cpuset.cpus.exclusive" is optional for the creation
+	of a local partition as its "cpuset.cpus.exclusive" file will
+	assume an implicit value that is the same as "cpuset.cpus" if it
+	is not set. Writing the proper "cpuset.cpus.exclusive" values
+	down the cgroup hierarchy before the target partition root is
+	mandatory for the creation of a remote partition.
+
+	Not all the CPUs requested in "cpuset.cpus.exclusive" can be
+	used to form a new partition. Only those that were present
+	in its parent's "cpuset.cpus.exclusive.effective" control
+	file can be used. For partitions created without setting
+	"cpuset.cpus.exclusive", exclusive CPUs specified in sibling's
+	"cpuset.cpus.exclusive" or "cpuset.cpus.exclusive.effective"
+	also cannot be used.
 
 	Currently, a remote partition cannot be created under a local
 	partition. All the ancestors of a remote partition root except
···
 
 	The root cgroup is always a partition root and its state cannot
 	be changed. All other non-root cgroups start out as "member".
+	Even though the "cpuset.cpus.exclusive*" and "cpuset.cpus"
+	control files are not present in the root cgroup, they are
+	implicitly the same as the "/sys/devices/system/cpu/possible"
+	sysfs file.
 
 	When set to "root", the current cgroup is the root of a new
 	partition or scheduling domain. The set of exclusive CPUs is
+3
kernel/cgroup/cpuset-internal.h
···
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
 void cpuset1_init(struct cpuset *cs);
 void cpuset1_online_css(struct cgroup_subsys_state *css);
 int cpuset1_generate_sched_domains(cpumask_var_t **domains,
···
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+				struct cpuset *cs2) { return false; }
 static inline void cpuset1_init(struct cpuset *cs) {}
 static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
 static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
+19
kernel/cgroup/cpuset-v1.c
···
 	return ret;
 }
 
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ *                                to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+		return cpumask_intersects(cs1->cpus_allowed,
+					  cs2->cpus_allowed);
+
+	return false;
+}
+
 #ifdef CONFIG_PROC_PID_CPUSET
 /*
  * proc_cpuset_show()
+26 -54
kernel/cgroup/cpuset.c
···
  * For simplicity, a local partition can be created under a local or remote
  * partition but a remote partition cannot have any partition root in its
  * ancestor chain except the cgroup root.
+ *
+ * A valid partition can be formed by setting exclusive_cpus or cpus_allowed
+ * if exclusive_cpus is not set. In the case of partition with empty
+ * exclusive_cpus, all the conflicting exclusive CPUs specified in the
+ * following cpumasks of sibling cpusets will be removed from its
+ * cpus_allowed in determining its effective_xcpus.
+ * - effective_xcpus
+ * - exclusive_cpus
+ *
+ * The "cpuset.cpus.exclusive" control file should be used for setting up
+ * partition if the users want to get as many CPUs as possible.
  */
 #define PRS_MEMBER	0
 #define PRS_ROOT	1
···
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
- * 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
+ *  o cgroup v1
+ *    See cpuset1_cpus_excl_conflict()
+ *  o cgroup v2
+ *    - The exclusive_cpus values cannot overlap.
+ *    - New exclusive_cpus cannot be a superset of a sibling's cpus_allowed.
  */
 static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
 				      bool xcpus_changed)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
-		return !cpusets_are_exclusive(trial, sibling);
-
-	/* Exclusive_cpus cannot intersect */
-	if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
-		return true;
+	if (!cpuset_v2())
+		return cpuset1_cpus_excl_conflict(trial, sibling);
 
 	/* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
 	if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
 	    cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
 		return true;
 
-	return false;
+	/* Exclusive_cpus cannot intersect */
+	return cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus);
 }
 
 static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
···
 	return PERR_NONE;
 }
 
-static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialcs,
-					struct tmpmasks *tmp)
-{
-	int retval;
-	struct cpuset *parent = parent_cs(cs);
-
-	retval = validate_change(cs, trialcs);
-
-	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
-		/*
-		 * The -EINVAL error code indicates that partition sibling
-		 * CPU exclusivity rule has been violated. We still allow
-		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
-		 */
-		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
-		retval = 0;
-	}
-	return retval;
-}
-
 /**
  * partition_cpus_change - Handle partition state changes due to CPU mask updates
  * @cs: The target cpuset being modified
···
 	if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
 		return 0;
 
-	if (alloc_tmpmasks(&tmp))
-		return -ENOMEM;
-
 	compute_trialcs_excpus(trialcs, cs);
 	trialcs->prs_err = PERR_NONE;
 
-	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
+	retval = validate_change(cs, trialcs);
 	if (retval < 0)
-		goto out_free;
+		return retval;
+
+	if (alloc_tmpmasks(&tmp))
+		return -ENOMEM;
 
 	/*
 	 * Check all the descendants in update_cpumasks_hier() if
···
 	/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */
 	if (cs->partition_root_state)
 		update_partition_sd_lb(cs, old_prs);
-out_free:
+
 	free_tmpmasks(&tmp);
 	return retval;
 }
+19 -7
tools/testing/selftests/cgroup/test_cpuset_prs.sh
···
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
-	" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+	" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
 	" C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
···
 	# Invalid to valid local partition direct transition tests
 	" C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
 	" C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
-	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0"
 	" C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
 
 	# Local partition invalidation tests
···
 	" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	" C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0"
+	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|XA1:0-1|B1:2-3 A1:P1|B1:P1"
+	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	" C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
···
 	" CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-4|B1:5-6 \
 		A1:P0|A2:P2:B1:P-1 2-4"
 
+	# When multiple partitions with conflicting cpuset.cpus are created, the
+	# latter created ones will only get what are left of the available exclusive
+	# CPUs.
+	" C1-3:P1 . . . . . . C3-5:P1 0 A1:1-3|B1:4-5:XB1:4-5 A1:P1|B1:P1"
+
+	# cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive
+	" C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:1-2"
+
 	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
 	# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
 	# Failure cases:
···
 	# Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected
 	" C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:4-5"
 
-	# cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
+	# cpuset.cpus.exclusive cannot be set to a superset of sibling's cpuset.cpus
 	" C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5"
 )
 
···
 	. . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4:c22:5-6 \
 		p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \
 		1-2,4-6|1-2,5-6"
+	# c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid partition
+	" C1-5:P1:S+ . C1-4:P1 C2-3 . . \
+	. . . P1 . . p1:5|c11:1-4|c12:5 \
+		p1:P1|c11:P1|c12:P-1"
 )
 
 #