Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


cgroup/cpuset: Clarify exclusion rules for cpuset internal variables

Clarify the locking rules associated with file level internal variables
inside the cpuset code. There is no functional change.

Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Authored by Waiman Long, committed by Tejun Heo
17b18600 68230aac

+61 -44
kernel/cgroup/cpuset.c
···
 };

 /*
+ * CPUSET Locking Convention
+ * -------------------------
+ *
+ * Below are the three global locks guarding cpuset structures in lock
+ * acquisition order:
+ * - cpu_hotplug_lock (cpus_read_lock/cpus_write_lock)
+ * - cpuset_mutex
+ * - callback_lock (raw spinlock)
+ *
+ * A task must hold all the three locks to modify externally visible or
+ * used fields of cpusets, though some of the internally used cpuset fields
+ * and internal variables can be modified without holding callback_lock. If only
+ * reliable read access of the externally used fields are needed, a task can
+ * hold either cpuset_mutex or callback_lock which are exposed to other
+ * external subsystems.
+ *
+ * If a task holds cpu_hotplug_lock and cpuset_mutex, it blocks others,
+ * ensuring that it is the only task able to also acquire callback_lock and
+ * be able to modify cpusets. It can perform various checks on the cpuset
+ * structure first, knowing nothing will change. It can also allocate memory
+ * without holding callback_lock. While it is performing these checks, various
+ * callback routines can briefly acquire callback_lock to query cpusets. Once
+ * it is ready to make the changes, it takes callback_lock, blocking everyone
+ * else.
+ *
+ * Calls to the kernel memory allocator cannot be made while holding
+ * callback_lock which is a spinlock, as the memory allocator may sleep or
+ * call back into cpuset code and acquire callback_lock.
+ *
+ * Now, the task_struct fields mems_allowed and mempolicy may be changed
+ * by other task, we use alloc_lock in the task_struct fields to protect
+ * them.
+ *
+ * The cpuset_common_seq_show() handlers only hold callback_lock across
+ * small pieces of code, such as when reading out possibly multi-word
+ * cpumasks and nodemasks.
+ */
+
+static DEFINE_MUTEX(cpuset_mutex);
+
+/*
+ * File level internal variables below follow one of the following exclusion
+ * rules.
+ *
+ * RWCS: Read/write-able by holding either cpus_write_lock (and optionally
+ *       cpuset_mutex) or both cpus_read_lock and cpuset_mutex.
+ *
+ * CSCB: Readable by holding either cpuset_mutex or callback_lock. Writable
+ *       by holding both cpuset_mutex and callback_lock.
+ */
+
+/*
  * For local partitions, update to subpartitions_cpus & isolated_cpus is done
  * in update_parent_effective_cpumask(). For remote partitions, it is done in
  * the remote_partition_*() and remote_cpus_update() helpers.
···
  * Exclusive CPUs distributed out to local or remote sub-partitions of
  * top_cpuset
  */
-static cpumask_var_t subpartitions_cpus;
+static cpumask_var_t subpartitions_cpus;	/* RWCS */

 /*
- * Exclusive CPUs in isolated partitions
+ * Exclusive CPUs in isolated partitions (shown in cpuset.cpus.isolated)
  */
-static cpumask_var_t isolated_cpus;
+static cpumask_var_t isolated_cpus;	/* CSCB */

 /*
- * isolated_cpus updating flag (protected by cpuset_mutex)
- * Set if isolated_cpus is going to be updated in the current
- * cpuset_mutex crtical section.
+ * Set if isolated_cpus is being updated in the current cpuset_mutex
+ * critical section.
  */
-static bool isolated_cpus_updating;
+static bool isolated_cpus_updating;	/* RWCS */

 /*
  * A flag to force sched domain rebuild at the end of an operation.
···
  * Note that update_relax_domain_level() in cpuset-v1.c can still call
  * rebuild_sched_domains_locked() directly without using this flag.
  */
-static bool force_sd_rebuild;
+static bool force_sd_rebuild;	/* RWCS */

 /*
  * Partition root states:
···
 	BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
 	.partition_root_state = PRS_ROOT,
 };
-
-/*
- * There are two global locks guarding cpuset structures - cpuset_mutex and
- * callback_lock. The cpuset code uses only cpuset_mutex. Other kernel
- * subsystems can use cpuset_lock()/cpuset_unlock() to prevent change to cpuset
- * structures. Note that cpuset_mutex needs to be a mutex as it is used in
- * paths that rely on priority inheritance (e.g. scheduler - on RT) for
- * correctness.
- *
- * A task must hold both locks to modify cpusets. If a task holds
- * cpuset_mutex, it blocks others, ensuring that it is the only task able to
- * also acquire callback_lock and be able to modify cpusets. It can perform
- * various checks on the cpuset structure first, knowing nothing will change.
- * It can also allocate memory while just holding cpuset_mutex. While it is
- * performing these checks, various callback routines can briefly acquire
- * callback_lock to query cpusets. Once it is ready to make the changes, it
- * takes callback_lock, blocking everyone else.
- *
- * Calls to the kernel memory allocator can not be made while holding
- * callback_lock, as that would risk double tripping on callback_lock
- * from one of the callbacks into the cpuset code from within
- * __alloc_pages().
- *
- * If a task is only holding callback_lock, then it has read-only
- * access to cpusets.
- *
- * Now, the task_struct fields mems_allowed and mempolicy may be changed
- * by other task, we use alloc_lock in the task_struct fields to protect
- * them.
- *
- * The cpuset_common_seq_show() handlers only hold callback_lock across
- * small pieces of code, such as when reading out possibly multi-word
- * cpumasks and nodemasks.
- */
-
-static DEFINE_MUTEX(cpuset_mutex);

 /**
  * cpuset_lock - Acquire the global cpuset mutex
···
 static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus)
 {
 	WARN_ON_ONCE(old_prs == new_prs);
+	lockdep_assert_held(&callback_lock);
+	lockdep_assert_held(&cpuset_mutex);
 	if (new_prs == PRS_ISOLATED)
 		cpumask_or(isolated_cpus, isolated_cpus, xcpus);
 	else