powerpc/rtas: Facilitate high-level call sequences

On RTAS platforms there is a general restriction that the OS must not
enter RTAS on more than one CPU at a time. This low-level
serialization requirement is satisfied by holding a spin
lock (rtas_lock) across most RTAS function invocations.

However, some pseries RTAS functions require multiple successive calls
to complete a logical operation. Beginning a new call sequence for such a
function may disrupt any other sequences of that function already in
progress. Safe and reliable use of these functions effectively
requires higher-level serialization beyond what is already done at the
level of RTAS entry and exit.

Where a sequence-based RTAS function is invoked only through
sys_rtas(), with no in-kernel users, there is no issue as far as the
kernel is concerned. User space is responsible for appropriately
serializing its call sequences. (Whether user space code actually
takes measures to prevent sequence interleaving is another matter.)
Examples of such functions currently include ibm,platform-dump and
ibm,get-vpd.

But where a sequence-based RTAS function has both user space and
in-kernel uesrs, there is a hazard. Even if the in-kernel call sites
of such a function serialize their sequences correctly, a user of
sys_rtas() can invoke the same function at any time, potentially
disrupting a sequence in progress.

So in order to prevent disruption of kernel-based RTAS call sequences,
they must serialize not only with themselves but also with sys_rtas()
users, somehow. Preferably without adding more function-specific hacks
to sys_rtas(). This is a prerequisite for adding an in-kernel call
sequence of ibm,get-vpd, which is in a change to follow.

Note that it has never been feasible for the kernel to prevent
sys_rtas()-based sequences from being disrupted because control
returns to user space on every call. sys_rtas()-based users of these
functions have always been, and continue to be, responsible for
coordinating their call sequences with other users, even those which
may invoke the RTAS functions through less direct means than
sys_rtas(). This is an unavoidable consequence of exposing
sequence-based RTAS functions through sys_rtas().

* Add an optional mutex member to struct rtas_function.

* Statically define a mutex for each RTAS function with known call
sequence serialization requirements, and assign its address to the
.lock member of the corresponding function table entry, along with
justifying commentary.

* In sys_rtas(), if the table entry for the RTAS function being
called has a populated lock member, acquire it before taking
rtas_lock and entering RTAS.

* Kernel-based RTAS call sequences are expected to access the
appropriate mutex explicitly by name. For example, a user of the
ibm,activate-firmware RTAS function would do:

int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
int fwrc;

mutex_lock(&rtas_ibm_activate_firmware_lock);

do {
fwrc = rtas_call(token, 0, 1, NULL);
} while (rtas_busy_delay(fwrc));

mutex_unlock(&rtas_ibm_activate_firmware_lock);

There should be no perceivable change introduced here except that
concurrent callers of the same RTAS function via sys_rtas() may block
on a mutex instead of spinning on rtas_lock.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-6-e9eafd0c8c6c@linux.ibm.com

authored by

Nathan Lynch and committed by

Michael Ellerman 2 years ago adf7a019 e7582edb

+86

2 changed files

expand all

arch

powerpc

include

asm

rtas.h

kernel

rtas.c

arch/powerpc/include/asm/rtas.h

··· 3 3 #define _POWERPC_RTAS_H 4 4 #ifdef __KERNEL__ 5 5 6 + #include <linux/mutex.h> 6 7 #include <linux/spinlock.h> 7 8 #include <asm/page.h> 8 9 #include <asm/rtas-types.h> ··· 512 511 513 512 /* RMO buffer reserved for user-space RTAS use */ 514 513 extern unsigned long rtas_rmo_buf; 514 + 515 + extern struct mutex rtas_ibm_get_vpd_lock; 515 516 516 517 #define GLOBAL_INTERRUPT_QUEUE 9005 517 518

+83

arch/powerpc/kernel/rtas.c

··· 18 18 #include <linux/kernel.h> 19 19 #include <linux/lockdep.h> 20 20 #include <linux/memblock.h> 21 + #include <linux/mutex.h> 21 22 #include <linux/of.h> 22 23 #include <linux/of_fdt.h> 23 24 #include <linux/reboot.h> ··· 71 70 * ppc64le, and we want to keep it that way. It does 72 71 * not make sense for this to be set when @filter 73 72 * is NULL. 73 + * @lock: Pointer to an optional dedicated per-function mutex. This 74 + * should be set for functions that require multiple calls in 75 + * sequence to complete a single operation, and such sequences 76 + * will disrupt each other if allowed to interleave. Users of 77 + * this function are required to hold the associated lock for 78 + * the duration of the call sequence. Add an explanatory 79 + * comment to the function table entry if setting this member. 74 80 */ 75 81 struct rtas_function { 76 82 s32 token; 77 83 const bool banned_for_syscall_on_le:1; 78 84 const char * const name; 79 85 const struct rtas_filter *filter; 86 + struct mutex *lock; 80 87 }; 88 + 89 + /* 90 + * Per-function locks for sequence-based RTAS functions. 91 + */ 92 + static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock); 93 + static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock); 94 + static DEFINE_MUTEX(rtas_ibm_get_indices_lock); 95 + static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock); 96 + static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock); 97 + static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock); 98 + DEFINE_MUTEX(rtas_ibm_get_vpd_lock); 81 99 82 100 static struct rtas_function rtas_function_table[] __ro_after_init = { 83 101 [RTAS_FNIDX__CHECK_EXCEPTION] = { ··· 145 125 .buf_idx1 = -1, .size_idx1 = -1, 146 126 .buf_idx2 = -1, .size_idx2 = -1, 147 127 }, 128 + /* 129 + * PAPR+ as of v2.13 doesn't explicitly impose any 130 + * restriction, but this typically requires multiple 131 + * calls before success, and there's no reason to 132 + * allow sequences to interleave. 133 + */ 134 + .lock = &rtas_ibm_activate_firmware_lock, 148 135 }, 149 136 [RTAS_FNIDX__IBM_CBE_START_PTCAL] = { 150 137 .name = "ibm,cbe-start-ptcal", ··· 223 196 .buf_idx1 = 1, .size_idx1 = -1, 224 197 .buf_idx2 = -1, .size_idx2 = -1, 225 198 }, 199 + /* 200 + * PAPR+ v2.13 R1–7.3.19–3 is explicit that the OS 201 + * must not call ibm,get-dynamic-sensor-state with 202 + * different inputs until a non-retry status has been 203 + * returned. 204 + */ 205 + .lock = &rtas_ibm_get_dynamic_sensor_state_lock, 226 206 }, 227 207 [RTAS_FNIDX__IBM_GET_INDICES] = { 228 208 .name = "ibm,get-indices", ··· 237 203 .buf_idx1 = 2, .size_idx1 = 3, 238 204 .buf_idx2 = -1, .size_idx2 = -1, 239 205 }, 206 + /* 207 + * PAPR+ v2.13 R1–7.3.17–2 says that the OS must not 208 + * interleave ibm,get-indices call sequences with 209 + * different inputs. 210 + */ 211 + .lock = &rtas_ibm_get_indices_lock, 240 212 }, 241 213 [RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY] = { 242 214 .name = "ibm,get-rio-topology", ··· 260 220 .buf_idx1 = 0, .size_idx1 = -1, 261 221 .buf_idx2 = 1, .size_idx2 = 2, 262 222 }, 223 + /* 224 + * PAPR+ v2.13 R1–7.3.20–4 indicates that sequences 225 + * should not be allowed to interleave. 226 + */ 227 + .lock = &rtas_ibm_get_vpd_lock, 263 228 }, 264 229 [RTAS_FNIDX__IBM_GET_XIVE] = { 265 230 .name = "ibm,get-xive", ··· 284 239 .buf_idx1 = 2, .size_idx1 = 3, 285 240 .buf_idx2 = -1, .size_idx2 = -1, 286 241 }, 242 + /* 243 + * PAPR+ v2.13 R1–7.3.26–6 says the OS should allow 244 + * only one call sequence in progress at a time. 245 + */ 246 + .lock = &rtas_ibm_lpar_perftools_lock, 287 247 }, 288 248 [RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE] = { 289 249 .name = "ibm,manage-flash-image", ··· 327 277 .buf_idx1 = 0, .size_idx1 = 1, 328 278 .buf_idx2 = -1, .size_idx2 = -1, 329 279 }, 280 + /* 281 + * This follows a sequence-based pattern similar to 282 + * ibm,get-vpd et al. Since PAPR+ restricts 283 + * interleaving call sequences for other functions of 284 + * this style, assume the restriction applies here, 285 + * even though it's not explicit in the spec. 286 + */ 287 + .lock = &rtas_ibm_physical_attestation_lock, 330 288 }, 331 289 [RTAS_FNIDX__IBM_PLATFORM_DUMP] = { 332 290 .name = "ibm,platform-dump", ··· 342 284 .buf_idx1 = 4, .size_idx1 = 5, 343 285 .buf_idx2 = -1, .size_idx2 = -1, 344 286 }, 287 + /* 288 + * PAPR+ v2.13 7.3.3.4.1 indicates that concurrent 289 + * sequences of ibm,platform-dump are allowed if they 290 + * are operating on different dump tags. So leave the 291 + * lock pointer unset for now. This may need 292 + * reconsideration if kernel-internal users appear. 293 + */ 345 294 }, 346 295 [RTAS_FNIDX__IBM_POWER_OFF_UPS] = { 347 296 .name = "ibm,power-off-ups", ··· 391 326 .buf_idx1 = 2, .size_idx1 = -1, 392 327 .buf_idx2 = -1, .size_idx2 = -1, 393 328 }, 329 + /* 330 + * PAPR+ v2.13 R1–7.3.18–3 says the OS must not call 331 + * this function with different inputs until a 332 + * non-retry status has been returned. 333 + */ 334 + .lock = &rtas_ibm_set_dynamic_indicator_lock, 394 335 }, 395 336 [RTAS_FNIDX__IBM_SET_EEH_OPTION] = { 396 337 .name = "ibm,set-eeh-option", ··· 1959 1888 1960 1889 buff_copy = get_errorlog_buffer(); 1961 1890 1891 + /* 1892 + * If this function has a mutex assigned to it, we must 1893 + * acquire it to avoid interleaving with any kernel-based uses 1894 + * of the same function. Kernel-based sequences acquire the 1895 + * appropriate mutex explicitly. 1896 + */ 1897 + if (func->lock) 1898 + mutex_lock(func->lock); 1899 + 1962 1900 raw_spin_lock_irqsave(&rtas_lock, flags); 1963 1901 cookie = lockdep_pin_lock(&rtas_lock); 1964 1902 ··· 1982 1902 1983 1903 lockdep_unpin_lock(&rtas_lock, cookie); 1984 1904 raw_spin_unlock_irqrestore(&rtas_lock, flags); 1905 + 1906 + if (func->lock) 1907 + mutex_unlock(func->lock); 1985 1908 1986 1909 if (buff_copy) { 1987 1910 if (errbuf)

Configure Feed

Configure Feed