Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

fwctl: Add documentation

Document the purpose and rules for the fwctl subsystem.

Link in kdocs to the doc tree.

Link: https://patch.msgid.link/r/6-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com
Nacked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240603114250.5325279c@kernel.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lore.kernel.org/r/ZrHY2Bds7oF7KRGz@phenom.ffwll.local
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

+298
+284
Documentation/userspace-api/fwctl/fwctl.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW, and in many cases, are largely
+software-defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law where a chip tape out is now highly expensive, and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible size and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry where single pieces of silicon are now configurable software-defined
+devices and can operate in substantially different ways depending on the need.
+Further, we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Further, devices have become multi-functional and integrated to the point they
+no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+All together this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW-driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past when devices were more single function, individual subsystems would
+grow different approaches to solving some of these common problems. For instance
+monitoring device health, manipulating its FLASH, debugging the FW,
+provisioning, all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools which will
+use this interface to control the device using the device native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+It is not an avenue to access random kernel internals, or other operating system
+SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function-level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+    configuration
+
+ 2. Multiple hypervisor functions with control over itself and child functions
+    used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes.
+For instance a child VM's FW may be within the scope of the hypervisor FW. It is
+quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action (see enum fwctl_rpc_scope):
+
+ 1. Access to function & child configuration, FLASH, etc. that becomes live at a
+    function reset. Access to function & child runtime configuration that is
+    transparent or non-disruptive to any driver or VM.
+
+ 2. Read-only access to function debug information that may report on FW objects
+    in the function & child, including FW objects owned by other kernel
+    subsystems.
+
+ 3. Write access to function & child debug information strictly compatible with
+    the principles of kernel lockdown and kernel integrity protection. Triggers
+    a kernel Taint.
+
+ 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+User space will provide a scope label on each RPC and the kernel must enforce the
+above CAPs and taints based on that scope. A combination of kernel and FW can
+enforce that RPCs are placed in the correct scope by user space.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
+    untrusted code, or otherwise compromise device or system security and
+    integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+    objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+    driver can react to the device configuration at function reset/driver load
+    time, but otherwise must not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+    primary kernel subsystem, such as read/write to LBAs, send/receive of
+    network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+Operations exposed through fwctl's non-tainting interfaces should be fully
+sharable with other users of the device. For instance exposing a RPC through
+fwctl should never prevent a kernel subsystem from also concurrently using that
+same RPC or hardware unit down the road. In such cases fwctl will be less
+important than proper kernel subsystems that eventually emerge. Mistakes in this
+area resulting in clashes will be resolved in favour of a kernel implementation.
+
+fwctl User API
+==============
+
+.. kernel-doc:: include/uapi/fwctl/fwctl.h
+
+sysfs Class
+-----------
+
+fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
+(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
+operates the ioctl uAPI described above.
+
+fwctl devices can be related to driver components in other subsystems through
+sysfs::
+
+    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
+    ibp0s10f0
+
+    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
+    fwctl0/
+
+    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
+    dev device power subsystem uevent
+
+User space Community
+--------------------
+
+Drawing inspiration from nvme-cli, participating in the kernel side must come
+with a user space in a common TBD git tree, at a minimum to usefully operate the
+kernel driver. Providing such an implementation is a pre-condition to merging a
+kernel driver.
+
+The goal is to build user space community around some of the shared problems
+we all have, and ideally develop some common user space programs with some
+starting themes of:
+
+ - Device in-field debugging
+
+ - HW provisioning
+
+ - VFIO child device profiling before VM boot
+
+ - Confidential Compute topics (attestation, secure provisioning)
+
+that stretch across all subsystems in the kernel. fwupd is a great example of
+how an excellent user space experience can emerge out of kernel-side diversity.
+
+fwctl Kernel API
+================
+
+.. kernel-doc:: drivers/fwctl/main.c
+   :export:
+.. kernel-doc:: include/linux/fwctl.h
+
+fwctl Driver design
+-------------------
+
+In many cases a fwctl driver is going to be part of a larger cross-subsystem
+device possibly using the auxiliary_device mechanism. In that case several
+subsystems are going to be sharing the same device and FW interface layer so the
+device design must already provide for isolation and cooperation between kernel
+subsystems. fwctl should fit into that same model.
+
+Part of the driver should include a description of how its scope restrictions
+and security model work. The driver and FW together must ensure that RPCs
+provided by user space are mapped to the appropriate scope. If the validation is
+done in the driver then the validation can read a 'command effects' report from
+the device, or hardwire the enforcement. If the validation is done in the FW,
+then the driver should pass the fwctl_rpc_scope to the FW along with the command.
+
+The driver and FW must cooperate to ensure that either fwctl cannot allocate
+any FW resources, or any resources it does allocate are freed on FD closure. A
+driver primarily constructed around FW RPCs may find that its core PCI function
+and RPC layer belongs under fwctl with auxiliary devices connecting to other
+subsystems.
+
+Each device type must be mindful of Linux's philosophy for stable ABI. The FW
+RPC interface does not have to meet a strictly stable ABI, but it does need to
+meet an expectation that userspace tools that are deployed and in significant
+use don't needlessly break. FW upgrade and kernel upgrade should keep widely
+deployed tooling working.
+
+Development and debugging focused RPCs under more permissive scopes can have
+less stability if the tools using them are only run under exceptional
+circumstances and not for everyday use of the device. Debugging tools may even
+require exact version matching as they may require something similar to DWARF
+debug information from the FW binary.
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, push a kernel patch to parse and block RPC
+commands or push a kernel patch to block entire firmware versions/devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW will be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+   a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+   functions that different products have defined, but more exist.
+
+ - CXL also has a NVMe-like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+   mediation
+
+ - RDMA allows user space drivers to directly push commands to the device
+   without kernel involvement
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
+
+The first 4 are examples of areas that fwctl intends to cover. The latter three
+are examples of denied behavior as they fully overlap with the primary purpose
+of a kernel subsystem.
+
+Some key lessons learned from these past efforts are the importance of having a
+common user space project to use as a pre-condition for obtaining a kernel
+driver. Developing good community around useful software in user space is key to
+getting companies to fund participation to enable their products.
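
Reviewer note on "Scope of Action": the CAP and taint rules that fwctl.rst lists for the four RPC scopes can be modelled in a few lines. What follows is only an illustrative user-space sketch in Python, not the kernel implementation. The names `RpcScope`, `Decision` and `check_scope`, and the numeric values of the enum members, are assumptions invented for this example; only the four scopes, the taint requirement for the two debug-write scopes, and the CAP_SYS_RAWIO requirement for full debug access come from the text above.

```python
from enum import IntEnum
from typing import NamedTuple


class RpcScope(IntEnum):
    """Hypothetical stand-ins for the four scopes listed in fwctl.rst."""
    CONFIGURATION = 0      # configuration/FLASH access, non-disruptive runtime config
    DEBUG_READ_ONLY = 1    # read-only debug information
    DEBUG_WRITE = 2        # lockdown-compatible debug writes; taints the kernel
    DEBUG_WRITE_FULL = 3   # full debug access; taints and needs CAP_SYS_RAWIO


class Decision(NamedTuple):
    allowed: bool          # may the RPC be delivered to the FW at all?
    taints_kernel: bool    # must the kernel be tainted before delivery?


def check_scope(scope: int, has_cap_sys_rawio: bool) -> Decision:
    """Apply the documented CAP/taint rules to a user-supplied scope label."""
    try:
        scope = RpcScope(scope)
    except ValueError:
        # Unknown scope labels are rejected outright (an assumption of this sketch).
        return Decision(allowed=False, taints_kernel=False)

    if scope is RpcScope.DEBUG_WRITE_FULL and not has_cap_sys_rawio:
        # Full debug access requires CAP_SYS_RAWIO.
        return Decision(allowed=False, taints_kernel=False)

    # Both debug-write scopes trigger a kernel taint once permitted.
    taints = scope in (RpcScope.DEBUG_WRITE, RpcScope.DEBUG_WRITE_FULL)
    return Decision(allowed=True, taints_kernel=taints)
```

For instance, `check_scope(RpcScope.DEBUG_WRITE, has_cap_sys_rawio=False)` is allowed but taints, while the same call with `RpcScope.DEBUG_WRITE_FULL` is refused. Placing an RPC in the right scope in the first place remains the job of the driver and FW, as the document describes.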
+12
Documentation/userspace-api/fwctl/index.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+Firmware Control (FWCTL) Userspace API
+======================================
+
+A framework that defines a common set of limited rules that allows user space
+to securely construct and execute RPCs inside device firmware.
+
+.. toctree::
+   :maxdepth: 1
+
+   fwctl
+1
Documentation/userspace-api/index.rst
 
    accelerators/ocxl
    dma-buf-alloc-exchange
+   fwctl/index
    gpio/index
    iommufd
    media/index
+1
MAINTAINERS
 M:	Saeed Mahameed <saeedm@nvidia.com>
 R:	Jonathan Cameron <Jonathan.Cameron@huawei.com>
 S:	Maintained
+F:	Documentation/userspace-api/fwctl/
 F:	drivers/fwctl/
 F:	include/linux/fwctl.h
 F:	include/uapi/fwctl/