Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'block-6.0-2022-09-02' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request via Christoph:
    - error handling fix for the new auth code (Hannes Reinecke)
    - fix unhandled tcp states in nvmet_tcp_state_change (Maurizio Lombardi)
    - add NVME_QUIRK_BOGUS_NID for Lexar NM610 (Shyamin Ayesh)

- Add documentation for the ublk driver merged in this merge window (Ming)

* tag 'block-6.0-2022-09-02' of git://git.kernel.dk/linux-block:
  Documentation: document ublk
  nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change()
  nvmet-auth: add missing goto in nvmet_setup_auth()
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM610

6 files changed, 261 insertions(+)

Documentation/block/index.rst (+1)
@@ -23,3 +23,4 @@
    stat
    switching-sched
    writeback_cache_control
+   ublk

Documentation/block/ublk.rst (new file, +253)

.. SPDX-License-Identifier: GPL-2.0

===========================================
Userspace block device driver (ublk driver)
===========================================

Overview
========

ublk is a generic framework for implementing block device logic from userspace.
The motivation behind it is that moving virtual block drivers into userspace,
such as loop, nbd and similar, can be very helpful. It can help to implement
new virtual block devices such as ublk-qcow2 (there have been several attempts
at implementing a qcow2 driver in the kernel).

Userspace block devices are attractive because:

- They can be written in many programming languages.
- They can use libraries that are not available in the kernel.
- They can be debugged with tools familiar to application developers.
- Crashes do not cause a kernel panic.
- Bugs are likely to have a lower security impact than bugs in kernel
  code.
- They can be installed and updated independently of the kernel.
- They can be used to simulate block devices easily with user-specified
  parameters/settings for test/debug purposes.

The ublk block device (``/dev/ublkb*``) is added by the ublk driver. Any IO
request on the device is forwarded to the ublk userspace program. For
convenience, in this document, ``ublk server`` refers to a generic ublk
userspace program. ``ublksrv`` [#userspace]_ is one such implementation. It
provides the ``libublksrv`` [#userspace_lib]_ library for developing specific
user block devices conveniently, and it also includes generic block device
types such as loop and null. Richard W.M. Jones wrote the userspace nbd device
``nbdublk`` [#userspace_nbdublk]_ based on ``libublksrv`` [#userspace_lib]_.

After the IO is handled by userspace, the result is committed back to the
driver, thus completing the request cycle.
This way, all specific IO handling
logic (such as loop's IO handling, NBD's IO communication, or qcow2's IO
mapping) is done entirely in userspace.

``/dev/ublkb*`` is driven by a blk-mq request-based driver. Each request is
assigned a queue-wide unique tag. The ublk server assigns a unique tag to each
IO too, which maps 1:1 to the IO of ``/dev/ublkb*``.

Both forwarding IO requests and committing IO handling results are done via
``io_uring`` passthrough commands; that is why ublk is also an io_uring based
block driver. It has been observed that using io_uring passthrough commands can
give better IOPS than block IO, which is why ublk is a high-performance
implementation of a userspace block device: not only is IO request
communication done by io_uring, but the preferred IO handling approach in the
ublk server is also io_uring based.

ublk provides a control interface to set/get ublk block device parameters.
The interface is extendable and kABI compatible: basically any ublk request
queue parameter or ublk generic feature parameter can be set/get via the
interface. Thus, ublk is a generic userspace block device framework.
For example, it is easy to set up a ublk device with specified block
parameters from userspace.

Using ublk
==========

ublk requires a userspace ublk server to handle the real block device logic.

Below is an example of using ``ublksrv`` to provide a ublk-based loop device.

- add a device::

     ublk add -t loop -f ublk-loop.img

- format with xfs, then use it::

     mkfs.xfs /dev/ublkb0
     mount /dev/ublkb0 /mnt
     # do anything. all IOs are handled by io_uring
     ...
     umount /mnt

- list the devices with their info::

     ublk list

- delete the device::

     ublk del -a
     ublk del -n $ublk_dev_id

See usage details in the README of ``ublksrv`` [#userspace_readme]_.

Design
======

Control plane
-------------

The ublk driver provides a global misc device node (``/dev/ublk-control``) for
managing and controlling ublk devices with the help of several control
commands:

- ``UBLK_CMD_ADD_DEV``

  Add a ublk char device (``/dev/ublkc*``), which the ublk server uses for
  IO command communication. Basic device info is sent together with this
  command in the UAPI structure ``ublksrv_ctrl_dev_info``, such as
  ``nr_hw_queues``, ``queue_depth``, and the max IO request buffer size;
  this info is negotiated with the driver and sent back to the server.
  Once this command completes, the basic device info is immutable.

- ``UBLK_CMD_SET_PARAMS`` / ``UBLK_CMD_GET_PARAMS``

  Set or get parameters of the device, which can be either generic-feature
  related or request-queue-limit related, but can't be IO-logic specific,
  because the driver does not handle any IO logic. This command has to be
  sent before sending ``UBLK_CMD_START_DEV``.

- ``UBLK_CMD_START_DEV``

  After the server prepares userspace resources (such as creating per-queue
  pthreads & io_uring instances for handling ublk IO), this command is sent
  to the driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
  ``UBLK_CMD_SET_PARAMS`` are applied when creating the device.

- ``UBLK_CMD_STOP_DEV``

  Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
  the ublk server will release resources (such as destroying per-queue
  pthreads & io_uring instances).

- ``UBLK_CMD_DEL_DEV``

  Remove ``/dev/ublkc*``.
  When this command returns, the allocated ublk device
  number can be reused.

- ``UBLK_CMD_GET_QUEUE_AFFINITY``

  When ``/dev/ublkc*`` is added, the driver creates a block layer tagset, so
  that each queue's affinity info is available. The server sends
  ``UBLK_CMD_GET_QUEUE_AFFINITY`` to retrieve the queue affinity info, so it
  can set up the per-queue context efficiently, such as binding affine CPUs
  to the IO pthread and trying to allocate buffers in the IO thread context.

- ``UBLK_CMD_GET_DEV_INFO``

  For retrieving device info via ``ublksrv_ctrl_dev_info``. It is the server's
  responsibility to save IO-target-specific info in userspace.

Data plane
----------

The ublk server needs to create a per-queue IO pthread & io_uring for handling
IO commands via io_uring passthrough. The per-queue IO pthread focuses on IO
handling and shouldn't handle any control & management tasks.

Each IO is assigned a unique tag, which maps 1:1 to an IO request of
``/dev/ublkb*``.

The UAPI structure ``ublksrv_io_desc`` is defined for describing each IO from
the driver. A fixed mmaped area (array) on ``/dev/ublkc*`` is provided for
exporting IO info to the server, such as IO offset, length, OP/flags and
buffer address. Each ``ublksrv_io_desc`` instance can be indexed directly via
queue id and IO tag.

The following IO commands are communicated via io_uring passthrough commands,
and each command is only for forwarding the IO and committing the result
with the IO tag specified in the command data:

- ``UBLK_IO_FETCH_REQ``

  Sent from the server IO pthread for fetching future incoming IO requests
  destined to ``/dev/ublkb*``. This command is sent only once from the server
  IO pthread so that the ublk driver can set up the IO forwarding environment.

- ``UBLK_IO_COMMIT_AND_FETCH_REQ``

  When an IO request is destined to ``/dev/ublkb*``, the driver stores
  the IO's ``ublksrv_io_desc`` in the specified mapped area; then the
  previously received IO command for this IO tag (either ``UBLK_IO_FETCH_REQ``
  or ``UBLK_IO_COMMIT_AND_FETCH_REQ``) is completed, so the server gets
  the IO notification via io_uring.

  After the server handles the IO, its result is committed back to the
  driver by sending ``UBLK_IO_COMMIT_AND_FETCH_REQ`` back. Once ublkdrv
  receives this command, it parses the result and completes the request to
  ``/dev/ublkb*``. At the same time, it sets up the environment for fetching
  future requests with the same IO tag. That is, ``UBLK_IO_COMMIT_AND_FETCH_REQ``
  is reused for both fetching requests and committing back IO results.

- ``UBLK_IO_NEED_GET_DATA``

  With ``UBLK_F_NEED_GET_DATA`` enabled, a WRITE request is first issued to
  the ublk server without data copy. The IO backend of the ublk server then
  receives the request; it can allocate a data buffer and embed its address
  inside this new io command. After the kernel driver gets the command, the
  data is copied from the request pages to the backend's buffer. Finally, the
  backend receives the request again, with the data to be written, and can
  actually handle the request.

  ``UBLK_IO_NEED_GET_DATA`` adds one additional round-trip and one
  ``io_uring_enter()`` syscall. Users who think this may lower performance
  should not enable ``UBLK_F_NEED_GET_DATA``. By default, the ublk server
  pre-allocates an IO buffer for each IO. New projects should try to use this
  buffer to communicate with the ublk driver. However, existing projects may
  break or be unable to consume the new buffer interface; that is why this
  command was added for backwards compatibility, so that existing projects
  can still consume their existing buffers.

- data copy between ublk server IO buffer and ublk block IO request

  For WRITE, the driver needs to copy the block IO request pages into the
  server buffer (pages) before notifying the server of the incoming IO, so
  that the server can handle the WRITE request.

  When the server handles a READ request and sends
  ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the driver, ublkdrv needs to copy
  the data the server read into its buffer (pages) to the IO request pages.

Future development
==================

Container-aware ublk device
---------------------------

The ublk driver doesn't handle any IO logic. Its function is well defined
for now, and only very limited, well-defined userspace interfaces are
needed. It is possible to make ublk devices container-aware block devices
in the future, as Stefan Hajnoczi suggested [#stefan]_, by removing the
ADMIN privilege requirement.

Zero copy
---------

Zero copy is a generic requirement for nbd, fuse and similar drivers. One
problem Xiaoguang mentioned [#xiaoguang]_ is that pages mapped to userspace
can't be remapped any more in the kernel with existing mm interfaces. This
can occur when direct IO is issued to ``/dev/ublkb*``. He also reported that
big requests (IO size >= 256 KB) may benefit a lot from zero copy.

References
==========

.. [#userspace] https://github.com/ming1/ubdsrv

.. [#userspace_lib] https://github.com/ming1/ubdsrv/tree/master/lib

.. [#userspace_nbdublk] https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk

.. [#userspace_readme] https://github.com/ming1/ubdsrv/blob/master/README

.. [#stefan] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/

.. [#xiaoguang] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/
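The per-tag cycle described in ublk.rst (UBLK_IO_FETCH_REQ sent once, then UBLK_IO_COMMIT_AND_FETCH_REQ both committing a result and re-arming the tag) can be sketched as a small state machine. This is a user-space model only, under stated assumptions: the sim_* types and functions, the queue sizes and the op encoding are hypothetical, the descriptor is only loosely modeled on ublksrv_io_desc, and no io_uring passthrough or mmap is involved. The real UAPI is in include/uapi/linux/ublk_cmd.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; real values are negotiated via UBLK_CMD_ADD_DEV
 * (nr_hw_queues, queue_depth). */
#define SIM_NR_QUEUES	2
#define SIM_QUEUE_DEPTH	4

/* Driver-side view of one (queue, tag) slot in this model. */
enum sim_tag_state {
	SIM_IDLE = 0,		/* no server command received yet */
	SIM_ARMED,		/* server command pending, waiting for an IO */
	SIM_OWNED_BY_SRV,	/* IO forwarded, server is handling it */
};

/* Loosely modeled on ublksrv_io_desc; not the real UAPI layout. */
struct sim_io_desc {
	uint32_t op;		/* hypothetical encoding: 0 = READ, 1 = WRITE */
	uint32_t nr_sectors;
	uint64_t start_sector;
};

struct sim_dev {
	/* Stands in for the mmaped descriptor array on /dev/ublkc*,
	 * indexed directly by queue id and IO tag. */
	struct sim_io_desc desc[SIM_NR_QUEUES][SIM_QUEUE_DEPTH];
	enum sim_tag_state state[SIM_NR_QUEUES][SIM_QUEUE_DEPTH];
	int32_t result[SIM_NR_QUEUES][SIM_QUEUE_DEPTH];
};

static void sim_dev_init(struct sim_dev *d)
{
	memset(d, 0, sizeof(*d));	/* every slot starts as SIM_IDLE */
}

static void sim_check_slot(int q, int tag)
{
	assert(q >= 0 && q < SIM_NR_QUEUES);
	assert(tag >= 0 && tag < SIM_QUEUE_DEPTH);
}

/* Server -> driver, sent once per tag: arm the slot for future IO. */
static int sim_fetch_req(struct sim_dev *d, int q, int tag)
{
	sim_check_slot(q, tag);
	if (d->state[q][tag] != SIM_IDLE)
		return -1;
	d->state[q][tag] = SIM_ARMED;
	return 0;
}

/* Block layer -> driver: an IO for /dev/ublkb* arrives on (q, tag).
 * The driver fills the descriptor, then completes the pending server
 * command; that completion is the server's IO notification. */
static int sim_queue_io(struct sim_dev *d, int q, int tag, uint32_t op,
			uint64_t start_sector, uint32_t nr_sectors)
{
	sim_check_slot(q, tag);
	if (d->state[q][tag] != SIM_ARMED)
		return -1;	/* no server command to complete */
	d->desc[q][tag].op = op;
	d->desc[q][tag].nr_sectors = nr_sectors;
	d->desc[q][tag].start_sector = start_sector;
	d->state[q][tag] = SIM_OWNED_BY_SRV;
	return 0;
}

/* Server -> driver: commit the result and, in the same command,
 * re-arm the tag for its next IO. */
static int sim_commit_and_fetch(struct sim_dev *d, int q, int tag, int32_t res)
{
	sim_check_slot(q, tag);
	if (d->state[q][tag] != SIM_OWNED_BY_SRV)
		return -1;
	d->result[q][tag] = res;	/* would complete the blk-mq request */
	d->state[q][tag] = SIM_ARMED;
	return 0;
}
```

In this model a server calls sim_fetch_req() once per tag at startup and afterwards only sim_commit_and_fetch(); folding the re-arm into the commit is presumably what saves a separate fetch command per IO in the documented design.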

MAINTAINERS (+1)
@@ -20764,6 +20764,7 @@
 M:	Ming Lei <ming.lei@redhat.com>
 L:	linux-block@vger.kernel.org
 S:	Maintained
+F:	Documentation/block/ublk.rst
 F:	drivers/block/ublk_drv.c
 F:	include/uapi/linux/ublk_cmd.h
 

drivers/nvme/host/pci.c (+2)
@@ -3517,6 +3517,8 @@
 		.driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
 	{ PCI_DEVICE(0xc0a9, 0x540a),   /* Crucial P2 */
 		.driver_data = NVME_QUIRK_BOGUS_NID, },
+	{ PCI_DEVICE(0x1d97, 0x2263), /* Lexar NM610 */
+		.driver_data = NVME_QUIRK_BOGUS_NID, },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061),
 		.driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065),
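The hunk above attaches quirk flags to a device purely by its PCI vendor/device ID pair, carried in driver_data. The following is a minimal user-space sketch of that table-lookup idea, assuming a simplified exact-match table: sim_pci_id, SIM_QUIRK_BOGUS_NID and sim_lookup_quirks() are hypothetical stand-ins, not the kernel's struct pci_device_id or its matching code (which also handles subsystem IDs). Only the two numeric IDs are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quirk bit; the kernel defines NVME_QUIRK_BOGUS_NID itself. */
#define SIM_QUIRK_BOGUS_NID	(1u << 0)

/* Simplified stand-in for struct pci_device_id: exact vendor/device
 * match, with the quirk flags carried in driver_data. */
struct sim_pci_id {
	uint16_t vendor;
	uint16_t device;
	uint32_t driver_data;
};

static const struct sim_pci_id sim_nvme_ids[] = {
	{ 0xc0a9, 0x540a, SIM_QUIRK_BOGUS_NID },	/* Crucial P2 */
	{ 0x1d97, 0x2263, SIM_QUIRK_BOGUS_NID },	/* Lexar NM610 */
	{ 0, 0, 0 }					/* terminator */
};

/* Return the quirk flags for a device, or 0 if it is not listed. */
static uint32_t sim_lookup_quirks(uint16_t vendor, uint16_t device)
{
	const struct sim_pci_id *id;

	assert(vendor != 0);	/* vendor 0 marks the end of the table */
	for (id = sim_nvme_ids; id->vendor != 0; id++)
		if (id->vendor == vendor && id->device == device)
			return id->driver_data;
	return 0;
}
```

Because the match is by ID only, adding a drive with the same firmware problem is a two-line table change, which is all this commit does for the Lexar NM610.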

drivers/nvme/target/auth.c (+1)
@@ -196,6 +196,7 @@
 	if (IS_ERR(ctrl->ctrl_key)) {
 		ret = PTR_ERR(ctrl->ctrl_key);
 		ctrl->ctrl_key = NULL;
+		goto out_free_hash;
 	}
 	pr_debug("%s: using ctrl hash %s key %*ph\n", __func__,
 		 ctrl->ctrl_key->hash > 0 ?
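Without the added goto, the error branch sets ctrl->ctrl_key to NULL and then falls through to the pr_debug() call, which reads ctrl->ctrl_key->hash: a NULL pointer dereference. The sketch below reproduces that goto-cleanup pattern in user space. The sim_* names are hypothetical, NULL replaces the kernel's ERR_PTR()/IS_ERR() convention, and the body of the cleanup label is assumed, since the hunk does not show what out_free_hash does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins, not the nvmet structures. */
struct sim_key {
	int hash;
};

struct sim_ctrl {
	struct sim_key *host_key;
	struct sim_key *ctrl_key;
	int logged_hash;	/* what the debug print would have read */
};

/* Returns NULL if the secret is unusable (the kernel uses ERR_PTR());
 * allocation failure is folded into the same NULL return for brevity. */
static struct sim_key *sim_extract_key(const char *secret)
{
	struct sim_key *key;

	if (secret == NULL || secret[0] == '\0')
		return NULL;
	key = malloc(sizeof(*key));
	if (key != NULL)
		key->hash = 1;
	return key;
}

static int sim_setup_auth(struct sim_ctrl *ctrl, const char *host_secret,
			  const char *ctrl_secret)
{
	int ret = 0;

	assert(ctrl != NULL);
	ctrl->host_key = sim_extract_key(host_secret);
	if (ctrl->host_key == NULL) {
		ret = -EINVAL;
		goto out_free;
	}

	ctrl->ctrl_key = sim_extract_key(ctrl_secret);
	if (ctrl->ctrl_key == NULL) {
		ret = -EINVAL;
		/* The jump the fix adds: without it, execution falls
		 * through to the dereference of the NULL ctrl_key below. */
		goto out_free;
	}
	/* Stand-in for the pr_debug() that reads ctrl->ctrl_key->hash. */
	ctrl->logged_hash = ctrl->ctrl_key->hash;

out_free:
	if (ret) {
		/* Error path only: undo what was set up earlier. */
		free(ctrl->host_key);
		ctrl->host_key = NULL;
	}
	return ret;
}
```

The success path also reaches the label but skips cleanup because ret is 0; this shape lets every failure share one exit, which is why a single forgotten goto is enough to turn an error return into a crash.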

drivers/nvme/target/tcp.c (+3)
@@ -1506,6 +1506,9 @@
 		goto done;
 
 	switch (sk->sk_state) {
+	case TCP_FIN_WAIT2:
+	case TCP_LAST_ACK:
+		break;
 	case TCP_FIN_WAIT1:
 	case TCP_CLOSE_WAIT:
 	case TCP_CLOSE:
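In TCP, FIN_WAIT2 follows FIN_WAIT1 and LAST_ACK follows CLOSE_WAIT, so by the time the socket reaches either new state the earlier transition has presumably already triggered teardown. The sketch below models the resulting dispatch under two assumptions not visible in the hunk: that the FIN_WAIT1/CLOSE_WAIT/CLOSE cases release the target queue, and that any unlisted state falls to a warning branch, as the commit title "fix unhandled tcp states" suggests. The sim_* enums are hypothetical; their names mirror the kernel's TCP_* constants but the values do not.

```c
#include <assert.h>

/* Hypothetical mirror of the TCP states involved; values are arbitrary. */
enum sim_tcp_state {
	SIM_TCP_FIN_WAIT1,
	SIM_TCP_FIN_WAIT2,
	SIM_TCP_CLOSE_WAIT,
	SIM_TCP_LAST_ACK,
	SIM_TCP_CLOSE,
	SIM_TCP_OTHER,		/* any state the switch does not list */
};

enum sim_action {
	SIM_IGNORE,		/* nothing to do for this transition */
	SIM_RELEASE_QUEUE,	/* assumed: tear the target queue down */
	SIM_WARN_UNHANDLED,	/* assumed default: log an unexpected state */
};

static enum sim_action sim_state_change(enum sim_tcp_state state)
{
	switch (state) {
	case SIM_TCP_FIN_WAIT2:
	case SIM_TCP_LAST_ACK:
		/* Added by the fix: these follow a close that is already
		 * being handled, so they are accepted silently. */
		return SIM_IGNORE;
	case SIM_TCP_FIN_WAIT1:
	case SIM_TCP_CLOSE_WAIT:
	case SIM_TCP_CLOSE:
		return SIM_RELEASE_QUEUE;
	default:
		return SIM_WARN_UNHANDLED;
	}
}
```

Before the fix, both follow-up states would have taken the default branch in this model, producing spurious warnings for an orderly shutdown.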