Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"Mostly just NVMe, but also a single fixup for BFQ for a regression
that happened during the merge window. In detail:

- NVMe pull requests via Christoph:
- Fix doorbell buffer value endianness (Klaus Jensen)
- Fix Linux vs NVMe page size mismatch (Keith Busch)
- Fix a potential memory access beyond the allocation limit
(Keith Busch)
- Fix a multipath vs blktrace NULL pointer dereference (Yanjun
Zhang)
- Fix various problems in handling the Command Supported and
Effects log (Christoph Hellwig)
- Don't allow unprivileged passthrough of commands that don't
transfer data but modify logical block content (Christoph
Hellwig)
- Add a features and quirks policy document (Christoph Hellwig)
- Fix some really nasty code that was correct but made smatch
complain (Sagi Grimberg)

- Use-after-free regression in BFQ from this merge window (Yu)"

* tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux:
nvme-auth: fix smatch warning complaints
nvme: consult the CSE log page for unprivileged passthrough
nvme: also return I/O command effects from nvme_command_effects
nvmet: don't defer passthrough commands with trivial effects to the workqueue
nvmet: set the LBCC bit for commands that modify data
nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
docs, nvme: add a feature and quirk policy document
nvme-pci: update sqsize when adjusting the queue depth
nvme: fix setting the queue depth in nvme_alloc_io_tag_set
block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
nvme: fix multipath crash caused by flush request when blktrace is enabled
nvme-pci: fix page size checks
nvme-pci: fix mempool alloc size
nvme-pci: fix doorbell buffer value endianness

+185 -58
+1
Documentation/maintainer/maintainer-entry-profile.rst
···
    ../riscv/patch-acceptance
    ../driver-api/media/maintainer-entry-profile
    ../driver-api/vfio-pci-device-specific-driver-acceptance
+   ../nvme/feature-and-quirk-policy
+77
Documentation/nvme/feature-and-quirk-policy.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+Linux NVMe feature and and quirk policy
+=======================================
+
+This file explains the policy used to decide what is supported by the
+Linux NVMe driver and what is not.
+
+
+Introduction
+============
+
+NVM Express is an open collection of standards and information.
+
+The Linux NVMe host driver in drivers/nvme/host/ supports devices
+implementing the NVM Express (NVMe) family of specifications, which
+currently consists of a number of documents:
+
+ - the NVMe Base specification
+ - various Command Set specifications (e.g. NVM Command Set)
+ - various Transport specifications (e.g. PCIe, Fibre Channel, RDMA, TCP)
+ - the NVMe Management Interface specification
+
+See https://nvmexpress.org/developers/ for the NVMe specifications.
+
+
+Supported features
+==================
+
+NVMe is a large suite of specifications, and contains features that are only
+useful or suitable for specific use-cases. It is important to note that Linux
+does not aim to implement every feature in the specification. Every additional
+feature implemented introduces more code, more maintenance and potentially more
+bugs. Hence there is an inherent tradeoff between functionality and
+maintainability of the NVMe host driver.
+
+Any feature implemented in the Linux NVMe host driver must support the
+following requirements:
+
+  1. The feature is specified in a release version of an official NVMe
+     specification, or in a ratified Technical Proposal (TP) that is
+     available on NVMe website. Or if it is not directly related to the
+     on-wire protocol, does not contradict any of the NVMe specifications.
+  2. Does not conflict with the Linux architecture, nor the design of the
+     NVMe host driver.
+  3. Has a clear, indisputable value-proposition and a wide consensus across
+     the community.
+
+Vendor specific extensions are generally not supported in the NVMe host
+driver.
+
+It is strongly recommended to work with the Linux NVMe and block layer
+maintainers and get feedback on specification changes that are intended
+to be used by the Linux NVMe host driver in order to avoid conflict at a
+later stage.
+
+
+Quirks
+======
+
+Sometimes implementations of open standards fail to correctly implement parts
+of the standards. Linux uses identifier-based quirks to work around such
+implementation bugs. The intent of quirks is to deal with widely available
+hardware, usually consumer, which Linux users can't use without these quirks.
+Typically these implementations are not or only superficially tested with Linux
+by the hardware manufacturer.
+
+The Linux NVMe maintainers decide ad hoc whether to quirk implementations
+based on the impact of the problem to Linux users and how it impacts
+maintainability of the driver. In general quirks are a last resort, if no
+firmware updates or other workarounds are available from the vendor.
+
+Quirks will not be added to the Linux kernel for hardware that isn't available
+on the mass market. Hardware that fails qualification for enterprise Linux
+distributions, ChromeOS, Android or other consumers of the Linux kernel
+should be fixed before it is shipped instead of relying on Linux quirks.
+1
MAINTAINERS
···
 S:	Supported
 W:	http://git.infradead.org/nvme.git
 T:	git://git.infradead.org/nvme.git
+F:	Documentation/nvme/
 F:	drivers/nvme/host/
 F:	drivers/nvme/common/
 F:	include/linux/nvme*
+1 -1
block/bfq-iosched.c
···
 		unsigned long flags;

 		spin_lock_irqsave(&bfqd->lock, flags);
-		bfq_exit_bfqq(bfqd, bfqq);
 		bic_set_bfqq(bic, NULL, is_sync);
+		bfq_exit_bfqq(bfqd, bfqq);
 		spin_unlock_irqrestore(&bfqd->lock, flags);
 	}
 }
+1 -1
drivers/nvme/host/auth.c
···
 		goto err_free_dhchap_secret;

 	if (!ctrl->opts->dhchap_secret && !ctrl->opts->dhchap_ctrl_secret)
-		return ret;
+		return 0;

 	ctrl->dhchap_ctxs = kvcalloc(ctrl_max_dhchaps(ctrl),
 				sizeof(*chap), GFP_KERNEL);
+27 -7
drivers/nvme/host/core.c
···
 	return 0;
 }

+static u32 nvme_known_nvm_effects(u8 opcode)
+{
+	switch (opcode) {
+	case nvme_cmd_write:
+	case nvme_cmd_write_zeroes:
+	case nvme_cmd_write_uncor:
+		return NVME_CMD_EFFECTS_LBCC;
+	default:
+		return 0;
+	}
+}
+
 u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
 {
 	u32 effects = 0;
···
 	if (ns) {
 		if (ns->head->effects)
 			effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
+		if (ns->head->ids.csi == NVME_CAP_CSS_NVM)
+			effects |= nvme_known_nvm_effects(opcode);
 		if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
 			dev_warn_once(ctrl->device,
-				"IO command:%02x has unhandled effects:%08x\n",
+				"IO command:%02x has unusual effects:%08x\n",
 				opcode, effects);
-		return 0;
-	}

-	if (ctrl->effects)
-		effects = le32_to_cpu(ctrl->effects->acs[opcode]);
-	effects |= nvme_known_admin_effects(opcode);
+		/*
+		 * NVME_CMD_EFFECTS_CSE_MASK causes a freeze all I/O queues,
+		 * which would deadlock when done on an I/O command. Note that
+		 * We already warn about an unusual effect above.
+		 */
+		effects &= ~NVME_CMD_EFFECTS_CSE_MASK;
+	} else {
+		if (ctrl->effects)
+			effects = le32_to_cpu(ctrl->effects->acs[opcode]);
+		effects |= nvme_known_admin_effects(opcode);
+	}

 	return effects;
 }
···

 	memset(set, 0, sizeof(*set));
 	set->ops = ops;
-	set->queue_depth = ctrl->sqsize + 1;
+	set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
 	/*
 	 * Some Apple controllers requires tags to be unique across admin and
 	 * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
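The I/O-command branch of nvme_command_effects() above can be restated as a small userspace sketch. This is an illustration under stated assumptions, not the kernel code: the macro and function names are hypothetical, the bit values are taken to match the enum in include/linux/nvme.h, the opcodes are the NVM command set values, and the warning path is omitted.

```c
#include <stdint.h>

/* Assumed to mirror NVME_CMD_EFFECTS_* in include/linux/nvme.h. */
#define EFFECTS_CSUPP    (1u << 0)     /* command supported */
#define EFFECTS_LBCC     (1u << 1)     /* logical block content change */
#define EFFECTS_CSE_MASK (0x7u << 16)  /* GENMASK(18, 16) */

/* NVM command set opcodes (hypothetical local names). */
enum {
	OP_WRITE        = 0x01,
	OP_READ         = 0x02,
	OP_WRITE_UNCOR  = 0x04,
	OP_WRITE_ZEROES = 0x08,
};

/* Effects the host assumes even if the log page does not report them. */
static uint32_t known_nvm_effects(uint8_t opcode)
{
	switch (opcode) {
	case OP_WRITE:
	case OP_WRITE_ZEROES:
	case OP_WRITE_UNCOR:
		return EFFECTS_LBCC;
	default:
		return 0;
	}
}

/*
 * log_effects: host-endian entry from the effects log, or 0 if the
 * controller provides no log. is_nvm_cs: namespace uses the NVM command set.
 */
static uint32_t io_command_effects(uint32_t log_effects, int is_nvm_cs,
				   uint8_t opcode)
{
	uint32_t effects = log_effects;

	if (is_nvm_cs)
		effects |= known_nvm_effects(opcode);

	/* CSE bits are dropped for I/O commands, as in the hunk above. */
	effects &= ~EFFECTS_CSE_MASK;
	return effects;
}
```

With these definitions, a write on an NVM namespace without any effects log still reports LBCC, while a CSE bit reported for a read is masked out before the value is returned.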
+24 -4
drivers/nvme/host/ioctl.c
···
 static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
 		fmode_t mode)
 {
+	u32 effects;
+
 	if (capable(CAP_SYS_ADMIN))
 		return true;

···
 	}

 	/*
-	 * Only allow I/O commands that transfer data to the controller if the
-	 * special file is open for writing, but always allow I/O commands that
-	 * transfer data from the controller.
+	 * Check if the controller provides a Commands Supported and Effects log
+	 * and marks this command as supported. If not reject unprivileged
+	 * passthrough.
 	 */
-	if (nvme_is_write(c))
+	effects = nvme_command_effects(ns->ctrl, ns, c->common.opcode);
+	if (!(effects & NVME_CMD_EFFECTS_CSUPP))
+		return false;
+
+	/*
+	 * Don't allow passthrough for command that have intrusive (or unknown)
+	 * effects.
+	 */
+	if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC |
+			NVME_CMD_EFFECTS_UUID_SEL |
+			NVME_CMD_EFFECTS_SCOPE_MASK))
+		return false;
+
+	/*
+	 * Only allow I/O commands that transfer data to the controller or that
+	 * change the logical block contents if the file descriptor is open for
+	 * writing.
+	 */
+	if (nvme_is_write(c) || (effects & NVME_CMD_EFFECTS_LBCC))
 		return mode & FMODE_WRITE;
 	return true;
 }
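The three checks added at the end of nvme_cmd_allowed() form a short decision chain, which the following sketch models in plain C. It covers only the part shown in the second hunk (the code elided between the hunks is not modelled), and the names are hypothetical stand-ins; the bit values are assumed to match include/linux/nvme.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror NVME_CMD_EFFECTS_* in include/linux/nvme.h. */
#define EFFECTS_CSUPP      (1u << 0)
#define EFFECTS_LBCC       (1u << 1)
#define EFFECTS_UUID_SEL   (1u << 19)
#define EFFECTS_SCOPE_MASK (0xfffu << 20)  /* GENMASK(31, 20) */

/*
 * effects:          value as returned by nvme_command_effects()
 * transfers_to_dev: stands in for nvme_is_write(c)
 * open_for_write:   stands in for (mode & FMODE_WRITE)
 */
static bool unpriv_passthrough_allowed(uint32_t effects, bool transfers_to_dev,
				       bool open_for_write)
{
	/* 1. The command must be marked as supported in the effects log. */
	if (!(effects & EFFECTS_CSUPP))
		return false;

	/* 2. Any effect outside the harmless set rejects the command. */
	if (effects & ~(EFFECTS_CSUPP | EFFECTS_LBCC |
			EFFECTS_UUID_SEL | EFFECTS_SCOPE_MASK))
		return false;

	/* 3. Data-out or block-modifying commands need a writable fd. */
	if (transfers_to_dev || (effects & EFFECTS_LBCC))
		return open_for_write;
	return true;
}
```

The notable change in behaviour is the third step: a command that transfers no data but sets LBCC is now refused on a read-only file descriptor.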
+1 -1
drivers/nvme/host/nvme.h
···
 {
 	struct nvme_ns *ns = req->q->queuedata;

-	if (req->cmd_flags & REQ_NVME_MPATH)
+	if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
 		trace_block_bio_complete(ns->head->disk->queue, req->bio);
 }

+24 -22
drivers/nvme/host/pci.c
···
 #define SQ_SIZE(q)	((q)->q_depth << (q)->sqes)
 #define CQ_SIZE(q)	((q)->q_depth * sizeof(struct nvme_completion))

-#define SGES_PER_PAGE	(PAGE_SIZE / sizeof(struct nvme_sgl_desc))
+#define SGES_PER_PAGE	(NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc))

 /*
  * These can be higher, but we need to ensure that any command doesn't
···
 	mempool_t *iod_mempool;

 	/* shadow doorbell buffer support: */
-	u32 *dbbuf_dbs;
+	__le32 *dbbuf_dbs;
 	dma_addr_t dbbuf_dbs_dma_addr;
-	u32 *dbbuf_eis;
+	__le32 *dbbuf_eis;
 	dma_addr_t dbbuf_eis_dma_addr;

 	/* host memory buffer support: */
···
 #define NVMEQ_SQ_CMB		1
 #define NVMEQ_DELETE_ERROR	2
 #define NVMEQ_POLLED		3
-	u32 *dbbuf_sq_db;
-	u32 *dbbuf_cq_db;
-	u32 *dbbuf_sq_ei;
-	u32 *dbbuf_cq_ei;
+	__le32 *dbbuf_sq_db;
+	__le32 *dbbuf_cq_db;
+	__le32 *dbbuf_sq_ei;
+	__le32 *dbbuf_cq_ei;
 	struct completion delete_done;
 };

···
 }

 /* Update dbbuf and return true if an MMIO is required */
-static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
-					      volatile u32 *dbbuf_ei)
+static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
+					      volatile __le32 *dbbuf_ei)
 {
 	if (dbbuf_db) {
-		u16 old_value;
+		u16 old_value, event_idx;

 		/*
 		 * Ensure that the queue is written before updating
···
 		 */
 		wmb();

-		old_value = *dbbuf_db;
-		*dbbuf_db = value;
+		old_value = le32_to_cpu(*dbbuf_db);
+		*dbbuf_db = cpu_to_le32(value);

 		/*
 		 * Ensure that the doorbell is updated before reading the event
···
 		 */
 		mb();

-		if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value))
+		event_idx = le32_to_cpu(*dbbuf_ei);
+		if (!nvme_dbbuf_need_event(event_idx, value, old_value))
 			return false;
 	}

···
  */
 static int nvme_pci_npages_prp(void)
 {
-	unsigned nprps = DIV_ROUND_UP(NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE,
-				      NVME_CTRL_PAGE_SIZE);
-	return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
+	unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE;
+	unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);
+	return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8);
 }

 /*
···
 static int nvme_pci_npages_sgl(void)
 {
 	return DIV_ROUND_UP(NVME_MAX_SEGS * sizeof(struct nvme_sgl_desc),
-			PAGE_SIZE);
+			NVME_CTRL_PAGE_SIZE);
 }

 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
···
 		sge->length = cpu_to_le32(entries * sizeof(*sge));
 		sge->type = NVME_SGL_FMT_LAST_SEG_DESC << 4;
 	} else {
-		sge->length = cpu_to_le32(PAGE_SIZE);
+		sge->length = cpu_to_le32(NVME_CTRL_PAGE_SIZE);
 		sge->type = NVME_SGL_FMT_SEG_DESC << 4;
 	}
 }
···
 	if (dev->cmb_use_sqes) {
 		result = nvme_cmb_qdepth(dev, nr_io_queues,
 				sizeof(struct nvme_command));
-		if (result > 0)
+		if (result > 0) {
 			dev->q_depth = result;
-		else
+			dev->ctrl.sqsize = result - 1;
+		} else {
 			dev->cmb_use_sqes = false;
+		}
 	}

 	do {
···

 	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
 				io_queue_depth);
-	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
 	dev->dbs = dev->bar + 4096;

···
 		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
 			 dev->q_depth);
 	}
-
+	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */

 	nvme_map_cmb(dev);

+20 -15
drivers/nvme/target/admin-cmd.c
···

 static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
 {
-	log->acs[nvme_admin_get_log_page]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_identify]		= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_abort_cmd]		= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_set_features]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_get_features]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_async_event]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_keep_alive]		= cpu_to_le32(1 << 0);
+	log->acs[nvme_admin_get_log_page] =
+	log->acs[nvme_admin_identify] =
+	log->acs[nvme_admin_abort_cmd] =
+	log->acs[nvme_admin_set_features] =
+	log->acs[nvme_admin_get_features] =
+	log->acs[nvme_admin_async_event] =
+	log->acs[nvme_admin_keep_alive] =
+		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);

-	log->iocs[nvme_cmd_read]		= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_write]		= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_flush]		= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_read] =
+	log->iocs[nvme_cmd_flush] =
+	log->iocs[nvme_cmd_dsm]	=
+		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
+	log->iocs[nvme_cmd_write] =
+	log->iocs[nvme_cmd_write_zeroes] =
+		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC);
 }

 static void nvmet_get_cmd_effects_zns(struct nvme_effects_log *log)
 {
-	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_append] =
+	log->iocs[nvme_cmd_zone_mgmt_send] =
+		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC);
+	log->iocs[nvme_cmd_zone_mgmt_recv] =
+		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
 }

 static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
+5 -6
drivers/nvme/target/passthru.c
···
 	}

 	/*
-	 * If there are effects for the command we are about to execute, or
-	 * an end_req function we need to use nvme_execute_passthru_rq()
-	 * synchronously in a work item seeing the end_req function and
-	 * nvme_passthru_end() can't be called in the request done callback
-	 * which is typically in interrupt context.
+	 * If a command needs post-execution fixups, or there are any
+	 * non-trivial effects, make sure to execute the command synchronously
+	 * in a workqueue so that nvme_passthru_end gets called.
 	 */
 	effects = nvme_command_effects(ctrl, ns, req->cmd->common.opcode);
-	if (req->p.use_workqueue || effects) {
+	if (req->p.use_workqueue ||
+	    (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))) {
 		INIT_WORK(&req->p.work, nvmet_passthru_execute_cmd_work);
 		req->p.rq = rq;
 		queue_work(nvmet_wq, &req->p.work);
+3 -1
include/linux/nvme.h
···
 #ifndef _LINUX_NVME_H
 #define _LINUX_NVME_H

+#include <linux/bits.h>
 #include <linux/types.h>
 #include <linux/uuid.h>

···
 	NVME_CMD_EFFECTS_NCC		= 1 << 2,
 	NVME_CMD_EFFECTS_NIC		= 1 << 3,
 	NVME_CMD_EFFECTS_CCC		= 1 << 4,
-	NVME_CMD_EFFECTS_CSE_MASK	= 3 << 16,
+	NVME_CMD_EFFECTS_CSE_MASK	= GENMASK(18, 16),
 	NVME_CMD_EFFECTS_UUID_SEL	= 1 << 19,
+	NVME_CMD_EFFECTS_SCOPE_MASK	= GENMASK(31, 20),
 };

 struct nvme_effects_log {
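The NVME_CMD_EFFECTS_CSE_MASK change is easiest to see by comparing the two bit patterns. The sketch below uses a userspace stand-in for the kernel's GENMASK() macro; GENMASK32 and the other names are hypothetical helpers written for this illustration, not kernel definitions.

```c
#include <stdint.h>

/* Userspace stand-in for GENMASK(h, l) on 32-bit values: bits l..h set. */
#define GENMASK32(h, l) (((~0u) << (l)) & ((~0u) >> (31 - (h))))

#define CSE_MASK_OLD (3u << 16)         /* covers bits 17:16 only */
#define CSE_MASK_NEW GENMASK32(18, 16)  /* covers bits 18:16 */
#define SCOPE_MASK   GENMASK32(31, 20)  /* covers bits 31:20 */

/* Returns nonzero if any bit selected by mask is set in effects. */
static int has_bits(uint32_t effects, uint32_t mask)
{
	return (effects & mask) != 0;
}
```

With the old definition, an effects value that sets only bit 18 would not be recognised as having a CSE effect; the three-bit mask covers it.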